AMD Radeon Instinct MI25 Professional Graphics Card

Product status: Official | Last Update: 2017-08-02
Overview
Manufacturer
AMD
Original Series
Instinct
Release Date
June 20th, 2017
PCB Code
109-D05157
Board Model
AMD D051
Graphics Processing Unit
GPU Model
Vega 10
Architecture
GCN 5.0 (Vega)
Fabrication Process
14 nm
Die Size
486 mm²
Transistor Count
12.5B
Transistor Density
25.7M transistors/mm²
Stream Processors
4096
TMUs
256
Clocks
Boost Clock
1501 MHz
Memory Clock
945 MHz
Effective Memory Clock
1890 Mbps
Memory Configuration
Memory Size
16384 MB
Memory Type
HBM2
Memory Bus Width
2048-bit
Memory Bandwidth
483.8 GB/s

Physical
Interface
PCI-Express 3.0 x16
Height
2-slot
Power Connectors
2× 8-pin
TDP/TBP
300 W
Recommended PSU
600 W
API Support
DirectX
12.0
Vulkan
1.0
OpenGL
4.5
OpenCL
2.0

Performance
Texture Fillrate
384.3 GTexel/s
Peak FP32
12.3 TFLOPS
FP32 Perf. per Watt
41 GFLOPS/W
FP32 Perf. per mm²
25.3 GFLOPS/mm²
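
The derived figures in this listing (memory bandwidth, texture fillrate, peak FP32 and the density values) follow from the raw specifications by simple arithmetic. The sketch below reproduces them in Python. The function names are illustrative, and the factor of 2 FLOPs per stream processor per clock (one fused multiply-add) is an assumption that is not stated on this page, although it matches the listed 12.3 TFLOPS.

```python
# Raw values copied from the spec listing above.
STREAM_PROCESSORS = 4096
TMUS = 256
BOOST_CLOCK_MHZ = 1501
EFFECTIVE_MEM_RATE_MBPS = 1890  # per-pin data rate
BUS_WIDTH_BITS = 2048
DIE_SIZE_MM2 = 486
TRANSISTORS = 12.5e9
BOARD_POWER_W = 300


def memory_bandwidth_gbs(rate_mbps: float, bus_bits: int) -> float:
    """Per-pin rate (Mbit/s) x bus width (bits) / 8 bits per byte, in GB/s."""
    return rate_mbps * bus_bits / 8 / 1000


def texture_fillrate_gtexels(tmus: int, clock_mhz: float) -> float:
    """One texel per TMU per clock, in GTexel/s."""
    return tmus * clock_mhz / 1000


def peak_fp32_tflops(stream_processors: int, clock_mhz: float) -> float:
    """Assumes 2 FLOPs (one fused multiply-add) per stream processor per clock."""
    return stream_processors * 2 * clock_mhz / 1e6


bandwidth = memory_bandwidth_gbs(EFFECTIVE_MEM_RATE_MBPS, BUS_WIDTH_BITS)
fillrate = texture_fillrate_gtexels(TMUS, BOOST_CLOCK_MHZ)
fp32 = peak_fp32_tflops(STREAM_PROCESSORS, BOOST_CLOCK_MHZ)

print(f"Memory bandwidth:   {bandwidth:.1f} GB/s")
print(f"Texture fillrate:   {fillrate:.1f} GTexel/s")
print(f"Peak FP32:          {fp32:.1f} TFLOPS")
print(f"FP32 per watt:      {fp32 * 1000 / BOARD_POWER_W:.0f} GFLOPS/W")
print(f"FP32 per mm²:       {fp32 * 1000 / DIE_SIZE_MM2:.1f} GFLOPS/mm²")
print(f"Transistor density: {TRANSISTORS / DIE_SIZE_MM2 / 1e6:.1f}M/mm²")
```

After rounding, each printed value agrees with the corresponding entry in the listing.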




Model                          Cores  Boost Clock  Memory Clock  Memory Config.
AMD Instinct MI300X            19456  2100 MHz     5.2 Gbps      192 GB 8192-bit
AMD Instinct MI250X            14080  1700 MHz     3.2 Gbps      128 GB 8192-bit
AMD Instinct MI250             13312  1700 MHz     3.2 Gbps      128 GB 8192-bit
AMD Instinct MI100             7680   1504 MHz     2.4 Gbps      32 GB HBM2 4096-bit
AMD Instinct MI210             6656   1700 MHz     3.2 Gbps      64 GB 4096-bit
AMD Radeon Instinct MI60       4096   1800 MHz     2 Gbps        32 GB HBM2 4096-bit
AMD Radeon Instinct MI25       4096   1501 MHz     1.9 Gbps      16 GB HBM2 2048-bit
AMD Vega Cube                  4096   1501 MHz     1.9 Gbps      16 GB HBM2 2048-bit
AMD Radeon Instinct MI8        4096   1000 MHz     1 Gbps        4 GB HBM 4096-bit
AMD Radeon Instinct MI50 32GB  3840   1725 MHz     2 Gbps        32 GB HBM2 4096-bit
AMD Radeon Instinct MI50       3840   1746 MHz     2 Gbps        16 GB HBM2 4096-bit
AMD Radeon Instinct MI6        2304   1237 MHz     7 Gbps        16 GB GDDR5 256-bit

Model                                               Cores  Boost Clock  Memory Clock  Memory Config.
AMD Radeon Instinct MI25                            4096   1501 MHz     1.9 Gbps      16 GB HBM2 2048-bit
AMD Vega Cube                                       4096   1501 MHz     1.9 Gbps      16 GB HBM2 2048-bit
AMD Radeon Pro SSG (Vega)                           4096   1500 MHz     1.9 Gbps      16 GB HBM2 2048-bit
AMD Radeon Pro WX 9100                              4096   1500 MHz     1.9 Gbps      16 GB HBM2 2048-bit
AMD Radeon Vega Frontier Edition                    4096   1600 MHz     1.9 Gbps      16 GB HBM2 2048-bit
AMD Radeon Vega Frontier Liquid Edition             4096   1600 MHz     1.9 Gbps      16 GB HBM2 2048-bit
AMD Radeon Pro Vega 64                              TBC    TBC          TBC           TBC
AMD Radeon RX Vega 64 Liquid Edition                4096   1677 MHz     1.9 Gbps      8 GB HBM2 2048-bit
AMD Radeon RX Vega 64                               4096   1546 MHz     1.9 Gbps      8 GB HBM2 2048-bit
AMD Radeon RX Vega 64 (Vega 10) Engineering Sample  4096   1546 MHz     1.9 Gbps      8 GB HBM2 2048-bit
AMD Radeon RX Vega 64 Limited Edition               4096   1546 MHz     1.9 Gbps      8 GB HBM2 2048-bit
AMD Radeon RX Vega Nano                             4096   1471 MHz     1.6 Gbps      8 GB HBM2 2048-bit
AMD Radeon Pro WX 8200                              3584   1500 MHz     2 Gbps        8 GB HBM2 2048-bit
AMD Radeon Pro Vega 56                              TBC    TBC          TBC           TBC
AMD Radeon RX Vega 56                               3584   1471 MHz     1.6 Gbps      8 GB HBM2 2048-bit
AMD Radeon RX Vega 56 (Vega 10) Engineering Sample  3584   -            1.9 Gbps      8 GB HBM2 2048-bit

PERFORMANCE

Unmatched Half and Single Precision Floating-Point Performance

  • 24.6 TFLOPS FP16 or 12.3 TFLOPS FP32 peak GPU compute performance.

    With 24.6 TFLOPS FP16 or 12.3 TFLOPS FP32 peak GPU compute performance on a single board, the Radeon Instinct MI25 server accelerator provides single-precision performance leadership for compute-intensive machine intelligence and deep learning training applications. 1 The MI25 provides a powerful solution for the most parallel HPC workloads. It also provides 768 GFLOPS of peak double-precision (FP64) performance, at 1/16th of the FP32 rate.

  • 16GB ultra high-bandwidth HBM2 ECC GPU memory.

    With 2X data-rate improvements over previous generations on a 2048-bit memory interface, a next-generation High Bandwidth Cache and controller, and ECC memory reliability, the Radeon Instinct MI25’s 16GB of HBM2 GPU memory provides a professional-level accelerator solution capable of handling the most demanding data-intensive machine intelligence and deep learning training applications. 3

  • Up to 82 GFLOPS/watt FP16 or 41 GFLOPS/watt FP32 peak GPU compute performance.

    With up to 82 GFLOPS/watt FP16 or 41 GFLOPS/watt FP32 peak GPU compute performance, the Radeon Instinct MI25 server accelerator provides unmatched performance per watt for machine intelligence and deep learning training applications in the datacenter, where performance and efficient power usage are crucial to ROI. The MI25 also provides 2.5 GFLOPS/watt of FP64 peak performance.

  • 64 Compute Units each with 64 Stream Processors.

    The Radeon Instinct™ MI25 server accelerator has 64 Compute Units, each consisting of 64 stream processors, for a total of 4,096 stream processors. It is based on the next-generation “Vega” architecture, whose newly designed compute engine is built on flexible new compute units (nCUs) that allow 16-bit, 32-bit and 64-bit processing at higher frequencies to supercharge today’s emerging dynamic workloads. The Radeon Instinct MI25 provides superior single-precision performance and flexibility for the most demanding compute-intensive parallel machine intelligence and deep learning applications in an efficient package.
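
The peak figures quoted in this list are related by fixed ratios. The Python sketch below works them out from the compute unit count. It assumes the peaks are quoted at the 1501 MHz boost clock and 300 W board power given elsewhere on this page, and that each stream processor retires one fused multiply-add (2 FLOPs) per clock. The 2x FP16 ratio is inferred from the quoted 24.6 and 12.3 TFLOPS values, while the 1/16 FP64 rate is stated in the text. All names are illustrative.

```python
COMPUTE_UNITS = 64
STREAM_PROCESSORS_PER_CU = 64
BOOST_CLOCK_GHZ = 1.501  # assumption: peak figures are quoted at boost clock
BOARD_POWER_W = 300      # TDP from the spec listing

# Throughput of each precision relative to FP32.
RATE_VS_FP32 = {"FP16": 2.0, "FP32": 1.0, "FP64": 1.0 / 16.0}


def peak_gflops(precision: str) -> float:
    """Peak throughput in GFLOPS for the given precision."""
    stream_processors = COMPUTE_UNITS * STREAM_PROCESSORS_PER_CU
    # Assumption: 2 FLOPs (one fused multiply-add) per stream processor per clock.
    fp32_gflops = stream_processors * 2 * BOOST_CLOCK_GHZ
    return fp32_gflops * RATE_VS_FP32[precision]


def gflops_per_watt(precision: str) -> float:
    """Peak throughput divided by board power."""
    return peak_gflops(precision) / BOARD_POWER_W


for precision in ("FP16", "FP32", "FP64"):
    print(
        f"{precision}: {peak_gflops(precision) / 1000:.2f} TFLOPS, "
        f"{gflops_per_watt(precision):.2f} GFLOPS/W"
    )
```

Under these assumptions the FP16 and FP32 results round to the quoted 24.6 and 12.3 TFLOPS and to 82 and 41 GFLOPS/watt. FP64 comes out at about 768.5 GFLOPS and 2.56 GFLOPS/watt, which the text quotes as 768 GFLOPS and 2.5 GFLOPS/watt.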

FEATURES

Built on AMD’s Next-Generation “Vega” Architecture with World’s Most Advanced GPU Memory

  • Passively cooled GPU server accelerator based on next-generation “Vega” architecture using a 14nm FinFET Process. The Radeon Instinct MI25 server accelerator, based on the new “Vega” architecture with a 14nm FinFET process, is a professional-grade accelerator designed for compute density and optimized for datacenter server deployments. The MI25 server accelerator is the ideal solution for single-precision compute-intensive training applications in machine intelligence and deep learning and other HPC-class workloads, where performance per watt is important.
  • 300W TDP board power, full-height, dual-slot, 10.5” PCIe® Gen 3 x16 GPU server card. The Radeon Instinct MI25 server PCIe® Gen 3 x16 GPU card is a full-height, dual-slot card designed to fit in most standard server designs providing a performance driven server solution for heterogeneous machine intelligence and deep learning training and HPC-class system deployments.
  • Ultra high-bandwidth HBM2 ECC memory with up to 484 GB/s memory bandwidth. The Radeon Instinct MI25 server accelerator is designed with 16GB of the latest high bandwidth HBM2 memory for handling the larger data set requirements of the most demanding machine intelligence and deep learning neural network training systems efficiently. The MI25 accelerator’s 16GB of ECC HBM2 memory also makes it an ideal solution for data intensive HPC-class workloads.
  • MxGPU SR-IOV Hardware Virtualization. The Radeon Instinct MI25 server accelerator is designed with support for AMD’s MxGPU SR-IOV hardware virtualization technology to drive greater utilization and capacity in the data center.
  • Updated Remote Manageability Capabilities. The Radeon Instinct MI25 accelerator has advanced out-of-band manageability circuitry for simplified GPU monitoring in large scale systems. The MI25’s manageability capabilities provide accessibility via I2C, regardless of what state the GPU is in, providing advanced monitoring of a range of static and dynamic GPU information using PMCI compliant data structures including board part detail, serial numbers, GPU temperature, power and other information.
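
As a rough illustration of what SR-IOV support looks like from the host side, the Python sketch below scans the Linux PCI sysfs tree for AMD devices that advertise virtual functions. The `vendor`, `sriov_totalvfs` and `sriov_numvfs` files are standard Linux sysfs attributes, and `0x1002` is AMD's PCI vendor ID. Whether an MI25 actually exposes these attributes depends on the host driver and firmware, which is an assumption here, and the function names are illustrative.

```python
from pathlib import Path

AMD_PCI_VENDOR_ID = "0x1002"


def read_attr(device_dir: Path, name: str):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        return (device_dir / name).read_text().strip()
    except OSError:
        return None


def list_sriov_devices(sysfs_root="/sys/bus/pci/devices", vendor=AMD_PCI_VENDOR_ID):
    """List PCI devices from `vendor` that expose SR-IOV attributes in sysfs."""
    root = Path(sysfs_root)
    results = []
    if not root.is_dir():
        return results  # not a Linux host, or sysfs is not mounted
    for device_dir in sorted(root.iterdir()):
        if read_attr(device_dir, "vendor") != vendor:
            continue
        total_vfs = read_attr(device_dir, "sriov_totalvfs")
        if total_vfs is None:
            continue  # no SR-IOV capability exposed for this device
        enabled_vfs = read_attr(device_dir, "sriov_numvfs") or "0"
        results.append(
            {
                "address": device_dir.name,
                "total_vfs": int(total_vfs),
                "enabled_vfs": int(enabled_vfs),
            }
        )
    return results


if __name__ == "__main__":
    for entry in list_sriov_devices():
        print(entry)
```

On a host without matching devices the function returns an empty list. The `sysfs_root` parameter exists only so the scan can be pointed at a test directory.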

USE CASES

Machine Intelligence & Deep Learning Neural Network Training

Training techniques used today on neural networks in machine intelligence and deep learning applications in data centers have become very complex and require handling massive amounts of data when training those networks to recognize patterns within that data. This requires large amounts of floating-point computation spread across many cores, and traditional CPUs can’t handle this type of computation as efficiently as GPUs can. What can take CPUs weeks to compute can be handled in days with GPUs. The Radeon Instinct MI25, combined with AMD’s new Epyc server processors and our ROCm open software platform, delivers superior performance for machine intelligence and deep learning applications.

The MI25’s 24.6 TFLOPS of native half-precision (FP16) or 12.3 TFLOPS of single-precision (FP32) peak floating-point performance, running across 4,096 stream processors and combined with its advanced High Bandwidth Cache (HBC) and controller and 16GB of high-bandwidth HBM2 memory, gives customers a new level of computing capable of meeting today’s demanding requirement of handling large data efficiently when training the complex neural networks used in deep learning. 1 The MI25 accelerator, based on AMD’s next-generation “Vega” architecture with the world’s most advanced memory architecture, is optimized for handling large data sets and has vast improvements in throughput per clock over previous generations. It delivers up to 82 GFLOPS per watt of FP16 or 41 GFLOPS per watt of FP32 peak GPU compute performance, for outstanding performance per watt in machine intelligence and deep learning training deployments in the data center, where performance and efficiency are mandatory.

Benefits for Machine Intelligence & Deep Learning Neural Network Training:

  • Unmatched FP16 and FP32 Floating-Point Performance
  • Open Software ROCm Platform for HPC-Class Rack Scale
  • Optimized MIOpen Deep Learning Framework Libraries
  • Large BAR Support for mGPU peer to peer
  • Configuration advantages with Epyc server processors
  • Superior compute density and performance per node when combining new AMD Epyc™ processor-based servers and Radeon Instinct “Vega” based products
  • MxGPU SR-IOV Hardware Virtualization enabling greater utilization and capacity in the data center

HPC Heterogeneous Compute

The HPC industry is creating immense amounts of unstructured data each year and a portion of HPC system configurations are being reshaped to enable the community to extract useful information from that data. Traditionally, these systems were predominantly CPU based, but with the explosive growth in the amount and different types of data being created, along with the evolution of more complex codes, these traditional systems don’t meet all the requirements of today’s data intensive HPC workloads. As these types of codes have become more complex and parallel, there has been a growing use of heterogeneous computing systems with different mixes of accelerators including discrete GPUs and FPGAs. The advancements of GPU capabilities over the last decade have allowed them to be used for a growing number of these parallel codes like the ones being used for training neural networks for deep learning. Scientists and researchers across the globe are now using accelerators to more efficiently process HPC parallel codes across several industries including life sciences, energy, financial, automotive and aerospace, academics, government and defense.

The Radeon Instinct MI25, combined with AMD’s new “Zen”-based Epyc server CPUs and our revolutionary ROCm open software platform, provides a progressive approach to open heterogeneous compute from the metal forward. AMD’s next-generation HPC solutions are designed to deliver maximum compute density and performance per node with the efficiency required to handle today’s massively parallel data-intensive codes, as well as to provide a powerful, flexible solution for general-purpose HPC deployments. The ROCm software platform brings a scalable HPC-class solution that provides fully open-source Linux drivers, HCC compilers, tools and libraries to give scientists and researchers system control down to the metal. The Radeon Instinct’s open ecosystem approach supports various architectures including x86, Power8 and ARM, along with industry-standard interconnect technologies, giving customers the ability to design optimized HPC systems for a new era of heterogeneous compute that embraces the HPC community’s open approach to scientific advancement.

Key Benefits for HPC Heterogeneous Compute:

  • Outstanding Compute Density and Performance Per Node
  • Open Software ROCm Platform for HPC-Class Rack Scale
  • Open Source Linux Drivers, HCC Compiler, Tools and Libraries from the Metal Forward
  • Open Industry Standard Support of Multiple Architectures and Industry Standard Interconnect Technologies

AMD speeds deep learning inference and training with high-performance Radeon Instinct accelerators and MIOpen open-source GPU-accelerated library

SUNNYVALE, CA — (Marketwired) — 12/12/16 — AMD (NASDAQ: AMD) today unveiled its strategy to accelerate the machine intelligence era in server computing through a new suite of hardware and open-source software offerings designed to dramatically increase performance, efficiency, and ease of implementation of deep learning workloads. New Radeon™ Instinct accelerators will offer organizations powerful GPU-based solutions for deep learning inference and training. Along with the new hardware offerings, AMD announced MIOpen, a free, open-source library for GPU accelerators intended to enable high-performance machine intelligence implementations, and new, optimized deep learning frameworks on AMD’s ROCm software to build the foundation of the next evolution of machine intelligence workloads.

Inexpensive high-capacity storage, an abundance of sensor driven data, and the exponential growth of user-generated content are driving exabytes of data globally. Recent advances in machine intelligence algorithms mapped to high-performance GPUs are enabling orders of magnitude acceleration of the processing and understanding of that data, producing insights in near real time. Radeon Instinct is a blueprint for an open software ecosystem for machine intelligence, helping to speed inference insights and algorithm training.

“Radeon Instinct is set to dramatically advance the pace of machine intelligence through an approach built on high-performance GPU accelerators, and free, open-source software in MIOpen and ROCm,” said AMD President and CEO, Dr. Lisa Su. “With the combination of our high-performance compute and graphics capabilities and the strength of our multi-generational roadmap, we are the only company with the GPU and x86 silicon expertise to address the broad needs of the datacenter and help advance the proliferation of machine intelligence.”

At the AMD Technology Summit held last week, customers and partners from 1026 Labs, Inventec, SuperMicro, University of Toronto’s CHIME radio telescope project and Xilinx praised the launch of Radeon Instinct, discussed how they’re making use of AMD’s machine intelligence and deep learning technologies today, and how they can benefit from Radeon Instinct.

Radeon Instinct accelerators feature passive cooling, AMD Multiuser GPU (MxGPU) hardware virtualization technology conforming with the SR-IOV (Single Root I/O Virtualization) industry standard, and 64-bit PCIe addressing with Large Base Address Register (BAR) support for multi-GPU peer-to-peer communication.

Radeon Instinct accelerators are designed to address a wide-range of machine intelligence applications:

  • The Radeon Instinct MI6 accelerator based on the acclaimed Polaris GPU architecture will be a passively cooled inference accelerator optimized for jobs/second/Joule with 5.7 TFLOPS of peak FP16 performance at 150W board power and 16GB of GPU memory
  • The Radeon Instinct MI8 accelerator, harnessing the high-performance, energy-efficient “Fiji” Nano GPU, will be a small form factor HPC and inference accelerator with 8.2 TFLOPS of peak FP16 performance at less than 175W board power and 4GB of High-Bandwidth Memory (HBM)
  • The Radeon Instinct MI25 accelerator will use AMD’s next-generation high-performance Vega GPU architecture and is designed for deep learning training, optimized for time-to-solution

A variety of open source solutions are fueling Radeon Instinct hardware:

  • MIOpen GPU-accelerated library: To help solve high-performance machine intelligence implementations, the free, open-source MIOpen GPU-accelerated library is planned to be available in Q1 2017 to provide GPU-tuned implementations for standard routines such as convolution, pooling, activation functions, normalization and tensor format
  • ROCm deep learning frameworks: The ROCm platform is also now optimized for acceleration of popular deep learning frameworks, including Caffe, Torch 7, and TensorFlow*, allowing programmers to focus on training neural networks rather than low-level performance tuning through ROCm’s rich integrations. ROCm is intended to serve as the foundation of the next evolution of machine intelligence problem sets, with domain-specific compilers for linear algebra and tensors and an open compiler and language runtime

AMD is also investing in developing interconnect technologies that go beyond today’s PCIe Gen3 standards to further performance for tomorrow’s machine intelligence applications. AMD is collaborating on a number of open high-performance I/O standards that support broad ecosystem server CPU architectures including x86, OpenPOWER, and ARM AArch64. AMD is a founding member of CCIX, Gen-Z and OpenCAPI, working towards future 25 Gbit/s PHY-enabled accelerator and rack-level interconnects for Radeon Instinct.

Radeon Instinct products are expected to ship in 1H 2017. For more information, visit Radeon.com/Instinct.

 

* TensorFlow support is expected to be available January 2017.