24.6 TFLOPS FP16 or 12.3 TFLOPS FP32 peak GPU compute performance.
With 24.6 TFLOPS FP16 or 12.3 TFLOPS FP32 peak GPU compute performance on a single board, the Radeon Instinct MI25 server accelerator provides single-precision performance leadership for compute-intensive machine intelligence and deep learning training applications. 1 The MI25 provides a powerful solution for the most demanding parallel HPC workloads. The MI25 also provides 768 GFLOPS of peak double-precision (FP64) performance at 1/16th the FP32 rate.
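These peak figures follow directly from the card's shader count and clock. A minimal sketch of the arithmetic, assuming a ~1500 MHz peak engine clock and one fused multiply-add (2 FLOPs) per stream processor per clock at FP32, with FP16 at twice the FP32 rate via packed math and FP64 at 1/16th rate:

```python
# Peak-throughput arithmetic for the Radeon Instinct MI25 (sketch).
# The ~1500 MHz peak engine clock is an assumption stated above.
stream_processors = 4096        # 64 CUs x 64 stream processors
peak_clock_ghz = 1.5            # assumed peak engine clock
flops_per_sp_per_clock = 2      # one fused multiply-add = 2 FLOPs

fp32_tflops = stream_processors * flops_per_sp_per_clock * peak_clock_ghz / 1000
fp16_tflops = fp32_tflops * 2          # packed half-precision math runs at 2x
fp64_gflops = fp32_tflops / 16 * 1000  # double precision at 1/16th rate

print(f"FP32: {fp32_tflops:.1f} TFLOPS")  # 12.3
print(f"FP16: {fp16_tflops:.1f} TFLOPS")  # 24.6
print(f"FP64: {fp64_gflops:.0f} GFLOPS")  # 768
```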
16GB ultra high-bandwidth HBM2 ECC GPU memory.
With 2X data-rate improvements over previous generations on a 2048-bit memory interface, a next-generation High Bandwidth Cache and controller, and ECC memory reliability, the Radeon Instinct MI25’s 16GB of HBM2 GPU memory provides a professional-level accelerator solution capable of handling the most demanding data-intensive machine intelligence and deep learning training applications. 3
PERFORMANCE
Unmatched Half and Single Precision Floating-Point Performance
Up to 82 GFLOPS/watt FP16 or 41 GFLOPS/watt FP32 peak GPU compute performance.
With up to 82 GFLOPS/watt FP16 or 41 GFLOPS/watt FP32 peak GPU compute performance, the Radeon Instinct MI25 server accelerator provides unmatched performance per watt for machine intelligence and deep learning training applications in the datacenter, where performance and efficient power usage are crucial to ROI. The MI25 also provides 2.5 GFLOPS/watt of FP64 peak performance.
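The performance-per-watt figures are simply the peak compute rates divided by the 300W TDP board power. A quick check, using only the numbers quoted on this sheet:

```python
# Peak performance per watt for the MI25, from the figures on this sheet.
board_power_w = 300  # TDP board power

fp16_gflops_per_w = 24_600 / board_power_w  # 24.6 TFLOPS FP16
fp32_gflops_per_w = 12_300 / board_power_w  # 12.3 TFLOPS FP32
fp64_gflops_per_w = 768 / board_power_w     # 768 GFLOPS FP64

print(round(fp16_gflops_per_w))          # 82
print(round(fp32_gflops_per_w))          # 41
print(f"{fp64_gflops_per_w:.2f}")        # 2.56, quoted as 2.5 GFLOPS/watt
```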
64 Compute Units each with 64 Stream Processors.
The Radeon Instinct™ MI25 server accelerator has 64 Compute Units, each consisting of 64 stream processors, for a total of 4,096 stream processors. It is based on the next-generation “Vega” architecture, with a newly designed compute engine built on flexible new compute units (nCUs) that allow 16-bit, 32-bit and 64-bit processing at higher frequencies to supercharge today’s emerging dynamic workloads. The Radeon Instinct MI25 provides superior single-precision performance and flexibility for the most demanding compute-intensive parallel machine intelligence and deep learning applications in an efficient package.
FEATURES
Built on AMD’s Next-Generation “Vega” Architecture with World’s Most Advanced GPU Memory
- Passively cooled GPU server accelerator based on next-generation “Vega” architecture using a 14nm FinFET process. The Radeon Instinct MI25 server accelerator, based on the new “Vega” architecture with a 14nm FinFET process, is a professional-grade accelerator designed for compute density and optimized for datacenter server deployments. The MI25 server accelerator is the ideal solution for single-precision compute-intensive training applications in machine intelligence and deep learning, as well as other HPC-class workloads, where performance per watt is important.
- 300W TDP board power, full-height, dual-slot, 10.5” PCIe® Gen 3 x16 GPU server card. The Radeon Instinct MI25 server PCIe® Gen 3 x16 GPU card is a full-height, dual-slot card designed to fit in most standard server designs, providing a performance-driven server solution for heterogeneous machine intelligence and deep learning training and HPC-class system deployments.
- Ultra high-bandwidth HBM2 ECC memory with up to 484 GB/s memory bandwidth. The Radeon Instinct MI25 server accelerator is designed with 16GB of the latest high-bandwidth HBM2 memory to efficiently handle the larger data set requirements of the most demanding machine intelligence and deep learning neural network training systems. The MI25 accelerator’s 16GB of ECC HBM2 memory also makes it an ideal solution for data-intensive HPC-class workloads.
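Peak memory bandwidth is the aggregate interface width times the per-pin data rate. A sketch of that arithmetic, assuming two HBM2 stacks of 1024 bits each and roughly 1.89 Gbps per pin (both values are assumptions stated here only to show how the 484 GB/s figure arises, not specifications from this sheet):

```python
# HBM2 peak-bandwidth arithmetic (sketch; width and rate are assumptions).
interface_bits = 2 * 1024   # two HBM2 stacks, 1024 bits each (assumed)
data_rate_gbps = 1.89       # per-pin data rate in Gbps (assumed)

# bits/s across the whole interface, divided by 8 to get bytes/s.
bandwidth_gb_s = interface_bits * data_rate_gbps / 8
print(f"{bandwidth_gb_s:.0f} GB/s")  # 484
```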
- MxGPU SR-IOV Hardware Virtualization. The Radeon Instinct MI25 server accelerator is designed with support for AMD’s MxGPU SR-IOV hardware virtualization technology to drive greater utilization and capacity in the data center.
- Updated Remote Manageability Capabilities. The Radeon Instinct MI25 accelerator has advanced out-of-band manageability circuitry for simplified GPU monitoring in large-scale systems. The MI25’s manageability capabilities provide accessibility via I2C, regardless of what state the GPU is in, enabling advanced monitoring of a range of static and dynamic GPU information, including board part details, serial numbers, GPU temperature, power and other information, using PMCI-compliant data structures.
USE CASES
Machine Intelligence & Deep Learning Neural Network Training
Training techniques used today on neural networks in machine intelligence and deep learning applications in data centers have become very complex and require the handling of massive amounts of data when training those networks to recognize patterns within that data. This requires lots of floating-point computation spread across many cores, and traditional CPUs can’t handle this type of computation as efficiently as GPUs. What can take CPUs weeks to compute can be handled in days with the use of GPUs. The Radeon Instinct MI25, combined with AMD’s new Epyc server processors and our ROCm open software platform, delivers superior performance for machine intelligence and deep learning applications.
The MI25’s superior 24.6 TFLOPS of native half-precision (FP16) or 12.3 TFLOPS of single-precision (FP32) peak floating-point performance across 4,096 stream processors, combined with its advanced High Bandwidth Cache (HBC) and controller and 16GB of high-bandwidth HBM2 memory, brings customers a new level of computing capable of meeting today’s demanding system requirements for efficiently handling the large data involved in training these complex neural networks. 1 The MI25 accelerator, based on AMD’s next-generation “Vega” architecture with the world’s most advanced memory architecture, is optimized for handling large sets of data and delivers vast improvements in throughput per clock over previous generations: up to 82 GFLOPS per watt of FP16 or 41 GFLOPS per watt of FP32 peak GPU compute performance, for outstanding performance per watt in machine intelligence and deep learning training deployments in the data center, where performance and efficiency are mandatory.
Benefits for Machine Intelligence & Deep Learning Neural Network Training:
- Unmatched FP16 and FP32 Floating-Point Performance
- Open Software ROCm Platform for HPC-Class Rack Scale
- Optimized MIOpen Deep Learning Framework Libraries
- Large BAR Support for mGPU peer-to-peer
- Configuration advantages with Epyc server processors
- Superior compute density and performance per node when combining new AMD Epyc™ processor-based servers and Radeon Instinct “Vega” based products
- MxGPU SR-IOV Hardware Virtualization enabling greater utilization and capacity in the data center
HPC Heterogeneous Compute
The HPC industry is creating immense amounts of unstructured data each year, and a portion of HPC system configurations are being reshaped to enable the community to extract useful information from that data. Traditionally, these systems were predominantly CPU based, but with the explosive growth in the amount and different types of data being created, along with the evolution of more complex codes, these traditional systems don’t meet all the requirements of today’s data-intensive HPC workloads. As these types of codes have become more complex and parallel, there has been a growing use of heterogeneous computing systems with different mixes of accelerators, including discrete GPUs and FPGAs. Advances in GPU capabilities over the last decade have allowed them to be used for a growing number of these parallel codes, like the ones being used for training neural networks for deep learning. Scientists and researchers across the globe are now using accelerators to more efficiently process HPC parallel codes across several industries, including life sciences, energy, financial, automotive and aerospace, academics, government and defense.
The Radeon Instinct MI25, combined with AMD’s new “Zen”-based Epyc server CPUs and our revolutionary ROCm open software platform, provides a progressive approach to open heterogeneous compute from the metal forward. AMD’s next-generation HPC solutions are designed to deliver maximum compute density and performance per node with the efficiency required to handle today’s massively parallel, data-intensive codes, as well as to provide a powerful, flexible solution for general-purpose HPC deployments. The ROCm software platform brings a scalable HPC-class solution that provides fully open-source Linux drivers, HCC compilers, tools and libraries to give scientists and researchers system control down to the metal. The Radeon Instinct’s open ecosystem approach supports various architectures, including x86, Power8 and ARM, along with industry-standard interconnect technologies, providing customers with the ability to design optimized HPC systems for a new era of heterogeneous compute that embraces the HPC community’s open approach to scientific advancement.
Key Benefits for HPC Heterogeneous Compute:
- Outstanding Compute Density and Performance Per Node
- Open Software ROCm Platform for HPC-Class Rack Scale
- Open Source Linux Drivers, HCC Compiler, Tools and Libraries from the Metal Forward
- Open Industry Standard Support of Multiple Architectures and Industry Standard Interconnect Technologies