AMD Radeon Instinct MI8 Professional Graphics Card

Product status: Official | Last Update: 2017-06-21
Overview
Manufacturer
AMD
Original Series
Instinct
Release Date
June 20th, 2017
PCB Code
109-C88237
Board Model
AMD C882
Graphics Processing Unit
GPU Model
Fiji
Architecture
GCN 3.0
Fabrication Process
28 nm
Die Size
596 mm2
Transistors Count
8.9B
Transistors Density
14.9M TRAN/mm2
Stream Processors
4096
TMUs
256
ROPs
64
Clocks
Boost Clock
1000 MHz
Memory Clock
500 MHz
Effective Memory Clock
1 Gbps
Memory Configuration
Memory Size
4096 MB
Memory Type
HBM
Memory Bus Width
4096-bit
Memory Bandwidth
512.0 GB/s
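The 512.0 GB/s figure follows directly from the bus width and the effective data rate in the table above; a minimal sanity check (the helper name is illustrative, not an AMD API):

```python
def hbm_bandwidth_gbs(bus_width_bits: int, effective_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bytes per transfer x transfers per pin per second."""
    return bus_width_bits / 8 * effective_rate_gbps

# MI8: 4096-bit HBM1 interface at 1 Gbps effective per pin
print(hbm_bandwidth_gbs(4096, 1.0))  # → 512.0
```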

Physical
Interface
PCI-Express 3.0 x16
Height
2-slot
Power Connectors
1× 8-pin
TDP/TBP
175 W
Recommended PSU
500 W
API Support
DirectX
11.2
Vulkan
1.0
OpenGL
4.4
OpenCL
2.0

Performance
Pixel Fillrate
64 GPixels/s
Texture Fillrate
256 GTexels/s
Peak FP32
8.2 TFLOPS
FP32 Perf. per Watt
46.8 GFLOPS/W
FP32 Perf. per mm2
13.7 GFLOPS/mm2
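The derived performance figures above all follow from the core counts and boost clock; a short sketch reproducing them (function names are illustrative; the factor of 2 assumes one fused multiply-add per stream processor per clock, standard for GCN):

```python
def peak_fp32_tflops(stream_processors: int, boost_mhz: int, flops_per_clock: int = 2) -> float:
    # An FMA counts as 2 FLOPs per stream processor per clock
    return stream_processors * boost_mhz * 1e6 * flops_per_clock / 1e12

def pixel_fillrate_gpix(rops: int, boost_mhz: int) -> float:
    return rops * boost_mhz / 1000  # GPixels/s

def texture_fillrate_gtex(tmus: int, boost_mhz: int) -> float:
    return tmus * boost_mhz / 1000  # GTexels/s

tflops = peak_fp32_tflops(4096, 1000)
print(round(tflops, 3))                  # → 8.192 (rounds to the quoted 8.2 TFLOPS)
print(pixel_fillrate_gpix(64, 1000))     # → 64.0
print(texture_fillrate_gtex(256, 1000))  # → 256.0
print(round(tflops * 1e3 / 175, 1))      # GFLOPS per watt at 175 W TDP → 46.8
```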




Model                          | Cores | Boost Clock | Memory Clock | Memory Config.
AMD Instinct MI300X            | 19456 | 2100 MHz    | 5.2 Gbps     | 192 GB, 8192-bit
AMD Instinct MI250X            | 14080 | 1700 MHz    | 3.2 Gbps     | 128 GB, 8192-bit
AMD Instinct MI250             | 13312 | 1700 MHz    | 3.2 Gbps     | 128 GB, 8192-bit
AMD Instinct MI100             | 7680  | 1504 MHz    | 2.4 Gbps     | 32 GB HBM2, 4096-bit
AMD Instinct MI210             | 6656  | 1700 MHz    | 3.2 Gbps     | 64 GB, 4096-bit
AMD Radeon Instinct MI60       | 4096  | 1800 MHz    | 2 Gbps       | 32 GB HBM2, 4096-bit
AMD Radeon Instinct MI25       | 4096  | 1501 MHz    | 1.9 Gbps     | 16 GB HBM2, 2048-bit
AMD Vega Cube                  | 4096  | 1501 MHz    | 1.9 Gbps     | 16 GB HBM2, 2048-bit
AMD Radeon Instinct MI8        | 4096  | 1000 MHz    | 1 Gbps       | 4 GB HBM1, 4096-bit
AMD Radeon Instinct MI50 32GB  | 3840  | 1725 MHz    | 2 Gbps       | 32 GB HBM2, 4096-bit
AMD Radeon Instinct MI50       | 3840  | 1746 MHz    | 2 Gbps       | 16 GB HBM2, 4096-bit
AMD Radeon Instinct MI6        | 2304  | 1237 MHz    | 7 Gbps       | 16 GB GDDR5, 256-bit
Model                     | Cores | Boost Clock | Memory Clock | Memory Config.
AMD Project Quantum       | 16384 | -           | 1 Gbps       | 64 GB HBM1, 4096-bit
AMD Radeon Pro Duo        | 8192  | 1000 MHz    | 1 Gbps       | 16 GB HBM1, 4096-bit
AMD FirePro S9300 X2      | 8192  | -           | 1 Gbps       | 16 GB HBM1, 4096-bit
AMD Radeon Pro SSG        | 4096  | 1050 MHz    | 1 Gbps       | 4 GB HBM1, 4096-bit
AMD Radeon Instinct MI8   | 4096  | 1000 MHz    | 1 Gbps       | 4 GB HBM1, 4096-bit
AMD Radeon R9 Fury X      | 4096  | 1050 MHz    | 1 Gbps       | 4 GB HBM1, 4096-bit
AMD Radeon R9 Nano        | 4096  | -           | 1 Gbps       | 4 GB HBM1, 4096-bit
AMD Radeon R9 Fury        | 3584  | -           | 1 Gbps       | 4 GB HBM1, 4096-bit

PERFORMANCE

8.2 TFLOPS of Peak Half or Single Precision Performance with 4GB HBM1 1

  • 8.2 TFLOPS peak FP16 | FP32 GPU compute performance.

    With 8.2 TFLOPS peak compute performance on a single board, the Radeon Instinct MI8 server accelerator provides superior single-precision performance per dollar for machine and deep learning inference applications, along with providing a cost-effective solution for HPC development systems. 1

  • 4GB high-bandwidth HBM1 GPU Memory on 512-bit memory interface.

    With 4GB of HBM1 GPU memory and up to 512GB/s of memory bandwidth, the Radeon Instinct MI8 server accelerator combines single-precision performance with the memory-system performance needed for the most demanding machine intelligence and deep learning inference applications, extracting meaningful results from new data applied to trained neural networks in a cost-effective, efficient manner.

  • 47 GFLOPS/watt peak FP16|FP32 GPU compute performance.

    With up to 47 GFLOPS/watt peak FP16|FP32 GPU compute performance, the Radeon Instinct MI8 server accelerator provides superior performance per watt for machine intelligence and deep learning inference applications. 2

  • 64 Compute Units (4,096 Stream Processors).

    The Radeon Instinct MI8 server accelerator has 64 Compute Units, each containing 64 stream processors, for a total of 4,096 stream processors available for running many smaller batches of data simultaneously against a trained neural network to return answers quickly. Single-precision performance is crucial to these types of system installations, and the MI8 accelerator provides superior single-precision performance in a single GPU card.
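The shader arithmetic in these bullets can be checked in a few lines; the SIMD and wavefront figures below are GCN architectural constants assumed here, not stated on this page:

```python
import math

# GCN3 "Fiji" shader hierarchy (CU and SP counts from the bullet above)
COMPUTE_UNITS = 64
SP_PER_CU = 64      # 4 SIMD units x 16 lanes each (GCN constant, assumed)
WAVEFRONT = 64      # GCN schedules work-items in 64-wide wavefronts (assumed)

total_sps = COMPUTE_UNITS * SP_PER_CU
print(total_sps)  # → 4096

# An inference batch launched as N work-items occupies ceil(N / WAVEFRONT)
# wavefronts, which the hardware spreads across the 64 CUs:
work_items = 10_000
print(math.ceil(work_items / WAVEFRONT))  # → 157
```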

FEATURES

Passively Cooled Accelerator Using <175 Watts TDP for Scalable Server Deployments

  • Passively cooled server accelerator based on “Fiji” architecture. The Radeon Instinct MI8 server accelerator, based on the “Fiji” architecture built on a 28nm HPX process, is designed for highly-efficient, scalable server deployments for single-precision inference applications in machine intelligence and deep learning. This GPU server accelerator delivers great performance while consuming only 175W TDP board power.
  • 175W TDP board power, dual-slot, 6” GPU server card. The Radeon Instinct MI8 server PCIe® Gen 3 x16 GPU card is a full-height, dual-slot card designed to fit in most standard server designs, providing a highly-efficient server solution for heterogeneous machine intelligence and deep learning inference system deployments.
  • High Bandwidth Memory (HBM1) with up to 512GB/s memory bandwidth. The Radeon Instinct MI8 server accelerator is designed with 4GB of high-bandwidth HBM1 memory, allowing numerous batches of data to be handled quickly and simultaneously in the most demanding machine intelligence and deep learning inference applications, so that meaningful results can be quickly extracted from new data applied to trained neural networks.
  • MxGPU SR-IOV HW Virtualization. The Radeon Instinct MI8 server accelerator supports AMD’s MxGPU SR-IOV hardware virtualization technology, designed to drive greater utilization and capacity in the data center.

USE CASES

Inference for Deep Learning

Today’s exponential data growth and the dynamic nature of that data have reshaped the requirements of data center system configurations. Data center designers need to build systems capable of running workloads that are more complex and parallel in nature, while continuing to improve system efficiencies. Improvements in the capabilities of discrete GPUs and other accelerators over the last decade are providing data center designers with new options for building heterogeneous computing systems that help them meet these new challenges.


Datacenter deployments running inference applications, where lots of new smaller data set inputs are being run at half precision (FP16) or single precision (FP32) against trained neural networks to discover new knowledge, require parallel compute capable systems that can quickly run data inputs across lots of smaller cores in a power-efficient manner.


The Radeon Instinct™ MI8 accelerator is an efficient, cost-sensitive solution for machine intelligence and deep learning inference deployments in the datacenter, delivering 8.2 TFLOPS of peak half or single precision (FP16|FP32) floating point performance in a single 175 watt TDP card. 1 The Radeon Instinct™ MI8 accelerator, based on AMD’s “Fiji” architecture with 4GB of high-bandwidth HBM1 memory and up to 512 GB/s of bandwidth, combined with the Radeon Instinct’s open ecosystem approach and the ROCm platform, provides data center designers with a highly-efficient, flexible solution for inference deployments.
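A rough way to see how the 8.2 TFLOPS and 512 GB/s figures interact for inference is a roofline-style check: dividing peak FLOPS by peak bandwidth gives the arithmetic intensity (FLOPs per byte) a kernel needs before it becomes compute-bound rather than memory-bound. A minimal sketch, with a hypothetical dense-layer example:

```python
def roofline_bound(flops_per_byte: float,
                   peak_tflops: float = 8.2,
                   peak_bw_gbs: float = 512.0) -> str:
    """Classify a kernel by comparing its arithmetic intensity to the ridge point."""
    ridge = peak_tflops * 1e12 / (peak_bw_gbs * 1e9)  # FLOPs/byte at the ridge
    return "compute-bound" if flops_per_byte >= ridge else "memory-bound"

print(round(8.2e12 / 512e9, 1))  # ridge point → 16.0 FLOPs/byte

# Hypothetical FP32 layer: batch of 8 against a 4096x4096 weight matrix.
# Weights dominate traffic (~4096*4096*4 bytes); FLOPs are 2*8*4096*4096.
intensity = (2 * 8 * 4096 * 4096) / (4096 * 4096 * 4)
print(intensity)                   # → 4.0
print(roofline_bound(intensity))   # → memory-bound
```

Small-batch inference like this tends to sit under the memory roof, which is why the marketing copy pairs the compute figure with the HBM1 bandwidth figure.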

Key Benefits for Inference:

  • 8.2 TFLOPS peak half or single precision compute performance 1
  • 47 GFLOPS/watt peak half or single precision compute performance 2
  • 4GB HBM1 on 512-bit memory interface provides high bandwidth memory performance
  • Passively cooled accelerator using under 175 watts TDP for scalable server deployments
  • ROCm software platform provides open source Hyperscale platform
  • Open source Linux drivers, HCC compiler, tools and libraries for full control from the metal forward
  • Optimized MIOpen Deep Learning framework libraries 3
  • Large BAR Support for mGPU peer to peer
  • MxGPU SR-IOV hardware virtualization for optimized system utilizations
  • Open industry standard support of multiple architectures and open standard interconnect technologies 4


Heterogeneous Compute for HPC General Purpose and Development

The HPC industry is creating immense amounts of unstructured data each year and a portion of HPC system configurations are being reshaped to enable the community to extract useful information from that data. Traditionally, these systems were predominantly CPU based, but with the explosive growth in the amount and different types of data being created, along with the evolution of more complex codes, these traditional systems don’t meet all the requirements of today’s data intensive HPC workloads. As these types of codes have become more complex and parallel, there has been a growing use of heterogeneous computing systems with different mixes of accelerators including discrete GPUs and FPGAs. The advancements of GPU capabilities over the last decade have allowed them to be used for a growing number of these mixed precision parallel codes like the ones being used for training neural networks for deep learning. Scientists and researchers across the globe are now using accelerators to more efficiently process HPC parallel codes across several industries including life sciences, energy, financial, automotive and aerospace, academics, government and defense.


The Radeon Instinct™ MI8 accelerator, combined with AMD’s revolutionary ROCm open software platform, is an efficient entry-level heterogeneous computing solution delivering 8.2 TFLOPS peak single precision compute performance in an efficient GPU card with 4GB of high-bandwidth HBM1 memory. 1 The MI8 accelerator is the perfect open solution for cost-effective general purpose and development systems being deployed in the Financial Services, Energy, Life Science, Automotive and Aerospace, Academic (Research & Teaching), Government Labs and other HPC industries.