NVIDIA Tesla P4 Professional Graphics Card

Product status: Official | Last update: 2016-12-09
Overview
Manufacturer: NVIDIA
Original Series: Tesla Pascal
Release Date: September 13th, 2016

Graphics Processing Unit
GPU Model: GP104
Architecture: Pascal
Fabrication Process: 16 nm
Die Size: 314 mm²
Transistor Count: 7.2B
Transistor Density: 22.9M transistors/mm²
CUDA Cores: 2560
TMUs: 128
ROPs: 64

Clocks
Base Clock: 1000 MHz
Boost Clock: 1075 MHz
Memory Clock: 1500 MHz
Effective Memory Clock: 6000 Mbps

Memory Configuration
Memory Size: 8192 MB
Memory Type: GDDR5
Memory Bus Width: 256-bit
Memory Bandwidth: 192.0 GB/s

Physical
Interface: PCI-Express 3.0 x16
Height: 1-slot
TDP/TBP: 50 W
Recommended PSU: 300 W

API Support
DirectX: 12.0
Vulkan: 1.0
OpenGL: 4.5
OpenCL: 3.0

Performance
Pixel Fillrate: 68.8 GPixel/s
Texture Fillrate: 137.6 GTexel/s
Peak FP32: 5.5 TFLOPS
FP32 Perf. per Watt: 110.1 GFLOPS/W
FP32 Perf. per mm²: 17.5 GFLOPS/mm²
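
The Performance figures above follow directly from the unit counts and the boost clock, using the usual convention of two FP32 operations per CUDA core per clock (one FMA). A minimal host-side sanity check (plain C++; builds with nvcc or any C++ compiler) that reproduces them from the constants in the specification list:

```cuda
// Reproduces the Performance figures above from the listed configuration.
#include <cstdio>

int main() {
    const double boost_mhz  = 1075.0;  // Boost Clock
    const double cuda_cores = 2560.0;  // CUDA Cores
    const double tmus       = 128.0;   // Texture units
    const double rops       = 64.0;    // Raster units
    const double tdp_w      = 50.0;    // TDP/TBP
    const double die_mm2    = 314.0;   // Die size

    const double fp32_gflops = 2.0 * cuda_cores * boost_mhz / 1000.0;  // FMA = 2 ops
    printf("Pixel fillrate   : %.1f GPixel/s\n",   rops * boost_mhz / 1000.0);  // 68.8
    printf("Texture fillrate : %.1f GTexel/s\n",   tmus * boost_mhz / 1000.0);  // 137.6
    printf("Peak FP32        : %.1f TFLOPS\n",     fp32_gflops / 1000.0);       // ~5.5
    printf("FP32 per watt    : %.1f GFLOPS/W\n",   fp32_gflops / tdp_w);        // ~110.1
    printf("FP32 per mm2     : %.1f GFLOPS/mm2\n", fp32_gflops / die_mm2);      // ~17.5
    return 0;
}
```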




Model                  | Cores | Boost Clock | Memory Clock | Memory Config.
NVIDIA DGX-1 (Pascal)  | 28672 | 1480 MHz    | 1.4 Gbps     | 1024 GB HBM2 4096-bit
NVIDIA Tesla P40       | 3840  | 1531 MHz    | 7.2 Gbps     | 24 GB GDDR5 384-bit
NVIDIA Tesla P100 SXM2 | 3584  | 1480 MHz    | 1.4 Gbps     | 16 GB HBM2 4096-bit
NVIDIA Tesla P100 PCIe | 3584  | 1328 MHz    | 1.4 Gbps     | 16 GB HBM2 4096-bit
NVIDIA Tesla P4        | 2560  | 1075 MHz    | 6 Gbps       | 8 GB GDDR5 256-bit
Model                                | Cores | Boost Clock | Memory Clock | Memory Config.
NVIDIA P102-100                      | 3200  | 1683 MHz    | 8 Gbps       | 5 GB GDDR5X 320-bit
NVIDIA Quadro P5000                  | 2560  | 1733 MHz    | 4.5 Gbps     | 16 GB GDDR5 256-bit
NVIDIA GeForce GTX 1080              | 2560  | 1733 MHz    | 10 Gbps      | 8 GB GDDR5X 256-bit
NVIDIA GeForce GTX 1080 11Gbps       | 2560  | 1733 MHz    | 11 Gbps      | 8 GB GDDR5X 256-bit
NVIDIA GeForce GTX 1080 Mobile       | 2560  | 1733 MHz    | 10 Gbps      | 8 GB GDDR5X 256-bit
NVIDIA GeForce GTX 1080 Mobile Max-Q | 2560  | 1468 MHz    | 10 Gbps      | 8 GB GDDR5X 256-bit
NVIDIA Tesla P4                      | 2560  | 1075 MHz    | 6 Gbps       | 8 GB GDDR5 256-bit
NVIDIA GeForce GTX 1070 Ti           | 2432  | 1683 MHz    | 8 Gbps       | 8 GB GDDR5 256-bit
NVIDIA Quadro P5000 Mobile           | 2048  | -           | 6 Gbps       | 16 GB GDDR5 256-bit
NVIDIA GeForce GTX 1070 Mobile       | 2048  | 1645 MHz    | 8 Gbps       | 8 GB GDDR5 256-bit
NVIDIA GeForce GTX 1070 Mobile Max-Q | 2048  | 1379 MHz    | 8 Gbps       | 8 GB GDDR5 256-bit
NVIDIA GeForce GTX 1070              | 1920  | 1683 MHz    | 8 Gbps       | 8 GB GDDR5 256-bit
NVIDIA GeForce GTX 1070 8GB GDDR5X   | 1920  | 1683 MHz    | 16 Gbps      | 8 GB GDDR5X 256-bit
NVIDIA P104-100                      | 1920  | 1733 MHz    | 10 Gbps      | 4 GB GDDR5X 256-bit
NVIDIA Quadro P4000 Mobile           | 1792  | -           | 6 Gbps       | 8 GB GDDR5 256-bit
NVIDIA Quadro P4000                  | 1792  | 1480 MHz    | 7.6 Gbps     | 8 GB GDDR5 256-bit
NVIDIA GeForce GTX 1060 6GB GDDR5X   | 1280  | 1708 MHz    | 8 Gbps       | 6 GB GDDR5X 192-bit
NVIDIA Quadro P3000 Mobile           | 1280  | -           | 7 Gbps       | 6 GB GDDR5 192-bit
NVIDIA GeForce GTX 1060 3GB (GP104)  | TBC   | TBC         | TBC          | TBC

Tesla P4, P40 Accelerators Deliver 45x Faster AI; TensorRT and DeepStream Software Boost AI for Video Inferencing

GTC China – NVIDIA today unveiled the latest additions to its Pascal™ architecture-based deep learning platform, with new NVIDIA® Tesla® P4 and P40 GPU accelerators and new software that deliver massive leaps in efficiency and speed to accelerate inferencing production workloads for artificial intelligence services.

Modern AI services such as voice-activated assistance, email spam filters, and movie and product recommendation engines are rapidly growing in complexity, requiring up to 10x more compute compared to neural networks from a year ago. Current CPU-based technology isn’t capable of delivering real-time responsiveness required for modern AI services, leading to a poor user experience.

The Tesla P4 and P40 are specifically designed for inferencing, which uses trained deep neural networks to recognize speech, images or text in response to queries from users and devices. Based on the Pascal architecture, these GPUs feature specialized inference instructions based on 8-bit (INT8) operations, delivering 45x faster response than CPUs[1] and a 4x improvement over GPU solutions launched less than a year ago.[2]
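
On the GP102/GP104-class Pascal parts (including the Tesla P4 and P40), these 8-bit operations are exposed to CUDA 8.0 code through the DP4A dot-product intrinsic, which performs four INT8 multiply-accumulates into a 32-bit accumulator in a single instruction. A minimal kernel sketch, assuming the inputs are already quantized and packed four INT8 values per 32-bit word (compile with nvcc -arch=sm_61):

```cuda
// Grid-stride INT8 dot product using Pascal's DP4A instruction (sm_61+).
// Each 32-bit element of a and b packs four signed 8-bit values.
__global__ void dot_int8(const int *a, const int *b, int *result, int n) {
    int acc = 0;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        // acc += byte-wise dot product of a[i] and b[i], in one instruction
        acc = __dp4a(a[i], b[i], acc);
    }
    atomicAdd(result, acc);
}
```

Each DP4A issues eight integer operations where an FP32 FMA issues two, which is where the roughly 4x INT8-over-FP32 peak throughput of these parts comes from.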

The Tesla P4 delivers the highest energy efficiency for data centers. It fits in any server with its small form factor and low-power design, which starts at 50 watts, helping make it 40x more energy efficient than CPUs for inferencing in production workloads.[3] A single server with a single Tesla P4 replaces 13 CPU-only servers for video inferencing workloads,[4] delivering over 8x savings in total cost of ownership, including server and power costs.

The Tesla P40 delivers maximum throughput for deep learning workloads. With 47 tera-operations per second (TOPS) of inference performance with INT8 instructions, a server with eight Tesla P40 accelerators can replace the performance of more than 140 CPU servers.[5] At approximately $5,000 per CPU server, this results in savings of more than $650,000 in server acquisition cost.

“With the Tesla P100 and now Tesla P4 and P40, NVIDIA offers the only end-to-end deep learning platform for the data center, unlocking the enormous power of AI for a broad range of industries,” said Ian Buck, general manager of accelerated computing at NVIDIA. “They slash training time from days to hours. They enable insight to be extracted instantly. And they produce real-time responses for consumers from AI-powered services.”

 

Software Tools for Faster Inferencing
Complementing the Tesla P4 and P40 are two software innovations to accelerate AI inferencing: NVIDIA TensorRT and the NVIDIA DeepStream SDK.

TensorRT is a library created for optimizing deep learning models for production deployment that delivers instant responsiveness for the most complex networks. It maximizes throughput and efficiency of deep learning applications by taking trained neural nets — defined with 32-bit or 16-bit operations — and optimizing them for reduced precision INT8 operations.
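
As a rough illustration of what running a trained FP32 network at reduced precision involves (a conceptual sketch only, not TensorRT's actual calibration, which selects scales from representative data), each FP32 tensor can be mapped to INT8 with a scale factor, so inference can run on the INT8/DP4A path while values are interpreted as scale × int8:

```cuda
// Conceptual symmetric max-abs quantization of an FP32 tensor to INT8.
// Host-side helper; not the TensorRT implementation.
#include <cstdint>
#include <cmath>
#include <vector>
#include <algorithm>

struct QuantizedTensor {
    std::vector<int8_t> values;
    float scale;                 // real_value ~= scale * int8_value
};

QuantizedTensor quantize_symmetric(const std::vector<float>& fp32) {
    float max_abs = 0.f;
    for (float v : fp32) max_abs = std::max(max_abs, std::fabs(v));

    QuantizedTensor q;
    q.scale = (max_abs > 0.f) ? max_abs / 127.f : 1.f;
    q.values.reserve(fp32.size());
    for (float v : fp32) {
        int r = static_cast<int>(std::lround(v / q.scale));
        q.values.push_back(static_cast<int8_t>(std::min(127, std::max(-127, r))));
    }
    return q;
}
```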

NVIDIA DeepStream SDK taps into the power of a Pascal server to simultaneously decode and analyze up to 93 HD video streams in real time, compared with seven streams with dual CPUs.[6] This addresses one of the grand challenges of AI: understanding video content at scale for applications such as self-driving cars, interactive robots, filtering and ad placement. Integrating deep learning into video applications allows companies to offer smart, innovative video services that were previously impossible to deliver.
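
At the programming-model level, the per-stream concurrency comes from giving each video feed its own CUDA stream so that decode and inference work for many feeds overlaps on one GPU. The skeleton below is a conceptual sketch of that idea only, not the DeepStream SDK API; decode_next_frame() is a hypothetical placeholder for the hardware-decode step.

```cuda
// Conceptual multi-feed skeleton: one CUDA stream per video feed.
#include <cuda_runtime.h>
#include <vector>

__global__ void analyze_frame(const unsigned char* frame, int num_pixels) {
    // Placeholder for per-frame inference work on the decoded image.
}

void process_feeds(const std::vector<unsigned char*>& device_frames, int num_pixels) {
    std::vector<cudaStream_t> streams(device_frames.size());
    for (auto& s : streams) cudaStreamCreate(&s);

    for (size_t i = 0; i < device_frames.size(); ++i) {
        // decode_next_frame(i, streams[i]);  // hypothetical hardware-decode call
        analyze_frame<<<64, 256, 0, streams[i]>>>(device_frames[i], num_pixels);
    }
    for (auto& s : streams) {
        cudaStreamSynchronize(s);
        cudaStreamDestroy(s);
    }
}
```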

Leap Forward for Customers
NVIDIA customers are delivering increasingly more innovative AI services that require the highest compute performance.

“Delivering simple and responsive experiences to each of our users is very important to us,” said Greg Diamos, senior researcher at Baidu. “We have deployed NVIDIA GPUs in production to provide AI-powered services such as our Deep Speech 2 system, and the use of GPUs enables a level of responsiveness that would not be possible on un-accelerated servers. Pascal, with its INT8 capabilities, will provide an even bigger leap forward, and we look forward to delivering even better experiences to our users.”

Specifications
Specifications of the Tesla P4 and P40 GPUs include:

Specification                           | Tesla P4            | Tesla P40
Single-Precision Performance*           | 5.5 TFLOPS          | 12 TFLOPS
INT8 TOPS* (Tera-Operations Per Second) | 22                  | 47
CUDA Cores                              | 2,560               | 3,840
GPU GDDR5 Memory                        | 8 GB                | 24 GB
Memory Bandwidth                        | 192 GB/s            | 346 GB/s
Power                                   | 50 Watt (or higher) | 250 Watt

* With boost clock on
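
The starred figures follow from the core counts and boost clocks listed earlier (1075 MHz for the Tesla P4, 1531 MHz for the Tesla P40): an FP32 FMA counts as two operations per CUDA core per clock, and a DP4A instruction counts as eight INT8 operations, so peak INT8 TOPS is roughly four times peak FP32 TFLOPS. A small host-side check (plain C++):

```cuda
// Peak-throughput arithmetic for the table above.
#include <cstdio>

static void peak(const char* name, double cores, double boost_mhz) {
    double fp32_tflops = 2.0 * cores * boost_mhz * 1e-6;  // 2 ops per FMA
    double int8_tops   = 8.0 * cores * boost_mhz * 1e-6;  // 8 ops per DP4A
    printf("%s: %.1f TFLOPS FP32, %.0f TOPS INT8\n", name, fp32_tflops, int8_tops);
}

int main() {
    peak("Tesla P4",  2560, 1075);  // ~5.5 TFLOPS, ~22 TOPS
    peak("Tesla P40", 3840, 1531);  // ~11.8 TFLOPS (listed as 12), ~47 TOPS
    return 0;
}
```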

Availability
The NVIDIA Tesla P4 and P40 are planned to be available in November and October, respectively, in qualified servers offered by ODM, OEM and channel partners.