Volta GPU Architecture
Built on a state-of-the-art 12 nm FFN (FinFET NVIDIA) high-performance manufacturing process customized for NVIDIA, the Quadro GV100 GPU incorporates 5120 CUDA cores and is the most powerful computing platform for HPC, AI, VR and graphics workloads on professional desktops. It packs 21.1 billion transistors onto a die of 815 mm². It delivers more than 7.4 TFLOPS of double-precision (FP64), 14.8 TFLOPS of single-precision (FP32), 29.6 TFLOPS of half-precision (FP16), 59.3 TOPS of integer (INT8), and 118.5 TFLOPS of Tensor operation performance, covering a wide range of compute-intensive workloads.
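The peak figures above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes that one fused multiply-add counts as two floating point operations and that each CUDA core retires one FP32 FMA per clock. The function names are hypothetical, and the clock it prints is implied by the quoted numbers, not a published specification.

```python
# Peak rates quoted in the paragraph above, in tera-operations per second.
CUDA_CORES = 5120
PEAK = {"FP64": 7.4, "FP32": 14.8, "FP16": 29.6, "INT8": 59.3, "Tensor": 118.5}

# Assumption: one fused multiply-add (FMA) is counted as 2 operations,
# and each CUDA core completes one FP32 FMA per clock.
FLOPS_PER_FMA = 2


def implied_clock_ghz(peak_tflops, cores, flops_per_core_per_clock=FLOPS_PER_FMA):
    """Clock rate (GHz) that would produce the quoted peak rate."""
    return peak_tflops * 1e12 / (cores * flops_per_core_per_clock) / 1e9


def ratios_to_fp64(peak):
    """Express every peak rate as a multiple of the FP64 rate."""
    return {name: rate / peak["FP64"] for name, rate in peak.items()}


if __name__ == "__main__":
    clock = implied_clock_ghz(PEAK["FP32"], CUDA_CORES)
    print(f"Implied clock from the FP32 figure: {clock:.3f} GHz")
    for name, ratio in ratios_to_fp64(PEAK).items():
        print(f"{name}: {ratio:.1f}x the FP64 rate")
```

Under these assumptions the quoted rates roughly double at each step down in precision, which is the pattern the paragraph describes.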
Tensor Cores
New mixed-precision cores purpose-built for deep learning matrix arithmetic deliver 8x the TFLOPS for training compared to the previous generation. Quadro GV100 has 640 Tensor Cores. Each Tensor Core performs 64 floating point fused multiply-add (FMA) operations per clock, so the eight Tensor Cores in each SM together perform 512 FMA operations, or 1024 individual floating point operations, per clock.
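The per-clock figures above fit together as follows. This is a minimal sketch that uses only the numbers quoted in this datasheet and assumes an FMA counts as two floating point operations. The function names are hypothetical, and the SM count and clock are derived values, not quoted specifications.

```python
# Figures quoted in this datasheet.
TENSOR_CORES = 640
FMA_PER_TENSOR_CORE_PER_CLOCK = 64
FLOPS_PER_SM_PER_CLOCK = 1024
PEAK_TENSOR_TFLOPS = 118.5

# Assumption: one FMA = one multiply + one add = 2 floating point operations.
FLOPS_PER_FMA = 2


def tensor_cores_per_sm():
    """Tensor Cores needed per SM to reach the quoted per-SM rate."""
    per_core = FMA_PER_TENSOR_CORE_PER_CLOCK * FLOPS_PER_FMA
    return FLOPS_PER_SM_PER_CLOCK // per_core


def sm_count():
    """Number of SMs implied by the total Tensor Core count."""
    return TENSOR_CORES // tensor_cores_per_sm()


def implied_clock_ghz():
    """Clock rate (GHz) at which the quoted peak Tensor rate is reached."""
    flops_per_clock = TENSOR_CORES * FMA_PER_TENSOR_CORE_PER_CLOCK * FLOPS_PER_FMA
    return PEAK_TENSOR_TFLOPS * 1e12 / flops_per_clock / 1e9


if __name__ == "__main__":
    print("Tensor Cores per SM:", tensor_cores_per_sm())
    print("Implied SM count:", sm_count())
    print(f"Implied clock: {implied_clock_ghz():.3f} GHz")
```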
High Speed HBM2 Memory
Built with Volta’s highly optimized 32 GB HBM2 memory subsystem, the industry’s fastest graphics memory (870 GB/s peak bandwidth), Quadro GV100 is the ideal platform for latency-sensitive applications handling large datasets. It offers 2x the memory capacity and 20% more memory bandwidth than the previous generation. HBM2 also provides native support for Error Correcting Code (ECC) without capacity or performance penalties.
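To put the bandwidth figure in context, the sketch below estimates how long one full pass over the 32 GB of memory takes at the quoted peak rate, and backs out the previous-generation figures implied by the 2x and 20% comparisons. It is illustrative only: sustained bandwidth in real applications is lower than peak, the function names are hypothetical, and the previous-generation values are derived from the ratios above, not quoted specifications.

```python
# Figures quoted in the paragraph above.
CAPACITY_GB = 32
PEAK_BANDWIDTH_GB_S = 870
CAPACITY_GAIN = 2.0     # "2x memory capacity"
BANDWIDTH_GAIN = 1.20   # "20% more memory bandwidth"


def full_sweep_ms(capacity_gb, bandwidth_gb_s):
    """Milliseconds to read the whole memory once at peak bandwidth."""
    return capacity_gb / bandwidth_gb_s * 1000.0


def implied_previous_generation():
    """(capacity in GB, bandwidth in GB/s) implied by the quoted gains."""
    return CAPACITY_GB / CAPACITY_GAIN, PEAK_BANDWIDTH_GB_S / BANDWIDTH_GAIN


if __name__ == "__main__":
    sweep = full_sweep_ms(CAPACITY_GB, PEAK_BANDWIDTH_GB_S)
    print(f"One pass over {CAPACITY_GB} GB at peak: {sweep:.1f} ms")
    prev_cap, prev_bw = implied_previous_generation()
    print(f"Implied previous generation: {prev_cap:.0f} GB, {prev_bw:.0f} GB/s")
```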
Mixed-Precision Computing
Double the throughput and reduce storage requirements with 16-bit floating point computing, enabling the training and deployment of larger neural networks. With independent parallel integer and floating point data paths, the Volta SM (Streaming Multiprocessor) is also much more efficient on workloads with a mix of computation and addressing calculations.
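The storage saving from 16-bit floating point can be illustrated on the CPU with Python's standard `struct` module, whose `e` format code is IEEE 754 half precision. This is a rough sketch, not GPU code: the 100-million-parameter model is a hypothetical example and the function names are made up for illustration.

```python
import struct

FP32 = "<f"  # IEEE 754 single precision, 4 bytes
FP16 = "<e"  # IEEE 754 half precision, 2 bytes


def model_size_mb(num_params, fmt):
    """Storage (in MB) for num_params values stored in the given format."""
    return num_params * struct.calcsize(fmt) / 1e6


def round_trip_fp16(x):
    """Store x as FP16 and read it back, exposing the rounding error."""
    return struct.unpack(FP16, struct.pack(FP16, x))[0]


if __name__ == "__main__":
    params = 100_000_000  # hypothetical model size
    print(f"FP32 weights: {model_size_mb(params, FP32):.0f} MB")
    print(f"FP16 weights: {model_size_mb(params, FP16):.0f} MB")
    # FP16 keeps fewer mantissa bits, so most values are rounded.
    print("0.1 stored as FP16 reads back as", round_trip_fp16(0.1))
```

The halved footprint comes with reduced precision, which is why FP16 is typically paired with higher-precision accumulation in mixed-precision training.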