
In an unexpected move, AMD this week published detailed performance numbers of its Instinct MI250 accelerator compared to Nvidia’s A100 compute GPU. AMD’s card predictably outperformed Nvidia’s board in all cases, by two or three times. While it is not uncommon for hardware vendors to show off their advantages, detailed performance numbers versus the competition are rarely published on official websites. When a company does so, it usually means one thing: very high confidence in its product.
Up to 3 Times Higher Performance
Since AMD’s Instinct MI200 is aimed primarily at HPC and AI workloads (and AMD clearly tailored its CDNA 2 more for HPC and supercomputers than for AI), AMD tested the competing accelerators in various HPC applications and benchmarks dealing with algebra, physics, cosmology, molecular dynamics, and particle interaction.
There are a number of physics and molecular dynamics HPC applications that are widely used and have industry-recognized tests, such as LAMMPS and OpenMM. These can be considered real-world workloads, and here AMD’s MI250X can outperform Nvidia’s A100 by 1.4 to 2.4 times.
There are also many HPC benchmarks that mimic real-world algebraic, cosmology, and particle interaction workloads. In these cases, AMD’s top-of-the-range compute accelerator is 1.9 to 3.05 times faster than Nvidia’s flagship accelerator.
Keeping in mind that AMD’s MI250X has considerably more ALUs running at high clocks than Nvidia’s A100, it is not surprising that the new card significantly outperforms its rival. Meanwhile, it is noteworthy that AMD did not run any AI benchmarks.
New Architecture, More ALUs
AMD’s Instinct MI200 accelerators are powered by the company’s latest CDNA 2 architecture, which is optimized for high-performance computing (HPC) and will power the upcoming Frontier supercomputer that promises to deliver about 1.5 FP64 ExaFLOPS of sustained performance. The MI200-series OAM boards use AMD’s Aldebaran compute GPU, which consists of two graphics compute dies (GCDs) that each pack 29.1 billion transistors, slightly more than the 26.8 billion transistors inside the Navi 21. The GCDs are made using TSMC’s N6 fabrication process, which enabled AMD to pack in slightly more transistors and simplify the production process by using extreme ultraviolet lithography on more layers.
AMD’s flagship Instinct MI250X accelerator features 14,080 stream processors (220 compute units) and is equipped with 128GB of HBM2E memory. The MI250X compute GPU is rated for 95.7 FP32/FP64 TFLOPS performance (the same performance for matrix operations) as well as 383 BF16/INT8/INT4 TFLOPS/TOPS performance.
By contrast, Nvidia’s A100 GPU consists of 54.2 billion transistors, has 6,912 active CUDA cores, and is paired with 80GB of HBM2E memory. Performance-wise, the accelerator offers 19.5 FP32 TFLOPS, 9.7 FP64 TFLOPS, 19.5 FP64 Tensor TFLOPS, 312 FP16/BF16 TFLOPS, and up to 624 INT8 TOPS (or 1,248 TOPS with sparsity).
Even on paper, AMD’s Instinct MI200-series offers higher performance in traditional HPC and matrix workloads, but Nvidia has an edge in AI cases. These peak performance numbers can be explained by the considerably higher ALU count in the case of AMD’s MI200-series.
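To make the on-paper comparison concrete, here is a minimal sketch that divides the peak figures quoted above for the MI250X by those quoted for the A100. The dictionary keys and the `peak_ratio` helper are names made up for this illustration, and the mapping of the MI250X’s single "95.7 FP32/FP64 TFLOPS" rating onto separate vector and matrix entries is an assumption based on how the article states it.

```python
# Peak throughput figures as quoted in the article (TFLOPS, or TOPS for INT8).
# Key names are illustrative only; they are not vendor terminology.
MI250X = {
    "fp64_vector": 95.7,  # article: "95.7 FP32/FP64 TFLOPS"
    "fp32_vector": 95.7,
    "fp64_matrix": 95.7,  # article: "same performance for matrix operations"
    "bf16": 383.0,
    "int8": 383.0,
}
A100 = {
    "fp64_vector": 9.7,
    "fp32_vector": 19.5,
    "fp64_matrix": 19.5,  # article: "19.5 FP64 Tensor TFLOPS"
    "bf16": 312.0,
    "int8": 624.0,        # dense figure, without sparsity
}


def peak_ratio(metric: str) -> float:
    """Return MI250X peak divided by A100 peak for one metric."""
    return MI250X[metric] / A100[metric]


if __name__ == "__main__":
    for metric in MI250X:
        print(f"{metric:12s} MI250X/A100 = {peak_ratio(metric):.2f}x")
```

On these numbers the paper ratio is well above 1 for FP64 and FP32 and falls below 1 for INT8, which matches the observation that Nvidia keeps an edge in AI-oriented formats. The measured speedups AMD reported (1.4 to 3.05 times) are lower than the FP64 paper ratios, so peak ratings should be read as an upper bound rather than a prediction.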
To demonstrate how good its flagship Instinct MI250X 128GB HBM2E compute accelerator is, AMD used 1P or 2P 64-core AMD EPYC 7742-based systems equipped with one or four AMD Instinct MI250X 128GB HBM2E compute GPUs, or one or four Nvidia A100 80GB HBM2E GPUs. The company used AMD-optimized and CUDA-optimized software.
Summary
For now, AMD’s Instinct MI250X is the world’s highest-performing HPC accelerator, according to the company’s own data. Considering that Aldebaran has a whopping 14,080 ALUs and is rated for 95.7 FP32/FP64 TFLOPS of performance, it is indeed the fastest compute GPU around.
Meanwhile, AMD launched its Instinct MI250X about 1.5 years after Nvidia’s A100 and several months before Intel’s Ponte Vecchio. It is natural for a 2021 compute accelerator to outperform a rival released over a year ago, but what we are curious about is how this GPU will stack up against Intel’s supercomputer-bound Ponte Vecchio compute GPU.