As we previously reported, Intel's Gaudi 3 chips are due to arrive over the course of 2024, and today the semiconductor giant revealed their full details and specifications.
Intel Gaudi 3 is the company's third generation of accelerators for Artificial Intelligence, and it arrives at the height of the excitement around AI for all types of applications. It has therefore been designed specifically to offer more performance and greater efficiency in those workloads.
Each Gaudi 3 chip includes 64 fifth-generation Tensor Processor Cores and 8 Matrix Math Engines. Alongside them, 96 MB of SRAM with 12.8 TB/s of bandwidth is complemented by no less than 128 GB of HBM2e memory with 3.7 TB/s of bandwidth.
Each chip integrates its own networking so that accelerators can be interconnected, using 24 200GbE ports and a PCI Express 5 x16 interface.
All this lets Gaudi 3 deliver much higher performance than the currently available Intel Gaudi 2. We are talking about up to 4 times the BF16 compute performance for AI, 2 times the FP8 performance at 1,835 TFLOPS, twice the network bandwidth and 1.5 times the memory bandwidth.
Gaudi 3 will also compete with NVIDIA's H100 and H200 solutions. Specifically, Intel promises that systems with Gaudi 3 will train 1.7 times faster than the H100 while being 2.3 times more efficient than the NVIDIA alternative.
The inference process, according to Intel data, is expected to be 1.5 times faster on a Gaudi 3 than on an NVIDIA H100.
Naturally, as chips intended for enterprise servers, Intel Gaudi 3 is optimized for deployment in systems with multiple accelerators, from single-node systems with 8 Gaudi 3 accelerators delivering 14.7 PFLOPS and 1 TB of memory, up to clusters of 1,024 nodes that reach dizzying figures on the order of 15 EFLOPS, with 1 PB of HBM2e memory and a total network input/output bandwidth of 1.229 PB/s.
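As a rough sanity check on these multi-accelerator figures, the per-chip numbers can simply be scaled up. The sketch below is illustrative only: it assumes ideal linear scaling and decimal (SI) unit prefixes, it uses the 1,835 TFLOPS FP8 and 128 GB per-chip figures quoted in this article, and the function name `cluster_totals` is hypothetical rather than part of any Intel tool.

```python
# Per-accelerator figures quoted in the article.
FP8_TFLOPS_PER_CHIP = 1835   # FP8 throughput per Gaudi 3, in TFLOPS
HBM_GB_PER_CHIP = 128        # HBM2e capacity per Gaudi 3, in GB
CHIPS_PER_NODE = 8           # accelerators per node


def cluster_totals(nodes: int) -> dict:
    """Scale per-chip figures to a cluster of `nodes` nodes.

    Assumption: perfectly linear scaling and SI prefixes
    (1 PFLOPS = 1,000 TFLOPS, 1 TB = 1,000 GB).
    """
    chips = nodes * CHIPS_PER_NODE
    return {
        "chips": chips,
        "fp8_pflops": chips * FP8_TFLOPS_PER_CHIP / 1_000,
        "hbm_tb": chips * HBM_GB_PER_CHIP / 1_000,
    }


single = cluster_totals(1)     # one node with 8 accelerators
mega = cluster_totals(1024)    # the largest cluster mentioned

print(single)
print(mega)
```

Under these assumptions a single node works out to about 14.7 PFLOPS and roughly 1 TB of HBM2e, and 1,024 nodes (8,192 accelerators) to about 15 EFLOPS and roughly 1 PB, in line with the figures quoted.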
Depending on the type of system, Intel Gaudi 3 will be available in three different form factors. First, there is the HL-325L standalone accelerator card with a single Gaudi 3 chip.
This accelerator card offers the full power of a Gaudi 3 chip along with 128 GB of integrated HBM2e.
Second, there is the HLB-325 Universal Baseboard, which forms a node of 8 Gaudi 3 accelerator cards. Combined, they achieve 14.6 PFLOPS of compute, with more than 1 TB of HBM2e memory, 64 Matrix Math Engines, 29.6 TB/s of memory bandwidth and 192 200GbE connections for 9.6 TB/s of bidirectional network bandwidth.
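Most of the combined baseboard figures follow from multiplying the per-chip specifications by eight. Here is a minimal sketch of that arithmetic, assuming every port runs at its full 200 Gb/s line rate in both directions with no protocol overhead; `ChipSpec` and `baseboard_totals` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipSpec:
    """Per-chip figures as quoted in the article."""
    matrix_engines: int        # Matrix Math Engines per chip
    hbm_gb: int                # HBM2e capacity in GB
    hbm_bandwidth_tbs: float   # HBM2e bandwidth in TB/s
    ethernet_ports: int        # number of 200GbE ports
    port_speed_gbps: int       # line rate per port in Gb/s


GAUDI3 = ChipSpec(matrix_engines=8, hbm_gb=128, hbm_bandwidth_tbs=3.7,
                  ethernet_ports=24, port_speed_gbps=200)


def baseboard_totals(chip: ChipSpec, chips: int = 8) -> dict:
    """Aggregate per-chip figures over a baseboard of `chips` accelerators."""
    ports = chips * chip.ethernet_ports
    # Gb/s -> GB/s (divide by 8 bits) -> TB/s (divide by 1,000), one direction.
    one_way_tbs = ports * chip.port_speed_gbps / 8 / 1_000
    return {
        "matrix_engines": chips * chip.matrix_engines,
        "hbm_gb": chips * chip.hbm_gb,
        "hbm_bandwidth_tbs": round(chips * chip.hbm_bandwidth_tbs, 1),
        "ethernet_ports": ports,
        # Assumption: "bidirectional" counts both directions at line rate.
        "network_bidir_tbs": round(2 * one_way_tbs, 1),
    }


totals = baseboard_totals(GAUDI3)
print(totals)
```

With the per-chip numbers from this article, the sketch yields 64 engines, 1,024 GB of HBM2e, 29.6 TB/s of memory bandwidth, 192 ports and 9.6 TB/s of bidirectional network bandwidth, matching the baseboard figures.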
Finally, for standalone systems with PCI Express x16 connectivity, Intel will offer the HL-338 PCIe CEM card with a single Gaudi 3 chip. It offers the same specifications as the standalone accelerator card, but in a 10.5-inch card form factor with a 600 W TDP.
Intel will offer a complete software suite for developing new applications and workloads, including tools, frameworks, drivers and libraries. As a curiosity, the company has highlighted that a tool developed for Gaudi 2 can be moved to Gaudi 3 simply by changing three lines of code.
During the first half of the year, Intel will offer samples of Gaudi 3 air- and liquid-cooled accelerator cards to businesses and customers, with mass production beginning in the second half of the year.
The first customers will be Dell, HPE, Lenovo and Supermicro, in addition to all those with access to the Intel Developer Cloud, who can access the power of Gaudi 3 through Intel's cloud computing platform starting today.
We leave you with some of the performance figures Intel projects for the Intel Gaudi 3.