What is NVIDIA Eos and why could it become the world's fastest AI supercomputer?

Technology rules today, and with every field advancing at a seemingly exponential pace each year, large companies need to stay at the forefront to bring these improvements to the world.

This is the context in which NVIDIA Eos emerges: a supercomputer built by NVIDIA that has caught the attention of the technology community with its impressive performance and a cutting-edge architecture that pushes the limits of what is possible in high-performance computing.

Described as a real beast, it was presented for the first time at the Supercomputing 2023 conference and currently occupies ninth place on the TOP500 list of the world's fastest supercomputers, just below the Spanish supercomputer MareNostrum 5.

But what exactly is NVIDIA Eos, and what makes it so extraordinary? It is time to look at what NVIDIA has in store, particularly for the world of artificial intelligence.

What is NVIDIA Eos?

NVIDIA Eos is a supercomputer built by NVIDIA and designed to perform extremely complex artificial intelligence calculations at very high speed. After all, a company that wants to achieve great things in this field has to invest heavily in this kind of system.

The company itself describes Eos as a system that can power an “AI factory”, since it is a large-scale DGX H100 SuperPOD. NVIDIA adds that Eos is what allows it to develop its own advances in AI, and that it showcases the power of the company's latest technology.

“Eos will deliver an incredible 18 exaflops of AI performance, and we expect it to be the world's fastest AI supercomputer when deployed,” says Paresh Kharya, senior director of product management and marketing at NVIDIA. In simple terms, Eos can perform approximately 18.4 quintillion (18.4 × 10^18) calculations per second on tasks related to artificial intelligence.

Beyond its components, which are covered below, one notable feature is its modular design, intended to let any company build its own AI supercomputer tailored to its needs.

Piece by piece: inside this NVIDIA supercomputer

Designed for artificial intelligence and high-performance computing tasks, Eos is equipped with 576 DGX H100 systems, each with eight NVIDIA H100 GPUs, for a total of 4,608 GPUs. This allows Eos to reach 121.4 petaflops at double precision (FP64) for high-performance computing and 18.4 exaflops at mixed precision (FP8) for AI.
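The GPU count can be checked with a few lines of Python. The sketch below uses only figures quoted in this article. The per-GPU values are simple averages obtained by dividing the system totals by the GPU count, an assumption made for illustration, not official per-chip specifications.

```python
# Figures quoted in the article.
DGX_SYSTEMS = 576
GPUS_PER_SYSTEM = 8
FP8_EXAFLOPS = 18.4      # AI performance of the whole system
FP64_PETAFLOPS = 121.4   # double-precision performance of the whole system

total_gpus = DGX_SYSTEMS * GPUS_PER_SYSTEM

# Average share per GPU (assumption: the totals divide evenly across GPUs).
fp8_petaflops_per_gpu = FP8_EXAFLOPS * 1000 / total_gpus
fp64_teraflops_per_gpu = FP64_PETAFLOPS * 1000 / total_gpus

print(f"Total GPUs: {total_gpus:,}")
print(f"Average FP8 per GPU: {fp8_petaflops_per_gpu:.2f} petaflops")
print(f"Average FP64 per GPU: {fp64_teraflops_per_gpu:.1f} teraflops")
```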

A short stop on exaflops and petaflops

A quick clarification on exaflops and petaflops. Eos is rated at 18.4 exaflops for AI applications, meaning it can perform an enormous number of mathematical calculations every second. The 121 petaflops figure, on the other hand, is a different performance measure. A petaflop is a unit of computing speed: one quadrillion floating-point operations per second.

Think of it like the speed of a bicycle: a machine running at one petaflop performs a thousand trillion calculations per second. So when Eos is rated at 121 petaflops, it means it can perform about 121,000 trillion calculations per second.

So why the difference between 18.4 exaflops and 121 petaflops? The two figures are measured at different numerical precisions. The exaflops figure refers to FP8 arithmetic, which is used for artificial intelligence, while the petaflops figure refers to double-precision FP64 arithmetic, the more general measure used for traditional supercomputing workloads. That is why Eos remains in ninth place, below MareNostrum 5, which has a maximum total computing performance of 314 petaflops.
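Both figures can be put on the same scale, since an exaflop is simply 1,000 petaflops. The following is a minimal Python sketch using only numbers quoted in this article. The variable names are illustrative, and the comparison says nothing about how the benchmarks themselves were run.

```python
# Unit prefixes for floating-point operations per second (FLOPS).
PETA = 10**15
EXA = 10**18

# Figures quoted in the article.
eos_fp8_flops = 18.4 * EXA           # AI performance (FP8)
eos_fp64_flops = 121.4 * PETA        # double-precision performance (FP64)
marenostrum5_flops = 314 * PETA      # MareNostrum 5 maximum performance

# Express the AI figure in petaflops so it can be compared directly.
eos_fp8_in_petaflops = eos_fp8_flops / PETA

# How many times larger the FP8 figure is than the FP64 figure.
fp8_to_fp64_ratio = eos_fp8_flops / eos_fp64_flops

print(f"Eos FP8: {eos_fp8_in_petaflops:,.0f} petaflops")
print(f"FP8 / FP64 ratio: about {fp8_to_fp64_ratio:.0f}x")
print(f"MareNostrum 5 ahead of Eos FP64 figure: {marenostrum5_flops > eos_fp64_flops}")
```

The large ratio between the two Eos figures reflects the change in precision, not a change in hardware.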

As for its design, Eos is based on the DGX SuperPOD architecture and is optimized for AI workloads and scalability. It uses NVIDIA's Mellanox Quantum-2 InfiniBand networking, which delivers data transfer speeds of up to 400 Gb/s, essential for training and scaling large AI models smoothly.

In addition to powerful hardware, Eos comes with a complete software stack designed specifically for AI development and deployment. This includes cluster development, orchestration and management software, accelerated storage and network libraries, and an operating system optimized for AI workloads.

As for the price, it has not been disclosed. However, if each NVIDIA H100 costs between 30,000 and 40,000 dollars (remember that there are 4,608 in total) and the rest of the components are added, estimates put the figure at about 200 million dollars if the system were sold to the public.
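The GPU portion of that estimate is easy to reproduce. The sketch below multiplies the GPU count by the low and high unit prices quoted above. It is a rough back-of-the-envelope calculation that assumes list-style pricing with no volume discount, and it does not model the cost of networking, storage, or other components.

```python
# Figures quoted in the article.
H100_COUNT = 4608
PRICE_LOW_USD = 30_000
PRICE_HIGH_USD = 40_000
QUOTED_TOTAL_USD = 200_000_000  # approximate whole-system figure

# Cost of the GPUs alone at each end of the price range.
gpu_cost_low = H100_COUNT * PRICE_LOW_USD
gpu_cost_high = H100_COUNT * PRICE_HIGH_USD

# Whatever remains of the quoted total would cover everything else.
remainder_low = QUOTED_TOTAL_USD - gpu_cost_high
remainder_high = QUOTED_TOTAL_USD - gpu_cost_low

print(f"GPUs alone: ${gpu_cost_low:,} to ${gpu_cost_high:,}")
print(f"Left for other components: ${remainder_low:,} to ${remainder_high:,}")
```

On these assumptions the GPUs alone account for most of the roughly 200 million dollar figure.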

So what will this monster of a supercomputer be used for?

Eos will be used by NVIDIA's internal software engineering and AI development teams for the company's products, including autonomous vehicles and conversational AI software. It will also support company-led research projects in areas such as climate science and digital biology.

“When we have workloads that can really benefit from H100, and recommenders and language models, now obviously that workload will be the first on Eos,” Charlie Boyle, vice president and general manager of DGX Systems at NVIDIA, told HPCwire.

NVIDIA also, of course, aims for Eos to pave the way for customers to build similarly large systems. Boyle said that while “Nvidia wants the best tools for our R&D teams to use internally,” the most important part for NVIDIA customers is that “we have the exact copy of what they're running.”

“And the advantage of building one thing and the advantage of building our own supercomputers on top of that thing is that, virtually no matter the size of a customer's system, we have an equivalent or larger system internally,” he adds.

For example, this supercomputer could be used in the pharmaceutical industry. With Eos, drug companies could quickly analyze how different compounds interact with specific proteins in the body, identify potential side effects, and evaluate the effectiveness of different doses. This would allow them to make faster and better-informed decisions about new treatments, potentially saving lives.

In the case of the autonomous vehicle industry, vehicle safety could be improved through accident simulation and the development of advanced driver assistance systems (ADAS). With Eos' massive computing power, companies can perform collision simulations, for example, in record time.

It is clear that, as companies and developers around the world look to harness the power of AI, Eos is positioned as a key player that promises to accelerate the path to AI-powered applications.
