We live in a world where more and more companies are getting involved in artificial intelligence (AI). From Meta to Microsoft to Tesla, all have invested millions of dollars in building their own high-performance computing systems following the model of traditional supercomputing. But IBM, which has vast experience in the industry, has decided to bet on a different approach.
Instead of setting up a supercomputer in a single location, with all the effort that entails, IBM chose to develop a flexible infrastructure that can be deployed, via the cloud, in any of its data centers around the world. This hybrid system, called “Vela,” is already a reality. In fact, it has been running quietly since May of last year.
Virtualization and high performance?
At first glance, the idea of using virtualization to train today’s increasingly complex artificial intelligence models may not seem very convincing or effective. We know that not running software natively usually means losing performance. And how can you afford to lose performance in a field where every margin of computing power is a valuable resource?
IBM says that, after a long period of research, it has managed to reduce the virtualization overhead to approximately 5%. According to the company, that figure, weighed against the versatility of the solution, falls within acceptable parameters. Among the advantages: using existing infrastructure and quickly allocating resources through software.
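To get a feel for what a ~5% overhead means in practice, here is a minimal back-of-the-envelope sketch. Only the 5% figure comes from IBM; the baseline throughput and job duration below are hypothetical numbers chosen for illustration.

```python
# Rough illustration of what a ~5% virtualization overhead (IBM's figure)
# means for effective throughput. Baseline numbers are hypothetical.

def effective_throughput(native_tflops: float, overhead: float = 0.05) -> float:
    """Throughput left over after paying the virtualization overhead."""
    return native_tflops * (1.0 - overhead)

# Hypothetical node delivering 1000 TFLOPS of mixed-precision compute:
native = 1000.0
virtualized = effective_throughput(native)
print(f"Native: {native:.0f} TFLOPS, virtualized: {virtualized:.0f} TFLOPS")

# A training job that takes 30 days natively stretches to 30 / 0.95 days:
print(f"30-day job under virtualization: {30 / (1 - 0.05):.1f} days")
```

In other words, a month-long training run gets about a day and a half longer, which is the trade-off IBM is accepting in exchange for cloud-style flexibility.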
The idea, on paper, is really tempting. The American giant can now use its computing capacity flexibly. And this first configuration of Vela looks like a beast. We don’t know the exact number of nodes in the configuration IBM Research is using, but we do know how each one is built. Let’s take a look.
Each node has two 2nd-generation Intel Xeon processors, 1.5 TB of DRAM, four 3.2 TB NVMe drives, and eight 80 GB NVIDIA A100 GPUs. In addition, Vela benefits from scalable NVLink and NVSwitch solutions, i.e., direct GPU-to-GPU interconnects that scale the input/output (I/O) of multiple GPUs to deliver full NVLink-speed communication within a single node and between nodes.
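Simple arithmetic on the published specs gives a sense of each node’s aggregate resources (the per-component figures come from the article; the totals are just multiplication):

```python
# Aggregate resources of one Vela node, computed from the published specs.
cpus = 2                       # 2nd-gen Intel Xeon processors
dram_tb = 1.5                  # TB of DRAM
nvme_drives, nvme_tb = 4, 3.2  # four 3.2 TB NVMe drives
gpus, gpu_mem_gb = 8, 80       # eight NVIDIA A100 80 GB GPUs

total_nvme_tb = nvme_drives * nvme_tb  # local NVMe storage per node
total_gpu_mem_gb = gpus * gpu_mem_gb   # GPU memory per node

print(f"NVMe per node: {total_nvme_tb} TB")       # 12.8 TB
print(f"GPU memory per node: {total_gpu_mem_gb} GB")  # 640 GB
```

So each node carries 640 GB of GPU memory and 12.8 TB of fast local storage, before even counting the 1.5 TB of system DRAM.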
The creators of the famous Watson system say they have set out to “reduce the time to build and deploy world-class AI models.” Now they hope to keep making advances in this field and creating “new opportunities for innovation.” Of course, for now, Vela will be a resource exclusive to IBM Research. We’ll see what they have up their sleeves.
In Xataka: Frontier, the world’s most powerful new supercomputer, has reached a milestone: breaking the exascale barrier