When Facebook, Inc. became Meta Platforms, Inc. in October 2021, Mark Zuckerberg, unless he had an uncanny ability to see the future, probably did not imagine that his ambitious bet on the metaverse would hit a monumental roadblock about a year later.
In November 2022, OpenAI introduced ChatGPT, and within days almost everyone was talking about what the conversational chatbot could do. For many, that moment was the starting point of a race to lead the field of artificial intelligence, one that shook the technology industry to its core.
Some companies were in a more favorable position than others, and Microsoft was undoubtedly one of them. The Redmond company had invested $1 billion in the company led by Sam Altman in 2019 and, seeing what was happening, took out its checkbook again, this time for $10 billion.
Meta’s big change
Despite all this, Meta was still burning a fortune on the metaverse, a long-term idea that came with numerous challenges. Achieving the desired results required major advances in virtual and augmented reality, a solid business model had yet to be developed, and, finally, it would take years to turn a profit.
The opportunities, apparently, lay in artificial intelligence, but at the very highest level. It is not that the social networking company lacked experience in this field: its content recommendation systems and its advertising platform, for example, rely heavily on advanced algorithms.
Rather, it was a step behind when it came to showing significant advances in next-generation language models. According to documents seen by Reuters, the company's infrastructure needed substantial changes to catch up, while its own AI chip had failed to take off in production use.
The leaks indicate that the decisive change of course took place at the end of summer 2022, but only now are we beginning to see the results. Although Meta says it remains committed to the metaverse, it is clearly showing a strong focus on AI, with projects that include generative algorithms and more.
This Thursday, Zuckerberg unveiled four innovations designed to “promote new experiences” with Meta's artificial intelligence. The company pulled back the curtain on an upgrade to its existing AI data center, the Research SuperCluster; a new in-house chip; a new data center design; and a coding assistant. We will focus on the first three.
Its own data center, powered by NVIDIA
In January of last year, we learned that Meta had spent more than a year developing an AI data center that promised to become one of the most powerful of its kind. Like many projects of this scale, the construction of the so-called AI Research SuperCluster (RSC) was planned in successive stages.
The second phase of the RSC, which was scheduled to come online in mid-2022, has just been completed. Meta has made some adjustments to the design to reach nearly 5 exaflops of computing power at full load. All of this rests on robust, very expensive hardware developed by NVIDIA.
Inside this ambitious data center, the Menlo Park company has 2,000 NVIDIA DGX A100 systems containing 16,000 powerful NVIDIA A100 GPUs, released in 2020, all connected by NVIDIA's high-performance Quantum InfiniBand fabric with 16 Tb/s of bandwidth.
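The figures quoted for the RSC can be sanity-checked with some quick arithmetic. The sketch below assumes two things the article does not state: that each DGX A100 system holds 8 A100 GPUs, and that each A100 peaks at roughly 312 TFLOPS of dense mixed-precision (FP16/BF16) Tensor Core throughput, per NVIDIA's published specs. Under those assumptions, the totals line up with the 16,000 GPUs and the nearly 5 exaflops Meta reports.

```python
# Back-of-the-envelope check of the RSC figures quoted above.
# Assumptions (not stated in the article): each DGX A100 system holds
# 8 A100 GPUs, and each A100 peaks at about 312 TFLOPS of dense
# FP16/BF16 Tensor Core throughput (NVIDIA's published spec).

DGX_SYSTEMS = 2_000
GPUS_PER_DGX = 8               # assumption: standard DGX A100 configuration
PEAK_TFLOPS_PER_GPU = 312.0    # assumption: dense mixed-precision peak


def total_gpus(systems: int, gpus_per_system: int) -> int:
    """Number of GPUs across all systems."""
    return systems * gpus_per_system


def peak_exaflops(gpus: int, tflops_per_gpu: float) -> float:
    """Aggregate theoretical peak, converted from TFLOPS (1e12) to EFLOPS (1e18)."""
    return gpus * tflops_per_gpu * 1e12 / 1e18


gpus = total_gpus(DGX_SYSTEMS, GPUS_PER_DGX)
exaflops = peak_exaflops(gpus, PEAK_TFLOPS_PER_GPU)
print(f"{gpus:,} GPUs -> about {exaflops:.2f} exaflops peak")
```

Keep in mind this is a theoretical peak; the sustained throughput of real training jobs is lower.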
As mentioned, the RSC has been in operation for some time, so the company has already used it for several research projects. Among them is LLaMA, the large language model announced to the public earlier this year as a rival of sorts to OpenAI's generative GPT systems.
With the upgrade it has just received, this data center is expected to play a leading role in Meta's next steps. The company says it will keep using it to train language models and to explore other areas of generative AI. It also says the RSC will be key to building the metaverse.
A new approach, ‘made in Meta’
As noted, Meta’s operational AI infrastructure currently relies on NVIDIA, a company that has become one of the big winners of the AI race. Following in Google’s footsteps, Meta chose to start developing its own high-performance chip for AI data centers, with a very specific focus.
Solutions based on GPUs (Graphics Processing Units) are often the right choice for data centers because, among other features, they can run many threads in parallel. However, Meta explains in a blog post that it concluded they are not the best fit for every task.
Although GPUs play a fundamental role in data centers dedicated to training AI models, the social networking company says they are not as efficient for inference. For context, inference is the second phase of the machine learning process, the one that comes after training.
In training, the model learns from data and its parameters are adjusted so that it can produce answers, a process that is very demanding in both time and compute. In inference, what has been learned is put into practice to produce answers, using a fraction of the compute needed for training.
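To make the difference between the two phases concrete, here is a toy sketch. It is not Meta's workload or method: it fits a hypothetical two-parameter linear model, y = w * x + b, with plain full-batch gradient descent, then answers one query. Training repeats forward and backward passes over the whole dataset hundreds of times, while inference is a single forward pass with the learned parameters.

```python
# Toy illustration of training vs. inference on a hypothetical
# two-parameter linear model y = w * x + b. This is a generic
# gradient-descent sketch, not Meta's workload: it only shows why
# training costs far more compute than answering a single query.

def train(xs, ys, epochs=500, lr=0.05):
    """Fit w and b by full-batch gradient descent on mean squared error.

    Returns the learned parameters and how many forward passes were run.
    """
    w, b = 0.0, 0.0
    forward_passes = 0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            pred = w * x + b             # forward pass
            forward_passes += 1
            err = pred - y
            grad_w += 2 * err * x / n    # backward pass: accumulate MSE gradient
            grad_b += 2 * err / n
        w -= lr * grad_w                 # parameter update
        b -= lr * grad_b
    return w, b, forward_passes


def infer(w, b, x):
    """Inference: a single forward pass with the already-learned parameters."""
    return w * x + b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]           # generated from y = 2x + 1
w, b, passes = train(xs, ys)
print(f"learned w={w:.2f}, b={b:.2f} using {passes} forward passes")
print(f"prediction for x=10: {infer(w, b, 10.0):.2f} (1 forward pass)")
```

In real language models the gap is far larger, since every forward pass touches billions of parameters and training sweeps over enormous datasets.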
CPU better than GPU for inference processing
Starting from this premise, Meta changed its focus. Instead of using GPU-based systems for inference, it opted for CPUs (Central Processing Units). This opened the door to developing its own family of inference-specific chips, called Meta Training and Inference Accelerator (MTIA).
Although the project dates back to 2020, only now has the company decided to talk about it publicly, and it has shared some interesting technical details. The chips are manufactured on TSMC’s 7-nanometer process, have a TDP of 25 W, and are designed to support up to 128 GB of memory.
Each chip is mounted on an M.2 board that connects through a PCIe Gen4 x8 link. Keep in mind that data centers use many of these chips working in unison to deliver high levels of computing power. These specifications, given only in broad strokes, are not final and continue to evolve.
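Some quick arithmetic helps put the interface and power figures in perspective. The sketch assumes the PCIe 4.0 signaling rate (16 GT/s per lane with 128b/130b encoding) and uses a hypothetical count of 12 boards per host, which the article does not give, to show how the per-chip TDP adds up when several chips work together.

```python
# Rough numbers for an MTIA board, using the figures quoted above
# (PCIe Gen4 x8 link, 25 W TDP). The PCIe rates come from the PCIe 4.0
# spec: 16 GT/s per lane with 128b/130b encoding. The board count per
# host is a hypothetical value chosen only for illustration.

PCIE_GEN4_GT_PER_S = 16.0        # giga-transfers per second, per lane
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line code overhead
LANES = 8                        # x8 link, as stated for the M.2 boards
CHIP_TDP_W = 25.0                # TDP quoted for the chip


def pcie_bandwidth_gb_per_s(gt_per_s: float, lanes: int, efficiency: float) -> float:
    """Usable one-direction bandwidth in GB/s (1 transfer = 1 bit per lane)."""
    return gt_per_s * lanes * efficiency / 8  # bits -> bytes


def aggregate_tdp_w(boards: int, tdp_per_board: float) -> float:
    """Combined chip TDP when several boards work together in one host."""
    return boards * tdp_per_board


HYPOTHETICAL_BOARDS_PER_HOST = 12  # hypothetical, not from the article

bw = pcie_bandwidth_gb_per_s(PCIE_GEN4_GT_PER_S, LANES, ENCODING_EFFICIENCY)
total_w = aggregate_tdp_w(HYPOTHETICAL_BOARDS_PER_HOST, CHIP_TDP_W)
print(f"PCIe Gen4 x8: ~{bw:.2f} GB/s per direction")
print(f"{HYPOTHETICAL_BOARDS_PER_HOST} boards: {total_w:.0f} W of chip TDP")
```

The TDP total covers the chips only; the boards and host would draw additional power.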
We do not know how big a role these chips, designed by Meta and manufactured by TSMC, will end up playing, but the following point offers a clue. The company is already working on its next-generation data centers, which will complement the work of the RSC, and MTIA chips will be at their heart.
Meta says that controlling the hardware and software components of its upcoming data centers from top to bottom translates into an “end-to-end experience” that will let it substantially improve its data center capacity, although it does not mention dates. Remember, of course, that we are in the middle of a race.
Images: Meta