In recent years, the performance of large language models has been judged mainly by the number of parameters set during training. Under this reasoning, it seemed entirely logical that models would get better at performing tasks and solving problems as more parameters were added.
But there are signs that we are witnessing a major paradigm shift in which parameter count is not as important as previously believed. Although much information is kept under lock and key because of the increasingly competitive landscape, a clear example is the path that major players such as Google and OpenAI appear to be following.
It is worth stressing the importance of this apparent change in trend. Giving language models a very large number of parameters means a heavy investment of time and money. If better models can be built while saving in this area, we could see much faster and more significant advances in different fields of AI.
PaLM 2, fewer parameters, more data
A week ago, Google introduced its PaLM 2 language model, intended to do battle with OpenAI’s GPT-4. It is the successor to PaLM, which arrived the year before to compete with another product from Sam Altman’s company, the then-promising GPT-3. What have we seen recently? That the Mountain View company is changing the way it trains its models.
The technical details of Google’s latest model have not been released to the public, but internal documents seen by CNBC indicate that PaLM 2 was trained with far fewer parameters than its predecessor and still delivers superior performance. Specifically, the new-generation model would have 340 billion parameters, compared to 540 billion in the previous one.
In a blog post, the search engine company acknowledged using a new technique known as “compute-optimal scaling” to make the model more efficient overall, including the use of fewer parameters and, consequently, a lower training cost. Google’s trick for PaLM 2 came from elsewhere: increasing the size of the dataset.
Remember that datasets are made up of a wide variety of information collected from web pages, scientific studies, and so on. The leaked information suggests that the new Google model has been trained with about five times more data than the PaLM presented in 2022. This change is measured in tokens, that is, the units that make up the datasets.
PaLM 2 would have been trained on 3.6 trillion tokens, while PaLM was trained on only 780 billion. To put this in perspective, Meta’s LLaMA model was trained on 1.4 trillion tokens. The equivalent figure for GPT-4 is unknown, but the GPT-3 paper states that the model was trained on 300 billion tokens.
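To make the comparison concrete, the short Python sketch below works through the arithmetic. It assumes the reported (leaked, not officially confirmed) figures: 540 billion parameters and 780 billion tokens for PaLM, and 340 billion parameters and 3.6 trillion tokens for PaLM 2, the token count that matches the roughly fivefold increase in data. The dictionary and function names are illustrative only.

```python
# Reported figures, not official ones (assumption: taken from the leak
# described in the article). Values are raw counts.
REPORTED = {
    "PaLM": {"params": 540e9, "tokens": 780e9},
    "PaLM 2": {"params": 340e9, "tokens": 3.6e12},
}


def tokens_per_parameter(name):
    """Training tokens seen per model parameter for a reported model."""
    model = REPORTED[name]
    return model["tokens"] / model["params"]


# How much the dataset grew and how much the model shrank.
token_ratio = REPORTED["PaLM 2"]["tokens"] / REPORTED["PaLM"]["tokens"]
param_ratio = REPORTED["PaLM 2"]["params"] / REPORTED["PaLM"]["params"]

print(f"Tokens, PaLM 2 vs PaLM: {token_ratio:.2f}x")
print(f"Parameters, PaLM 2 vs PaLM: {param_ratio:.2f}x")
for name in REPORTED:
    print(f"{name}: {tokens_per_parameter(name):.2f} tokens per parameter")
```

Under these assumed figures, PaLM 2 saw roughly 4.6 times as many tokens with about 63% of the parameters, so each parameter was trained on several times more data than in PaLM.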
This shift toward training models with fewer parameters is not unique to Google. OpenAI is also working in that direction. For months, Altman has been saying that the race to increase parameter counts reminds him of the late 1990s, when the hardware industry was obsessed with increasing processor clock speeds.
As our colleagues at Genbeta point out, the head of the AI company maintains that “GHz has passed into the background” and gives the example that most people do not know the clock speed of their iPhone’s processor, but do know that it is fast. “What we really care about is capabilities, and I think it’s important to focus on capabilities,” he says.
What are parameters?
Broadly speaking, parameters come into play in the training stage of AI models. They are the internal values the model adjusts as it learns from data, and they are what allow it to produce answers based on predictions. For example, if we train a model to estimate house prices, its parameters would capture how much factors such as dimensions, location, or amenities influence the price.
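As a toy illustration of what “learning parameters” means, the sketch below fits a house-price model with just two parameters: a weight `w` for floor area and a base price `b`. The data is invented for the example, and ordinary least squares stands in for the far more elaborate training used for language models, which adjust hundreds of billions of such values rather than two.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + b (closed-form solution)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


# Hypothetical training data: floor area in square metres and price in
# thousands. The prices follow price = 2 * area + 10 exactly.
areas = [50.0, 80.0, 100.0, 120.0]
prices = [110.0, 170.0, 210.0, 250.0]

# "Training" finds the two parameters that best explain the data.
w, b = fit_line(areas, prices)

# The learned parameters are then used to predict an unseen house.
predicted = w * 90.0 + b
print(f"w = {w:.2f}, b = {b:.2f}, predicted price for 90 m2 = {predicted:.1f}")
```

Here the area is an input the model receives, while `w` and `b` are the parameters it learns.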