
Meta introduces Llama 3.2, its first open source multimodal language model


September 26 (Portaltic/EP) –

Meta has presented Llama 3.2, its first multimodal language model, consisting of the small and medium-sized 11B and 90B models, capable of processing both images and text, and the lightweight text-only 1B and 3B models.

At its Meta Connect 2024 event, the technology company unveiled its latest developments, including its new language model, which arrives two months after the presentation of Llama 3.1 and is the first from the company capable of processing images.

The new Llama 3.2 family includes two small and medium multimodal models, with 11 billion (11B) and 90 billion (90B) parameters respectively. Their use cases cover image reasoning, such as understanding charts and diagrams or generating image captions, and they can also identify the location of objects within images.

In practice, these models can extract details from a photograph, understand the scene and then compose sentences that could serve as an image caption or as the opening of a story.
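As a rough illustration of that workflow, the sketch below asks the 11B vision model to caption a photo. It is a minimal example assuming access to the Llama 3.2 weights through the Hugging Face transformers library; the model identifier, the image URL and the exact API calls are assumptions based on that integration, not something specified in the announcement.

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

# Assumed Hub id; access requires accepting Meta's license on Hugging Face.
model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder image URL for illustration only.
image = Image.open(requests.get("https://example.com/photo.jpg", stream=True).raw)

# Ask the model to describe the scene in a caption-like sentence.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this photo in one sentence suitable as a caption."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=60)
print(processor.decode(output[0], skip_special_tokens=True))
```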

Llama 3.2 is rounded out by two smaller models, 1B and 3B, which process text only and are designed to run on devices such as smartphones. These models are optimized for ARM processors, can handle multiple tasks with minimal latency and support a context length of 128,000 tokens.

Specifically, these models allow developers to create custom applications on the device itself, ensuring data remains private within the smartphone or product in question. For example, Meta has noted that they can be used to summarize the last ten messages received in an instant messaging app, or to automatically send calendar invitations to organize meetings.
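A minimal sketch of that kind of on-device summarization use case is shown below, again assuming the Hugging Face transformers library and the 1B instruct model; the model identifier and the sample messages are assumptions for illustration, and a production app would more likely use an on-device runtime rather than a Python script.

```python
from transformers import pipeline

# Assumed Hub id; weights require accepting Meta's license on Hugging Face.
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",
    device_map="auto",
)

# Hypothetical last messages pulled from a messaging app.
recent_messages = [
    "Ana: Are we still on for Friday?",
    "Luis: Yes, 6 pm works for me.",
    "Ana: Great, I'll book the table.",
]

messages = [
    {"role": "system", "content": "You summarize chat conversations in two sentences."},
    {"role": "user", "content": "Summarize these messages:\n" + "\n".join(recent_messages)},
]

# The pipeline applies the model's chat template and appends the assistant reply.
result = generator(messages, max_new_tokens=80)
print(result[0]["generated_text"][-1]["content"])
```

Because nothing in this flow leaves the device, the summary can be produced with low latency and without sending message content to a remote server, which is the privacy argument Meta highlights.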

The technology company has pointed out that these models stand out, above all, for their ability to run locally. On the one hand, this allows the model's responses to be near-instantaneous, as they are processed on the device itself. On the other hand, it protects privacy, as data such as messages or calendar information are not sent to the cloud.

COMPETITIVE WITH CLAUDE 3 HAIKU AND GPT-4O MINI

According to Meta's own evaluation, the Llama 3.2 11B and 90B models are competitive with leading lightweight models such as Claude 3 Haiku and GPT-4o mini in image recognition and a variety of visual understanding tasks.

For its part, the 3B model outperforms Gemma 2 2.6B and Phi 3.5-mini at tasks such as instruction following, summarization, prompt rewriting and tool use, while the 1B model is comparable to Gemma in this area.

To reach these conclusions, Meta evaluated the models' performance on more than 150 benchmark datasets covering a wide variety of languages.

The company has also noted that the open source Llama 3.2 models are now available to all developers, so they can begin testing and experimenting with their capabilities. In addition, all of these capabilities have been incorporated into its AI assistant, Meta AI.
