
CM3leon is Meta's new AI that creates images from text and vice versa using fewer resources


July 14 (Portaltic/EP) –

Meta presented CM3leon this Friday, the first multimodal generative AI model capable of creating images from text and vice versa, built with an "adapted recipe" from multimodal language models and, moreover, trained with "five times fewer resources".

The company led by Mark Zuckerberg continues its research into generative AI models, introducing advances in natural language processing that enable machines to understand and express language, as well as systems that can generate images from text input.

Within this framework, Meta has presented its new AI model CM3leon (pronounced "chameleon"), which offers "the highest performance" in converting text to image and vice versa, is trained with five times fewer resources than previous models, and generates text and image sequences conditioned on "arbitrary sequences of other text and image content".

As the company explained in a statement on its blog, it is an innovative solution because it is the "first multimodal model" trained with a recipe adapted from text-only language models. In other words, text-only generative models are fine-tuned on multitask instructions, covering a wide range of actions when following instructions, whereas image-generation models are, as a rule, specialized for specific tasks only.

By applying the large-scale multitask training of text-only models to the generation of images and text, Meta has improved performance on other tasks, such as generating text from images in order to write captions for them.

Furthermore, although it is trained with five times fewer resources than previous models, CM3leon offers "state-of-the-art" performance in creating images from text and vice versa. In fact, Meta has underlined that CM3leon has the "versatility and effectiveness of autoregressive models". As a result, the model keeps training costs low while remaining efficient.

With all this, the company has noted that it is a causal masked mixed-modal (CM3) model, since it can generate sequences of text and images conditioned on "arbitrary sequences of other image and text content". As the company stated, "this greatly expands the functionality of previous models that were text-to-image only or image-to-text only."

Along the same lines, CM3leon also shows an "impressive" ability to generate complex compositional objects, that is, images with different components that have little to do with one another or that are difficult to combine.

Likewise, the parent company of Instagram has highlighted that CM3leon performs well in a "wide variety of vision and language tasks", including visual question answering and long-form captioning.

CM3LEON CAPABILITIES

Thanks to all these features, CM3leon can generate and edit text-guided imagery. Specifically, text-guided editing is "a challenge", since the model must understand both the text instructions and the generated image itself in order to edit it afterwards.

Along the same lines, this new Meta model can also edit images following structure instructions, an option that allows it to create "visually coherent and contextually appropriate" edits to an image while adhering to the design guidelines described above.

Another of CM3leon's capabilities is generating an image from a descriptive text; specifically, from a text describing a "potentially very compositional" image, which tests how coherently the model follows the text's indications.

CM3leon is also capable of carrying out text tasks. In this sense, it can follow different prompts to generate short or long captions from an image, and can even answer questions about a picture.

Among its image-generation skills, the user can write a description that specifies the exact location where the objects mentioned in that description should be placed, within a bounded space.

Similarly, CM3leon can also deliver "super-resolution" results; this option adds a separately trained stage that produces higher-resolution images from the original model output.
