AIs that convert text to images are already old news. What matters now is converting text to video. Google follows in Meta’s footsteps with Imagen Video.
Google today presented Imagen Video, its new artificial intelligence that converts text to video. It looks like an answer to Make-A-Video, the Meta AI that does the same, presented a few days ago.
Diffusion models applied to machine learning are revolutionizing image-based artificial intelligence. We have already seen some very popular AIs that create images from text, such as DALL-E or Stable Diffusion. But now comes the second generation, which creates videos from text.
Just days after Meta presented Make-A-Video, Google has done the same with Imagen Video, a new AI that converts text to video. In its first version, it generates videos at a resolution of 1280×768 pixels and 24 fps.
Imagen Video, a very cinematic artificial intelligence
Diffusion models are generative models, that is, they generate new data that resembles the data they were trained on.
During training, they progressively destroy the data by adding noise to it, and they learn to reverse that process so they can rebuild new data from noise as needed.
For example, if you type the sentence “An elephant with a party hat strolling along the bottom of the sea”, the AI processes the sentence to pick up concepts like “elephant”, “party hat”, or “bottom of the sea”, and draws on what it learned from its training data to combine them consistently into an image or a video showing what the phrase asks for.
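The "destroy and rebuild" idea behind diffusion models can be sketched in a few lines. This is only a toy illustration of the standard forward noising step used by diffusion models in general, not Imagen Video's code: the function names, the linear noise schedule and its default values are all assumptions made for the example, and the "rebuild" step cheats by reusing the exact noise that a trained network would have to estimate.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fractions for a linear noise schedule (assumed values)."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        # Noise added at step t grows linearly from beta_start to beta_end.
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar, rng):
    """'Destroy': blend the clean values with Gaussian noise."""
    signal = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + noise_scale * e for x, e in zip(x0, noise)]
    return xt, noise


def recover_clean(xt, noise, alpha_bar):
    """'Rebuild': invert the blend, given the noise a real model would predict."""
    signal = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [(x - noise_scale * e) / signal for x, e in zip(xt, noise)]


# Hypothetical four-pixel "image", noised halfway through a 1000-step schedule.
rng = random.Random(0)
pixels = [0.1, 0.5, 0.9, 0.3]
alpha_bars = make_alpha_bars(1000)
noisy, noise = add_noise(pixels, alpha_bars[500], rng)
restored = recover_clean(noisy, noise, alpha_bars[500])
print("noisy:   ", noisy)
print("restored:", restored)
```

In a real system the noise is not known at generation time. A neural network is trained to estimate it, guided by the text prompt, and the rebuild is repeated over many small steps.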
In the case of Imagen Video, it first creates a low-resolution video of 24×48 pixels at 3 fps and progressively upscales it, adding resolution and frames, until it reaches videos of 1280×768 pixels at 24 fps, about 5 seconds long.
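To get a feel for how much the upscaling stages have to add, the figures above can be turned into simple arithmetic. The sketch below is only a back-of-the-envelope calculation with hypothetical helper names. It assumes the base video is 48 pixels wide by 24 tall (landscape, like the final 1280×768 output) and takes "about 5 seconds" as exactly 5.

```python
from fractions import Fraction


def cascade_factors(base, target):
    """Scale factors between two (width, height, fps) video specs."""
    base_w, base_h, base_fps = base
    target_w, target_h, target_fps = target
    return {
        "width_scale": Fraction(target_w, base_w),
        "height_scale": Fraction(target_h, base_h),
        "fps_scale": Fraction(target_fps, base_fps),
    }


def frame_count(fps, seconds):
    """Number of frames in a clip of the given length."""
    return round(fps * seconds)


# Figures quoted in the article; the width/height order of the base is assumed.
BASE = (48, 24, 3)
FINAL = (1280, 768, 24)
CLIP_SECONDS = 5

factors = cascade_factors(BASE, FINAL)
for name, value in factors.items():
    print(f"{name}: x{float(value):.2f}")

print("base frames: ", frame_count(BASE[2], CLIP_SECONDS))
print("final frames:", frame_count(FINAL[2], CLIP_SECONDS))
```

Under these assumptions, the final clip has 8 times as many frames as the base clip, and each frame is roughly 27 times wider and 32 times taller, so almost all of the visible detail is produced by the upscaling stages rather than the first low-resolution pass.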
It can generate videos that imitate famous artists and various animation styles.
As Ars Technica explains, Imagen Video was trained using the LAION-400M image dataset, made up of more than 400 million images, to which Google added 14 million videos.
Unfortunately, this sometimes produces results that are racist or discriminatory.
That is why Google has decided that, for now, it is not going to make this artificial intelligence public. It wants to apply a series of filters first to avoid controversial results.
Imagen Video, Google's artificial intelligence that converts text to video, promises to generate a media impact similar to DALL-E's. But for now, we have to content ourselves with looking at the examples on its website.