New Nvidia Fugatto AI flexibly generates or transforms music, voices and sounds from text

Nov. 26 (Portaltic/EP) –

Nvidia has presented a new artificial intelligence model called Fugatto that can create or transform any mix of music, voices (including accents) and sounds in a personalized, flexible way, working from prompts that combine text and audio files.

Fugatto is short for Foundational Generative Audio Transformer Opus 1 and has been introduced as “a Swiss Army knife for sound” that offers features not seen until now in other AI models, as the company explained in a press release.

Although other generative technologies can compose a song or modify a voice, “none have the skill of the new offering,” because Fugatto can generate or transform elements such as voices, sounds or music described in text prompts.

For example, with this AI it is possible to create a snippet of music from a text prompt, add or remove instruments in an existing song, change the accent or emotion of a voice and even produce “sounds never heard before.”

Rafael Valle, manager of applied audio research at Nvidia, explained that this tool is the first to show emergent properties, that is, capabilities that arise from the interaction of its trained skills, as well as the ability to combine free-form instructions.

The model uses a technique called ComposableART to combine instructions that it saw only separately during training, so that a single prompt could, for example, request text spoken with a French accent and a sad tone. This means the user can specify how heavy or light the accent is, or how much emotion the narration carries.

To add to that flexibility, the model also generates sounds that change over time, a capability Nvidia calls ‘temporal interpolation’. In this way, it can create the sounds of a storm moving through an area, with ‘crescendos’ of thunder that fade into the distance.

The company has also indicated that, unlike most models, “which can only recreate the training data they have been exposed to,” Fugatto can create soundscapes that shift from one context to another, such as a thunderstorm that gives way to dawn with the sound of birds singing.

Fugatto can be used, for example, in marketing campaigns that target multiple regions or contexts, applying different accents and emotions to the voice-overs that narrate the ads. Likewise, video game developers will be able to use it to modify the prerecorded assets in their titles so that they adapt to the action as the game progresses, among other use cases.
