Jan. 10 (Portaltic/EP) –
Microsoft is developing VALL-E, a technology based on Artificial Intelligence (AI) that can learn and imitate any voice from a recording of just three seconds.
The American technology company continues to bet on implementing AI in its products and services. To that end, it is working on projects to add ChatGPT, the chatbot developed by OpenAI, to its browsers and to the Office suite.
Along the same lines, Microsoft has presented its AI project VALL-E, a text-to-speech (TTS) language model that synthesizes speech from text. The novelty of this technology is its in-context learning capability: from audio recordings of only three seconds, it can imitate the voices in those recordings.
In other words, as Microsoft explains in a document shared on GitHub, VALL-E can synthesize “high-quality” personalized speech from a three-second recording of a speaker. Its developers also point out that the samples suggest that VALL-E could “preserve the emotion of the speaker and the acoustic environment of the message”.
The company has stressed that this technology “significantly” outperforms other TTS systems in terms of speech naturalness and similarity to the speaker. During the pre-training stage, the developers scaled the TTS training data up to 60,000 hours of English speech, which they say is “hundreds of times” larger than the data used by existing systems.
Another novelty of this technology is that it is being developed to work with “other generative AI models”, such as GPT-3. This opens up the possibility of integrating VALL-E into other technologies such as ChatGPT, so that this AI could also deliver spoken results in addition to text.