Nvidia creates an AI that redirects a speaker’s gaze toward the camera and translates speech in real time

12 Jan. (Portaltic/EP) –

Nvidia has developed an artificial intelligence (AI) technology, Maxine, that can adjust speakers’ gaze so that their eyes appear directed at the camera, emulating the position of the eyes and their blinking, and can also translate speech in real time during a video conference.

The company says that Nvidia Maxine is a set of software development kits (SDKs) that belongs to Nvidia AI Enterprise, a library of software that includes workflows, AI solutions, and pre-trained models.

According to Nvidia, the toolkit allows developers to implement “premium augmented reality features, both in audio and video quality”, as detailed on its website.

On the site, the company also states that Maxine includes AI functions “accelerated and optimized for inference in real time on the GPUs”, resulting in low-latency audio, video and augmented reality (AR) effects with high network resilience. It also offers a number of AI-based effects in the Audio Effects Microservice section.

Its audio effects remove noise and room echo and deliver high-quality sound. On the video side, the Video Effects Microservice offers effects such as virtual backgrounds during video calls and allows users to maintain eye contact.

As for the video effects SDK, the company emphasizes that Nvidia Maxine can achieve “super resolution”, since it preserves the texture of images while multiplying their quality by up to four, and retains their details in low-light conditions.

The ability to keep the speaker’s eyes on the camera is one of the most notable features of Nvidia’s AI. Maxine simulates the speaker’s eye contact with the lens by estimating their gaze and aligning it with the camera. In addition, it emulates the shape of the eyes, their position and their blinking.

This feature, integrated into the AR SDK, offers real-time 3D face tracking and also estimates body pose to measure users’ actual interactions and replicate them on screen in real time.

Nvidia has integrated an update with an improved model that includes a new six-degrees-of-freedom (DOF) head pose estimate. In addition, it samples facial features and contours using 126 facial key points.

For body pose estimation, meanwhile, the technology tracks 34 key points of the human body in both two dimensions (2D) and three dimensions (3D).

Another feature the company has highlighted is the AI’s ability to translate speech in real time. The technology promises to “overcome language barriers” and can switch from one language to another as soon as the speaker finishes a sentence.

According to Maxine’s promotional video, it currently supports English, Spanish, French and German, although the company has not specified whether it can translate more languages in real time.
