Aug 2 (Portaltic/EP) –
The Google Assistant, which until now has worked through voice commands, is closer to being activated with a glance, thanks to a new feature called ‘Look and Talk’.
Currently, the Google Assistant is available in more than 95 countries and in more than 29 languages, and it can only be activated with two commands, ‘OK Google’ and ‘Hey Google’. Once invoked, it listens to and executes the commands the user gives.
With the aim of improving contact between person and machine, the company has explored new methods of interaction, progress it discussed at the end of 2020 with the presentation of ‘Look to Speak’.
At the time, the Mountain View company indicated that this application was intended to allow people with motor and speech disabilities to communicate with devices using their eyes, as well as to choose predesigned phrases for the device to reproduce.
Later, at the Google I/O 2022 developer conference, the manufacturer went a step further with ‘Look and Talk’. This technology is capable of analyzing audio, video and text to determine whether the user is addressing the Nest Hub Max directly.
Now, the company has offered an update on this technology on its artificial intelligence (AI) blog and has explained in greater detail how this recognition system works.
First of all, Google has noted that ‘Look and Talk’ relies on an algorithm based on eight machine learning models. Thanks to it, the system can distinguish intentional gaze interactions at distances of up to five feet (approximately 1.5 meters) to determine whether the user is seeking to engage the device.
The technology company has developed this algorithm by testing it against different variables and characteristics, among them demographic ones, such as age and skin tone, as well as different acoustic conditions and camera perspectives.
In real time, this technology also has to cope with unusual camera perspectives, since these smart displays are usually placed at specific spots in the home, at a medium-low height.
The process on which ‘Look and Talk’ is based consists of three phases. To begin with, the assistant identifies the presence of a person using face-detection technology and establishes the distance at which the subject is located.
Thanks to Face Match technology, this solution determines whether that person is registered in the system to communicate with the device, a method also used by other assistants, such as Alexa.
In this first recognition phase, the assistant also relies on other visual cues, such as the angle of the user’s gaze, to determine whether the user intends to interact visually with the device.
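The first phase described above can be sketched as a simple gate over visual signals. This is an illustrative reconstruction only: the class, function, and thresholds below are assumptions for clarity, not Google’s actual implementation (only the roughly five-foot range and the use of face detection, Face Match enrollment, and gaze angle come from the article).

```python
from dataclasses import dataclass

MAX_DISTANCE_M = 1.5       # roughly five feet, per the article
MAX_GAZE_ANGLE_DEG = 15.0  # assumed tolerance for "looking at the device"

@dataclass
class FaceObservation:
    detected: bool         # a face was found in the frame
    distance_m: float      # estimated distance to the subject
    enrolled: bool         # Face Match recognizes a registered user
    gaze_angle_deg: float  # angle between the gaze and the camera

def phase_one_engaged(obs: FaceObservation) -> bool:
    """Return True if the visual cues suggest an intentional interaction."""
    return (
        obs.detected
        and obs.distance_m <= MAX_DISTANCE_M
        and obs.enrolled
        and abs(obs.gaze_angle_deg) <= MAX_GAZE_ANGLE_DEG
    )
```

For example, a registered user looking at the screen from 1.2 meters would pass this gate, while an unenrolled person, or one standing beyond the range, would not.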
Next, the second phase begins, in which the assistant takes additional signals into account and listens to the user’s query to determine whether the speech is addressed to it.
To do this, it relies on technologies such as Voice Match, which validates and complements the result previously returned by Face Match. ‘Look and Talk’ then runs an automatic speech recognition model, which transcribes the speaker’s words and commands.
The assistant then analyzes this transcription together with non-lexical information in the audio, such as tone, speaking speed or sounds that may reveal the user’s hesitation during the utterance. It also relies on contextual visual cues to determine the probability that the interaction was intended for the Assistant.
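The second phase fuses several signal types into one probability that the utterance is directed at the device. A minimal sketch, assuming each signal has already been scored between 0 and 1; the weights and signal names are illustrative assumptions, since the article does not disclose how Google combines them.

```python
def device_directed_probability(
    voice_match: float,  # 0..1, speaker matches an enrolled voice
    lexical: float,      # 0..1, the transcript reads like a device query
    prosody: float,      # 0..1, tone/speed suggest a deliberate command
    visual: float,       # 0..1, gaze and other contextual visual cues
) -> float:
    """Weighted average of the phase-2 signals (weights are assumed)."""
    score = (
        0.3 * voice_match
        + 0.3 * lexical
        + 0.2 * prosody
        + 0.2 * visual
    )
    # Clamp to [0, 1] so the result reads as a probability.
    return max(0.0, min(1.0, score))
```

In practice such signals would come from separate models and be combined by a learned classifier rather than fixed weights; the fixed weights here only make the fusion step concrete.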
When the intent-understanding model determines that the user’s utterance was meant for the Assistant, ‘Look and Talk’ moves on to the phase in which it processes the query and looks for a response.
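Putting the three phases together, the overall flow is: gate on visual engagement, score the utterance, and only then process the query. The helper below is a hypothetical orchestration sketch; the decision threshold is an assumption, not a value Google has published.

```python
from typing import Optional

INTENT_THRESHOLD = 0.5  # assumed decision boundary for phase 2

def look_and_talk(engaged: bool, intent_score: float, query: str) -> Optional[str]:
    """Respond only if both earlier phases pass; otherwise stay silent."""
    if not engaged:                      # phase 1: no intentional gaze detected
        return None
    if intent_score < INTENT_THRESHOLD:  # phase 2: speech not meant for the device
        return None
    return f"Processing query: {query}"  # phase 3: handle the query
```

The key design point the article describes is that the device stays silent unless both the visual and the speech-intent checks agree, which is what keeps ‘Look and Talk’ from reacting to bystanders or overheard conversation.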
Finally, the company has acknowledged that each model supporting this system has been evaluated and improved in isolation, as well as tested in a wide variety of environmental conditions, which has allowed it to introduce personalization parameters for its use.