Chatbots like Gemini and ChatGPT could be just the tip of the iceberg in the artificial intelligence (AI) revolution. Everything indicates that the industry's next great advance will come from agents: programs designed to take control of systems or applications and perform a wide variety of tasks. Google has just taken a very important step in this direction.
This Wednesday, the Mountain View giant unveiled Project Mariner (formerly known as Project Jarvis). It is an AI agent designed to understand what appears on the browser screen and perform actions on the user’s behalf. It is based on Gemini 2.0, the most recent version of the company’s family of language models.
A new way to use the browser
Google explains that Project Mariner can interact with web pages thanks to an experimental extension available in Chrome. First, the system analyzes the user’s instructions (typed or spoken). It then tries to carry out the requested tasks by analyzing the pixels, the text on the pages, the code, the images and even the forms.
In a demo video, we see a Chrome window with a spreadsheet open containing the names of several companies. A member of the Google DeepMind team asks the agent to take the list of companies and search their websites to extract a contact email. Immediately, the agent begins to do exactly what has been asked of it.
It opens the Google search page, searches for each of the companies, navigates within their sites to the About Us section and extracts the information. The agent shows a visual progress report in a browser sidebar, letting the user know exactly what it is doing. It is also possible to stop it at any time.
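To make the last step of the demo more concrete, here is a minimal Python sketch of pulling a contact email out of the text of an About Us page. It is not Google's implementation, which is not described here; the function names, the regular expression and the sample pages are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical, deliberately simple pattern; real-world email matching is messier.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_contact_email(page_text):
    """Return the first email-like string found in the page text, or None."""
    match = EMAIL_RE.search(page_text)
    return match.group(0) if match else None


def collect_contacts(about_pages):
    """Map each company name to the email found in its About Us text."""
    return {
        company: extract_contact_email(text)
        for company, text in about_pages.items()
    }


# Made-up sample data standing in for pages an agent would have visited.
sample_pages = {
    "Example Co": "About Us. Reach our team at hello@example.com for enquiries.",
    "Sample Ltd": "About Us. We build things. No contact details listed.",
}
contacts = collect_contacts(sample_pages)
```

In this sketch a company with no visible address maps to `None`, which is the kind of case where an agent might report a gap or ask the user how to proceed rather than guess.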
Google says the agent can be useful for automating repetitive tasks and saving people time. And if a request is not clear enough, the agent can ask the user for clarification or more information, which should reduce failures. It should be noted that the company expects its agent to make some mistakes, since this is an experimental version currently available only to some “trusted testers.”
In October of this year we learned about Computer Use from Anthropic, a system that lets you automate tasks in the computer’s operating system. Since this is an early version, the Anthropic agent is still very limited. There are tasks that it cannot complete, and sometimes it is slow or makes mistakes. In any case, this technology should continue to evolve.
Images | Google
In Xataka | Chatbots and generative AI seemed like the industry’s way forward in AI. Now there are some new darlings: the agents