“Would you mind if the professor used your computer?” Dr. McCoy asks in ‘Star Trek IV: The Voyage Home’. “Please,” replies the engineer they are visiting. Scotty, full of determination, walks up to the screen and says “Computer?”, waiting for a response from a 1986 computer. When nothing happens, Dr. McCoy hands him the mouse, thinking it is a microphone. “Hello, computer!” That’s when the surprised engineer tells him to use the keyboard. “The keyboard? How quaint!” says Scotty. The scene is legendary. And visionary.
In fact, it almost works as an ironic wink at what many science fiction films, before and since, took for granted: humans do not type or use a mouse to interact with machines. Nor do they go around tapping a phone screen.
You don’t see Matthew McConaughey dealing with TARS and CASE like that in ‘Interstellar’. Nor HAL 9000 when it says “I’m sorry, Dave. I’m afraid I can’t do that.” It doesn’t do it with a message on a screen; it says it out loud. In all those scenes, humans and machines talk to each other naturally. And that science fiction future is increasingly real. OpenAI already showed us as much with GPT-4o, and now it is Anthropic that has brought us a little closer to it.
It has done so with the presentation of ‘Computer Use’, a tool with which its AI model, Claude, can interact with our computer. For now it does so through a technical demo running in an isolated environment, just in case, but it makes clear that this type of feature could reach our machines in the near future.
With this new API, Anthropic explains, it is possible to turn prompts into commands that the computer executes. It achieves this because Anthropic’s AI continuously takes screenshots and analyzes them to know where everything is on the screen. Here is a simple example:
- You type the prompt “Open Firefox”.
- The AI model, which sees what we see on the screen, scans the screenshot looking for the Firefox icon.
- It locates it and automatically moves the mouse pointer there.
- It simulates a mouse click on the icon to open Firefox.
- Done: Firefox is on screen.
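The five steps above amount to a simple look-then-act loop. As a rough illustration only, here is a toy Python simulation of that loop. It is not Anthropic’s API: the fake screen, the icon coordinates and the helper names (`FakeScreen`, `find_icon`, `run_prompt`) are all hypothetical, and the “screenshot” is just a dictionary of icon positions rather than an image analyzed by a vision model.

```python
from dataclasses import dataclass, field


@dataclass
class FakeScreen:
    """Hypothetical desktop: icon names mapped to (x, y) pixel positions."""
    icons: dict[str, tuple[int, int]]
    pointer: tuple[int, int] = (0, 0)
    open_apps: list[str] = field(default_factory=list)

    def screenshot(self) -> dict[str, tuple[int, int]]:
        # A real agent receives an image; this toy version returns the icon layout.
        return dict(self.icons)

    def move_mouse(self, x: int, y: int) -> None:
        self.pointer = (x, y)

    def click(self) -> None:
        # Opens the app whose icon sits exactly under the pointer, if any.
        for name, pos in self.icons.items():
            if pos == self.pointer and name not in self.open_apps:
                self.open_apps.append(name)


def find_icon(shot: dict[str, tuple[int, int]], target: str):
    # Stand-in for the model's visual search: case-insensitive name lookup.
    for name, pos in shot.items():
        if name.lower() == target.lower():
            return pos
    return None


def run_prompt(screen: FakeScreen, prompt: str) -> bool:
    # Only understands prompts of the form "Open <App>" (hypothetical simplification).
    if not prompt.lower().startswith("open "):
        return False
    target = prompt[5:].strip()
    pos = find_icon(screen.screenshot(), target)  # look at the screen, find the icon
    if pos is None:
        return False
    screen.move_mouse(*pos)  # move the pointer to the icon
    screen.click()           # simulate the click
    # Check the result (a real agent would take a new screenshot to confirm).
    return any(app.lower() == target.lower() for app in screen.open_apps)


screen = FakeScreen(icons={"Firefox": (40, 120), "Terminal": (40, 200)})
print(run_prompt(screen, "Open Firefox"))  # True
print(screen.pointer)                      # (40, 120)
```

In the real tool, the hard part is the step this sketch fakes with a dictionary lookup: recognizing the Firefox icon in raw pixels and working out its coordinates.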
That simple interaction can become much more complex because, as we said, we can ask ‘Computer Use’ to do almost anything: for example, programming a web page with a late-90s design, searching for information about ourselves, filling out forms to apply for a job, or even ordering food delivery.
In these interactions, the keyboard is still the way requests are entered, but it seems inevitable that we will end up using our voice instead. In fact, that is what the demos by Anthropic employees show: they speak and then confirm their request by clicking the Send button on the message the machine has “heard”.
Voice, it seems, will gradually take over. That is certainly what OpenAI’s spectacular GPT-4o demos suggested months ago. At the time there was a lot of talk about the inevitable comparison with the film ‘Her’, and everything pointed to a similar future.
We are getting ever closer to the point where the mouse and keyboard (and gestures and touch on mobile phones) fade into the background and stop being the eternal peripherals. And when that happens and someone asks us to use them, we will probably answer the same thing as Scotty.
How quaint.
Image | Paramount Pictures
In Xataka | Sundar Pichai (CEO of Google) believes that ‘Her’ is inevitable: “there will be people who fall in love with an AI and we should prepare ourselves”