
Concerns over the security of language models as ChatGPT suffers its first hacks

17 Apr. (Portaltic/EP) –

The rise in popularity of ‘chatbots’ has attracted the attention of malicious actors looking to exploit the weaknesses of these artificial intelligence (AI) systems, either to make them do things they were not designed to do or to use them to spread malware.

OpenAI’s ‘chatbot’, ChatGPT, can be used to create ‘malware’ through its code-writing capabilities, despite the security filters its developers have implemented, as a Forcepoint researcher recently demonstrated.

The opposite situation also occurs: ChatGPT itself is the target of malicious actions that seek to ‘hack’ it, specifically to bypass the measures introduced to keep conversations with users safe.

Such is the case reported by Wired, which recounts the experience of Alex Polyakov, who managed to ‘hack’ GPT-4, the latest version of the language model behind the chatbot, within a couple of hours of its launch in March, and who now possesses a universal ‘jailbreak’ that works across different large language models.

As Polyakov describes it, ‘hacking’ the ‘chatbot’ consists of entering a series of prompts made up of carefully crafted sentences that end up lifting the filters, so that ChatGPT starts using racist language or proposing illegal acts, for example, but also allowing malicious data or instructions to be inserted.

OpenAI, for its part, is aware of the flaws that can arise in the development of its language models and has therefore announced a bug bounty program that will pay up to 20,000 dollars (about 18,300 euros) to those who find vulnerabilities and bugs in its AI systems.

The creation of ‘malware’ is precisely one of the first misuse cases that cybersecurity researchers have already warned about.
