May 28. (Portaltic/EP) –
The working group tasked with determining whether OpenAI's large language models (LLMs) comply with European data protection law has shared preliminary conclusions regarding public data and transparency.
This first assessment of the legality of LLMs sets out considerations on data processing – which it divides into five phases – and on the fundamental rights and freedoms of users in the European Union, in some cases noting the measures already adopted by the company.
The first three phases – data collection, data pre-processing and training – pose risks to the fundamental rights and freedoms of citizens. As the working group understands it, the data used for training, often collected from publicly available publications on the web, may contain personal information, including the special categories of data covered by the GDPR, such as data revealing racial or ethnic origin, political opinions, religious beliefs or trade union membership, or data concerning health and sexual orientation.
The group therefore sees the need for the collection process to strike a balance between the rights and freedoms of individuals and the legitimate interests of the data controller.
It also stresses the importance of security and protection measures, which may include establishing precise criteria for data collection, avoiding the processing of special categories of data, and even using technologies to anonymise data or delete it when it comes from techniques such as 'scraping', that is, the automated collection of data from other websites or applications.
The group also points out that the fact that some data is publicly available on the web does not mean that the person concerned has made it public, and that, for its collection and processing to be lawful – especially in the case of special categories – it is important to determine whether the data subject "intended, explicitly and through clear affirmative action, to make the personal data in question accessible to the general public."
The next two phases of processing address user interactions with ChatGPT, that is, prompts, responses, and training with those prompts. These interactions take place through text, but also through the uploading of audiovisual files, and the working group therefore considers it necessary to "demonstrably inform" users that this shared content will be used to train the chatbot and its language models.
The working group also addresses issues such as fairness, transparency obligations and data accuracy, noting the measures that OpenAI has already put forward in this regard.
Starting with fairness, the working group notes that users cannot be held responsible for the information they share with the chatbot in their interactions, because "if ChatGPT is made available to the public, it should be assumed that individuals will sooner or later enter personal data".
Likewise, with regard to 'scraping', and given that this technique collects such large amounts of data that it is impossible to inform all affected individuals, the working group indicates that Article 14(5)(b) of the GDPR applies, which establishes that "the controller shall take appropriate measures to protect the data subject's rights and freedoms and legitimate interests, including making the information publicly available."
If the information is obtained from interaction with ChatGPT, however, Article 13 applies, and it becomes “particularly important” to inform users that their data may be used for training.
Finally, given that the answers offered by ChatGPT may be erroneous, biased or invented, the group also highlights the importance of clearly indicating this fact, since users tend to take the information provided by the chatbot as factually accurate.
The group also analyses users' rights and acknowledges that the company's terms explain how data is processed, how it can be deleted or rectified, and whether, in certain circumstances, it is transferred to third parties.
It also highlights that OpenAI allows users to contact it directly by email to resolve questions about their rights, and that the account settings themselves already allow users to exercise some of those rights.
These preliminary conclusions come a year after the European Data Protection Board created the working group focused on ChatGPT in April 2023, following the announcement that national data protection authorities were going to investigate whether OpenAI complied with European law.