We are witnessing firsthand how the race for AI development evolves. This is a competition in which there are as many proposals as there are lawsuits for copyright infringement. Adobe sought to avoid copyright trouble by ensuring that its family of AI models was trained on Adobe Stock images, as well as openly licensed and public domain content.
These assurances made Adobe Firefly, the company's image generation tool, a safe alternative for commercial use. After all, Adobe software is used by creatives around the world to produce professional graphic work. What was not known was that Adobe had used images from competitors like Midjourney to feed its supposedly more ethical model.
Adobe Firefly, trained with Midjourney
Adobe has dedicated pages on its official website where it compares Firefly against DALL·E, Stable Diffusion and Midjourney. Each of them emphasizes the purity of the data used for training. However, as reported by Bloomberg, the company has used images from rival tools to train its model. This is because Adobe Stock has been allowing users to license AI-generated images for some time.
The Adobe Stock terms and conditions require the platform's contributors to hold all the rights necessary to license their images. This includes AI-generated content. Yet if we carefully explore the Adobe Stock library, we find images generated with Midjourney and other generative tools, tools that, by the way, have been sued for copyright infringement.
So we have a dilemma. The tool that seeks to differentiate itself from its rivals has been fed generative content from those same rivals. The picture is complex and has several sides. On the one hand, Adobe acknowledges that “a small part” of the Firefly dataset includes generative material sourced from Adobe Stock, but it also states that the images go through a process to ensure they do not include intellectual property.
Bloomberg adds that Adobe's strategy has caused internal disagreement among its employees. Some have even suggested that Adobe pause its image generation platform for a time, although unofficial sources consulted by the outlet indicate that there are no plans to do so. The company has also changed its position on using generative content to train its AI models.
In June of last year, Adobe announced that the final version of Adobe Firefly would not include generative content from other platforms. Three months later, in September, the tool came out of beta, accompanied by a “Firefly bonus” for Adobe Stock contributors. Finally, according to Mat Hayward from the Adobe Stock community, the company decided to include generative content in the commercial version of Firefly because “it improves the training model.”
Data to train AI, a scarce commodity
One reality worth keeping in mind is that the companies competing to lead AI development are, quite literally, devouring the data available on the web to train the models that power their products. And although we tend to think of the web as immense and hard to measure, the rise of AI is showing that it is not as big as we thought, because much of the published content is not suitable for training quality AI models.
Technology giants are being forced to look for alternatives to train their models. According to The New York Times, OpenAI transcribed one million hours of YouTube videos to train GPT-4, the model that powers products like ChatGPT Plus and Microsoft Copilot. The same company reportedly also used Google's video platform to train, in part, the Sora model, something that, if true, YouTube would not be happy about.
Images | Adobe (1, 2)