These Artists Found Their Work Being Used To Train AI, And Now They’re Furious

Once the preserve of a select group of tech insiders, text-to-image AI systems are becoming increasingly popular and powerful. These tools, which typically offer a few free credits before charging for use, can create all kinds of images from just a few words, including images that clearly evoke the work of specific artists, even when those artists had no hand in them. Users can invoke an artist with phrases such as “in the style of” or “by”, followed by a name. And the uses of these tools range from personal amusement to commercial projects.
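To make the mechanics concrete, here is a minimal sketch of such a prompt using the open-source diffusers library; the checkpoint id and the prompt are illustrative assumptions, not details from the article:

```python
# Minimal text-to-image sketch with Hugging Face diffusers.
# The checkpoint id is an illustrative public Stable Diffusion release;
# swap in any compatible model. A GPU is assumed for reasonable speed.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative public checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Prompts often append an artist's name to steer the style, the very
# practice the article describes.
prompt = "a lighthouse at dusk, oil painting, in the style of <artist name>"
image = pipe(prompt).images[0]  # run the diffusion process
image.save("lighthouse.png")
```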

In just a few months, millions of people have flocked to AI text-to-image systems, and they are already being used to create experimental films, magazine covers, and images illustrating news articles. An image generated by an AI system called Midjourney recently won an art competition at the Colorado State Fair and caused an uproar among artists.

But the discovery by artists that their work is being used to train AI raises an even more fundamental concern: their own art is being used to train a computer program that could one day threaten their livelihood. Anyone who generates images with systems such as Stable Diffusion or DALL-E can then sell them, although the specific terms regarding copyright and ownership of these images vary. “I don’t want to participate at all in the machine that will devalue what I do,” said Daniel Danger, an illustrator and printmaker who learned that a number of his works had been used to train Stable Diffusion.

When fine art becomes data

Machines are far from magical. For one of these systems to ingest your words and produce an image, it must be trained on mountains of data, which can include billions of images pulled from the Internet, along with written descriptions.

Some services, including OpenAI’s DALL-E system, do not disclose the datasets their AI systems are built on. But with Stable Diffusion, Stability AI is transparent about its origins. Its base model was trained on pairs of images and text selected for aesthetic quality from an even more massive cache of images and text scraped from the Internet. The complete dataset, known as LAION-5B, was created by LAION, a German AI nonprofit whose name stands for “Large-scale Artificial Intelligence Open Network”.
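To give a sense of what that dataset actually contains, here is a hedged sketch of browsing LAION’s released metadata, which pairs scraped image URLs with captions; the Hugging Face repository id and the column names are assumptions based on LAION’s public releases:

```python
# Peek at LAION image-text metadata (URLs plus captions, not the images).
# Repository id and column names ("URL", "TEXT") are assumptions.
from datasets import load_dataset

# Stream the split so we never download billions of rows at once.
ds = load_dataset("laion/laion2B-en", split="train", streaming=True)

for i, row in enumerate(ds):
    # Each record pairs a scraped image URL with its alt-text-like caption.
    print(row["URL"], "->", row["TEXT"])
    if i >= 4:  # look at the first five pairs only
        break
```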

This practice of scraping images and other content from the Internet to build datasets is not new, and it has traditionally been covered by “fair use”, a doctrine of US copyright law that permits the use of copyrighted works in certain situations. That is because these images, many of which may be under copyright, are used in very different ways from their originals, for example to teach a computer to identify cats.

But the datasets keep growing, and they are training ever more powerful AI systems, including, most recently, generative systems that anyone can use to create remarkable images in an instant.

There are a few tools for searching the LAION-5B dataset, and a growing number of professional artists are discovering that their work is among the images it contains. One such search tool, built by writer and technologist Andy Baio and programmer Simon Willison, stands out. Although it covers only a small fraction of Stable Diffusion’s training data (more than 12 million images), its creators analyzed the art images it contains and determined that, of the top 25 artists whose work was represented, Erin Hanson is one of only three still living. They found 3,854 images of her art in this small sample.
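At heart, such tools do something like the following toy sketch: scan the caption metadata for an artist’s name. The records here are made-up stand-ins, not real LAION rows:

```python
# Toy version of an artist search over image-text metadata.
# The records are hypothetical stand-ins for LAION-style rows.
records = [
    {"url": "https://example.com/1.jpg", "caption": "Sunset oil painting by Erin Hanson"},
    {"url": "https://example.com/2.jpg", "caption": "a photo of a cat"},
]

def find_artist(rows, name):
    """Return rows whose caption mentions the artist's name (case-insensitive)."""
    needle = name.lower()
    return [r for r in rows if needle in r["caption"].lower()]

print(find_artist(records, "erin hanson"))  # -> only the first record
```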

Erin Hanson is one of several professional artists whose work was included in the dataset used to train Stable Diffusion, which was launched in August by London-based company Stability AI. She is among the many artists upset to learn that photos of their work were used without their knowledge, consent, or compensation.

Stability AI founder and CEO Emad Mostaque said that art is only a tiny part of the LAION training data Stable Diffusion is based on. Art makes up much less than 0.1% of the dataset, he said, and artist-styled imagery is only created when a user deliberately requests it. But that’s cold comfort for some artists.

Angry artists

Danger, whose artwork includes posters for bands like Phish and Primus, is one of many professional artists who say they fear AI image generators will threaten their livelihood. He worries that the images people produce with these tools will replace some of his more “utilitarian” work, such as book covers and illustrations for articles published online. “Why are we going to pay an artist $1,000 when we can have 1,000 images to choose from for free?” he asked.

Tara McPherson, a Pittsburgh-based artist whose work appears on toys, clothing, and in movies like the Oscar-winning film “Juno,” is also concerned about losing work to AI. She feels wronged and “exploited” that her work was included in the original Stable Diffusion dataset without her knowledge, she said. “How easy is it going to be? How elegant is this art going to become? Right now it’s a little wonky at times, but it’s only just begun,” she wondered.

While the concerns are real, the remedies are unclear. Even if AI-generated images have a widespread impact, such as changing business models, that doesn’t necessarily mean they violate artists’ copyrights, according to Zahr Said, a law professor at the University of Washington. And it would be prohibitively expensive to license every image in a dataset before using it, she added. “You can have sympathy for artistic communities and want to support them, but also recognize that there is no way to do it. If we did that, it would be like saying that machine learning is impossible,” she concluded.

McPherson and Danger have considered watermarking their works before uploading them, to protect the images (or at least make them less attractive as training data). But McPherson said that when she has seen artist friends put watermarks on their images online, “it ruins the art, and the joy of people who look at it and find inspiration in it.”
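For illustration, here is a minimal sketch of the kind of visible, tiled watermark the artists are weighing, using the Pillow library; the file names and label are placeholders:

```python
# Tile a translucent text watermark across an image with Pillow,
# so it cannot simply be cropped out. File names are placeholders.
from PIL import Image, ImageDraw, ImageFont

def watermark(path_in: str, path_out: str, text: str) -> None:
    img = Image.open(path_in).convert("RGBA")
    overlay = Image.new("RGBA", img.size, (0, 0, 0, 0))
    draw = ImageDraw.Draw(overlay)
    font = ImageFont.load_default()

    # Repeat the mark on a grid covering the whole image.
    step = 200
    for x in range(0, img.width, step):
        for y in range(0, img.height, step):
            draw.text((x, y), text, font=font, fill=(255, 255, 255, 96))

    Image.alpha_composite(img, overlay).convert("RGB").save(path_out)

watermark("artwork.jpg", "artwork_marked.jpg", "© Artist Name")
```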

If he could, Danger said, he would remove his images from the datasets used to train AI systems. But removing images of an artist’s work from a dataset would not prevent Stable Diffusion from generating images in that artist’s style.

For one thing, the AI model has already been trained. But also, as Mostaque pointed out, users could still summon specific artistic styles through OpenAI’s CLIP model, which was used to teach Stable Diffusion the connections between words and images.
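To show concretely what CLIP contributes, here is a short sketch that scores how well captions match an image, using the public OpenAI CLIP checkpoint via the transformers library; the image file name is a placeholder:

```python
# Score caption-image affinity with CLIP. Stable Diffusion reuses CLIP's
# text encoder, which is how prompts map words (including artist names)
# onto visual styles. "artwork.jpg" is a placeholder file.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("artwork.jpg")
captions = ["an oil painting of a forest", "a photo of a cat"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Higher probability = CLIP judges the caption and image to belong together.
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(captions, probs[0].tolist())))
```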

Christoph Schuhmann, one of the founders of LAION, said via email that his group believes opting in to or out of datasets will only work if all the parties building AI models (and there can be many) respect those choices. “A one-sided approach to consent processing will not suffice in the world of AI; we need a cross-industry system to manage this,” he said.

Giving artists more control

Partners Mathew Dryhurst and Holly Herndon, Berlin-based artists who experiment with AI in their collaborative work, are striving to address these challenges. Along with two other collaborators, they launched Spawning, a tool they hope will give artists a better understanding of, and control over, how their art online is used in datasets.

In September, Spawning launched haveibeentrained.com, a search engine capable of combing through the LAION-5B dataset, and in the coming weeks it intends to offer a way for people to opt out of, or into, the datasets used for training. Over the past month, Dryhurst has been meeting with organizations that train large AI models. He wants them to agree that if Spawning collects lists of works from artists who don’t want to be included, they will respect those requests.
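Mechanically, honoring such a list could be as simple as the following sketch, which drops any image-text pair whose URL appears on an opt-out list; the record layout is hypothetical, not Spawning’s actual format:

```python
# Drop image-text pairs whose image URL is on an opt-out list.
# The record fields ("url", "caption") are hypothetical.
def filter_optouts(records, optout_urls):
    """Keep only pairs whose URL is absent from the opt-out list."""
    blocked = set(optout_urls)  # set lookup keeps this O(1) per record
    return [r for r in records if r["url"] not in blocked]

records = [
    {"url": "https://example.com/painting.jpg", "caption": "oil painting"},
    {"url": "https://example.com/photo.jpg", "caption": "a photo"},
]
optouts = ["https://example.com/painting.jpg"]

print(filter_optouts(records, optouts))  # only the second pair survives
```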

According to Dryhurst, Spawning’s goal is to make people understand that consensual data collection benefits everyone. And Mostaque agrees that people should be able to opt out. He said Stability AI is working with several groups on ways to enable greater community control over database content in the future.

In a Twitter thread in September, he said Stability is open to helping build ways for people to opt out of the datasets, such as by supporting Herndon’s work on the issue, with many other projects to come. “I personally understand the emotions this brings, as systems become smart enough to understand styles,” he said.

Schuhmann said LAION is also working with “various groups” to figure out how to allow people to opt in or out of having their images included in training text-to-image AI models. “We take artists’ feelings and concerns very seriously,” Schuhmann said.

Hanson, for her part, doesn’t mind her works being used to train AI, but she wants to be paid. If images made with AI systems trained on artists’ work are sold, the artists must be compensated, she says, even if the payments are fractions of a cent.

It could be on the horizon. According to Mostaque, Stability AI is exploring ways to reward creative people for their work, particularly because Stability AI releases AI models itself rather than using those built by others. The company will soon announce a plan to gather community feedback on “practical ways” to achieve this, he added.

Sources: Spawning, Twitter

And you?

What is your opinion on the subject?

See also:

Artists are starting to sell AI-generated artwork on photo websites, using software that creates art on demand, some artists are trying to cash in on it

An AI-generated artwork has won first place in a fine art competition at a state fair and artists are furious



Dall-E 2 can generate images from a few words, but is the product yours? Your AI-generated digital artwork may not be copyrighted

An artificial intelligence has recomposed “realistic” photos of historical figures like Napoleon Bonaparte and Julius Caesar, using generative adversarial networks
