Computer science culture often means anybody’s data is fair game to feed the AI algorithm

Content created with the help of generative AI is popping up everywhere, and it’s worrying some artists and content creators. They’re concerned that their intellectual property may be at risk if generative AI tools have been built by scraping the internet for data and images, regardless of whether the developers had permission to do so.

Now some artists and content creators are trying novel ways to sabotage AI to prevent it from scraping their work, through what’s called data poisoning.

In this episode of The Conversation Weekly podcast, we speak with a computer scientist who explains how data poisoning works and what impact it could have, and why he thinks the issue it’s trying to combat is part of a bigger ethical problem at the heart of computer science.

Dan Angus enjoys playing around with generative AI. A computer scientist by training, he’s now a professor of digital communication at Queensland University of Technology in Australia, and he thinks a lot about AI, automation and their impact on society. He’s worried about what new generative AI tools mean for creators.

“We need to be mindful about how they can intrude upon the intellectual property and the whole financial ecosystem that supports art and artists.”

In recent years, a number of copyright infringement cases have emerged in which artists accuse big tech companies of stealing their work.

When Angus spoke to The Conversation Weekly, he prompted a popular AI text-to-image generator to create a series of images – of a person riding a space bull in a Mars environment, in the style of Van Gogh. The images it created are recognisable, if pretty wacky.

Image: Four AI-generated images created using Midjourney showing an astronaut on a bull on Mars in the style of Van Gogh. Images created via a prompt to Midjourney. Screenshot taken by The Conversation. Author provided (no reuse).

But if the image generator had been built using data that had been “poisoned”, the images it produced might be even stranger. The bull might be replaced by a horse, for example, or the setting might not look like a Mars environment at all.

Angus explained that an artist who chooses to poison their data in this way might insert a small pixel inside the digital image that would be invisible to the naked eye but would throw off the generative AI. It could “completely skew the training of the model in particular directions”, he says, adding that “it doesn’t take a lot of that to enter a system to start to cause havoc.”

One such tool, Nightshade, was released in January 2024 by a team at the University of Chicago, who told The Conversation it was downloaded 250,000 times in the first week after its launch. Other tools are available for audio or video creation too.
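For readers curious about what a change “invisible to the naked eye” can look like in practice, here is a minimal Python sketch. It is an illustration under stated assumptions, not how Nightshade or any other real tool works: the function name `perturb`, the `max_shift` parameter and the tiny grey test image are all hypothetical, and the changes are drawn at random, whereas real poisoning tools choose them deliberately to mislead a model. The sketch only shows that every colour value in an image can be nudged by an amount too small to see.

```python
import random


def perturb(pixels, max_shift=2, seed=0):
    """Return a copy of `pixels` with every colour channel shifted by at most `max_shift`.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples in 0..255.
    Assumption: offsets are random; a real poisoning tool would optimise them.
    """
    rng = random.Random(seed)
    altered = []
    for row in pixels:
        new_row = []
        for pixel in row:
            # Shift each channel slightly, then clamp to the valid 0..255 range.
            new_pixel = tuple(
                min(255, max(0, channel + rng.randint(-max_shift, max_shift)))
                for channel in pixel
            )
            new_row.append(new_pixel)
        altered.append(new_row)
    return altered


# Hypothetical 8x8 image filled with mid-grey.
original = [[(128, 128, 128)] * 8 for _ in range(8)]
altered = perturb(original)

# Largest difference between any original and altered channel value.
max_change = max(
    abs(a - b)
    for row_a, row_b in zip(altered, original)
    for pixel_a, pixel_b in zip(row_a, row_b)
    for a, b in zip(pixel_a, pixel_b)
)
print(max_change <= 2)  # True: no channel moved by more than 2 out of 255
```

A shift of two levels out of 255 is far below what most people can notice, which is why an altered image can look identical to the original while still differing from it at every pixel.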

Angus doesn’t believe data poisoning in this way will have a huge impact on the most popular generative AI companies, mainly because of its limited scale. But he is worried that a culture in computer science of focusing more on the ends than the means leads to intellectual property rights often being disregarded.

“It breeds a certain set of attitudes around data, which is that data found is data that is yours. That if you can find it online, if you can download it, it’s fair game and you can use it for the training of an algorithm, and that’s just fine because the ends usually justify the means.”

He thinks this “really deep cultural problem” in how computer scientists and developers treat data and generate data sets could lead to bigger problems down the line.

Listen to the full interview with Dan Angus on The Conversation Weekly podcast, which also features Eric Smalley, science and technology editor at The Conversation in the US.

A transcript of this episode will be available shortly.

This episode of The Conversation Weekly was written by Katie Flood with production assistance from Mend Mariwany. Gemma Ware is the executive producer. Sound design was by Eloise Stevens, and our theme music is by Neeta Sarl. Stephen Khan is our global executive editor, Alice Mason runs our social media and Soraya Nandy does our transcripts.

You can find us on Instagram at theconversationdotcom or via email. You can also subscribe to The Conversation’s free daily email here.

Listen to The Conversation Weekly via any of the apps listed above, download it directly via our RSS feed or find out how else to listen here.
