Scientists left and right are worried that the AI apocalypse is imminent, and that incredibly smart AI will seriously threaten humanity, potentially leading to our extinction. That’s the worst-case scenario. Along the way, we could experience plenty of other AI issues, like the inability to tell what’s real and what’s not.
But many of the same scientists who are raising awareness are also behind some of the breakthroughs that made generative AI products like ChatGPT possible. They started worrying after deploying their inventions.
While I think that we can’t put the genie back in the bottle and that generative AI represents the future of computing, I do see a way to temper the AI advancements. And comedian Sarah Silverman might be at the forefront of it.
No, this isn’t Mad Max-style fan fiction that would see the famous comedian star in her own Netflix show about saving the world. It’s what Silverman just did in real life that’s important. The comedian sued both OpenAI and Meta, whose generative AI products have allegedly ingested her copyrighted book while training.
Silverman isn’t the first to claim that ChatGPT has infringed on copyrights. And she’s not alone. At least two other authors have filed similar copyright infringement suits against OpenAI and Meta. Plenty of others will undoubtedly follow, too.
After all, we’ve long discussed the two main problems with ChatGPT, Google Bard, Bing Chat, and whatever generative AI products you might find in the wild. Many of them use large language model tech that needs to be trained on massive amounts of data. That means companies like OpenAI have ignored copyrights and user privacy to get their hands on as much data as possible.
This was the only way to train the chatbots and develop products like ChatGPT. OpenAI, Meta, Google, and others must have been fully aware of the copyright and privacy implications.
We’re in the wild west of AI
Take Google, for example. The company wants to scrape all available public content you may have authored to train Google Bard. And all the public content that’s otherwise available on the internet. But Google doesn’t want to launch Bard in the European Union, fearing the local laws governing tech and user privacy. Similarly, Google doesn’t want its own employees to use generative AI products at work.
This is the wild west of generative AI development. You probably can’t get to where ChatGPT (GPT-4) is right now without breaking the rules and worrying about the consequences later. Like allegedly reading Sarah Silverman’s The Bedwetter book to train AI without paying for it.
The comedian joined two lawsuits that authors Christopher Golden and Richard Kadrey started. The New York Times reports the suits were filed on Friday in the San Francisco Division of the U.S. District Court for the Northern District of California.
The authors allege that some of the text used to train OpenAI’s and Meta’s generative AI products might originate from so-called shadow libraries that contain copyrighted content.
Apparently, Meta admitted in a research paper to relying on such data sets for training large language models.
As for OpenAI’s ChatGPT, the plaintiffs argue that the chatbot’s ability to generate summaries of their books shows that the bot had access to them.
These copyright lawsuits are part of a growing number of similar actions against generative AI companies, The Times reports.
Will copyright and privacy slow generative AI down?
So why is Sarah Silverman’s case significant? It isn’t. But she is a well-known star with plenty of fans. Her involvement in a copyright case against OpenAI’s ChatGPT and Meta might further raise awareness about the copyright and privacy problems that come with such software solutions right now. More people might care about the way AI uses their data. And more authors might follow up with similar lawsuits.
Also, big wins for Silverman and others would set big precedents.
Remember that OpenAI only allowed users to delete their data (their chats with ChatGPT) when faced with privacy inquiries in various markets, starting with Europe. As for copyright, OpenAI recently had to turn off ChatGPT Plus’s access to the internet, as it was bypassing paywalls.
Moreover, I’ll point out that Reddit blamed its recent problems on generative AI companies accessing its data. Twitter also used that excuse recently.
Eventually, it might prove costly for tech giants to train ChatGPT-like programs with the help of copyrighted content. And you can’t train large language models without access to large amounts of diverse text.
On that note, AI privacy legislation should also arrive in due time. Combined with verdicts in these copyright cases, it might help temper the development of AI. Not that AI can be stopped for good now that we’ve gotten here.
That said, we’re still in the early days. We have no idea what sort of implications a win for Silverman and the other authors would have. The harm has been done. Financial compensation might alleviate the authors’ concerns.
But what about the copyrighted data that was used to train the bots? Can the companies remove it? Will this lead to setbacks for generative AI products? We’ll just have to wait and see what happens next. At the very least, Silverman should get some great material for her future shows out of it.
Meanwhile, OpenAI is investing in a massive AI program to train the AI that will hopefully help us save ourselves from the inevitable superintelligent AI threat. This isn’t a Silverman joke, either. It’s called Superalignment, and it’s very real.