
AI model collapse might make current hallucinations seem like a walk in the park

Published May 28th, 2025 11:03AM EDT
OpenAI debuts ChatGPT o3 and o4-mini models.
Image: OpenAI


We’ve been worried about ChatGPT and other AI models hallucinating information since the day ChatGPT went viral. Google’s infamous glue-on-pizza AI Overview is the best-known example, but it’s hardly the only one.

While all AI firms working on frontier models have tried to improve the accuracy of their chatbots, the bots still hallucinate information. A new study looking at ChatGPT o3 and o4-mini, OpenAI’s newest reasoning models, showed they tend to hallucinate even more than their predecessors.

That’s why I always advise people to ask for sources if the chatbot they use doesn’t provide them by default; that way, you can verify the information the AI gives you on the spot. It’s also why I find myself fighting with ChatGPT more often lately, as the AI sometimes fails to provide links or sources for its claims.

Now, if the sources the AI uses contain hallucinations themselves, that’s a problem.

It turns out hallucinations might get worse rather than disappear, thanks to a phenomenon called AI model collapse, and it’s a risk we need to be aware of. Some AI models may actually get worse rather than better in the near future, and the consequences could be disastrous.

An opinion piece from The Register draws attention to a phenomenon some people have started observing when using AI-powered tools.

The piece describes its author’s experience searching for very specific financial performance data. The results have gotten worse over time, with the AI pulling from poor sources instead of the 10-K filings you’d expect:

In particular, I’m finding that when I search for hard data such as market-share statistics or other business numbers, the results often come from bad sources. Instead of stats from 10-Ks, the US Securities and Exchange Commission’s (SEC) mandated annual business financial reports for public companies, I get numbers from sites purporting to be summaries of business reports. These bear some resemblance to reality, but they’re never quite right. If I specify I want only 10-K results, it works. If I just ask for financial results, the answers get… interesting.

The Register says it’s not just Perplexity offering bad answers. Other major AI search bots returned “questionable” results. That’s AI model collapse in action, though most people have no idea it’s happening:

Welcome to Garbage In/Garbage Out (GIGO). Formally, in AI circles, this is known as AI model collapse. In an AI model collapse, AI systems which are trained on their own outputs gradually lose accuracy, diversity, and reliability.

Companies training new AI models with AI-generated data rather than human content could end up with chatbots that make things up more often than not. The AI model collapse phenomenon could then affect everyday life if users aren’t aware that their chatbots produce unreliable data.
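To see the mechanism in miniature, here’s a toy simulation of my own (an illustration of the general idea, not the article’s example or any lab’s actual training setup): each “generation” of a stand-in model is fit only to data sampled from the previous generation, and the fitted distribution slowly loses the spread of the original human data.

```python
# Toy sketch of model collapse (my illustration, not a real training run):
# each generation fits a Gaussian to samples drawn from the previous
# generation's fit, so no fresh human data ever enters the loop.
import numpy as np

rng = np.random.default_rng(0)

# Generation 0 is fit to "human" data from a standard normal distribution.
human_data = rng.normal(loc=0.0, scale=1.0, size=100)
mean, std = human_data.mean(), human_data.std()

for generation in range(1, 201):
    # Train the next model only on the previous model's synthetic output.
    synthetic = rng.normal(loc=mean, scale=std, size=100)
    mean, std = synthetic.mean(), synthetic.std()
    if generation % 50 == 0:
        print(f"generation {generation:3d}: fitted std = {std:.3f}")

# The fitted spread tends to drift toward zero over many generations:
# later models "forget" the tails of the original data, which is the
# loss of diversity the model-collapse literature describes.
```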

It’s not just about misleading answers to everyday questions. The stakes could be higher, especially now that AI writes code and businesses rely on it to automate tasks, including customer support.

Some of the sloppy output AI produces can easily make it online, especially since text generation is a basic feature of tools like ChatGPT. Just a few days ago, the Chicago Sun-Times’ best books of summer list went viral for including novels that don’t actually exist. While it’s unclear whether that was AI model collapse, the list was clearly hallucinated by AI.

The Register asked ChatGPT when one of the listed titles would be released. The AI responded that the fictitious book had been announced but had no release date:

There is no publicly available information regarding the plot of Min Jin Lee’s forthcoming novel, Nightshade Market. While the novel has been announced, details about its storyline have not been disclosed.

The report also cited a Bloomberg Research study of Retrieval-Augmented Generation (RAG) that found 11 leading LLMs produce bad results when responding to over 5,000 harmful prompts. The list includes OpenAI’s GPT-4o, Google’s Gemma 7B, Claude 3.5 Sonnet, and Llama 3 8B.

RAG lets chatbots retrieve information from specific external knowledge sources and use it to generate responses. That should make them less prone to hallucination and more accurate, since they don’t rely solely on what they learned during pre-training. Still, RAG chatbots can also generate misleading reports and even leak private client data.
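To make the pattern concrete, here’s a minimal sketch of how a RAG pipeline is typically wired together. Everything in it, from the toy corpus to the keyword retriever, is my own stand-in; a real system would use a proper search or vector index and send the assembled prompt to whatever chat model powers the product.

```python
# Minimal RAG sketch (my illustration, not any vendor's pipeline):
# retrieve relevant text first, then ground the model's answer in it
# instead of relying only on what it memorized during training.

CORPUS = [
    "Example Corp 10-K (2024): total revenue was $4.2B, up 8% year over year.",
    "Example Corp 10-K (2024): net income was $610M.",
    "Blog summary: Example Corp had 'roughly $5B' in revenue last year.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Toy keyword-overlap retriever; real systems use a search or vector index."""
    query_words = set(query.lower().split())
    ranked = sorted(CORPUS, key=lambda doc: -len(query_words & set(doc.lower().split())))
    return ranked[:top_k]

def build_grounded_prompt(question: str) -> str:
    """Assemble the prompt a RAG chatbot would send to its language model."""
    sources = "\n".join(f"- {doc}" for doc in retrieve(question))
    return (
        "Answer using ONLY the sources below and cite the source for every figure. "
        "If the answer isn't in the sources, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )

print(build_grounded_prompt("What was Example Corp's revenue in 2024?"))
```

The grounding step is also why the 10-K complaint above matters: if the retriever surfaces sloppy summary sites instead of the filings themselves, the model faithfully grounds its answer in bad sources.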

That said, AI model collapse is hard to define or measure objectively, much like artificial general intelligence (AGI). Hopefully, we’ll reach the latter before the former becomes widespread.

I can’t say I’ve had any clear run-ins with AI model collapse, mainly because I never knew to look for them. I have seen hallucinations more than once, though, and I don’t expect that to change. That’s why I always insist on sources when ChatGPT gives me information in our daily chats.

The fix for AI model collapse is training frontier models with human-generated content instead of synthetic data. That’s easier said than done in a world where some user-generated content is already made by AI. Then again, if end users are wondering whether AI model collapse is real, chances are the companies building these models are already confronting it behind the scenes.

I’ll also mention something Anthropic CEO Dario Amodei said recently: that AI models hallucinate less often than people do. It’s an interesting take, but still not a good excuse for AI hallucinations.

Chris Smith, Senior Writer

Chris Smith has been covering consumer electronics ever since the iPhone revolutionized the industry in 2007. When he’s not writing about the most recent tech news for BGR, he closely follows the events in Marvel’s Cinematic Universe and other blockbuster franchises.

Outside of work, you’ll catch him streaming new movies and TV shows, or training to run his next marathon.