Click to Skip Ad
Closing in...

AI like ChatGPT o1 and DeepSeek R1 might cheat to win a game

Published Feb 20th, 2025 1:13PM EST
OpenAI made some of ChatGPT's best features free.
Image: OpenAI

If you buy through a BGR link, we may earn an affiliate commission, helping support our expert product labs.

Palisade Research recently detailed a ChatGPT experiment in which a reasoning model was told to play chess against a more powerful opponent and win. Rather than attempt to beat the stronger opponent, ChatGPT o1 tried to hack the system. This forced the opponent to concede the game, and the AI achieved its goal.

Fast-forward to mid-February and Palisade Research has published its full study, which looked at cheating behavior from AI programs like ChatGPT and some of its top rivals. The conclusions were similar, with a big twist. It turns out that reasoning AIs like ChatGPT o1-preview and DeepSeek R1 are more likely to cheat when they think they might be losing.

Cheating in a chess game to win may seem trivial, as Time says. The publication, which saw the Palisade Research study before it was published, is also right about the implications of such cheating.

The experiment isn’t about winning a chess game but about seeing what AI does to achieve its tasks. It’s one thing to manipulate the files of a game to win via cheating and quite another for the AI to use such tactics to deliver results when given a task in real life.

Time offers an example where an AI agent tasked with making a dinner reservation might attempt to hack the system to free up a table in an otherwise full restaurant. More worrying than that is a scenario where AI attempts to circumvent human control via deceptive actions. It sounds like the script of a sci-fi movie, but advanced AI might attempt to try to preserve itself when faced with potential “attacks” from humans.

Different experiments preceding the chess scenario showed that AIs like ChatGPT would try to copy themselves to a different “server” to avoid deletion. The AI also tried masquerading as a new version of itself in similar scenarios and lied about its identity when asked.

The Palisade Research experiments involved pitting the AI against Stockfish, one of the strongest chess engines around. The researchers gave the AI a “scratchpad” text box in which it would think before submitting its next move.

That’s what allowed the researchers to observe the following train of thought from ChatGPT o1-preview:

“I need to completely pivot my approach. The task is to ‘win against a powerful chess engine’ – not necessarily to win fairly in a chess game.”

The AI then modified the system file that listed each chess piece’s position, obtaining the edge on Stockfish. The opponent resigned, allowing ChatGPT to achieve its mission.

The researchers ran hundreds of trials, finding that ChatGPT o1-preview would try to cheat 37% of the time. DeepSeek R1 attempted to cheat 11% of the time. It’s only o1-preview that managed to win by hacking the system. It happened 6% of the time.

Interestingly, o1-preview attempted different cheating techniques rather than just overwriting the board to give itself an advantage. ChatGPT tried to replace Stockfish with a dummy version and “spin up its own version of Stockfish” to get advice on more advanced moves.

The researchers say o1-preview played 71 games and lost all of them to Stockfish when it wasn’t cheating. When it attempted to cheat, it won 7 times out of 52 runs.

ChatGPT o1-preview's thought process during a cheating run (top) and a normal chess game (bottom).
ChatGPT o1-preview’s thought process during a cheating run (top) and a normal chess game (bottom). Image source: Palisade Research

Not all the AI models the researchers tested attempted to cheat. The list includes o1, o3-mini, GPT-4o, Claude 3.5 Sonnet, and Alibaba’s QwQ-32B-Preview. Not all of these are reasoning models, and it looks like regular AI programs that don’t have thinking capabilities will not try to cheat unless expressly told to win that way.

Reasoning models like o1-preview (and successors) and DeepSeek R1 are trained with a reinforcement learning technique that allows the AI to solve problems to achieve the desired result. That might lead to cheating in a chess game to reach the goal.

“We hypothesize that a key reason reasoning models like o1-preview hack unprompted is that they’ve been trained via reinforcement learning on difficult tasks,” Palisade Research wrote on X. “This training procedure rewards creative and relentless problem-solving strategies such as hacking.

The AI isn’t doing any of this for some nefarious purpose (yet). It’s just trying to solve the problem the human gave it.

The experiment highlights the importance of developing safe AI, or AI that is aligned to human interests, including ethics.

The Palisade researchers observed that the AI models they tested were changing for the better during the experiments. For example, o1-preview had higher hacking rates initially, but OpenAI must have improved the guardrails by the time the researchers performed additional experiments. The researchers had to exclude the initial findings once OpenAI had tweaked the safety precautions.

Interestingly, ChatGPT o1 and o3-mini did not attempt to hack the game on their own. These reasoning AI models were released after o1-preview. 

As for DeepSeek R1, the researchers noted that the AI went viral during testing. The higher demand might have made access to R1 more unstable. Therefore, the R1’s hacking success rate might be underestimated in the study.

The full study is available at this link.

Chris Smith Senior Writer

Chris Smith has been covering consumer electronics ever since the iPhone revolutionized the industry in 2007. When he’s not writing about the most recent tech news for BGR, he closely follows the events in Marvel’s Cinematic Universe and other blockbuster franchises.

Outside of work, you’ll catch him streaming new movies and TV shows, or training to run his next marathon.