Claude-3.7 outperforms other AI in Super Mario Bros, but it’s still no gamer

Published Mar 7th, 2025 12:29PM EST
Claude-3.7 plays Super Mario Bros
Image: Hao AI Lab

Last week, BGR reported on Claude’s journey playing Pokemon Red. Thousands of people playing the game at the same time proved more efficient, since the AI is still stuck on Mt. Moon, but researchers think the next AI breakthrough might be related to live games.

Led by Hao Zhang, an assistant professor at UC San Diego, the research team is developing custom frameworks to test the capabilities of the leading AI models at gaming.

While Claude has been kind of disastrous playing Pokemon Red (it seems it doesn’t have what it takes to become a Pokemon Master), in Super Mario Bros it sucks a little bit less than Gemini-1.5 Pro and GPT-4o. Comparing Claude-3.7 and Claude-3.5, the newer model is more responsive and seems to know a bit more about what needs to be done in the game. In addition to this classic Nintendo game, the researchers are also testing 2048 and Tetris, with more games coming soon.

Another test is with Roblox. A blog post explains: “We developed a live Roblox game, AI Space Escape, powered by state-of-the-art large language models (LLMs), offering a unique experience to reason with AI. Beyond entertainment, our game generates gaming data for evaluating AI reasoning abilities in real-world scenarios, extending beyond math and coding benchmarks. All gaming data, evaluation scripts, and code are publicly available for further research.”

We will have to wait for improvements to Claude and other AI models to see how they continue to evolve at playing games. For the Pokemon Red experiment, the developer explained that what sets Claude apart is that it can see what’s happening, understand the game state, and make decisions “similar to how a human player would.” I might disagree, though, as the AI is still struggling to get past one of the first “dungeons” of the game.

José Adorno Tech News Reporter

José is a Tech News Reporter at BGR. He has previously covered Apple and iPhone news for 9to5Mac, and was a producer and web editor for Latin America broadcaster TV Globo. He is based out of Brazil.