Claude-3.7 outperforms other AI in Super Mario Bros, but it’s still no gamer

Mar 7th, 2025
Last week, BGR reported on Claude’s journey playing Pokemon Red. Thousands of players controlling the game at the same time proved more efficient, since the AI is still stuck on Mt. Moon, but researchers think the next AI breakthrough might be related to live games.

Led by Hao Zhang, an assistant professor at UC San Diego, the research team is developing custom frameworks to test the capabilities of the leading AI models at gaming.

While Claude has been kind of disastrous playing Pokemon Red (it seems it doesn’t have what it takes to become a Pokemon Master), it sucks a little bit less than Gemini-1.5 Pro and GPT-4o at Super Mario Bros. Comparing Claude-3.7 and Claude-3.5, the newer AI is more responsive and seems to know a bit more about what needs to be done in the game. In addition to this classic Nintendo game, the researchers are also testing 2048 and Tetris, with more games coming soon.

Another test is with Roblox. A blog post explains: “We developed a live Roblox game, AI Space Escape, powered by state-of-the-art large language models (LLMs), offering a unique experience to reason with AI. Beyond entertainment, our game generates gaming data for evaluating AI reasoning abilities in real-world scenarios, extending beyond math and coding benchmarks. All gaming data, evaluation scripts, and code are publicly available for further research.”

We will have to wait for improvements to Claude and other AI models to see how they continue to evolve at playing games. For the Pokemon Red experiment, the developer explained that what sets Claude apart is that it can see what’s happening, understand the game state, and make decisions “similar to how a human player would.” I might disagree, though, as the AI is still struggling to get past one of the first “dungeons” of the game.

