Gemini 2.5 Flash is Google’s cheapest thinking AI: What you need to know

Published Apr 18th, 2025 11:29AM EDT

After launching the Gemini 2.5 Pro model a few weeks ago, Google has a new AI product ready for testing. Gemini 2.5 Flash is supposed to bring more affordable AI reasoning to tasks that require more thinking.

Google lets users specify a budget and turn reasoning on and off depending on the task. Not everything you throw at the AI will require reasoning, so you don’t have to overspend by having the AI “think” when it doesn’t need to.

However, Gemini 2.5 Flash isn’t an AI product targeting regular users. Instead, it’s a tool that developers and enterprise customers can use for work. It’s available in preview via the Gemini API in Google AI Studio and Vertex AI.

Google says Gemini 2.5 Flash is quite formidable. The AI is Google’s lowest-latency and most cost-efficient thinking model. That means it’s faster and cheaper than Google’s other thinking models.

Gemini 2.5 Flash delivers a “major upgrade in reasoning abilities,” Google said in a blog post. The new AI is Google’s “first fully hybrid reasoning model,” which is how Google describes AI models where developers can turn reasoning on or off.

Interestingly, developers can set thinking budgets that cap how many tokens the AI spends on reasoning. However, the AI will not consume the entire budget during a single reasoning task if that task doesn’t need it. The model is trained to know how long to think for a given prompt, so it decides how much reasoning is required based on the perceived complexity.

Google offers a few prompt examples that explain how much reasoning Gemini 2.5 Flash will perform. For example, asking it to translate a word into a different language requires little reasoning. The same goes for answering questions like “How many provinces does Canada have?”

But more complex math and physics problems will require medium to high reasoning. The AI will spend more time on a prompt, and you’ll pay more money to get your answers.

Developers can set a thinking budget from 0 to 24576 tokens in the API or use a slider in Google AI Studio and Vertex AI.
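To make the budget setting concrete, here is a minimal sketch of how a request with a thinking budget might be assembled. The 0 to 24576 range comes from Google's announcement; the helper name `build_request` is hypothetical, and the payload field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) are an assumption about the Gemini REST request shape that should be checked against the current API reference. The sketch only builds the payload and does not call the API.

```python
import json

# Range stated in Google's announcement.
MIN_THINKING_BUDGET = 0
MAX_THINKING_BUDGET = 24576


def build_request(prompt: str, thinking_budget: int) -> dict:
    """Build a request payload with a capped thinking budget.

    Hypothetical helper: the field names below are an assumption about
    the Gemini REST request shape, not a confirmed schema.
    """
    if not MIN_THINKING_BUDGET <= thinking_budget <= MAX_THINKING_BUDGET:
        raise ValueError(
            f"thinking_budget must be between {MIN_THINKING_BUDGET} "
            f"and {MAX_THINKING_BUDGET}, got {thinking_budget}"
        )
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }


if __name__ == "__main__":
    # A budget of 0 turns reasoning off for a simple lookup-style prompt.
    simple = build_request("How many provinces does Canada have?", 0)
    print(json.dumps(simple, indent=2))
```

A budget of 0 corresponds to reasoning being switched off, while a larger value sets an upper limit rather than a target, since the model may use less than the full budget.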

As for the cost, Google says Gemini 2.5 Flash costs $0.15 per million input tokens and $0.60 per million output tokens. If reasoning is involved for the output, the price goes up nearly sixfold, to $3.50 per million tokens. These costs make Gemini 2.5 Flash incredibly competitive, as seen in the table at the end of this post.
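The pricing arithmetic can be sketched in a few lines. The three rates are the ones quoted above; the function name `estimate_cost` is hypothetical, and the sketch assumes that all output tokens are billed at the higher rate whenever reasoning is on, which is a simplification to verify against Google's pricing page.

```python
# Prices in US dollars per million tokens, as quoted in the article.
INPUT_PRICE = 0.15
OUTPUT_PRICE = 0.60
OUTPUT_PRICE_THINKING = 3.50


def estimate_cost(input_tokens: int, output_tokens: int, thinking: bool) -> float:
    """Estimate the cost in dollars of one request.

    Assumption: with thinking on, every output token is billed at the
    thinking rate. Actual billing rules may differ.
    """
    output_rate = OUTPUT_PRICE_THINKING if thinking else OUTPUT_PRICE
    total = input_tokens * INPUT_PRICE + output_tokens * output_rate
    return total / 1_000_000


if __name__ == "__main__":
    for thinking in (False, True):
        cost = estimate_cost(10_000, 2_000, thinking)
        print(f"thinking={thinking}: ${cost:.4f}")
    # Ratio between the two output rates.
    print(f"output price ratio: {OUTPUT_PRICE_THINKING / OUTPUT_PRICE:.2f}x")
```

The ratio between the two output rates works out to about 5.8, so a reasoning-heavy workload costs several times more per output token than the same workload with thinking turned off.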

With thinking turned off, Gemini 2.5 Flash will be at least as fast as the Gemini 2.0 Flash model.

The speed and competitive pricing for reasoning tasks aren’t Gemini 2.5 Flash’s only advantages. The new model also does very well in benchmarks. According to Google, Gemini 2.5 Flash is second only to Gemini 2.5 Pro in Hard Prompts in LMArena.

In Humanity’s Last Exam, Gemini 2.5 Flash outscored all recent models except OpenAI’s o4-mini, which was launched earlier this week. The image below shows more benchmark results.

Gemini 2.5 Flash price and benchmarks compared to other high-end AI models. Image source: Google
Chris Smith Senior Writer