Google on Wednesday unveiled Gemini 2.0, its next-gen genAI model and its biggest AI update to date. The announcement dropped right in the middle of OpenAI’s big ChatGPT Christmas-themed event, showing how incredibly competitive the AI race is.
Gemini 2.0 was released about a year after Google’s Gemini 1.0 announcement. Since then, Google has also unveiled Gemini 1.5, a big upgrade over the original release that brought multimodal support and a much larger context window to Google’s main genAI model. It was Google’s answer to OpenAI’s GPT-4o.
The new Gemini 2.0 features
In a blog post, Google calls Gemini 2.0 its “most capable yet,” a model that should bring the company closer to its “vision of a universal assistant.”
Does that make Gemini 2.0 the ChatGPT o1 alternative you might have expected? Well, Gemini 2.0 will include a Deep Research feature whose “advanced reasoning and long context capabilities” turn it into a “virtual research assistant.”
Advanced reasoning is also coming to AI Overviews, for better or worse. Google says the infamous AI-powered Google Search feature reaches over one billion people. Gemini 2.0 will let AI Overviews handle “more complex topics and multi-step questions, including advanced math equations, multimodal queries, and coding.”
That’s not exactly the Google Search AI upgrade I’d be looking for. Thankfully, I don’t use Google Search, so I won’t have to deal with smarter AI Overviews. If you love them, these reasoning abilities start rolling out as soon as this week, though Google will run a limited test initially.
Reasoning isn’t all Gemini 2.0 can do. Google mentions several important upgrades, including “advances in multimodality — like native image and audio output — and native tool use.” As for that universal assistant, Google says Gemini 2.0 will enable the development of AI agents, a stepping stone toward it.
There’s also Gemini 2.0 Flash, the evolution of Gemini 1.5 Flash. Gemini 2.0 Flash beats its predecessor in benchmarks while running at twice the speed.
As expected, the model supports multimodal inputs (images, video, and audio). The novelty is multimodal output: it can generate images “mixed with text and steerable text-to-speech (TTS) multilingual audio.” That’s not something ChatGPT can do, whether it’s GPT-4o or o1.
Gemini 2.0 Flash is available to developers right now as an experimental model. You’ll find it in Google AI Studio and Vertex AI, with general availability to follow in January.
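If you want to try it from Python, a minimal sketch might look like the one below. It assumes Google’s google-genai SDK, an API key from Google AI Studio, and the experimental model ID “gemini-2.0-flash-exp” reported at launch; none of these specifics come from the article, so check AI Studio for current names.

```python
# pip install google-genai
# Minimal sketch: calling the experimental Gemini 2.0 Flash model from Python.
# Assumptions (not from the article): the google-genai SDK, a Gemini API key
# from Google AI Studio, and the launch-era model ID "gemini-2.0-flash-exp".
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # key from Google AI Studio

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="In one paragraph, explain what multimodal output means.",
)
print(response.text)
```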
A Multimodal Live API tool will let developers use “real-time audio, video-streaming input and the ability to use multiple, combined tools.”
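For the Live API side, the same google-genai SDK exposes an async session interface. The sketch below is hedged: the connect call, send/receive methods, and response shape follow the SDK’s launch-era documentation for this explicitly experimental API, so treat the exact names as assumptions rather than a definitive implementation.

```python
# pip install google-genai
# Hedged sketch of a Multimodal Live API session using the async client in
# the google-genai SDK. Method names and the message shape follow the SDK's
# launch-era docs for this experimental API and may have changed since.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

async def main():
    # Ask for text responses; audio output would use ["AUDIO"] instead.
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        await session.send(input="Hello, what can you do?", end_of_turn=True)
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```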
Gemini 2.0 Flash is also coming to the desktop and mobile versions of Gemini. The model will be available in a new dropdown menu that lets you select your desired AI model for your Gemini assistant experience. Gemini 2.0 will expand to more products early next year.
Say hello to all the new agents
As for the agentic experiences Gemini 2.0 will make possible, Google mentions three projects: Astra, Mariner, and Jules (seen in the clips above and below).
Project Astra was first demoed at I/O 2024, and its first features are available in the Gemini app on mobile devices. It’s an AI assistant that supports multimodality and voice abilities similar to ChatGPT’s Advanced Voice Mode.
With Gemini 2.0, Project Astra supports Google Search, Lens, and Maps. It also remembers more information per session, with up to a 10-minute window of in-session memory, and it will recall more past conversations so it can offer personalized answers.
Project Astra’s Gemini 2.0 improvements will be available in the Gemini app and mysterious “other form factors like glasses.” Trusted testers will get access to Project Astra on prototype glasses.
Project Mariner explores the “future of human-agent interaction,” Google says. In practice, that means Project Mariner will let AI take actions on your behalf in browsers like Google Chrome via an extension.
Gemini 2.0 will be able to type, scroll, and click in your browser’s active tab. It’ll ask users for confirmation before performing certain sensitive actions, like buying something on your behalf.
Project Mariner is available to trusted testers right now. OpenAI has yet to release a ChatGPT tool like it. While you wait for access, this video will give you an idea of how useful Project Mariner might be for your productivity.
Jules is a coding agent that can help developers. It integrates directly into a GitHub workflow: “It can tackle an issue, develop a plan and execute it, all under a developer’s direction and supervision.” OpenAI can let ChatGPT grab code from certain Mac apps, but that feature is more limited.
Beyond Astra, Mariner, and Jules, Google is also working on Gemini 2.0 agents that can help gamers with gameplay based on what’s happening on the screen.