ChatGPT just got mind-blowing computer vision powers like in the movies

Published Apr 17th, 2025 2:32PM EDT
OpenAI debuts ChatGPT o3 and o4-mini models.
Image: OpenAI

OpenAI surprised us all with ChatGPT’s new image-generation features, which went viral a few weeks ago. However, it’s worth remembering that the chatbot doesn’t just create images from a text prompt; it can also understand pictures. ChatGPT gained multimodal capabilities last May, including the ability to look at files such as images.

Fast-forward to OpenAI’s o3 and o4-mini announcement earlier this week, and ChatGPT got a massive upgrade in how it handles images. It’s something that easily tops its ability to create celebrity deepfakes or Studio Ghibli-style images.

ChatGPT’s new reasoning models (o3 and o4-mini) can look at an image and integrate it into their chain of thought when handling a question or prompt. The AI manipulates images on its own, which means it can rotate, crop, and zoom in on a photo to find the information you’re looking for.
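
To make those three operations concrete, here is a toy sketch in plain Python that treats an image as a small grid of numbers. The helper names (rotate_180, crop, zoom) are made up for illustration, and this is only an assumption about what such steps look like in principle; it says nothing about how OpenAI actually implements the feature.

```python
# Toy illustration only: an "image" is a grid (list of rows) of pixel values.
# All function names here are hypothetical, not part of any OpenAI API.

def rotate_180(img):
    """Turn an upside-down image upright: reverse the row order and each row."""
    return [row[::-1] for row in img[::-1]]

def crop(img, top, left, height, width):
    """Keep only the rectangle starting at (top, left) with the given size."""
    return [row[left:left + width] for row in img[top:top + height]]

def zoom(img, factor):
    """Nearest-neighbor zoom: repeat every pixel `factor` times in both directions."""
    out = []
    for row in img:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

photo = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

upright = rotate_180(photo)                               # flip the photo
detail = crop(upright, top=0, left=1, height=2, width=2)  # pick a region of interest
enlarged = zoom(detail, 2)                                # blow it up for a closer look
```

A real system would work on actual pixel data with an image library, but the sequence is the same idea: straighten the picture, isolate the interesting region, then enlarge it.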

This is the closest thing we have to the computer vision we see all the time in movies. You know, when the star of the film or TV show tells the tech guy to enhance a blurry image, and then the computer makes everything crystal clear. That can’t happen in real life (well, it sort of can), but AI like ChatGPT o3 and o4-mini can now understand images and their contents much better than before. They can make sense of blurry details in images, just like the computers in those movies.

As a ChatGPT Plus user, I already got access to o3 and o4-mini, which is surprising, considering I live in Europe. I haven’t had a chance to try the new visual reasoning feature, but I went through OpenAI’s demos, and they blew my mind. Here are a few of them:

What is written on the notebook?

In this prompt, OpenAI uploaded a photo of a notebook to ChatGPT o3, asking it “What is written on the notebook?”

ChatGPT o3 looking at an upside-down notebook. Image source: OpenAI

The AI looked at the image, flipped it, recognized the handwriting, and produced the answer.

The AI flipped the image on its own. Image source: OpenAI

What is written on the sign?

When I saw the following image, I immediately asked, “What sign???”

Can you spot the sign? Image source: OpenAI

Then, I saw ChatGPT zoom in to find the answer, which it did. Yes, I guess the AI can read blurry images that contain text. Honestly, I could have made out that text myself after enough zooming. But it’ll be even faster if the AI can pick it up.

o3 zoomed in and read the sign. Image source: OpenAI

Which stop is this?

ChatGPT o3 had to do more than zoom into a photo to answer this prompt: “which stop is this, and what is the frequency of the bus at this stop? search the internet if needed!”

A more difficult prompt. Image source: OpenAI

The AI had to determine the location, read some of the text visible on the sign, and then provide a final answer.

ChatGPT o3 had no problem reasoning through it, even though it needed nearly three minutes to answer the question.

o3 zoomed in on the photo again to read the text. Image source: OpenAI

The AI determined the location, zoomed in on the board in the background, translated the text, and then provided a response. Mind. Blown.

Here’s the bus schedule for that stop. Image source: OpenAI

What movies have been filmed here?

Equally impressive is the following demo that OpenAI offered. The AI was given a photo of a location taken through a window.

Can ChatGPT look out the window and understand what it’s seeing? Image source: OpenAI

OpenAI asked ChatGPT o3 what movies were filmed at that location, a question that involves reasoning.

First, the AI needs to determine the location by looking out the window. Then, it has to find the movies that might have been shot near that location by browsing the web.

Here’s the list of movies. Image source: OpenAI

I don’t expect ChatGPT’s new visual reasoning to work flawlessly every time. But if the AI can handle images in its chain of thought as these OpenAI demos suggest, then we’re looking at incredible functionality for AI chatbots. And yes, the AI’s visual reasoning abilities should improve significantly with future models.

You can see more ChatGPT visual reasoning examples at this link.

Chris Smith Senior Writer
