We’ve been talking about AI agents for several months now, as they’re seen as the next big thing in genAI tech. Some companies have started showing off their agents. OpenAI made the first moves by giving ChatGPT the ability to interact with some Mac apps. The newly launched ChatGPT Tasks feature also falls within the same category of AI products: programs that can do things for you. The more sophisticated Operator AI agent OpenAI is reportedly working on is still not official.
Then there’s Google, which unveiled some of its agents last month, including Project Mariner. That’s an AI agent that can browse the web for you for specific actions. Anthropic has its own computer-controlling AI Agent.
Even Apple’s Siri should get agentic abilities in Apple Intelligence come iOS 18.4. Siri will be able to control some apps and access more user data to provide more helpful assistance. But Siri doesn’t have chatbot abilities, which would let the user control the AI via natural language.
What I’m getting at is that we’re still in the very early days of AI’s agentic abilities. All this software is still in testing ahead of commercial releases. Unsurprisingly, OpenAI might be among the first AI firms to release a true AI agent for ChatGPT. According to evidence found in the Mac app and online, it might happen imminently.
According to TechCrunch, Tibor Blaho is a software engineer with a reputation for leaking upcoming AI products.
Blaho took to X to post evidence that ChatGPT’s Operator agent is coming soon. The ChatGPT app for Mac contains new options that are hidden for now. The shortcuts read Toggle Operator and Force Quit Operator.
OpenAI’s website contains tables comparing Operator’s performance to that of other AI agents. However, these are not official, so the information in them might not be accurate.
If the figures are real, then the OpenAI Computer Use Agent (CUA), aka Operator, outscores the Anthropic AI agent in OSWorld, with 38.1%. That benchmark tries to mimic a real computer environment, where humans score 72.4%. Operator outperforms humans in WebVoyager but can’t match them in WebArena, two other benchmarks mentioned in the tables.
The leaks also say that Operator struggles with tasks that a human could do easily. For example, Operator was successful only 60% of the time in a test that asked the AI agent to sign up for a cloud provider and launch a virtual machine. The success rate dropped to 10% when the agent was tasked with creating a Bitcoin wallet.
The leaks do not include a list of Operator’s capabilities. I’m dreaming about being able to control my computer by voice. I want to tell the AI which apps to handle on my behalf, what sites to browse on its own for research, and what programs to run. But it’s too soon to expect all that from ChatGPT’s first Operator.
I definitely wouldn’t trust the AI to do either of the actions mentioned above. I’d want to sign up for services on my own and create Bitcoin wallets without genAI products helping out. I would not trust the AI with such information. It will eventually happen, but Operator has to gain that trust.
Operator has to start somewhere when it comes to controlling computer apps. Browsing the web seems like the simplest thing you’d want the AI to do for you. Google’s Project Mariner is proof of that. But we’ll have to wait for OpenAI to announce the Operator AI agent to see what it can do.
Given Blaho’s findings, it sure looks like OpenAI is preparing for an announcement. Previous reports said Operator might drop this month, and the leaks support those claims. Remember that OpenAI also said in December the next-gen ChatGPT reasoning model, o3, should be ready this month. Sam Altman tweeted on Friday that safety testing for o3-mini has been successful, suggesting a launch is imminent.
While there’s no necessary connection between o3 and Operator, it would make sense for OpenAI to make both products available to ChatGPT users around the same time. This is all speculation, but we might learn more details about OpenAI’s next ChatGPT upgrades by the end of January.