OpenAI stunned the world with a ChatGPT demo in mid-May, just a day before Google’s AI-centric I/O 2024 event. OpenAI unveiled GPT-4o, a multimodal model that can process audio, video, text, and image prompts. More interesting was the new Voice Mode that OpenAI developed for GPT-4o.
ChatGPT now supports more natural conversations. GPT-4o allows users to stop the chatbot at any time during a voice chat to tweak the conversation slightly, just as they would during a discussion with another person. ChatGPT will not lose its train of thought and will continue to process the updated voice prompts from the user.
At the same time, the new Voice Mode seemed capable of expressing emotion and detecting how the user feels. That’s the Her-like personality that later got OpenAI in trouble with Scarlett Johansson.
All of that turned OpenAI’s ChatGPT demo into a massive success.
However, the rollout of this ChatGPT voice upgrade was delayed as OpenAI continued to tweak it. Fast-forward to late July and OpenAI is ready to make the new Voice Mode available to users as long as they’re subscribed to the ChatGPT Plus tier.
ChatGPT Plus subscribers will routinely get access to new ChatGPT features before they’re available to the Free tier. The new Voice Mode won’t be available to all Plus users simultaneously. OpenAI explained on X that it’s rolling the Voice Mode feature out to a small group of Plus users.
The company is conducting an alpha test with this limited release. Plus users who are selected will receive an email with instructions and a message in the mobile app. It’s unclear how OpenAI will choose early users for Voice Mode.
I’m on the Plus plan but did not get an invite. That might have something to do with the fact that I’m in the EU, where new tech features launch later than elsewhere, thanks to the various laws policing tech in the region.
Still, OpenAI says that all Plus users will get access to Voice Mode this fall. Video and screen-sharing capabilities will be launched later.
As for what caused the launch delay, OpenAI says it’s been working to “reinforce the safety and quality of voice conversations” before it’s ready to bring “this frontier of technology to millions of people.” OpenAI also put several safeguards in place to protect user privacy and enforce safety:
“We tested GPT-4o’s voice capabilities with 100+ external red teamers across 45 languages. To protect people’s privacy, we’ve trained the model to only speak in the four preset voices, and we built systems to block outputs that differ from those voices. We’ve also implemented guardrails to block requests for violent or copyrighted content.”
The company also said it’ll share “a detailed report on GPT-4o’s capabilities, limitations, and safety evaluations in early August,” including ChatGPT’s big Voice Mode upgrade.