OpenAI stunned the world on Monday with its live demo of GPT-4o, its newest multimodal model for ChatGPT.
GPT-4o can see images and videos and produce lifelike audio. The previous voice features of ChatGPT also sounded almost human, but OpenAI has taken things to a new level. You can interrupt the chatbot just like you would interrupt someone during a conversation, and it’ll adapt to your updated prompts.
One of the novelties in ChatGPT is that GPT-4o can exude emotion. In the demos OpenAI showed, it felt like the presenters were talking to a human rather than an AI. It gave me flashbacks to Her, a movie that couldn't feel more timely. Seriously, you might want to watch Her, a film that hit theaters about a decade ago and tells the love story between a man and an AI operating system.
GPT-4o hasn’t reached those levels, as ChatGPT isn’t an operating system yet. But the new model’s voice abilities sound strikingly similar to Scarlett Johansson’s interpretation of the AI in the movie. That is, the voice is almost too human. Some people are already criticizing OpenAI’s approach, but I think that’s the wrong take.
During the demo, OpenAI showed how you can customize GPT-4o's voice via prompts alone, which indicates you'll be able to tweak the ChatGPT voice experience to suit your needs. Here's the ChatGPT Spring Update event if you missed it:
You don’t have to use the lifelike female voice that OpenAI used in the demo. Your ChatGPT doesn’t have to manifest strong emotion with everything it tells you. It doesn’t have to make you uncomfortable if that’s what an emotive AI makes you feel. And it doesn’t have to remind you of Her.
Some people have criticized this aspect of GPT-4o, the close replication of humanity. Here’s what a Redditor had to say about it:
Anyway, the part I felt awkward about was how the presenters tried to treat GPT as some real person with emotions and feelings. GPT saying things like “oh stop it don’t make me blush” is weird coz AI doesn’t blush, and it just comes across as incredibly fake and disingenuous. I’m not a big believer of human-AI social relationships and all these fakeness seems to be eventually leading there – the AI girlfriend era.
John Gruber had a similar take on the GPT-4o voice:
But my first impression is that it’s too emotive — too cloying, too saccharine. It comes across as condescending, like the voice of a kind kindergarten teacher addressing her students. I suspect, though, that they turned that dial up for the demo, and that it could easily be dialed back. And it really is impressive that I can complain that it might be too emotive. Also impressive: GPT-4o will be made available to all users, including those on the free tier.
I think the criticism is blown out of proportion here. As Gruber noted, OpenAI wanted to impress the audience with its voice demos. How better to prove that your AI voice technology has become this sophisticated than by offering a human-like experience in AI interactions?
I wouldn’t be surprised if Google demos similar AI voice capabilities during I/O 2024. Other tech giants working on ChatGPT rivals will also develop voice products featuring AI models that sound like humans. It’s the natural evolution. ChatGPT worked so well because its responses felt like they came from a human chatting with you. Voice interaction has to replicate that experience.
The alternative is a robotic-sounding AI voice. We'd all have criticized OpenAI had it demoed such an experience.
Again, most people will not need all that emotion, but it might prove useful in certain instances. Also, once we do get personal AI experiences, we’ll want unique, almost human voices for our AIs.
I'll never forget that ChatGPT is an AI without actual feelings just because it might sound like a person. I'll actually dial the voice down somewhat, as I don't need the emotion. But having some sort of personality certainly beats voice experiences like Siri.
Remember that some people want a more human-like approach when chatting with ChatGPT. I’ve already shown you tricks on how to do that. With GPT-4o voice, it’ll be even easier to achieve.
The fact that OpenAI is able to generate a voice of such quality is an amazing accomplishment. And yes, I did recently write about the company's voice cloning tool, which is something that might lead to abuse. I wouldn't be surprised if OpenAI uses the same tech to generate voices for its text-to-speech tool and GPT-4o. The difference is that you can't give ChatGPT the voice of someone famous and then have the chatbot spew nonsense.
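For developers curious about what's actually exposed today, OpenAI's existing text-to-speech API only lets you pick from a handful of preset voices rather than clone arbitrary people. Here's a minimal sketch, assuming the official openai Python SDK and an API key in your environment; the voice names are the presets OpenAI documents, not GPT-4o's new conversational voice:

```python
# Minimal sketch: OpenAI's text-to-speech endpoint accepts only preset voices
# (alloy, echo, fable, onyx, nova, shimmer), so you can't hand it a celebrity's voice.
# Assumes the official `openai` Python SDK and an OPENAI_API_KEY environment variable.
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.audio.speech.create(
    model="tts-1",   # OpenAI's standard text-to-speech model
    voice="nova",    # one of the documented preset voices
    input="Hey there, this is what a preset OpenAI voice sounds like.",
)

# Save the generated audio next to this script
response.stream_to_file(Path(__file__).parent / "preset_voice_demo.mp3")
```

Whether GPT-4o's conversational voice runs on that same pipeline is my speculation; the point is that the public-facing tools keep voice selection locked to OpenAI's own presets.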
Still, GPT-4o might leave room for some abuse, but hopefully, OpenAI will find ways to prevent that. Meanwhile, I don’t think we should worry about how vibrant an AI sounds for now, not until it’s actually capable of human emotion, if that’s ever going to happen.
As for Her, you should watch the film to get a sense of where we might be heading with AI tech. Because it sure looks like we’re on our way to that sort of computing experience.