
OpenAI’s latest announcement is its most dangerous AI innovation yet

Published Apr 1st, 2024 12:21PM EDT
Image: ChatGPT homepage. Stanislav Kogiku/SOPA Images/LightRocket via Getty Images


When Sam Altman was asked about the GPT-5 upgrade for ChatGPT, the CEO said OpenAI had other products to release first. The text-to-video Sora service might be one of them; OpenAI plans to release it publicly this year.

We already knew about Sora when Altman addressed those GPT-5 questions. What we didn't know about was Voice Engine, the AI tool OpenAI just announced. It's an AI program that lets you clone a voice from a sample that's just 15 seconds long.

That's an incredible achievement with seemingly limitless utility. For example, people who are losing their voice could use it to keep talking to others in their own voice. Movie studios could rely on Voice Engine to dub films and TV shows using an actor's actual voice.

While all that sounds great, Voice Engine might also do a lot of harm once it’s released publicly, and I see no way OpenAI can prevent abuse.

Let’s talk about abuse first

OpenAI detailed Voice Engine in a blog post, explaining that it was first developed in late 2022. The model already powers the company's text-to-speech API and two voice-based ChatGPT features: ChatGPT Voice and Read Aloud. The same tech now lets you clone a voice from very short audio samples.
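Voice Engine itself doesn't have a public API yet, but the text-to-speech endpoint it reportedly powers is already available to developers. Here's a minimal sketch of that existing API using OpenAI's Python SDK; the model name and preset voice below are the standard public options, and nothing here exposes the 15-second voice-cloning capability itself.

```python
# Minimal sketch of OpenAI's existing text-to-speech API, which the blog post
# says is built on the same model as Voice Engine. This uses a preset voice;
# voice cloning from your own 15-second sample is not publicly exposed.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.audio.speech.create(
    model="tts-1",   # public text-to-speech model
    voice="alloy",   # one of the preset voices: alloy, echo, fable, onyx, nova, shimmer
    input="Voice Engine can clone a voice from a sample just 15 seconds long.",
)

# Write the generated MP3 audio to disk
response.stream_to_file("sample.mp3")
```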

But, as you might have guessed, this opens the door to obvious abuse. Some people might clone the voices of politicians and celebrities to manipulate public opinion. Another scenario that comes to mind is someone using a voice recording to clone your voice and then running social engineering attacks on your coworkers or family members. Some banks even use your voice to authenticate you when you call.

I’m probably not the only one thinking about the dangers of Voice Engine. And indeed, OpenAI mentions user safety from the start:

We are taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse. We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities. Based on these conversations and the results of these small-scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.

Later in the blog post, OpenAI explains how it's building Voice Engine safely. The company says all testers have agreed to usage policies that prohibit impersonating others without consent or legal rights. Consent from the original speaker is also required to generate voices. Moreover, testers must disclose that the generated voices are made with AI, and the audio clips carry a watermark.

Good luck enforcing any of that.

OpenAI also notes that it’s engaging with US and international partners “from across government, media, entertainment, education, civil society and beyond to ensure we are incorporating their feedback as we build.”

Looking ahead, OpenAI also has a security suggestion that any company using voice-based authentication should consider. The company encourages “phasing out voice based authentication as a security measure for accessing bank accounts and other sensitive information.”

Image: In this photo illustration, the ChatGPT (OpenAI) logo is displayed on a smartphone screen. Rafael Henrique/SOPA Images/LightRocket via Getty Images

Who is testing Voice Engine?

All this might signal that Voice Engine will take a while to arrive, if it ever gets released publicly to begin with. But OpenAI is testing it with various partners.

For example, Age of Learning is an education technology company that uses Voice Engine to generate pre-scripted voice-over content for kids who can't yet read.

HeyGen is an AI visual storytelling platform that uses Voice Engine to translate a speaker's voice into other languages while preserving their native accent.

Dimagi is developing tools for community health workers who serve remote regions of the world. Combining Voice Engine with GPT-4 lets these workers deliver essential guidance, like breastfeeding information, in their own voice.

Livox is another example that OpenAI provides in the blog post. The company offers help to non-verbal people so they can communicate in a natural-sounding voice.

Similarly, Lifespan is a health non-profit that uses Voice Engine to help patients who suffer from degenerative speech conditions recover their voice.

These are remarkable ideas, and Voice Engine certainly has plenty of great above-board applications. I’m sure Google, Microsoft, Apple, and other companies working on AI models will come up with similar technology. The only problem is preventing abuse, which is probably impossible.

On that note, there's no release date for Voice Engine yet, but you can listen to samples from OpenAI's partners in the company's blog post.

Chris Smith Senior Writer

Chris Smith has been covering consumer electronics ever since the iPhone revolutionized the industry in 2008. When he’s not writing about the most recent tech news for BGR, he brings his entertainment expertise to Marvel’s Cinematic Universe and other blockbuster franchises.

Outside of work, you’ll catch him streaming almost every new movie and TV show release as soon as it's available.