I thought Google dealt ChatGPT a huge blow last week when it built Bard into various apps. Gmail, YouTube, Google Maps, and Flights are just some of the apps getting Bard support. That’s a big advantage over ChatGPT, but OpenAI isn’t sitting idly by. A few days ago, OpenAI unveiled a new DALL-E version that lets you use ChatGPT to generate AI images. And now, OpenAI has announced that the free version of ChatGPT will work with voice and picture prompts in the very near future.
That is, you’ll be able to talk to ChatGPT on iPhone and Android rather than typing. You’ll also be able to use images to get better answers. The best part about these updates is that you won’t have to pay for ChatGPT Plus to get them. However, paying subscribers will be the first to try them.
OpenAI announced ChatGPT’s ability to hear and speak in a blog post on Monday. Plus and Enterprise users will get the features first, over the next two weeks. “Other groups of users, including developers,” will get them soon after that. That means the free ChatGPT experience will also support voice and picture commands.
Using images in ChatGPT prompts
If using images in ChatGPT prompts sounds familiar, that’s because we talked about it before. That’s how multimodal generative AI models work. It’s similar to how Google uses Google Lens with AI. As for ChatGPT image commands, OpenAI says image understanding is powered by multimodal GPT-3.5 and GPT-4.
OpenAI’s video example shows a bike owner uploading a photo of their bike and asking a question. ChatGPT answers, and the user then uploads additional images so the chatbot can better understand the problem.
The user even circles the bike component in question so ChatGPT can adjust its response. The user also uploads images of the tools at their disposal so the bot can tell them which one to use to lower the seat.
That’s great functionality to add to ChatGPT, and it should work especially well on smartphones. OpenAI’s demo, in fact, shows image input in the mobile version of ChatGPT. However, there will be limits on what ChatGPT can say about people who appear in images:
We’ve also taken technical measures to significantly limit ChatGPT’s ability to analyze and make direct statements about people since ChatGPT is not always accurate, and these systems should respect individuals’ privacy.
Image prompts will also be available on computers, but it’ll be a lot easier to use a smartphone to take additional photos relevant to a particular chat with ChatGPT.
Voice support for iPhone and Android
Voice support is coming only to the iPhone and Android ChatGPT apps, and it’s something this type of application definitely needs. Talking to an AI app is so much easier than typing everything. You’ll just have to enable the feature in the Settings section of the app once it’s available on iPhone and Android.
As an example of what voice can do in a generative AI program, OpenAI shared a ChatGPT chat in which the chatbot tells a bedtime story.
OpenAI says its new text-to-speech model needs just a few seconds of sample speech to create human-like audio from text. To prevent abuse, however, OpenAI relies on voice actors for ChatGPT’s voices:
The new voice technology — capable of crafting realistic synthetic voices from just a few seconds of real speech — opens doors to many creative and accessibility-focused applications. However, these capabilities also present new risks, such as the potential for malicious actors to impersonate public figures or commit fraud.
Interestingly, OpenAI also says it’s working with Spotify to test a new Voice Translation feature for podcasts that lets creators translate their content into other languages using their own voice.