ChatGPT Will Support Voice And Picture Prompts For Free
I thought Google delivered a huge blow to ChatGPT last week when it incorporated Bard into various apps. Gmail, YouTube, Google Maps, and Flights are just some of the apps that are getting Bard support. That's a big advantage over ChatGPT, but OpenAI isn't sitting idly by. A few days ago, OpenAI released a new Dall-E version that lets you use ChatGPT to generate AI images. And now, OpenAI has announced that the free version of ChatGPT will work with voice and picture prompts in the very near future.
That is, you can talk with ChatGPT on iPhone and Android rather than having to type. Also, you'll be able to use images to get better answers. The best part about these updates is that you won't have to pay for ChatGPT Plus to get them. However, paying subscribers will be the first to try them.
OpenAI announced ChatGPT's ability to hear and speak in a blog post on Monday. The features will be available initially to Plus and Enterprise users, who will get them over the next two weeks. Then, "other groups of users, including developers," will get them soon after that. That means the free ChatGPT experience will also support voice and picture commands.
Using images in ChatGPT prompts
If using images in ChatGPT prompts sounds familiar, that's because we talked about it before. That's how multimodal generative AI models work. It's similar to how Google uses Google Lens with AI. As for ChatGPT image commands, OpenAI says image understanding is powered by multimodal GPT-3.5 and GPT-4.
The video example below shows a bike owner uploading a photo of their bike and asking a question. ChatGPT provides an answer, with the user then uploading additional images so the chatbot can better understand the problem.
The user even draws a circle around the bike component that represents the main topic of discussion so ChatGPT can adjust the response. Furthermore, the user uploads images showing the tools at their disposal so the bot can tell them which one to use to lower the seat.
ChatGPT can now see, hear, and speak. Rolling out over next two weeks, Plus users will be able to have voice conversations with ChatGPT (iOS & Android) and to include images in conversations (all platforms). https://t.co/uNZjgbR5Bm pic.twitter.com/paG0hMshXb
— OpenAI (@OpenAI) September 25, 2023
That's great functionality to add to ChatGPT, and it should work especially well on smartphones. Indeed, OpenAI demos the image input example in the mobile version of ChatGPT. However, there will be some limits on what ChatGPT can say about people who appear in the images:
We've also taken technical measures to significantly limit ChatGPT's ability to analyze and make direct statements about people since ChatGPT is not always accurate, and these systems should respect individuals' privacy.
Image prompts will also be available on computers, but it'll be a lot easier to use a smartphone to take additional photos relevant to a particular chat with ChatGPT.
Voice support for iPhone and Android
The voice support feature is only coming to the iPhone and Android ChatGPT apps. And it's something that is definitely needed for this type of application. Talking to the phone's AI apps via voice rather than typing everything is so much easier. You'll just have to enable the feature in the Settings section of the app once it's available on iPhone and Android.
You might want to check how the voice data is handled from a privacy point of view. OpenAI doesn't address this aspect in the announcement. Therefore, I would assume the current privacy policy applies to all chats with ChatGPT, meaning your prompts will help train the voice assistant unless you opt out. However, the privacy policy might be updated once this feature begins to roll out.
OpenAI offered the clip below, in which ChatGPT tells a bedtime story, as an example of what voice can do in a generative AI program.
Use your voice to engage in a back-and-forth conversation with ChatGPT. Speak with it on the go, request a bedtime story, or settle a dinner table debate.
Sound on 🔊 pic.twitter.com/3tuWzX0wtS
— OpenAI (@OpenAI) September 25, 2023
OpenAI says its new text-to-speech model needs just a few seconds of sample speech to create human-like audio from text. To prevent abuse, however, OpenAI relies on voice actors for the voice of ChatGPT:
The new voice technology — capable of crafting realistic synthetic voices from just a few seconds of real speech — opens doors to many creative and accessibility-focused applications. However, these capabilities also present new risks, such as the potential for malicious actors to impersonate public figures or commit fraud.
Interestingly, OpenAI also says it's working with Spotify to test a new Voice Translation feature for podcasts that lets creators translate their content into other languages using their own voice.
