Microsoft announced a mysterious AI event for March 16th, and it looks like we’re getting a big ChatGPT upgrade this week in the form of GPT-4, which comes with multimodal support. That might mean nothing to most people, given that ChatGPT stormed the tech landscape just three months ago, and we’re still learning what it can do and how it can disrupt tech as we know it.
A multimodal ChatGPT chatbot is a massive upgrade for an AI that already provides human-like responses to your queries. Currently, ChatGPT supports only one mode of interaction: text input. GPT-4 will support text, audio, video, and images as input. That’s what makes it multimodal, a feature that could significantly increase the AI’s capabilities.
Microsoft USA did not reveal any details about ChatGPT’s GPT-4 upgrade last week, only teasing the March 16th event. But Microsoft Germany went a step further, essentially soft-launching GPT-4. The company hosted an event last week in Germany, where it detailed the GPT-4 upgrade, per Heise.de.
It’s unclear whether GPT-4 will be a built-in upgrade of ChatGPT or whether it’ll be exclusive to Microsoft’s Bing search engine that already supports ChatGPT. However, Microsoft Germany confirmed that GPT-4 is coming this week and that it’ll be multimodal.
“We will introduce GPT-4 next week, there we will have multimodal models that will offer completely different possibilities – for example videos,” Microsoft Germany CTO Andreas Braun said.
Braun called the underlying technology, AI that understands natural language, a “game changer.” And he revealed that ChatGPT will work in all languages and across them. You might ask it something in German and then get an answer in Italian.
Furthermore, Holger Kenn, another Microsoft Germany exec, explained that a multimodal ChatGPT bot can translate text into images, music, and video if asked.
How will ChatGPT GPT-4’s multimodal tech help users?
While many details about GPT-4 are still unclear, users will presumably be able to use various input types to get the answers they need. Going beyond text means the AI can look at YouTube clips or listen to audio recordings and then provide answers to questions.
Microsoft offered an example of how the multimodality of ChatGPT could help businesses. The AI could automatically summarize support calls with text after listening to the recordings. This would save 500 work hours a day for a large Microsoft customer in the Netherlands, which receives 30,000 calls a day that need to be summarized. Setting up ChatGPT for such a task would only take a couple of hours.
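To put those figures in perspective, here is a quick back-of-the-envelope check of what they imply per call. This is only a sketch: the two numbers come from Microsoft’s example above, while the function name and the assumption that the savings are spread evenly across all calls are ours.

```python
# Figures quoted in Microsoft's example.
CALLS_PER_DAY = 30_000
HOURS_SAVED_PER_DAY = 500


def minutes_saved_per_call(hours_saved: float, calls: int) -> float:
    """Average minutes saved per call, assuming savings are spread evenly."""
    return hours_saved * 60 / calls


per_call = minutes_saved_per_call(HOURS_SAVED_PER_DAY, CALLS_PER_DAY)
print(f"{per_call:.1f} minutes saved per call")
```

In other words, the claim works out to roughly one minute of manual summarizing saved on each call.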
Still, Microsoft warns that ChatGPT will not always be reliable, even after the GPT-4 upgrade. Microsoft is working on confidence metrics to improve the chatbot’s reliability.
It’s unclear, however, how users will get to test GPT-4 and whether OpenAI will just make it available within ChatGPT later this week. Microsoft quietly unveiled Kosmos-1 in early March, a multimodal AI supporting image input. And it’s Microsoft that’s holding an AI-centric event on Thursday.
Then again, Microsoft might be one of the big investors in OpenAI tech, but OpenAI will continue to upgrade its chatbot. And that means making GPT-4 available to the masses.