Nvidia stunned the world with a ChatGPT rival that’s as good as GPT-4o

Published Oct 2nd, 2024 10:51AM EDT


You can’t talk about generative AI software like ChatGPT without thinking of Nvidia, one of the big winners of the early genAI boom. So far, though, Nvidia has been best known for supplying the chips that companies like OpenAI need to train and run their generative AI models.

Fast-forward to early October 2024, and Nvidia stunned the AI world by announcing NVLM 1.0, a family of multimodal large language models that the company says can rival ChatGPT’s GPT-4o.

Before you get too excited about Nvidia’s potential consumer-facing NVLM product, you should know the company is choosing a different avenue to show its genAI strength. Rather than releasing a direct rival to ChatGPT, Claude, and Gemini, it’s making the model weights publicly available so others can use NVLM to develop their own AI apps and systems.

Nvidia announced NVLM 1.0 in a research paper, which reveals that the company will open-source the model weights and training code:

We introduce NVLM 1.0, a family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., Llama 3-V 405B and InternVL 2). Remarkably, after multimodal training, NVLM 1.0 shows improved accuracy on text-only tasks over its LLM backbone. We are open-sourcing the model weights and training code in Megatron-Core for the community.

The 72-billion-parameter NVLM-D-72B is Nvidia’s flagship model in the family. The company says it “achieves performance on par with leading models across both vision-language and text-only tasks.”
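For developers, open weights mean the model can be pulled into existing tooling rather than accessed only through a paid API. Below is a minimal sketch of how that might look with the Hugging Face Transformers library; the repository id and loading options are assumptions for illustration, since Nvidia’s paper only says the weights and training code will be released.

# A minimal sketch of loading open NVLM-D-72B weights with Hugging Face
# Transformers. The repo id "nvidia/NVLM-D-72B" and the trust_remote_code
# loading path are assumptions based on how open-weight releases are
# typically distributed, not details confirmed in the article.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "nvidia/NVLM-D-72B"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,   # a 72B-parameter model needs multiple GPUs or offloading
    device_map="auto",
    trust_remote_code=True,
).eval()

# The exact inference entry point (e.g., a chat or generate helper for
# combined text + image prompts) depends on the model code Nvidia ships
# alongside the weights.

The point of a sketch like this is that researchers and smaller companies could run and fine-tune the model on their own hardware, which is exactly the audience Nvidia seems to be targeting with an open release.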

The paper shows several chat examples involving multimodal input, with prompts that combine text and images. In these examples, the model is very good at identifying the people, animals, and objects in the images and answering questions about them.

An example of NVLM answering a prompt that includes text and an image. Image source: Nvidia

In the example above, the user asks NVLM to explain a meme, and the AI does it exceptionally well. Here’s Nvidia’s explanation for the AI’s abilities:

Our NVLM-D-1.0-72B demonstrates versatile capabilities in various multimodal tasks by jointly utilizing OCR, reasoning, localization, common sense, world knowledge, and coding ability. For instance, our model can understand the humor behind the “abstract vs. paper” meme in example (a) by performing OCR to recognize the text labels for each image and using reasoning to grasp why juxtaposing “the abstract” — labeled with a fierce-looking lynx — and “the paper” — labeled with a domestic cat — is humorous.

NVLM can also solve complex math problems, something we’ve seen with other genAI products, including OpenAI’s ChatGPT.

Nvidia also says that multimodal training actually improves NVLM-D-72B’s performance on text-only tasks relative to its underlying LLM backbone.

The benchmarks Nvidia offered indicate that NVLM can more than hold its own against GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro. Nvidia’s now-open genAI model can actually outperform the proprietary AI products from OpenAI, Anthropic, and Google in certain tasks. The table below also shows that NVLM-D-72B is on par with Meta’s open-access Llama models.

NVLM 1.0 benchmarks compared with open and closed AI rivals. Image source: Nvidia

As VentureBeat points out, Nvidia’s surprise reveal has stunned some AI researchers.

It’s not just NVLM’s performance that stands out, but Nvidia’s decision to release it as an open-source project. The likes of OpenAI, Anthropic, and Google aren’t expected to do the same anytime soon. Nvidia’s approach could benefit AI researchers and smaller firms, giving them access to a seemingly powerful multimodal LLM without having to pay for it.

Regular ChatGPT users like you and me will have to wait and see what comes out of Nvidia’s announcement. That is, we’ll have to wait for commercial products that use NVLM. The sooner that happens, the better for the industry, as it might influence the business decisions of OpenAI, Anthropic, Google, and others.

Chris Smith, Senior Writer

Chris Smith has been covering consumer electronics ever since the iPhone revolutionized the industry in 2008. When he’s not writing about the most recent tech news for BGR, he brings his entertainment expertise to Marvel’s Cinematic Universe and other blockbuster franchises.

Outside of work, you’ll catch him streaming almost every new movie and TV show release as soon as it's available.