
MIT developed a way to help AI chatbots perform better over long conversations

Published Feb 13th, 2024 4:05PM EST
OpenAI's ChatGPT start page. Image: Jonathan S. Geller


A group of researchers working with MIT has come up with a solution to a baffling problem with ChatGPT and other large language models. As these models talk to users over long stretches, their performance gradually degrades until it eventually collapses. With this solution, though, that could be a thing of the past.

The issue, the researchers note, stems from the key-value cache, which essentially serves as the bot's conversation memory. When the cache fills up and needs to hold more data, it evicts the earliest entries to make room.

That eviction is what actually causes the performance of ChatGPT and other LLMs to drop. Keeping those first few pieces of data in memory, it turns out, is key to keeping an LLM running smoothly even when a conversation goes on for a long time, as sketched below.
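To make the idea concrete, here is a minimal toy sketch of the retention policy described above: pin the first few tokens in the cache and evict only from the rest when space runs out. This is an illustration of the concept, not the researchers' actual implementation; the class and names (SinkKVCache, num_sinks) are invented for the example.

from collections import deque

class SinkKVCache:
    """Toy key-value cache: always keep the first few tokens and
    evict the oldest of the remaining tokens when the cache is full.
    Illustrative only; not the StreamingLLM authors' code."""

    def __init__(self, capacity: int, num_sinks: int = 4):
        self.capacity = capacity    # total tokens the cache can hold
        self.num_sinks = num_sinks  # earliest tokens that are never evicted
        self.sinks = []             # pinned key/value pairs
        self.window = deque()       # rolling window for everything else

    def add(self, token_kv):
        if len(self.sinks) < self.num_sinks:
            self.sinks.append(token_kv)  # pin the earliest tokens
            return
        if len(self.sinks) + len(self.window) >= self.capacity:
            self.window.popleft()        # evict the oldest non-pinned token
        self.window.append(token_kv)

    def contents(self):
        return self.sinks + list(self.window)


cache = SinkKVCache(capacity=8, num_sinks=2)
for t in range(12):
    cache.add(f"kv_{t}")
print(cache.contents())
# ['kv_0', 'kv_1', 'kv_6', 'kv_7', 'kv_8', 'kv_9', 'kv_10', 'kv_11']
# The first two tokens survive no matter how long the stream runs.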

With ChatGPT usage still skyrocketing, adding a feature like StreamingLLM could make it perform even better. Image source: YouTube

The researchers call their new method StreamingLLM, and it lets the AI remain efficient even when a conversation stretches past four million words. They tested it against another approach, which avoids crashes and performance issues by constantly recomputing part of the past conversation.
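A back-of-the-envelope sketch of why recomputation is so much slower: the recompute approach rebuilds the cache for the whole window on every new token, while a StreamingLLM-style cache computes each token's entry once and reuses it. The numbers here are illustrative assumptions and only count key-value computations, not the actual 22x wall-clock speedup the researchers measured.

def kv_computations_recompute(window: int, steps: int) -> int:
    # Sliding window with recomputation: every newly generated token
    # rebuilds the key/value states for the entire window.
    return steps * window

def kv_computations_streaming(window: int, steps: int) -> int:
    # Cache reuse: each new token's key/value pair is computed once;
    # old entries (including the pinned first tokens) are never rebuilt.
    return steps

window_size, tokens_generated = 1024, 100_000  # illustrative values
ratio = (kv_computations_recompute(window_size, tokens_generated)
         / kv_computations_streaming(window_size, tokens_generated))
print(f"~{ratio:.0f}x fewer KV computations with cache reuse")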

StreamingLLM performed more than 22 times faster, which would let ChatGPT and other LLMs stay consistent during longer conversations and deliver better results. The study's authors say StreamingLLM would let a chatbot carry on continual conversations throughout an entire day without needing to be rebooted.

Understanding the role the cache plays in how the chatbot responds to human inputs was important, as it highlighted the issue the researchers needed to resolve. They've published their findings in a paper on the arXiv preprint server.

Currently, StreamingLLM has been incorporated into Nvidia's TensorRT-LLM, but it could also make its way into other chatbots, like ChatGPT, Claude, and more, if those companies see the same value Nvidia did.

Josh Hawkins has been writing for over a decade, covering science, gaming, and tech culture. He is also a top-rated product reviewer with experience in extensively researched product comparisons, headphones, and gaming devices.

Whenever he isn’t busy writing about tech or gadgets, he can usually be found enjoying a new world in a video game, or tinkering with something on his computer.