The world of AI is moving fast. We’ve seen the success of generative AI chatbots like ChatGPT, and plenty of companies are racing to build AI into their apps and services. But even as broader fears about AI loom large, researchers have raised some pointed concerns about how easily AI lies to us and what that could mean going forward.
One thing that makes ChatGPT and other AI systems tricky to use is their tendency to “hallucinate” information, inventing it on the spot. It’s a flaw in how these systems work, and one that researchers worry could grow into something worse: AI that actively deceives us.
But can AI actually lie to us? That’s an interesting question, and one that researchers writing on The Conversation believe they can answer. According to those researchers, Meta’s CICERO AI is one of the most disturbing examples of how deceptive AI can be. The model was designed to play the strategy game Diplomacy, and Meta says it was built to be “largely honest and helpful.”
However, when the researchers looked at data from the CICERO experiment, they found that CICERO turned out to be a master of deception. In fact, CICERO went so far as to premeditate its deception, working with one human player to trick another human player into leaving itself open to invasion.
It did this by conspiring with the player controlling Germany and then persuading the player controlling England to leave an opening in the North Sea. The game logs show how the AI lied to and worked against the players to get its way. It’s a striking piece of evidence, and just one of many examples the researchers drew from the CICERO games.
We’ve also seen large language models like ChatGPT put to deceptive uses. The risk is that these capabilities could be misused in any number of ways, a potential that is “only limited by the imagination and the technical know-how of malicious individuals,” the researchers note in their report.
It will be interesting to see where this goes from here, especially if learning to deceive doesn’t require any explicit intent to deceive. You can read the full findings from the researchers in their post on The Conversation.