OpenAI has a crazy plan to prevent AI from going rogue

OpenAI DevDay keynote: ChatGPT usage in 2023.

Image: YouTube

If you followed the Sam Altman drama at OpenAI a few weeks ago, you might have noticed an intriguing development concerning ChatGPT and other AI products at the company. OpenAI co-founder and board member Ilya Sutskever was seen as the “bad guy” who masterminded the Altman firing, at least initially.

Sutskever then switched sides abruptly, joining the overwhelming majority of OpenAI employees who demanded that the board rehire Altman.

The board then rehired Altman as CEO and changed the roster of the board. Sutskever was gone from the board, and Altman’s comments about the OpenAI co-founder made it seem like Sutskever’s days were numbered. The brilliant AI scientist leaving OpenAI seemed like a real possibility, and a dangerous one for the development of safe AI.

It turns out that concerns about Sutskever leaving OpenAI might be unwarranted. Or they might be right on the money. Whatever the case, Sutskever has spent the past few months working on a big multi-year project at OpenAI to develop superalignment. That’s the technology meant to prevent the smarter-than-human, post-AGI, superintelligent AI of the future from going rogue.

Ilya Sutskever and Jan Leike announced in July they’re leading the superalignment efforts at OpenAI. They’ll use some 20% of OpenAI’s current compute capacity over four years to ensure that superalignment is successful. Now, the first results are here, and they’re promising.

What is superalignment?

Sutskever might be one of the most important minds in the world when it comes to the development of ChatGPT, but he’s also been vocal about the dangers of misaligned AI. That is, artificial intelligence that could go rogue and potentially cause catastrophic events for the human species.

ChatGPT isn’t that kind of AI. The next big milestone will be AGI, or artificial general intelligence. At that point, AI will be able to reason like humans. And it could rapidly improve itself, a runaway scenario sometimes called FOOM, and become superintelligent. That’s why the world needs superalignment for any kind of super AI it might develop.

The problem with aligning AI to serve our interests is that humans actually aren’t that smart. We can align the current AI models. AI developers do it via reinforcement learning that relies on giving the AI feedback on its responses. The AI then learns and tweaks its behavior to offer responses that humans like.
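To make that feedback loop a bit more concrete, here is a minimal toy sketch in Python. It is not how OpenAI trains ChatGPT. It assumes a tiny "model" that only chooses among three canned responses, and a hypothetical `human_rating` function that stands in for a person giving a thumbs up or down. All names here are made up for illustration.

```python
import math
import random


def softmax(scores):
    """Turn raw preference scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_with_feedback(responses, rate_response, steps=500, lr=0.1, seed=0):
    """Toy feedback loop: sample a response, get a rating, nudge the policy.

    rate_response is a stand-in for human feedback (an assumption of this
    sketch): it returns +1 for a response people like and -1 otherwise.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(responses)  # start with no preference at all
    for _ in range(steps):
        probs = softmax(logits)
        chosen = rng.choices(range(len(responses)), weights=probs)[0]
        reward = rate_response(responses[chosen])
        # Policy-gradient style update: make rewarded choices more likely
        # and penalized choices less likely.
        for j in range(len(logits)):
            grad = (1.0 if j == chosen else 0.0) - probs[j]
            logits[j] += lr * reward * grad
    return softmax(logits)


def human_rating(response):
    """Hypothetical rater: likes the helpful answer, dislikes the rest."""
    return 1 if response.startswith("helpful") else -1


if __name__ == "__main__":
    options = ["helpful answer", "rude answer", "made-up answer"]
    for option, prob in zip(options, train_with_feedback(options, human_rating)):
        print(f"{option}: {prob:.3f}")
```

After a few hundred rounds of ratings, the toy model ends up strongly preferring the response the rater likes. The catch the article describes is visible here too: the loop is only as good as the rater's ability to judge the responses.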

OpenAI chief scientist Ilya Sutskever speaks at Tel Aviv University in Tel Aviv, on June 5, 2023. Image source: Jack Guez/AFP via Getty Images

Once AGI and superintelligence are achieved, superalignment will be needed. And for that, we’ll need dumber AI to align the smarter one. At least, that’s what OpenAI has been working on under Sutskever and Leike. The two penned a post on OpenAI’s blog in July titled “Introducing Superalignment.”

A few months later, OpenAI announced the first promising results, and Sutskever was part of that work. Rather than leaving the company, he appears to be continuing a journey that will never be as exciting as the ChatGPT innovations that the more commercial side of OpenAI releases.

However, the work of the OpenAI superalignment team might prove to be critical to the safe evolution of ChatGPT. It’s important to reiterate that Sutskever started the superalignment team in July. That was before the Altman drama. That’s why it’s all the more important that Sutskever appears to be continuing that journey.

OpenAI has set aside vast resources to work on superalignment. That 20% of the current compute capacity is a great deal. So is the commitment to try to fix the problem in the coming four years. The key aspect here is trying to do it. There’s no guarantee of success.

Using dumber AI to train and contain the smarter AI

So, if success isn’t guaranteed, how do Sutskever, Leike, and their team go about it? Humans won’t be able to align superintelligence, so they’ll need dumber AI to do the job for them. That’s what OpenAI proposes. They’ll build “a roughly human-level automated alignment researcher,” which will then superalign superintelligence.

As MIT Technology Review reports, OpenAI has already run such an experiment and released a paper on the superalignment test. It used GPT-2 to train GPT-4 to perform similar tasks. “It’s as if a 12th grader were taught how to do a task by a third grader. The trick was to do it without GPT-4 taking too big a hit in performance,” the report explains. The results were mixed but show promise, per MIT Technology Review:

The results were mixed. The team measured the gap in performance between GPT-4 trained on GPT-2’s best guesses and GPT-4 trained on correct answers. They found that GPT-4 trained by GPT-2 performed 20% to 70% better than GPT-2 on the language tasks but did less well on the chess puzzles.
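One way to put a number on that kind of gap is to ask how much of the distance between the weak teacher and the fully trained strong model the student manages to close. The short Python sketch below does that arithmetic. The function name and the accuracy figures are assumptions for illustration only. They are not numbers from OpenAI's paper or from the MIT Technology Review report.

```python
def performance_gap_recovered(weak, weak_to_strong, strong_ceiling):
    """Fraction of the weak-to-ceiling gap closed by the weakly supervised model.

    weak            accuracy of the weak teacher (think GPT-2)
    weak_to_strong  accuracy of the strong model trained on the teacher's guesses
    strong_ceiling  accuracy of the strong model trained on correct answers

    Returns 0.0 if the student is no better than its teacher and 1.0 if it
    matches the model trained on correct answers.
    """
    gap = strong_ceiling - weak
    if gap <= 0:
        raise ValueError("strong_ceiling must be higher than weak")
    return (weak_to_strong - weak) / gap


if __name__ == "__main__":
    # Made-up accuracies, chosen only to show the calculation.
    recovered = performance_gap_recovered(
        weak=0.60, weak_to_strong=0.72, strong_ceiling=0.80
    )
    print(f"Share of the gap recovered: {recovered:.0%}")
```

A value near zero would mean the smarter model simply copied its teacher's mistakes. A value near one would mean weak supervision cost it almost nothing, which is the outcome the superalignment team is hoping for.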

Sutskever is one of the paper’s coauthors, though Wired notes that he wasn’t available for comment on these developments. Reports last week did say that Sutskever’s future at OpenAI is uncertain.

In this photo illustration, the welcome screen for the OpenAI “ChatGPT” app is displayed on a laptop screen. Image source: Leon Neal/Getty Images

Will it work?

The problem with superalignment work is that it’s all theoretical. Sutskever and Leike said in July they expect superintelligence to be here this decade. That’s why the four-year commitment is so important.

If OpenAI is successful, the superintelligent ChatGPT versions of coming years might help solve some of humanity’s biggest problems rather than eradicate our species. And other companies might utilize similar tech to superalign their own superintelligence down the road.

But superintelligence might always detect that someone or something is trying to align it. And protest. Or hide its true intentions. And if AI goes rogue, we might never know it happened.

Still, the work is going forward, and I, as a regular ChatGPT user, hope Ilya Sutskever is just as involved as before the Altman drama.

In a blog post on Thursday, OpenAI invited other AI researchers to contribute to its superalignment efforts, setting aside $10 million for funding and grants related to superalignment. Interestingly, OpenAI is partnering with Eric Schmidt, the former Google CEO, on the $10 million grants program.

Chris Smith Senior Writer

Chris Smith has been covering consumer electronics ever since the iPhone revolutionized the industry in 2007. When he’s not writing about the most recent tech news for BGR, he closely follows the events in Marvel’s Cinematic Universe and other blockbuster franchises.

Outside of work, you’ll catch him streaming new movies and TV shows, or training to run his next marathon.