OpenAI is forming a new team led by Ilya Sutskever, its chief scientist and one of the company’s co-founders, to develop ways to steer and control “superintelligent” AI systems.
In a blog post published today, Sutskever and Jan Leike, a lead on the alignment team at OpenAI, predict that AI with intelligence exceeding that of humans could arrive within the decade. This AI, assuming it does indeed arrive, won’t necessarily be benevolent, so research into ways to control and restrict it is needed, Sutskever and Leike say.
“Currently, we don’t have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue,” they write. “Our current techniques for aligning AI, such as reinforcement learning from human feedback, rely on humans’ ability to supervise AI. But humans won’t be able to reliably supervise AI systems much smarter than us.”
To make progress on “superintelligence alignment,” OpenAI is creating a new Superalignment team, led by both Sutskever and Leike, which will have access to 20% of the compute the company has secured to date. Joined by scientists and engineers from OpenAI’s previous alignment division as well as researchers from other teams across the company, the team will aim to solve the core technical challenges of controlling superintelligent AI over the next four years.
How? By building what Sutskever and Leike describe as a “human-level automated alignment researcher.” The high-level goal is to train AI systems using human feedback, train AI to assist in evaluating other AI systems and, ultimately, build AI that can do alignment research. (Here, “alignment research” refers to ensuring AI systems achieve desired outcomes or don’t go off the rails.)
It’s OpenAI’s hypothesis that AI can make faster and better alignment research progress than humans can.
“As we make progress on this, our AI systems can take over more and more of our alignment work and ultimately conceive, implement, study and develop better alignment techniques than we have now,” Leike and colleagues John Schulman and Jeffrey Wu postulated in a previous blog post. “They will work together with humans to ensure that their own successors are more aligned with humans. . . . Human researchers will focus more and more of their effort on reviewing alignment research done by AI systems instead of generating this research by themselves.”
Of course, no method is foolproof, and Leike, Schulman and Wu acknowledge the many limitations of OpenAI’s approach in their post. Using AI for evaluation has the potential to scale up inconsistencies, biases or vulnerabilities in that AI, they say. And it might turn out that the hardest parts of the alignment problem are not related to engineering at all.
But Sutskever and Leike think it’s worth a go.
“Superintelligence alignment is fundamentally a machine learning problem, and we think great machine learning experts — even if they’re not already working on alignment — will be critical to solving it,” they write. “We plan to share the fruits of this effort broadly and view contributing to alignment and safety of non-OpenAI models as an important part of our work.”