Ilya Sutskever

Co-Founder and Chief Scientist · OpenAI · 2024

Co-led the Superalignment team and was involved in the attempted board ouster of Sam Altman in Nov 2023. After the board crisis resolved in Altman's favor, Sutskever departed and founded Safe Superintelligence Inc. (SSI).

As OpenAI's chief scientist and co-founder, Sutskever helped build the organization from a nonprofit research lab into one of the most prominent AI companies in the world. He co-led the Superalignment team, which was dedicated to ensuring future AI systems remain under human control. In November 2023, he joined the board's attempt to remove CEO Sam Altman, a move widely interpreted as driven by safety concerns. When the attempt failed and Altman was reinstated, Sutskever's position became untenable. He departed six months later to found Safe Superintelligence Inc., a company whose stated sole goal is building safe superintelligence, with no interim products and no near-term revenue pressure.

Alignment Research Gaps · AGI Risk Underestimation · Inadequate Oversight · Team Dissolution

Sources

Key Publications

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
arXiv (OpenAI) · Dec 2023 · preprint
This paper from OpenAI's Superalignment team addresses a fundamental challenge in AI safety: how can humans supervise and align future systems that are more capable than they are? The authors set up an empirical analogy in which weaker AI models supervise stronger ones, then measure how much of the stronger model's capability that weak supervision elicits. Their key finding is that strong models fine-tuned on weak labels consistently outperform their weak supervisors, a phenomenon they call weak-to-strong generalization. Naive fine-tuning recovers only part of the strong model's capability, however, and simple additions such as an auxiliary confidence loss close more of the gap. The authors read this as evidence that existing techniques like RLHF may scale poorly to superhuman models without further work, while the problem remains empirically tractable. They also identify a failure mode in which the strong model imitates its weak supervisor's errors rather than generalizing beyond them. The paper was one of the flagship outputs of OpenAI's Superalignment initiative, led by Ilya Sutskever and Jan Leike, and drew additional attention after both researchers left the company in May 2024, with Leike publicly citing disagreements over the priority given to safety work.

Predictions

0 of 1 confirmed

Open
Superhuman AI will require alignment techniques beyond RLHF
“The question of how humans can supervise AI systems smarter than them is one of the most important unsolved problems in AI safety.”
