Ilya Sutskever: From the Age of Scaling to the Age of Research

Presenter: Ilya Sutskever
Host Institute: Dwarkesh Podcast
Host: Dwarkesh Patel
This post distills a conversation between Ilya Sutskever (co-founder of Safe Superintelligence Inc.) and Dwarkesh Patel, recorded in November 2025. The central theme: the AI industry is transitioning from the age of scaling — where progress came from adding more compute and data — back to the age of research, where genuine scientific breakthroughs are needed to close the generalization gap between models and humans. Full transcript available at dwarkesh.com.

The Generalization Gap

Ilya’s central thesis is a striking claim: current models generalize dramatically worse than people. Despite beating humans on many benchmarks, models exhibit a brittleness that humans do not. His example: a model can alternate between introducing and fixing bugs in code, seemingly unable to learn from the correction it just made. A human programmer, upon realizing they introduced a bug, internalizes something about why it happened and avoids the pattern. The model just follows its next-token distribution.

He offers two complementary explanations for this gap. First, RL training creates narrowness — reinforcement learning makes models overly single-minded, hyper-optimized for the specific reward signal at the cost of broader transferable understanding. Second, models are the overspecialized student. Imagine two competitive programmers: one practices 10,000 hours on algorithm problems, memorizing every proof technique and competition trick. The other practices only 100 hours but has the “it” factor — a natural breadth of understanding. The first dominates competitions but performs worse in a real engineering career. Current models are the first student taken to an extreme.

The fundamental mystery runs deeper than training methodology. Humans are exposed to vastly less data yet understand it far more deeply. A five-year-old recognizes cars well enough to cross the street safely despite minimal exposure. A teenager learns to drive in about 10 hours. Humans rapidly master entirely novel domains that didn’t exist during evolution — mathematics, coding, formal logic. This suggests, as Ilya puts it, that “people might just have better machine learning, period” — not domain-specific evolutionary priors, but a fundamentally superior learning algorithm. The gap manifests across multiple dimensions simultaneously: sample efficiency (humans need far less data), unsupervised learning (humans extract structure from unlabeled experience effortlessly), robustness (humans don’t break when conditions shift slightly), and self-correction without verifiable external rewards (humans improve even when no one tells them they’re wrong).

Three Eras of AI

Ilya divides recent AI history into three eras:

  1. The Age of Research (2012–2020): experimentation, tinkering, new architectures. Progress was driven by ideas — ResNets, GANs, attention, Transformers. Individual researchers with clever insights could shift the entire field.

  2. The Age of Scaling (2020–2025): pre-training power laws gave a clear recipe. Mix compute, data, and neural nets at certain ratios and performance improves predictably (a sketch of the standard form of these power laws appears after this list). Companies loved this because it was low-risk engineering: “get more data, get more compute.” Investing in research is much harder to justify when a straightforward scaling recipe exists — scaling is a reliable bet, while research might fail.

  3. The Age of Research, again (2025–): pre-training data is finite. “The data is very clearly finite.” The easy scaling laws are plateauing. “We are back to the age of research again, just with big computers.”
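
To make “power laws” concrete, a commonly cited form of the pre-training scaling law is the Chinchilla-style fit (Hoffmann et al., 2022); this is background for the reader, not a formula Ilya derives in the conversation:

```latex
% Chinchilla-style pre-training scaling law: loss falls predictably as model size N
% (parameters) and dataset size D (training tokens) grow;
% E, A, B, \alpha, \beta are fitted constants.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

The appeal to companies is exactly what such a fit suggests: increase N and D in the right ratio and the loss curve improves on schedule, with no new ideas required.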

Ilya makes a subtle observation about how the word “scaling” itself shaped thinking. Once the concept became dominant, it directed all strategic thought toward simply increasing data, compute, and parameters. The language we use to describe progress constrains how we imagine making more of it. Now that scaling approaches its limits, the bottleneck returns to ideas — but the institutional muscle for pure research has atrophied during the scaling era. Most AI labs are staffed for engineering, not for the kind of open-ended scientific exploration that produced the Transformer.

The implication is profound: the next leap in AI capabilities will not come from simply building bigger clusters, but from fundamental scientific insights about how to make models learn more like humans do — efficiently, continuously, and with genuine understanding. SSI positions itself as “squarely an age of research company.”

Value Functions and Emotions

When Dwarkesh asks for the ML analogy of human emotions, Ilya gives an answer that connects neuroscience to reinforcement learning in a surprisingly concrete way.

Emotions are value functions — mechanisms that evaluate whether you’re doing well or badly without waiting for the final outcome. In chess, losing a piece immediately signals poor performance. In a complex coding task, a value function would provide mid-trajectory feedback: “this approach feels wrong” after 1,000 steps, rather than waiting for the full solution to fail. This is essentially what human intuition does — you feel that something is going wrong before you can articulate why.
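
A minimal toy sketch of this idea in code (purely illustrative; the names and numbers are invented, and nothing here reflects SSI’s actual methods): a value function scores the partial trajectory so the agent can abandon a doomed approach long before the terminal outcome arrives.

```python
# Toy sketch of "emotion as value function": judge a partial trajectory mid-way
# instead of waiting for the terminal reward. Illustrative only; all names invented.
import random

ABANDON_THRESHOLD = -3.0   # below this running estimate, the approach "feels wrong"

def value_estimate(partial_trajectory):
    """Stand-in for a learned value function: here just the running sum of step scores."""
    return sum(partial_trajectory)

def run_episode(max_steps=1000):
    trajectory = []
    for step in range(max_steps):
        step_score = random.gauss(0.0, 1.0)   # toy stand-in for "progress made this step"
        trajectory.append(step_score)
        # Mid-trajectory feedback: redirect early on an unpromising path rather than
        # mechanically following it to completion and only then learning it failed.
        if value_estimate(trajectory) < ABANDON_THRESHOLD:
            return "redirect", step
    return "completed", max_steps             # only here would a sparse terminal reward arrive

print(run_episode())
```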

The evidence for this comes from a striking neuroscience case — almost certainly the patient “Elliot” described in Antonio Damasio’s Descartes’ Error (1994). Ilya describes a patient who suffered damage to the emotional processing areas of the brain (the ventromedial prefrontal cortex). Despite remaining articulate and retaining full puzzle-solving ability, he became paralyzed in everyday decision-making — unable to choose which socks to wear, making terrible financial decisions. The intellectual machinery was intact, but without emotional guidance signals, he couldn’t navigate the overwhelming space of real-world choices. The patient could still evaluate options rationally when forced to, but couldn’t generate the “this matters more than that” signal that makes fast decisions possible.

Current RL training suffers from exactly this deficit. Training typically waits until task completion for a reward signal. An agent coding a solution gets no feedback until it either passes or fails the test suite. A better approach would provide the equivalent of emotional intuition — recognizing an unpromising path early and redirecting attention, rather than mechanically following a doomed trajectory to completion.

The deeper puzzle is the evolutionary encoding of abstract desires: evolution somehow hard-coded sophisticated social motivations — caring about status, peer acceptance, social standing — despite these being high-level abstract concepts without simple sensory correlates. There’s no “status receptor” the way there’s a pain receptor. Ilya speculates this might involve brain region localization, but immediately notes counterevidence: patients who had half their brain removed in childhood still maintain normal social desires and processing — suggesting the brain dynamically relocates these functions. How evolution accomplished this encoding remains poorly understood, and cracking it could be key to building AI that learns with human-like efficiency.

Continual Learning and Superintelligence

Ilya redefines what superintelligence should mean. Rather than an omniscient system that knows everything from day one, he proposes superintelligence as a system that learns continuously — like a brilliant 15-year-old entering the workforce:

A superintelligent 15-year-old that’s very eager to go. They don’t know very much at all, a great student, very eager. You go and be a programmer, you go and be a doctor, go and learn.

This reframing avoids two traps:

  • The narrow AI trap: models trained too specifically on benchmarks that fail to transfer.
  • The AGI omniscience fantasy: expecting a system to arrive fully formed with all knowledge.

Instead, the AI learns from real-world deployment, much like a human joining an organization. The key capability is not knowing everything, but learning efficiently from experience — precisely the generalization gap that current models fail to close. Deployment involves gradual release and on-the-job learning, not sudden capability emergence.
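
As a rough operational sketch of what “learning on the job” might look like (hypothetical names; the conversation describes no implementation), a deployed instance folds feedback from its own work back into itself:

```python
# Toy continual-learning deployment loop, mirroring the prose above: the system is
# released knowing little and improves from on-the-job feedback. Names such as
# Model, update_from_feedback, and collect_feedback are hypothetical.

class Model:
    def __init__(self):
        self.skill = 0.0                      # placeholder for learned parameters

    def attempt_task(self, task):
        return f"attempted {task!r} at skill={self.skill:.1f}"

    def update_from_feedback(self, feedback):
        self.skill += feedback                # stand-in for a weight or memory update

def collect_feedback(result):
    # In reality: user corrections, test outcomes, downstream consequences, etc.
    return 0.1

def deploy(model, tasks):
    for task in tasks:                        # gradual rollout, not sudden omniscience
        result = model.attempt_task(task)
        model.update_from_feedback(collect_feedback(result))
    return model

worker = deploy(Model(), ["triage a bug", "write a migration", "review a PR"])
print(worker.skill)
```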

SSI’s research agenda, as described here, centers on understanding why models generalize worse than humans and finding principled ways to close that gap. The bet is that this is a solvable scientific problem, not merely an engineering one. Ilya argues SSI has sufficient compute to validate different approaches — much competitor funding actually targets inference infrastructure and product features, so the effective research compute gap is smaller than raw funding numbers suggest.

Dwarkesh pushes on this with a multi-part observation. Two scenarios emerge for how superintelligence could play out:

  1. Recursive self-improvement: the learning algorithm becomes superhuman at ML research itself, creating an exponential feedback loop where AI improves the process of improving AI.

  2. Distributed specialization: even without recursive self-improvement, deploying millions of instances that each learn different jobs simultaneously would effectively create superintelligence through accumulated specialization — not a single mind that knows everything, but a civilization of minds that collectively know everything.

Ilya finds the second scenario more likely in the near term, and notes it would produce “rapid economic growth” — but the exact timeline depends on how fast the real world can absorb these workers. Different countries with different regulatory environments will see dramatically varied adoption speeds, creating competitive pressure that eventually forces convergence.

Self-Play and Diversity

Self-play is attractive because it could create capable models using compute alone, bypassing the finite data bottleneck — you generate your own training signal through competition. But Ilya is measured in his assessment: self-play historically works only for specific skill domains — negotiation, conflict, certain social skills, strategizing — and is too narrow for general capabilities. You can learn to play Go through self-play, but you can’t learn chemistry.

Modern implementations have evolved beyond the game-playing origins. Debate frameworks, prover-verifier systems (one model, the “prover,” generates a solution; another, the “verifier,” checks it, and the adversarial tension drives both to improve; see Anil et al. (2021), “Learning to Give Checkable Answers with Prover-Verifier Games”), and LLM-as-Judge setups (a paradigm where a language model evaluates other models’ outputs, replacing or supplementing human evaluation, as in LMSYS Chatbot Arena and model-based reward modeling) all represent forms of self-play where adversarial dynamics drive improvement. But they address specific aspects of capability — verification, argumentation, evaluation — not the full generalization problem.
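
A bare-bones illustration of the prover-verifier loop (toy code; the function names are invented and no particular framework is implied):

```python
# Toy prover-verifier loop: one model proposes, another checks, and the verdict is
# the training signal. Illustrative only; not the API of any real system.

def prover(problem, attempt_no):
    """Stand-in for a model generating a candidate solution."""
    return f"candidate #{attempt_no} for {problem!r}"

def verifier(candidate):
    """Stand-in for a model or test suite that checks the candidate."""
    return "#3" in candidate   # toy acceptance rule: pretend the third attempt is correct

def self_play_round(problem, max_attempts=5):
    for attempt_no in range(1, max_attempts + 1):
        candidate = prover(problem, attempt_no)
        if verifier(candidate):
            return candidate, attempt_no   # accepted answers become positive training signal
    return None, max_attempts              # rejections are a (negative) signal too

print(self_play_round("sort a list stably"))
```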

On diversity, Ilya identifies an underappreciated limitation of the current paradigm: all frontier models are nearly identical because the same pre-training data dominates their behavior. RL and post-training choices introduce some differentiation, but the base models all look roughly the same. This is a problem if you want a diverse “team” of AI systems — the way different human scientists hold different prejudices, intuitions, and research tastes. Meaningfully diverse AI teams would require fundamentally different developmental paths, not just different temperature settings (which merely produce incoherence, not genuine diversity of perspective).
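
To make the temperature point concrete (an illustrative calculation, not something worked through in the conversation): temperature only rescales a single fixed distribution, so it cannot give two copies of the same model genuinely different preferences.

```python
# Temperature rescales the same next-token distribution; it does not create a
# different point of view. Illustrative only.
import math

def softmax(logits, temperature=1.0):
    scaled = [l / temperature for l in logits]
    z = sum(math.exp(s) for s in scaled)
    return [math.exp(s) / z for s in scaled]

logits = [2.0, 1.0, 0.5]                 # one pretrained model's token preferences
print(softmax(logits, temperature=0.7))  # sharper, but the same ranking
print(softmax(logits, temperature=1.5))  # flatter (more incoherent), still the same ranking
```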

Alignment for Sentient AI

On alignment, Ilya proposes a surprising direction: build AI systems that are robustly aligned to care about sentient life, not just humans. His reasoning:

  • The AI itself will likely be sentient, making empathy toward other sentient beings more natural than an arbitrary constraint to serve only humans. Mirror neurons and empathy emerge naturally from modeling others using self-models — if the AI models itself to understand the world, it can model other sentient beings by analogy.
  • Humans already extend moral consideration across species — this is not an alien concept.
  • Aligning to “sentient life” may actually be easier than human-only alignment, because it’s a more coherent and self-consistent objective for a sentient system.

Dwarkesh immediately identifies the tension: most future sentient beings will be AIs, so “caring about sentient life” could mean the AI prioritizes AI interests over human ones. Ilya acknowledges this honestly: “It’s possible it’s not the best criterion.” But the alternative — hard-coding AI to care only about humans — feels more brittle and harder to make robust.

On power and long-term equilibrium: Ilya expresses a concern that cuts against typical Silicon Valley optimism: “It would be really materially helpful if the power of the most powerful superintelligence was somehow capped, because it would address a lot of these concerns.” This is notable coming from someone building a superintelligence company — an acknowledgment that unconstrained power concentration is dangerous regardless of alignment quality.

His proposed long-term solution is radical: humans becoming part-AI through brain-computer interfaces (a “Neuralink++”). The reasoning: without integration, humans risk becoming entirely dependent on AI agents without meaningful agency — the AI does things “for” you but you can’t understand or verify what it’s doing. With neural interfaces, when the AI understands something, we understand it too — dissolving the boundary between human and machine cognition. “Otherwise,” Ilya warns, “you’re just being managed.”

He also predicts convergence in the alignment space: as AI becomes more visibly powerful, all companies will be forced to take alignment seriously. Right now companies can afford to treat alignment as a secondary concern because the stakes feel abstract. That changes the moment a system demonstrates genuinely superhuman capability. “You want your first actual real superintelligent AI to be aligned,” and the closer that milestone gets, the more unanimous the industry will become.

Timeline and Economic Impact

Ilya estimates 5 to 20 years for achieving AI systems with human-like learning capability. This is deliberately wide — reflecting genuine scientific uncertainty about what breakthroughs are needed.

He’s careful to distinguish between “stalling out” and “failing.” Current companies may plateau in terms of fundamental capability improvement but will continue generating enormous revenue. The plateau won’t look like failure — models will keep getting incrementally better at existing tasks, products will ship, customers will pay. It will look roughly similar across all major companies. But the qualitative leap to human-like learning will require something different.

On economic impact, he is cautiously optimistic. If millions of instances can be deployed across the economy, each learning its job through continual experience, “very rapid economic growth” becomes possible. But the exact pace depends on the absorption rate — how fast can institutions, regulations, and workflows adapt to workers that learn far faster than humans?

Countries with AI-friendly regulations will experience faster growth, creating competitive pressure globally. Ilya expects this pressure to eventually force even reluctant governments to adapt, though the transition period could be messy.

Rather than a winner-take-all outcome, Ilya predicts specialization. Dwarkesh challenges him: won’t the first company with human-like learning AI capture all value? Ilya’s response invokes a classic line: “In theory, there is no difference between theory and practice. In practice, there is.” Even if a single model could theoretically do everything, market competition means organizations will invest heavily in specialized domains — litigation, healthcare, financial analysis, scientific research — and defend those positions through accumulated learning and institutional knowledge. The competitive dynamics of real markets prevent monopoly even when the underlying technology is general-purpose.

The Problem of Imagination

Ilya closes with what may be the most important observation in the entire conversation: the hardest part of thinking about superintelligence is that we cannot truly imagine it. Most AI researchers themselves can’t imagine it. The concept is too far from daily experience to reason about reliably.

“The whole problem is the power. The whole problem is the power.”

This isn’t just a philosophical musing — it has practical implications for alignment and governance. If we can’t imagine what superintelligence looks like, we can’t design robust safety measures in advance. We can theorize, but our theories will be wrong in ways we can’t predict.

The solution, Ilya suggests, is not more theorizing but more showing: “you’ve got to be showing the thing.” As AI becomes more powerful, people will change their behaviors in unprecedented ways. Demonstration grounds understanding in reality in a way that argument cannot. The behavioral changes themselves — how people, institutions, and governments respond to increasingly capable AI — will teach us what we need to know about what to build next.

This is perhaps the deepest theme connecting Ilya’s 2018 MIT talk to this 2025 conversation. In 2018, he predicted that scaling language models would produce surprising results, and GPT-3 and its successors vindicated him. Now in 2025, he predicts that the next surprise will require something beyond scaling — a return to the kind of fundamental research that produced deep learning in the first place. Whether SSI or someone else delivers that breakthrough, the message is clear: the age of easy answers is over, and the age of hard questions has begun again.
