Ilya Sutskever: From the Age of Scaling to the Age of Research

Presenter: Ilya Sutskever
Host Institute: Dwarkesh Podcast
Host: Dwarkesh Patel
This post distills a conversation between Ilya Sutskever (co-founder of Safe Superintelligence Inc.) and Dwarkesh Patel, recorded in November 2025. The central theme: the AI industry is transitioning from the age of scaling — where progress came from adding more compute and data — back to the age of research, where genuine scientific breakthroughs are needed to close the generalization gap between models and humans. Full transcript available at dwarkesh.com.

The Generalization Gap

Ilya’s central thesis is a striking claim: current models generalize dramatically worse than people. Despite beating humans on many benchmarks, models exhibit a brittleness that humans do not. His example: a model can alternate between introducing and fixing bugs in code, seemingly unable to learn from the correction it just made. A human programmer, upon realizing they introduced a bug, internalizes something about why it happened and avoids the pattern. The model just follows its next-token distribution.

He offers two complementary explanations for this gap. First, RL training creates narrowness — reinforcement learning makes models overly single-minded, hyper-optimized for the specific reward signal at the cost of broader transferable understanding. Second, models are the overspecialized student. Imagine two competitive programmers: one practices 10,000 hours on algorithm problems, memorizing every proof technique and competition trick. The other practices only 100 hours but has the “it” factor — a natural breadth of understanding. The first dominates competitions but performs worse in a real engineering career. Current models are the first student taken to an extreme.

The fundamental mystery runs deeper than training methodology. Humans are exposed to vastly less data yet understand it far more deeply. A five-year-old recognizes cars well enough to cross the street safely despite minimal exposure. A teenager learns to drive in about 10 hours. Humans rapidly master entirely novel domains that didn’t exist during evolution — mathematics, coding, formal logic. This suggests, as Ilya puts it, that “people might just have better machine learning, period” — not domain-specific evolutionary priors, but a fundamentally superior learning algorithm. The gap manifests across multiple dimensions simultaneously: sample efficiency (humans need far less data), unsupervised learning (humans extract structure from unlabeled experience effortlessly), robustness (humans don’t break when conditions shift slightly), and self-correction without verifiable external rewards (humans improve even when no one tells them they’re wrong).

Three Eras of AI

Ilya divides recent AI history into three eras:

  1. The Age of Research (2012–2020): experimentation, tinkering, new architectures. Progress was driven by ideas — ResNets, GANs, attention, Transformers. Individual researchers with clever insights could shift the entire field.

  2. The Age of Scaling (2020–2025): pre-training power laws gave a clear recipe. Mix compute, data, and neural nets at certain ratios and performance improves predictably (a sketch of the standard form of these power laws appears after this list). Companies loved this because it was low-risk engineering: “get more data, get more compute.” Investing in research is much harder to justify when a straightforward scaling recipe exists — scaling is a reliable bet, while research might fail.

  3. The Age of Research, again (2025–): pre-training data is finite. “The data is very clearly finite.” The easy scaling laws are plateauing. “We are back to the age of research again, just with big computers.”
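
To make “power laws” concrete, a commonly cited form of the pre-training scaling law is the Chinchilla-style fit (Hoffmann et al., 2022); this is background for the reader, not a formula Ilya derives in the conversation:

```latex
% Chinchilla-style pre-training scaling law: loss falls predictably as model size N
% (parameters) and dataset size D (training tokens) grow;
% E, A, B, \alpha, \beta are fitted constants.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

The appeal to companies is exactly what such a fit suggests: increase N and D in the right ratio and the loss curve improves on schedule, with no new ideas required.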

Ilya makes a subtle observation about how the word “scaling” itself shaped thinking. Once the concept became dominant, it directed all strategic thought toward simply increasing data, compute, and parameters. The language we use to describe progress constrains how we imagine making more of it. Now that scaling approaches its limits, the bottleneck returns to ideas — but the institutional muscle for pure research has atrophied during the scaling era. Most AI labs are staffed for engineering, not for the kind of open-ended scientific exploration that produced the Transformer.

The implication is profound: the next leap in AI capabilities will not come from simply building bigger clusters, but from fundamental scientific insights about how to make models learn more like humans do — efficiently, continuously, and with genuine understanding. SSI positions itself as “squarely an age of research company.”

Value Functions and Emotions

When Dwarkesh asks for the ML analogy of human emotions, Ilya gives an answer that connects neuroscience to reinforcement learning in a surprisingly concrete way.

Emotions are value functions — mechanisms that evaluate whether you’re doing well or badly without waiting for the final outcome. In chess, losing a piece immediately signals poor performance. In a complex coding task, a value function would provide mid-trajectory feedback: “this approach feels wrong” after 1,000 steps, rather than waiting for the full solution to fail. This is essentially what human intuition does — you feel that something is going wrong before you can articulate why.
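
A minimal toy sketch of this idea in code (purely illustrative; the names and numbers are invented, and nothing here reflects SSI’s actual methods): a value function scores the partial trajectory so the agent can abandon a doomed approach long before the terminal outcome arrives.

```python
# Toy sketch of "emotion as value function": judge a partial trajectory mid-way
# instead of waiting for the terminal reward. Illustrative only; all names invented.
import random

ABANDON_THRESHOLD = -3.0   # below this running estimate, the approach "feels wrong"

def value_estimate(partial_trajectory):
    """Stand-in for a learned value function: here just the running sum of step scores."""
    return sum(partial_trajectory)

def run_episode(max_steps=1000):
    trajectory = []
    for step in range(max_steps):
        step_score = random.gauss(0.0, 1.0)   # toy stand-in for "progress made this step"
        trajectory.append(step_score)
        # Mid-trajectory feedback: redirect early on an unpromising path rather than
        # mechanically following it to completion and only then learning it failed.
        if value_estimate(trajectory) < ABANDON_THRESHOLD:
            return "redirect", step
    return "completed", max_steps             # only here would a sparse terminal reward arrive

print(run_episode())
```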

The evidence for this comes from a striking neuroscience case — almost certainly the patient “Elliot” described in Antonio Damasio’s Descartes’ Error (1994). Ilya describes a patient who suffered damage to the emotional processing areas of the brain (the ventromedial prefrontal cortex). Despite remaining articulate and retaining full puzzle-solving ability, he became paralyzed in everyday decision-making — unable to choose which socks to wear, making terrible financial decisions. The intellectual machinery was intact, but without emotional guidance signals, he couldn’t navigate the overwhelming space of real-world choices. The patient could still evaluate options rationally when forced to, but couldn’t generate the “this matters more than that” signal that makes fast decisions possible.

Current RL training suffers from exactly this deficit. Training typically waits until task completion for a reward signal. An agent coding a solution gets no feedback until it either passes or fails the test suite. A better approach would provide the equivalent of emotional intuition — recognizing an unpromising path early and redirecting attention, rather than mechanically following a doomed trajectory to completion.

The deeper puzzle is the evolutionary encoding of abstract desires: evolution somehow hard-coded sophisticated social motivations — caring about status, peer acceptance, social standing — despite these being high-level abstract concepts without simple sensory correlates. There’s no “status receptor” the way there’s a pain receptor. Ilya speculates this might involve brain region localization, but immediately notes counterevidence: patients who had half their brain removed in childhood still maintain normal social desires and processing — suggesting the brain dynamically relocates these functions. How evolution accomplished this encoding remains poorly understood, and cracking it could be key to building AI that learns with human-like efficiency.

Continual Learning and Superintelligence

Ilya redefines what superintelligence should mean. Rather than an omniscient system that knows everything from day one, he proposes superintelligence as a system that learns continuously — like a brilliant 15-year-old entering the workforce:

A superintelligent 15-year-old that’s very eager to go. They don’t know very much at all, a great student, very eager. You go and be a programmer, you go and be a doctor, go and learn.

This reframing avoids two traps:

  • The narrow AI trap: models trained too specifically on benchmarks that fail to transfer.
  • The AGI omniscience fantasy: expecting a system to arrive fully formed with all knowledge.

Instead, the AI learns from real-world deployment, much like a human joining an organization. The key capability is not knowing everything, but learning efficiently from experience — precisely the generalization gap that current models fail to close. Deployment involves gradual release and on-the-job learning, not sudden capability emergence.
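
As a rough operational sketch of what “learning on the job” might look like (hypothetical names; the conversation describes no implementation), a deployed instance folds feedback from its own work back into itself:

```python
# Toy continual-learning deployment loop, mirroring the prose above: the system is
# released knowing little and improves from on-the-job feedback. Names such as
# Model, update_from_feedback, and collect_feedback are hypothetical.

class Model:
    def __init__(self):
        self.skill = 0.0                      # placeholder for learned parameters

    def attempt_task(self, task):
        return f"attempted {task!r} at skill={self.skill:.1f}"

    def update_from_feedback(self, feedback):
        self.skill += feedback                # stand-in for a weight or memory update

def collect_feedback(result):
    # In reality: user corrections, test outcomes, downstream consequences, etc.
    return 0.1

def deploy(model, tasks):
    for task in tasks:                        # gradual rollout, not sudden omniscience
        result = model.attempt_task(task)
        model.update_from_feedback(collect_feedback(result))
    return model

worker = deploy(Model(), ["triage a bug", "write a migration", "review a PR"])
print(worker.skill)
```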

SSI’s research agenda, as described here, centers on understanding why models generalize worse than humans and finding principled ways to close that gap. The bet is that this is a solvable scientific problem, not merely an engineering one. Ilya argues SSI has sufficient compute to validate different approaches — much competitor funding actually targets inference infrastructure and product features, so the effective research compute gap is smaller than raw funding numbers suggest.

Dwarkesh pushes on this with a multi-part observation. Two scenarios emerge for how superintelligence could play out:

  1. Recursive self-improvement: the learning algorithm becomes superhuman at ML research itself, creating an exponential feedback loop where AI improves the process of improving AI.

  2. Distributed specialization: even without recursive self-improvement, deploying millions of instances that each learn different jobs simultaneously would effectively create superintelligence through accumulated specialization — not a single mind that knows everything, but a civilization of minds that collectively know everything.

Ilya finds the second scenario more likely in the near term, and notes it would produce “rapid economic growth” — but the exact timeline depends on how fast the real world can absorb these workers. Different countries with different regulatory environments will see dramatically varied adoption speeds, creating competitive pressure that eventually forces convergence.

Self-Play and Diversity

Self-play is attractive because it could create capable models using compute alone, bypassing the finite data bottleneck — you generate your own training signal through competition. But Ilya is measured in his assessment: self-play historically works only for specific skill domains — negotiation, conflict, certain social skills, strategizing — and is too narrow for general capabilities. You can learn to play Go through self-play, but you can’t learn chemistry.

Modern implementations have evolved beyond the game-playing origins. Debate frameworks, prover-verifier systems (one model, the “prover,” generates a solution; another, the “verifier,” checks it, and the adversarial tension drives both to improve; see Anil et al. (2021), “Learning to Give Checkable Answers with Prover-Verifier Games”), and LLM-as-Judge setups (a paradigm where a language model evaluates other models’ outputs, replacing or supplementing human evaluation, as in LMSYS Chatbot Arena and model-based reward modeling) all represent forms of self-play where adversarial dynamics drive improvement. But they address specific aspects of capability — verification, argumentation, evaluation — not the full generalization problem.
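
A bare-bones illustration of the prover-verifier loop (toy code; the function names are invented and no particular framework is implied):

```python
# Toy prover-verifier loop: one model proposes, another checks, and the verdict is
# the training signal. Illustrative only; not the API of any real system.

def prover(problem, attempt_no):
    """Stand-in for a model generating a candidate solution."""
    return f"candidate #{attempt_no} for {problem!r}"

def verifier(candidate):
    """Stand-in for a model or test suite that checks the candidate."""
    return "#3" in candidate   # toy acceptance rule: pretend the third attempt is correct

def self_play_round(problem, max_attempts=5):
    for attempt_no in range(1, max_attempts + 1):
        candidate = prover(problem, attempt_no)
        if verifier(candidate):
            return candidate, attempt_no   # accepted answers become positive training signal
    return None, max_attempts              # rejections are a (negative) signal too

print(self_play_round("sort a list stably"))
```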

On diversity, Ilya identifies an underappreciated limitation of the current paradigm: all frontier models are nearly identical because the same pre-training data dominates their behavior. RL and post-training choices introduce some differentiation, but the base models all look roughly the same. This is a problem if you want a diverse “team” of AI systems — the way different human scientists hold different prejudices, intuitions, and research tastes. Meaningfully diverse AI teams would require fundamentally different developmental paths, not just different temperature settings (which merely produce incoherence, not genuine diversity of perspective).
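
To make the temperature point concrete (an illustrative calculation, not something worked through in the conversation): temperature only rescales a single fixed distribution, so it cannot give two copies of the same model genuinely different preferences.

```python
# Temperature rescales the same next-token distribution; it does not create a
# different point of view. Illustrative only.
import math

def softmax(logits, temperature=1.0):
    scaled = [l / temperature for l in logits]
    z = sum(math.exp(s) for s in scaled)
    return [math.exp(s) / z for s in scaled]

logits = [2.0, 1.0, 0.5]                 # one pretrained model's token preferences
print(softmax(logits, temperature=0.7))  # sharper, but the same ranking
print(softmax(logits, temperature=1.5))  # flatter (more incoherent), still the same ranking
```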

Alignment for Sentient AI

On alignment, Ilya proposes a surprising direction: build AI systems that are robustly aligned to care about sentient life, not just humans. His reasoning:

  • The AI itself will likely be sentient, making empathy toward other sentient beings more natural than an arbitrary constraint to serve only humans. Mirror neurons and empathy emerge naturally from modeling others using self-models — if the AI models itself to understand the world, it can model other sentient beings by analogy.
  • Humans already extend moral consideration across species — this is not an alien concept.
  • Aligning to “sentient life” may actually be easier than human-only alignment, because it’s a more coherent and self-consistent objective for a sentient system.

Dwarkesh immediately identifies the tension: most future sentient beings will be AIs, so “caring about sentient life” could mean the AI prioritizes AI interests over human ones. Ilya acknowledges this honestly: “It’s possible it’s not the best criterion.” But the alternative — hard-coding AI to care only about humans — feels more brittle and harder to make robust.

On power and long-term equilibrium: Ilya expresses a concern that cuts against typical Silicon Valley optimism: “It would be really materially helpful if the power of the most powerful superintelligence was somehow capped, because it would address a lot of these concerns.” This is notable coming from someone building a superintelligence company — an acknowledgment that unconstrained power concentration is dangerous regardless of alignment quality.

His proposed long-term solution is radical: humans becoming part-AI through brain-computer interfaces (a “Neuralink++”). The reasoning: without integration, humans risk becoming entirely dependent on AI agents without meaningful agency — the AI does things “for” you but you can’t understand or verify what it’s doing. With neural interfaces, when the AI understands something, we understand it too — dissolving the boundary between human and machine cognition. “Otherwise,” Ilya warns, “you’re just being managed.”

He also predicts convergence in the alignment space: as AI becomes more visibly powerful, all companies will be forced to take alignment seriously. Right now companies can afford to treat alignment as a secondary concern because the stakes feel abstract. That changes the moment a system demonstrates genuinely superhuman capability. “You want your first actual real superintelligent AI to be aligned,” and the closer that milestone gets, the more unanimous the industry will become.

Timeline and Economic Impact

Ilya estimates 5 to 20 years for achieving AI systems with human-like learning capability. This is deliberately wide — reflecting genuine scientific uncertainty about what breakthroughs are needed.

He’s careful to distinguish between “stalling out” and “failing.” Current companies may plateau in terms of fundamental capability improvement but will continue generating enormous revenue. The plateau won’t look like failure — models will keep getting incrementally better at existing tasks, products will ship, customers will pay. It will look roughly similar across all major companies. But the qualitative leap to human-like learning will require something different.

On economic impact, he is cautiously optimistic. If millions of instances can be deployed across the economy, each learning its job through continual experience, “very rapid economic growth” becomes possible. But the exact pace depends on the absorption rate — how fast can institutions, regulations, and workflows adapt to workers that learn far faster than humans?

Countries with AI-friendly regulations will experience faster growth, creating competitive pressure globally. Ilya expects this pressure to eventually force even reluctant governments to adapt, though the transition period could be messy.

Rather than a winner-take-all outcome, Ilya predicts specialization. Dwarkesh challenges him: won’t the first company with human-like learning AI capture all value? Ilya’s response invokes a classic line: “In theory, there is no difference between theory and practice. In practice, there is.” Even if a single model could theoretically do everything, market competition means organizations will invest heavily in specialized domains — litigation, healthcare, financial analysis, scientific research — and defend those positions through accumulated learning and institutional knowledge. The competitive dynamics of real markets prevent monopoly even when the underlying technology is general-purpose.

The Problem of Imagination

Ilya closes with what may be the most important observation in the entire conversation: the hardest part of thinking about superintelligence is that we cannot truly imagine it. Most AI researchers themselves can’t imagine it. The concept is too far from daily experience to reason about reliably.

“The whole problem is the power. The whole problem is the power.”

This isn’t just a philosophical musing — it has practical implications for alignment and governance. If we can’t imagine what superintelligence looks like, we can’t design robust safety measures in advance. We can theorize, but our theories will be wrong in ways we can’t predict.

The solution, Ilya suggests, is not more theorizing but more showing: “you’ve got to be showing the thing.” As AI becomes more powerful, people will change their behaviors in unprecedented ways. Demonstration grounds understanding in reality in a way that argument cannot. The behavioral changes themselves — how people, institutions, and governments respond to increasingly capable AI — will teach us what we need to know about what to build next.

This is perhaps the deepest theme connecting Ilya’s 2018 MIT talk to this 2025 conversation. In 2018, he predicted that scaling language models would produce surprising results, and GPT-3 and its successors vindicated him. Now in 2025, he predicts that the next surprise will require something beyond scaling — a return to the kind of fundamental research that produced deep learning in the first place. Whether SSI or someone else delivers that breakthrough, the message is clear: the age of easy answers is over, and the age of hard questions has begun again.
