

Revisiting the Classic "The Bitter Lesson": It Predicted Everything from GPT to o1/R1 to Manus, and More...



This is essential reading for AI practitioners.

This weekend, let's revisit "The Bitter Lesson," a classic published in 2019 whose predictions have all come true. Its author, Rich Sutton, is the father of modern reinforcement learning.

When Rich Sutton wrote "The Bitter Lesson," his core point fit in a single sentence: the two general methods of search and learning, combined with scaling computation, will ultimately crush all clever hand-engineered designs.

At the time, the mainstream view was still "raw compute alone won't cut it; you have to build in human knowledge." Then GPT-3 arrived, the Scaling Laws were validated, the NLP pipelines that linguists had spent decades designing were replaced end-to-end by a single Transformer, and ChatGPT exploded. The predictions all came true.

That prediction is now being validated again in the Agent space.

Reasoning models internalize search into the model itself: o1 and DeepSeek-R1 need no externally engineered chain of thought; the model searches over reasoning paths in token space on its own.

Agents like Manus go a step further (when setting their direction, they reused Sutton's conclusion: leave it to the model): the model itself decides which tools to use, how to decompose the task, and how to execute it. No more hand-crafted workflow orchestration.
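The contrast between a hand-orchestrated workflow and "leave it to the model" can be sketched as a loop in which the model's output, not hard-coded logic, selects each step. Everything below is hypothetical: `call_model` is a stub standing in for any LLM API, and the tool registry is invented for illustration — this is not Manus's actual implementation.

```python
# Minimal sketch of a model-driven agent loop: the model, not a
# hand-written workflow, picks the next action at every step.
# `call_model` is a hypothetical stand-in for a real LLM call.

def calculator(expression: str) -> str:
    """Example tool: evaluate a simple arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def call_model(history):
    """Stub LLM. For this toy task it first requests a tool call,
    then finishes by returning the tool's result."""
    last = history[-1]
    if last["role"] == "user":
        return {"action": "tool", "name": "calculator", "args": "6 * 7"}
    return {"action": "finish", "answer": last["content"]}

def run_agent(task: str, max_steps: int = 5) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = call_model(history)      # the model chooses the next step
        if decision["action"] == "finish":
            return decision["answer"]
        result = TOOLS[decision["name"]](decision["args"])
        history.append({"role": "tool", "content": result})
    return "step budget exhausted"

print(run_agent("What is 6 * 7?"))  # → 42
```

The point of the sketch is the control flow: a fixed workflow would encode "call calculator, then answer" in code, whereas here that sequence emerges from the model's decisions, so adding tools or tasks requires no new orchestration logic.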

This matches Sutton's judgment from six years ago exactly: stop fiddling with clever designs; general methods that scale with computation will win in the end.




The Bitter Lesson

The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation. There were many examples of AI researchers' belated learning of this bitter lesson, and it is instructive to review some of the most prominent.

In computer chess, the methods that defeated the world champion, Kasparov, in 1997, were based on massive, deep search. At the time, this was looked upon with dismay by the majority of computer-chess researchers who had pursued methods that leveraged human understanding of the special structure of chess. When a simpler, search-based approach with special hardware and software proved vastly more effective, these human-knowledge-based chess researchers were not good losers. They said that "brute force" search may have won this time, but it was not a general strategy, and anyway it was not how people played chess. These researchers wanted methods based on human input to win and were disappointed when they did not.

A similar pattern of research progress was seen in computer Go, only delayed by a further 20 years. Enormous initial efforts went into avoiding search by taking advantage of human knowledge, or of the special features of the game, but all those efforts proved irrelevant, or worse, once search was applied effectively at scale. Also important was the use of learning by self play to learn a value function (as it was in many other games and even in chess, although learning did not play a big role in the 1997 program that first beat a world champion). Learning by self play, and learning in general, is like search in that it enables massive computation to be brought to bear. Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research. In computer Go, as in computer chess, researchers' initial effort was directed towards utilizing human understanding (so that less search was needed) and only much later was much greater success had by embracing search and learning.
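The idea of learning a value function by self-play can be illustrated with a toy far simpler than Go: the subtraction game (players alternate taking 1-3 stones; whoever takes the last stone wins). The sketch below is a minimal tabular TD learner of my own construction, not AlphaGo's algorithm; the constants are arbitrary.

```python
# Toy self-play learning of a value function for the subtraction game.
# V[s] estimates the win probability for the player to move at state s
# (s stones remaining). Both "players" are the same epsilon-greedy
# policy over V, so the value function improves by playing itself.
import random

N, ALPHA, EPS = 21, 0.1, 0.1
V = {s: 0.5 for s in range(1, N + 1)}

def best_move(s):
    # Moving to s' is good for us when the opponent's value V[s'] is low;
    # taking the last stone (s' == 0) wins outright.
    moves = range(1, min(3, s) + 1)
    return max(moves, key=lambda k: 1.0 if s - k == 0 else 1.0 - V[s - k])

def self_play_episode():
    s = N
    while s > 0:
        moves = list(range(1, min(3, s) + 1))
        k = random.choice(moves) if random.random() < EPS else best_move(s)
        # TD target from the mover's perspective: win now, or one minus
        # the opponent's value of the resulting position.
        target = 1.0 if s - k == 0 else 1.0 - V[s - k]
        V[s] += ALPHA * (target - V[s])
        s -= k

random.seed(0)
for _ in range(20_000):
    self_play_episode()

# In this game, multiples of 4 are losing for the player to move,
# so V[4] should end up low and V[3] high.
print(round(V[4], 2), round(V[3], 2))
```

Nothing game-specific was built in beyond the rules themselves; the "knowledge" that multiples of 4 are lost positions is discovered by self-play, which is the essay's point in miniature.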

In speech recognition, there was an early competition, sponsored by DARPA, in the 1970s. Entrants included a host of special methods that took advantage of human knowledge---knowledge of words, of phonemes, of the human vocal tract, etc. On the other side were newer methods that were more statistical in nature and did much more computation, based on hidden Markov models (HMMs). Again, the statistical methods won out over the human-knowledge-based methods. This led to a major change in all of natural language processing, gradually over decades, where statistics and computation came to dominate the field. The recent rise of deep learning in speech recognition is the most recent step in this consistent direction. Deep learning methods rely even less on human knowledge, and use even more computation, together with learning on huge training sets, to produce dramatically better speech recognition systems. As in the games, researchers always tried to make systems that worked the way the researchers thought their own minds worked---they tried to put that knowledge in their systems---but it proved ultimately counterproductive, and a colossal waste of researcher's time, when, through Moore's law, massive computation became available and a means was found to put it to good use.
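The HMM approach the passage credits can be illustrated with Viterbi decoding, the standard dynamic program for recovering the most likely hidden-state sequence from observations. The two states, symbols, and probabilities below are invented for the example.

```python
# Viterbi decoding in a tiny two-state HMM, the core inference step of
# the statistical speech-recognition methods described above.
import numpy as np

states = ["vowel", "consonant"]
start = np.array([0.5, 0.5])
trans = np.array([[0.3, 0.7],     # P(next state | current state)
                  [0.6, 0.4]])
# P(observed acoustic symbol | state); symbols: 0 = "a-like", 1 = "s-like"
emit = np.array([[0.9, 0.1],
                 [0.2, 0.8]])

def viterbi(obs):
    """Return the most likely state sequence for an observation list."""
    n, m = len(obs), len(states)
    logp = np.log(start) + np.log(emit[:, obs[0]])
    back = np.zeros((n, m), dtype=int)
    for t in range(1, n):
        cand = logp[:, None] + np.log(trans)   # (from_state, to_state)
        back[t] = cand.argmax(axis=0)          # best predecessor per state
        logp = cand.max(axis=0) + np.log(emit[:, obs[t]])
    path = [int(logp.argmax())]
    for t in range(n - 1, 0, -1):              # backtrack
        path.append(int(back[t][path[-1]]))
    return [states[i] for i in reversed(path)]

print(viterbi([0, 1, 0]))  # → ['vowel', 'consonant', 'vowel']
```

Note that nothing here encodes vocal-tract physics; the model is generic probability tables plus computation, which is exactly why the statistical approach scaled where the knowledge-engineered one did not.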

In computer vision, there has been a similar pattern. Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better.
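The "convolution" that modern vision networks build in is just a sliding dot product, and the invariance it encodes is translation equivariance: shift the input and the response shifts correspondingly. A minimal 1-D sketch (the signal and filter values are made up):

```python
# Convolution as a sliding dot product, plus a check of its built-in
# prior, translation equivariance: shifting the input shifts the output.
import numpy as np

def conv1d(x, w):
    """Valid-mode 1-D cross-correlation (what deep-learning libraries
    call 'convolution')."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

x = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])   # a step "image"
w = np.array([-1.0, 1.0])                           # a difference filter

y = conv1d(x, w)
y_shifted = conv1d(np.roll(x, 1), w)

print(y)                                   # responses at the step edges
print(np.allclose(y[:-1], y_shifted[1:]))  # equivariance holds: True
```

This tiny prior (weights shared across positions) is essentially all that convolutional networks assume about images; everything else, including edge detectors, is learned.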

This is a big lesson. As a field, we still have not thoroughly learned it, as we are continuing to make the same kind of mistakes. To see this, and to effectively resist it, we have to understand the appeal of these mistakes. We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.

One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.

The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.

Notice: The content above (including the pictures and videos if any) is uploaded and posted by a user of NetEase Hao, which is a social media platform and only provides information storage services.
