

Revisiting the Classic "The Bitter Lesson": It Predicted Everything from GPT to o1/R1 to Manus, and More...



This is essential reading for AI practitioners.

This weekend, let's revisit "The Bitter Lesson," a classic published in 2019 whose predictions have all come true. Its author, Rich Sutton, is the father of modern reinforcement learning.

When Rich Sutton wrote "The Bitter Lesson," his core point fit in a single sentence: the two general methods of search and learning, combined with scaling computation, will ultimately crush all clever hand-engineered designs.

At the time, the mainstream view was still "raw compute alone won't cut it; you have to build in human knowledge." Then GPT-3 arrived, the Scaling Laws were validated, the NLP pipelines that linguists had spent decades designing were replaced end-to-end by a single Transformer, and ChatGPT exploded. The predictions all came true.

That prediction is now being validated again in the Agent space.

Reasoning models internalize search into the model itself: o1 and DeepSeek-R1 need no externally engineered chain of thought; the model searches over reasoning paths in token space on its own.

Agents like Manus go a step further (when setting their direction, they reused Sutton's conclusion: leave it to the model): the model itself decides which tools to use, how to decompose the task, and how to execute it. No more hand-crafted workflow orchestration.
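The contrast between a hand-orchestrated workflow and "leave it to the model" can be sketched as a loop in which the model's output, not hard-coded logic, selects each step. Everything below is hypothetical: `call_model` is a stub standing in for any LLM API, and the tool registry is invented for illustration — this is not Manus's actual implementation.

```python
# Minimal sketch of a model-driven agent loop: the model, not a
# hand-written workflow, picks the next action at every step.
# `call_model` is a hypothetical stand-in for a real LLM call.

def calculator(expression: str) -> str:
    """Example tool: evaluate a simple arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def call_model(history):
    """Stub LLM. For this toy task it first requests a tool call,
    then finishes by returning the tool's result."""
    last = history[-1]
    if last["role"] == "user":
        return {"action": "tool", "name": "calculator", "args": "6 * 7"}
    return {"action": "finish", "answer": last["content"]}

def run_agent(task: str, max_steps: int = 5) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = call_model(history)      # the model chooses the next step
        if decision["action"] == "finish":
            return decision["answer"]
        result = TOOLS[decision["name"]](decision["args"])
        history.append({"role": "tool", "content": result})
    return "step budget exhausted"

print(run_agent("What is 6 * 7?"))  # → 42
```

The point of the sketch is the control flow: a fixed workflow would encode "call calculator, then answer" in code, whereas here that sequence emerges from the model's decisions, so adding tools or tasks requires no new orchestration logic.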

This matches Sutton's judgment from six years ago exactly: stop fiddling with clever designs; general methods that scale with computation will win in the end.




The Bitter Lesson

The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation. There were many examples of AI researchers' belated learning of this bitter lesson, and it is instructive to review some of the most prominent.

In computer chess, the methods that defeated the world champion, Kasparov, in 1997, were based on massive, deep search. At the time, this was looked upon with dismay by the majority of computer-chess researchers who had pursued methods that leveraged human understanding of the special structure of chess. When a simpler, search-based approach with special hardware and software proved vastly more effective, these human-knowledge-based chess researchers were not good losers. They said that "brute force" search may have won this time, but it was not a general strategy, and anyway it was not how people played chess. These researchers wanted methods based on human input to win and were disappointed when they did not.

A similar pattern of research progress was seen in computer Go, only delayed by a further 20 years. Enormous initial efforts went into avoiding search by taking advantage of human knowledge, or of the special features of the game, but all those efforts proved irrelevant, or worse, once search was applied effectively at scale. Also important was the use of learning by self play to learn a value function (as it was in many other games and even in chess, although learning did not play a big role in the 1997 program that first beat a world champion). Learning by self play, and learning in general, is like search in that it enables massive computation to be brought to bear. Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research. In computer Go, as in computer chess, researchers' initial effort was directed towards utilizing human understanding (so that less search was needed) and only much later was much greater success had by embracing search and learning.
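The idea of learning a value function by self-play can be illustrated with a toy far simpler than Go: the subtraction game (players alternate taking 1-3 stones; whoever takes the last stone wins). The sketch below is a minimal tabular TD learner of my own construction, not AlphaGo's algorithm; the constants are arbitrary.

```python
# Toy self-play learning of a value function for the subtraction game.
# V[s] estimates the win probability for the player to move at state s
# (s stones remaining). Both "players" are the same epsilon-greedy
# policy over V, so the value function improves by playing itself.
import random

N, ALPHA, EPS = 21, 0.1, 0.1
V = {s: 0.5 for s in range(1, N + 1)}

def best_move(s):
    # Moving to s' is good for us when the opponent's value V[s'] is low;
    # taking the last stone (s' == 0) wins outright.
    moves = range(1, min(3, s) + 1)
    return max(moves, key=lambda k: 1.0 if s - k == 0 else 1.0 - V[s - k])

def self_play_episode():
    s = N
    while s > 0:
        moves = list(range(1, min(3, s) + 1))
        k = random.choice(moves) if random.random() < EPS else best_move(s)
        # TD target from the mover's perspective: win now, or one minus
        # the opponent's value of the resulting position.
        target = 1.0 if s - k == 0 else 1.0 - V[s - k]
        V[s] += ALPHA * (target - V[s])
        s -= k

random.seed(0)
for _ in range(20_000):
    self_play_episode()

# In this game, multiples of 4 are losing for the player to move,
# so V[4] should end up low and V[3] high.
print(round(V[4], 2), round(V[3], 2))
```

Nothing game-specific was built in beyond the rules themselves; the "knowledge" that multiples of 4 are lost positions is discovered by self-play, which is the essay's point in miniature.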

In speech recognition, there was an early competition, sponsored by DARPA, in the 1970s. Entrants included a host of special methods that took advantage of human knowledge---knowledge of words, of phonemes, of the human vocal tract, etc. On the other side were newer methods that were more statistical in nature and did much more computation, based on hidden Markov models (HMMs). Again, the statistical methods won out over the human-knowledge-based methods. This led to a major change in all of natural language processing, gradually over decades, where statistics and computation came to dominate the field. The recent rise of deep learning in speech recognition is the most recent step in this consistent direction. Deep learning methods rely even less on human knowledge, and use even more computation, together with learning on huge training sets, to produce dramatically better speech recognition systems. As in the games, researchers always tried to make systems that worked the way the researchers thought their own minds worked---they tried to put that knowledge in their systems---but it proved ultimately counterproductive, and a colossal waste of researcher's time, when, through Moore's law, massive computation became available and a means was found to put it to good use.
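The HMM approach the passage credits can be illustrated with Viterbi decoding, the standard dynamic program for recovering the most likely hidden-state sequence from observations. The two states, symbols, and probabilities below are invented for the example.

```python
# Viterbi decoding in a tiny two-state HMM, the core inference step of
# the statistical speech-recognition methods described above.
import numpy as np

states = ["vowel", "consonant"]
start = np.array([0.5, 0.5])
trans = np.array([[0.3, 0.7],     # P(next state | current state)
                  [0.6, 0.4]])
# P(observed acoustic symbol | state); symbols: 0 = "a-like", 1 = "s-like"
emit = np.array([[0.9, 0.1],
                 [0.2, 0.8]])

def viterbi(obs):
    """Return the most likely state sequence for an observation list."""
    n, m = len(obs), len(states)
    logp = np.log(start) + np.log(emit[:, obs[0]])
    back = np.zeros((n, m), dtype=int)
    for t in range(1, n):
        cand = logp[:, None] + np.log(trans)   # (from_state, to_state)
        back[t] = cand.argmax(axis=0)          # best predecessor per state
        logp = cand.max(axis=0) + np.log(emit[:, obs[t]])
    path = [int(logp.argmax())]
    for t in range(n - 1, 0, -1):              # backtrack
        path.append(int(back[t][path[-1]]))
    return [states[i] for i in reversed(path)]

print(viterbi([0, 1, 0]))  # → ['vowel', 'consonant', 'vowel']
```

Note that nothing here encodes vocal-tract physics; the model is generic probability tables plus computation, which is exactly why the statistical approach scaled where the knowledge-engineered one did not.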

In computer vision, there has been a similar pattern. Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better.
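The "convolution" that modern vision networks build in is just a sliding dot product, and the invariance it encodes is translation equivariance: shift the input and the response shifts correspondingly. A minimal 1-D sketch (the signal and filter values are made up):

```python
# Convolution as a sliding dot product, plus a check of its built-in
# prior, translation equivariance: shifting the input shifts the output.
import numpy as np

def conv1d(x, w):
    """Valid-mode 1-D cross-correlation (what deep-learning libraries
    call 'convolution')."""
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

x = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])   # a step "image"
w = np.array([-1.0, 1.0])                           # a difference filter

y = conv1d(x, w)
y_shifted = conv1d(np.roll(x, 1), w)

print(y)                                   # responses at the step edges
print(np.allclose(y[:-1], y_shifted[1:]))  # equivariance holds: True
```

This tiny prior (weights shared across positions) is essentially all that convolutional networks assume about images; everything else, including edge detectors, is learned.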

This is a big lesson. As a field, we still have not thoroughly learned it, as we are continuing to make the same kind of mistakes. To see this, and to effectively resist it, we have to understand the appeal of these mistakes. We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.

One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.

The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.

Notice: The content above (including the pictures and videos if any) is uploaded and posted by a user of NetEase Hao, which is a social media platform and only provides information storage services.
