

"Stream Processing vs. Batch Processing" Is a False Dichotomy

Oftentimes, "Stream vs. Batch" is discussed as if it’s one or the other, but to me this does not really make much sense.

Many streaming systems apply batching too, i.e. they process or transfer multiple records (a "batch") at once. This offsets connection overhead, amortizes the cost of fanning out work to multiple threads, and opens the door for highly efficient SIMD processing, all to ensure high performance. The prevailing trend towards storage/compute separation in data streaming and processing architectures (for instance, thinking of platforms such as WarpStream, and Diskless Kafka at large) further accelerates this development.

Typically, this happens transparently to users and in an opportunistic way: the system handles all of those records (up to some limit) which have arrived in a buffer since the last batch. This makes for a very nice self-regulating system. High arrival rate of records: larger batches, improving throughput. Low arrival rate: smaller batches, perhaps with even just a single record, ensuring low latency. Columnar in-memory data formats like Apache Arrow are of great help for implementing such a design.
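
The opportunistic batching described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how any particular streaming engine does it: the `next_batch` function and the `max_batch_size` parameter are hypothetical names, and the buffer is a plain in-process `queue.Queue`.

```python
import queue


def next_batch(buffer: queue.Queue, max_batch_size: int) -> list:
    """Return the records that arrived since the last call, up to a limit.

    Hypothetical helper: blocks until at least one record is available,
    then drains whatever else is already buffered without waiting.
    """
    batch = [buffer.get()]  # wait for the first record (keeps latency low)
    while len(batch) < max_batch_size:
        try:
            batch.append(buffer.get_nowait())  # take only what is already there
        except queue.Empty:
            break  # buffer drained: ship a smaller batch rather than wait
    return batch


if __name__ == "__main__":
    buffer = queue.Queue()
    for record in range(5):  # a burst of records: high arrival rate
        buffer.put(record)
    print(next_batch(buffer, max_batch_size=3))  # a full batch
    print(next_batch(buffer, max_batch_size=3))  # the remainder of the burst
    buffer.put(99)  # a lone record: low arrival rate
    print(next_batch(buffer, max_batch_size=3))  # a batch of one
```

The batch size is never configured directly by the caller; it follows from how many records happen to be waiting, which is what makes the design self-regulating.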

In contrast, what the "Stream vs. Batch" discussion in my opinion should actually be about is "Pull vs. Push" semantics: will the system query its sources for new records at a fixed interval, or will new records be pushed to the system as soon as possible? Now, no matter how often you pull, you can’t convert a pull-based solution into a streaming one. Unless a source represents a consumable stream of changes itself (you see where this is going), a pull system may miss updates happening between fetch attempts, as well as deletes.
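
To make the point about missed updates and deletes concrete, here is a small simulation. It is a sketch under stated assumptions: the `Source` class, its per-row version counter, and the `poll` method are all hypothetical stand-ins for a table that is queried for "rows changed since my last fetch", next to an ordered change log of the same writes.

```python
class Source:
    """Toy data source that supports both pulling and a change log."""

    def __init__(self):
        self.rows = {}       # key -> (value, version of last write)
        self.changelog = []  # every change, in order (the "push" view)
        self.version = 0

    def upsert(self, key, value):
        self.version += 1
        self.rows[key] = (value, self.version)
        self.changelog.append(("upsert", key, value))

    def delete(self, key):
        self.version += 1
        del self.rows[key]
        self.changelog.append(("delete", key, None))

    def poll(self, since_version):
        """Pull: return the current state of rows changed after since_version."""
        return {k: v for k, (v, ver) in self.rows.items() if ver > since_version}


if __name__ == "__main__":
    source = Source()
    # All four writes happen between two fetch attempts.
    source.upsert("a", 1)
    source.upsert("a", 2)   # overwrites the previous value of "a"
    source.upsert("b", 10)
    source.delete("b")      # "b" is gone before anyone polls

    print(source.poll(since_version=0))  # only the latest state of "a"
    print(source.changelog)              # all four changes, including the delete
```

The poll sees only the final value of `a` and has no trace that `b` ever existed, while a consumer of the change log observes every intermediate update and the delete. Polling more often narrows the window but does not close it.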

This is what makes streaming so interesting and powerful: it provides you with a complete view of your data in real-time. A streaming system lets you put your data in the location where you need it, in the format you need it, and in the shape you need it (think denormalization), immediately as it gets produced or updated. The price for this is potentially higher complexity, for example when reasoning about streaming joins (and their state), or handling out-of-order data. But the streaming community is working continuously to improve things here, e.g. via disaggregated state backends, transactional stream processing, and much more. I’m really excited about all the innovation happening in this space right now.

Now, you might wonder: "Do I really need streaming (push), though? I’m fine with batch (pull)."

That’s a common and fair question. In my experience, it is best answered by giving it a try yourself. Again and again I have seen how folks who were skeptical at first very quickly wanted real-time streaming for more and more, if not all, of their use cases once they had seen it in action. If you’ve experienced a data freshness of a second or two in your data warehouse, you don’t want to ever miss this magic again.

All that being said, it’s actually not even about pull or push so much; the approaches complement each other. For instance, backfills often are done via batching, i.e. querying, in an otherwise streaming-based system. Also, if you want the completeness of streaming but don’t require a super low latency, you may decide to suspend your streaming pipelines (thus saving cost) in times of low data volume, resume when there’s new data to process, and halt again.
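
The suspend-and-resume pattern can be illustrated with a minimal sketch. It assumes the source is an append-only change log addressed by offset and that the pipeline stores the offset it has reached; `run_until_caught_up` and its parameters are hypothetical names, not the API of any specific system.

```python
def run_until_caught_up(log, committed_offset, process):
    """Process every entry from committed_offset to the current end of the log.

    Returns the new offset to store, so that a later run resumes exactly
    where this one stopped. Hypothetical helper for illustration only.
    """
    offset = committed_offset
    while offset < len(log):
        process(log[offset])
        offset += 1
    return offset  # caught up: the pipeline can be suspended now


if __name__ == "__main__":
    log = ["change-0", "change-1", "change-2"]
    seen = []

    offset = run_until_caught_up(log, 0, seen.append)       # first run, then halt
    log.append("change-3")                                   # new data arrives later
    offset = run_until_caught_up(log, offset, seen.append)  # resume, then halt again
    print(offset, seen)
```

Because each run continues from the stored offset of a complete change stream, no change is skipped while the pipeline is suspended; only latency is traded for cost.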

Batch streaming, if you will.


2026-04-04 21:28:49
Posted by 親愛的數據 (Dear Data), author of the book 《我看見了風暴:人工智能基建革命》 (I Saw the Storm: The AI Infrastructure Revolution)
