
Qwen3: "Think Deep, Act Fast", Even on a Mac! Model Deep Dive and Local Run Guide


0 Introduction

In benchmark evaluations of coding, math, and general capabilities, Qwen3-235B-A22B achieves highly competitive results against top-tier models such as DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro.

The small MoE model Qwen3-30B-A3B, with only 10% of QwQ-32B's activated parameters, performs even better:

Even the tiny Qwen3-4B can rival the performance of Qwen2.5-72B-Instruct:

Open source

Weights for two MoE models have been released:

  • Qwen3-235B-A22B, a large model with over 235 billion total parameters and over 22 billion activated parameters

  • Qwen3-30B-A3B, a smaller MoE model with about 30 billion total parameters and 3 billion activated parameters

Six dense models are also open-sourced: Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, and Qwen3-0.6B, all under the Apache 2.0 license.

| Models | Layers | Heads (Q / KV) | Tie Embedding | Context Length |
| --- | --- | --- | --- | --- |
| Qwen3-0.6B | 28 | 16 / 8 | Yes | 32K |
| Qwen3-1.7B | 28 | 16 / 8 | Yes | 32K |
| Qwen3-4B | 36 | 32 / 8 | Yes | 32K |
| Qwen3-8B | 36 | 32 / 8 | No | 128K |
| Qwen3-14B | 40 | 40 / 8 | No | 128K |
| Qwen3-32B | 64 | 64 / 8 | No | 128K |

| Models | Layers | Heads (Q / KV) | # Experts (Total / Activated) | Context Length |
| --- | --- | --- | --- | --- |
| Qwen3-30B-A3B | 48 | 32 / 4 | 128 / 8 | 128K |
| Qwen3-235B-A22B | 94 | 64 / 4 | 128 / 8 | 128K |

Post-trained models such as Qwen3-30B-A3B, along with their pre-trained base counterparts (e.g. Qwen3-30B-A3B-Base), are now available on Hugging Face, ModelScope, and Kaggle.

Deployment

Frameworks such as SGLang and vLLM are recommended.

Local use

Ollama, LM Studio, MLX, llama.cpp, and KTransformers are recommended. These ensure users can easily integrate Qwen3 into their workflows, whether for research, development, or production.

The release and open-sourcing of Qwen3 will greatly advance research and development of large foundation models, empowering researchers, developers, and organizations worldwide to build innovative solutions with these frontier models.

Try it out

  • Qwen Chat web version (chat.qwen.ai)

  • Mobile app

1 Highlights

1.1 Hybrid thinking modes

Qwen3 supports two thinking modes:

  1. Thinking mode: the model reasons step by step and gives a final answer after deliberation, suited to complex problems that require deep thought

  2. Non-thinking mode: the model gives fast, near-instant responses, suited to simple questions where speed matters more than depth

This flexibility lets users control how much the model "thinks" for each task, e.g.:

  • Complex problems can be solved with extended reasoning steps

  • Simple questions can be answered directly, with no added latency

Combining the two modes greatly strengthens the model's ability to implement stable and efficient "thinking budget" control. As shown above, Qwen3 exhibits scalable, smooth performance gains that correlate directly with the allocated reasoning compute budget. This design makes it easy to configure task-specific budgets, striking a better balance between cost efficiency and inference quality.
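Budget control of this kind is typically applied per request. As a toy sketch, one might route each prompt to thinking or non-thinking mode with a simple heuristic; the keyword list and length threshold below are hypothetical placeholders, not anything from the Qwen release, and `enable_thinking` is the chat-template flag used in the code examples later in this article.

```python
# Hypothetical routing policy: decide per request whether to enable
# thinking mode. Replace the heuristic with your own complexity signal.
COMPLEX_HINTS = ("prove", "derive", "debug", "optimize", "step by step")

def choose_thinking_mode(prompt: str) -> bool:
    """Return True to enable thinking mode, False for a fast response."""
    lowered = prompt.lower()
    if any(hint in lowered for hint in COMPLEX_HINTS):
        return True
    # Long prompts tend to be complex tasks; short ones are usually lookups.
    return len(prompt.split()) > 40

print(choose_thinking_mode("What is the capital of France?"))    # False
print(choose_thinking_mode("Prove that sqrt(2) is irrational"))  # True
```

The returned boolean would then be passed as `enable_thinking` when applying the chat template.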

1.2 Multilingual

Qwen3 models support 119 languages and dialects. This broad multilingual capability opens up new possibilities for international applications, letting users worldwide benefit from these models.

1.3 Enhanced agent capabilities

The agent and coding capabilities of the Qwen3 models have been optimized, and support for MCP has been strengthened.

The examples below show how Qwen3 thinks and interacts with its environment.

2 Pretraining

Qwen2.5 was pretrained on 18 trillion tokens; Qwen3 uses nearly twice that amount, covering 119 languages and dialects.

To build this massive dataset, data was collected not only from the web but also from PDF documents: Qwen2.5-VL was used to extract text from those documents, and Qwen2.5 to improve the quality of the extracted content. To increase the amount of math and code data, the domain-expert models Qwen2.5-Math and Qwen2.5-Coder were used to synthesize data in a variety of forms, including textbooks, question-answer pairs, and code snippets.

The pretraining process has three stages.

2.1 Stage 1 (S1)

The model was pretrained on over 30 trillion tokens with a context length of 4K tokens. This stage gives the model basic language skills and general knowledge.

2.2 Stage 2 (S2)

The dataset was improved by increasing the proportion of knowledge-intensive data (such as STEM, coding, and reasoning tasks), after which the model was pretrained on an additional 5 trillion tokens.

2.3 Final stage

High-quality long-context data was used to extend the context length to 32K tokens, ensuring the model can handle longer inputs effectively.

Thanks to architectural improvements, more training data, and more effective training methods, the overall performance of the Qwen3 dense base models matches that of Qwen2.5 base models with more parameters. For example, Qwen3-1.7B/4B/8B/14B/32B-Base perform on par with Qwen2.5-3B/7B/14B/32B/72B-Base respectively.

In STEM, coding, and reasoning in particular, the Qwen3 dense base models even outperform larger Qwen2.5 models.

The Qwen3 MoE base models reach performance similar to the Qwen2.5 dense base models while activating only 10% as many parameters, yielding significant savings in both training and inference cost.
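As a quick sanity check on the activated-parameter figures above, the nominal ratios implied by the model names can be computed directly (a toy illustration using the rounded counts from the names):

```python
# Rounded (total, activated) parameter counts read off the model names.
MOE_MODELS = {
    "Qwen3-235B-A22B": (235e9, 22e9),
    "Qwen3-30B-A3B": (30e9, 3e9),
}

for name, (total, activated) in MOE_MODELS.items():
    # Fraction of parameters active for any given token.
    print(f"{name}: {activated / total:.1%} of parameters activated")
# Qwen3-235B-A22B: 9.4% of parameters activated
# Qwen3-30B-A3B: 10.0% of parameters activated
```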

3 Post-training

To develop a hybrid model with both deliberate reasoning and fast-response capabilities, a four-stage training pipeline was implemented:

  • (1) Long chain-of-thought cold start

  • (2) Long chain-of-thought reinforcement learning

  • (3) Thinking-mode fusion

  • (4) General reinforcement learning

3.1 Stage 1

The model was fine-tuned on diverse long chain-of-thought data spanning tasks and domains such as math, code, logical reasoning, and STEM problems. This step equips the model with basic reasoning abilities.

3.2 Stage 2

The focus was large-scale reinforcement learning, using rule-based rewards to strengthen the model's exploration and problem-solving abilities.

3.3 Stage 3

The model was fine-tuned on a blend of long chain-of-thought data and standard instruction-tuning data, integrating the non-thinking mode into the thinking model and ensuring a seamless combination of reasoning and fast responses.

3.4 Stage 4

Reinforcement learning was applied across more than 20 general-domain tasks, including instruction following, format following, and agent capabilities, to further strengthen the model's general abilities and correct undesired behaviors.

4 Hands-on with Qwen3

Using Qwen3 across different frameworks.

Note: Python 3.11+ is required.

transformers

A standard example of using Qwen3-30B-A3B with Hugging Face transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True # Switch between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)

To disable thinking mode, just change the enable_thinking parameter:

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # True is the default value for enable_thinking.
)

For deployment, you can use sglang>=0.4.6.post1 or vllm>=0.8.4 to create an OpenAI-compatible API endpoint:

SGLang:

python -m sglang.launch_server --model-path Qwen/Qwen3-30B-A3B --reasoning-parser qwen3

vLLM:

vllm serve Qwen/Qwen3-30B-A3B --enable-reasoning --reasoning-parser deepseek_r1

To disable thinking mode, remove the --reasoning-parser flag (and --enable-reasoning for vLLM).
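Once the server is up, any OpenAI-compatible client can talk to it. A minimal sketch of building the request body follows; the `chat_template_kwargs` pass-through for `enable_thinking` is an assumption about the serving framework's extra-body support, so check your vLLM/SGLang version's docs before relying on it.

```python
import json

def build_chat_request(prompt: str, thinking: bool = True) -> dict:
    """Build an OpenAI-style chat completion body for the local endpoint."""
    return {
        "model": "Qwen/Qwen3-30B-A3B",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
        # Framework-specific extension field (assumed), not core OpenAI API:
        "chat_template_kwargs": {"enable_thinking": thinking},
    }

body = build_chat_request("Give me a short introduction to large language models.",
                          thinking=False)
print(json.dumps(body, indent=2))
# POST this JSON to http://localhost:8000/v1/chat/completions with an HTTP
# client of your choice (e.g. requests, or the openai SDK pointed at the
# local base_url).
```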

Local development

You can run:

ollama run qwen3:30b-a3b

to interact with the model via Ollama. You can also use LM Studio, or libraries such as llama.cpp and ktransformers, for local development.

Advanced usage

A soft-switch mechanism lets users dynamically control the model's behavior when enable_thinking=True: add /think or /no_think to the user prompt or system message to toggle thinking mode turn by turn. In multi-turn conversations, the model follows the most recent instruction.

A multi-turn conversation example:

from transformers import AutoModelForCausalLM, AutoTokenizer

class QwenChatbot:
    def __init__(self, model_name="Qwen/Qwen3-30B-A3B"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(model_name)
        self.history = []

    def generate_response(self, user_input):
        messages = self.history + [{"role": "user", "content": user_input}]

        text = self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )

        inputs = self.tokenizer(text, return_tensors="pt")
        response_ids = self.model.generate(**inputs, max_new_tokens=32768)[0][len(inputs.input_ids[0]):].tolist()
        response = self.tokenizer.decode(response_ids, skip_special_tokens=True)

        # Update history
        self.history.append({"role": "user", "content": user_input})
        self.history.append({"role": "assistant", "content": response})

        return response

# Example Usage
if __name__ == "__main__":
    chatbot = QwenChatbot()

    # First input (without /think or /no_think tags, thinking mode is enabled by default)
    user_input_1 = "How many r's in strawberries?"
    print(f"User: {user_input_1}")
    response_1 = chatbot.generate_response(user_input_1)
    print(f"Bot: {response_1}")
    print("----------------------")

    # Second input with /no_think
    user_input_2 = "Then, how many r's in blueberries? /no_think"
    print(f"User: {user_input_2}")
    response_2 = chatbot.generate_response(user_input_2)
    print(f"Bot: {response_2}") 
    print("----------------------")

    # Third input with /think
    user_input_3 = "Really? /think"
    print(f"User: {user_input_3}")
    response_3 = chatbot.generate_response(user_input_3)
    print(f"Bot: {response_3}")

Agent example

Qwen3 excels at tool calling. Qwen-Agent is recommended to make the most of Qwen3's agent capabilities; it wraps tool-calling templates and parsers internally, greatly reducing code complexity.

To define the available tools, you can use an MCP configuration file, use Qwen-Agent's built-in tools, or integrate other tools yourself.

Install the dependencies (qwen_agent, uvx, dateutils):

# On macOS, install uv (which provides uvx)
# brew install uv
uv pip install mcp-server-time mcp-server-fetch
pip install "qwen-agent[code_interpreter]"

from qwen_agent.agents import Assistant

# Define LLM
llm_cfg = {
    'model': 'Qwen3-30B-A3B',

    # Use the endpoint provided by Alibaba Model Studio:
    # 'model_type': 'qwen_dashscope',
    # 'api_key': os.getenv('DASHSCOPE_API_KEY'),

    # Use a custom endpoint compatible with OpenAI API:
    'model_server': 'http://localhost:8000/v1',  # api_base
    'api_key': 'EMPTY',

    # Other parameters:
    # 'generate_cfg': {
    #         # Add: When the response content is ` this is the thought this is the answer;
    #         # Do not add: When the response has been separated by reasoning_content and content.
    #         'thought_in_content': True,
    #     },
}

# Define Tools
tools = [
    {'mcpServers': {  # You can specify the MCP configuration file
            'time': {
                'command': 'uvx',
                'args': ['mcp-server-time', '--local-timezone=Asia/Shanghai']
            },
            "fetch": {
                "command": "uvx",
                "args": ["mcp-server-fetch"]
            }
        }
    },
  'code_interpreter',  # Built-in tools
]

# Define Agent
bot = Assistant(llm=llm_cfg, function_list=tools)

# Streaming generation
messages = [{'role': 'user', 'content': 'https://qwenlm.github.io/blog/ Introduce the latest developments of Qwen'}]
for responses in bot.run(messages=messages):
    pass
print(responses)

The code adapted to run on a Mac (against a local LM Studio endpoint):

from qwen_agent.agents import Assistant

llm_cfg = {
    'model': 'qwen3-32b-mlx',
    'model_server': 'http://127.0.0.1:1234/v1',  # api_base
    'api_key': 'lm-studio',

}

tools = [
    {'mcpServers': {
            'time': {
                'command': 'uv',
                'args': ['run', 'mcp-server-time', '--local-timezone=Asia/Shanghai']
            },
            "fetch": {
                "command": "uv",
                "args": ["run", "mcp-server-fetch"]
            }
        }
    },
  'code_interpreter',
]

bot = Assistant(llm=llm_cfg, function_list=tools)

messages = [{'role': 'user', 'content': 'https://qwenlm.github.io/blog/ Introduce the latest developments of Qwen'}]
for responses in bot.run(messages=messages):
    pass
print(responses)

Response:

[
    {
        "role": "assistant",
        "content": " \nOkay, let's see. The user wants me to introduce the latest developments of Qwen based on the provided blog URL. First, I need to fetch the content from that URL. The function 'fetch-fetch' is available for this. I'll use it to get the blog's content.\n\nWait, the URL is https://qwenlm.github.io/blog/, so I'll input that into the fetch function. The parameters required are 'url', and optionally 'max_length', 'start_index', and 'raw'. Since the user probably wants a summary, I'll set 'max_length' to 5000 to get enough content without being too long. The default for 'raw' is false, which means it'll simplify the HTML, so that's good. \n\nOnce I fetch the content, I need to process it. The blog might have sections about updates, new features, performance improvements, etc. I'll need to parse the markdown content to extract key points. But since the fetch function returns the content as markdown, I can then summarize that.\n\nAfter fetching, I'll check the response. If it's successful, I'll read through the markdown to find recent posts or sections mentioning Qwen's latest developments. Common topics might include model architecture changes, training data updates, new capabilities like code writing or multi-language support, performance benchmarks, and use cases.\n\nI should structure the answer by highlighting major updates, maybe in bullet points. Also, note any significant improvements or new features mentioned in the blog. If there are specific dates or version numbers, include those for clarity. Make sure to mention if the blog talks about open-source releases or integrations with other tools.\n\nI should be careful not to include outdated information and focus on the latest developments as of the blog's publication date. If there are multiple posts, prioritize the most recent ones. Also, check for any technical details that might be relevant to users interested in Qwen's advancements.\n \n\n",
        "reasoning_content": ""
    },
    {
        "role": "assistant",
        "content": "",
        "reasoning_content": "",
        "function_call": {
            "name": "fetch-fetch",
            "arguments": "{\"url\":\"https://qwenlm.github.io/blog/\", \"max_length\":5000}"
        }
    },
    {
        "role": "function",
        "content": "Contents of https://qwenlm.github.io/blog/:\nQWEN CHAT GitHub Hugging Face ModelScope Kaggle DEMO DISCORD\nIntroduction Today, we are excited to announce the release of Qwen3, the latest addition to the Qwen family of large language models. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general capabilities, etc., when compared to other top-tier models such as DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro. Additionally, the small MoE model, Qwen3-30B-A3B, outcompetes QwQ-32B with 10 times of activated parameters, and even a tiny model like Qwen3-4B can rival the performance of Qwen2....\n\nQWEN CHAT GITHUB HUGGING FACE MODELSCOPE DISCORD\nIntroduction Last December, we launched QVQ-72B-Preview as an exploratory model, but it had many issues. Today, we are officially releasing the first version of QVQ-Max, our visual reasoning model. This model can not only “understand” the content in images and videos but also analyze and reason with this information to provide solutions. From math problems to everyday questions, from programming code to artistic creation, QVQ-Max has demonstrated impressive capabilities....\n\nQWEN CHAT HUGGING FACE MODELSCOPE DASHSCOPE GITHUB PAPER DEMO DISCORD\nWe release Qwen2.5-Omni, the new flagship end-to-end multimodal model in the Qwen series. Designed for comprehensive multimodal perception, it seamlessly processes diverse inputs including text, images, audio, and video, while delivering real-time streaming responses through both text generation and natural speech synthesis. To try the latest model, feel free to visit Qwen Chat and choose Qwen2.5-Omni-7B. 
The model is now openly available on Hugging Face, ModelScope, DashScope,and GitHub, with technical documentation available in our Paper....\n\nQWEN CHAT GITHUB HUGGING FACE MODELSCOPE DISCORD\nIntroduction At the end of January this year, we launched the Qwen2.5-VL series of models, which received widespread attention and positive feedback from the community. Building on the Qwen2.5-VL series, we continued to optimize the model using reinforcement learning and open-sourced the new VL model with the beloved 32B parameter scale under the Apache 2.0 license — Qwen2.5-VL-32B-Instruct. Compared to the previously released Qwen2....\n\nQWEN CHAT Hugging Face ModelScope DEMO DISCORD\nScaling Reinforcement Learning (RL) has the potential to enhance model performance beyond conventional pretraining and post-training methods. Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models. For instance, DeepSeek R1 has achieved state-of-the-art performance by integrating cold-start data and multi-stage training, enabling deep thinking and complex reasoning.\nOur research explores the scalability of Reinforcement Learning (RL) and its impact on enhancing the intelligence of large language models....",
        "reasoning_content": "",
        "name": "fetch-fetch"
    },
    {
        "role": "assistant",
        "content": " \nOkay, let me try to figure out how to approach this. The user asked about the latest developments of Qwen, and I used the fetch tool to get the blog content. Now I need to parse the response from the tool and summarize the key points.\n\nFirst, looking at the fetched content, there are several sections. The first one mentions Qwen3 with a 235B-A22B model, which is the flagship. It says it's competitive with other top models in coding, math, and general capabilities. There's also a smaller MoE model, Qwen3-30B-A3B, and even a 4B version.\n\nNext section talks about QVQ-Max, the visual reasoning model. It can understand images and videos, analyze them, and provide solutions for math problems to artistic creation. This is an official release after a preview version with issues.\n\nThen there's Qwen2.5-Omni, the end-to-end multimodal model that handles text, images, audio, video, and does real-time streaming with speech synthesis. Available on platforms like Hugging Face.\n\nAnother part is about Qwen2.5-VL-32B-Instruct, optimized with reinforcement learning and open-sourced under Apache 2.0 license.\n\nLastly, there's a mention of Scaling Reinforcement Learning (RL) in their research to improve model performance beyond pretraining.\n\nI need to structure these points clearly. Start with the latest models released, their features, performance compared to others, multimodal capabilities, open-source availability, and research advancements. Make sure to highlight key numbers like 235B parameters, 4B for smaller models, and the platforms where they're available. Also note the open-source licenses and any notable achievements like outperforming other models.\n \n\nThe latest developments in Qwen, as outlined on the official blog, include:\n\n1. 
**Qwen3 Flagship Models**:\n   - Launched **Qwen3-235B-A22B**, a 235-billion-parameter model with competitive performance in coding, math, and general knowledge compared to top models like DeepSeek-R1, o1, Grok-3, and Gemini-2.5-Pro.\n   - Introduced a **small MoE variant (Qwen3-30B-A3B)** with 30B parameters and **Qwen3-4B**, a compact model that rivals the performance of Qwen2 while maintaining efficiency.\n\n2. **Visual Reasoning with QVQ-Max**:\n   - Released the first stable vision-language (VL) model **QVQ-Max**, which can interpret images/videos and perform reasoning for tasks ranging from math problems to creative applications. This follows the preview QVQ-72B-Preview but with improved stability and capabilities.\n\n3. **Multimodal Model Qwen2.5-Omni**:\n   - Launched **Qwen2.5-Omni-7B**, an end-to-end multimodal model that processes text, images, audio, and video with real-time streaming responses (text + speech synthesis). Available for public use on Hugging Face, ModelScope, and GitHub.\n\n4. **Enhanced VL Model Qwen2.5-VL-32B-Instruct**:\n   - Optimized the **Qwen2.5-VL series** using reinforcement learning (RL), open-sourced under Apache 2.0 license, and expanded to a 32B parameter scale for improved vision-language reasoning.\n\n5. **Research in Scaling Reinforcement Learning**:\n   - Explored RL to enhance model intelligence beyond traditional pretraining, with techniques inspired by DeepSeek R1’s success in multi-stage training for complex reasoning tasks.\n\nAll models are accessible via platforms like Hugging Face, ModelScope, and GitHub. Notably, Qwen3-4B and the open-sourced VL models prioritize accessibility without compromising performance.",
        "reasoning_content": ""
    }
]
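The run returns the full message list, including intermediate thoughts and function calls. A small helper (illustrative only, not part of Qwen-Agent's API) can pull out just the final assistant answer:

```python
def final_answer(messages: list[dict]) -> str:
    """Return the last non-empty assistant message from a bot.run() result."""
    for msg in reversed(messages):
        if msg.get("role") == "assistant" and msg.get("content", "").strip():
            return msg["content"]
    return ""

# Abbreviated stand-in for the response list shown above.
responses = [
    {"role": "assistant", "content": "Okay, let's see ...", "reasoning_content": ""},
    {"role": "assistant", "content": "",
     "function_call": {"name": "fetch-fetch", "arguments": "{}"}},
    {"role": "function", "content": "Contents of the blog ...", "name": "fetch-fetch"},
    {"role": "assistant", "content": "The latest developments in Qwen include ...",
     "reasoning_content": ""},
]
print(final_answer(responses))  # -> "The latest developments in Qwen include ..."
```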

5 Friends of Qwen

More new friends are welcome to join the community and improve together!

6 Future work

Qwen3 marks an important milestone on our journey toward artificial general intelligence (AGI) and artificial superintelligence (ASI). By scaling up both pretraining and reinforcement learning, we have achieved a higher level of intelligence. We seamlessly integrated thinking and non-thinking modes, giving users flexible control over the thinking budget, and extended support to many more languages, helping more users worldwide.

We plan to improve our models along several dimensions, optimizing architectures and training methods toward key goals: scaling data, increasing model size, extending context length, broadening modalities, and advancing reinforcement learning with environment feedback for long-horizon reasoning.

We are transitioning from an era focused on training models to one centered on training agents. The next iteration will bring meaningful improvements to everyone's work and life.

This article is archived in the GitHub repo Java-Interview-Tutorial. Follow along for the next installment in this series!


