Qwen3

Open weights

by Alibaba Cloud·China·Released Apr 29, 2025

Alibaba's latest open-weights generation — dense + MoE variants with hybrid reasoning mode.

textcodechatreasoningagentstoolslong-context

Vendor site Paper

— · 0 reviews

About this model

Qwen3 (April 2025) is Alibaba Cloud's latest open-weights generation, shipping eight variants across two architectures: dense (0.6B → 32B) and Mixture-of-Experts (30B-A3B and 235B-A22B). The headline feature is a unified 'hybrid thinking' mode — a single model can flip between fast non-thinking responses and deeper chain-of-thought reasoning controlled by a flag in the prompt, similar to Claude Opus 4's extended thinking but exposed differently.

Qwen3-235B-A22B competes with closed frontier models on most benchmarks (MMLU-Pro 75%, AIME 2025 81.5%) while shipping under Apache 2.0. The dense 32B variant is particularly popular for fine-tuning given its size class. Alibaba has positioned Qwen3 as the lab's serious bet on owning the global open-weights conversation alongside DeepSeek and Meta.

Strengths

•Apache 2.0 — most permissive license in its quality tier
•Hybrid thinking mode toggleable per request
•Eight variants covering 0.6B → 235B param scale
•Strong multilingual: 119 languages supported
•Tight fine-tuning ecosystem (LoRA / QLoRA / vLLM)

Limitations

•Hybrid thinking adds prompt complexity vs single-mode models
•MoE serving still requires specialist infra (vLLM, SGLang)
•English chat quality marginally trails Llama 3.3 70B on subjective tests

When to use it

→Open-weights deployments needing top-tier reasoning
→Cost-sensitive serving where Apache 2.0 matters
→Multilingual applications (119 languages)
→Fine-tuning for vertical specialisation

Architecture & training

Qwen3 is trained on Alibaba's PAI infrastructure. The technical report (arXiv 2505.09388) details a three-stage process: standard pretraining, mid-training on long-context + reasoning data, and post-training using both SFT and RLHF. The MoE variants use 128 experts with top-8 routing. Hybrid thinking mode is achieved via a fine-tuned chat template that responds to a `/think` or `/no_think` flag.

Benchmarks

Benchmark	Score	Bar
MMLU-Pro	75.2
AIME 2025	81.5
LiveCodeBench	70.7

Qwen3

About this model

Strengths

Limitations

When to use it

Architecture & training

Benchmarks

Reviews · 0

Stories about Qwen3

Qwen Debuts Safety Guardrail Model and Expands AI Toolkit

Alibaba's Qwen team launches Qwen3Guard, a safety classifier for live AI outputs

Compare against

Qwen3-Coder

Qwen 2.5-Max

Qwen 2.5-72B Instruct

GLM-4.5

About this model

✓ Strengths

× Limitations

When to use it

Architecture & training

Benchmarks

Reviews · 0

Stories about Qwen3

Qwen Debuts Safety Guardrail Model and Expands AI Toolkit

Alibaba's Qwen team launches Qwen3Guard, a safety classifier for live AI outputs

Compare against

Qwen3-Coder

Qwen 2.5-Max

Qwen 2.5-72B Instruct

GLM-4.5

Strengths

Limitations