Qwen3
Open weightsby Alibaba Cloud·China·Released
Alibaba's latest open-weights generation — dense + MoE variants with hybrid reasoning mode.
About this model
Qwen3 (April 2025) is Alibaba Cloud's latest open-weights generation, shipping eight variants across two architectures: dense (0.6B → 32B) and Mixture-of-Experts (30B-A3B and 235B-A22B). The headline feature is a unified 'hybrid thinking' mode — a single model can flip between fast non-thinking responses and deeper chain-of-thought reasoning controlled by a flag in the prompt, similar to Claude Opus 4's extended thinking but exposed differently.
Qwen3-235B-A22B competes with closed frontier models on most benchmarks (MMLU-Pro 75%, AIME 2025 81.5%) while shipping under Apache 2.0. The dense 32B variant is particularly popular for fine-tuning given its size class. Alibaba has positioned Qwen3 as the lab's serious bet on owning the global open-weights conversation alongside DeepSeek and Meta.
Strengths
- •Apache 2.0 — most permissive license in its quality tier
- •Hybrid thinking mode toggleable per request
- •Eight variants covering 0.6B → 235B param scale
- •Strong multilingual: 119 languages supported
- •Tight fine-tuning ecosystem (LoRA / QLoRA / vLLM)
Limitations
- •Hybrid thinking adds prompt complexity vs single-mode models
- •MoE serving still requires specialist infra (vLLM, SGLang)
- •English chat quality marginally trails Llama 3.3 70B on subjective tests
When to use it
- →Open-weights deployments needing top-tier reasoning
- →Cost-sensitive serving where Apache 2.0 matters
- →Multilingual applications (119 languages)
- →Fine-tuning for vertical specialisation
Architecture & training
Qwen3 is trained on Alibaba's PAI infrastructure. The technical report (arXiv 2505.09388) details a three-stage process: standard pretraining, mid-training on long-context + reasoning data, and post-training using both SFT and RLHF. The MoE variants use 128 experts with top-8 routing. Hybrid thinking mode is achieved via a fine-tuned chat template that responds to a `/think` or `/no_think` flag.
Benchmarks
| Benchmark | Score | Bar |
|---|---|---|
| MMLU-Pro | 75.2 | |
| AIME 2025 | 81.5 | |
| LiveCodeBench | 70.7 |