DeepSeek R1
Open weightsby DeepSeek·China·Released
Open-weights reasoning model — o1-comparable quality with full chain-of-thought visible.
About this model
DeepSeek R1 (January 2025) was DeepSeek's answer to OpenAI's o1 — and the first open-weights reasoning model to reach o1-comparable quality. R1 is built on the same 671B MoE backbone as V3 but post-trained with large-scale RL on chain-of-thought generation.
Unlike OpenAI's o-series, R1's full reasoning trace is visible to users (which OpenAI hides). The reasoning traces have become a popular dataset for distilling reasoning capability into smaller open-weights models — DeepSeek released several R1-distilled variants (Qwen-based and Llama-based) alongside the main model.
Released under MIT license — the most permissive license used by any frontier-class model. The combination of R1's release timing, open weights, and low API pricing triggered a substantial market reaction and ongoing industry rethinking of competitive moats.
Strengths
- •o1-comparable reasoning quality with open weights
- •MIT license — most permissive of any frontier-class model
- •Visible reasoning traces — usable as training data for smaller models
- •Cheap via DeepSeek API ($2.19/M output)
- •R1-distilled variants extend reasoning to Qwen and Llama base models
Limitations
- •64K context — smaller than top frontier
- •Reasoning traces can be very long, making total token cost unpredictable
- •Same Chinese-origin procurement friction as V3
- •Less polished UI / developer ergonomics than OpenAI's o-series
When to use it
- →Math and competition-style reasoning at lower cost than o1
- →Open-weights reasoning research and distillation experiments
- →Self-hosted reasoning agents under permissive license
- →Educational tools needing visible chain-of-thought
Architecture & training
DeepSeek's R1 technical report is one of the most-cited papers of 2025 for its detailed account of post-training methodology — including a 'cold-start' SFT phase on long reasoning traces, followed by large-scale RL using a rule-based reward model that emphasises correctness and reasoning-trace coherence. The paper also documents the distillation procedure used for the smaller R1-Distill variants.
Benchmarks
| Benchmark | Score | Bar |
|---|---|---|
| AIME | 79.8 | |
| GPQA | 71.5 | |
| MATH | 97.3 | |
| Codeforces | 2029.0 |