DeepSeek R1

Open weights

by DeepSeek · China · Released January 2025

Open-weights reasoning model — o1-comparable quality with full chain-of-thought visible.

text · code · reasoning · math

About this model

DeepSeek R1 (January 2025) was DeepSeek's answer to OpenAI's o1, and the first open-weights reasoning model to reach o1-comparable quality. R1 is built on the same 671B-parameter mixture-of-experts (MoE) backbone as V3, but is post-trained with large-scale reinforcement learning (RL) on chain-of-thought generation.

Unlike OpenAI's o-series, which hides its reasoning, R1 exposes its full reasoning trace to users. These traces have become a popular source of training data for distilling reasoning capability into smaller open-weights models; DeepSeek released several R1-distilled variants (Qwen-based and Llama-based) alongside the main model.
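As a minimal sketch of working with a visible trace: the open-weights R1 checkpoints emit their reasoning between `<think>` and `</think>` tags before the final answer. The helper name `split_reasoning` and the sample string are illustrative assumptions, and a hosted API may return the trace in a separate field instead.

```python
import re

# Assumption: the raw completion wraps its reasoning in <think>...</think>.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(completion: str) -> tuple[str, str]:
    """Return (reasoning_trace, final_answer) from one raw completion."""
    match = THINK_RE.search(completion)
    if match is None:
        # No trace found: treat the whole completion as the answer.
        return "", completion.strip()
    reasoning = match.group(1).strip()
    answer = completion[match.end():].strip()
    return reasoning, answer


# Hypothetical completion, for illustration only.
raw = "<think>2 + 2 is 4, and 4 * 3 is 12.</think>\nThe answer is 12."
trace, answer = split_reasoning(raw)
print("trace: ", trace)
print("answer:", answer)
```

Keeping the trace and the answer separate makes it straightforward to show one and log or hide the other.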

R1 is released under the MIT license, one of the most permissive licenses used for any frontier-class model. The combination of R1's release timing, open weights, and low API pricing triggered a substantial market reaction and an ongoing industry rethink of competitive moats.

Strengths

  • o1-comparable reasoning quality with open weights
  • MIT license, one of the most permissive used for any frontier-class model
  • Visible reasoning traces — usable as training data for smaller models
  • Cheap via the DeepSeek API ($2.19 per million output tokens)
  • R1-distilled variants extend reasoning to Qwen and Llama base models

Limitations

  • 64K-token context window, smaller than top frontier models
  • Reasoning traces can be very long, making total token cost unpredictable
  • Same procurement friction as V3 for organizations that restrict Chinese-origin models
  • Less polished UI / developer ergonomics than OpenAI's o-series
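To make the cost-unpredictability point concrete, here is a rough sketch that prices one response at the $2.19 per million output tokens quoted under Strengths. It assumes reasoning tokens are billed as output tokens and ignores input-token charges. The function name and the token counts are hypothetical.

```python
# Output price quoted in the Strengths list, in USD per million tokens.
OUTPUT_PRICE_PER_MILLION_USD = 2.19


def output_cost_usd(reasoning_tokens: int, answer_tokens: int) -> float:
    """Output-side cost of one response.

    Assumption: reasoning-trace tokens are billed at the output rate.
    """
    total_tokens = reasoning_tokens + answer_tokens
    return total_tokens * OUTPUT_PRICE_PER_MILLION_USD / 1_000_000


# Hypothetical trace lengths for the same 300-token final answer.
for reasoning_tokens in (500, 5_000, 30_000):
    cost = output_cost_usd(reasoning_tokens, answer_tokens=300)
    print(f"{reasoning_tokens:>6} reasoning tokens -> ${cost:.4f}")
```

Because the trace length is decided by the model at generation time, the same prompt can land anywhere in this range. Capping the maximum output length is one way to bound the worst case.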

When to use it

  • Math and competition-style reasoning at lower cost than o1
  • Open-weights reasoning research and distillation experiments
  • Self-hosted reasoning agents under permissive license
  • Educational tools needing visible chain-of-thought
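For the distillation use case, a simplified sketch of turning sampled traces into supervised fine-tuning records might look like the following. This is not DeepSeek's pipeline: the field names (`prompt`, `trace`, `answer`, `reference`), the exact-match filter, and the sample data are all assumptions made for illustration.

```python
import json


def build_sft_records(samples: list[dict]) -> list[dict]:
    """Keep samples whose final answer matches the reference, formatted for SFT."""
    records = []
    for sample in samples:
        if sample["answer"].strip() != sample["reference"].strip():
            continue  # drop traces that led to a wrong final answer
        completion = f"<think>\n{sample['trace']}\n</think>\n{sample['answer']}"
        records.append({"prompt": sample["prompt"], "completion": completion})
    return records


# Hypothetical teacher samples for one prompt.
samples = [
    {"prompt": "What is 3 * 4?", "trace": "3 * 4 = 12.", "answer": "12", "reference": "12"},
    {"prompt": "What is 3 * 4?", "trace": "3 + 4 = 7.", "answer": "7", "reference": "12"},
]

# Emit one JSON object per line (JSONL), a common fine-tuning input format.
for record in build_sft_records(samples):
    print(json.dumps(record))
```

Filtering on the final answer keeps wrong traces out of the student's training data, at the price of discarding prompts the teacher never solves.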

Architecture & training

DeepSeek's R1 technical report is one of the most-cited papers of 2025 for its detailed account of post-training methodology: a 'cold-start' SFT phase on long reasoning traces, followed by large-scale RL driven primarily by rule-based rewards that check answer correctness and output format. The paper also documents the distillation procedure used for the smaller R1-Distill variants.
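A rule-based reward of this kind can be sketched in a few lines. The version below is an illustration, not the paper's implementation: the 0.5 and 1.0 weights, the `\boxed{...}` answer convention, and the function name are assumptions.

```python
import re

# Format rule: a <think>...</think> block followed by a non-empty answer.
FORMAT_RE = re.compile(r"^<think>.+?</think>\s*\S.*$", re.DOTALL)
# Answer rule: the final answer is expected inside \boxed{...} (assumed convention).
ANSWER_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def rule_based_reward(completion: str, reference: str) -> float:
    """Score one completion with fixed rules; no learned reward model is involved."""
    format_ok = FORMAT_RE.match(completion.strip()) is not None
    # Only look for the answer after the reasoning block, if there is one.
    answer_part = completion.rpartition("</think>")[2]
    found = ANSWER_RE.findall(answer_part)
    correct = bool(found) and found[-1].strip() == reference.strip()
    # Hypothetical weights: 0.5 for format, 1.0 for correctness.
    return 0.5 * format_ok + 1.0 * correct


good = "<think>3 * 4 = 12</think> \\boxed{12}"
wrong = "<think>3 * 4 = 13</think> \\boxed{13}"
print(rule_based_reward(good, "12"))
print(rule_based_reward(wrong, "12"))
```

Because both checks are deterministic string rules, the reward is cheap to compute at scale and harder to exploit than a learned scorer. The trade-off is that it only covers tasks with a verifiable answer.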

Benchmarks

Benchmark     Score
AIME          79.8
GPQA          71.5
MATH          97.3
Codeforces    2029.0
