WED, 03 JUN 2026 · 18:33:33 UTC

o1

by OpenAI · USA · Released

OpenAI's reasoning flagship: chain-of-thought reasoning trained via large-scale RL.

text · vision · reasoning · math · code

About this model

o1 (December 2024, following the o1-preview release in September 2024) was OpenAI's first publicly released reasoning model, trained via large-scale reinforcement learning to spend more time 'thinking' before responding. The model generates a long internal chain-of-thought (hidden from the user) and then produces a final answer. On competition math (AIME 2024) o1 scored 83.3%, vs ~13% for GPT-4o, which is evidence that scaling test-time compute pays off.

o1's reasoning trace is not visible to the user — OpenAI hides it both for product clarity and to discourage distillation of the chain-of-thought into competitor models. The hidden tokens are still billed at the output rate, which can make o1 surprisingly expensive on hard problems.
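The billing effect described above can be made concrete with a little arithmetic. The sketch below is illustrative only: the per-token prices, the `Usage` type, and the token counts are assumptions chosen for the example, not figures from this page, so check the vendor's current price list before relying on them.

```python
from dataclasses import dataclass

# Illustrative prices in USD per one million tokens. These are assumptions
# for the arithmetic only, not values taken from this page.
INPUT_PRICE_PER_M = 15.00
OUTPUT_PRICE_PER_M = 60.00


@dataclass
class Usage:
    """Hypothetical per-request token counts."""
    input_tokens: int
    reasoning_tokens: int  # hidden chain-of-thought, never shown to the user
    answer_tokens: int     # visible final answer


def estimate_cost(usage: Usage) -> float:
    """Return the estimated request cost in USD.

    Hidden reasoning tokens are charged at the output rate, so they are
    added to the visible answer tokens before pricing.
    """
    billed_output = usage.reasoning_tokens + usage.answer_tokens
    input_cost = usage.input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = billed_output / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


def hidden_share(usage: Usage) -> float:
    """Fraction of the total cost attributable to hidden reasoning tokens."""
    total = estimate_cost(usage)
    hidden = usage.reasoning_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return hidden / total if total else 0.0


if __name__ == "__main__":
    hard_query = Usage(input_tokens=1_000, reasoning_tokens=20_000, answer_tokens=500)
    print(f"estimated cost: ${estimate_cost(hard_query):.3f}")
    print(f"hidden share:   {hidden_share(hard_query):.0%}")
```

Under these assumed numbers, a request whose visible answer is only 500 tokens is dominated by the 20,000 hidden reasoning tokens, which is why hard problems can cost far more than the length of the reply suggests.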

o1 doesn't support function calling or streaming in the same way as GPT-4o; it is a 'think, then answer' model rather than a conversational agent. For agent workloads, o3-mini or GPT-4.1 is usually a better choice.

Strengths

  • Massive leap on competition math (AIME, MATH benchmarks)
  • Strong on PhD-level science (GPQA Diamond at 78%)
  • First publicly released model to demonstrate that test-time compute scaling works

Limitations

  • Function calling and streaming are more limited than in GPT-4o; not built for agent workflows
  • Hidden reasoning tokens still billed at output rate; hard queries get expensive
  • Slow: typical responses take 10-60 seconds
  • Beaten by o3-mini on most benchmarks at a fraction of the cost

When to use it

  • Hard math and competition-style problems
  • PhD-level scientific reasoning
  • Code generation requiring careful step-by-step thinking
  • One-shot answers where latency doesn't matter
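The guidance in the lists above can be condensed into a small routing heuristic. This is a minimal sketch, not a recommendation from the vendor: the function name `suggest_model`, its parameters, and the 60-second threshold (taken from the upper end of the "10-60 seconds" latency range quoted above) are all hypothetical choices made for illustration.

```python
# Upper end of the typical o1 response time quoted on this page, in seconds.
# Using it as a cut-off is an assumption made for this sketch.
SLOW_RESPONSE_S = 60.0


def suggest_model(needs_tool_calls: bool,
                  latency_budget_s: float,
                  hard_reasoning: bool) -> str:
    """Suggest a model for a workload, following the page's guidance.

    needs_tool_calls: the workload is an agent loop that calls functions.
    latency_budget_s: how long the caller can wait for one response.
    hard_reasoning:   the task is competition-style math, PhD-level
                      science, or code that needs step-by-step thinking.
    """
    # Agent workloads: the page points to o3-mini or GPT-4.1 instead of o1.
    if needs_tool_calls:
        return "o3-mini or GPT-4.1"
    # One-shot hard problems where a slow answer is acceptable.
    if hard_reasoning and latency_budget_s >= SLOW_RESPONSE_S:
        return "o1"
    # Otherwise prefer the cheaper model that the page says beats o1
    # on most benchmarks.
    return "o3-mini"


if __name__ == "__main__":
    print(suggest_model(needs_tool_calls=False, latency_budget_s=120, hard_reasoning=True))
    print(suggest_model(needs_tool_calls=True, latency_budget_s=120, hard_reasoning=True))
```

The tool-call check comes first on purpose: a workload that needs function calling is routed away from o1 even when the problem is hard, since that is the limitation the page treats as disqualifying for agents.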

Architecture & training

OpenAI's published o1 system card describes the model as 'trained with reinforcement learning to think before responding.' The key innovation is large-scale RL on chain-of-thought generation: the model learns to produce longer, more useful internal reasoning traces. Architecture details are not disclosed. The o1 family was followed in 2025 by o3-mini (January) and then o3 (April), which extended the approach with significantly improved math and code performance.

Benchmarks

Benchmark       Score (%)
AIME 2024       83.3
GPQA Diamond    78.0
MATH            94.8
