WED, 03 JUN 2026 · 18:35:42 UTC

GPT-4.1

Superseded

by OpenAI·USA·Released

OpenAI's coding-focused refresh with a full 1M-token context window.

textvisioncodechattoolslong-context
Vendor site
· 0 reviews

About this model

GPT-4.1 (April 2025) was OpenAI's coding-focused refresh. The headline feature is a full 1M-token context window — up from 128K in GPT-4o — which matches Gemini 2.5 Pro and unlocks workflows like whole-codebase analysis. The model is also priced lower than GPT-4o ($2/M input, $8/M output) despite the larger context.

On SWE-bench Verified, GPT-4.1 scored 54.6% at launch — a meaningful jump over GPT-4o's mid-30s but still behind Claude Sonnet 4 (72.7%) and DeepSeek V3 (42%). OpenAI positioned GPT-4.1 explicitly as a workhorse for production coding workflows, with GPT-4.1 mini and nano variants for cheaper tiers.

Strengths

  • 1M-token context — full GPT-4-era multimodal API expanded
  • Cheaper than GPT-4o at $2/M input despite larger context
  • Strong on instruction-following and structured outputs
  • GPT-4.1 mini and nano variants extend the family downward

Limitations

  • SWE-bench score (54.6%) trails Claude Sonnet 4 and Opus 4 substantially
  • No native audio (use GPT-4o or Whisper for voice)
  • Closed weights, no fine-tuning access for the full 1M context

When to use it

  • Whole-codebase analysis and refactor planning
  • Long-document Q&A (200K+ token inputs)
  • Structured-output pipelines (JSON schema generation)
  • Production coding agents where Claude isn't an option

Architecture & training

OpenAI has not disclosed the architecture beyond noting that GPT-4.1 was trained 'to better serve developers' with explicit weighting toward coding tasks in the RLHF stage. The 1M-context capability uses positional encoding extensions similar to those documented in earlier OpenAI research on long-context inference; OpenAI cautions in the model card that quality degrades somewhat past the 200K mark.

Benchmarks

BenchmarkScoreBar
MMLU88.7
HumanEval91.5

Reviews · 0

Sign in to leave a rating.

Stories about GPT-4.1

More →

Compare against

All models →