GPT-4.1

Superseded

by OpenAI·USA·Released Apr 14, 2025

OpenAI's coding-focused refresh with a full 1M-token context window.

textvisioncodechattoolslong-context

Vendor site

— · 0 reviews

About this model

GPT-4.1 (April 2025) was OpenAI's coding-focused refresh. The headline feature is a full 1M-token context window — up from 128K in GPT-4o — which matches Gemini 2.5 Pro and unlocks workflows like whole-codebase analysis. The model is also priced lower than GPT-4o ($2/M input, $8/M output) despite the larger context.

On SWE-bench Verified, GPT-4.1 scored 54.6% at launch — a meaningful jump over GPT-4o's mid-30s but still behind Claude Sonnet 4 (72.7%) and DeepSeek V3 (42%). OpenAI positioned GPT-4.1 explicitly as a workhorse for production coding workflows, with GPT-4.1 mini and nano variants for cheaper tiers.

Strengths

•1M-token context — full GPT-4-era multimodal API expanded
•Cheaper than GPT-4o at $2/M input despite larger context
•Strong on instruction-following and structured outputs
•GPT-4.1 mini and nano variants extend the family downward

Limitations

•SWE-bench score (54.6%) trails Claude Sonnet 4 and Opus 4 substantially
•No native audio (use GPT-4o or Whisper for voice)
•Closed weights, no fine-tuning access for the full 1M context

When to use it

→Whole-codebase analysis and refactor planning
→Long-document Q&A (200K+ token inputs)
→Structured-output pipelines (JSON schema generation)
→Production coding agents where Claude isn't an option

Architecture & training

OpenAI has not disclosed the architecture beyond noting that GPT-4.1 was trained 'to better serve developers' with explicit weighting toward coding tasks in the RLHF stage. The 1M-context capability uses positional encoding extensions similar to those documented in earlier OpenAI research on long-context inference; OpenAI cautions in the model card that quality degrades somewhat past the 200K mark.

Benchmarks

Benchmark	Score	Bar
MMLU	88.7
HumanEval	91.5

Reviews · 0

Stories about GPT-4.1

OpenAI BlogJul 18, 2026

Compare against

All models →

GLM-4.5

China

Zhipu's flagship — agentic-first MoE with strong coding + tool-use benchmarks.

Zhipu AI128K ctx$0.60 / $2.20Open

Qwen3-Coder

China

Open-weights coding specialist — 480B MoE, agentic by design.

Alibaba Cloud256K ctx$0.60 / $2.40Open

Kimi K2

China

Open-weights 1T-parameter MoE — agentic, long-context, the model behind Kimi Chat.

Moonshot AI128K ctx$0.60 / $2.50Open

MiniMax-M1

China

Open-weights 456B MoE with the largest free-tier context window in production: 1M tokens.

MiniMax1M ctx$0.30 / $1.65Open

GPT-4.1