GPT-4o

by OpenAI·USA·Released May 13, 2024

OpenAI's multimodal flagship — text, vision, audio, and image in one model.

textvisionaudiocodechattoolslong-contextvoice

Vendor site

— · 0 reviews

About this model

GPT-4o (released May 2024) was OpenAI's first model with native voice, vision, and text in a single network. The 'o' stands for 'omni' — it can accept any combination of those modalities as input and respond in voice or text in real time. ChatGPT's Advanced Voice Mode is powered by GPT-4o.

On text tasks GPT-4o is broadly comparable to Claude 3.5 Sonnet — 88.7% MMLU, 90.2% HumanEval — with the multimodal capability as the main differentiator. Pricing has dropped multiple times since launch and currently sits at $2.50/M input, $10/M output as of the most recent revision.

For text-only workloads, newer specialist models (GPT-4.1 for long context, o1/o3-mini for reasoning) often outperform GPT-4o. But for any workflow that needs vision + voice + text in one model, GPT-4o remains the default OpenAI choice.

Strengths

•Native multimodal: voice, vision, text, image generation in one model
•Real-time Advanced Voice Mode in ChatGPT
•Aggressive price drops since launch — now $2.50/M input
•Mature ecosystem of fine-tuning and tool support

Limitations

•Beaten by GPT-4.1 on coding and long-context tasks
•Beaten by o1/o3-mini on reasoning-heavy tasks
•128K context — smaller than GPT-4.1 (1M) or Gemini 2.5 Pro (2M)
•No native video generation (Sora is a separate model)

When to use it

→Voice-enabled assistants (ChatGPT Advanced Voice Mode)
→Multimodal chat with image upload and analysis
→Customer support combining voice + vision
→General-purpose default when one model needs all modalities

Architecture & training

OpenAI has not disclosed parameter count for GPT-4o. The technical innovation is the unified end-to-end architecture: voice input is tokenised directly into the same model rather than routed through a separate ASR system, which is why response latency is much lower than the original GPT-4-with-Whisper voice pipeline. Post-training uses RLHF and follows OpenAI's standard model spec.

Benchmarks

Benchmark	Score	Bar
MATH	76.6
MMLU	88.7
HumanEval	90.2

GPT-4o

About this model

Strengths

Limitations

When to use it

Architecture & training

Benchmarks

Reviews · 0

Stories about GPT-4o

OpenAI Outlines Strategies for AI Investments and Business Models in New Blog Series

OpenAI Sitemap Shows Widespread Business, Partner, and Academy Page Updates

OpenAI Blog Links Reveal Wave of Enterprise AI Case Studies and Internal Tools

OpenAI's Latest Blog Posts Show AI Tackling Diseases and Black Holes

Compare against

GLM-4.5

Qwen3-Coder

Kimi K2

MiniMax-M1

About this model

✓ Strengths

× Limitations

When to use it

Architecture & training

Benchmarks

Reviews · 0

Stories about GPT-4o

OpenAI Outlines Strategies for AI Investments and Business Models in New Blog Series

OpenAI Sitemap Shows Widespread Business, Partner, and Academy Page Updates

OpenAI Blog Links Reveal Wave of Enterprise AI Case Studies and Internal Tools

OpenAI's Latest Blog Posts Show AI Tackling Diseases and Black Holes

Compare against

GLM-4.5

Qwen3-Coder

Kimi K2

MiniMax-M1

Strengths

Limitations