WED, 03 JUN 2026 · 18:34:03 UTC

Grok 3

by xAI·USA·Released

xAI's flagship with real-time X integration and a Think reasoning mode.

textvisionchatreasoningtools
Vendor site
· 0 reviews

About this model

Grok 3 (February 2025) is xAI's flagship — released after the team rapidly scaled their Colossus supercluster to 100K+ Nvidia H100 GPUs (later expanded toward 200K). The model ships with a 'Think' reasoning mode that's roughly analogous to OpenAI's o-series and Google's Gemini Thinking.

Grok 3 is integrated with X (formerly Twitter) — the model has access to real-time public posts, search results, and trending topics, making it uniquely strong on current-events questions where other models are constrained by training cutoffs.

At launch, Grok 3 scored 93.3% on AIME 2025 (in Think mode), making it briefly the top model on competition math. The lead has since narrowed as competitors released their own reasoning models, but Grok 3 remains a strong tier-1 flagship.

Strengths

  • Real-time X integration — uniquely strong on current events
  • Think mode delivers strong competition-math scores (93.3% AIME 2025)
  • Fast inference courtesy of the Colossus supercluster
  • Looser content moderation than competitors — answers questions others refuse

Limitations

  • 128K context — smaller than GPT-4.1 (1M) and Gemini 2.5 (2M)
  • Tied to X Premium+ ecosystem; standalone API less mature than OpenAI / Anthropic
  • Lighter safety training is a feature or a bug depending on use case
  • Limited enterprise compliance certifications

When to use it

  • Real-time news and social-media analysis
  • Current-events Q&A leveraging X integration
  • Competitive-math and STEM tutoring (Think mode)
  • Use cases where stricter content moderation causes friction

Architecture & training

Trained on xAI's Colossus supercluster — built in Memphis in approximately 122 days, scaled to 100K H100 GPUs at the time of the Grok 3 training run. xAI has not disclosed architecture details but has confirmed Grok 3 uses a Mixture-of-Experts design. Post-training is described as 'minimal RLHF, primarily for harmful-output reduction' — explicitly less safety tuning than competitors, by design.

Benchmarks

BenchmarkScoreBar
AIME93.3
GPQA84.6
LiveCodeBench79.4

Reviews · 0

Sign in to leave a rating.

Compare against

All models →