WED, 03 JUN 2026 · 17:45:26 UTC

Kimi K2

Open weights

by Moonshot AI·China·Released

Open-weights 1T-parameter MoE — agentic, long-context, the model behind Kimi Chat.

textcodechatreasoningagentstoolslong-context
Vendor site Paper
· 0 reviews

About this model

Kimi K2 (July 2025) is Moonshot AI's flagship — a 1T-parameter Mixture-of-Experts (32B activated per token) that the lab open-sourced under a modified MIT license, free for commercial use. K2 is the model behind the Kimi consumer chat app (popular in China) and is increasingly used internationally for its strong agentic + coding capabilities.

K2 was notable on release for its 65.8% SWE-bench Verified score — the highest open-weights coding agent benchmark at the time, narrowing the gap to Claude Sonnet 4 considerably. The Moonshot team has emphasised 'agentic capability' as the primary design goal, which shows in the model's tool-use behaviour.

aigpt itself uses Kimi (via OpenRouter) for the news-rewrite and tool-extraction pipelines.

Strengths

  • Top open-weights SWE-bench Verified score at launch (65.8%)
  • 1T-parameter MoE with permissive licensing
  • Strong agentic + tool-use behaviour by design
  • Aggressive pricing via the official Moonshot API
  • Strong Chinese + English bilingual capability

Limitations

  • Higher latency from US/EU regions than Western providers
  • Tool-call format is Moonshot-specific, not MCP
  • Less mature compliance posture for Western enterprise
  • 128K context — smaller than the original Kimi Chat that made the lab famous for long context

When to use it

  • Coding agents needing open weights
  • Tool-use workflows where R1 / V3 fall short
  • Chinese-market deployments needing top-tier open quality
  • Self-hosted alternatives to Claude for agentic workloads

Architecture & training

1T total parameters, 32B activated. The Moonshot K2 technical paper describes a custom Mixture-of-Experts variant with optimisations for agentic post-training. Trained primarily on a Chinese-English mix with substantial code and tool-use training data. The model is served from APAC infrastructure by default, with international deployments via partner providers.

Benchmarks

BenchmarkScoreBar
MATH73.5
MMLU89.5
HumanEval85.7
SWE-bench Verified65.8

Reviews · 0

Sign in to leave a rating.

Compare against

All models →