Kimi K2

Open weights

by Moonshot AI·China·Released Jul 11, 2025

Open-weights 1T-parameter MoE — agentic, long-context, the model behind Kimi Chat.

textcodechatreasoningagentstoolslong-context

Vendor site Paper

— · 0 reviews

About this model

Kimi K2 (July 2025) is Moonshot AI's flagship — a 1T-parameter Mixture-of-Experts (32B activated per token) that the lab open-sourced under a modified MIT license, free for commercial use. K2 is the model behind the Kimi consumer chat app (popular in China) and is increasingly used internationally for its strong agentic + coding capabilities.

K2 was notable on release for its 65.8% SWE-bench Verified score — the highest open-weights coding agent benchmark at the time, narrowing the gap to Claude Sonnet 4 considerably. The Moonshot team has emphasised 'agentic capability' as the primary design goal, which shows in the model's tool-use behaviour.

aigpt itself uses Kimi (via OpenRouter) for the news-rewrite and tool-extraction pipelines.

Strengths

•Top open-weights SWE-bench Verified score at launch (65.8%)
•1T-parameter MoE with permissive licensing
•Strong agentic + tool-use behaviour by design
•Aggressive pricing via the official Moonshot API
•Strong Chinese + English bilingual capability

Limitations

•Higher latency from US/EU regions than Western providers
•Tool-call format is Moonshot-specific, not MCP
•Less mature compliance posture for Western enterprise
•128K context — smaller than the original Kimi Chat that made the lab famous for long context

When to use it

→Coding agents needing open weights
→Tool-use workflows where R1 / V3 fall short
→Chinese-market deployments needing top-tier open quality
→Self-hosted alternatives to Claude for agentic workloads

Architecture & training

1T total parameters, 32B activated. The Moonshot K2 technical paper describes a custom Mixture-of-Experts variant with optimisations for agentic post-training. Trained primarily on a Chinese-English mix with substantial code and tool-use training data. The model is served from APAC infrastructure by default, with international deployments via partner providers.

Benchmarks

Benchmark	Score	Bar
MATH	73.5
MMLU	89.5
HumanEval	85.7
SWE-bench Verified	65.8

Kimi K2

About this model

Strengths

Limitations

When to use it

Architecture & training

Benchmarks

Reviews · 0

Compare against

GLM-4.5

Qwen3-Coder

MiniMax-M1

Claude Opus 4

About this model

✓ Strengths

× Limitations

When to use it

Architecture & training

Benchmarks

Reviews · 0

Compare against

GLM-4.5

Qwen3-Coder

MiniMax-M1

Claude Opus 4

Strengths

Limitations