WED, 03 JUN 2026 · 18:34:03 UTC
Department · Open Source17 tools

The open source AI directory.

Permissive licensing, downloadable weights, fork-friendly code. The tools you can run on your own infrastructure — audited, fine-tuned, or both. China is leading several of the categories; we cover everything on merit.

Why open source AI matters in 2026

Two years ago "open source AI" was Llama 2 and a handful of community fine-tunes. In 2026 it's a serious procurement category — frontier-class models with permissive licensing, mature inference stacks, and a Chinese open-weights ecosystem that has redrawn the competitive map. For any team building on AI, the question is no longer whether to consider open weights; it's when.

The case for open source comes down to four things: audit (you can read the weights and the training recipe), portability (no vendor lock-in, you can switch infrastructure overnight), customisation (fine-tuning on your data, no commercial cap), and cost (per-token serving costs are 5–20× lower at the same quality tier). The case against — capability gap, support overhead, and the operational tax of running your own infrastructure — has narrowed every quarter since Llama 3.1 in mid-2024.

The model layer

The open-weights frontier is now genuinely competitive with closed models. Meta's Llama 3.3 70B matches its own 405B sibling at a fifth of the serving cost. DeepSeek's V3 (671B MoE, 37B active per token) was trained for a reported $5.6M and ships with quality competitive with Claude 3.5 Sonnet — released under a permissive licence that allows commercial use without caveats. Its sibling R1 is the first open reasoning model at o1 quality, released under MIT.

Moonshot's Kimi K2 (1T MoE, 32B active) is the strongest open-weights coding agent at the time of writing — 65.8% on SWE-bench Verified, comparable to Claude Sonnet 4. Alibaba's Qwen 2.5 72B is Apache 2.0 and ships alongside specialist Coder, Math, and Audio variants that share the same backbone. Mistral's European stack — Large 2 for chat, Codestral 25 for autocomplete — extends the picture with EU-resident inference. And the small-model regime is dominated by Microsoft Research's Phi-4 at 14B, MIT-licensed and small enough to run on a single consumer GPU.

The Chinese surge — and why the geopolitics is the wrong frame

The story of 2025 was the Chinese labs proving that frontier capability and permissive licensing aren't mutually exclusive. DeepSeek, Qwen, Kimi, and the smaller open-weight efforts from 01.AI and Yi collectively shipped four world-class model families with quality matching or beating their American closed-weight counterparts — all open under MIT or Apache variants.

Procurement teams sometimes hesitate on Chinese-origin models on jurisdictional grounds. The pragmatic view: the model weights are static files you serve on your own infrastructure, in your own region, with your own observability. The training-data provenance question is real but applies to every closed model equally — at least with open weights you can audit them. Where geopolitical concern is genuinely warranted is the hosted API layer (chat.deepseek.com, chat.qwen.ai) where queries leave your perimeter. That's a different decision than the model itself.

The tooling layer

Around the model layer sits an increasingly mature open-source tooling stack. Aider, Cline, and Continue are the open coding agents — three different takes on bringing Claude-Code-style workflows to your editor, all running against whichever model you point them at. Civitai is the hub for open image-generation models. Hugging Face remains the gravitational centre of model distribution.

The infrastructure layer is similarly open: vLLM and Ollama for serving, LangChain and LlamaIndex for orchestration, Weaviate, Chroma, Qdrant, and Milvus for vector search, LangFuse and Helicone for observability. None of these have a closed-source equivalent that meaningfully outperforms them; the closed alternatives compete on support and integration rather than capability.

When to pick open over closed

Open weights win cleanly when any of these apply: you need to fine-tune on proprietary data, you have strict data-residency or audit requirements, you serve enough volume that per-token cost dominates (typically above 50M tokens/day), or you simply want infrastructure independence from the trio of frontier labs.

Closed models still win for the hardest agentic workloads (Claude Opus 4 leads SWE-bench Verified), the latest reasoning research (o1, o3-mini), and the most integrated multimodal experiences (GPT-4o voice, Gemini 2.5 Pro video). For most production workloads in 2026 the answer is hybrid: open for the bulk, closed for the hard edge cases.

Below: every open-source tool we currently track, grouped by category. Tools with the ◯ Open source tag are downloadable / forkable; Chinese-origin tools are surfaced exactly the same way as Western ones — we don't maintain separate listings.

The list

Agents — open source.

All tools