Kimi K2.6
Kimi K2.6 is Moonshot AI’s open model for agentic coding and long-horizon task execution. It has 1 trillion total parameters and a 262,144 token context window, and it’s designed to run for hours at a time making tool calls, writing code, and coordinating with other agents.
What’s new in K2.6
Moonshot built K2.6 around three ideas:
Long-horizon coding. K2.6 can keep working on a single task for hours without losing its footing. In Moonshot’s showcase, it spent 13 hours overhauling an 8-year-old financial matching engine — 4,000+ lines of code, 12 optimization strategies, 1,000+ tool calls — and came out with a 185% throughput improvement. In another run it downloaded and deployed Qwen3.5-0.8B locally on a Mac, implemented inference in Zig, and tuned it from ~15 to ~193 tokens/sec across 14 iterations.
Coding-driven design. K2.6 turns short prompts into complete frontends with proper layouts, scroll-triggered animations, hero sections, and auth + database wiring. It's fluent in Rust, Go, Python, and TypeScript, and it can use image and video generation tools to produce full landing pages.
Agent swarms. K2.6 can decompose a task into heterogeneous subtasks and run them concurrently across up to 300 sub-agents over 4,000 coordinated steps, producing documents, spreadsheets, slides, and websites in a single autonomous run.
Benchmarks
K2.6 posts frontier-level scores on agentic and coding benchmarks and consistently outperforms K2.5:
- Terminal-Bench 2.0: 66.7
- SWE-Bench Pro: 58.6
- SWE-Bench Verified: 80.2
- SWE-Bench Multilingual: 76.7
- LiveCodeBench v6: 89.6
- HLE-Full with tools: 54.0
- BrowseComp: 83.2 (86.3 with agent swarm)
- DeepSearchQA (F1): 92.5
- GPQA-Diamond: 90.5
- AIME 2026: 96.4
- V* with Python: 96.9
Using K2.6 on Replicate
This is a streaming chat model. Pass a prompt, optionally a system prompt, optionally images, and read the streamed output.
```python
import replicate

for chunk in replicate.stream(
    "moonshotai/kimi-k2.6",
    input={
        "prompt": "Refactor this Python function to be idempotent and add tests.",
        "max_tokens": 4096,
        "reasoning_effort": "medium",
    },
):
    print(chunk, end="", flush=True)
```
Inputs
- `prompt` — the user message.
- `system_prompt` — sets the assistant's behavior.
- `image_input` — one or more images to include alongside the prompt. K2.6 is a vision-language model.
- `reasoning_effort` — `none` (default, fast), `low`, `medium`, or `high`. Higher effort produces a longer thinking trace before the final answer.
- `max_tokens` — up to 32,768 per response.
- `temperature`, `top_p`, `presence_penalty`, `frequency_penalty` — standard sampling controls.
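If you're assembling these inputs programmatically, it can help to validate them before calling the API. The sketch below is our own hypothetical helper, not part of the Replicate client; it only enforces the constraints listed above (the allowed `reasoning_effort` values and the 32,768-token cap).

```python
# Hypothetical helper: builds and validates an input payload for
# replicate.stream("moonshotai/kimi-k2.6", input=...).
# Not part of any official SDK.

VALID_EFFORTS = {"none", "low", "medium", "high"}
MAX_TOKENS_LIMIT = 32_768  # documented per-response cap


def build_input(prompt, system_prompt=None, image_input=None,
                reasoning_effort="none", max_tokens=4096):
    if reasoning_effort not in VALID_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {sorted(VALID_EFFORTS)}")
    if not 1 <= max_tokens <= MAX_TOKENS_LIMIT:
        raise ValueError(f"max_tokens must be between 1 and {MAX_TOKENS_LIMIT}")
    payload = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "reasoning_effort": reasoning_effort,
    }
    if system_prompt is not None:
        payload["system_prompt"] = system_prompt
    if image_input is not None:
        payload["image_input"] = image_input  # list of images for the vision model
    return payload
```

The resulting dict is what you'd pass as `input=` in the streaming example above.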
When to turn on reasoning
Reasoning is off by default. When you turn it on, K2.6 generates a hidden thinking trace before it answers, which eats into your max_tokens budget. For math, code, and multi-step problems, turn reasoning on and give it a generous max_tokens (4,000+). For short chat replies, leave it off.
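That advice can be folded into a small routing function. This is a sketch under our own assumptions: the `make_input` helper and the 8,192 / 1,024 token budgets are illustrative choices, not official guidance; the only facts carried over are that reasoning defaults to off and that the thinking trace counts against `max_tokens`.

```python
# Sketch: pick input settings based on whether the task needs
# step-by-step reasoning. Budgets are illustrative defaults.

def make_input(prompt: str, needs_reasoning: bool) -> dict:
    if needs_reasoning:
        # Math, code, multi-step problems: turn reasoning on and leave
        # a generous budget, since the hidden thinking trace counts
        # against max_tokens.
        return {"prompt": prompt, "reasoning_effort": "high", "max_tokens": 8192}
    # Short chat replies: leave reasoning off (the default) for speed.
    return {"prompt": prompt, "reasoning_effort": "none", "max_tokens": 1024}

# Pass the result as input= to replicate.stream("moonshotai/kimi-k2.6", ...).
```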
License and links
- Weights: moonshotai/Kimi-K2.6 on Hugging Face
- License: modified MIT
- Launch post: kimi.com/blog/kimi-k2-6
- Moonshot AI: platform.moonshot.ai