moonshotai/kimi-k2.6

Moonshot AI's frontier open model, built for long-horizon coding, agent swarms, and autonomous software engineering. 1 trillion parameters, 262k context window, vision and tool use.

Kimi K2.6

Kimi K2.6 is Moonshot AI’s open model for agentic coding and long-horizon task execution. It has 1 trillion total parameters and a 262,144 token context window, and it’s designed to run for hours at a time making tool calls, writing code, and coordinating with other agents.

What’s new in K2.6

Moonshot built K2.6 around three ideas:

Long-horizon coding. K2.6 can keep working on a single task for hours without losing its footing. In Moonshot’s showcase, it spent 13 hours overhauling an 8-year-old financial matching engine — 4,000+ lines of code, 12 optimization strategies, 1,000+ tool calls — and came out with a 185% throughput improvement. In another run it downloaded and deployed Qwen3.5-0.8B locally on a Mac, implemented inference in Zig, and tuned it from ~15 to ~193 tokens/sec across 14 iterations.

Coding-driven design. K2.6 turns short prompts into complete frontends with proper layouts, scroll-triggered animations, hero sections, and auth + database wiring. It's fluent in Rust, Go, Python, and TypeScript, and it can use image and video generation tools to produce full landing pages.

Agent swarms. K2.6 can decompose a task into heterogeneous subtasks and run them concurrently across up to 300 sub-agents over 4,000 coordinated steps, producing documents, spreadsheets, slides, and websites in a single autonomous run.

Benchmarks

K2.6 posts frontier-level scores on agentic and coding benchmarks and consistently outperforms K2.5:

  • Terminal-Bench 2.0: 66.7
  • SWE-Bench Pro: 58.6
  • SWE-Bench Verified: 80.2
  • SWE-Bench Multilingual: 76.7
  • LiveCodeBench v6: 89.6
  • HLE-Full with tools: 54.0
  • BrowseComp: 83.2 (86.3 with agent swarm)
  • DeepSearchQA (F1): 92.5
  • GPQA-Diamond: 90.5
  • AIME 2026: 96.4
  • V* with Python: 96.9

Using K2.6 on Replicate

This is a streaming chat model. Pass a prompt (plus an optional system prompt and images) and read the streamed output.

import replicate

for chunk in replicate.stream(
    "moonshotai/kimi-k2.6",
    input={
        "prompt": "Refactor this Python function to be idempotent and add tests.",
        "max_tokens": 4096,
        "reasoning_effort": "medium",
    },
):
    print(chunk, end="", flush=True)

Inputs

  • prompt — the user message.
  • system_prompt — sets the assistant’s behavior.
  • image_input — one or more images to include alongside the prompt. K2.6 is a vision-language model.
  • reasoning_effort — none (default, fast), low, medium, or high. Higher effort produces a longer thinking trace before the final answer.
  • max_tokens — up to 32,768 per response.
  • temperature, top_p, presence_penalty, frequency_penalty — standard sampling controls.
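The parameters above can be assembled into a single input dict before calling replicate.stream. Here's a minimal sketch (build_k26_input is a hypothetical helper, not part of the Replicate SDK) that validates values against the ranges listed above:

```python
# Hypothetical helper (not part of the Replicate SDK): assembles and
# sanity-checks an input dict for moonshotai/kimi-k2.6 using the
# parameters documented above.
def build_k26_input(prompt, system_prompt=None, image_input=None,
                    reasoning_effort="none", max_tokens=4096, temperature=1.0):
    if reasoning_effort not in {"none", "low", "medium", "high"}:
        raise ValueError("reasoning_effort must be none, low, medium, or high")
    if not 1 <= max_tokens <= 32768:
        raise ValueError("max_tokens is capped at 32,768 per response")
    inputs = {
        "prompt": prompt,
        "reasoning_effort": reasoning_effort,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    if system_prompt is not None:
        inputs["system_prompt"] = system_prompt
    if image_input is not None:
        inputs["image_input"] = image_input  # list of image URLs or files
    return inputs
```

Pass the result as input= to replicate.stream("moonshotai/kimi-k2.6", ...) exactly as in the example above.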

When to turn on reasoning

Reasoning is off by default. When you turn it on, K2.6 generates a hidden thinking trace before it answers, which eats into your max_tokens budget. For math, code, and multi-step problems, turn reasoning on and give it a generous max_tokens (4,000+). For short chat replies, leave it off.
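Concretely, only the input payload changes when reasoning is on; the prompt below is a placeholder, and the same streaming call from the example above applies:

```python
# Input payload for a reasoning-heavy run (the prompt is a placeholder).
# Pass it to replicate.stream("moonshotai/kimi-k2.6", input=reasoning_input).
reasoning_input = {
    "prompt": "Find the number of integer pairs (x, y) with x^2 + y^2 = 2025.",
    "reasoning_effort": "high",  # hidden thinking trace before the final answer
    "max_tokens": 8192,          # generous budget: the trace counts against it
}
```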
