google/gemini-3.5-flash

Google's fast multimodal model with frontier reasoning across agents, coding, and long-context tasks

Gemini 3.5 Flash

Frontier performance for agents and coding, at Flash speed.

Gemini 3.5 Flash is Google’s latest fast model in the Gemini 3 family. It pairs frontier-level reasoning with Flash-level latency and cost, and is designed for agentic workflows, iterative coding, and long-horizon multi-step tasks.

It understands text, images, audio, code, and video, and works well for both real-time interactive apps and high-volume production pipelines.

What’s new vs Gemini 3 Flash

  • 42% better long-range, multi-turn performance on Armadin’s cyber benchmark, with 72% fewer tokens used.
  • +19.6% on Box’s enterprise work evaluation.
  • +10–20% on low-reasoning coding tasks.
  • Coding and reasoning quality close to Gemini Pro, while keeping Flash’s speed and cost profile.

Benchmarks

Gemini 3.5 Flash is reported to lead or match frontier models across a wide range of evaluations. The table below compares it with Gemini 3 Flash (scores in %):

| Benchmark | Gemini 3.5 Flash | Gemini 3 Flash |
|---|---|---|
| Terminal-bench 2.1 (agentic terminal coding) | 76.2% | 58.0% |
| SWE-Bench Pro (Public) | 55.1% | 49.6% |
| MCP Atlas (multi-step MCP workflows) | 83.6% | 62.0% |
| Toolathlon (general tool use) | 56.5% | 49.4% |
| OSWorld-Verified (agentic computer use) | 78.4% | 65.1% |
| Finance Agent v2 | 57.9% | 42.6% |
| CharXiv Reasoning | 84.2% | 80.3% |
| MMMU-Pro (multimodal) | 83.6% | 81.2% |
| Humanity’s Last Exam | 40.2% | 33.7% |
| ARC-AGI-2 | 72.1% | 33.6% |
| MRCR v2 (128k long context) | 77.3% | 67.2% |
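A gap between two benchmark scores can be read as an absolute difference in percentage points or as a relative change, and the two readings can differ sharply. The sketch below computes both for each row, using scores copied from the table; the helper name `gains` is illustrative, not part of any API.

```python
from typing import Dict, Tuple

# (Gemini 3.5 Flash, Gemini 3 Flash) scores in %, copied from the table above.
SCORES: Dict[str, Tuple[float, float]] = {
    "Terminal-bench 2.1": (76.2, 58.0),
    "SWE-Bench Pro (Public)": (55.1, 49.6),
    "MCP Atlas": (83.6, 62.0),
    "Toolathlon": (56.5, 49.4),
    "OSWorld-Verified": (78.4, 65.1),
    "Finance Agent v2": (57.9, 42.6),
    "CharXiv Reasoning": (84.2, 80.3),
    "MMMU-Pro": (83.6, 81.2),
    "Humanity's Last Exam": (40.2, 33.7),
    "ARC-AGI-2": (72.1, 33.6),
    "MRCR v2 (128k)": (77.3, 67.2),
}


def gains(new: float, old: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in %)."""
    points = new - old
    return round(points, 1), round(points / old * 100, 1)


for name, (new, old) in SCORES.items():
    points, relative = gains(new, old)
    print(f"{name:24} +{points:4.1f} pts  (+{relative:5.1f}% relative)")
```

For ARC-AGI-2, for instance, the gain is 38.5 points, roughly a 114.6% relative improvement, while CharXiv Reasoning gains 3.9 points, about 4.9% relative.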

What it’s good at

  • Agentic coding — terminal use, multi-step tool calls, long-running tasks.
  • Multimodal reasoning — synthesize information from charts, screenshots, and documents.
  • Long-context tasks — strong recall and reasoning across 128k+ token inputs.
  • High-volume workflows — extraction, classification, moderation, translation at scale.
  • Interactive apps — low latency makes it a good fit for assistants, in-app agents, and live tools.

Inputs

  • prompt — the text prompt to send to the model.
  • images — up to 10 images (each up to 7 MB).
  • videos — up to 10 videos (each up to 45 minutes).
  • audio — a single audio file (up to 8.4 hours).
  • system_instruction — optional system prompt to steer behavior.
  • thinking_level — none, low, or high. Controls how much the model reasons before answering. Higher levels improve quality on hard problems at the cost of latency.
  • temperature, top_p, max_output_tokens — standard sampling controls.
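As a rough illustration of how these inputs fit together, the sketch below assembles an input dictionary keyed by the field names listed above and checks the documented count limits and `thinking_level` values. The helper `build_input` is hypothetical, and passing media as URL strings is an assumption about how files are supplied; per-file size and duration limits are not checked here.

```python
from typing import Any, Dict, List, Optional

# Count limits and allowed values as listed in the Inputs section.
MAX_IMAGES = 10
MAX_VIDEOS = 10
THINKING_LEVELS = {"none", "low", "high"}


def build_input(
    prompt: str,
    images: Optional[List[str]] = None,
    videos: Optional[List[str]] = None,
    audio: Optional[str] = None,
    system_instruction: Optional[str] = None,
    thinking_level: Optional[str] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_output_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble an input dict keyed by the documented field names.

    Assumption: media files are referenced by URL strings. Only the count
    limits and thinking_level values are checked; per-file size and
    duration limits are left to the service.
    """
    images = images or []
    videos = videos or []
    if len(images) > MAX_IMAGES:
        raise ValueError(f"at most {MAX_IMAGES} images allowed, got {len(images)}")
    if len(videos) > MAX_VIDEOS:
        raise ValueError(f"at most {MAX_VIDEOS} videos allowed, got {len(videos)}")
    if thinking_level is not None and thinking_level not in THINKING_LEVELS:
        raise ValueError(f"thinking_level must be one of {sorted(THINKING_LEVELS)}")

    optional: Dict[str, Any] = {
        "images": images or None,
        "videos": videos or None,
        "audio": audio,  # a single file, so a single string rather than a list
        "system_instruction": system_instruction,
        "thinking_level": thinking_level,
        "temperature": temperature,
        "top_p": top_p,
        "max_output_tokens": max_output_tokens,
    }
    # Omit unset fields so the service's own defaults apply.
    payload: Dict[str, Any] = {"prompt": prompt}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

For example, `build_input("Summarize this chart.", images=["https://example.com/chart.png"], thinking_level="low")` returns a dict with just those three keys. With the Replicate Python client, such a dict could then be passed as `replicate.run("google/gemini-3.5-flash", input=payload)`; check the model's API schema for the exact field types before relying on this.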

Pricing

| Token type | Price (USD) |
|---|---|
| Input tokens | $1.50 per 1M tokens |
| Output tokens | $9.00 per 1M tokens |

You only pay for the tokens you use. There’s no charge for cold starts or idle time.
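Because billing is purely per token, the cost of a request is a linear function of its input and output token counts. A minimal estimator using the listed prices is sketched below; `estimate_cost` is a hypothetical helper, and it does not model how reasoning tokens produced under `thinking_level` are billed, which this page does not specify.

```python
# Prices from the Pricing table, in USD per 1M tokens.
INPUT_PRICE_PER_M = 1.50
OUTPUT_PRICE_PER_M = 9.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# 100k input tokens plus 2k output tokens: 0.15 + 0.018 = 0.168 USD
print(f"${estimate_cost(100_000, 2_000):.3f}")  # prints "$0.168"
```

Output tokens cost six times as much as input tokens, so for long-context workloads with short answers the input side usually dominates the bill.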
