Gemini 3.5 Flash
Frontier performance for agents and coding, at Flash speed.
Gemini 3.5 Flash is Google’s latest fast model in the Gemini 3 family. It pairs frontier-level reasoning with Flash-level latency and cost, and is designed for agentic workflows, iterative coding, and long-horizon multi-step tasks.
It understands text, images, audio, code, and video, and works well for both real-time interactive apps and high-volume production pipelines.
What’s new vs Gemini 3 Flash
- 42% better long-range, multi-turn performance on Armadin’s cyber benchmark, with 72% fewer tokens used.
- +19.6% on Box’s enterprise work evaluation vs Gemini 3 Flash.
- +10–20% on low-reasoning coding tasks vs the previous Flash generation.
- Coding and reasoning quality close to Gemini Pro, while keeping Flash’s speed and cost profile.
Benchmarks
Gemini 3.5 Flash scores higher than Gemini 3 Flash on every evaluation in the table below:
| Benchmark | Gemini 3.5 Flash | Gemini 3 Flash |
|---|---|---|
| Terminal-bench 2.1 (agentic terminal coding) | 76.2% | 58.0% |
| SWE-Bench Pro (Public) | 55.1% | 49.6% |
| MCP Atlas (multi-step MCP workflows) | 83.6% | 62.0% |
| Toolathlon (general tool use) | 56.5% | 49.4% |
| OSWorld-Verified (agentic computer use) | 78.4% | 65.1% |
| Finance Agent v2 | 57.9% | 42.6% |
| CharXiv Reasoning | 84.2% | 80.3% |
| MMMU-Pro (multimodal) | 83.6% | 81.2% |
| Humanity’s Last Exam | 40.2% | 33.7% |
| ARC-AGI-2 | 72.1% | 33.6% |
| MRCR v2 (128k long context) | 77.3% | 67.2% |
What it’s good at
- Agentic coding — terminal use, multi-step tool calls, long-running tasks.
- Multimodal reasoning — synthesize information from charts, screenshots, and documents.
- Long-context tasks — strong recall and reasoning across 128k+ token inputs.
- High-volume workflows — extraction, classification, moderation, translation at scale.
- Interactive apps — low latency makes it a good fit for assistants, in-app agents, and live tools.
Inputs
- prompt: the text prompt to send to the model.
- images: up to 10 images (each up to 7 MB).
- videos: up to 10 videos (each up to 45 minutes).
- audio: a single audio file (up to 8.4 hours).
- system_instruction: optional system prompt to steer behavior.
- thinking_level: none, low, or high. Controls how much the model reasons before answering. Higher levels improve quality on hard problems at the cost of latency.
- temperature, top_p, max_output_tokens: standard sampling controls.
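The inputs above can be assembled and sanity-checked before a request is sent. The following is a minimal sketch, not an official client: the helper name `build_input` is hypothetical, and it assumes media are passed as URL strings, which the list above does not specify. It checks only the count limits and the allowed thinking_level values; per-file size and duration limits are not verified here.

```python
from typing import Any, Dict, List, Optional

# Limits and allowed values taken from the input list above.
MAX_IMAGES = 10
MAX_VIDEOS = 10
THINKING_LEVELS = ("none", "low", "high")
SAMPLING_KEYS = ("temperature", "top_p", "max_output_tokens")


def build_input(
    prompt: str,
    images: Optional[List[str]] = None,   # assumption: media given as URL strings
    videos: Optional[List[str]] = None,
    audio: Optional[str] = None,          # a single audio file
    system_instruction: Optional[str] = None,
    thinking_level: Optional[str] = None,
    **sampling: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: build an input dict and reject obviously invalid values."""
    images = list(images or [])
    videos = list(videos or [])

    if len(images) > MAX_IMAGES:
        raise ValueError(f"at most {MAX_IMAGES} images are allowed, got {len(images)}")
    if len(videos) > MAX_VIDEOS:
        raise ValueError(f"at most {MAX_VIDEOS} videos are allowed, got {len(videos)}")
    if thinking_level is not None and thinking_level not in THINKING_LEVELS:
        raise ValueError(f"thinking_level must be one of {THINKING_LEVELS}")

    unknown = set(sampling) - set(SAMPLING_KEYS)
    if unknown:
        raise ValueError(f"unknown sampling controls: {sorted(unknown)}")

    # Only include optional fields that were actually provided.
    payload: Dict[str, Any] = {"prompt": prompt}
    if images:
        payload["images"] = images
    if videos:
        payload["videos"] = videos
    if audio is not None:
        payload["audio"] = audio
    if system_instruction is not None:
        payload["system_instruction"] = system_instruction
    if thinking_level is not None:
        payload["thinking_level"] = thinking_level
    payload.update(sampling)
    return payload


if __name__ == "__main__":
    example = build_input(
        "Summarize the trend in this chart.",
        images=["https://example.com/chart.png"],  # placeholder URL
        thinking_level="low",
        temperature=0.2,
    )
    print(example)
```

Starting with a low thinking_level and raising it only for hard problems follows the latency trade-off described above.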
Pricing
| Token type | Price |
|---|---|
| Input tokens | $1.50 / 1M |
| Output tokens | $9.00 / 1M |
You only pay for the tokens you use. There’s no charge for cold starts or idle time.
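As a rough illustration of the per-token prices above, the cost of a request can be estimated from its token counts. This is a sketch only: `estimate_cost` is a hypothetical helper, and it assumes cost is linear in tokens at the two listed rates. The table does not say how reasoning tokens are billed, so they are not modeled separately.

```python
from decimal import Decimal

# Prices from the table above, in USD per 1M tokens.
INPUT_PRICE_PER_MILLION = Decimal("1.50")
OUTPUT_PRICE_PER_MILLION = Decimal("9.00")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: estimated USD cost for one request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_PRICE_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_MILLION
    )
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # 200,000 input tokens and 20,000 output tokens.
    print(estimate_cost(200_000, 20_000))
```

For 200,000 input tokens and 20,000 output tokens this comes to $0.30 + $0.18 = $0.48. Decimal is used instead of float to avoid rounding noise in currency arithmetic.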