Gemini 3.5 Flash

Frontier performance for agents and coding, at Flash speed.

Gemini 3.5 Flash is Google’s latest fast model in the Gemini 3 family. It pairs frontier-level reasoning with Flash-level latency and cost, and is designed for agentic workflows, iterative coding, and long-horizon multi-step tasks.

It understands text, images, audio, code, and video, and works well for both real-time interactive apps and high-volume production pipelines.

What’s new vs Gemini 3 Flash

42% better long-range, multi-turn performance on Armadin’s cyber benchmark, while using 72% fewer tokens.
+19.6% on Box’s enterprise work evaluation.
+10–20% on low-reasoning coding tasks vs the previous Flash generation.
Coding and reasoning quality close to Gemini Pro, while keeping Flash’s speed and cost profile.

Benchmarks

Gemini 3.5 Flash scores higher than Gemini 3 Flash on every evaluation listed below, with the largest gains on ARC-AGI-2, MCP Atlas, and Terminal-bench 2.1:

Benchmark	Gemini 3.5 Flash	Gemini 3 Flash
Terminal-bench 2.1 (agentic terminal coding)	76.2%	58.0%
SWE-Bench Pro (Public)	55.1%	49.6%
MCP Atlas (multi-step MCP workflows)	83.6%	62.0%
Toolathlon (general tool use)	56.5%	49.4%
OSWorld-Verified (agentic computer use)	78.4%	65.1%
Finance Agent v2	57.9%	42.6%
CharXiv Reasoning	84.2%	80.3%
MMMU-Pro (multimodal)	83.6%	81.2%
Humanity’s Last Exam	40.2%	33.7%
ARC-AGI-2	72.1%	33.6%
MRCR v2 (128k long context)	77.3%	67.2%

What it’s good at

Agentic coding — terminal use, multi-step tool calls, long-running tasks.
Multimodal reasoning — synthesize information from charts, screenshots, and documents.
Long-context tasks — strong recall and reasoning across 128k+ token inputs.
High-volume workflows — extraction, classification, moderation, translation at scale.
Interactive apps — low latency makes it a good fit for assistants, in-app agents, and live tools.

Inputs

prompt — the text prompt to send to the model.
images — up to 10 images (each up to 7 MB).
videos — up to 10 videos (each up to 45 minutes).
audio — a single audio file (up to 8.4 hours).
system_instruction — optional system prompt to steer behavior.
thinking_level — none, low, or high. Controls how much the model reasons before answering. Higher levels improve quality on hard problems at the cost of latency.
temperature, top_p, max_output_tokens — standard sampling controls.
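
As a rough sketch of how the inputs above fit together, the following Python assembles an input payload and checks the documented count limits and thinking_level values before anything is sent. The helper name build_input and the plain-dict payload shape are assumptions made for illustration; this page does not name a client library, so the actual call to the model is left out. Per-file limits (7 MB per image, 45 minutes per video, 8.4 hours of audio) are not checked here because that would require inspecting the files themselves.

```python
from typing import Any, Dict, Optional, Sequence

# Limits taken from the Inputs list above.
MAX_IMAGES = 10
MAX_VIDEOS = 10
THINKING_LEVELS = {"none", "low", "high"}
SAMPLING_KEYS = {"temperature", "top_p", "max_output_tokens"}


def build_input(
    prompt: str,
    images: Sequence[str] = (),
    videos: Sequence[str] = (),
    audio: Optional[str] = None,
    system_instruction: Optional[str] = None,
    thinking_level: Optional[str] = None,
    **sampling: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: validate arguments and return an input dict.

    Media arguments are treated as opaque strings (paths or URLs); how they
    are uploaded depends on the client you use, which is not covered here.
    """
    if len(images) > MAX_IMAGES:
        raise ValueError(f"at most {MAX_IMAGES} images are allowed")
    if len(videos) > MAX_VIDEOS:
        raise ValueError(f"at most {MAX_VIDEOS} videos are allowed")
    if thinking_level is not None and thinking_level not in THINKING_LEVELS:
        raise ValueError(f"thinking_level must be one of {sorted(THINKING_LEVELS)}")
    unknown = set(sampling) - SAMPLING_KEYS
    if unknown:
        raise ValueError(f"unknown sampling controls: {sorted(unknown)}")

    # Only include optional fields that were actually provided.
    payload: Dict[str, Any] = {"prompt": prompt}
    if images:
        payload["images"] = list(images)
    if videos:
        payload["videos"] = list(videos)
    if audio is not None:
        payload["audio"] = audio
    if system_instruction is not None:
        payload["system_instruction"] = system_instruction
    if thinking_level is not None:
        payload["thinking_level"] = thinking_level
    payload.update(sampling)
    return payload


example = build_input(
    "Summarize the chart in this screenshot.",
    images=["chart.png"],
    thinking_level="high",
    temperature=0.2,
)
```

Leaving thinking_level unset defers to whatever default the service applies, which this page does not state. Setting it to high is the natural choice for hard problems where the extra latency is acceptable.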

Pricing

Token type	Price (USD per 1M tokens)
Input tokens	$1.50
Output tokens	$9.00

You only pay for the tokens you use. There’s no charge for cold starts or idle time.
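
To make the pricing concrete, here is a small cost estimate in Python using the two per-token rates listed above. The function name estimate_cost and the example token counts are illustrative assumptions. This page does not say whether reasoning tokens produced at higher thinking_level settings are billed as output tokens, so the sketch only covers the counts you pass in.

```python
# Rates from the pricing table, in USD per 1M tokens.
INPUT_PRICE_PER_M = 1.50
OUTPUT_PRICE_PER_M = 9.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request or batch."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Illustrative numbers: a 200k-token input with a 10k-token response.
cost = estimate_cost(input_tokens=200_000, output_tokens=10_000)
print(f"${cost:.2f}")  # $0.39
```

At these rates an output token costs six times as much as an input token, so capping max_output_tokens tends to matter more for cost than trimming the prompt.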
