prunaai/qwen-3.5-35b-a3b-fast

This is a version of the Qwen3.5-35B-A3B mixture-of-experts (MoE) model optimized by Pruna AI.


Qwen3.5-35B-A3B

Multimodal reasoning model for text, images, and video.

This Replicate endpoint serves an optimized version of Qwen3.5-35B-A3B, a 35B-parameter vision-language MoE (Mixture of Experts) model designed for instruction following, reasoning, coding, document understanding, and agent-style workflows.
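As a hosted Replicate endpoint, the model can be called over Replicate's HTTP API. The sketch below builds a prediction request using only the standard library; the input field names (`prompt`, `image`) are assumptions about this deployment's schema, so check the model page's API tab for the exact parameters.

```python
import json
import os
import urllib.request

# Replicate's "create a prediction for a model" endpoint.
API_URL = "https://api.replicate.com/v1/models/prunaai/qwen-3.5-35b-a3b-fast/predictions"

def build_payload(prompt: str, image_url=None) -> dict:
    # "prompt" and "image" are assumed input names; the deployment's
    # schema on Replicate defines the real ones.
    inputs = {"prompt": prompt}
    if image_url:
        inputs["image"] = image_url
    return {"input": inputs}

payload = build_payload("Describe this chart.", "https://example.com/chart.png")

# Only attempt the network call when an API token is configured.
token = os.environ.get("REPLICATE_API_TOKEN")
if token:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp))
```

The official `replicate` Python client wraps this same API if you prefer not to build requests by hand.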

Compared with the original Hugging Face model card, this page focuses on the hosted experience: fast access to a production-ready version of the model without the self-hosting setup.

What it does

Qwen3.5-35B-A3B is a general-purpose multimodal model that can:

answer questions about text, images, and video
reason over diagrams, charts, and visual documents
follow complex instructions
perform coding and agent-style tasks
handle long-context workloads
work across many languages

Why use this model

Qwen3.5-35B-A3B combines strong language reasoning with native multimodal understanding. It is well suited for:

visual question answering
document and OCR-heavy workflows
coding and technical assistance
multilingual assistants
long-context analysis
agentic applications with tool use

This Replicate deployment runs an optimized version of the model to make it easier to use in production.

Highlights

Unified multimodal foundation: one model for text, image, and video understanding
Strong reasoning: competitive performance across knowledge, long-context, STEM, and coding evaluations
Tool-friendly: built for agentic and tool-calling workflows
Long context: native support up to 262,144 tokens, with extension strategies available beyond that
Broad language coverage: designed for multilingual use across 200+ languages and dialects

Model details

Model: Qwen3.5-35B-A3B
Architecture: Causal language model with vision encoder
Parameters: 35B total (the A3B suffix denotes roughly 3B activated per token)
Training stage: Pre-training and post-training
Context length: 262,144 tokens natively, extensible up to 1,010,000 tokens
Modality support: text, images, video
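Before sending long inputs, it can help to sanity-check them against the 262,144-token native window. A minimal sketch using a chars-per-token heuristic; the ~4 characters/token figure is a common rule of thumb for English, not an exact tokenizer count:

```python
NATIVE_CONTEXT = 262_144   # native context window from the model details above
CHARS_PER_TOKEN = 4        # rough heuristic for English text; not exact

def estimate_tokens(text: str) -> int:
    """Crude token estimate; use the real tokenizer for precise budgeting."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_native_window(text: str, reserve_for_output: int = 4_096) -> bool:
    # Leave headroom for the model's generated reply.
    return estimate_tokens(text) + reserve_for_output <= NATIVE_CONTEXT

doc = "word " * 100_000  # ~500k characters
print(estimate_tokens(doc), fits_native_window(doc))  # → 125000 True
```

For precise counts, tokenize with the model's actual tokenizer instead of the heuristic.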

Performance overview

Qwen3.5-35B-A3B is positioned as a high-capability mid-size multimodal model with strong results across:

knowledge and instruction following
long-context reasoning
STEM and coding
multilingual benchmarks
visual reasoning and VQA
document understanding and OCR
video understanding
tool use and visual agent tasks

It is especially strong for a model in its size class on multimodal reasoning, document understanding, and agent-oriented evaluations.

Best use cases

Use this model when you need a single endpoint that can handle:

chat with image input
screenshot or UI understanding
OCR and document Q&A
video-based question answering
multilingual assistants
reasoning-heavy product features
agent pipelines that mix perception and action
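For "chat with image input", Replicate deployments commonly accept either an image URL or a base64 data URI. A sketch of building a data URI from a local file, using only the standard library; the `image` field name in the usage comment is an assumption about this deployment's schema:

```python
import base64
import mimetypes
from pathlib import Path

def to_data_uri(path: str) -> str:
    """Encode a local image file as a data URI suitable for JSON inputs."""
    mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
    data = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{data}"

# Hypothetical usage; "image" is an assumed input field name.
# inputs = {"prompt": "What does this screenshot show?",
#           "image": to_data_uri("screenshot.png")}
```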

Notes

Qwen3.5 models are designed to reason before answering. Depending on how the endpoint is configured, responses may include internal reasoning-style output or may return direct answers only.
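Qwen3-family models often wrap internal reasoning in `<think>…</think>` tags. If this deployment returns reasoning-style output, a sketch for separating it from the final answer; the tag format is an assumption about how reasoning appears in this endpoint's responses:

```python
import re

# Non-greedy match so multiple reasoning blocks are each removed.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_reasoning(text: str) -> str:
    """Remove <think>…</think> blocks, keeping only the final answer."""
    return THINK_RE.sub("", text).strip()

out = "<think>The chart shows Q3 revenue...</think>Revenue rose 12% in Q3."
print(strip_reasoning(out))  # → Revenue rose 12% in Q3.
```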

Because this is a hosted and optimized Replicate deployment, behavior and latency may differ from raw self-hosted Hugging Face checkpoints.

Limitations

Like other large multimodal models, Qwen3.5-35B-A3B can still:

hallucinate facts or visual details
make mistakes on fine-grained counting or localization
underperform on highly domain-specific inputs without careful prompting
produce variable outputs across languages and long contexts

Human review is recommended for high-stakes use cases.
