prunaai/qwen-3.5-27b-fast

This is a version of Qwen3.5-27B optimized by Pruna AI.


Qwen3.5-27B

Multimodal reasoning model for text, images, and video.

This Replicate endpoint serves an optimized version of Qwen3.5-27B, a 27B-parameter vision-language model designed for instruction following, reasoning, coding, document understanding, and agent-style workflows.

Compared with the original Hugging Face model card, this page focuses on the hosted experience: fast access to a production-ready version of the model without the self-hosting setup.
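
You can call the endpoint with Replicate's Python client. A minimal sketch, assuming the endpoint exposes a text input named `prompt` (this page does not list the input schema, so check the endpoint's API tab):

```python
import replicate  # pip install replicate; requires REPLICATE_API_TOKEN in the environment

# "prompt" is an assumed input name for this endpoint, not confirmed here.
output = replicate.run(
    "prunaai/qwen-3.5-27b-fast",
    input={"prompt": "Summarize the trade-offs between REST and gRPC."},
)

# Depending on the model, replicate.run returns a string or an iterable
# of chunks; join defensively.
print(output if isinstance(output, str) else "".join(output))
```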

What it does

Qwen3.5-27B is a general-purpose multimodal model that can:

  • answer questions about text, images, and video (an image-input sketch follows this list)
  • reason over diagrams, charts, and visual documents
  • follow complex instructions
  • perform coding and agent-style tasks
  • handle long-context workloads
  • work across many languages
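
Image inputs follow the same call shape with one extra field. A sketch, assuming the endpoint takes an image URL under a hypothetical `image` input:

```python
import replicate

# "prompt" and "image" are assumed input names; the placeholder URL
# should point at a publicly reachable image.
output = replicate.run(
    "prunaai/qwen-3.5-27b-fast",
    input={
        "prompt": "What trend does this chart show, and over what period?",
        "image": "https://example.com/quarterly-revenue.png",
    },
)
print(output if isinstance(output, str) else "".join(output))
```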

Why use this model

Qwen3.5-27B combines strong language reasoning with native multimodal understanding. It is well suited for:

  • visual question answering
  • document and OCR-heavy workflows
  • coding and technical assistance
  • multilingual assistants
  • long-context analysis
  • agentic applications with tool use

This Replicate deployment runs an optimized version of the model to make it easier to use in production.

Highlights

  • Unified multimodal foundation: one model for text, image, and video understanding
  • Strong reasoning: competitive performance across knowledge, long-context, STEM, and coding evaluations
  • Tool-friendly: built for agentic and tool-calling workflows (a tool-calling sketch follows this list)
  • Long context: native support up to 262,144 tokens, with extension strategies available beyond that
  • Broad language coverage: designed for multilingual use across 200+ languages and dialects
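
This page does not document a structured tool-calling schema for the hosted endpoint, so the sketch below uses the simplest portable pattern: describe the tool contract in the prompt and parse a JSON tool call out of the reply. The input name, tool name, and JSON convention are all assumptions.

```python
import json
import replicate

# Hypothetical convention: the tool contract lives in the prompt and the
# model replies with JSON only when it wants to call the tool.
SYSTEM = (
    'You may call one tool: get_weather(city: str). To call it, reply with '
    'JSON only, e.g. {"tool": "get_weather", "args": {"city": "Paris"}}. '
    "Otherwise answer the user directly."
)

reply = replicate.run(
    "prunaai/qwen-3.5-27b-fast",
    input={"prompt": SYSTEM + "\n\nUser: What's the weather in Paris?"},
)
text = reply if isinstance(reply, str) else "".join(reply)

try:
    call = json.loads(text)
    print("tool call:", call["tool"], call["args"])  # dispatch to your tool here
except (json.JSONDecodeError, KeyError, TypeError):
    print("direct answer:", text)
```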

Model details

  • Model: Qwen3.5-27B
  • Architecture: Causal language model with vision encoder
  • Parameters: 27B
  • Training stage: Pre-training and post-training
  • Context length: 262,144 tokens natively, extensible up to 1,010,000 tokens (a rough budget check is sketched after this list)
  • Modality support: text, images, video
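
Long inputs are easy to overshoot, so a cheap client-side pre-check can save a failed call. A sketch using a crude ~4-characters-per-token heuristic (an assumption; the model's real tokenizer will differ, so keep headroom):

```python
NATIVE_CONTEXT = 262_144  # tokens, per the model details above

def fits_native_context(text: str, chars_per_token: float = 4.0,
                        headroom: float = 0.9) -> bool:
    """Rough pre-check: estimate tokens from character count and compare
    against 90% of the native window. Not a real tokenizer."""
    return len(text) / chars_per_token <= NATIVE_CONTEXT * headroom

# A ~2 MB text almost certainly exceeds the native window.
print(fits_native_context("x" * 2_000_000))  # False
```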

Performance overview

Qwen3.5-27B is positioned as a high-capability mid-size multimodal model with strong results across:

  • knowledge and instruction following
  • long-context reasoning
  • STEM and coding
  • multilingual benchmarks
  • visual reasoning and VQA
  • document understanding and OCR
  • video understanding
  • tool use and visual agent tasks

For a model in its size class, it is especially strong on multimodal reasoning, document understanding, and agent-oriented evaluations.

Best use cases

Use this model when you need a single endpoint that can handle:

  • chat with image input
  • screenshot or UI understanding
  • OCR and document Q&A
  • video-based question answering (a sketch follows this list)
  • multilingual assistants
  • reasoning-heavy product features
  • agent pipelines that mix perception and action
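
Video-based question answering follows the same pattern as image input. A sketch, assuming the endpoint accepts a video URL under a hypothetical `video` input; nothing on this page confirms that name, so verify it against the endpoint's API tab:

```python
import replicate

# "prompt" and "video" are assumed input names; the URL is a placeholder.
output = replicate.run(
    "prunaai/qwen-3.5-27b-fast",
    input={
        "prompt": "List the distinct scenes in this clip, in order.",
        "video": "https://example.com/demo-clip.mp4",
    },
)
print(output if isinstance(output, str) else "".join(output))
```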

Notes

Qwen3.5 models are designed to reason before answering. Depending on how the endpoint is configured, responses may include internal reasoning-style output or may return direct answers only.
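
If the endpoint returns reasoning-style output and you only want the final answer, you can strip it client-side. A sketch, assuming reasoning segments are wrapped in `<think>...</think>` tags as in other Qwen reasoning releases (inspect a raw response before relying on this):

```python
import re

def strip_reasoning(text: str) -> str:
    # Assumption: reasoning is delimited by <think>...</think>, as in other
    # Qwen reasoning models; adjust if this endpoint formats it differently.
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

print(strip_reasoning("<think>compare both options...</think>Use gRPC here."))
# -> Use gRPC here.
```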

Because this is a hosted and optimized Replicate deployment, behavior and latency may differ from raw self-hosted Hugging Face checkpoints.

Limitations

Like other large multimodal models, Qwen3.5-27B can still:

  • hallucinate facts or visual details
  • make mistakes on fine-grained counting or localization
  • underperform on highly domain-specific inputs without careful prompting
  • produce variable outputs across languages and long contexts

Human review is recommended for high-stakes use cases.
