prunaai/qwen-3.5-27b-fast

This is a version of Qwen3.5-27B optimized by Pruna AI.

Run time and cost

This model runs on Nvidia H100 GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

Qwen3.5-27B

Multimodal reasoning model for text, images, and video.

This Replicate endpoint serves an optimized version of Qwen3.5-27B, a 27B-parameter vision-language model designed for instruction following, reasoning, coding, document understanding, and agent-style workflows.

Compared with the original Hugging Face model card, this page focuses on the hosted experience: fast access to a production-ready version of the model without the self-hosting setup.
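
To call the endpoint, you can use the Replicate Python client. The sketch below is a minimal example under stated assumptions: the input names `prompt` and `max_tokens` are guesses based on typical text-generation schemas on Replicate, so check this model's API tab for the exact schema.

```python
# Minimal sketch using the Replicate Python client (pip install replicate).
# Assumes REPLICATE_API_TOKEN is set in the environment, and that the
# endpoint accepts "prompt" and "max_tokens" inputs -- verify these names
# against this model's actual API schema.
import replicate

output = replicate.run(
    "prunaai/qwen-3.5-27b-fast",
    input={
        "prompt": "Summarize the trade-offs between RAG and fine-tuning.",
        "max_tokens": 512,
    },
)

# Text models on Replicate typically return an iterator of string chunks.
print("".join(output))
```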

What it does

Qwen3.5-27B is a general-purpose multimodal model that can:

  • answer questions about text, images, and video (see the image example after this list)
  • reason over diagrams, charts, and visual documents
  • follow complex instructions
  • perform coding and agent-style tasks
  • handle long-context workloads
  • work across many languages
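
As an illustration of image input, a visual question-answering call might look like the following. The `image` input name is an assumption about this endpoint's schema, not a confirmed parameter; Replicate endpoints commonly accept either a public URL or a local file handle for media inputs.

```python
import replicate

# Hypothetical VQA call: "prompt" and "image" are assumed input names.
output = replicate.run(
    "prunaai/qwen-3.5-27b-fast",
    input={
        "prompt": "What does the chart on this slide show?",
        # A local file handle also works with the Replicate client, e.g.:
        # "image": open("slide.png", "rb"),
        "image": "https://example.com/slide.png",
    },
)
print("".join(output))
```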

Why use this model

Qwen3.5-27B combines strong language reasoning with native multimodal understanding. It is well suited for:

  • visual question answering
  • document and OCR-heavy workflows
  • coding and technical assistance
  • multilingual assistants
  • long-context analysis
  • agentic applications with tool use

This Replicate deployment runs an optimized version of the model to make it easier to use in production.

Highlights

  • Unified multimodal foundation: one model for text, image, and video understanding
  • Strong reasoning: competitive performance across knowledge, long-context, STEM, and coding evaluations
  • Tool-friendly: built for agentic and tool-calling workflows
  • Long context: native support up to 262,144 tokens, with extension strategies available beyond that (see the streaming sketch after this list)
  • Broad language coverage: designed for multilingual use across 200+ languages and dialects
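
Long-context work tends to produce long outputs, so streaming tokens as they arrive is often more practical than waiting for the full completion. The Replicate client supports server-sent-event streaming; as above, the `prompt` input name is an assumption.

```python
import replicate

# Stream tokens as they are generated rather than waiting for the
# whole answer -- useful when long contexts produce long outputs.
# "prompt" is an assumed input name for this endpoint.
for event in replicate.stream(
    "prunaai/qwen-3.5-27b-fast",
    input={"prompt": "Walk through this contract section by section: ..."},
):
    print(str(event), end="", flush=True)
```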

Model details

  • Model: Qwen3.5-27B
  • Architecture: Causal language model with vision encoder
  • Parameters: 27B
  • Training stage: Pre-training and post-training
  • Context length: 262,144 tokens natively, extensible up to 1,010,000 tokens
  • Modality support: text, images, video

Performance overview

Qwen3.5-27B is positioned as a high-capability mid-size multimodal model with strong results across:

  • knowledge and instruction following
  • long-context reasoning
  • STEM and coding
  • multilingual benchmarks
  • visual reasoning and VQA
  • document understanding and OCR
  • video understanding
  • tool use and visual agent tasks

For a model in its size class, it is especially strong on multimodal reasoning, document understanding, and agent-oriented evaluations.

Best use cases

Use this model when you need a single endpoint that can handle:

  • chat with image input
  • screenshot or UI understanding
  • OCR and document Q&A
  • video-based question answering (see the sketch after this list)
  • multilingual assistants
  • reasoning-heavy product features
  • agent pipelines that mix perception and action
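
For example, a video-based question-answering call might look like the sketch below. The `video` input name is an assumption; this deployment's actual media parameters may differ, so check the API schema before using it.

```python
import replicate

# Hypothetical video Q&A call -- "video" is an assumed input name.
output = replicate.run(
    "prunaai/qwen-3.5-27b-fast",
    input={
        "prompt": "At what point does the speaker introduce the pricing change?",
        "video": "https://example.com/all-hands-recording.mp4",
    },
)
print("".join(output))
```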

Notes

Qwen3.5 models are designed to reason before answering. Depending on how the endpoint is configured, responses may include internal reasoning-style output or may return direct answers only.
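
If the endpoint is configured to emit reasoning-style output, you may want to separate it from the final answer. The sketch below assumes the reasoning is wrapped in `<think>...</think>` tags, as in upstream Qwen reasoning models; the actual delimiter used by this deployment may differ.

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer), assuming Qwen-style
    <think>...</think> delimiters. Reasoning is empty if no tags appear."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

raw = "<think>The user asks for a sum; 2+2=4.</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
print(answer)  # -> "The answer is 4."
```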

Because this is a hosted and optimized Replicate deployment, behavior and latency may differ from raw self-hosted Hugging Face checkpoints.

Limitations

Like other large multimodal models, Qwen3.5-27B can still:

  • hallucinate facts or visual details
  • make mistakes on fine-grained counting or localization
  • underperform on highly domain-specific inputs without careful prompting
  • produce variable outputs across languages and long contexts

Human review is recommended for high-stakes use cases.
