light770/qwen3-embedding-0.6b

Compact Powerhouse for Vector Embeddings

Public
70 runs

Run time and cost

This model runs on Nvidia T4 GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

Qwen3-Embedding-0.6B is a lightweight yet high-performing text embedding model from Alibaba’s Qwen team, purpose-built for production RAG pipelines and pgvector deployments. Despite its small footprint, it delivers competitive performance on the MTEB benchmark.

  • Flexible Dimensions: Supports Matryoshka Representation Learning (MRL) — generate embeddings from 32 to 1024 dimensions to optimize pgvector storage vs. accuracy trade-offs

  • Long Context: 32K token context window handles long documents without chunking overhead

  • Instruction-Aware: Task-specific instructions boost retrieval accuracy by 1–5% — perfect for domain-specific pgvector search

  • Multilingual: Supports 100+ languages including code, enabling cross-lingual vector search in a single pgvector table
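The MRL and instruction-aware behaviors above can be sketched in a few lines. This is a minimal illustration, not the model's implementation: the 1024-dim vector below is a random stand-in for a real model output, and the `Instruct: ... \nQuery: ...` template follows the prompt format described in the Qwen3-Embedding documentation.

```python
import numpy as np

def truncate_mrl(embedding: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` components of an MRL embedding and re-normalize.

    MRL-trained models concentrate the most important information in the
    leading dimensions, so a prefix of the vector is still a usable embedding.
    """
    head = embedding[:dims]
    return head / np.linalg.norm(head)

def format_query(task: str, query: str) -> str:
    # Instruction template per the Qwen3-Embedding docs (queries only;
    # documents are embedded without an instruction prefix).
    return f"Instruct: {task}\nQuery: {query}"

# Stand-in for a real 1024-dim model output (random, for illustration only).
rng = np.random.default_rng(0)
full = rng.normal(size=1024)
full /= np.linalg.norm(full)

small = truncate_mrl(full, 256)  # 4x less pgvector storage per row
print(small.shape)                             # (256,)
print(round(float(np.linalg.norm(small)), 6))  # 1.0
print(format_query("Given a web search query, retrieve relevant passages",
                   "what is pgvector?"))
```

Cosine similarity over the truncated, re-normalized prefix approximates similarity over the full vector, which is what makes the storage-vs-accuracy dial practical in pgvector.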

| Specification | Value |
|---|---|
| Parameters | 0.6B (600M) |
| Architecture | Dense Transformer decoder |
| Layers | 28 |
| Context Length | 32,768 tokens |
| Embedding Dimensions | 32–1024 (user-configurable) |
| MRL Support | Yes |
| License | Apache 2.0 |
| Release Date | June 2025 |
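The configurable output dimension translates directly into pgvector storage. As a rough sketch, assuming pgvector's `vector` type stores 4 bytes per float32 component plus an 8-byte header per value (and ignoring row and index overhead):

```python
def pgvector_bytes(dims: int, rows: int) -> int:
    # Approximate on-disk size of a pgvector `vector` column:
    # 4 bytes per float32 component + 8-byte header per value.
    return rows * (4 * dims + 8)

rows = 1_000_000
for dims in (1024, 512, 256, 32):
    gb = pgvector_bytes(dims, rows) / 1e9
    print(f"{dims:4d} dims: {gb:.3f} GB")  # e.g. 1024 dims -> 4.104 GB
```

Dropping from 1024 to 256 dimensions cuts the column from roughly 4.1 GB to about 1.0 GB per million rows, which is the trade-off MRL lets you tune without retraining.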