lucataco/qwen3-embedding-8b

The Qwen3 Embedding model series is specifically designed for text embedding and ranking tasks.

Public
888.3K runs

Run time and cost

This model costs approximately $0.00098 to run on Replicate, or 1020 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 1 second.

Readme

Qwen3 Embedding 8B

Turn text into numbers that capture meaning. This lets you build search systems, find similar content, and organize text by what it’s actually about.

What this model does

This model reads text and converts it into a list of numbers called an embedding. The clever part: text with similar meanings gets similar numbers, even if the actual words are different.

Think of it like giving every piece of text coordinates on a map. Related ideas end up close together, unrelated ones far apart. You can then measure how close two pieces of text are by comparing their coordinates.
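To make "comparing coordinates" concrete, here is a minimal sketch of cosine similarity between two embedding vectors. It assumes you already have the vectors back from the model; nothing here is specific to this model's API.

```python
import numpy as np

def cosine_similarity(vec_a, vec_b) -> float:
    """Score between -1 and 1; higher means the two texts are closer in meaning."""
    a = np.asarray(vec_a, dtype=float)
    b = np.asarray(vec_b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

In practice, two texts about the same topic ("fix a dripping tap" and "repair a leaky faucet") should score noticeably higher than an unrelated pair.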

What you can build with it

Search that understands meaning

Build a search engine that finds relevant answers, not just keyword matches. When someone asks “how do I fix a leaky faucet?”, you’ll surface articles about plumbing repairs, even if they never use the word “leaky.”
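As a sketch of how such a search could be wired up: embed every document once, embed the query, and rank by similarity. The `embed()` function below is a hypothetical stand-in for a call to this model (here it returns a deterministic random vector so the sketch runs on its own), not part of the model's API.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a call to the embedding model; replace with a real call."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(4096)

documents = [
    "How to repair a dripping kitchen faucet",
    "Choosing exterior paint that survives harsh winters",
    "Unclogging a slow bathroom drain",
]

def search(query: str, docs: list[str], top_k: int = 2):
    doc_vecs = np.stack([embed(d) for d in docs])
    doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    q = embed(query)
    q /= np.linalg.norm(q)
    scores = doc_vecs @ q  # cosine similarity, since everything is unit length
    ranked = np.argsort(scores)[::-1][:top_k]
    return [(docs[i], float(scores[i])) for i in ranked]

print(search("how do I fix a leaky faucet?", documents))
```

With real embeddings, the plumbing documents rank first even though the query never says "faucet repair" verbatim.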

Question answering systems

Feed the model your documentation, knowledge base, or support articles. When users ask questions, find the most relevant passages to answer them.

Code search

Let developers search your codebase using plain English. “Function that validates email addresses” will find the right code, even if it’s named something like check_address_format().

Recommendation engines

Find similar articles, products, or content based on what users are reading. The model captures semantic similarity, so it can suggest related items that share meaning but use different vocabulary.

Content classification

Automatically categorize support tickets, emails, or documents based on their meaning rather than just keywords. Group similar items together or sort them into predefined categories.
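One simple way to do this is to embed a short description of each category and assign each incoming text to the closest one. The sketch below reuses the hypothetical `embed()` stand-in from the search example; the category descriptions are illustrative.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a call to the embedding model; replace with a real call."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(4096)

categories = {
    "technical support": "The user reports a bug or a problem with the product",
    "billing question": "The user asks about invoices, charges, or payments",
    "feature request": "The user suggests new functionality",
}

labels = list(categories)
prototypes = np.stack([embed(desc) for desc in categories.values()])
prototypes /= np.linalg.norm(prototypes, axis=1, keepdims=True)

def classify(text: str) -> str:
    v = embed(text)
    v /= np.linalg.norm(v)
    return labels[int(np.argmax(prototypes @ v))]

print(classify("I was charged twice this month"))
```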

Multilingual search

Search across languages. A query in English can retrieve results in Chinese, Spanish, or any of the 100+ languages the model supports. Everything lives in the same semantic space.

Key features

This eight-billion-parameter model ranks first on the MTEB multilingual leaderboard with a score of 70.58 (as of June 2025). It’s built on the Qwen3 foundation model and works across more than 100 languages.

Long context: Process up to 32,000 tokens in a single pass. That’s enough for articles, research papers, long documentation, or entire codebases.

Flexible dimensions: The model uses something called Matryoshka Representation Learning, which means you can choose how detailed you want your embeddings to be. Smaller embeddings are faster to work with; larger ones capture more nuance.
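A minimal sketch of how you might use that in practice, assuming the model has returned a full-length vector: keep the leading dimensions and renormalize before comparing. The dimension numbers here are illustrative assumptions; check the actual output size of the model.

```python
import numpy as np

def truncate_embedding(full_vec, dims: int = 1024) -> np.ndarray:
    """Keep the first `dims` values of a Matryoshka-style embedding and renormalize."""
    v = np.asarray(full_vec, dtype=float)[:dims]
    return v / np.linalg.norm(v)

# full = embed("some text")              # full-size vector from the model
# small = truncate_embedding(full, 256)  # cheaper to store and compare, slightly less nuance
```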

Custom instructions: You can improve results by telling the model what task you’re doing. Adding an instruction like “Given a web search query, retrieve relevant passages that answer the query” typically improves performance by 1-5%.

How to use custom instructions

Instructions help the model understand your specific use case. They work best when written in English, since that’s what the model was primarily trained on.

For web search: “Given a web search query, retrieve relevant passages that answer the query”

For classification: “Classify the following text into one of these categories: technical support, billing question, feature request”

For semantic similarity: “Determine if these two texts express the same idea”

Pass the instruction before your actual text. The model will adjust its embeddings to work better for your task.
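For example, a query-side string might be assembled like this. The layout below follows the pattern used in the upstream Qwen3 examples; treat it as an assumption and adapt it to this model's input format.

```python
task = "Given a web search query, retrieve relevant passages that answer the query"
query = "how do I fix a leaky faucet?"

# Prepend the task description to the query; documents are typically embedded without it.
instructed_query = f"Instruct: {task}\nQuery: {query}"
```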

Technical details

The model processes text and outputs a vector of floating-point numbers. You measure similarity between vectors using cosine similarity or dot product. Higher scores mean more similar content.

This is an embedding model, not a language model. It doesn’t generate text or answer questions directly. Instead, it creates representations you can use in other systems for retrieval, classification, or similarity search.

The model was trained using a multi-stage approach: contrastive pre-training on large amounts of data, supervised fine-tuning on high-quality labeled examples, then merging multiple candidate models to improve overall performance.

Learn more

For detailed benchmarks, training methodology, and technical architecture, see the Qwen3 Embedding paper and the official documentation.

You can try this model on the Replicate Playground at replicate.com/playground.
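If you’d rather call it from code, a minimal sketch with the Replicate Python client is below. The input field name is an assumption, so check the model’s API tab for the exact schema.

```python
import replicate

# NOTE: the "text" input name is a guess; see the model's API tab for the real schema.
output = replicate.run(
    "lucataco/qwen3-embedding-8b",
    input={"text": "How do I fix a leaky faucet?"},
)
print(len(output))  # length of the returned embedding vector
```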