These models generate vector representations that capture the semantics of text, images, and more. Embeddings power search, recommendations, and clustering.
For most text applications, we recommend beautyyuyanli/multilingual-e5-large. It's fast and cheap, and it produces high-quality embeddings suitable for semantic search, topic modeling, and classification.
CLIP is the go-to model for image similarity search and clustering. Popular and cost-effective, it produces embeddings that capture the semantic content of images, making it easy to find similar ones. Just pass in an image URL or a text string and you're good to go.
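With the Replicate Python client, that call is only a few lines. The following is a minimal sketch; the inputs field name and the output shape are assumptions, so check the model's API tab for the exact schema and pin a version hash in production.

import replicate

# Assumed schema: andreasjansson/clip-features takes newline-separated image
# URLs and/or text strings and returns one embedding per input line.
output = replicate.run(
    "andreasjansson/clip-features",
    input={"inputs": "https://example.com/cat.jpg\na photo of a cat"},
)
for item in output:
    # Assumed output shape: each item pairs an input with its embedding vector.
    print(item["input"], len(item["embedding"]))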
To jointly embed text, images, and audio, ImageBind is in a class of its own. While more expensive than unimodal models, its ability to unify different data types enables unique applications like searching images with text queries or finding relevant audio clips. If you're working on multimodal search or retrieval, ImageBind is worth the investment.
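A rough sketch of cross-modal embedding with the Replicate Python client follows; the parameter names used here (input, text_input, modality) are assumptions, so verify them on the model's API tab before relying on this.

import replicate

# Embed an audio clip and a text query into the same space (assumed schema).
audio_embedding = replicate.run(
    "daanelson/imagebind",
    input={"input": open("dog_bark.wav", "rb"), "modality": "audio"},
)
text_embedding = replicate.run(
    "daanelson/imagebind",
    input={"text_input": "a dog barking", "modality": "text"},
)
# Both vectors live in the same space, so their cosine similarity is meaningful.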
Featured models


beautyyuyanli/multilingual-e5-large
multilingual-e5-large: A multi-language text embedding model
Updated 1 year, 10 months ago
32M runs


daanelson/imagebind
A model for text, audio, and image embeddings in one space
Updated 2 years, 5 months ago
9.4M runs


andreasjansson/clip-features
Return CLIP features for the clip-vit-large-patch14 model
Updated 2 years, 8 months ago
120.4M runs
Recommended Models
If you need quick results for text embeddings, beautyyuyanli/multilingual-e5-large and replicate/all-mpnet-base-v2 are both optimized for speed and work well for search and clustering tasks.
For image or multimodal data, andreasjansson/clip-features and krthr/clip-embeddings provide fast inference times without sacrificing much accuracy.
beautyyuyanli/multilingual-e5-large offers excellent performance for text embeddings in multiple languages while remaining efficient to run.
For images, andreasjansson/clip-features hits the sweet spot—it’s reliable, cost-effective, and widely used in production for similarity search.
If you need multimodal flexibility, daanelson/imagebind provides strong results across text, image, and audio at a higher compute cost.
beautyyuyanli/multilingual-e5-large is the top choice for text-based search, classification, and clustering.
It performs well across languages, making it suitable for multilingual apps, chat search, or semantic retrieval.
Other options like ibm-granite/granite-embedding-278m-multilingual and lucataco/snowflake-arctic-embed-l are great alternatives for large-scale enterprise or research use.
For image-only tasks, andreasjansson/clip-features and krthr/clip-embeddings are the most dependable.
They’re based on CLIP’s ViT-L/14 architecture and produce embeddings that capture both visual detail and semantic meaning—ideal for image similarity, clustering, and cross-modal search.
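For instance, clustering a batch of CLIP image embeddings takes only a few lines with scikit-learn; this sketch uses random vectors as stand-ins for embeddings returned by one of the models above.

import numpy as np
from sklearn.cluster import KMeans

# Stand-in for CLIP ViT-L/14 embeddings of 500 images (768 dimensions).
image_embeddings = np.random.rand(500, 768).astype("float32")

# Group semantically similar images into 10 clusters.
kmeans = KMeans(n_clusters=10, n_init="auto", random_state=0)
labels = kmeans.fit_predict(image_embeddings)
print(labels[:20])  # cluster assignment for the first 20 images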
daanelson/imagebind excels at embedding text, images, and audio in the same space. This makes it perfect for applications like searching for sounds using text prompts or finding images that match an audio clip.
For multilingual or multimodal-heavy datasets, zsxkib/jina-clip-v2 is another strong choice with 89-language support and high-resolution inputs.
All embedding models output numeric vectors, usually arrays of floats.
These vectors can be stored in a database or used directly for semantic search, recommendation systems, clustering, or retrieval-augmented generation (RAG).
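A minimal semantic-search sketch over stored vectors, using plain NumPy and made-up data for illustration:

import numpy as np

# Toy corpus: in practice these vectors come from an embedding model and
# would usually live in a vector database.
corpus = np.random.rand(1000, 1024).astype("float32")
query = np.random.rand(1024).astype("float32")

# Cosine similarity: normalize, then take dot products.
corpus_norm = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
query_norm = query / np.linalg.norm(query)
scores = corpus_norm @ query_norm

# Indices of the five most similar documents.
top_k = np.argsort(-scores)[:5]
print(top_k, scores[top_k])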
Many embedding models, like replicate/all-mpnet-base-v2 or beautyyuyanli/multilingual-e5-large, can be self-hosted using Cog or Docker.
To publish your own embedding model, create a cog.yaml that defines the model's environment and a predict.py that defines its input and output schema, then push it to your account with cog push. Replicate handles deployment from there.
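A minimal predict.py for an embedding model might look like the sketch below, which assumes sentence-transformers as the backbone; the actual dependencies would be declared in cog.yaml.

from typing import List

from cog import BasePredictor, Input
from sentence_transformers import SentenceTransformer

class Predictor(BasePredictor):
    def setup(self):
        # Load the model once, when the container starts.
        self.model = SentenceTransformer("intfloat/multilingual-e5-large")

    def predict(self, text: str = Input(description="Text to embed")) -> List[float]:
        # E5 models expect a "query: " or "passage: " prefix on their inputs.
        return self.model.encode("query: " + text).tolist()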
Most embedding models are licensed for commercial use, including E5, CLIP, and ImageBind, though you should always double-check the License tab on each model's page for any restrictions.
Provide text, image URLs, or audio inputs depending on the model type.
For example, with the Replicate Python client (the texts field name and its list-as-JSON-string format are assumptions; check the model's API tab for the exact schema):
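import replicate

# Pin a specific version hash from the model page in production.
output = replicate.run(
    "beautyyuyanli/multilingual-e5-large",
    input={"texts": '["query: best hiking trails near Seattle", "query: ¿dónde comprar café en grano?"]'},
)
# Assumed output: one 1024-dimensional vector per input text.
print(len(output), len(output[0]))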
The output will be a numerical vector you can use for similarity or clustering.
Recommended Models


ibm-granite/granite-embedding-278m-multilingual
Granite-Embedding-278M-Multilingual is a 278M parameter model from the Granite Embeddings suite that can be used to generate high quality text embeddings
Updated 5 months, 4 weeks ago
1.2K runs


zsxkib/jina-clip-v2
Jina-CLIP v2: 0.9B multimodal embedding model with 89-language multilingual support, 512x512 image resolution, and Matryoshka representations
Updated 11 months, 2 weeks ago
657K runs


cuuupid/gte-qwen2-7b-instruct
Embed text with Qwen2-7b-Instruct
Updated 1 year, 3 months ago
1.1M runs


lucataco/snowflake-arctic-embed-l
snowflake-arctic-embed is a suite of text embedding models that focuses on creating high-quality retrieval models optimized for performance
Updated 1 year, 6 months ago
398.5K runs


adirik/e5-mistral-7b-instruct
E5-mistral-7b-instruct language embedding model
Updated 1 year, 8 months ago
649 runs


lucataco/nomic-embed-text-v1
nomic-embed-text-v1 is an 8192 context length text encoder that surpasses OpenAI text-embedding-ada-002 and text-embedding-3-small performance on short and long context tasks
Updated 1 year, 9 months ago
34.3K runs


nateraw/bge-large-en-v1.5
BAAI's bge-large-en-v1.5 for embedding text sequences
Updated 2 years, 1 month ago
297.1K runs


center-for-curriculum-redesign/bge_1-5_query_embeddings
Query embedding generator for BAAI's bge-large-en v1.5 embedding model
Updated 2 years, 1 month ago
7.7K runs


andreasjansson/llama-2-13b-embeddings
Llama2 13B with embedding output
Updated 2 years, 1 month ago
243.1K runs


mark3labs/embeddings-gte-base
General Text Embeddings (GTE) model.
Updated 2 years, 2 months ago
1.1M runs


krthr/clip-embeddings
Generate CLIP (clip-vit-large-patch14) text & image embeddings
Updated 2 years, 3 months ago
47.6M runs


replicate/all-mpnet-base-v2
This is a language model that can be used to obtain document embeddings suitable for downstream tasks like semantic search and clustering.
Updated 2 years, 5 months ago
2.4M runs