Get embeddings

These models generate vector representations that capture the semantics of text, images, and more. Embeddings power search, recommendations, and clustering.
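As a rough illustration of how embeddings power search, the sketch below ranks a few documents against a query by cosine similarity. The three-dimensional vectors and document names are made up for the example; real models return vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, corpus):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy embeddings (assumed values, not output from any real model).
corpus = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.7, 0.3, 0.1],
    "doc_taxes": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query, corpus)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The same ranking step applies to recommendations (the query is an item the user liked) and is the distance measure most clustering pipelines start from.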

Our pick for text: Multilingual E5

For most text applications, we recommend beautyyuyanli/multilingual-e5-large. It’s fast and cheap, and it produces high-quality embeddings suitable for semantic search, topic modeling, and classification.

Our pick for images: CLIP

CLIP is the go-to model for image similarity search and clustering. It is widely used and cost-effective, and its embeddings capture the semantic content of images, making it easy to find similar ones. Just pass in an image URL or a text string and you’re good to go.

Best for multimodal: ImageBind

To jointly embed text, images, and audio, ImageBind is in a class of its own. While more expensive than unimodal models, its ability to unify different data types enables unique applications like searching images with text queries or finding relevant audio clips. If you’re working on multimodal search or retrieval, ImageBind is worth the investment.
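A minimal sketch of what a shared embedding space makes possible: one index holds items of different modalities, and a single text query retrieves the closest image and the closest audio clip. The file names and vectors here are hypothetical placeholders, not ImageBind output, and a real index would use a vector database rather than a Python list.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Hypothetical joint index: (modality, item name, unit-length embedding).
index = [
    ("image", "beach.jpg", normalize([0.9, 0.1, 0.1])),
    ("audio", "waves.wav", normalize([0.8, 0.2, 0.1])),
    ("image", "office.jpg", normalize([0.1, 0.9, 0.2])),
    ("audio", "keyboard.wav", normalize([0.1, 0.8, 0.3])),
]

def best_match_per_modality(query_vec, index):
    """Return the highest-scoring item for each modality in the index."""
    q = normalize(query_vec)
    best = {}
    for modality, name, vec in index:
        score = sum(a * b for a, b in zip(q, vec))
        if modality not in best or score > best[modality][1]:
            best[modality] = (name, score)
    return best

# Stand-in for the embedding of a text query such as "ocean shoreline".
text_query = [1.0, 0.0, 0.0]
matches = best_match_per_modality(text_query, index)
print(matches)
```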

Recommended models

ibm-granite / granite-embedding-278m-multilingual

Granite-Embedding-278M-Multilingual is a 278M-parameter model from the Granite Embeddings suite that can be used to generate high-quality text embeddings.

Updated 1 month ago

945 runs

zsxkib / jina-clip-v2

Jina-CLIP v2: 0.9B multimodal embedding model with 89-language multilingual support, 512x512 image resolution, and Matryoshka representations

Updated 6 months, 3 weeks ago

213.6K runs
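The Matryoshka representations mentioned for Jina-CLIP v2 are trained so that a prefix of the full vector still works as a smaller embedding. A minimal sketch of how that is typically used follows: keep the first components, then rescale to unit length. The helper name and the toy vector are assumptions for illustration; which truncated sizes perform well depends on the model.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        return head  # an all-zero prefix cannot be normalized
    return [x / norm for x in head]

# Toy 4-dimensional vector standing in for a full-size embedding.
full = [3.0, 4.0, 12.0, 0.0]
short = truncate_embedding(full, 2)
print(short)
```

Shorter vectors cut storage and speed up similarity search, usually at some cost in retrieval quality.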

cuuupid / gte-qwen2-7b-instruct

Embed text with Qwen2-7b-Instruct

Updated 10 months, 1 week ago

1M runs

lucataco / snowflake-arctic-embed-l

snowflake-arctic-embed is a suite of text embedding models that focuses on creating high-quality retrieval models optimized for performance

Updated 1 year, 2 months ago

398.4K runs

adirik / e5-mistral-7b-instruct

E5-mistral-7b-instruct language embedding model

Updated 1 year, 3 months ago

645 runs

lucataco / nomic-embed-text-v1

nomic-embed-text-v1 is a text encoder with an 8192-token context length that surpasses OpenAI text-embedding-ada-002 and text-embedding-3-small performance on short- and long-context tasks

Updated 1 year, 4 months ago

8.8K runs

nateraw / bge-large-en-v1.5

BAAI's bge-large-en-v1.5 for embedding text sequences

Updated 1 year, 8 months ago

295.1K runs

andreasjansson / llama-2-13b-embeddings

Llama2 13B with embedding output

Updated 1 year, 8 months ago

237.5K runs

mark3labs / embeddings-gte-base

General Text Embeddings (GTE) model.

Updated 1 year, 9 months ago

1.1M runs

replicate / all-mpnet-base-v2

This is a language model that can be used to obtain document embeddings suitable for downstream tasks like semantic search and clustering.

Updated 2 years ago

2.3M runs