Get embeddings
These models generate vector representations that capture the semantics of text, images, and more. Embeddings power search, recommendations, and clustering.
Our Pick for Text: all-mpnet-base-v2
For most text applications, we recommend all-mpnet-base-v2. It’s fast, cheap ($0.00022/run), and produces high-quality embeddings suitable for semantic search, topic modeling, and classification. With 800K+ runs, it’s a proven performer.
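Once you have embeddings (for example, 768-dimensional vectors from all-mpnet-base-v2), semantic search reduces to ranking documents by similarity to the query vector. A minimal sketch, using hypothetical 3-dimensional toy vectors in place of real model output:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(query_vec, doc_vecs):
    # Rank document IDs by cosine similarity to the query, best match first.
    ranked = sorted(doc_vecs.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked]

# Toy vectors standing in for real all-mpnet-base-v2 embeddings.
docs = {
    "dogs": [0.9, 0.1, 0.0],
    "cats": [0.8, 0.2, 0.1],
    "stocks": [0.0, 0.1, 0.9],
}
print(search([0.85, 0.15, 0.05], docs))
```

In production you would swap the toy dicts for embeddings returned by the model and, at scale, replace the linear scan with a vector index.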
Our Pick for Images: CLIP
CLIP is the go-to model for image similarity search and clustering. Incredibly popular (52M runs) and cost-effective, CLIP embeddings capture the semantic content of images, making it easy to find similar ones. Just pass in an image URL and you’re good to go.
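Clustering with CLIP embeddings works the same way as text search: images whose vectors are close belong together. A minimal greedy-clustering sketch, with toy 2-dimensional vectors standing in for real CLIP features:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cluster(embeddings, threshold=0.9):
    # Greedy clustering: each image joins the first cluster whose
    # representative (first member) is similar enough, else starts a new one.
    clusters = []
    for name, vec in embeddings.items():
        for group in clusters:
            if cosine(vec, embeddings[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters

# Toy vectors standing in for real CLIP image embeddings.
images = {
    "beach1.jpg": [0.9, 0.1],
    "beach2.jpg": [0.85, 0.15],
    "city1.jpg": [0.1, 0.9],
}
print(cluster(images))
```

Greedy clustering is the simplest possible approach; for larger collections, k-means or a vector database's built-in clustering will give better groupings.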
Best for Multimodal: ImageBind
To jointly embed text, images, and audio, ImageBind is in a class of its own. While more expensive than unimodal models, its ability to unify different data types enables unique applications like searching images with text queries or finding relevant audio clips. If you’re working on multimodal search or retrieval, ImageBind is worth the investment.
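Because ImageBind places text, image, and audio embeddings in one shared space, a single similarity function serves cross-modal queries: a text query vector can be compared directly against image and audio vectors. A sketch with hypothetical toy vectors:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy vectors standing in for real ImageBind embeddings. In the actual
# joint space, all modalities land in the same vector space, so one
# similarity function works across text, images, and audio.
candidates = {
    "dog_photo.jpg": [0.9, 0.1, 0.0],
    "bark.wav": [0.7, 0.3, 0.0],
    "traffic.wav": [0.0, 0.2, 0.9],
}
text_query = [0.85, 0.15, 0.0]  # stand-in for the embedding of the text "a dog"

best = max(candidates, key=lambda k: cosine(text_query, candidates[k]))
print(best)
```

The same pattern lets you search audio with an image, or images with audio, since every modality shares the coordinate system.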
Recommended models
andreasjansson / clip-features
Return CLIP features for the clip-vit-large-patch14 model
daanelson / imagebind
A model for text, audio, and image embeddings in one space
replicate / all-mpnet-base-v2
A language model for generating document embeddings suitable for downstream tasks like semantic search and clustering
nateraw / bge-large-en-v1.5
BAAI's bge-large-en-v1.5 for embedding text sequences
lucataco / nomic-embed-text-v1
nomic-embed-text-v1 is an 8192-context-length text encoder that surpasses OpenAI's text-embedding-ada-002 and text-embedding-3-small on both short- and long-context tasks
adirik / e5-mistral-7b-instruct
E5-mistral-7b-instruct language embedding model
adirik / multilingual-e5-large
Multilingual E5-large language embedding model