Collections

Get embeddings

These models generate vector representations that capture the semantics of text, images, and more. Embeddings power search, recommendations, and clustering.

Our pick for text: Multilingual E5

For most text applications, we recommend beautyyuyanli/multilingual-e5-large. It’s fast, cheap and produces high-quality embeddings suitable for semantic search, topic modeling, and classification.

Our pick for images: CLIP

CLIP is the go-to model for image similarity search and clustering. Incredibly popular and cost-effective, CLIP embeddings capture the semantic content of images, making it easy to find similar ones. Just pass in an image URL or a text string and you’re good to go.

Best for multimodal: ImageBind

To jointly embed text, images, and audio, ImageBind is in a class of its own. While more expensive than unimodal models, its ability to unify different data types enables unique applications like searching images with text queries or finding relevant audio clips. If you’re working on multimodal search or retrieval, ImageBind is worth the investment.

Recommended models