Readme
Note: This model is released under a non-commercial license and should be used only for research and experimentation.
Model description
ImageBind is a model from Meta AI that learns a single joint embedding space across six modalities: images, text, audio, depth, thermal, and IMU data. Because all modalities share one space, it enables emergent applications ‘out-of-the-box’, including cross-modal retrieval, composing modalities with embedding arithmetic, and cross-modal detection and generation.
This implementation supports three of the six modalities: image, text, and audio.
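To make cross-modal retrieval and embedding arithmetic concrete, here is a minimal sketch in plain Python. It does not call ImageBind: the three-dimensional vectors, the names (`image_embeddings`, `audio_bark`, `text_beach`), and the helper functions are all hypothetical stand-ins for embeddings a model like this might produce. The only idea carried over is that embeddings from different modalities live in one space, so they can be compared with cosine similarity and added together.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def retrieve(query, candidates):
    """Return the name of the candidate embedding closest to the query."""
    return max(candidates, key=lambda name: cosine(query, candidates[name]))


# Hypothetical toy embeddings (made up for illustration, not model output).
image_embeddings = {
    "dog_photo": [0.9, 0.1, 0.0],
    "beach_photo": [0.0, 0.1, 0.9],
    "dog_on_beach_photo": [0.6, 0.1, 0.6],
}
audio_bark = [0.8, 0.2, 0.1]   # stand-in for an audio clip of barking
text_beach = [0.1, 0.0, 0.9]   # stand-in for the text "a beach"

# Cross-modal retrieval: use an audio embedding to search over images.
best_for_audio = retrieve(audio_bark, image_embeddings)

# Composing modalities: add normalized audio and text embeddings,
# then search over images with the combined vector.
composed = [a + t for a, t in zip(normalize(audio_bark), normalize(text_beach))]
best_for_composed = retrieve(composed, image_embeddings)

print(best_for_audio)     # dog_photo
print(best_for_composed)  # dog_on_beach_photo
```

With real embeddings the vectors would have many more dimensions and come from the model's image, text, and audio encoders, but the comparison and composition steps would follow the same pattern.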