aliakbarghayoori / dfn5b-clip-vit-h-14-384

Returns CLIP features from dfn5b-clip-vit-h-14-384, currently the model with the highest average performance on the OpenCLIP results leaderboard.

  • Public
  • 382 runs
  • GitHub
  • Paper
  • License


Run time and cost

This model costs approximately $0.0026 to run on Replicate, or 384 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 12 seconds.
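If you want to call the hosted version from code rather than the web UI, a minimal sketch with the official `replicate` Python client looks like the following. The input field name (`"image"`) is an assumption here; check the model's API schema on Replicate for the actual parameter names.

```python
# Minimal sketch of calling this model on Replicate.
# Assumption: the model accepts an "image" input (verify against the API schema).

def build_input(image_url):
    """Build the prediction input dict (hypothetical field name "image")."""
    return {"image": image_url}

def run_remote(image_url):
    # Heavy/network dependency kept local: pip install replicate,
    # and set REPLICATE_API_TOKEN in your environment.
    import replicate
    return replicate.run(
        "aliakbarghayoori/dfn5b-clip-vit-h-14-384",
        input=build_input(image_url),
    )

if __name__ == "__main__":
    features = run_remote("https://example.com/cat.jpg")
    print(features)
```

`replicate.run` blocks until the prediction completes, so expect each call to take on the order of the 12-second prediction time quoted above.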

Readme

From huggingface: “A CLIP (Contrastive Language-Image Pre-training) model trained on DFN-5B. Data Filtering Networks (DFNs) are small networks used to automatically filter large pools of uncurated data. This model was trained on 5B images that were filtered from a pool of 43B uncurated image-text pairs (12.8B image-text pairs from CommonPool-12.8B + 30B additional public image-text pairs).”
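Since the weights are public, you can also extract features locally with the `open_clip` library instead of going through Replicate. The sketch below assumes the checkpoint is published on the Hugging Face Hub as `apple/DFN5B-CLIP-ViT-H-14-384` (check the model card for the exact repo id); the small cosine-similarity helper shows how the L2-normalized image and text embeddings are typically compared.

```python
# Sketch: extracting CLIP features locally with open_clip.
# Assumption: Hub repo id "apple/DFN5B-CLIP-ViT-H-14-384" (verify on the model card).

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors given as plain lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

def extract_features(image_path, texts):
    # Heavy imports kept local so the helper above stays dependency-free.
    import torch
    import open_clip
    from PIL import Image

    model, preprocess = open_clip.create_model_from_pretrained(
        "hf-hub:apple/DFN5B-CLIP-ViT-H-14-384"
    )
    tokenizer = open_clip.get_tokenizer("hf-hub:apple/DFN5B-CLIP-ViT-H-14-384")

    image = preprocess(Image.open(image_path)).unsqueeze(0)
    tokens = tokenizer(texts)
    with torch.no_grad():
        img_feat = model.encode_image(image)
        txt_feat = model.encode_text(tokens)
    # L2-normalize so dot products between embeddings are cosine similarities.
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    return img_feat, txt_feat

if __name__ == "__main__":
    img, txt = extract_features("cat.jpg", ["a photo of a cat", "a photo of a dog"])
    print(img @ txt.T)  # per-text cosine similarity scores
```

Note that this is a ViT-H/14 model at 384px input resolution, so the first run downloads several gigabytes of weights and inference is best done on a GPU.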

These results come from the OpenCLIP leaderboard, which you can see here: https://github.com/mlfoundations/open_clip/blob/main/docs/openclip_results.csv

This model has the highest average score among all models evaluated in open_clip.