aliakbarghayoori / dfn5b-clip-vit-h-14-384

Returns CLIP features from the dfn5b-clip-vit-h-14-384 model, which currently has the highest average performance on the OpenCLIP models leaderboard.

Run time and cost

This model runs on Nvidia T4 GPU hardware.

Readme

From Hugging Face: “A CLIP (Contrastive Language-Image Pre-training) model trained on DFN-5B. Data Filtering Networks (DFNs) are small networks used to automatically filter large pools of uncurated data. This model was trained on 5B images that were filtered from a pool of 43B uncurated image-text pairs (12.8B image-text pairs from CommonPool-12.8B + 30B additional public image-text pairs).”

According to the OpenCLIP leaderboard (https://github.com/mlfoundations/open_clip/blob/main/docs/openclip_results.csv), this model has the highest average score among all OpenCLIP models.
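
Because CLIP places images and text in a shared embedding space, the features this model returns are usually compared with cosine similarity. The sketch below is a minimal illustration, assuming you already have one feature vector for an image and one per candidate caption. The helper names `cosine_similarity` and `rank_captions` and the toy 3-dimensional vectors are hypothetical and are not part of this model's API; real features are much higher-dimensional.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_a * norm_b)


def rank_captions(image_features, caption_features):
    """Return (caption, score) pairs sorted from best to worst match."""
    scores = [
        (caption, cosine_similarity(image_features, feats))
        for caption, feats in caption_features.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real CLIP features (hypothetical values).
image_features = [0.9, 0.1, 0.0]
caption_features = {
    "a photo of a dog": [1.0, 0.0, 0.0],
    "a photo of a cat": [0.0, 1.0, 0.0],
}

ranking = rank_captions(image_features, caption_features)
for caption, score in ranking:
    print(f"{score:.3f}  {caption}")
```

The caption with the highest score is the closest match to the image in the embedding space. The same comparison works for image-to-image or text-to-text retrieval.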