aliakbarghayoori / dfn5b-clip-vit-h-14-384

Returns CLIP features for dfn5b-clip-vit-h-14-384, currently the highest average performance in the OpenCLIP models leaderboard.

  • Public
  • 62 runs
  • GitHub
  • Paper
  • License


Run time and cost

This model runs on Nvidia T4 GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

From huggingface: “A CLIP (Contrastive Language-Image Pre-training) model trained on DFN-5B. Data Filtering Networks (DFNs) are small networks used to automatically filter large pools of uncurated data. This model was trained on 5B images that were filtered from a pool of 43B uncurated image-text pairs (12.8B image-text pairs from CommonPool-12.8B + 30B additional public image-text pairs).”

From the OpenCLIP leaderboard, which you can see here: https://github.com/mlfoundations/open_clip/blob/main/docs/openclip_results.csv

This model has the highest average score among all OpenCLIP models.
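A minimal sketch of how the returned CLIP features are typically used: embed an image and some texts, then compare them by cosine similarity of the (L2-normalized) feature vectors. The hub id `hf-hub:apple/DFN5B-CLIP-ViT-H-14-384`, the file name `example.jpg`, and the prompt strings are assumptions for illustration, not taken from this page; the heavy model download is kept behind the `__main__` guard.

```python
# Sketch: comparing CLIP features from dfn5b-clip-vit-h-14-384.
# The open_clip hub id below is an assumption based on the Hugging Face
# description quoted in the readme.
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors.

    CLIP image/text features are normally compared this way
    (equivalently, a dot product after L2 normalization).
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    # Heavy part: loading this checkpoint downloads several GB of weights.
    import torch
    from PIL import Image
    import open_clip

    hub_id = "hf-hub:apple/DFN5B-CLIP-ViT-H-14-384"  # assumed hub id
    model, preprocess = open_clip.create_model_from_pretrained(hub_id)
    tokenizer = open_clip.get_tokenizer(hub_id)

    image = preprocess(Image.open("example.jpg")).unsqueeze(0)
    text = tokenizer(["a photo of a cat", "a photo of a dog"])

    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text)

    # Higher similarity = better image-text match.
    for prompt, feat in zip(["cat", "dog"], text_features):
        print(prompt, cosine_similarity(image_features[0], feat))
```

The pure `cosine_similarity` helper works on any feature vectors, so it can also be used directly on the features this model returns via the API, without loading the model locally.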