arielreplicate / gscorecam-clip-analyzer

Shows what CLIP looks at in an image, given a text prompt

  • Public
  • 743 runs
  • T4
  • GitHub
  • Paper
  • License

Input

input_image
file (required)

Path to the image to investigate CLIP on

string

What to look for in the image

Default: "An object"

string

Name of the CLIP model

Default: "RN50x16"

boolean

Use only part of the channels (the drop option)

Default: true

integer
(minimum: 1, maximum: 3072)

Number of channels used by gScoreCAM (ignored when drop=False)

Default: 300

boolean

Show the output heatmap on top of the input

Default: true
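
A minimal sketch of calling these inputs through the Replicate Python client. Only input_image is named on this page, so every other input key below is a hypothetical placeholder; check the model's API schema for the real names.

```python
# Hypothetical call sketch -- apart from "input_image", the input keys are
# placeholders, not confirmed parameter names for this model.
# Requires: pip install replicate, and REPLICATE_API_TOKEN in the environment.
import replicate

output = replicate.run(
    "arielreplicate/gscorecam-clip-analyzer",  # optionally pin ":<version id>"
    input={
        "input_image": open("photo.jpg", "rb"),  # image to investigate
        "text": "A dog",          # placeholder: what to look for (default "An object")
        "clip_model": "RN50x16",  # placeholder: CLIP backbone name
        "drop": True,             # placeholder: use only part of the channels
        "num_channels": 300,      # placeholder: channels used by gScoreCAM (1-3072)
        "overlay": True,          # placeholder: draw the heatmap on top of the input
    },
)
print(output)  # URL of the generated heatmap image
```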

Output

output

The output is a single image: the gScoreCAM heatmap, optionally overlaid on the input. The example shown on Replicate was created by a different version, arielreplicate/gscorecam-clip-analyzer:8ffb602b.

Run time and cost

This model costs approximately $0.0054 to run on Replicate, or 185 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
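
As a quick sanity check, the quoted figures are mutually consistent:

```python
cost_per_run = 0.0054           # USD, approximate figure quoted above
print(round(1 / cost_per_run))  # -> 185 runs per dollar
typical_seconds = 24            # typical predict time on a T4 (see below)
print(cost_per_run / typical_seconds)  # -> 0.000225, implied USD per T4-second
```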

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 24 seconds. The predict time for this model varies significantly based on the inputs.
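
To run it yourself, Replicate models are packaged with Cog, which exposes a small HTTP API inside the container. A hedged sketch, assuming the published image follows the standard Cog layout (the r8.im image reference and any input key other than input_image are unverified assumptions):

```python
# Hedged local-run sketch. First start the container, e.g.:
#   docker run -d -p 5000:5000 --gpus all r8.im/arielreplicate/gscorecam-clip-analyzer
# (you may need to pin a version digest; see the Replicate docs for the exact image ref)
import base64
import requests

with open("photo.jpg", "rb") as f:
    data_uri = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:5000/predictions",
    json={"input": {"input_image": data_uri}},  # other keys: see the API schema
)
print(resp.json()["output"])
```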

Readme

gScoreCAM: What is CLIP looking at?

Shows which parts of an image are most correlated with a text, as judged in CLIP's embedding space.

tl;dr: CLIP's ResNet-50 channels are very noisy compared to those of a typical ImageNet-trained ResNet-50, and most saliency methods obtain low object-localization scores with CLIP. By visualizing only the top 10% most sensitive (highest-gradient) channels, gScoreCAM obtains state-of-the-art weakly supervised localization results using CLIP (in both its ResNet and ViT versions).
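
Concretely, the recipe is: take the activation maps from CLIP's last conv stage, rank the channels by the gradient of the image-text similarity, keep the top k, then weight those channels ScoreCAM-style by how well each channel-masked image still matches the text. Below is a minimal PyTorch sketch of that idea, not the official implementation; it assumes the openai/CLIP package and uses the smaller RN50 backbone and a reduced top-k for brevity.

```python
# Illustrative gScoreCAM-style sketch (not the authors' code).
# Assumes: pip install torch pillow git+https://github.com/openai/CLIP.git
import torch
import torch.nn.functional as F
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50", device=device, jit=False)  # "RN50x16" also works

def gscorecam(image_path: str, text: str, topk: int = 50) -> torch.Tensor:
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0).to(device)
    tokens = clip.tokenize([text]).to(device)

    # Capture the last conv stage's activation maps with a forward hook.
    acts = {}
    handle = model.visual.layer4.register_forward_hook(
        lambda mod, inp, out: acts.update(a=out)
    )
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(tokens)
    sim = F.cosine_similarity(img_feat, txt_feat).sum()
    # Gradient of the image-text similarity w.r.t. each activation map.
    grads = torch.autograd.grad(sim, acts["a"])[0]
    handle.remove()

    A, G = acts["a"][0], grads[0]  # both (C, h, w)
    # The "g" in gScoreCAM: keep only the top-k highest-gradient channels.
    top = G.abs().flatten(1).sum(-1).topk(min(topk, A.shape[0])).indices

    # ScoreCAM-style weighting: upsample each kept channel into a mask and
    # re-score the masked image against the text with CLIP.
    scores = []
    with torch.no_grad():
        for c in top:
            m = A[c] - A[c].min()
            m = m / (m.max() + 1e-8)
            m = F.interpolate(m[None, None], size=image.shape[-2:], mode="bilinear")
            s = F.cosine_similarity(model.encode_image(image * m), txt_feat)
            scores.append(s.item())
        w = torch.softmax(torch.tensor(scores), dim=0).to(A.device)
        cam = torch.relu((w[:, None, None] * A[top].float()).sum(0))
        cam = F.interpolate(cam[None, None], size=image.shape[-2:], mode="bilinear")[0, 0]
    return cam / (cam.max() + 1e-8)  # heatmap in [0, 1], at input resolution
```

For example, gscorecam("dog.jpg", "A dog") returns an HxW tensor you can overlay on the input. The official code batches the masked forward passes; done naively as above, this page's default of 300 channels would mean 300 sequential CLIP encodes.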

Official implementation of the paper gScoreCAM: What is CLIP looking at? by Peijie Chen, Qi Li, Saad Biaz, Trung Bui, and Anh Nguyen; an oral paper at ACCV 2022.

If you use this software, please consider citing:

@inproceedings{chen2022gScoreCAM,
  title={gScoreCAM: What is CLIP looking at?},
  author={Chen, Peijie and Li, Qi and Biaz, Saad and Bui, Trung and Nguyen, Anh},
  booktitle={Proceedings of the Asian Conference on Computer Vision (ACCV)},
  year={2022}
}

:star2: Interactive Colab demo :star2: