j-min / clip-caption-reward

Fine-grained Image Captioning with CLIP Reward

😵 Uh oh! This model can't be run on Replicate because it was built with a version of Cog or Python that is no longer supported. Consider opening an issue on the model's GitHub repository to see if it can be updated to use a recent version of Cog. If you need any help, please hop into our Discord channel or contact us.

Run time and cost

This model costs approximately $0.0014 to run on Replicate, or 714 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
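
Since the hosted version can no longer be run (see the notice above), local execution is the practical option. Below is a minimal sketch using Cog, the tool Replicate models are packaged with; the repository URL and the `image` input name are assumptions inferred from this page, so check the GitHub README for the exact interface.

```bash
# Minimal sketch: run the model locally with Cog, which builds and runs
# the Docker container for you. Assumptions: the repository URL below and
# the `image` input name are inferred from this page, not confirmed.
git clone https://github.com/j-min/CLIP-Caption-Reward
cd CLIP-Caption-Reward

# Builds the image from cog.yaml (first run only), then makes one prediction.
cog predict -i image=@your_photo.jpg
```

Note that Cog builds the container on the first invocation, so expect that initial run to take considerably longer than the ~7-second predict time quoted below.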

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 7 seconds, though predict time varies significantly with the inputs.

Readme

Fine-grained Image Captioning with CLIP Reward

[Teaser image]

Acknowledgments

We thank the developers of CLIP-ViL, ImageCaptioning.pytorch, CLIP, coco-caption, and cider for their public code releases.

Reference

Please cite our paper if you use our models in your work:

```bibtex
@inproceedings{Cho2022CLIPReward,
  title     = {Fine-grained Image Captioning with CLIP Reward},
  author    = {Jaemin Cho and Seunghyun Yoon and Ajinkya Kale and Franck Dernoncourt and Trung Bui and Mohit Bansal},
  booktitle = {Findings of NAACL},
  year      = {2022}
}
```