j-min/clip-caption-reward

Fine-grained Image Captioning with CLIP Reward
2.1K runs

Run time and cost

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 123 seconds, but prediction time varies significantly with the inputs.

Readme

Acknowledgments

We thank the developers of CLIP-ViL, ImageCaptioning.pytorch, CLIP, coco-caption, and cider for releasing their code publicly.

Reference

Please cite our paper if you use our models in your work:

```bibtex
@inproceedings{Cho2022CLIPReward,
  title     = {Fine-grained Image Captioning with CLIP Reward},
  author    = {Jaemin Cho and Seunghyun Yoon and Ajinkya Kale and Franck Dernoncourt and Trung Bui and Mohit Bansal},
  booktitle = {Findings of NAACL},
  year      = {2022}
}
```
