j-min/clip-caption-reward

Fine-grained Image Captioning with CLIP Reward
Run time and cost

Predictions run on Nvidia T4 GPU hardware. Predictions typically complete within 6 seconds.

Acknowledgments

We thank the developers of CLIP-ViL, ImageCaptioning.pytorch, CLIP, coco-caption, and cider for their public code releases.

Reference

Please cite our paper if you use our models in your work:

```bibtex
@inproceedings{Cho2022CLIPReward,
title = {Fine-grained Image Captioning with CLIP Reward},
author = {Jaemin Cho and Seunghyun Yoon and Ajinkya Kale and Franck Dernoncourt and Trung Bui and Mohit Bansal},
booktitle = {Findings of NAACL},
year = {2022}
}
```