✋ This model is not published yet.

You can claim this model if you're @kdexd on GitHub.


Image captioning with VirTex
200 runs

Run time and cost

Predictions run on CPU hardware. Predictions typically complete within 52 seconds. The predict time for this model varies significantly based on the inputs.


Karan Desai and Justin Johnson
University of Michigan

Model Zoo, Usage Instructions and API docs: kdexd.github.io/virtex

VirTex is a pretraining approach which uses semantically dense captions to learn visual representations. We train CNN + Transformers from scratch on COCO Captions, and transfer the CNN to downstream vision tasks including image classification, object detection, and instance segmentation. VirTex matches or outperforms models which use ImageNet for pretraining – both supervised or unsupervised – despite using up to 10x fewer images.

This is a demo of image captioning using VirTex.