kdexd / virtex-image-captioning

Image captioning with VirTex

  • Public
  • 328 runs
  • GitHub
  • Paper
  • License

😵 Uh oh! This model can't be run on Replicate because it was built with a version of Cog or Python that is no longer supported. Consider opening an issue on the model's GitHub repository to see if it can be updated to use a recent version of Cog. If you need any help, please contact us.

Run time and cost

This model costs approximately $0.0047 to run on Replicate, or 212 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
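
For local use, newer Cog containers expose a simple HTTP prediction API once started with Docker. The sketch below is only illustrative: it assumes a container built from the model's repository is already listening on port 5000 and that the model takes a single `image` input, and this particular model was packaged with an older Cog version whose API may differ.

    import base64
    import requests

    # Assumes the model's Cog container is already running locally via Docker
    # and listening on port 5000 (hypothetical setup).
    with open("photo.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    # Newer Cog versions accept predictions at POST /predictions, with file
    # inputs passed as data URIs; the input name "image" is an assumption.
    resp = requests.post(
        "http://localhost:5000/predictions",
        json={"input": {"image": f"data:image/jpeg;base64,{image_b64}"}},
    )
    print(resp.json())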

This model runs on CPU hardware. Predictions typically complete within 47 seconds, though predict time varies significantly with the inputs.

Readme

Karan Desai and Justin Johnson, University of Michigan

Model Zoo, Usage Instructions and API docs: kdexd.github.io/virtex

VirTex is a pretraining approach that uses semantically dense captions to learn visual representations. We train a CNN and a Transformer from scratch on COCO Captions, then transfer the CNN to downstream vision tasks including image classification, object detection, and instance segmentation. VirTex matches or outperforms models that use ImageNet for pretraining, both supervised and unsupervised, despite using up to 10x fewer images.
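
For downstream use, the VirTex model zoo documents loading the pretrained ResNet-50 visual backbone in one line through torch.hub. A minimal sketch, assuming the `kdexd/virtex` hub entry point with the `resnet50` model name returns a standard torchvision-style ResNet-50 (the feature-extraction step is only illustrative):

    import torch

    # Load the VirTex-pretrained ResNet-50 visual backbone; weights are
    # downloaded on first call (hub entry point per the project's docs).
    model = torch.hub.load("kdexd/virtex", "resnet50", pretrained=True)
    model.eval()

    # Illustrative feature extraction: drop the final classification layer
    # (assuming a standard torchvision ResNet-50) to get a pooled 2048-d vector.
    backbone = torch.nn.Sequential(*list(model.children())[:-1])
    image = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image
    with torch.no_grad():
        features = backbone(image).flatten(1)
    print(features.shape)  # expected: torch.Size([1, 2048])

Caption generation itself additionally needs the Transformer decoder and the repository's inference scripts; see the usage instructions and API docs linked above.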

This is a demo of image captioning using VirTex.