nvlabs / prismer

A Vision-Language Model with An Ensemble of Experts

  • Public
  • 1.5K runs
  • GitHub
  • Paper
  • License

Input

Output

Run time and cost

This model runs on Nvidia T4 (High-memory) GPU hardware. Predictions typically complete within 10 minutes. The predict time for this model varies significantly based on the inputs.

Readme

Prismer

This repository contains the source code of Prismer and PrismerZ from the paper, Prismer: A Vision-Language Model with An Ensemble of Experts.

Citation

If you found this code/work to be useful in your own research, please consider citing the following:

@article{liu2023prismer,
    title={Prismer: A Vision-Language Model with An Ensemble of Experts},
    author={Liu, Shikun and Fan, Linxi and Johns, Edward and Yu, Zhiding and Xiao, Chaowei and Anandkumar, Anima},
    journal={arXiv preprint arXiv:2303.02506},
    year={2023}
}

License

Copyright © 2023, NVIDIA Corporation. All rights reserved.

This work is made available under the Nvidia Source Code License-NC.

The model checkpoints are shared under CC-BY-NC-SA-4.0. If you remix, transform or build upon the material, you must distribute your contributions under the same license as the original.

For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing.

Acknowledgement

We would like to thank all the researchers who open source their works to make this project possible. @bjoernpl for contributing an automated checkpoint download script.