nvlabs / prismer

A Vision-Language Model with An Ensemble of Experts




This repository contains the source code of Prismer and PrismerZ from the paper, Prismer: A Vision-Language Model with An Ensemble of Experts.


If you find this code or work useful in your own research, please consider citing the following:

    @article{liu2023prismer,
        title={Prismer: A Vision-Language Model with An Ensemble of Experts},
        author={Liu, Shikun and Fan, Linxi and Johns, Edward and Yu, Zhiding and Xiao, Chaowei and Anandkumar, Anima},
        journal={arXiv preprint arXiv:2303.02506},
        year={2023}
    }


Copyright © 2023, NVIDIA Corporation. All rights reserved.

This work is made available under the NVIDIA Source Code License-NC.

The model checkpoints are shared under CC-BY-NC-SA-4.0. If you remix, transform or build upon the material, you must distribute your contributions under the same license as the original.

For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing.


We would like to thank all the researchers who open-sourced their work to make this project possible, and @bjoernpl for contributing an automated checkpoint download script.