gnobitab/fusedream

Training-Free Text-to-Image Generation

  • Public
  • 2.4K runs

😵 Uh oh! This model can't be run on Replicate because it was built with a version of Cog that is no longer supported. Consider opening an issue on the model's GitHub repository to see if it can be updated to use a recent version of Cog. If you need any help, please hop into our Discord channel or Contact us about it.

Run time and cost

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 6 minutes. The predict time for this model varies significantly based on the inputs.

Readme

FuseDream

FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization

by Xingchao Liu, Chengyue Gong, Lemeng Wu, Shujian Zhang, Hao Su and Qiang Liu from UCSD and UT Austin.

Introduction

FuseDream uses pre-trained GANs (we support BigGAN-256 and BigGAN-512 for now) and CLIP to achieve high-fidelity text-to-image generation.

Citations

If you use the code, please cite:

@inproceedings{
brock2018large,
title={Large Scale {GAN} Training for High Fidelity Natural Image Synthesis},
author={Andrew Brock and Jeff Donahue and Karen Simonyan},
booktitle={International Conference on Learning Representations},
year={2019},
url={https://openreview.net/forum?id=B1xsqj09Fm},
}

and

@misc{
liu2021fusedream,
title={FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization}, 
author={Xingchao Liu and Chengyue Gong and Lemeng Wu and Shujian Zhang and Hao Su and Qiang Liu},
year={2021},
eprint={2112.01573},
archivePrefix={arXiv},
primaryClass={cs.CV}
}