chenxwh / latent-diffusion-text2img

text-to-image with latent diffusion

Run time and cost

This model runs on Nvidia T4 (High-memory) GPU hardware. Predictions typically complete within 55 seconds, though prediction time varies significantly with the inputs.

Readme

This is a Cog implementation of https://github.com/CompVis/latent-diffusion.

Latent Diffusion Models - Text-to-Image

High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach*, Andreas Blattmann*, Dominik Lorenz, Patrick Esser, Björn Ommer
* equal contribution

Text-to-Image

[Figure: text2img-figure]

BibTeX

@misc{rombach2021highresolution,
      title={High-Resolution Image Synthesis with Latent Diffusion Models}, 
      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2112.10752},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}