arielreplicate / paella_fast_image_variation

Fast image variation model


Run time and cost

This model costs approximately $0.021 to run on Replicate, or 47 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
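As a quick check of the quoted figures, dividing a budget by the approximate per-run cost gives the number of whole runs it covers. At $0.021 per run, $1 covers 47 runs. The helper `runs_per_budget` is a hypothetical name used only for illustration, and real costs vary with inputs, so treat this as a back-of-the-envelope estimate rather than a billing formula.

```python
import math

COST_PER_RUN_USD = 0.021  # approximate per-run cost quoted above; actual cost varies with inputs


def runs_per_budget(budget_usd: float, cost_per_run_usd: float = COST_PER_RUN_USD) -> int:
    """Whole number of predictions a budget covers at a fixed per-run cost."""
    return math.floor(budget_usd / cost_per_run_usd)


print(runs_per_budget(1.0))   # 47, matching "47 runs per $1"
print(runs_per_budget(10.0))  # 476
```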

This model runs on NVIDIA T4 (High-memory) GPU hardware. Predictions typically complete within 96 seconds, although prediction time varies significantly with the inputs.

Readme


Paella

Conditional text-to-image generation has seen many recent improvements in quality, diversity and fidelity. Nevertheless, most state-of-the-art models require numerous inference steps to produce faithful generations, which creates performance bottlenecks for end-user applications. In this paper we introduce Paella, a novel text-to-image model that requires fewer than 10 steps to sample high-fidelity images. Its speed-optimized architecture samples a single image in under 500 ms and has 573M parameters. The model operates on a compressed and quantized latent space, is conditioned on CLIP embeddings, and uses a sampling function that improves on previous work. Besides text-conditional image generation, the model can perform latent space interpolation and image manipulations such as inpainting, outpainting, and structural editing.
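The abstract describes sampling in a compressed, quantized latent space in fewer than 10 steps. The sketch below illustrates the general shape of such a few-step discrete denoising loop under loudly stated assumptions: the latent is a flat list of codebook indices, "noise" means replacing tokens with random indices, and the re-noising ratio falls linearly to zero. `VOCAB_SIZE`, `GRID`, `renoise`, `sample`, and the `predict` callable are all hypothetical stand-ins; in a real model `predict` would be the trained network conditioned on CLIP embeddings. This is not Paella's actual sampler, nor its improved sampling function.

```python
import random
from typing import Callable, List

VOCAB_SIZE = 8192  # hypothetical codebook size of the quantized latent space
GRID = 16          # hypothetical number of latent tokens (tiny, for illustration)

Tokens = List[int]


def renoise(tokens: Tokens, t: float, rng: random.Random) -> Tokens:
    """Replace each token with a random codebook index with probability t."""
    return [rng.randrange(VOCAB_SIZE) if rng.random() < t else tok for tok in tokens]


def sample(predict: Callable[[Tokens], Tokens], steps: int = 8, seed: int = 0) -> Tokens:
    """Few-step discrete denoising loop (illustrative, not Paella's sampler).

    Starts from fully random tokens, asks the predictor for clean tokens at
    each step, then re-noises the prediction with a linearly shrinking ratio.
    """
    rng = random.Random(seed)
    tokens = [rng.randrange(VOCAB_SIZE) for _ in range(GRID)]  # t = 1: pure noise
    for i in range(steps):
        predicted = predict(tokens)           # hypothetical model call
        t_next = 1.0 - (i + 1) / steps        # assumed linear schedule, ends at 0
        tokens = renoise(predicted, t_next, rng)
    return tokens


if __name__ == "__main__":
    target = list(range(GRID))                # toy "clean" latent
    result = sample(lambda toks: target, steps=8)
    print(result == target)                   # final step uses t = 0, so prints True
```

Because the last step re-noises with a ratio of 0, the output is exactly the predictor's final guess; the small, fixed step count is what keeps this kind of loop cheap compared with samplers that need many more iterations.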


Full details about the model and how it was trained are in our preprint on arXiv.


License

The model code and weights are released under the MIT license.