lucataco / pixart-lcm-xl-2

PixArt-Alpha LCM is a transformer-based text-to-image latent diffusion model conditioned on T5 text embeddings.

  • Public
  • 3K runs
  • GitHub
  • Paper
  • License

Input

Output

Run time and cost

This model runs on Nvidia A40 (Large) GPU hardware. Predictions typically complete within 76 seconds, though run time varies significantly with the inputs.

Readme

This is an implementation of PixArt-alpha/PixArt-LCM-XL-2-1024-MS, inspired by the Hugging Face space PixArt-alpha/PixArt-LCM.

About

Pixart-α consists of pure transformer blocks for latent diffusion: it can directly generate 1024px images from text prompts within a single sampling process.

LCM (Latent Consistency Model) is a diffusion distillation method that predicts the solution of the probability-flow ODE (PF-ODE) directly in latent space, enabling very fast inference in only a few sampling steps.

Source code of PixArt-LCM is available at https://github.com/PixArt-alpha/PixArt-alpha.

Model Description

  • Developed by: Pixart & LCM teams
  • Model type: Diffusion-Transformer-based text-to-image generative model
  • License: CreativeML Open RAIL++-M License
  • Model Description: This is a model that can be used to generate and modify images based on text prompts. It is a Transformer Latent Diffusion Model that uses one fixed, pretrained text encoder (T5) and one latent feature encoder (VAE).
  • Resources for more information: Check out the PixArt-α and LCM GitHub repositories and the PixArt-α and LCM reports on arXiv.