PixArt-Alpha LCM is a transformer-based text-to-image diffusion system trained on text embeddings from T5

Run time and cost

This model runs on Nvidia A40 (Large) GPU hardware. Predictions typically complete within 8 seconds.


This is an implementation of PixArt-alpha/PixArt-LCM-XL-2-1024-MS, inspired by the Hugging Face Space PixArt-alpha/PixArt-LCM.


PixArt-α consists of pure transformer blocks for latent diffusion: it can directly generate 1024px images from text prompts within a single sampling process.

LCM (Latent Consistency Model) is a diffusion distillation method that predicts the solution of the PF-ODE (probability flow ODE) directly in latent space, achieving very fast inference in only a few steps.
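To make the "predict the PF-ODE solution directly" idea concrete, here is a minimal toy sketch of few-step consistency sampling in one dimension. It is not the PixArt-LCM network: it assumes Gaussian data N(MU, S^2) and the simple noising rule x_t = x_0 + t * noise, for which the PF-ODE solution map has a closed form. The names `MU`, `S`, `consistency_fn`, and `lcm_style_sample` are hypothetical and exist only for this illustration.

```python
import math
import random

# Toy data distribution N(MU, S^2); values are arbitrary assumptions.
MU, S = 2.0, 0.5


def consistency_fn(x_t, t):
    """Map a noisy point x_t at noise level t to the start of its PF-ODE trajectory.

    For Gaussian data with x_t = x_0 + t * noise, trajectories are
    x(t) = MU + c * sqrt(S^2 + t^2), so this map is exact. In an LCM,
    a distilled neural network plays this role.
    """
    return MU + (x_t - MU) * S / math.sqrt(S * S + t * t)


def lcm_style_sample(timesteps, rng):
    """Few-step sampling: predict x_0, re-noise to the next level, predict again."""
    t_max = timesteps[0]
    # Draw from the noisy marginal at the largest noise level
    # (real samplers start from pure noise, which approximates this for large t_max).
    x_t = MU + math.sqrt(S * S + t_max * t_max) * rng.gauss(0.0, 1.0)
    x0 = consistency_fn(x_t, t_max)
    for t in timesteps[1:]:
        x_t = x0 + t * rng.gauss(0.0, 1.0)  # re-noise the current estimate
        x0 = consistency_fn(x_t, t)         # jump straight back to a clean sample
    return x0
```

Calling `lcm_style_sample([10.0, 3.0, 1.0, 0.3], random.Random(0))` uses four function evaluations per sample, and a batch of such samples should have mean close to `MU` and standard deviation close to `S`. The point of the sketch is the loop structure: each step is a single direct jump to a clean sample rather than a small ODE integration step.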

Source code of PixArt-LCM is available at

Model Description

  • Developed by: PixArt & LCM teams
  • Model type: Diffusion-Transformer-based text-to-image generative model
  • License: CreativeML Open RAIL++-M License
  • Model Description: This model can be used to generate and modify images based on text prompts. It is a Transformer Latent Diffusion Model that uses one fixed, pretrained text encoder (T5) and one latent feature encoder (VAE).
  • Resources for more information: Check out our PixArt-α and LCM GitHub repositories and the PixArt-α and LCM reports on arXiv.
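
As a rough illustration of how the components listed above (T5 text encoder, transformer, VAE) are used together, the following sketch loads the checkpoint with the Hugging Face `diffusers` library. It assumes `torch` and a `diffusers` release that includes `PixArtAlphaPipeline` are installed and that a CUDA GPU is available. The helper name `generate`, the default of 4 steps, and `guidance_scale=0.0` are assumptions chosen for few-step LCM sampling, and the Replicate deployment itself may be configured differently.

```python
MODEL_ID = "PixArt-alpha/PixArt-LCM-XL-2-1024-MS"


def generate(prompt, steps=4, device="cuda"):
    """Generate one image from a text prompt (hypothetical helper)."""
    # Imported lazily so this module loads without the heavy dependencies.
    import torch
    from diffusers import PixArtAlphaPipeline

    # The pipeline bundles the T5 text encoder, the transformer, and the VAE.
    pipe = PixArtAlphaPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    pipe.to(device)

    # Assumption: classifier-free guidance disabled and only a few steps,
    # as is common for LCM-distilled checkpoints.
    result = pipe(prompt, guidance_scale=0.0, num_inference_steps=steps)
    return result.images[0]
```

Calling `generate("a small cactus with a happy face in the Sahara desert")` would download the weights on first use and return a PIL image, which can then be written to disk with its `save` method.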