lucataco / sdxs-512-0.9

sdxs-512-0.9 generates high-resolution images in real time from text prompts; it is trained using score distillation and feature matching


Run time and cost

This model runs on Nvidia A40 (Large) GPU hardware. Predictions typically complete within 74 seconds. The predict time for this model varies significantly based on the inputs.

Readme

SDXS-512-0.9

SDXS is a model that generates high-resolution images in real time from text prompts; it is trained using score distillation and feature matching. For more information, please refer to our research paper: SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions. We open-source the model as part of the research.

SDXS-512-0.9 is an old version of SDXS-512. For various reasons, we are releasing only this version for now and will release other versions gradually.

Model Information:

  • Teacher DM: SD Turbo
  • Offline DM: SD v2.1 base
  • VAE: TAESD

This model differs from version 1.0 in three main ways:

1. This version employs TAESD, which may produce low-quality images when weight_type is float16. Our image decoder is not compatible with the current version of diffusers, so it is not provided for now.
2. This version did not undergo the LoRA-GAN fine-tuning mentioned in the implementation details section, which may result in slightly inferior image details.
3. This version replaces self-attention with cross-attention in the highest-resolution stages, which adds minimal overhead compared to removing the self-attention layers outright.
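The float16 caveat for TAESD suggests loading the weights in float32 unless memory is tight. The following is a minimal sketch of how that might look with diffusers, not an official recipe. It makes several assumptions that are not stated on this page: that the weights are published on the Hugging Face Hub under the repo id IDKiro/sdxs-512-0.9, that they load through StableDiffusionPipeline, and that a single inference step with guidance disabled is the right sampling setup for a one-step model distilled from SD Turbo. The helper names dtype_name and generate are hypothetical.

```python
# Sampling settings assumed for a one-step model distilled from SD Turbo:
# a single denoising step and classifier-free guidance switched off.
SAMPLING_KWARGS = {"num_inference_steps": 1, "guidance_scale": 0.0}

# Assumed Hugging Face repo id; replace it if the weights live elsewhere.
DEFAULT_REPO = "IDKiro/sdxs-512-0.9"


def dtype_name(prefer_half: bool = False) -> str:
    """Pick the weight type by name.

    float32 is the default because this version uses TAESD, which may
    produce low-quality images when the weight type is float16.
    """
    return "float16" if prefer_half else "float32"


def generate(prompt: str, repo: str = DEFAULT_REPO, seed: int = 42,
             prefer_half: bool = False):
    """Load the pipeline and return one PIL image for the prompt."""
    # Heavy imports stay inside the function so the module can be imported
    # on a machine without torch or diffusers installed.
    import torch
    from diffusers import StableDiffusionPipeline

    weight_type = getattr(torch, dtype_name(prefer_half))
    # float16 is generally only practical on a GPU; float32 also runs on CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    pipe = StableDiffusionPipeline.from_pretrained(repo, torch_dtype=weight_type)
    pipe.to(device)

    # A seeded generator makes the single-step sample reproducible.
    generator = torch.Generator(device=device).manual_seed(seed)
    result = pipe(prompt, generator=generator, **SAMPLING_KWARGS)
    return result.images[0]
```

Under these assumptions, generate("a photo of a red fox in a snowy forest").save("sdxs_sample.png") would write one 512-pixel sample to disk; passing prefer_half=True switches to float16 at the risk of the quality loss described in item 1.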

There is a third-party Demo from @ameerazam08. We’ll provide an official demo when 1.0 is officially released, which hopefully won’t be long.

Cite Our Work

@article{song2024sdxs,
  author    = {Yuda Song and Zehao Sun and Xuanwu Yin},
  title     = {SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions},
  journal   = {arXiv},
  year      = {2024},
}