lucataco / sdxs-512-0.9

sdxs-512-0.9 can generate high-resolution images in real-time based on prompt texts, trained using score distillation and feature matching

  • Public
  • 525 runs
  • GitHub
  • Paper
  • License

Input

Output

Run time and cost

This model runs on Nvidia A40 (Large) GPU hardware. Predictions typically complete within 3 seconds. The predict time for this model varies significantly based on the inputs.

Readme

SDXS-512-0.9

SDXS is a model that can generate high-resolution images in real-time based on prompt texts, trained using score distillation and feature matching. For more information, please refer to our research paper: SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions. We open-source the model as part of the research.

SDXS-512-0.9 is a old version of SDXS-512. For some reasons, we are only releasing this version for the time being, and will gradually release other versions.

Model Information: - Teacher DM: SD Turbo - Offline DM: SD v2.1 base - VAE: TAESD

The main differences between this model and version 1.0 are in three aspects: 1. This version employs TAESD, which may produce low-quality images when weight_type is float16. Our image decoder is not compatible with the current version of diffusers, so it will not be provided now. 2. This version did not perform the LoRA-GAN finetune mentioned in the implementation details section, which may result in slightly inferior image details. 3. This version replaces self-attention with cross-attention in the highest resolution stages, which introduces minimal overhead compared to directly removing them.

There is a third-party Demo from @ameerazam08. We’ll provide an official demo when 1.0 is officially released, which hopefully won’t be long.

Cite Our Work

@article{song2024sdxs,
  author    = {Yuda Song, Zehao Sun, Xuanwu Yin},
  title     = {SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions},
  journal   = {arxiv},
  year      = {2024},
}