cjwbw / pixart-sigma

Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Run time and cost

This model runs on NVIDIA A40 (Large) GPU hardware. Predictions typically complete within 61 seconds, though prediction time varies significantly with the inputs.

Readme

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

This repo contains PyTorch model definitions, pre-trained weights and inference/sampling code for our paper exploring Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation. You can find more visualizations on our project page.