shefa / turbo-enigma

SDXL-based text-to-image model that applies Distribution Matching Distillation and supports zero-shot identity generation in 2-5 s. https://ai-visionboard.com

  • Public
  • 2.3M runs

Run time and cost

This model costs approximately $0.012 to run on Replicate, or 83 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
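The relationship between the per-run price and the runs-per-dollar figure above can be checked with a few lines of arithmetic. This is only a sketch: the function names are hypothetical, and the $0.012 figure is the approximate price quoted above, which varies with your inputs.

```python
import math

# Approximate per-run price quoted on this page; actual cost varies with inputs.
COST_PER_RUN_USD = 0.012


def runs_per_dollar(cost_per_run: float) -> int:
    """Whole number of runs that fit into one dollar."""
    return math.floor(1 / cost_per_run)


def batch_cost(num_runs: int, cost_per_run: float = COST_PER_RUN_USD) -> float:
    """Estimated cost in USD for a batch of runs, rounded to cents."""
    return round(num_runs * cost_per_run, 2)


print(runs_per_dollar(COST_PER_RUN_USD))  # 83
print(batch_cost(1000))                   # 12.0
```

At the quoted price, a batch of 1,000 images would cost roughly $12.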

This model runs on Nvidia A40 GPU hardware. Predictions typically complete within 21 seconds. The predict time for this model varies significantly based on the inputs.

Readme

The heart of the AI VisionBoard app.

Turbo enigma used in:

Implementation of One-step Diffusion with Distribution Matching Distillation.

This model matches the efficiency of SDXL-Turbo and LCM while delivering higher image quality. It is also more than 10x faster than PhotoMaker and InstantID when creating images from a single reference photo (e.g. a selfie).

Update 02/2024 - Added zero-shot faceswap with various hacks, achieving a more than 10x speed boost over PhotoMaker and InstantID while preserving quality. The model runs in under 2 seconds on Replicate, where PhotoMaker and InstantID take 20-40 seconds. See the examples for yourself!

Update 01/2024 - Implemented the Fast Diffusion optimization; the model now runs in ~2 seconds on Replicate. Added an NSFW filter input to the API.

Advantages:

  • Speed: One of the fastest models in its class, enabling rapid generation of complex images.
  • Quality: Despite its speed, the model maintains a high standard of image quality, producing vivid and detailed visuals out-of-the-box.
  • Versatility: Capable of interpreting and visualizing a wide range of textual inputs with accuracy and creativity. The model is based on community merges of the best SDXL checkpoints, made before distillation.

Use Cases:

This model is suited to scenarios that require quick turnaround without sacrificing image quality. It is ideal for content creators, digital artists, and businesses seeking an efficient visual representation of ideas. A possible application is video generation, as the model can generate ~15 fps at 720p.
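To put the ~15 fps figure in perspective, the sketch below converts a generation rate into a per-frame time budget and estimates how long a clip would take to generate. The function names and the 30 fps playback rate are assumptions chosen for illustration; only the ~15 fps rate comes from the text above.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def clip_generation_seconds(clip_seconds: float,
                            playback_fps: float,
                            generation_fps: float = 15.0) -> float:
    """Seconds needed to generate every frame of a clip.

    generation_fps defaults to the ~15 fps rate quoted above;
    playback_fps is whatever rate the finished clip should play at.
    """
    total_frames = clip_seconds * playback_fps
    return total_frames / generation_fps


print(round(frame_budget_ms(15), 1))    # 66.7
print(clip_generation_seconds(10, 30))  # 20.0
```

At ~15 fps each frame has a budget of about 67 ms, so a 10-second clip played back at an assumed 30 fps would take about 20 seconds to generate, which is faster than offline rendering but not real time.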

Future plans:

The model is still very much a work in progress. Different post-training improvements (merges) are being tested to enhance both speed and quality.

Comparison vs SDXL-Turbo / LCM / Midjourney

TODO(shefa): Coming soon.