Readme
Gigascale 3D Generation: Direct3D-S2 enables training at 1024^3 resolution with only 8 GPUs.
Spatial Sparse Attention (SSA): A novel attention mechanism designed for sparse volumetric data, enabling efficient processing of large token sets.
Unified Sparse VAE: A variational autoencoder that maintains a consistent sparse volumetric format across input, latent, and output stages, improving training efficiency and stability.
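The core idea behind the sparse design can be sketched in a few lines: instead of attending over every cell of a dense 1024^3 grid, attention runs only over the occupied voxel tokens. The snippet below is a toy illustration of that idea in NumPy, not the actual SSA kernel; the function name, the random projection weights, and the unused `coords` argument (which in the real mechanism would drive spatial block partitioning) are all assumptions for demonstration.

```python
import numpy as np

def sparse_voxel_attention(coords, feats, rng=None):
    """Toy sketch (NOT the real SSA kernel): plain softmax attention
    computed only over occupied voxel tokens, never the dense grid.

    coords: (n, 3) integer voxel indices of the occupied cells
            (unused here; real SSA would use them for spatial blocking)
    feats:  (n, d) per-voxel feature vectors
    """
    n, d = feats.shape
    rng = np.random.default_rng(0) if rng is None else rng
    # Random q/k/v projections, purely illustrative stand-ins for learned weights.
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = feats @ wq, feats @ wk, feats @ wv
    # (n, n) score matrix -- n is the occupied-voxel count, not 1024**3.
    scores = (q @ k.T) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v  # (n, d) updated voxel features
```

The point of the sketch is the cost model: compute scales with the number of occupied voxels n, which for a surface-like shape is far smaller than the full grid.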
Usage: provide an image with an easily separable background.
Generating at 512 resolution requires at least 10 GB of VRAM, and 1024 resolution needs around 24 GB. We don't recommend generating models at 512 resolution: it is only an intermediate step toward the 1024 model, and its quality is noticeably lower.
Use a mesh decimation ratio of 0.95 to significantly reduce mesh size (e.g., 300 MB -> 30 MB) without noticeable quality loss.
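To make the 0.95 ratio concrete: it means removing 95% of the faces, i.e., keeping only 5% of them. The helper below is a hypothetical illustration of that arithmetic (not part of Direct3D-S2); `decimation_face_target` is an assumed name.

```python
def decimation_face_target(face_count: int, ratio: float) -> int:
    """Target face count after decimation.

    ratio is the fraction of faces to REMOVE: 0.95 keeps 5% of faces.
    (Hypothetical helper for illustration, not a Direct3D-S2 API.)
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    # Keep at least 4 faces so the result can still be a closed mesh.
    return max(4, round(face_count * (1.0 - ratio)))
```

For example, a 2,000,000-face mesh decimated at 0.95 targets 100,000 faces, which is why file size drops by roughly an order of magnitude.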
!!! The mesh has a uniform gray texture; set your viewer's brightness to 0 and contrast to 0.9 (the maximum) for correct display !!!
June 3, 2025: We are preparing the v1.2 release, featuring enhanced character generation. Stay tuned!
May 30, 2025: 🤯 We have released both v1.0 and v1.1. The new model is even faster than FlashAttention-2, with a 12.2× faster forward pass and a 19.7× faster backward pass, yielding nearly 2× faster inference than v1.0.
May 30, 2025: 🔨 Released inference code and model.
May 26, 2025: 🎁 Released live demo on 🤗 Hugging Face.
May 26, 2025: 🚀 Released paper and project page.