hcl14/direct3d_s2

Direct3D‑S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention


Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

- Gigascale 3D Generation: Direct3D-S2 enables training at 1024^3 resolution with only 8 GPUs.
- Spatial Sparse Attention (SSA): A novel attention mechanism designed for sparse volumetric data, enabling efficient processing of large token sets.
- Unified Sparse VAE: A variational autoencoder that maintains a consistent sparse volumetric format across input, latent, and output stages, improving training efficiency and stability.
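To illustrate the idea behind attending over sparse volumetric data (a minimal sketch only; Direct3D-S2's actual SSA adds spatial windowing and custom GPU kernels, all names and numbers below are made up for the example): only occupied voxels carry tokens, so attention cost scales with the number of occupied tokens rather than the full dense grid.

```python
import numpy as np

rng = np.random.default_rng(0)

grid = 16                                        # a dense grid would hold 16^3 = 4096 tokens
occupied = rng.random((grid, grid, grid)) < 0.05 # ~5% of voxels are occupied
coords = np.argwhere(occupied)                   # (N, 3) coordinates of occupied voxels
n, d = len(coords), 8

# Features exist only for occupied voxels; empty space contributes no tokens.
x = rng.standard_normal((n, d))
wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))
q, k, v = x @ wq, x @ wk, x @ wv

# Plain softmax attention over the N occupied tokens: the (N, N) score
# matrix is far smaller than the (4096, 4096) dense-grid equivalent.
scores = q @ k.T / np.sqrt(d)
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)
out = attn @ v

print(out.shape)  # attention output has one row per occupied voxel, not per grid cell
```

This is why sparsity is what makes gigascale (1024^3) training tractable: the token count follows surface occupancy, not the cube of the resolution.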


Usage: provide an image with easily separable background.

Generating at 512 resolution requires at least 10 GB of VRAM, and 1024 resolution needs around 24 GB. We don't recommend generating models at 512 resolution, as it's just an intermediate step of the 1024 model and the quality is noticeably lower.

Use a mesh decimation of 0.95 to significantly reduce mesh size (e.g. 300 MB -> 30 MB) without noticeable quality loss.
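For clarity, a decimation ratio of 0.95 means 95% of the mesh faces are removed and 5% are kept. A minimal sketch of the arithmetic (the helper function and the 6-million-face mesh are hypothetical examples, not part of the model's pipeline):

```python
def decimated_face_count(face_count: int, ratio: float) -> int:
    """Faces remaining after decimating by `ratio`; 0.95 removes 95% of faces."""
    return max(1, int(face_count * (1.0 - ratio)))

# A hypothetical 6-million-face mesh decimated at the recommended 0.95:
print(decimated_face_count(6_000_000, 0.95))  # 300000
```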

!!! The mesh has a uniform gray texture; in your viewer, set brightness to 0 and contrast to 0.9 (the maximum) !!!


June 3, 2025: We are preparing the v1.2 release, featuring enhanced character generation. Stay tuned!
May 30, 2025: 🤯 We have released both v1.0 and v1.1. The new model offers even greater speed compared to FlashAttention-2, with 12.2× faster forward pass and 19.7× faster backward pass, resulting in nearly 2× inference speedup over v1.0.
May 30, 2025: 🔨 Released the inference code and model.
May 26, 2025: 🎁 Released a live demo on 🤗 Hugging Face.
May 26, 2025: 🚀 Released the paper and project page.