Readme
Gigascale 3D Generation: Direct3D-S2 enables training at 1024^3 resolution with only 8 GPUs.
Spatial Sparse Attention (SSA): A novel attention mechanism designed for sparse volumetric data, enabling efficient processing of large token sets.
Unified Sparse VAE: A variational autoencoder that maintains a consistent sparse volumetric format across input, latent, and output stages, improving training efficiency and stability.
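The core idea behind the sparse design can be sketched in a few lines: instead of attending over every cell of a dense 1024^3 grid, attention runs only over the occupied voxel tokens. The snippet below is a toy illustration of that idea in NumPy, not the actual SSA kernel; the function name, the random projection weights, and the unused `coords` argument (which in the real mechanism would drive spatial block partitioning) are all assumptions for demonstration.

```python
import numpy as np

def sparse_voxel_attention(coords, feats, rng=None):
    """Toy sketch (NOT the real SSA kernel): plain softmax attention
    computed only over occupied voxel tokens, never the dense grid.

    coords: (n, 3) integer voxel indices of the occupied cells
            (unused here; real SSA would use them for spatial blocking)
    feats:  (n, d) per-voxel feature vectors
    """
    n, d = feats.shape
    rng = np.random.default_rng(0) if rng is None else rng
    # Random q/k/v projections, purely illustrative stand-ins for learned weights.
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = feats @ wq, feats @ wk, feats @ wv
    # (n, n) score matrix -- n is the occupied-voxel count, not 1024**3.
    scores = (q @ k.T) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v  # (n, d) updated voxel features
```

The point of the sketch is the cost model: compute scales with the number of occupied voxels n, which for a surface-like shape is far smaller than the full grid.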
Usage: provide an image with an easily separable background.
Generating at 512 resolution requires at least 10 GB of VRAM, and 1024 resolution needs around 24 GB. We don't recommend generating models at 512 resolution: it is only an intermediate step toward the 1024 model, and its quality is noticeably lower.
Use a mesh decimation ratio of 0.95 to significantly reduce mesh size (e.g., 300 MB -> 30 MB) without noticeable quality loss.
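To make the 0.95 ratio concrete: it means removing 95% of the faces, i.e., keeping only 5% of them. The helper below is a hypothetical illustration of that arithmetic (not part of Direct3D-S2); `decimation_face_target` is an assumed name.

```python
def decimation_face_target(face_count: int, ratio: float) -> int:
    """Target face count after decimation.

    ratio is the fraction of faces to REMOVE: 0.95 keeps 5% of faces.
    (Hypothetical helper for illustration, not a Direct3D-S2 API.)
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    # Keep at least 4 faces so the result can still be a closed mesh.
    return max(4, round(face_count * (1.0 - ratio)))
```

For example, a 2,000,000-face mesh decimated at 0.95 targets 100,000 faces, which is why file size drops by roughly an order of magnitude.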
!!! The mesh has a uniform gray texture; set your viewer's brightness to 0 and contrast to 0.9 (the maximum) for correct display !!!
June 3, 2025: We are preparing the v1.2 release, featuring enhanced character generation. Stay tuned!
May 30, 2025: 🤯 We have released both v1.0 and v1.1. The new model is even faster than FlashAttention-2, with a 12.2× faster forward pass and a 19.7× faster backward pass, yielding nearly 2× faster inference than v1.0.
May 30, 2025: 🔨 Released inference code and model.
May 26, 2025: 🎁 Released live demo on 🤗 Hugging Face.
May 26, 2025: 🚀 Released paper and project page.