DiT-based video generation model for generating high-quality videos in real-time
Audio-based Lip Synchronization for Talking Head Video
Updated to OpenVoice v2: Versatile Instant Voice Cloning
A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Fast sdxl with higher quality
Convert LLM's coding to image generation
Depth estimation with faster inference speed, fewer parameters, and higher depth accuracy.
Extended video synthesis model that generates 128 frames
CogVLM2: Visual Language Models for Image and Video Understanding
Generating Consistent Long Depth Sequences for Open-world Videos
Finer and Faster Text-to-Image Generation via Relay Diffusion
Diffusion-based Visual Foundation Model for High-quality Dense Prediction
Sharp Monocular Metric Depth in Less Than a Second
Efficient Visual Generation with Hybrid Autoregressive Transformer
Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
Depth Any Video with Scalable Synthetic Data
Minimal and Universal Control for Diffusion Transformer - demo for Subject-driven generation
High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training
Autoregressive Video Generation without Vector Quantization
Autoregressive Image Generation without Vector Quantization
This model is cold. You'll get a fast response if the model is warm and already running, and a slower response if the model is cold and starting up.
This model runs on L40S. View more.