A fast image model with wide artistic range and resolutions up to 4096x4096
A Vision-Language Model with an Ensemble of Experts
🗣️ Nvidia + Suno.ai's speech-to-text conversion with high accuracy and efficiency 📝
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
Transform PDFs into AI podcasts for engaging on-the-go audio content.
🎤 The best open-source speech-to-text model as of July 2025, transcribing audio with a record 5.63% WER and enabling AI tasks like summarization directly from speech ✨
This model is not yet booted but is ready for API calls. Your first API call will boot the model and may take longer; subsequent responses will be fast.
This model runs on H100.