Minimax's first image model, with character reference support
Generate 6s videos from prompts or images (also known as Hailuo). Use a subject reference with the S2V-01 model to make a video with a character.
An image-to-video (I2V) model specifically trained for Live2D and general animation use cases
Quickly generate up to 1 minute of music with lyrics and vocals in the style of a reference track
Generate videos with specific camera movements
A text-to-audio (T2A) model that offers voice synthesis, emotional expression, and multilingual capabilities. Optimized for high-fidelity applications like voiceovers and audiobooks.
A text-to-audio (T2A) model that offers voice synthesis, emotional expression, and multilingual capabilities. Designed for real-time applications with low latency.
Clone voices to use with Minimax's speech-02-hd and speech-02-turbo
Hailuo 02 is a text-to-video and image-to-video model that can make 6s or 10s videos at 768p (standard) or 1080p (pro). It excels at real-world physics.
A low-cost, fast version of Hailuo 02. Generate 6s and 10s videos at 512p.
This model is booted and ready for API calls.
This model is priced by output image. It costs $0.01 per output image, or 100 images for $1.
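As a quick check on the pricing arithmetic, here is a minimal sketch that estimates the cost of a batch of output images. It assumes only the flat $0.01 per-image rate stated above, with no volume discounts or other fees; the name `estimate_cost` is hypothetical and not part of any official client.

```python
from decimal import Decimal

# Flat per-image rate taken from the pricing note; assumed to have no tiers.
PRICE_PER_IMAGE = Decimal("0.01")


def estimate_cost(num_images: int) -> Decimal:
    """Return the estimated cost in US dollars for num_images output images."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    # Decimal avoids the rounding drift that binary floats show with cents.
    return PRICE_PER_IMAGE * num_images


if __name__ == "__main__":
    print(estimate_cost(100))  # 1.00, matching "100 images for $1"
```

Using `Decimal` rather than `float` keeps totals exact to the cent when estimating large batches.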