Generate videos with specific camera movements
Minimax's first image model, with character reference support
Quickly generate up to 1 minute of music with lyrics and vocals in the style of a reference track
Text-to-Audio (T2A) model that offers voice synthesis, emotional expression, and multilingual capabilities. Optimized for high-fidelity applications such as voiceovers and audiobooks.
Text-to-Audio (T2A) model that offers voice synthesis, emotional expression, and multilingual capabilities. Designed for low-latency, real-time applications.
Generate 6-second videos from prompts or images (also known as Hailuo). Use a subject reference with the S2V-01 model to make a video featuring a specific character.
An image-to-video (I2V) model specifically trained for Live2D and general animation use cases
Clone voices to use with Minimax's speech-02-hd and speech-02-turbo models.
This model is priced by output video. It costs $0.50 per output video, or 20 videos for $10.
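As a quick check on the pricing arithmetic, the sketch below computes the cost of a batch of output videos and how many videos a budget covers. It assumes a flat $0.50 per output video with no other charges; the function names `estimate_cost` and `videos_for_budget` are hypothetical helpers, not part of any Minimax or platform API.

```python
from decimal import Decimal

# Assumed flat price per output video, in USD, as listed above.
PRICE_PER_VIDEO = Decimal("0.50")


def estimate_cost(num_videos: int) -> Decimal:
    """Hypothetical helper: estimated USD cost for a number of output videos."""
    if num_videos < 0:
        raise ValueError("num_videos must be non-negative")
    return PRICE_PER_VIDEO * num_videos


def videos_for_budget(budget: Decimal) -> int:
    """Hypothetical helper: how many whole videos a USD budget covers."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Floor division drops any remainder too small for another video.
    return int(budget // PRICE_PER_VIDEO)


print(estimate_cost(20))                  # 10.00
print(videos_for_budget(Decimal("10")))  # 20
```

Using `Decimal` instead of `float` keeps currency amounts exact, so 20 videos come out to exactly $10.00.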