MiniMax Hailuo 02

MiniMax Hailuo 02 is a video generation model that synthesizes high-quality video from text and image inputs. Compared with its predecessor, it uses a new architecture that improves efficiency and handles complex video generation tasks better.

There are two models, standard and pro:

768p (standard)
1080p (pro)

The pro model produces higher-quality videos with better physics and coherence.

Supported Resolutions and Durations for last_frame_image:

768p (6s, 10s)
1080p (6s)
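
The combinations above can be checked before a request is submitted. The following is a minimal sketch, not part of any official client: the `SUPPORTED` table and the `is_supported` helper are hypothetical names introduced here, and only the resolution and duration values come from the list above.

```python
# Supported durations (in seconds) per resolution, copied from the list above.
# SUPPORTED and is_supported are illustrative names, not part of an official API.
SUPPORTED = {
    "768p": {6, 10},
    "1080p": {6},
}


def is_supported(resolution: str, duration: int) -> bool:
    """Return True if the resolution/duration pair appears in the list above."""
    # Lowercase the key so that "768P" and "768p" are treated the same.
    return duration in SUPPORTED.get(resolution.lower(), set())
```

For example, `is_supported("1080p", 10)` is false under this table, since 1080p is listed only at 6 seconds.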

Architecture

The model is built on the Noise-aware Compute Redistribution (NCR) framework, a novel architecture that optimizes computational efficiency during both training and inference. This approach redistributes compute resources based on noise levels in the generation process, achieving 2.5x improvement in training and inference efficiency compared to conventional architectures at similar parameter scales.

Model Specifications

Parameters: 3x larger than predecessor model
Training Data: 4x larger dataset compared to previous version
Native Resolution: 1080p output capability
Output Formats:
768p at 6 and 10 seconds
1080p at 6 seconds

Key Capabilities

Instruction Following

The model demonstrates state-of-the-art performance in interpreting and executing complex text prompts, accurately translating detailed instructions into corresponding video content.

Physics Simulation

Advanced physics modelling capabilities enable realistic representation of complex physical interactions and movements, including challenging scenarios like gymnastics and intricate motion sequences.

The model also supports both text-to-video (T2V) and image-to-video (I2V) generation workflows, allowing for flexible content creation approaches.
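
The difference between the two workflows can be sketched as a choice of inputs: a text prompt alone for T2V, or a text prompt plus a starting image for I2V. This is a minimal sketch under stated assumptions. The field names `prompt` and `first_frame_image`, and the helpers `build_input` and `workflow`, are hypothetical and are not taken from this document, so check the model's input schema before relying on them.

```python
from typing import Optional


def build_input(prompt: str, first_frame_image: Optional[str] = None) -> dict:
    """Build a request payload for T2V (prompt only) or I2V (prompt plus image).

    The keys "prompt" and "first_frame_image" are assumed names for illustration.
    """
    payload = {"prompt": prompt}
    if first_frame_image is not None:
        # Supplying a starting image turns the request into image-to-video.
        payload["first_frame_image"] = first_frame_image
    return payload


def workflow(payload: dict) -> str:
    """Classify a payload as "I2V" if it carries an image, otherwise "T2V"."""
    return "I2V" if "first_frame_image" in payload else "T2V"
```

Keeping payload construction separate from the network call makes the T2V/I2V choice easy to test without sending a request.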

Technical Performance

Efficiency: 2.5x improvement in computational efficiency over comparable models
Quality: Native 1080p generation capability
Stability: Enhanced alignment and generation success rates
Versatility: Handles complex scenarios requiring precise physics simulation

Training Details

The model was trained on a significantly expanded dataset featuring improved data quality and diversity. Training incorporated user feedback and usage patterns from the predecessor model to optimize performance on real-world video generation tasks.

Limitations

Generation speed continues to be an area for improvement
Model alignment and stability, while enhanced, remain areas of ongoing development
Current capabilities are primarily focused on T2V and I2V tasks

Model Card

Model Type: Video Generation
Architecture: Noise-aware Compute Redistribution (NCR)
Input Modalities: Text, Image
Output Modality: Video
Maximum Resolution: 1080p
Maximum Duration: 10 seconds

Development Status

The model is under active development, with improvements planned for:

Generation speed optimization
Enhanced model alignment and stability
Expansion beyond current T2V and I2V capabilities

Privacy Policy

Data from this model is sent from Replicate to MiniMax.

Check their Privacy Policy for details:

https://intl.minimaxi.com/protocol/privacy-policy

Terms of Service

https://intl.minimaxi.com/protocol/terms-of-service
