Official

wavespeedai / hunyuan-video-fast

Accelerated inference for HunyuanVideo with high resolution (1280x720), a state-of-the-art text-to-video generation model capable of creating high-quality videos with realistic motion from text descriptions

  • Public
  • 4.9K runs
  • $0.20 per second of video
  • GitHub
  • Weights
  • Paper
  • License

Input

  • string (required) — Text prompt for video generation
  • integer — Length of output video, in seconds (default: 5)
  • integer — Random seed; set for reproducible generation
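A minimal sketch of what an input payload for this model might look like. The field names used here (`prompt`, `duration`, `seed`) are assumptions inferred from the schema above, not confirmed API names; check the model's API tab on Replicate for the exact keys.

```python
# Hypothetical input payload for wavespeedai/hunyuan-video-fast.
# Field names are assumptions based on the input schema above.
input_payload = {
    "prompt": "A cat walks on the grass, realistic style",  # required text prompt
    "duration": 5,  # length of the output video in seconds (default: 5)
    "seed": 42,     # optional: fix the seed for reproducible generation
}

# With the official Python client this would be run roughly as
# (requires the `replicate` package and a REPLICATE_API_TOKEN):
#
#   import replicate
#   output = replicate.run("wavespeedai/hunyuan-video-fast", input=input_payload)
```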

Pricing

Official model
Pricing for official models works differently from other models. Instead of being billed by time, you’re billed by input and output, making pricing more predictable.

This model is priced by how many seconds of video are generated.

Type     Per unit                    Per $1
Output   $0.20 per second of video   5 seconds of video

For example, generating 100 seconds of video should cost around $20.00.
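The per-second pricing above reduces to a simple multiplication; here is a small helper that reproduces the worked example (function name is my own, for illustration):

```python
PRICE_PER_SECOND = 0.20  # output pricing from the table above, in USD

def video_cost(seconds: float) -> float:
    """Estimated cost in USD for generating `seconds` of video."""
    return seconds * PRICE_PER_SECOND

# 100 seconds of generated video -> $20.00, matching the example above
print(f"${video_cost(100):.2f}")
```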

Check out our docs for more information about how per second of video pricing works on Replicate.

Readme

Accelerated Inference for HunyuanVideo with High Resolution (1280x720)

We are WaveSpeedAI, and we provide highly optimized inference for generative AI models.

We are excited to introduce our new product: a highly optimized inference endpoint for HunyuanVideo, a state-of-the-art text-to-video generation model that creates high-quality videos with realistic motion from text descriptions.

We use cutting-edge acceleration techniques to deliver very fast inference for this model, and we are happy to bring it to you together with Replicate and DataCrunch.

Model Description ✨

This model is trained on a spatial-temporally compressed latent space and uses a large language model for text encoding. According to professional human evaluation results, HunyuanVideo outperforms previous state-of-the-art models in terms of text alignment, motion quality, and visual quality.

Key features:

  • 🎨 High-quality video generation from text descriptions
  • 📐 Support for various aspect ratios and resolutions
  • ✍️ Advanced prompt handling with a built-in rewrite system
  • 🎯 Stable motion generation and temporal consistency

Predictions Examples 💫

The model works well for prompts like:

  • “A cat walks on the grass, realistic style”
  • “A drone shot of mountains at sunset”
  • “A flower blooming in timelapse”

Limitations ⚠️

  • Generation time increases with video length and resolution
  • Higher resolutions require more GPU memory
  • Some complex motions may require prompt engineering for best results

Citation 📚

If you use this model in your research, please cite:

@misc{kong2024hunyuanvideo,
      title={HunyuanVideo: A Systematic Framework For Large Video Generative Models},
      author={Weijie Kong and others},
      year={2024},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}