Official

wavespeedai / wan-2.1-t2v-480p

Accelerated inference for Wan 2.1 14B text-to-video, a comprehensive and open suite of video foundation models that pushes the boundaries of video generation.

  • Public
  • 25.2K runs
  • $0.07 per second of video
  • Commercial use
  • GitHub
  • Weights
  • License

Input

string (required)

Prompt for video generation

integer
(minimum: 81, maximum: 100)

Number of video frames. 81 frames gives the best results.

Default: 81

string

Aspect ratio of video. 16:9 corresponds to 832x480px, and 9:16 is 480x832px

Default: "16:9"

integer
(minimum: 5, maximum: 24)

Frames per second. Note that the pricing of this model is based on the video duration at 16 fps

Default: 16

string

Speed up generation with different levels of acceleration. Faster modes may degrade quality somewhat. The speedup depends on the content, so different videos may see different gains.

Default: "Balanced"

integer
(minimum: 1, maximum: 40)

Number of generation steps. Fewer steps means faster generation at the expense of output quality. 30 steps is sufficient for most prompts.

Default: 30

number
(minimum: 0, maximum: 10)

A higher guide scale improves prompt adherence but can reduce variation.

Default: 5

number
(minimum: 1, maximum: 10)

Sample shift factor

Default: 5

integer

Random seed. Leave blank to use a random seed.

string

Load LoRA weights. Supports Replicate models in the format <owner>/<model-name> or <owner>/<model-name>/<version>, HuggingFace URLs in the format huggingface.co/<owner>/<model-name>, CivitAI URLs in the format civitai.com/models/<id>[/<model-name>], or arbitrary .safetensors URLs from the Internet. For example, 'fofr/flux-pixar-cars'

number

Determines how strongly the main LoRA is applied. Sane results fall between 0 and 1 for base inference. For go_fast we apply a 1.5x multiplier to this value; we've generally seen good performance when scaling the base value by that amount. You may still need to experiment to find the best value for your particular LoRA.

Default: 1
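Putting the schema above together, here is a minimal sketch of calling this model with the Replicate Python client. The parameter names used below (`num_frames`, `frames_per_second`, `sample_steps`, and so on) are assumptions reconstructed from the field descriptions — the page above omits the actual keys, so check the model's API schema on Replicate for the exact names.

```python
# Sketch of an input payload for this model, with the documented range checks.
# NOTE: the parameter names below are assumptions inferred from the field
# descriptions above, not confirmed API keys.

def validate_input(inp: dict) -> dict:
    """Check an input dict against the documented numeric constraints."""
    ranges = {
        "num_frames": (81, 100),
        "frames_per_second": (5, 24),
        "sample_steps": (1, 40),
        "sample_guide_scale": (0, 10),
        "sample_shift": (1, 10),
    }
    for key, (lo, hi) in ranges.items():
        if key in inp and not (lo <= inp[key] <= hi):
            raise ValueError(f"{key}={inp[key]} is outside [{lo}, {hi}]")
    if not inp.get("prompt"):
        raise ValueError("prompt is required")
    return inp

inp = validate_input({
    "prompt": "a red panda surfing a wave at sunset",
    "num_frames": 81,          # 81 gives the best results
    "frames_per_second": 16,   # pricing is based on duration at 16 fps
    "sample_steps": 30,
    "sample_guide_scale": 5,
    "sample_shift": 5,
    "aspect_ratio": "16:9",    # 832x480px
})

# Uncomment to run (requires the `replicate` package and an API token):
# import replicate
# output = replicate.run("wavespeedai/wan-2.1-t2v-480p", input=inp)
```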

Pricing

Official model
Pricing for official models works differently from other models. Instead of being billed by time, you’re billed by input and output, making pricing more predictable.

This model is priced by how many seconds of video are generated.

Type     Per unit                   Per $1
Output   $0.07 / second of video    14 seconds of video

For example, generating 100 seconds of video should cost around $7.00.
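As a sketch of that arithmetic — assuming, per the frames-per-second note in the input schema, that billed duration is the frame count divided by 16 fps:

```python
# Estimate the cost of one generation from the frame count.
# Assumption: billed duration = num_frames / 16, per the note that
# pricing is based on video duration at 16 fps.

PRICE_PER_SECOND = 0.07
BILLING_FPS = 16

def estimate_cost(num_frames: int) -> float:
    """Return the approximate cost in USD for one generation."""
    duration_s = num_frames / BILLING_FPS
    return duration_s * PRICE_PER_SECOND

print(estimate_cost(81))  # default 81 frames -> ~5.06 s, about $0.35
```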

Check out our docs for more information about how per second of video pricing works on Replicate.

Readme

Accelerated Inference for Wan 2.1 14B

We are WaveSpeedAI, and we provide highly optimized inference for generative AI models.

We are excited to introduce our new product: a highly optimized inference endpoint for the Wan 2.1 14B model, a comprehensive and open suite of video foundation models that pushes the boundaries of video generation.

We utilize cutting-edge acceleration techniques to deliver very fast inference for this model, and we are happy to bring it to you together with Replicate and DataCrunch.

Model Description ✨

Wan: Open and Advanced Large-Scale Video Generative Models

In this repository, we present Wan2.1, a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. Wan2.1 offers these key features:

- 👍 SOTA Performance: Wan2.1 consistently outperforms existing open-source models and state-of-the-art commercial solutions across multiple benchmarks.
- 👍 Supports Consumer-grade GPUs: The T2V-1.3B model requires only 8.19 GB VRAM, making it compatible with almost all consumer-grade GPUs. It can generate a 5-second 480P video on an RTX 4090 in about 4 minutes (without optimization techniques such as quantization). Its performance is even comparable to some closed-source models.
- 👍 Multiple Tasks: Wan2.1 excels in Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio, advancing the field of video generation.
- 👍 Visual Text Generation: Wan2.1 is the first video model capable of generating both Chinese and English text, featuring robust text generation that enhances its practical applications.
- 👍 Powerful Video VAE: Wan-VAE delivers exceptional efficiency and performance, encoding and decoding 1080P videos of any length while preserving temporal information, making it an ideal foundation for video and image generation.