Generate videos

These models can generate and edit videos from text prompts and images. They use advanced AI techniques like diffusion models and latent space interpolation to create high-quality, controllable video content.

Key capabilities:

  • Text-to-video generation - Convert text prompts into video clips and animations. Useful for quickly prototyping video concepts.
  • Image-to-video generation - Animate still images into video.
  • Inpainting for infinite zoom - Use image inpainting to extrapolate video frames and create infinite zoom effects.
  • Stylization - Apply artistic filters like cartoonification to give videos a unique look and feel.

State of the art: google/veo-3

For most people looking to generate custom videos from text prompts, we recommend google/veo-3. If speed and cost matter more than peak fidelity, its google/veo-3-fast variant is built for quicker turnaround.

Open source: wan-video

The Wan series of video models by Wan-AI is an excellent open-source option, competitive with the best proprietary video models. Try adjusting the number of sampling steps to trade off generation speed against detail, as in the sketch below.
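
Here’s a minimal sketch of that trade-off using the Replicate Python client. The sample_steps input name is an assumption, not a confirmed parameter; the exact schema varies between Wan variants, so check the model’s API tab.

    import replicate

    # Fewer sampling steps run faster but capture less fine detail.
    # "sample_steps" is an assumed input name; check the model's
    # API schema for the exact parameter it exposes.
    output = replicate.run(
        "wavespeedai/wan-2.1-t2v-480p",
        input={
            "prompt": "a paper boat drifting down a rainy street",
            "sample_steps": 20,
        },
    )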

Other rankings

Generative video is a rapidly advancing field. Check out the arena and leaderboard at Artificial Analysis to see what's popular today.

Frequently asked questions

Which models are the fastest?

The open-source Wan suite (like wavespeedai/wan-2.1-t2v-480p) is among the faster text-to-video options on Replicate, especially at lower resolutions and shorter durations. Many models also have “fast” variants, like google/veo-3-fast, designed for quicker turnaround.
Note: Faster runs usually mean lower resolution or simpler motion.

Which models give the best balance of cost and quality?

pixverse/pixverse-v4 offers a strong balance for many use cases. It uses a unit-based system at $0.01 per unit — for example, a 5-second, 360p video costs about $0.30. minimax/hailuo-02 is another good middle-ground option, with both standard and pro modes for different quality levels. Your ideal choice depends on how much resolution and runtime you need and how much you want to spend.
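
As a back-of-envelope sketch of how unit pricing adds up (the 30-unit figure is an assumption inferred from the $0.30 example above; real unit counts vary by resolution and duration, so check the model page):

    # Rough cost estimate for a unit-priced model like pixverse/pixverse-v4.
    # The unit count below is an assumption inferred from the example above.
    PRICE_PER_UNIT = 0.01   # USD
    units_for_clip = 30     # e.g., a 5-second, 360p clip (assumed)
    print(f"Estimated cost: ${PRICE_PER_UNIT * units_for_clip:.2f}")  # $0.30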

Which models are best for specific use cases within this collection?

  • For cinematic realism with high resolution and optional audio, try google/veo-3.
  • For fast prototyping (short clips, lower res), Wan models or pixverse/pixverse-v4 work well.
  • If you need both text-to-video and image-to-video, minimax/hailuo-02 supports both modes.
  • If you’re on a budget, stick with 480p or 360p outputs to keep costs low.

What’s the difference between key sub-types or approaches in this collection?

  • Text-to-video (T2V): You write a prompt and get a video.
  • Image-to-video (I2V): You provide a still image (or first frame) and animate it. Not all models support this; see the sketch after this list.
  • Quality / resolution tiers: Some models focus on speed and lower res (e.g., Wan fast), while others aim for higher resolution and richer motion (e.g., minimax/hailuo-02, google/veo-3).
  • Open-source vs proprietary: Open models like Wan are cheaper and often faster. Licensed models like Veo 3 offer higher fidelity but can be more expensive.
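
As a sketch of the I2V flow mentioned above, here is roughly how passing a starting image looks with the Replicate Python client. The first_frame_image and prompt input names are assumptions; every model documents its own schema on its API tab.

    import replicate

    # Image-to-video sketch: animate a still image.
    # The input keys below are assumed names, not confirmed ones;
    # check the model's API tab for its actual inputs.
    with open("first_frame.png", "rb") as image:
        output = replicate.run(
            "minimax/hailuo-02",
            input={
                "prompt": "slow push-in, leaves rustling in the wind",
                "first_frame_image": image,
            },
        )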

Which models are good for short, stylized clips?

For short, stylized clips (5–10 seconds at lower resolution), pixverse/pixverse-v4 and Wan models are great picks. They’re fast and relatively inexpensive, making them ideal for concept work, storyboarding, or rapid iteration.

Which models are good for high-fidelity, polished output?

If you want high-fidelity motion, longer clips, or more realistic physics, google/veo-3 or minimax/hailuo-02 are better options. minimax/hailuo-02 supports 768p in standard mode and 1080p in Pro mode, which makes it a solid choice for more polished results.

What types of outputs do these models produce?

Most text-to-video models generate short video clips (5–10 seconds) at 24 or 30 fps. Supported resolutions range from 360p to 1080p, depending on the model. Some, like google/veo-3, can include audio as part of the output.

How much do runs typically cost?

Costs vary by model and resolution:

  • pixverse/pixverse-v4: about $0.30 for a 5-second, 360p video.
  • Wan models: generally very inexpensive for short, low-res clips.
  • google/veo-3 and minimax/hailuo-02: prices vary and aren’t always listed publicly, so check the model page for up-to-date details.

Generally, you’ll pay more for longer durations and higher resolutions.

How can I self-host or push a model to Replicate?

You can push your own model by packaging it with Cog and deploying it. If you’re working with open-source video models, you can also fine-tune them and publish your version for others to use.
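
A minimal Cog predictor might look like the skeleton below. Only the cog imports and the class shape come from Cog itself; the generation body is a placeholder you would replace with your model’s real code.

    # predict.py -- minimal Cog predictor skeleton for a video model.
    from cog import BasePredictor, Input, Path

    class Predictor(BasePredictor):
        def setup(self):
            # Runs once per container start: load weights/pipelines here.
            self.model = None  # placeholder

        def predict(
            self,
            prompt: str = Input(description="Text prompt for the video"),
            num_frames: int = Input(default=120, description="Frames to generate"),
        ) -> Path:
            # Placeholder: run your model on `prompt`, write the frames
            # to an .mp4, and return its path.
            return Path("/tmp/output.mp4")

Test it locally with cog predict, then deploy it with cog push r8.im/<your-username>/<your-model>.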

Can I use these models for commercial work?

Yes, but always check the model’s license. Most text-to-video models on Replicate are available for commercial use, but some authors include additional restrictions.

How do I use or run these models?

You can use the Replicate playground, or run the models programmatically (see the Python sketch after these steps).

  1. Pick a model from the text-to-video collection.
  2. Add your prompt (and optionally an image for I2V).
  3. Run the model and wait for the video to generate.
  4. Download or embed your output.
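
Programmatically, the same steps look roughly like this with the Replicate Python client. The prompt key is typical but model-specific, and the file handling at the end assumes a recent client version that returns file-like outputs; consult the model’s API tab for its exact schema.

    import replicate

    # Steps 1-3: pick a model, pass a prompt, run it.
    output = replicate.run(
        "google/veo-3",
        input={"prompt": "a drone shot over a foggy coastline at sunrise"},
    )

    # Step 4: save the generated video. Assumes a single file-like
    # output; some models return a list, and older clients return URLs.
    with open("output.mp4", "wb") as f:
        f.write(output.read())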

Any other collection-specific tips or considerations?

  • Start with short durations and lower resolutions to experiment without overspending.
  • If animating a still image, choose a clean, well-framed starting image for better results.
  • Be specific in your prompts — details like camera motion or scene type improve output quality.
  • Not all models handle character consistency or motion equally well; higher-tier models tend to do better here.
  • Compare resolution and duration to match your budget and needs.
  • Check for updates, as text-to-video models evolve quickly and new versions can improve speed and quality.