vidu/q3-turbo

Fast video generation with text-to-video, image-to-video, and start-end-to-video modes. Up to 16 seconds at 1080p with synchronized audio.

Vidu Q3 Turbo

Vidu Q3 Turbo generates video from text prompts, images, or a combination of both. It is the faster variant in the Q3 series, optimized for quick iteration while still producing high-quality clips of up to 16 seconds at resolutions up to 1080p, with optional synchronized audio.

For maximum visual fidelity, use Vidu Q3 Pro. For faster generation at a lower price, use Q3 Turbo.

What it does

Vidu Q3 Turbo creates video in three modes, chosen automatically based on your inputs:

  • Text to video: Describe a scene and the model generates it
  • Image to video: Upload a starting image and provide a prompt describing the motion
  • Start-end to video: Upload both a starting and ending frame, and the model creates a smooth transition between them
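
The selection rule above can be sketched as a small function. This is only an illustration of the documented behavior, not the model's actual implementation; the function name and the returned mode labels are made up for this example. It also reflects the rule from the parameter list that end_image requires start_image.

```python
from typing import Optional


def infer_mode(start_image: Optional[str] = None,
               end_image: Optional[str] = None) -> str:
    """Return the mode implied by which images are supplied (illustrative only)."""
    if end_image and not start_image:
        # The parameter list states that end_image requires start_image.
        raise ValueError("end_image requires start_image")
    if start_image and end_image:
        return "start-end-to-video"
    if start_image:
        return "image-to-video"
    return "text-to-video"


print(infer_mode())                                    # no images supplied
print(infer_mode(start_image="first.png"))             # start frame only
print(infer_mode(start_image="first.png", end_image="last.png"))
```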

The model produces natural-looking motion and camera movements with good temporal consistency. When audio is enabled, it generates synchronized sound effects, dialogue, and ambient audio.

How to use it

Text to video

Provide a prompt describing your scene. Use aspect_ratio to control the framing.

Image to video

Upload a start_image along with a prompt describing what should happen. The model animates your image into video. Supported formats: PNG, JPEG, WebP.

Start-end to video

Upload both start_image and end_image with a prompt. The model generates a video that transitions smoothly from the first frame to the last. Both images should have similar aspect ratios.
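
As a rough sketch, the inputs for the three modes could be assembled like this. The keys are the parameter names documented on this page; the helper name build_input and the example.com image URLs are placeholders, and how images are actually passed (URL, file handle, upload) depends on the client you use to call the model.

```python
from typing import Any, Dict, Optional


def build_input(prompt: str,
                start_image: Optional[str] = None,
                end_image: Optional[str] = None,
                aspect_ratio: str = "16:9") -> Dict[str, Any]:
    """Assemble an input dictionary for one of the three modes (hypothetical helper)."""
    if end_image is not None and start_image is None:
        raise ValueError("end_image requires start_image")
    payload: Dict[str, Any] = {"prompt": prompt}
    if start_image is None:
        # aspect_ratio is documented as text-to-video only.
        payload["aspect_ratio"] = aspect_ratio
    else:
        payload["start_image"] = start_image
        if end_image is not None:
            payload["end_image"] = end_image
    return payload


text_to_video = build_input("A lighthouse at dusk, waves rolling in", aspect_ratio="9:16")
image_to_video = build_input("The boat drifts slowly to the left",
                             start_image="https://example.com/first.png")
start_end = build_input("The flower opens into full bloom",
                        start_image="https://example.com/first.png",
                        end_image="https://example.com/last.png")
print(text_to_video)
print(image_to_video)
print(start_end)
```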

Writing effective prompts

  • Be specific about motion: “A woman in a red coat walks through falling snow” works better than “a person outside”
  • Describe camera movement if you want it: “slow dolly shot”, “aerial view pulling back”
  • For audio, describe sounds explicitly: “birds chirping”, “footsteps on gravel”
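
One way to apply these tips consistently is to assemble prompts from separate parts. The helper below is a hypothetical convenience, and the comma-separated layout is an arbitrary choice for this sketch rather than a format the model requires.

```python
from typing import List, Optional


def compose_prompt(scene: str,
                   camera: Optional[str] = None,
                   sounds: Optional[List[str]] = None) -> str:
    """Join a scene description with optional camera and sound cues."""
    parts = [scene.strip()]
    if camera:
        parts.append(camera.strip())
    if sounds:
        parts.extend(s.strip() for s in sounds)
    return ", ".join(parts)


prompt = compose_prompt(
    "A woman in a red coat walks through falling snow",
    camera="slow dolly shot",
    sounds=["footsteps on gravel"],
)
print(prompt)
# A woman in a red coat walks through falling snow, slow dolly shot, footsteps on gravel
```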

Parameters

  • prompt: Text description of the video (up to 5,000 characters)
  • start_image: Starting frame image (enables image-to-video mode)
  • end_image: Ending frame image (requires start_image, enables start-end mode)
  • duration: Video length in seconds (1–16, default: 5)
  • resolution: Output resolution — 540p, 720p, or 1080p (default: 720p)
  • aspect_ratio: 16:9, 9:16, 3:4, 4:3, or 1:1 (text-to-video only, default: 16:9)
  • audio: Generate synchronized audio (default: true)
  • seed: Random seed for reproducible results
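
Checking inputs against these documented limits before submitting a job can catch mistakes early. The following is a minimal sketch; validate_params is a hypothetical helper, and the model's own validation may be stricter or behave differently.

```python
from typing import List

ALLOWED_RESOLUTIONS = {"540p", "720p", "1080p"}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "3:4", "4:3", "1:1"}
MAX_PROMPT_CHARS = 5000


def validate_params(prompt: str,
                    duration: int = 5,
                    resolution: str = "720p",
                    aspect_ratio: str = "16:9") -> List[str]:
    """Return a list of problems; an empty list means the documented limits are met."""
    problems = []
    if len(prompt) > MAX_PROMPT_CHARS:
        problems.append("prompt exceeds 5,000 characters")
    if not 1 <= duration <= 16:
        problems.append("duration must be between 1 and 16 seconds")
    if resolution not in ALLOWED_RESOLUTIONS:
        problems.append("resolution must be 540p, 720p, or 1080p")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append("aspect_ratio must be 16:9, 9:16, 3:4, 4:3, or 1:1")
    return problems


print(validate_params("A red kite over a beach"))           # defaults pass
print(validate_params("A red kite", duration=20, resolution="4k"))
```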

Pricing

Billed per second of video output, based on resolution:

| Resolution | Price per second |
|------------|------------------|
| 540p       | $0.04            |
| 720p       | $0.06            |
| 1080p      | $0.08            |

For example, a 5-second video at 720p costs $0.30.
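
The same arithmetic as a small sketch, using the per-second prices listed above. estimate_cost is a hypothetical helper, and Decimal is used so the cents come out exact.

```python
from decimal import Decimal

# Per-second prices from the pricing table on this page.
PRICE_PER_SECOND = {
    "540p": Decimal("0.04"),
    "720p": Decimal("0.06"),
    "1080p": Decimal("0.08"),
}


def estimate_cost(duration_seconds: int, resolution: str = "720p") -> Decimal:
    """Cost in USD: seconds of output times the per-second price."""
    return PRICE_PER_SECOND[resolution] * duration_seconds


print(estimate_cost(5, "720p"))    # 0.30
print(estimate_cost(16, "1080p"))  # 1.28
```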

Q3 Pro vs Q3 Turbo

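|                  | Q3 Pro                              | Q3 Turbo                                  |
|------------------|-------------------------------------|-------------------------------------------|
| Visual fidelity  | Higher                              | Good                                      |
| Generation speed | Slower                              | Faster                                    |
| Price (720p)     | $0.15/sec                           | $0.06/sec                                 |
| Best for         | Final renders, high-quality content | Rapid prototyping, iteration, high-volume |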
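
To make the price difference concrete, here is a sketch that compares two hypothetical workflows at 720p: drafting on Turbo and finishing on Pro, versus doing everything on Pro. The draft count and clip length are invented for illustration; only the two per-second prices come from the comparison above.

```python
from decimal import Decimal

# 720p per-second prices from the comparison above.
PRICE_720P = {"pro": Decimal("0.15"), "turbo": Decimal("0.06")}


def workflow_cost(drafts: int, seconds: int, draft_model: str,
                  final_model: str = "pro") -> Decimal:
    """Total cost of `drafts` draft clips plus one final render, all `seconds` long."""
    draft_total = PRICE_720P[draft_model] * seconds * drafts
    final_total = PRICE_720P[final_model] * seconds
    return draft_total + final_total


# Four 5-second drafts, then one 5-second final render on Pro.
print(workflow_cost(4, 5, draft_model="turbo"))  # 1.95
print(workflow_cost(4, 5, draft_model="pro"))    # 3.75
```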

What it’s good for

  • Rapid prototyping: Quickly test ideas before committing to a Pro render
  • Social media content: Generate short-form video at scale
  • Marketing and advertising: Create video content from text or product images
  • Animation: Bring still images to life with natural motion
  • Scene transitions: Use start-end mode to create smooth visual bridges between keyframes

Limitations

  • Maximum 16 seconds per generation
  • Audio generation adds dialogue and sound effects but doesn’t support background music control
  • Complex text rendering within the video may not be reliable
  • Very rapid fine-grained hand movements can sometimes look unnatural