Vidu Q3 Turbo
Vidu Q3 Turbo generates video from text prompts, images, or a combination of both. It’s the faster variant of the Q3 series, optimized for quick iteration while still producing high-quality clips up to 16 seconds at up to 1080p with optional synchronized audio.
For maximum visual fidelity, use Vidu Q3 Pro. For faster generation at a lower price, use Q3 Turbo.
What it does
Vidu Q3 Turbo creates video in three modes, chosen automatically based on your inputs:
- Text to video: Describe a scene and the model generates it
- Image to video: Upload a starting image and a prompt describing the motion
- Start-end to video: Upload both a starting and ending frame, and the model creates a smooth transition between them
The model produces natural-looking motion and camera movements with good temporal consistency. When audio is enabled, it generates synchronized sound effects, dialogue, and ambient audio.
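
The automatic mode selection described above can be pictured as a small decision rule over which image inputs are present. The sketch below is only an illustration of that rule: `infer_mode` is a hypothetical local helper, not part of any Vidu SDK, and the mode labels it returns are made up for readability.

```python
from typing import Optional


def infer_mode(start_image: Optional[str] = None,
               end_image: Optional[str] = None) -> str:
    """Return the generation mode implied by which image inputs are set.

    Hypothetical helper: the real service makes this choice server-side.
    """
    if end_image is not None and start_image is None:
        # The Parameters section states that end_image requires start_image.
        raise ValueError("end_image requires start_image")
    if start_image is not None and end_image is not None:
        return "start-end-to-video"
    if start_image is not None:
        return "image-to-video"
    return "text-to-video"


print(infer_mode())                            # text-to-video
print(infer_mode(start_image="first.png"))     # image-to-video
print(infer_mode("first.png", "last.png"))     # start-end-to-video
```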
How to use it
Text to video
Provide a prompt describing your scene. Use aspect_ratio to control the framing.
Image to video
Upload a start_image along with a prompt describing what should happen. The model animates your image into video. Supported formats: PNG, JPEG, WebP.
Start-end to video
Upload both start_image and end_image with a prompt. The model generates a video that transitions smoothly from the first frame to the last. Both images should have similar aspect ratios.
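
As a rough illustration of the three usage patterns above, the inputs for each mode might be assembled as plain dictionaries like the ones below. The parameter names come from this page; the image URLs and the second and third prompts are placeholders, and how images are actually supplied (URL, file upload, and so on) depends on the client you use.

```python
# Placeholder image references (assumptions for illustration only).
START_IMAGE = "https://example.com/start.png"
END_IMAGE = "https://example.com/end.png"

# Text to video: prompt only, with aspect_ratio controlling the framing.
text_to_video = {
    "prompt": "A woman in a red coat walks through falling snow, slow dolly shot",
    "aspect_ratio": "16:9",
}

# Image to video: a starting frame plus a prompt describing the motion.
image_to_video = {
    "prompt": "The camera slowly pulls back as snow begins to fall",
    "start_image": START_IMAGE,
}

# Start-end to video: both frames plus a prompt describing the transition.
start_end_to_video = {
    "prompt": "The scene shifts gradually from day to night",
    "start_image": START_IMAGE,
    "end_image": END_IMAGE,
}

for name, inputs in [
    ("text to video", text_to_video),
    ("image to video", image_to_video),
    ("start-end to video", start_end_to_video),
]:
    print(f"{name}: {sorted(inputs)}")
```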
Writing effective prompts
- Be specific about motion: “A woman in a red coat walks through falling snow” works better than “a person outside”
- Describe camera movement if you want it: “slow dolly shot”, “aerial view pulling back”
- For audio, describe sounds explicitly: “birds chirping”, “footsteps on gravel”
Parameters
- prompt: Text description of the video (up to 5,000 characters)
- start_image: Starting frame image (enables image-to-video mode)
- end_image: Ending frame image (requires start_image, enables start-end mode)
- duration: Video length in seconds (1–16, default: 5)
- resolution: Output resolution, one of 540p, 720p, or 1080p (default: 720p)
- aspect_ratio: Output aspect ratio, one of 16:9, 9:16, 3:4, 4:3, or 1:1 (text-to-video only, default: 16:9)
- audio: Generate synchronized audio (default: true)
- seed: Random seed for reproducible results
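
The constraints in this parameter list can be checked locally before submitting a request. The following is a minimal sketch, assuming inputs are passed as a dictionary keyed by the parameter names above; `check_inputs` is a hypothetical helper, and the service's own validation and error messages may differ.

```python
ALLOWED_RESOLUTIONS = {"540p", "720p", "1080p"}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "3:4", "4:3", "1:1"}
MAX_PROMPT_CHARS = 5000


def check_inputs(inputs: dict) -> list:
    """Return a list of problems found; an empty list means none were found.

    Hypothetical pre-flight check based only on the documented limits.
    Missing keys fall back to the documented defaults.
    """
    problems = []
    if len(inputs.get("prompt", "")) > MAX_PROMPT_CHARS:
        problems.append("prompt exceeds 5,000 characters")
    if not 1 <= inputs.get("duration", 5) <= 16:
        problems.append("duration must be between 1 and 16 seconds")
    if inputs.get("resolution", "720p") not in ALLOWED_RESOLUTIONS:
        problems.append("resolution must be 540p, 720p, or 1080p")
    if inputs.get("aspect_ratio", "16:9") not in ALLOWED_ASPECT_RATIOS:
        problems.append("aspect_ratio must be 16:9, 9:16, 3:4, 4:3, or 1:1")
    if "end_image" in inputs and "start_image" not in inputs:
        problems.append("end_image requires start_image")
    return problems


print(check_inputs({"prompt": "A fox runs through tall grass", "duration": 8}))
print(check_inputs({"prompt": "A fox", "duration": 20, "end_image": "last.png"}))
```

Returning a list rather than raising on the first failure lets a caller report every problem at once.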
Pricing
Billed per second of video output, based on resolution:
| Resolution | Price per second |
|---|---|
| 540p | $0.04 |
| 720p | $0.06 |
| 1080p | $0.08 |
For example, a 5-second video at 720p costs $0.30.
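
The per-second pricing above reduces to a single multiplication. Here is a small sketch of that arithmetic, assuming billing uses whole seconds and working in integer cents to avoid floating-point rounding; the function names are illustrative only.

```python
# Prices from the table above, expressed in cents per second of output.
PRICE_CENTS_PER_SECOND = {"540p": 4, "720p": 6, "1080p": 8}


def estimate_cost_cents(duration_seconds: int, resolution: str = "720p") -> int:
    """Estimated cost in cents for one clip of the given length and resolution."""
    return duration_seconds * PRICE_CENTS_PER_SECOND[resolution]


def format_usd(cents: int) -> str:
    """Format an integer number of cents as a dollar amount."""
    return f"${cents / 100:.2f}"


print(format_usd(estimate_cost_cents(5, "720p")))    # $0.30
print(format_usd(estimate_cost_cents(16, "1080p")))  # $1.28
```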
Q3 Pro vs Q3 Turbo
| | Q3 Pro | Q3 Turbo |
|---|---|---|
| Visual fidelity | Higher | Good |
| Generation speed | Slower | Faster |
| Price (720p) | $0.15/sec | $0.06/sec |
| Best for | Final renders, high-quality content | Rapid prototyping, iteration, high-volume generation |
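
To make the 720p price difference concrete, the sketch below compares the cost of a batch of clips on each model. It uses only the two 720p prices from the table; the batch size and clip length are arbitrary example values, and `batch_cost_cents` is a hypothetical helper.

```python
# 720p prices from the comparison table, in cents per second.
PRICE_720P_CENTS_PER_SECOND = {"Q3 Pro": 15, "Q3 Turbo": 6}


def batch_cost_cents(model: str, clips: int, seconds_per_clip: int) -> int:
    """Total cost in cents for a batch of equal-length 720p clips."""
    return clips * seconds_per_clip * PRICE_720P_CENTS_PER_SECOND[model]


# Example batch: ten 5-second clips (arbitrary values for illustration).
turbo = batch_cost_cents("Q3 Turbo", clips=10, seconds_per_clip=5)
pro = batch_cost_cents("Q3 Pro", clips=10, seconds_per_clip=5)

print(f"Q3 Turbo: ${turbo / 100:.2f}")          # Q3 Turbo: $3.00
print(f"Q3 Pro: ${pro / 100:.2f}")              # Q3 Pro: $7.50
print(f"Difference: ${(pro - turbo) / 100:.2f}")  # Difference: $4.50
```

At these prices a Pro render costs 2.5 times as much as the same Turbo render, which is why iterating on Turbo before a final Pro render can be economical.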
What it’s good for
- Rapid prototyping: Quickly test ideas before committing to a Pro render
- Social media content: Generate short-form video at scale
- Marketing and advertising: Create video content from text or product images
- Animation: Bring still images to life with natural motion
- Scene transitions: Use start-end mode to create smooth visual bridges between keyframes
Limitations
- Maximum 16 seconds per generation
- Audio generation adds dialogue and sound effects but doesn’t support background music control
- Complex text rendering within the video may not be reliable
- Very rapid fine-grained hand movements can sometimes look unnatural