Vidu Q3 Pro

Vidu Q3 Pro generates high-fidelity video from text prompts, images, or a combination of both. It produces cinematic clips up to 16 seconds long at up to 1080p resolution with optional synchronized audio—dialogue, sound effects, and ambient sounds generated alongside the video.

What it does

Vidu Q3 Pro creates video in three modes, chosen automatically based on your inputs:

Text to video: Describe a scene and the model generates it
Image to video: Upload a starting image and provide a prompt describing the motion
Start-end to video: Upload both a starting and ending frame, and the model creates a smooth transition between them
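
The automatic mode selection can be pictured as a small decision on which images are present. The following is a minimal sketch, not the service's actual logic: the function name infer_mode and the mode labels it returns are hypothetical, and the only rule taken from this page is that end_image requires start_image.

```python
from typing import Optional


def infer_mode(start_image: Optional[str] = None,
               end_image: Optional[str] = None) -> str:
    """Return the generation mode implied by which images are supplied.

    The returned labels are illustrative names, not official identifiers.
    """
    if end_image is not None and start_image is None:
        # The parameter list states that end_image requires start_image.
        raise ValueError("end_image requires start_image")
    if start_image is not None and end_image is not None:
        return "start-end-to-video"
    if start_image is not None:
        return "image-to-video"
    return "text-to-video"
```

For instance, infer_mode("first.png") yields the image-to-video label, while calling it with no images yields the text-to-video label.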

The model handles complex motion, maintains temporal consistency across frames, and produces natural-looking camera movements. When audio is enabled, it generates synchronized sound that matches the visual content.

How to use it

Text to video

Provide a prompt describing your scene. Use aspect_ratio to control the framing.

Image to video

Upload a start_image along with a prompt describing what should happen. The model animates your image into video. Supported formats: PNG, JPEG, WebP.

Start-end to video

Upload both start_image and end_image with a prompt. The model generates a video that transitions smoothly from the first frame to the last. Both images should have similar aspect ratios.
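
One way to check the "similar aspect ratios" advice before uploading is to compare the width-to-height ratios of the two frames. This is a sketch under assumptions: the helper name and the 5% tolerance are illustrative choices, since this page does not define how similar the images need to be.

```python
from typing import Tuple


def aspect_ratios_similar(start_size: Tuple[int, int],
                          end_size: Tuple[int, int],
                          tolerance: float = 0.05) -> bool:
    """Compare two (width, height) sizes by relative aspect-ratio difference.

    tolerance is an assumed threshold, not a documented limit.
    """
    for width, height in (start_size, end_size):
        if width <= 0 or height <= 0:
            raise ValueError("image dimensions must be positive")
    start_ratio = start_size[0] / start_size[1]
    end_ratio = end_size[0] / end_size[1]
    # Relative difference, measured against the starting frame's ratio.
    return abs(start_ratio - end_ratio) / start_ratio <= tolerance
```

A 1920x1080 start frame and a 1280x720 end frame pass this check; a landscape start frame paired with a portrait end frame does not.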

Writing effective prompts

Be specific about motion: “A woman in a red coat walks through falling snow” works better than “a person outside”
Describe camera movement if you want it: “slow dolly shot”, “aerial view pulling back”
For audio, describe sounds explicitly: “birds chirping”, “footsteps on gravel”
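
The tips above can be combined into a single prompt string. The helper below is only an illustration: the "Camera:" and "Sounds:" labels are an assumed way of organizing the text, not a syntax the model is documented to recognize, and the length check uses the 5,000-character limit from the parameter list.

```python
from typing import Iterable, Optional

MAX_PROMPT_CHARS = 5000  # prompt limit stated in the parameter list


def build_prompt(scene: str,
                 camera: Optional[str] = None,
                 sounds: Optional[Iterable[str]] = None) -> str:
    """Join a scene description with optional camera and sound cues."""
    parts = [scene.strip().rstrip(".")]
    if camera:
        parts.append(f"Camera: {camera.strip()}")
    if sounds:
        parts.append("Sounds: " + ", ".join(s.strip() for s in sounds))
    prompt = ". ".join(parts) + "."
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt exceeds 5,000 characters")
    return prompt
```

Calling build_prompt("A woman in a red coat walks through falling snow", camera="slow dolly shot", sounds=["footsteps on gravel"]) produces one sentence per cue, in that order.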

Parameters

prompt: Text description of the video (up to 5,000 characters)
start_image: Starting frame image (enables image-to-video mode)
end_image: Ending frame image (requires start_image, enables start-end mode)
duration: Video length in seconds (1–16, default: 5)
resolution: Output resolution — 540p, 720p, or 1080p (default: 720p)
aspect_ratio: Aspect ratio of the output, one of 16:9, 9:16, 3:4, 4:3, or 1:1 (text-to-video only, default: 16:9)
audio: Generate synchronized audio (default: true)
seed: Random seed for reproducible results
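
The constraints in this list can be checked client-side before a request is submitted. The sketch below assembles an input dictionary using the parameter names, ranges, and defaults listed above. The function build_input is hypothetical, and it is an assumption that the service accepts a dictionary of this shape or applies the same checks; consult the API you are calling for the actual request format.

```python
from typing import Any, Dict, Optional

RESOLUTIONS = {"540p", "720p", "1080p"}
ASPECT_RATIOS = {"16:9", "9:16", "3:4", "4:3", "1:1"}


def build_input(prompt: str,
                start_image: Optional[str] = None,
                end_image: Optional[str] = None,
                duration: int = 5,
                resolution: str = "720p",
                aspect_ratio: str = "16:9",
                audio: bool = True,
                seed: Optional[int] = None) -> Dict[str, Any]:
    """Validate arguments against the documented limits and build a payload."""
    if len(prompt) > 5000:
        raise ValueError("prompt is limited to 5,000 characters")
    if end_image is not None and start_image is None:
        raise ValueError("end_image requires start_image")
    if not 1 <= duration <= 16:
        raise ValueError("duration must be between 1 and 16 seconds")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")

    payload: Dict[str, Any] = {
        "prompt": prompt,
        "duration": duration,
        "resolution": resolution,
        "audio": audio,
    }
    if start_image is None:
        # aspect_ratio is documented as text-to-video only.
        if aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
        payload["aspect_ratio"] = aspect_ratio
    else:
        payload["start_image"] = start_image
        if end_image is not None:
            payload["end_image"] = end_image
    if seed is not None:
        payload["seed"] = seed
    return payload
```

Omitting aspect_ratio from image-based requests mirrors the "text-to-video only" note; whether the service ignores or rejects the field in the other modes is not stated here.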

Pricing

Generation is billed per second of output video, at a rate that depends on resolution:

Resolution	Price per second
540p	$0.07
720p	$0.15
1080p	$0.16

For example, a 5-second video at 720p costs $0.75.
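
The same arithmetic can be scripted to estimate costs up front. This is a small sketch using the rates from the table above; the function name estimate_cost is illustrative, and Decimal is used so that cent values are not distorted by floating-point rounding.

```python
from decimal import Decimal

# Per-second rates in US dollars, copied from the pricing table.
PRICE_PER_SECOND = {
    "540p": Decimal("0.07"),
    "720p": Decimal("0.15"),
    "1080p": Decimal("0.16"),
}


def estimate_cost(duration_seconds: int, resolution: str = "720p") -> Decimal:
    """Return the estimated price of one generation in US dollars."""
    if resolution not in PRICE_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution}")
    if not 1 <= duration_seconds <= 16:
        raise ValueError("duration must be between 1 and 16 seconds")
    return PRICE_PER_SECOND[resolution] * duration_seconds


print(estimate_cost(5, "720p"))  # prints 0.75
```

At the maximum length of 16 seconds and 1080p, the same function gives 16 x $0.16 = $2.56.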

What it’s good for

Marketing and advertising: Create polished video content from text descriptions or product images
Social media: Generate short-form video in vertical, square, or widescreen formats
Storyboarding: Quickly visualize scenes from written descriptions
Animation: Bring still images to life with natural motion
Scene transitions: Use start-end mode to create smooth visual bridges between keyframes

Limitations

Maximum 16 seconds per generation
Audio generation adds dialogue and sound effects but doesn’t support background music control
Complex text rendering within the video may not be reliable
Very rapid fine-grained hand movements can sometimes look unnatural
