
P-Video

P-Video is Pruna AI’s video generation model built for speed and creative iteration. It generates a 5-second 720p video in about 10 seconds, and includes a draft mode that’s 4× faster for quick previews before committing to a full render.

Features

All-in-one endpoint — text-to-video, image-to-video, and audio-to-video
Draft mode — 4× faster previews for rapid iteration
Built-in audio generation — native dialogue and sound, plus custom audio import
Up to 1080p at 48 FPS
Multi-aspect ratio support — 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, 1:1
Prompt upsampling — automatic prompt enhancement with full user control

Pricing

Resolution	Draft OFF	Draft ON
720p	$0.02/sec	$0.005/sec
1080p	$0.04/sec	$0.01/sec
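Pricing is per second of video, so a clip's cost is the rate for its resolution and draft setting multiplied by its length. The sketch below is a minimal illustration under two assumptions: billing is exactly rate × requested seconds, and FPS does not change the rate (the table does not mention FPS). The `RATES` dict and the `estimate_cost` helper are hypothetical names. They are not part of any Pruna or hosting-platform API.

```python
# Per-second USD rates copied from the pricing table.
# Assumption: FPS does not affect the rate, and billing is rate x seconds.
RATES = {
    ("720p", False): 0.02,
    ("720p", True): 0.005,
    ("1080p", False): 0.04,
    ("1080p", True): 0.01,
}


def estimate_cost(resolution: str, draft: bool, seconds: float) -> float:
    """Return the estimated USD cost of one clip (hypothetical helper)."""
    try:
        rate = RATES[(resolution, draft)]
    except KeyError:
        raise ValueError(f"unknown resolution: {resolution!r}") from None
    # Round to avoid float noise such as 0.025000000000000001.
    return round(rate * seconds, 4)


if __name__ == "__main__":
    # Default 5-second 720p clip: full render vs. draft preview.
    print(estimate_cost("720p", draft=False, seconds=5))   # 0.1
    print(estimate_cost("720p", draft=True, seconds=5))    # 0.025
    print(estimate_cost("1080p", draft=False, seconds=10))  # 0.4
```

At these rates, a draft preview costs a quarter of the full render at the same resolution.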

Inputs

prompt (required) — text description of the video you want to generate
image — input image for image-to-video generation (jpg, jpeg, png, webp)
audio — input audio to condition video generation (flac, mp3, wav)
duration — video length in seconds, 1–10 (default: 5). Ignored when audio is provided
aspect_ratio — 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, or 1:1 (default: 16:9). Ignored when an input image is provided
resolution — 720p or 1080p (default: 720p)
fps — 24 or 48 frames per second (default: 24)
draft — enable draft mode for faster, lower-quality previews (default: false)
prompt_upsampling — enhance the prompt automatically (default: true)
seed — random seed; set a fixed value for reproducible generation
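You can check the documented constraints on the client side before you send a request. The sketch below assumes inputs are passed as a flat dict keyed by the parameter names listed above. `build_input` is a hypothetical helper. Leaving ignored fields out of the payload (`duration` when audio is given, `aspect_ratio` when an image is given) is an illustrative choice, not documented behavior. The helper builds the payload only. Sending it is left to whatever client you use to call the model.

```python
from typing import Optional

# Allowed values as listed in the Inputs section.
ASPECT_RATIOS = {"16:9", "9:16", "4:3", "3:4", "3:2", "2:3", "1:1"}
RESOLUTIONS = {"720p", "1080p"}
FPS_VALUES = {24, 48}


def build_input(
    prompt: str,
    image: Optional[str] = None,
    audio: Optional[str] = None,
    duration: int = 5,
    aspect_ratio: str = "16:9",
    resolution: str = "720p",
    fps: int = 24,
    draft: bool = False,
    prompt_upsampling: bool = True,
    seed: Optional[int] = None,
) -> dict:
    """Validate arguments against the documented ranges and return an input dict.

    Hypothetical helper: fields the model ignores are omitted from the payload.
    """
    if not prompt:
        raise ValueError("prompt is required")
    if not 1 <= duration <= 10:
        raise ValueError("duration must be between 1 and 10 seconds")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if fps not in FPS_VALUES:
        raise ValueError("fps must be 24 or 48")

    payload = {
        "prompt": prompt,
        "resolution": resolution,
        "fps": fps,
        "draft": draft,
        "prompt_upsampling": prompt_upsampling,
    }
    if image is not None:
        payload["image"] = image  # aspect_ratio is ignored with an input image
    else:
        payload["aspect_ratio"] = aspect_ratio
    if audio is not None:
        payload["audio"] = audio  # duration is ignored when audio is provided
    else:
        payload["duration"] = duration
    if seed is not None:
        payload["seed"] = seed
    return payload
```

For example, `build_input("a singer on stage", image="face.png", audio="song.mp3", seed=7)` yields a payload with no `duration` or `aspect_ratio` key. This matches the talking-avatar and music-video use cases below.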

What it’s good at

Talking avatars and lip sync — strong input-image consistency with reliable lip synchronization and native dialogue generation
Close-up subjects — particularly strong with foreground objects and close-up shots
Product animation — turn static product images into animated videos
Social ads and short-form content — fast iteration with multi-resolution output
Music videos — combine your own audio with generated visuals
Animating low-resolution assets — effective at bringing low-res images to life

Tips

Use draft mode for iteration. Start with draft mode on to quickly explore different prompts and compositions, then switch it off for the final render.
Try 1080p and 48 FPS for vertical formats. Portrait outputs (9:16, 3:4, 2:3) may look better at these settings.
Try different resolutions and FPS settings. Output quality can vary depending on the combination of resolution, FPS, and input framing.
Light prompt refinement helps. As with any generative model, a brief round of prompt experimentation tends to improve results.
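The draft-first tip can be treated as two stages of payloads. First you send several cheap draft variants, then one full render of the variant you chose. The sketch below is minimal. It reuses one fixed `seed` across both stages so that runs stay comparable, but this section does not say whether a draft and a full render with the same seed produce the same composition. `draft_variants` and `final_render` are hypothetical helpers, and the code only builds payloads without sending them.

```python
from typing import Dict, List


def draft_variants(prompts: List[str], seed: int, **settings) -> List[Dict]:
    """Build one draft-mode payload per candidate prompt, sharing a fixed seed."""
    return [
        {"prompt": p, "seed": seed, "draft": True, **settings}
        for p in prompts
    ]


def final_render(chosen: Dict) -> Dict:
    """Return a copy of the chosen draft payload with draft mode switched off."""
    return {**chosen, "draft": False}


if __name__ == "__main__":
    candidates = [
        "a ceramic mug rotating on a marble counter, soft morning light",
        "a ceramic mug rotating on a marble counter, dramatic rim light",
    ]
    drafts = draft_variants(
        candidates, seed=42, resolution="1080p", aspect_ratio="9:16", fps=48
    )
    # After reviewing the draft outputs, re-render the preferred one at full quality.
    final = final_render(drafts[1])
    print(final)
```

Assuming the default 5-second duration, the pricing table puts the two 1080p drafts at $0.10 combined. The single full 1080p render costs $0.20.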

Limitations

Not designed for extreme cinematic camera motion or complex multi-scene storytelling
No native 4K output
Sound effects (SFX) performance is limited; for premium voice realism or advanced sound design, dedicated audio providers can deliver higher fidelity, and their output can be used as the audio input to P-Video
With more than two speakers, speaker separation can degrade
Speaker attribution drift can occur (e.g., one voice delivering multiple lines)

Model created 5 months, 3 weeks ago

Model updated 5 days, 10 hours ago

Examples