P-Video
P-Video is Pruna AI’s video generation model built for speed and creative iteration. It generates a 5-second 720p video in about 10 seconds, and includes a draft mode that’s 4× faster for quick previews before committing to a full render.
Features
- All-in-one endpoint — text-to-video, image-to-video, and audio-to-video
- Draft mode — 4× faster previews for rapid iteration
- Built-in audio generation — native dialogue and sound, plus custom audio import
- Up to 1080p at 48 FPS
- Multi-aspect ratio support — 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, 1:1
- Prompt upsampling — automatic prompt enhancement with full user control
Pricing
| Resolution | Draft OFF | Draft ON |
|---|---|---|
| 720p | $0.02/sec | $0.005/sec |
| 1080p | $0.04/sec | $0.01/sec |
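As a rough illustration of the table above, the following sketch estimates the cost of a single render. The function name `estimate_cost` is hypothetical, and the code assumes billing is simply the per-second price multiplied by the output duration, with no rounding or minimum charge; the README does not state how partial seconds or audio-driven durations are billed.

```python
from decimal import Decimal

# Per-second prices in USD, copied from the pricing table.
# Keys are (resolution, draft_enabled).
PRICE_PER_SECOND = {
    ("720p", False): Decimal("0.02"),
    ("720p", True): Decimal("0.005"),
    ("1080p", False): Decimal("0.04"),
    ("1080p", True): Decimal("0.01"),
}


def estimate_cost(duration_s: int, resolution: str = "720p", draft: bool = False) -> Decimal:
    """Estimate the price of one video (assumption: price = rate * seconds)."""
    key = (resolution, draft)
    if key not in PRICE_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return PRICE_PER_SECOND[key] * duration_s


if __name__ == "__main__":
    print(estimate_cost(5))                 # default 5-second 720p render
    print(estimate_cost(5, draft=True))     # same clip in draft mode
    print(estimate_cost(10, "1080p"))       # longest clip at 1080p
```

Under these assumptions, a default 5-second 720p video costs $0.10, and the same clip in draft mode costs $0.025.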
Inputs
- prompt (required) — text description of the video you want to generate
- image — input image for image-to-video generation (jpg, jpeg, png, webp)
- audio — input audio to condition video generation (flac, mp3, wav)
- duration — video length in seconds, 1–10 (default: 5). Ignored when audio is provided
- aspect_ratio — 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, or 1:1 (default: 16:9). Ignored when an input image is provided
- resolution — 720p or 1080p (default: 720p)
- fps — 24 or 48 frames per second (default: 24)
- draft — enable draft mode for faster, lower-quality previews (default: false)
- prompt_upsampling — enhance the prompt automatically (default: true)
- seed — set for reproducible generation
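The inputs listed above can be checked client-side before a request is sent. The sketch below is a hypothetical helper, `build_input`, that validates values against the documented ranges and drops the fields the README says are ignored (`duration` when audio is provided, `aspect_ratio` when an image is provided). It only builds a dictionary; how that dictionary is submitted depends on the hosting platform's client and is not shown. Treating `duration` as a whole number of seconds is an assumption.

```python
from typing import Any, Dict, Optional

ASPECT_RATIOS = {"16:9", "9:16", "4:3", "3:4", "3:2", "2:3", "1:1"}
RESOLUTIONS = {"720p", "1080p"}
FPS_VALUES = {24, 48}


def build_input(
    prompt: str,
    *,
    image: Optional[str] = None,   # URL or path to a jpg, jpeg, png, or webp file
    audio: Optional[str] = None,   # URL or path to a flac, mp3, or wav file
    duration: int = 5,
    aspect_ratio: str = "16:9",
    resolution: str = "720p",
    fps: int = 24,
    draft: bool = False,
    prompt_upsampling: bool = True,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Validate the documented inputs and return them as a dict."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
    if fps not in FPS_VALUES:
        raise ValueError(f"fps must be one of {sorted(FPS_VALUES)}")

    payload: Dict[str, Any] = {
        "prompt": prompt,
        "resolution": resolution,
        "fps": fps,
        "draft": draft,
        "prompt_upsampling": prompt_upsampling,
    }

    if image is not None:
        payload["image"] = image            # aspect_ratio is ignored with an image
    else:
        if aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
        payload["aspect_ratio"] = aspect_ratio

    if audio is not None:
        payload["audio"] = audio            # duration is ignored with audio
    else:
        if not 1 <= duration <= 10:
            raise ValueError("duration must be between 1 and 10 seconds")
        payload["duration"] = duration

    if seed is not None:
        payload["seed"] = seed
    return payload
```

For example, `build_input("a red kite over a beach", draft=True, seed=7)` returns a text-to-video input with the default 5-second duration and 16:9 aspect ratio, while passing `audio=` omits `duration` from the result.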
What it’s good at
- Talking avatars and lip sync — strong input-image consistency with reliable lip synchronization and native dialogue generation
- Close-up subjects — particularly strong with foreground objects and close-up shots
- Product animation — turn static product images into animated videos
- Social ads and short-form content — fast iteration with multi-resolution output
- Music videos — combine your own audio with generated visuals
- Animating low-resolution assets — effective at bringing low-res images to life
Tips
- Use draft mode for iteration. Start with draft mode on to quickly explore different prompts and compositions, then switch it off for the final render.
- Vertical formats may work better at 1080p and 48 FPS.
- Try different resolutions and FPS settings. Output quality can vary depending on the combination of resolution, FPS, and input framing.
- Light prompt refinement helps. Like any generative model, a short experimentation phase with your prompts will get better results.
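One way to act on these tips is to sweep the resolution and FPS combinations in draft mode, pick the preview that looks best, and re-run only that setting with draft mode off. The helpers `draft_grid` and `finalize` below are hypothetical and only build input dictionaries. Reusing the same seed for the final render is an assumption about workflow; the README does not say whether a draft and a full render with the same seed produce the same composition.

```python
from __future__ import annotations

from itertools import product

RESOLUTIONS = ("720p", "1080p")
FPS_VALUES = (24, 48)


def draft_grid(prompt: str, seed: int, aspect_ratio: str = "9:16") -> list[dict]:
    """Build one draft-mode input per resolution/FPS combination."""
    return [
        {
            "prompt": prompt,
            "aspect_ratio": aspect_ratio,
            "resolution": resolution,
            "fps": fps,
            "draft": True,   # cheap, fast previews while exploring
            "seed": seed,
        }
        for resolution, fps in product(RESOLUTIONS, FPS_VALUES)
    ]


def finalize(draft_input: dict) -> dict:
    """Return a copy of the chosen draft input with draft mode switched off."""
    return {**draft_input, "draft": False}


if __name__ == "__main__":
    previews = draft_grid("a dancer spinning in neon light", seed=42)
    for item in previews:
        print(item["resolution"], item["fps"], item["draft"])
    # After reviewing the previews, re-render the preferred setting in full quality.
    final_input = finalize(previews[-1])
    print(final_input)
```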
Limitations
- Not designed for extreme cinematic camera motion or complex multi-scene storytelling
- No native 4K output
- Sound effects (SFX) performance is limited — for premium voice realism or advanced sound design, dedicated audio providers can deliver higher fidelity, and their output can be used as audio input to P-Video
- With more than two speakers, speaker separation can degrade
- Speaker attribution drift can occur (e.g., one voice delivering multiple lines)