google/veo-3.1

New and improved version of Veo 3, with higher-fidelity video, context-aware audio, reference image and last frame support

Veo 3.1

Veo 3.1 is Google’s state-of-the-art video generation model. It creates high-quality videos with synchronized native audio from text prompts or images, and offers enhanced prompt adherence, improved audiovisual quality, and powerful creative controls for image-to-video generation.

Key Features

Synchronized Audio Generation – Veo 3.1 generates rich native audio automatically, from natural conversations and sound effects to ambient soundscapes, perfectly synchronized with your video content.

Enhanced Image-to-Video – Transform static images into dynamic videos with superior prompt adherence and visual quality. Veo 3.1 excels at maintaining character consistency and understanding your creative vision.

Superior Prompt Understanding – The model follows complex, nuanced prompts more closely, including intricate scenes, specific camera movements, and detailed artistic styles that earlier models often missed.

Realistic Physics and Motion – Veo 3.1 delivers true-to-life textures, coherent motion across frames, and more natural movement and interactions.

Reference Image Support – Upload up to 3 reference images to guide the appearance, style, and characters in your generated video and keep them visually consistent throughout.

Frame-to-Frame Generation – Provide a starting and ending frame, and Veo 3.1 generates smooth, seamless transitions between them, perfect for creating artful scene transitions.

Scene Extension – Extend your videos beyond the initial generation, creating longer sequences that maintain visual and audio consistency by building on the final seconds of your previous clip.

Multiple Output Formats – Generate videos at 720p or 1080p resolution at 24 FPS, with support for both landscape (16:9) and portrait (9:16) aspect ratios. Choose from 4-, 6-, or 8-second durations.

Cinematic Quality – Veo 3.1 incorporates enhanced understanding of cinematic styles and narrative control, delivering more polished and professional-looking results.
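
The output options and input modes above can be checked before a request is sent. The Python sketch below validates a request against the limits listed here (720p or 1080p, 16:9 or 9:16, 4, 6, or 8 seconds, at most 3 reference images, a starting frame whenever an ending frame is given) and builds an input dictionary for the Replicate Python client. The input field names (`resolution`, `aspect_ratio`, `duration`, `image`, `last_frame`, `reference_images`) and the helper names are assumptions for illustration, not the confirmed schema; check the model's API page on Replicate for the actual names and any additional constraints.

```python
from typing import Optional, Sequence

# Limits taken from the feature list above. The input *field names* used below
# are assumptions, not a confirmed schema; check the model's API page.
RESOLUTIONS = {"720p", "1080p"}
ASPECT_RATIOS = {"16:9", "9:16"}
DURATIONS = {4, 6, 8}  # seconds
MAX_REFERENCE_IMAGES = 3


def build_veo_input(
    prompt: str,
    resolution: str = "720p",
    aspect_ratio: str = "16:9",
    duration: int = 8,
    image: Optional[str] = None,  # assumed field: starting frame (URL or path)
    last_frame: Optional[str] = None,  # assumed field: ending frame
    reference_images: Sequence[str] = (),  # assumed field: up to 3 images
) -> dict:
    """Check options against the documented limits and return an input dict."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
    if duration not in DURATIONS:
        raise ValueError(f"duration must be one of {sorted(DURATIONS)} seconds")
    if len(reference_images) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images")
    if last_frame is not None and image is None:
        # Frame-to-frame generation needs both a starting and an ending frame.
        raise ValueError("last_frame requires a starting image")

    payload = {
        "prompt": prompt,
        "resolution": resolution,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
    }
    if image is not None:
        payload["image"] = image
    if last_frame is not None:
        payload["last_frame"] = last_frame
    if reference_images:
        payload["reference_images"] = list(reference_images)
    return payload


def generate(payload: dict):
    """Run the model; needs `pip install replicate` and REPLICATE_API_TOKEN set."""
    import replicate  # imported lazily so validation works without the client

    return replicate.run("google/veo-3.1", input=payload)
```

For example, `generate(build_veo_input("A lighthouse at dusk, waves crashing on the rocks", resolution="1080p"))` would request an 8-second 1080p landscape clip, assuming the field names match the live schema. Validating locally first avoids spending a paid run on a request the API would reject anyway.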

What You Can Create

Text-to-Video – Describe your vision in natural language and watch it come to life with synchronized audio. From realistic scenes to fantastical concepts, Veo 3.1 translates your words into stunning visuals.

Image-to-Video – Animate your static images with lifelike motion and accompanying audio. Perfect for bringing concept art, photos, or illustrations to life.

Character Consistency – Maintain the same character appearance across multiple video generations using reference images, ideal for storytelling and creating cohesive content series.

Cinematic Transitions – Create smooth scene transitions by defining start and end frames, letting Veo 3.1 generate the motion in between with natural camera movement.

Extended Sequences – Build longer narratives by chaining multiple generations together, with each new clip seamlessly continuing from where the last one ended.
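
One way to approximate this chaining with ordinary image-to-video requests is to start each new clip from the final frame of the previous one. This is a swapped-in technique, not necessarily how Veo 3.1's Scene Extension works (extension is described above as building on the final seconds of the previous clip). The Python sketch keeps the loop independent of any client: `generate` and `last_frame_of` are hypothetical callables you supply (for instance, a wrapper around `replicate.run` and a frame-extraction step), and the `image` field name is an assumption.

```python
from typing import Callable, List, Optional


def chain_clips(
    prompts: List[str],
    generate: Callable[[dict], str],  # hypothetical: sends a request, returns a clip URL/path
    last_frame_of: Callable[[str], str],  # hypothetical: returns the clip's final frame
    first_image: Optional[str] = None,
) -> List[str]:
    """Generate one clip per prompt, seeding each clip with the previous clip's final frame."""
    clips: List[str] = []
    start = first_image
    for prompt in prompts:
        payload = {"prompt": prompt}
        if start is not None:
            payload["image"] = start  # assumed field name for the starting frame
        clip = generate(payload)
        clips.append(clip)
        # The next shot continues from where this one ended.
        start = last_frame_of(clip)
    return clips
```

Passing the client and frame extractor in as functions keeps the sequencing logic testable with fakes and makes it easy to swap in a native extension call if the model's API exposes one.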

Best Practices

Crafting Effective Prompts – Be specific and descriptive in your text prompts. Include details about camera angles, lighting, mood, and any audio elements you want. For example: “A medium shot of a wise owl circling above a moonlit forest clearing, with the sound of flapping wings and a gentle orchestral score.”

Using Reference Images – When using reference images for character or style consistency, choose clear, well-lit images that show the subject from the desired angle. You can provide 1-3 images to guide the generation.

Image-to-Video Tips – For best results with image-to-video, use high-quality input images with clear subjects. Your prompt should describe the motion and action you want to see, not just describe what’s already in the image.

Audio Considerations – Veo 3.1 generates synchronized audio automatically, but you can guide it by describing the sounds you want in your prompt, using short tags or descriptive phrases such as “with birdsong and rustling wind” or “accompanied by upbeat music.”

Frame Control – When using start and end frames, ensure they’re visually compatible and the transition you’re requesting is physically plausible. The model works best with natural motion sequences.

About Veo 3.1

Veo 3.1 builds on Google’s Veo 3 foundation with significant improvements in prompt adherence and audiovisual quality, particularly for image-to-video generation. The model was designed with creative professionals in mind, offering granular control over generated content while maintaining ease of use.

All videos generated with Veo 3.1 are marked with SynthID, Google’s watermarking technology for identifying AI-generated content. The model has been extensively tested for safety and content policy compliance.

Veo 3.1 also comes in a Fast variant (Veo 3.1 Fast) that offers faster generation times while maintaining high quality, perfect for rapid iteration and experimentation.

Learn More

For detailed API documentation and the latest updates, visit Google’s Gemini API documentation.

Try the model yourself on the Replicate Playground to explore its capabilities and see how it can enhance your creative workflow.