
Kling Video 3.0

Generate cinematic videos up to 15 seconds long from text prompts or images. Kling Video 3.0 improves on previous versions with longer output, stronger consistency across shots, and native audio generation including lip-synced dialogue.

What it does

Kling Video 3.0 turns text descriptions or still images into video clips at up to 1080p resolution. Clips run from 3 to 15 seconds, up from the 10-second limit of earlier versions. The model handles realistic scenes, stylized content, and complex multi-step actions within a single generation.

You can also generate native audio alongside the video in a single pass, including lip-synced dialogue, sound effects, and ambient sound.

How to use it

The model supports two main input modes:

Text to video: Describe what you want to see. The model generates visuals (and optionally audio) from your description.

Image to video: Upload a starting image and describe the motion you want. You can also provide an end image to guide where the video should land.

Multi-shot mode

For videos with multiple scenes, use the multi_prompt parameter. Pass a JSON array of shot definitions, each with a prompt and a duration in seconds. You can define up to 6 shots, with a minimum of 1 second per shot. The shot durations must add up to exactly the value of the duration parameter. The example below has three shots totalling 10 seconds, so it needs duration set to 10.

[
  {"prompt": "A woman walks through a sunlit forest", "duration": 5},
  {"prompt": "She stops and looks up at the canopy", "duration": 3},
  {"prompt": "Sunlight breaks through the leaves", "duration": 2}
]
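
The multi-shot constraints can be checked locally before a request is sent. The following is a minimal Python sketch under stated assumptions: validate_multi_prompt is a hypothetical helper written for this README, not part of any Kling SDK, and it assumes each shot is a dictionary with the prompt and duration keys shown above.

```python
import json

MAX_SHOTS = 6          # README: up to 6 shots
MIN_SHOT_SECONDS = 1   # README: at least 1 second per shot


def validate_multi_prompt(shots, duration):
    """Return a list of problems; an empty list means the shots look valid.

    Hypothetical helper: it only encodes the rules stated in this README.
    """
    problems = []
    if not 1 <= len(shots) <= MAX_SHOTS:
        problems.append(f"expected 1 to {MAX_SHOTS} shots, got {len(shots)}")
    for i, shot in enumerate(shots):
        if not shot.get("prompt"):
            problems.append(f"shot {i} has no prompt")
        if shot.get("duration", 0) < MIN_SHOT_SECONDS:
            problems.append(f"shot {i} is shorter than {MIN_SHOT_SECONDS} second")
    # The shot durations must add up to the overall duration parameter.
    total = sum(shot.get("duration", 0) for shot in shots)
    if total != duration:
        problems.append(f"shot durations sum to {total}, but duration is {duration}")
    return problems


shots = json.loads("""[
  {"prompt": "A woman walks through a sunlit forest", "duration": 5},
  {"prompt": "She stops and looks up at the canopy", "duration": 3},
  {"prompt": "Sunlight breaks through the leaves", "duration": 2}
]""")

print(validate_multi_prompt(shots, duration=10))  # prints []
print(validate_multi_prompt(shots, duration=12))  # reports the mismatched total
```

Returning a list of problems instead of raising on the first one lets a caller report every issue with a shot list at once.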

Writing effective prompts

Structure your prompts to cover:

Scene setting: Where and when the action happens, lighting conditions
Subject details: What characters or objects appear, how they look
Motion: What happens, how things move, camera behavior
Audio (if enabled): Dialogue in quotation marks, ambient sounds, sound effects

Example: A chef in a busy kitchen plates a dish with careful precision, steam rising from the food. Camera slowly pushes in on the plate. Sizzling sounds, kitchen chatter in the background, the chef says "Perfect."
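
If prompts are assembled in code, the four components above can be kept as separate fields and joined at the end. This is only an illustrative sketch: build_prompt is a hypothetical helper, and the model takes a single free-text prompt, so the way the parts are joined here is an assumption, not a required format.

```python
def build_prompt(scene, subject, motion, audio=None):
    """Join prompt components into one description, skipping empty parts.

    Hypothetical helper; the field names mirror the checklist in this README.
    """
    sentences = []
    for part in (scene, subject, motion, audio):
        if part and part.strip():
            text = part.strip()
            # Close each component with a period unless it already ends a sentence.
            if text[-1] not in '.!?"':
                text += "."
            sentences.append(text)
    return " ".join(sentences)


prompt = build_prompt(
    scene="A busy kitchen",
    subject="A chef plates a dish with careful precision, steam rising from the food",
    motion="Camera slowly pushes in on the plate",
    audio='Sizzling sounds, kitchen chatter in the background, the chef says "Perfect."',
)
print(prompt)
```

Leave audio unset when generate_audio is off, so the prompt does not describe sound the model will not produce.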

Parameters

mode: standard (720p) or pro (1080p)
duration: 3 to 15 seconds
aspect_ratio: 16:9, 9:16, or 1:1 (ignored when using a start image)
generate_audio: Toggle native audio on or off
negative_prompt: Describe what to exclude from the generation
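
The parameter rules above can be gathered into one input dictionary and checked before submission. The sketch below makes several assumptions: build_input is a hypothetical helper, the prompt and start_image key names are guesses (this README does not give the exact input field names for the prompt or the starting image), and the default values are arbitrary. It builds the dictionary only and does not call any API.

```python
VALID_MODES = {"standard", "pro"}              # 720p and 1080p respectively
VALID_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
MIN_SECONDS, MAX_SECONDS = 3, 15


def build_input(prompt, mode="pro", duration=5, aspect_ratio="16:9",
                generate_audio=True, negative_prompt=None, start_image=None):
    """Validate the documented parameters and return them as a dict.

    Hypothetical helper: "prompt" and "start_image" are assumed key names.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        raise ValueError(f"duration must be {MIN_SECONDS} to {MAX_SECONDS} seconds")

    payload = {
        "prompt": prompt,
        "mode": mode,
        "duration": duration,
        "generate_audio": generate_audio,
    }
    if start_image is not None:
        # aspect_ratio is ignored when a start image is used, so leave it out.
        payload["start_image"] = start_image
    else:
        if aspect_ratio not in VALID_ASPECT_RATIOS:
            raise ValueError(f"aspect_ratio must be one of {sorted(VALID_ASPECT_RATIOS)}")
        payload["aspect_ratio"] = aspect_ratio
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    return payload


print(build_input("A chef plates a dish", mode="standard", duration=8,
                  aspect_ratio="9:16", negative_prompt="blurry, low quality"))
```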

What it’s good for

Marketing and advertising videos
Social media content with dialogue
Multi-scene narratives and short stories
Product demonstrations
Cinematic sequences with synchronized audio

Limitations

Maximum 15 seconds per generation
Audio works best in English and Chinese
Character appearance can vary across separate generations
Complex physics interactions may not look fully natural
For longer videos, generate multiple clips and stitch them together
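
When planning a longer video as several clips, it helps to split the target length so that every clip stays within the 3 to 15 second range. The following is a rough sketch: plan_clips is a hypothetical helper, and it assumes whole-second durations and roughly equal clip lengths, neither of which the model requires.

```python
import math

MIN_SECONDS = 3    # README: shortest allowed generation
MAX_SECONDS = 15   # README: longest allowed generation


def plan_clips(total_seconds):
    """Split a whole number of seconds into near-equal clips of 3 to 15 seconds."""
    if total_seconds < MIN_SECONDS:
        raise ValueError(f"total must be at least {MIN_SECONDS} seconds")
    # Use as few clips as possible, then spread the seconds evenly across them.
    count = math.ceil(total_seconds / MAX_SECONDS)
    base, extra = divmod(total_seconds, count)
    return [base + 1 if i < extra else base for i in range(count)]


print(plan_clips(40))  # prints [14, 13, 13]
print(plan_clips(15))  # prints [15]
```

Because character appearance can vary across separate generations, fewer and longer clips generally mean fewer visible seams after stitching.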

Privacy policy

https://app.klingai.com/global/dev/document-api/protocols/privacyPolicy

API terms

https://app.klingai.com/global/dev/document-api/protocols/paidServiceProtocol

Service level agreement

https://app.klingai.com/global/dev/document-api/protocols/paidLevelProtocol

Model created 5 months, 1 week ago

Model updated 3 months ago
