kwaivgi/kling-v3-video

Kling Video 3.0: Generate cinematic videos up to 15 seconds with multi-shot control, native audio, and improved consistency

Kling Video 3.0

Generate cinematic videos up to 15 seconds long from text prompts or images. Kling Video 3.0 improves on previous versions with longer output, stronger consistency across shots, and native audio generation including lip-synced dialogue.

What it does

Kling Video 3.0 turns text descriptions or still images into video clips at up to 1080p resolution. It generates videos between 3 and 15 seconds long, up from the 10-second limit of earlier versions. It handles realistic scenes, stylized content, and complex multi-step actions within a single generation.

You can also generate native audio in the same pass as the video, including lip-synced dialogue, sound effects, and ambient sound.

How to use it

The model supports two main input modes:

Text to video: Describe what you want to see. The model generates visuals (and optionally audio) from your description.

Image to video: Upload a starting image and describe the motion you want. You can also provide an end image to guide where the video should land.

Multi-shot mode

For videos with multiple scenes, use the multi_prompt parameter. Pass a JSON array of shot definitions, each with a prompt and a duration in seconds. You can define up to 6 shots, each at least 1 second long. The shot durations must add up to the duration parameter. For example, the three shots below total 10 seconds, so duration must be set to 10.

[
  {"prompt": "A woman walks through a sunlit forest", "duration": 5},
  {"prompt": "She stops and looks up at the canopy", "duration": 3},
  {"prompt": "Sunlight breaks through the leaves", "duration": 2}
]
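The multi-shot rules above can be checked before a request is sent. The sketch below enforces those rules: 1 to 6 shots, at least 1 second per shot, durations summing to duration, and duration within 3 to 15 seconds. The helper name validate_multi_prompt is hypothetical. The sketch also assumes that multi_prompt accepts the array serialized as a JSON string. If the endpoint takes a native list instead, return the list unchanged.

```python
import json

# Limits taken from the multi-shot and duration descriptions in this README.
MAX_SHOTS = 6
MIN_SHOT_SECONDS = 1
MIN_DURATION, MAX_DURATION = 3, 15


def validate_multi_prompt(shots: list[dict], duration: int) -> str:
    """Check shot definitions against the multi-shot rules.

    Returns the shots as a JSON string (assumed format for multi_prompt).
    Raises ValueError when any rule is violated.
    """
    if not MIN_DURATION <= duration <= MAX_DURATION:
        raise ValueError(f"duration must be {MIN_DURATION} to {MAX_DURATION} seconds")
    if not 1 <= len(shots) <= MAX_SHOTS:
        raise ValueError(f"expected 1 to {MAX_SHOTS} shots, got {len(shots)}")
    for i, shot in enumerate(shots):
        if not shot.get("prompt"):
            raise ValueError(f"shot {i} has no prompt")
        if shot.get("duration", 0) < MIN_SHOT_SECONDS:
            raise ValueError(f"shot {i} is shorter than {MIN_SHOT_SECONDS} second")
    total = sum(shot["duration"] for shot in shots)
    if total != duration:
        raise ValueError(f"shot durations sum to {total}, but duration is {duration}")
    return json.dumps(shots)


# The forest example from this section: 5 + 3 + 2 = 10 seconds.
shots = [
    {"prompt": "A woman walks through a sunlit forest", "duration": 5},
    {"prompt": "She stops and looks up at the canopy", "duration": 3},
    {"prompt": "Sunlight breaks through the leaves", "duration": 2},
]
multi_prompt = validate_multi_prompt(shots, duration=10)
```

Calling validate_multi_prompt(shots, duration=12) with the same shots raises a ValueError, because the shots only cover 10 seconds.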

Writing effective prompts

Structure your prompts to cover:

  • Scene setting: Where and when the action happens, lighting conditions
  • Subject details: What characters or objects appear, how they look
  • Motion: What happens, how things move, camera behavior
  • Audio (if enabled): Dialogue in quotation marks, ambient sounds, sound effects

Example: A chef in a busy kitchen plates a dish with careful precision, steam rising from the food. Camera slowly pushes in on the plate. Sizzling sounds, kitchen chatter in the background, the chef says "Perfect."
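One way to follow this structure consistently is to assemble the prompt from its parts. The sketch below is a hypothetical helper, not part of the model's API. It joins scene, subject, motion, and optional sound descriptions into sentences, then appends any dialogue in quotation marks as the guidance recommends. The speaker argument and its default are illustrative assumptions.

```python
from typing import Optional


def build_prompt(
    scene: str,
    subject: str,
    motion: str,
    sounds: Optional[str] = None,
    dialogue: Optional[str] = None,
    speaker: str = "The character",  # hypothetical default speaker label
) -> str:
    """Join prompt components into one prompt string.

    Each non-empty component becomes one sentence ending in a single period.
    Dialogue, if given, is wrapped in double quotation marks.
    """
    parts = [scene, subject, motion, sounds]
    sentences = [p.strip().rstrip(".") + "." for p in parts if p]
    if dialogue:
        sentences.append(f'{speaker} says "{dialogue}"')
    return " ".join(sentences)


prompt = build_prompt(
    scene="A busy restaurant kitchen under bright lights",
    subject="A chef plates a dish with careful precision, steam rising from the food",
    motion="Camera slowly pushes in on the plate",
    sounds="Sizzling sounds, kitchen chatter in the background",
    dialogue="Perfect.",
    speaker="The chef",
)
```

Leave out sounds and dialogue when generate_audio is off, since the audio part of the prompt only matters when audio is enabled.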

Parameters

  • mode: standard (720p) or pro (1080p)
  • duration: 3 to 15 seconds
  • aspect_ratio: 16:9, 9:16, or 1:1 (ignored when using a start image)
  • generate_audio: Toggle native audio on or off
  • negative_prompt: Describe what to exclude from the generation
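These parameters can be validated locally before a generation is requested. The sketch below builds an input dictionary from the values listed above. Several details are assumptions not given in this README: the field name prompt for the text description, the hypothetical field name start_image for image-to-video input, and the chosen defaults (standard, 16:9, audio off). Following the note on aspect_ratio, the sketch omits that field when a start image is supplied.

```python
from typing import Any, Optional

# Allowed values as listed under Parameters.
MODES = {"standard": "720p", "pro": "1080p"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
MIN_DURATION, MAX_DURATION = 3, 15


def build_input(
    prompt: str,
    duration: int,
    mode: str = "standard",         # default chosen for this sketch
    aspect_ratio: str = "16:9",     # default chosen for this sketch
    generate_audio: bool = False,   # default chosen for this sketch
    negative_prompt: Optional[str] = None,
    start_image: Optional[str] = None,  # hypothetical field name
) -> dict[str, Any]:
    """Validate settings and return an input dictionary for one generation."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}, got {mode!r}")
    if not MIN_DURATION <= duration <= MAX_DURATION:
        raise ValueError(f"duration must be {MIN_DURATION} to {MAX_DURATION} seconds")
    inputs: dict[str, Any] = {
        "prompt": prompt,  # assumed field name for the text description
        "mode": mode,
        "duration": duration,
        "generate_audio": generate_audio,
    }
    if negative_prompt:
        inputs["negative_prompt"] = negative_prompt
    if start_image:
        # aspect_ratio is ignored when a start image is used, so it is not sent.
        inputs["start_image"] = start_image
    else:
        if aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
        inputs["aspect_ratio"] = aspect_ratio
    return inputs


inputs = build_input(
    prompt="A chef in a busy kitchen plates a dish with careful precision",
    duration=10,
    mode="pro",
    generate_audio=True,
    negative_prompt="blurry, distorted hands",
)
```

If you use Replicate's Python client, a dictionary like this would typically be passed as the input argument of replicate.run("kwaivgi/kling-v3-video", input=inputs). Check the model's input schema for the exact field names before relying on this sketch.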

What it’s good for

  • Marketing and advertising videos
  • Social media content with dialogue
  • Multi-scene narratives and short stories
  • Product demonstrations
  • Cinematic sequences with synchronized audio

Limitations

  • Maximum 15 seconds per generation
  • Audio works best in English and Chinese
  • Character appearance can vary across separate generations
  • Complex physics interactions may not look fully natural
  • For longer videos, generate multiple clips and stitch them together
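This README does not say how to split a longer video into clips. One simple approach is sketched below with a hypothetical helper, plan_clips. It uses the fewest clips that respect the 15-second maximum and spreads the length evenly across them, so no clip falls below the 3-second minimum. The sketch assumes whole-second durations. Stitching the resulting clips together is left to a separate video tool.

```python
import math

MAX_CLIP_SECONDS = 15
MIN_CLIP_SECONDS = 3


def plan_clips(total_seconds: int) -> list[int]:
    """Split a target length into the fewest clips of at most 15 seconds.

    Lengths differ by at most one second. For totals of 3 seconds or more,
    every clip stays within the 3 to 15 second range.
    """
    if total_seconds < MIN_CLIP_SECONDS:
        raise ValueError(f"total must be at least {MIN_CLIP_SECONDS} seconds")
    count = math.ceil(total_seconds / MAX_CLIP_SECONDS)
    base, extra = divmod(total_seconds, count)
    # The first `extra` clips get one more second so the lengths sum to the total.
    return [base + 1 if i < extra else base for i in range(count)]


clips = plan_clips(40)  # three clips: 14, 13 and 13 seconds
```

Because character appearance can vary between generations, reusing the last frame of one clip as the start image of the next may help continuity. Results are not guaranteed.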

Privacy policy

https://app.klingai.com/global/dev/document-api/protocols/privacyPolicy

API terms

https://app.klingai.com/global/dev/document-api/protocols/paidServiceProtocol

Service level agreement

https://app.klingai.com/global/dev/document-api/protocols/paidLevelProtocol