kwaivgi/kling-o1

Modify an existing video through natural-language commands, changing subjects, environments, and visual style while preserving the original motion and timing.

Transform Video with Context-Aware Editing

Kling O1 Edit approaches video editing differently from traditional frame-by-frame tools. Rather than relying on masking or manual adjustments, it analyzes the motion and spatial structure of the entire clip and applies changes that remain consistent with camera movement, subject actions, and scene layout.

Edits are guided by natural-language instructions, allowing you to modify characters, environments, or visual style while preserving the original motion and timing.


Core Capabilities

Multi-Reference Composition

Combine up to four elements and reference images in a single edit; the limit of four applies to both kinds together. This enables detailed character replacements and stylistic transformations guided by multiple visual sources.

Motion Integrity

Camera paths, body movement, and timing remain unchanged. The model transforms visual content without altering the underlying motion of the footage.

Prompt-Led Editing

Edits are directed through simple written instructions instead of technical parameters.

Example:

“Replace the character with @Element1 while keeping the same movement and camera framing. Change the environment to match @Image1.”

Optional Audio Retention

Set the keep_audio parameter to true to preserve the original audio track. With the default (false), the output video is silent.

Structured Visual Inputs

Each element takes a frontal image (frontal_image_url) plus multiple angle references (reference_image_urls), providing richer visual context for more accurate transformations.
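As a rough illustration of how these inputs fit together, the sketch below assembles a request payload as a plain dictionary. Only keep_audio, frontal_image_url, and reference_image_urls are named in this README. The function build_edit_input and the keys video_url, prompt, elements, and image_urls are hypothetical names chosen for the example and may not match the real input schema.

```python
from typing import Any, Dict, List, Optional

# From the README: elements and style images share a combined limit of four.
MAX_REFERENCES = 4


def build_edit_input(
    video_url: str,
    prompt: str,
    elements: Optional[List[Dict[str, Any]]] = None,
    image_urls: Optional[List[str]] = None,
    keep_audio: bool = False,
) -> Dict[str, Any]:
    """Assemble a payload for an edit request.

    Assumed (hypothetical) keys: video_url, prompt, elements, image_urls.
    Keys taken from the README: keep_audio, frontal_image_url,
    reference_image_urls.
    """
    elements = elements or []
    image_urls = image_urls or []

    # Enforce the combined reference limit before sending anything.
    if len(elements) + len(image_urls) > MAX_REFERENCES:
        raise ValueError(
            f"At most {MAX_REFERENCES} elements and images combined are allowed"
        )

    normalized = []
    for index, element in enumerate(elements, start=1):
        frontal = element.get("frontal_image_url")
        if not frontal:
            raise ValueError(f"Element{index} is missing frontal_image_url")
        normalized.append(
            {
                "frontal_image_url": frontal,
                # Angle references are optional; default to an empty list.
                "reference_image_urls": list(
                    element.get("reference_image_urls", [])
                ),
            }
        )

    return {
        "video_url": video_url,
        "prompt": prompt,
        "elements": normalized,
        "image_urls": list(image_urls),
        "keep_audio": keep_audio,
    }
```

Validating the combined limit locally is a design choice for the sketch: it surfaces an over-sized request before any upload or billing would take place.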


Pricing and Performance

Kling O1 Edit uses duration-based pricing that scales with the length of your source clip, reflecting the computational cost of motion-consistent transformations.

| Metric | Value | Notes |
| --- | --- | --- |
| Estimated cost | $0.50–$1.68 | Typical range for 3–10 second clips at $0.168/sec |
| Supported duration | 3–10 seconds | Output length matches input |
| Supported formats | .mp4, .mov, .webm, .m4v, .gif | Maximum file size: 200MB |
| Resolution support | 720–2160px | HD through 4K |
| Reference inputs | Up to 4 total | Elements and style images combined |
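
The cost range follows directly from the per-second rate. The sketch below reproduces that arithmetic, assuming billing is strictly linear in clip duration with no rounding of fractional seconds; the README does not confirm this. The function name estimate_cost is hypothetical.

```python
from decimal import Decimal

# Values taken from the pricing table above.
PRICE_PER_SECOND = Decimal("0.168")  # USD per second of source video
MIN_SECONDS = 3
MAX_SECONDS = 10


def estimate_cost(duration_seconds: float) -> Decimal:
    """Estimate the cost in USD for a clip of the given duration.

    Assumption: cost is duration multiplied by the per-second rate,
    with no minimum charge and no rounding up to whole seconds.
    """
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        raise ValueError(
            f"Duration must be between {MIN_SECONDS} and {MAX_SECONDS} seconds"
        )
    # Convert through str so a float such as 7.5 becomes an exact Decimal.
    cost = Decimal(str(duration_seconds)) * PRICE_PER_SECOND
    return cost.quantize(Decimal("0.001"))
```

Under these assumptions a 3-second clip comes to $0.504 (the table's rounded $0.50) and a 10-second clip to $1.68, matching the stated range.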

Technical Overview

| Specification | Details |
| --- | --- |
| Model | Kling O1 Edit |
| Inputs | Video and reference images (.jpg, .png, .webp, .gif, .avif) |
| Output | .mp4 video |
| Audio handling | Optional retention via keep_audio (default: false) |
| Prompt notation | @Element1, @Element2 for tracked elements; @Image1, @Image2 for style references |
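
Because prompts refer to inputs by number, it can help to check a prompt against the inputs you actually supply before submitting. The sketch below is one way to do that. It assumes numbering starts at 1 and follows the order in which elements and images are provided, which the README implies but does not state. The helpers find_references and missing_references are hypothetical, not part of any Kling API.

```python
import re
from typing import Dict, List, Set

# Matches the notation shown in the README, e.g. @Element1 or @Image2.
REFERENCE_PATTERN = re.compile(r"@(Element|Image)(\d+)")


def find_references(prompt: str) -> Dict[str, Set[int]]:
    """Collect the element and image numbers mentioned in a prompt."""
    found: Dict[str, Set[int]] = {"Element": set(), "Image": set()}
    for kind, number in REFERENCE_PATTERN.findall(prompt):
        found[kind].add(int(number))
    return found


def missing_references(
    prompt: str, element_count: int, image_count: int
) -> List[str]:
    """Return the tags in the prompt that have no matching input.

    Assumption: tags are 1-based and follow the order of the supplied inputs.
    """
    found = find_references(prompt)
    limits = {"Element": element_count, "Image": image_count}
    missing = []
    for kind in ("Element", "Image"):
        for number in sorted(found[kind]):
            if not 1 <= number <= limits[kind]:
                missing.append(f"@{kind}{number}")
    return missing
```

For the example prompt in the Prompt-Led Editing section, calling missing_references with one element and no images would report @Image1 as unmatched.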

Comparison with Other Workflows

Sora 2 (Remix-Oriented Editing)

Kling O1 Edit is designed for precise, reference-driven transformations, which makes it particularly useful when you need controlled character or environment changes. Sora-style workflows are often better suited to broader stylistic reinterpretation or longer narrative sequences.

Wan (Parameter-Based Editing)

Kling O1 Edit focuses on ease of use through natural-language prompting, minimizing technical setup. Wan tools provide more granular parameter control for advanced users who need fine-tuned transformation pipelines.

AnimateDiff (Stylized Generation)

Kling O1 Edit maintains the exact motion and camera behavior of the original clip while altering appearance and setting. AnimateDiff is typically used for stylized or animation-heavy workflows that may generate or modify motion rather than strictly preserving it.
