Seedance 2.0
Generate high-quality video from text, images, video clips, and audio — all in one pass with synchronized sound. Seedance 2.0 is ByteDance’s next-generation video model, built on a unified multimodal architecture that accepts mixed inputs and produces coherent, audio-synced output.
What’s new in 2.0
Seedance 2.0 is a significant upgrade over Seedance 1.5 Pro:
- Multimodal reference inputs — combine up to 9 images, 3 video clips, and 3 audio files in a single generation. Reference them in your prompt as [Image1], [Video1], [Audio1], etc.
- Better motion and physics — more realistic rendering of complex interactions like sports, dancing, and object collisions.
- Video editing and extension — modify existing videos or extend them by providing a reference video and describing what should happen next.
- Intelligent duration — set duration to -1 and let the model pick the best length for the content.
- Adaptive aspect ratio — set aspect ratio to “adaptive” and the model will choose the best fit based on your inputs.
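The new options above can be sketched as a request payload. This is illustrative only: the field names (`images`, `videos`, `audios`, `duration`, `aspect_ratio`) are assumptions, not confirmed API parameters, but the reference limits and the special `-1` / `"adaptive"` values are from the description above.

```python
# Sketch of a Seedance 2.0 request payload. Field names below are
# illustrative assumptions, not confirmed API parameters; the reference
# limits (9 images, 3 videos, 3 audio files) are from the docs.

def build_input(prompt, images=(), videos=(), audios=(),
                duration=-1, aspect_ratio="adaptive"):
    """Assemble a generation request, enforcing the 2.0 reference limits."""
    if len(images) > 9:
        raise ValueError("at most 9 reference images")
    if len(videos) > 3:
        raise ValueError("at most 3 reference video clips")
    if len(audios) > 3:
        raise ValueError("at most 3 reference audio files")
    return {
        "prompt": prompt,
        "images": list(images),
        "videos": list(videos),
        "audios": list(audios),
        "duration": duration,          # -1 lets the model pick the length
        "aspect_ratio": aspect_ratio,  # "adaptive" fits the inputs
    }

payload = build_input(
    "The character from [Image1] dances to the beat of [Audio1].",
    images=["char.png"], audios=["beat.mp3"],
)
```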
What you can create
Text to video
Describe a scene in natural language and get a video with matching audio. The model understands multi-subject interactions, camera movements, and emotional tone. For dialogue, put speech in double quotes in your prompt — the model generates matching lip movements and voice.
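A text-to-video prompt with spoken dialogue might look like the following. The quoting convention is from the docs; the `replicate.run` invocation and model slug in the comment are assumptions shown only as a hypothetical usage pattern.

```python
# A text-to-video prompt with the spoken line in double quotes, so the
# model can generate matching lip movements and voice.
prompt = (
    "A man in a rain-soaked street turns to the camera, slow dolly-in, "
    'warm streetlight glow. He stops and says: "Remember this moment."'
)

# Hypothetical invocation via the Replicate Python client; the model
# slug here is an assumption:
# import replicate
# output = replicate.run("bytedance/seedance-2.0", input={"prompt": prompt})
```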
Image to video
Animate a still image by providing it as the first frame. You can also specify a last frame image to control where the video ends up. The model preserves the look and style of your input image while adding natural motion.
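A minimal sketch of an image-to-video input, with the end frame optional. The field names (`first_frame_image`, `last_frame_image`) are illustrative assumptions, not documented parameters.

```python
# Image-to-video: animate a still by supplying it as the first frame,
# optionally pinning where the video ends up. Field names are
# illustrative assumptions, not confirmed API parameters.

def image_to_video_input(prompt, first_frame, last_frame=None):
    payload = {"prompt": prompt, "first_frame_image": first_frame}
    if last_frame is not None:
        payload["last_frame_image"] = last_frame  # optional end state
    return payload

animate = image_to_video_input(
    "The portrait slowly smiles as wind lifts her hair.",
    first_frame="portrait.png",
)
```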
Multimodal reference
Combine images, videos, and audio as references. For example, provide a reference video for motion style, reference images for character appearance, and reference audio for rhythm — then describe how to combine them. This is powerful for outfit-change videos, product showcases, and music-synced content.
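Because prompts refer to inputs by position (`[Image1]`, `[Video1]`, `[Audio1]`), a simple pre-flight check can catch tokens with no matching file. The token convention is from the docs; the validator itself is just an illustrative sketch.

```python
import re

# Verify that every [ImageN]/[VideoN]/[AudioN] token in a prompt refers
# to a reference file that was actually supplied.

def check_reference_tokens(prompt, n_images=0, n_videos=0, n_audios=0):
    limits = {"Image": n_images, "Video": n_videos, "Audio": n_audios}
    for kind, idx in re.findall(r"\[(Image|Video|Audio)(\d+)\]", prompt):
        if not 1 <= int(idx) <= limits[kind]:
            raise ValueError(f"[{kind}{idx}] has no matching reference file")

check_reference_tokens(
    "The character from [Image1] performs the dance from [Video1].",
    n_images=1, n_videos=1,
)
```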
Video editing
Provide a reference video and describe changes — replace an object, change a background, or alter the style. The model preserves the original motion and camera work while making your edits.
Video extension
Provide a reference video and describe what should happen next. The model continues the scene with consistent characters, environment, and style.
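Editing and extension both pair a reference video with a text instruction; only the prompt differs. The field names in this sketch are illustrative assumptions.

```python
# Video editing vs. extension: same shape of input (a reference video
# plus a prompt), different instruction. Field names are illustrative
# assumptions, not confirmed API parameters.

edit_input = {
    "videos": ["showcase.mp4"],  # referenced as [Video1] in the prompt
    "prompt": ("Replace the perfume bottle in [Video1] with a face cream "
               "jar, keeping the original motion and camera work."),
}

extend_input = {
    "videos": ["scene.mp4"],
    "prompt": ("Continue [Video1]: the hiker reaches the summit as the "
               "sun breaks through the clouds."),
}
```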
Key features
Native audio generation
Audio and video are generated together, not separately. This means dialogue, sound effects, and background music are all synchronized with the visuals from the start. You can turn audio off if you just want silent video.
Character consistency
When using reference images, the model maintains facial features, clothing, and style across the generated video. This makes it possible to create multi-shot narratives with consistent characters.
Precise prompt following
The model handles complex prompts with multiple subjects, specific actions, and detailed camera movements. It understands spatial relationships and sequential actions.
Tips for good results
- Be specific in your prompts — describe camera movements, lighting, mood, and specific actions.
- For dialogue, put the spoken words in double quotes: The man stopped and said: "Remember this moment."
- When using reference inputs, label them in your prompt: “The character from [Image1] performs the dance from [Video1].”
- For video editing, describe what to change and what to keep: “Replace the perfume in [Video1] with the face cream from [Image1], keeping all original motion.”
- Start with shorter durations (5 seconds) while experimenting, then increase once you’re happy with the style.
Supported resolutions
| Resolution | 16:9 | 4:3 | 1:1 | 3:4 | 9:16 | 21:9 |
|---|---|---|---|---|---|---|
| 480p | 864×496 | 752×560 | 640×640 | 560×752 | 496×864 | 992×432 |
| 720p | 1280×720 | 1112×834 | 960×960 | 834×1112 | 720×1280 | 1470×630 |
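The table above can be captured as a lookup if you need exact output dimensions programmatically; the pixel values are taken directly from the table, while the helper itself is just a convenience sketch.

```python
# Output dimensions from the supported-resolutions table, keyed by
# (resolution, aspect_ratio). Values are (width, height) in pixels.
RESOLUTIONS = {
    ("480p", "16:9"): (864, 496),
    ("480p", "4:3"):  (752, 560),
    ("480p", "1:1"):  (640, 640),
    ("480p", "3:4"):  (560, 752),
    ("480p", "9:16"): (496, 864),
    ("480p", "21:9"): (992, 432),
    ("720p", "16:9"): (1280, 720),
    ("720p", "4:3"):  (1112, 834),
    ("720p", "1:1"):  (960, 960),
    ("720p", "3:4"):  (834, 1112),
    ("720p", "9:16"): (720, 1280),
    ("720p", "21:9"): (1470, 630),
}

def dimensions(resolution, aspect_ratio):
    """Look up the output size for a resolution / aspect-ratio pair."""
    return RESOLUTIONS[(resolution, aspect_ratio)]
```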
Learn more
For technical details and architecture, see the official Seedance 2.0 page.
You can try this model on the Replicate Playground.