Wan 2.2 Animate Replace
Replace characters in videos while preserving motion, expressions, and scene lighting.
What does this do?
Wan 2.2 Animate Replace from Tongyi Lab takes a character image and a reference video, then swaps the person in the video with your character. The original movements, facial expressions, and scene stay intact. Your replacement character matches the lighting and color tone of the scene, so it looks like they were there all along.
This works differently from typical face swaps. Instead of just replacing a face, the model replaces the entire character while copying all their movements from the original video. The character you insert gets the same body language, gestures, and facial expressions as the person in the reference video.
How it works
The model uses two key techniques to make realistic replacements. First, it extracts skeleton signals from the reference video to understand body motion. These spatial coordinates tell the model exactly how the person moves frame by frame. Second, it captures facial features directly as encoded patterns rather than using simple landmark points, which helps preserve subtle expressions like slight smiles or raised eyebrows.
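To make the "skeleton signal" idea concrete, here is a minimal sketch of what such a conditioning input might look like as data. The `estimate_keypoints` callable and the joint count are illustrative assumptions, not part of the released model.

```python
import numpy as np

# Illustrative only: a "skeleton signal" represented as per-frame 2D keypoints.
# `estimate_keypoints` stands in for any pose estimator; it is not a real API.
def build_skeleton_signal(frames, estimate_keypoints, num_joints=17):
    """Return an array of shape (num_frames, num_joints, 2): the (x, y) pixel
    position of each joint in each frame, i.e. the motion described frame by frame."""
    return np.stack(
        [np.asarray(estimate_keypoints(frame))[:num_joints] for frame in frames]
    )
```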
When you run the model, it also applies a relighting module. This adjusts your character’s lighting and color to match the video’s environment. Without this step, inserted characters often look pasted on, with mismatched shadows or highlights. The relighting makes your character blend naturally into each scene.
The model is built on a 14-billion-parameter architecture using a mixture-of-experts design. In practice it contains two specialized models working together: one handles the early denoising stages, focusing on overall layout, and the other refines details in the later stages. This setup gives you high-quality results while keeping processing efficient.
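The switch between experts is driven by the denoising step rather than by the input content. The sketch below shows that routing idea in simplified form; the boundary value, the normalized-timestep convention, and the function names are assumptions for illustration, not the actual Wan 2.2 code.

```python
# Simplified sketch of two-expert routing by denoising step (assumed names
# and threshold; not taken from the Wan 2.2 source).
def pick_expert(timestep, high_noise_expert, low_noise_expert, boundary=0.5):
    # timestep normalized to [0, 1], where 1.0 means pure noise
    return high_noise_expert if timestep >= boundary else low_noise_expert

def denoise(latents, timesteps, high_noise_expert, low_noise_expert):
    # timesteps run from high noise (overall layout) down to low noise (detail)
    for t in timesteps:
        expert = pick_expert(t, high_noise_expert, low_noise_expert)
        latents = expert(latents, t)  # only one expert runs per step
    return latents
```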
What you need
You’ll need two inputs: a character image and a reference video.
For the character image, use a clear photo where the person is visible and well-lit. The image can be a portrait, half-body shot, or full-body shot. Make sure only one person appears in the image. The character’s body proportions should roughly match the person in your reference video for the best results.
For the reference video, choose footage that shows the motion and expressions you want to copy. The video should contain one person whose movements you want to replicate. Keep in mind that complex camera movements or very fast motion might be harder for the model to handle than simpler scenes.
What you get
The model outputs a video where your character performs all the actions from the reference video. The background, camera movement, and scene lighting from the original video stay exactly the same. Only the person gets replaced.
The output keeps the original video’s frame rate, and you can choose the resolution: 720p gives you sharp results with longer processing time, while 480p processes faster with slightly lower detail.
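If you call the model through Replicate’s Python client rather than the Playground, a request might look like the sketch below. The model slug, input field names, and resolution values are placeholders; check the model page for the exact parameters. It also assumes `REPLICATE_API_TOKEN` is set in your environment.

```python
import replicate

# Hypothetical call: the slug, field names, and resolution options below are
# placeholders; check the model page on Replicate for the exact ones.
output = replicate.run(
    "wan-video/wan-2.2-animate-replace",        # assumed slug
    input={
        "image": open("character.png", "rb"),   # character image with one person
        "video": open("reference.mp4", "rb"),   # reference video to copy motion from
        "resolution": "720p",                   # or "480p" for faster processing
    },
)
print(output)
```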
Use cases
This model works well for creating marketing videos where you need the same presentation performed by different people or characters. Instead of filming multiple takes with different actors, you can film once and swap in different characters.
Content creators use this for storytelling where they want to place illustrated characters or mascots into real footage. The motion copying means your character moves naturally instead of looking like a static overlay.
Educators can take historical footage and replace people with illustrated historical figures, making the content more engaging while preserving authentic movements and context.
Things to know
The model works best with videos that have clear, well-lit subjects. Very dark scenes or footage with heavy motion blur might produce less accurate results.
Videos should contain one main subject. If multiple people appear in your reference video, the model might have trouble deciding which movements to copy or how to handle the replacement.
The character in your image should have similar proportions to the person in the video. If you try to replace a tall person with a short character, the result might look distorted because the model is copying exact movements that don’t match the proportions.
Very long videos process in segments. The model handles this by using frames from previous segments to maintain consistency, but you might notice slight variations between segments in extended videos.
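Conceptually, the segment handling works like the sketch below: generate one chunk at a time and carry the last few frames forward as context for the next chunk. The segment length, overlap, and `generate_segment` callable are illustrative assumptions, not the model’s actual values.

```python
# Conceptual sketch of segment-by-segment generation for long videos
# (assumed segment length and overlap; `generate_segment` is hypothetical).
def generate_long_video(reference_frames, generate_segment, segment_len=77, overlap=5):
    output, context = [], []
    for start in range(0, len(reference_frames), segment_len):
        chunk = reference_frames[start:start + segment_len]
        frames = generate_segment(chunk, context_frames=context)
        output.extend(frames)
        context = frames[-overlap:]  # last frames seed the next segment
    return output
```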
Technical details
Tongyi Lab built this as part of the larger Wan 2.2 model family. The animate replace model uses spatially-aligned skeleton signals for body motion control and implicit facial features for expression replication.
The architecture processes video through a diffusion transformer with a VAE compression stage. The relighting capability comes from a lightweight LoRA module applied to the attention layers, which adjusts lighting without requiring full model retraining.
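For readers unfamiliar with LoRA, the sketch below shows the general idea of a low-rank adapter wrapped around a frozen projection layer. The rank, scaling, and class structure are generic illustrations of the technique, not the actual relighting module.

```python
import torch.nn as nn

# Generic LoRA sketch on a single projection layer (assumed rank and scale;
# this shows the general technique, not Wan 2.2's relighting module).
class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 16, scale: float = 1.0):
        super().__init__()
        self.base = base                                   # frozen pretrained projection
        self.base.weight.requires_grad_(False)
        self.down = nn.Linear(base.in_features, rank, bias=False)   # trainable
        self.up = nn.Linear(rank, base.out_features, bias=False)    # trainable
        nn.init.zeros_(self.up.weight)                     # adapter starts as a no-op
        self.scale = scale

    def forward(self, x):
        # add the low-rank update to the frozen projection's output
        return self.base(x) + self.scale * self.up(self.down(x))
```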
For more technical information, you can check the Wan 2.2 documentation on GitHub.
Try it yourself
You can try Tongyi Lab’s model on the Replicate Playground at replicate.com/playground.