zsxkib / animate-diff-scene-assembler

Dkamacho’s Scene Assembler

  • Public
  • 307 runs
  • L40S

Input

string

Prompt describing how the animation should change over time. Provide 'frame number : prompt at this frame' pairs, separated by '|'. Make sure the frame numbers do not exceed the length of the video (in frames)

Default: "0: a big (chrome:1.1) cyborg robot dance, (white science lab:1.1) | 100 :a (chrome:1.1) cyborg robot dance, (red led blinksp:1.1), (white science lab:1.1)"
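As an aside, here is a minimal Python sketch of how a frame-indexed prompt schedule string like the default above can be assembled and sanity-checked. The helper name and example values are illustrative only and are not part of this model's API:

```python
# Minimal sketch: build a 'frame : prompt | frame : prompt' schedule string
# and check that no frame index exceeds the video length. Plain Python,
# no extra dependencies; the function name is hypothetical.

def build_prompt_schedule(prompts_by_frame: dict[int, str], total_frames: int) -> str:
    """Join {frame: prompt} pairs into the 'frame : prompt | frame : prompt' format."""
    for frame in prompts_by_frame:
        if not 0 <= frame < total_frames:
            raise ValueError(f"frame {frame} exceeds the video length ({total_frames} frames)")
    return " | ".join(
        f"{frame} : {prompt}" for frame, prompt in sorted(prompts_by_frame.items())
    )

schedule = build_prompt_schedule(
    {
        0: "a big (chrome:1.1) cyborg robot dance, (white science lab:1.1)",
        100: "a (chrome:1.1) cyborg robot dance, (white science lab:1.1)",
    },
    total_frames=120,
)
print(schedule)
# 0 : a big (chrome:1.1) cyborg robot dance, ... | 100 : a (chrome:1.1) cyborg robot dance, ...
```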

string

Negative prompt, held constant across all frames

Default: "embedding:easynegative, (worst quality, low quality: 1.3), zombie, horror, distorted, photo, nfsw"

integer
(minimum: 1, maximum: 50)

Number of denoising steps

Default: 15

number
(minimum: 1, maximum: 50)

Scale for classifier-free guidance

Default: 7.5

integer
(minimum: 1, maximum: 50)

Second pass (video interpolation): number of denoising steps

Default: 15

number
(minimum: 1, maximum: 50)

Second pass (video interpolation): scale for classifier-free guidance

Default: 8.5

file

Video of a subject doing something fun or interesting. This video is used to track any people and their poses for the final video

file
subject_image

An image of what the subject should look like in the final video

number
(minimum: 0, maximum: 1)

Strength of the controlnet for the subject/character image

Default: 0.2

file
background_image

An image of what the background should look like in the final video

number
(minimum: 0, maximum: 1)

Strength of the controlnet for the background image

Default: 0.4

boolean

Return any temporary files, such as preprocessed controlnet images. Useful for debugging.

Default: false

boolean

Automatically randomise seeds (seed, noise_seed, rand_seed)

Default: true


Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

What This Model Does 🎥

This model lets you create amazing animated videos by combining different elements like a video clip, a character image, and a background image. It uses cutting-edge techniques like motion tracking and image generation to seamlessly put these pieces together into one incredible video.

How to Use It 🛠️

To get started, you’ll need to provide the following:

  1. Prompt: A detailed text description of what you want the video to show, including any specific changes at different points in time.
  2. Negative Prompt: Words or phrases describing things you don’t want to appear in the video.
  3. Video Clip: A video showing someone or something performing an action you want to include in the final video.
  4. Character Image: A picture of the character or subject you want to feature in the video.
  5. Background Image: A picture of the background or environment for your video.

You can also tweak various settings to control how the video is generated, such as the level of detail and the randomness factor.
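Below is a hedged sketch of how you might call this model from Python with the official Replicate client (`pip install replicate`). Only `subject_image` and `background_image` are input names confirmed on this page; the other keys (`prompt`, `video`, and so on) and the file names are assumptions, so check the model's API schema for the exact fields:

```python
# Sketch of a run via the Replicate Python client. Input keys other than
# subject_image and background_image are assumed, not confirmed by this page.

import replicate

output = replicate.run(
    "zsxkib/animate-diff-scene-assembler",  # optionally pin a specific version: "owner/model:versionhash"
    input={
        "prompt": "0: a chrome cyborg robot dances | 100: the robot raises its arms",  # assumed key
        "video": open("dance_clip.mp4", "rb"),              # assumed key: clip used for pose tracking
        "subject_image": open("robot.png", "rb"),           # confirmed input name
        "background_image": open("science_lab.png", "rb"),  # confirmed input name
    },
)
print(output)  # URL(s) of the generated video
```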

What to Keep in Mind ⚠️

While this model is incredibly powerful, there are a few important things to consider:

  1. Privacy 🔒: Always ensure you have permission to use any real-world video clips or images you include.
  2. Bias 👥: Be aware that the model might produce biased or unfair results based on the data it was trained on.
  3. Deepfakes 🌐: Use caution when creating videos that could be mistaken for real footage, as this could potentially spread misinformation.
  4. Copyright ©️: Avoid using copyrighted images or videos without proper permission.

Tips and Tricks 💡

  1. Experiment! 🔬: Don’t be afraid to try different prompts and settings to achieve the best results.
  2. Resources 💻: Keep in mind that creating high-quality videos can be demanding on your computer, so ensure you have sufficient power.
  3. Check Carefully 👀: Always review your videos thoroughly before sharing them to catch any potential issues.
  4. Stay Updated 🆙: Keep an eye out for new model versions or improvements to stay on the cutting edge.

Model Weights 🏋️‍♀️

This workflow utilizes various pre-trained weights and checkpoints, which are automatically downloaded when you run the model. Here’s a list of the key weights and some important notes:

  • control_v11f1e_sd15_tile.pth: The SD1.5 tile ControlNet, used to keep each generated frame close to its source frame.
  • SD1.5 clipvision: The standard SD1.5 clipvision model (not the one named “comfy”).
  • ControlGif: This is actually the “controlnet” model from crishhh, renamed; find it at https://huggingface.co/crishhh/animatediff_controlnet/tree/main. It works similarly to the tile ControlNet, helping control how closely the background stays true to the input image.
  • CV_LEFT.safetensors: Handles computer vision tasks for the left side of the video.
  • v3_sd15_mm.ckpt: The AnimateDiff v3 motion module for Stable Diffusion 1.5, which adds motion across the generated frames.
  • v3_sd15_adapter_lora.ckpt: The AnimateDiff v3 adapter LoRA (Low-Rank Adaptation) for the SD1.5 base model.
  • dreamshaper_8.safetensors: The DreamShaper 8 checkpoint (SD1.5) that serves as the base image model and defines the overall style.
  • 150_16_swin_l_oneformer_coco_100ep.pth: Segments the scene using the OneFormer (Swin-L, COCO) model.
  • control_v11p_sd15_openpose.pth: Controls the animation based on OpenPose keypoints.
  • rife47.pth: Interpolates frames using the RIFE (Real-Time Intermediate Flow Estimation) model.
  • ip-adapter-plus_sd15.bin: The IP-Adapter Plus model for SD1.5, which conditions generation on the reference subject and background images.
  • vae-ft-mse-840000-ema-pruned.safetensors: Encodes and decodes images using a pruned Variational Autoencoder.
  • yolox_l.onnx: Detects objects using the YOLOX (You Only Look Once X) model.
  • sk_model.pth and sk_model2.pth: Perform sketch-related tasks.
  • control_v11p_sd15_lineart.pth: Controls the animation based on lineart.
  • controlGIF_checkpoint.ckpt: The checkpoint for the ControlGif ControlNet noted above.
  • dw-ll_ucoco_384_bs5.torchscript.pt: The DWPose whole-body pose estimation model (TorchScript), used alongside yolox_l.onnx for pose tracking.

⚠️ Important notes:

  • Ensure you use consistent SD1.5 models throughout, including checkpoints, LoRAs, the IP Adapter, and AnimateDiff.
  • When opening the workflow JSON, model names might not match what’s on your system. Make sure the correct models are hooked up, even if the names look correct in the workflow.
  • For better consistency, especially for hands and faces, use a higher resolution (e.g., 960x540) instead of downsampling too much.

The model weights are automatically downloaded and managed by the system, so you don’t need to manually acquire or update them. Focus on crafting your creative prompts and enjoy the amazing results! 🎥✨

By being mindful and responsible, you can use this model to unleash your creativity and make incredible animations! 🎉