zsxkib / animate-diff-scene-assembler

Dkamacho’s Scene Assembler

  • Public
  • 307 runs
  • L40S

Input

string

Prompt describing how the animation should change over time. Provide 'frame number : prompt at this frame' pairs, separated by '|'. Make sure the frame numbers do not exceed the length of the video (in frames)

Default: "0: a big (chrome:1.1) cyborg robot dance, (white science lab:1.1) | 100 :a (chrome:1.1) cyborg robot dance, (red led blinksp:1.1), (white science lab:1.1)"
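As an aside, here is a minimal Python sketch of how a frame-indexed prompt schedule string like the default above can be assembled and sanity-checked. The helper name and example values are illustrative only and are not part of this model's API:

```python
# Minimal sketch: build a 'frame : prompt | frame : prompt' schedule string
# and check that no frame index exceeds the video length. Plain Python,
# no extra dependencies; the function name is hypothetical.

def build_prompt_schedule(prompts_by_frame: dict[int, str], total_frames: int) -> str:
    """Join {frame: prompt} pairs into the 'frame : prompt | frame : prompt' format."""
    for frame in prompts_by_frame:
        if not 0 <= frame < total_frames:
            raise ValueError(f"frame {frame} exceeds the video length ({total_frames} frames)")
    return " | ".join(
        f"{frame} : {prompt}" for frame, prompt in sorted(prompts_by_frame.items())
    )

schedule = build_prompt_schedule(
    {
        0: "a big (chrome:1.1) cyborg robot dance, (white science lab:1.1)",
        100: "a (chrome:1.1) cyborg robot dance, (white science lab:1.1)",
    },
    total_frames=120,
)
print(schedule)
# 0 : a big (chrome:1.1) cyborg robot dance, ... | 100 : a (chrome:1.1) cyborg robot dance, ...
```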

string

Negative prompt, held constant across all frames

Default: "embedding:easynegative, (worst quality, low quality: 1.3), zombie, horror, distorted, photo, nfsw"

integer
(minimum: 1, maximum: 50)

Number of denoising steps

Default: 15

number
(minimum: 1, maximum: 50)

Scale for classifier-free guidance

Default: 7.5

integer
(minimum: 1, maximum: 50)

Second pass (video interpolation): number of denoising steps

Default: 15

number
(minimum: 1, maximum: 50)

Second pass (video interpolation): scale for classifier-free guidance

Default: 8.5

file

Video of a subject doing something fun or interesting. This video is used to track any people and their poses for the final video

file
subject_image

An image of what the subject should look like in the final video

number
(minimum: 0, maximum: 1)

Strength of the controlnet for the subject/character image

Default: 0.2

file
background_image

An image of what the background should look like in the final video

number
(minimum: 0, maximum: 1)

Strength of the controlnet for the background image

Default: 0.4

boolean

Return any temporary files, such as preprocessed controlnet images. Useful for debugging.

Default: false

boolean

Automatically randomise seeds (seed, noise_seed, rand_seed)

Default: true


Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

What This Model Does 🎥

This model lets you create amazing animated videos by combining different elements like a video clip, a character image, and a background image. It uses cutting-edge techniques like motion tracking and image generation to seamlessly put these pieces together into one incredible video.

How to Use It 🛠️

To get started, you’ll need to provide the following:

  1. Prompt: A detailed text description of what you want the video to show, including any specific changes at different points in time.
  2. Negative Prompt: Words or phrases describing things you don’t want to appear in the video.
  3. Video Clip: A video showing someone or something performing an action you want to include in the final video.
  4. Character Image: A picture of the character or subject you want to feature in the video.
  5. Background Image: A picture of the background or environment for your video.

You can also tweak various settings to control how the video is generated, such as the level of detail and the randomness factor.
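Below is a hedged sketch of how you might call this model from Python with the official Replicate client (`pip install replicate`). Only `subject_image` and `background_image` are input names confirmed on this page; the other keys (`prompt`, `video`, and so on) and the file names are assumptions, so check the model's API schema for the exact fields:

```python
# Sketch of a run via the Replicate Python client. Input keys other than
# subject_image and background_image are assumed, not confirmed by this page.

import replicate

output = replicate.run(
    "zsxkib/animate-diff-scene-assembler",  # optionally pin a specific version: "owner/model:versionhash"
    input={
        "prompt": "0: a chrome cyborg robot dances | 100: the robot raises its arms",  # assumed key
        "video": open("dance_clip.mp4", "rb"),              # assumed key: clip used for pose tracking
        "subject_image": open("robot.png", "rb"),           # confirmed input name
        "background_image": open("science_lab.png", "rb"),  # confirmed input name
    },
)
print(output)  # URL(s) of the generated video
```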

What to Keep in Mind ⚠️

While this model is incredibly powerful, there are a few important things to consider:

  1. Privacy 🔒: Always ensure you have permission to use any real-world video clips or images you include.
  2. Bias 👥: Be aware that the model might produce biased or unfair results based on the data it was trained on.
  3. Deepfakes 🌐: Use caution when creating videos that could be mistaken for real footage, as this could potentially spread misinformation.
  4. Copyright ©️: Avoid using copyrighted images or videos without proper permission.

Tips and Tricks 💡

  1. Experiment! 🔬: Don’t be afraid to try different prompts and settings to achieve the best results.
  2. Resources 💻: Keep in mind that creating high-quality videos can be demanding on your computer, so ensure you have sufficient power.
  3. Check Carefully 👀: Always review your videos thoroughly before sharing them to catch any potential issues.
  4. Stay Updated 🆙: Keep an eye out for new model versions or improvements to stay on the cutting edge.

Model Weights 🏋️‍♀️

This workflow utilizes various pre-trained weights and checkpoints, which are automatically downloaded when you run the model. Here’s a list of the key weights and some important notes:

  • control_v11f1e_sd15_tile.pth: The SD1.5 tile ControlNet, used to keep each generated frame close to its source frame.
  • SD1.5 clipvision: The standard SD1.5 clipvision model (not the one named “comfy”).
  • ControlGif: This is actually the “controlnet” model from crishhh, renamed; find it at https://huggingface.co/crishhh/animatediff_controlnet/tree/main. It works similarly to the tile ControlNet, helping control how closely the background stays true to the input image.
  • CV_LEFT.safetensors: Handles computer vision tasks for the left side of the video.
  • v3_sd15_mm.ckpt: The AnimateDiff v3 motion module for Stable Diffusion 1.5, which adds motion across the generated frames.
  • v3_sd15_adapter_lora.ckpt: The AnimateDiff v3 adapter LoRA (Low-Rank Adaptation) for the SD1.5 base model.
  • dreamshaper_8.safetensors: The DreamShaper 8 checkpoint (SD1.5) that serves as the base image model and defines the overall style.
  • 150_16_swin_l_oneformer_coco_100ep.pth: Segments the scene using the OneFormer (Swin-L, COCO) model.
  • control_v11p_sd15_openpose.pth: Controls the animation based on OpenPose keypoints.
  • rife47.pth: Interpolates frames using the RIFE (Real-Time Intermediate Flow Estimation) model.
  • ip-adapter-plus_sd15.bin: The IP-Adapter Plus model for SD1.5, which conditions generation on the reference subject and background images.
  • vae-ft-mse-840000-ema-pruned.safetensors: Encodes and decodes images using a pruned Variational Autoencoder.
  • yolox_l.onnx: Detects objects using the YOLOX (You Only Look Once X) model.
  • sk_model.pth and sk_model2.pth: Perform sketch-related tasks.
  • control_v11p_sd15_lineart.pth: Controls the animation based on lineart.
  • controlGIF_checkpoint.ckpt: The checkpoint for the ControlGif ControlNet noted above.
  • dw-ll_ucoco_384_bs5.torchscript.pt: The DWPose whole-body pose estimation model (TorchScript), used alongside yolox_l.onnx for pose tracking.

⚠️ Important notes:

  • Ensure you use consistent SD1.5 models throughout, including checkpoints, LoRAs, the IP Adapter, and AnimateDiff.
  • When opening the workflow JSON, model names might not match what’s on your system. Make sure the correct models are hooked up, even if the names look correct in the workflow.
  • For better consistency, especially for hands and faces, use a higher resolution (e.g., 960x540) instead of downsampling too much.

The model weights are automatically downloaded and managed by the system, so you don’t need to manually acquire or update them. Focus on crafting your creative prompts and enjoy the amazing results! 🎥✨

By being mindful and responsible, you can use this model to unleash your creativity and make incredible animations! 🎉