cloneofsimo / hotshot-xl-lora-controlnet

Text-to-GIF using SDXL, with ControlNet and LoRA support

  • Public
  • 3.7K runs
  • L40S

Input

string

The main prompt that guides the image generation.

Default: "Hi there doggo!"

string

A negative prompt to avoid certain features in the generated images.

Default: ""

integer

The width of the generated images.

Default: 672

integer

The height of the generated images.

Default: 384

integer

The number of steps for the prediction.

Default: 30

integer

The length of the video in frames.

Default: 8

integer

The duration of the video in milliseconds.

Default: 1000
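
The frame count and the duration together determine the playback rate. As a minimal sketch, assuming the duration is the total length of the GIF (the helper name `frame_timing` is illustrative and not part of the model's API), the defaults work out to 125 ms per frame, or 8 frames per second:

```python
def frame_timing(num_frames: int = 8, duration_ms: int = 1000) -> tuple[float, float]:
    """Return (milliseconds per frame, frames per second) for a GIF.

    Assumes duration_ms is the total playback time, not the per-frame delay.
    """
    if num_frames <= 0 or duration_ms <= 0:
        raise ValueError("num_frames and duration_ms must be positive")
    ms_per_frame = duration_ms / num_frames
    fps = 1000 * num_frames / duration_ms
    return ms_per_frame, fps


ms_per_frame, fps = frame_timing()
print(ms_per_frame, fps)  # 125.0 8.0
```

Doubling both values (16 frames over 2000 ms) keeps the same playback rate while producing a longer clip.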

string

The type of ControlNet to use for conditional generation.

file

Input GIF for ControlNet conditioning.

number

The start of the control guidance.

Default: 0

number

The end of the control guidance.

Default: 1

number

The scale of the ControlNet conditioning.

Default: 0.7
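
The start and end values are most naturally read as fractions of the denoising schedule. The sketch below shows one plausible interpretation, assuming the same windowing rule that diffusers-style ControlNet pipelines use (a step is conditioned only if it lies entirely inside the window). The function name is hypothetical, and this model's own implementation may differ:

```python
def controlnet_active_steps(num_steps: int, start: float = 0.0, end: float = 1.0) -> list[int]:
    """Indices of denoising steps that fall inside the guidance window.

    Assumption: step i covers the interval [i / num_steps, (i + 1) / num_steps]
    and is conditioned only when that interval lies within [start, end].
    """
    if not 0.0 <= start <= end <= 1.0:
        raise ValueError("expected 0 <= start <= end <= 1")
    return [
        i
        for i in range(num_steps)
        if i / num_steps >= start and (i + 1) / num_steps <= end
    ]


print(len(controlnet_active_steps(30)))           # 30
print(controlnet_active_steps(30, 0.0, 0.5)[-1])  # 14
```

With the defaults (start 0, end 1) every one of the 30 steps is conditioned. Lowering the end value releases the later steps from the control signal, which under this reading leaves fine detail to the prompt alone.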

integer

The seed for the random number generator.

Default: 455

string

Replicate LoRA weights to use. Leave blank to use the default weights.

string

The Hugging Face URL for the LoRA weights. For example, `fofr/barbie`

integer

The width of the `original_size` of images. If `original_size` is not the same as `target_size` the image will appear to be down- or upsampled. `original_size` defaults to `(width, height)` if not specified. Part of SDXL's micro-conditioning as explained in section 2.2 of https://arxiv.org/abs/2307.01952

Default: 1920

integer

The height of the `original_size` of the images.

Default: 1080

integer

The width of the `target_size` of the images.

Default: 512

integer

The height of the `target_size` of the images.

Default: 512
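
To make the micro-conditioning inputs concrete: in SDXL, the original size, a crop offset, and the target size are combined into a single conditioning vector. The sketch below illustrates that concatenation with this page's defaults. The function name is hypothetical, the crop offset is not one of the listed inputs and is assumed to be `(0, 0)`, and the `(height, width)` ordering follows the SDXL paper's convention and may not match this model's code:

```python
def micro_conditioning_ids(
    original_size: tuple[int, int] = (1080, 1920),  # (height, width), assumed order
    target_size: tuple[int, int] = (512, 512),
    crop_top_left: tuple[int, int] = (0, 0),  # assumption: no crop offset
) -> list[int]:
    """Concatenate size and crop conditioning values into one flat list."""
    return list(original_size + crop_top_left + target_size)


print(micro_conditioning_ids())  # [1080, 1920, 0, 0, 512, 512]
```

Because the default `original_size` (1920x1080) is larger than the default `target_size` (512x512), the model is told the source material was high resolution.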

Output


Run time and cost

This model costs approximately $0.036 to run on Replicate, or 27 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
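
The runs-per-dollar figure is the per-run cost inverted and rounded down. A minimal sketch of that arithmetic, using the approximate cost quoted above (the helper name is illustrative, and actual cost varies with inputs):

```python
import math

COST_PER_RUN_USD = 0.036  # approximate figure quoted on this page


def runs_per_dollar(cost_per_run: float = COST_PER_RUN_USD, budget: float = 1.0) -> int:
    """Whole number of runs a budget covers at a fixed per-run cost."""
    if cost_per_run <= 0:
        raise ValueError("cost_per_run must be positive")
    return math.floor(budget / cost_per_run)


print(runs_per_dollar())  # 27
```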

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 37 seconds. The predict time for this model varies significantly based on the inputs.

Readme

✨ Hotshot-XL: State-of-the-art AI text-to-GIF model trained to work alongside Stable Diffusion XL

hotshot.co

Hotshot-XL can generate GIFs with any fine-tuned SDXL model. This means two things:

  1. You’ll be able to make GIFs with any existing or newly fine-tuned SDXL model you may want to use.
  2. If you’d like to make GIFs of personalized subjects, you can load your own SDXL-based LoRAs, and not have to worry about fine-tuning Hotshot-XL. This is awesome because it’s usually much easier to find suitable images for training data than it is to find videos. It also hopefully fits into everyone’s existing LoRA usage/workflows :)

Hotshot-XL is compatible with SDXL ControlNet, so you can make GIFs in the composition/layout you’d like.

Hotshot-XL was trained to generate 1-second GIFs at 8 FPS.

Hotshot-XL was trained on various aspect ratios. For best results with the base Hotshot-XL model, we recommend using it with an SDXL model that has been fine-tuned with 512x512 images. An SDXL model we fine-tuned for 512x512 resolution is available here:

https://huggingface.co/hotshotco/SDXL-512