cloneofsimo/hotshot-xl-lora-controlnet

Text-to-gif using SDXL, with controlnet and lora support

Public
3.7K runs

Input

prompt

string

anime, animated establishing shot of a volcano erupting, bright sunshine and snow

The main prompt that guides the image generation.

Default: "Hi there doggo!"

negative_prompt

string

A negative prompt to avoid certain features in the generated images.

Default: ""

width

integer

The width of the generated images.

Default: 672

height

integer

The height of the generated images.

Default: 384

steps

integer

The number of steps for the prediction.

Default: 30

video_length

integer

The length of the video in frames.

Default: 8

video_duration

integer

The duration of the video in milliseconds.

Default: 1000
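
As a rough sketch of how `video_length` and `video_duration` relate, assuming the frames are spread evenly across the duration (the page does not state this explicitly, and the helper name `frame_timing` is purely illustrative):

```python
def frame_timing(video_length: int, video_duration_ms: int) -> tuple[float, float]:
    """Return (milliseconds per frame, frames per second).

    Assumption: frames are spaced evenly over the whole duration.
    """
    if video_length <= 0 or video_duration_ms <= 0:
        raise ValueError("video_length and video_duration_ms must be positive")
    ms_per_frame = video_duration_ms / video_length
    fps = 1000 * video_length / video_duration_ms
    return ms_per_frame, fps


# Defaults from this page: 8 frames over 1000 ms.
ms_per_frame, fps = frame_timing(8, 1000)
print(ms_per_frame, fps)
```

Under that assumption, the defaults work out to 125 ms per frame, or 8 frames per second.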

control_type

string

The type of ControlNet to use for conditional generation.

gif

file

Input GIF for controlnet condition.

control_guidance_start

number

The start of the control guidance.

Default: 0

control_guidance_end

number

The end of the control guidance.

Default: 1
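
To make the guidance window concrete, here is a minimal sketch of which denoising steps would receive ControlNet guidance for a given `control_guidance_start` and `control_guidance_end`. It assumes the two values are fractions of the total step count and follows the rule used by the diffusers ControlNet pipelines; whether this model applies exactly the same rule is an assumption, and `controlnet_active_steps` is a hypothetical helper.

```python
def controlnet_active_steps(steps: int, start: float, end: float) -> list[int]:
    """Indices of denoising steps where ControlNet guidance is applied.

    Assumed rule (as in diffusers' ControlNet pipelines): step i is skipped
    when i / steps < start or (i + 1) / steps > end.
    """
    return [
        i
        for i in range(steps)
        if not (i / steps < start or (i + 1) / steps > end)
    ]


# Values from the example request on this page: 30 steps, window 0 to 0.7.
active = controlnet_active_steps(steps=30, start=0.0, end=0.7)
print(len(active), active[0], active[-1])
```

With the defaults (start 0, end 1) every step is guided; narrowing the window lets the later steps run without the control signal.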

controlnet_conditioning_scale

number

The scale of the controlnet conditioning.

Default: 0.7

seed

integer

The seed for the random number generator.

Default: 455

replicate_weights_url

string

Replicate LoRA weights to use. Leave blank to use the default weights.

hf_lora_url

string

The Hugging Face URL for the LoRA weights. For example, `fofr/barbie`

original_width

integer

The width of the `original_size` of images. If `original_size` is not the same as `target_size` the image will appear to be down- or upsampled. `original_size` defaults to `(width, height)` if not specified. Part of SDXL's micro-conditioning as explained in section 2.2 of https://arxiv.org/abs/2307.01952

Default: 1920

original_height

integer

The height of the `original_size` of images. See `original_width` for how `original_size` is used.

Default: 1080

target_width

integer

The width of the `target_size` of images.

Default: 512

target_height

integer

The height of the `target_size` of images.

Default: 512
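
The four size inputs above feed SDXL's micro-conditioning. As a hedged sketch, this is roughly how such a conditioning vector is assembled in SDXL-style pipelines: original size, crop offset, then target size. The `(height, width)` ordering and the `(0, 0)` crop offset are assumptions here (this model does not expose crop coordinates), and `micro_conditioning` is an illustrative name, not part of the model's API.

```python
def micro_conditioning(
    original_width: int,
    original_height: int,
    target_width: int,
    target_height: int,
    crop_top_left: tuple[int, int] = (0, 0),  # assumed: no cropping
) -> list[int]:
    """Build an SDXL-style micro-conditioning vector.

    Layout (assumed): original (h, w) + crop (top, left) + target (h, w).
    """
    original_size = (original_height, original_width)
    target_size = (target_height, target_width)
    return list(original_size + tuple(crop_top_left) + target_size)


# Defaults from this page: original 1920x1080, target 512x512.
print(micro_conditioning(1920, 1080, 512, 512))
```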

Run this model in Node.js with one line of code:

npx create-replicate --model=cloneofsimo/hotshot-xl-lora-controlnet

or set up a project from scratch

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run cloneofsimo/hotshot-xl-lora-controlnet using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "cloneofsimo/hotshot-xl-lora-controlnet:c447ef9fc621af091e2c06d08fd2a22d9f5906389a2f8103c851a2f7cf9c4e63",
  {
    input: {
      seed: 1002,
      steps: 30,
      width: 672,
      height: 384,
      prompt: "anime, animated establishing shot of a volcano erupting, bright sunshine and snow",
      control_type: "depth",
      target_width: 512,
      video_length: 8,
      target_height: 512,
      original_width: 1920,
      video_duration: 1000,
      negative_prompt: "dark, underexposed",
      original_height: 1080,
      control_guidance_end: 0.7,
      replicate_weights_url: "",
      control_guidance_start: 0,
      controlnet_conditioning_scale: 0.8
    }
  }
);

// To access the file URL:
console.log(output.url()); //=> "http://example.com"

// To write the file to disk (this model returns a GIF):
await fs.promises.writeFile("output.gif", output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run cloneofsimo/hotshot-xl-lora-controlnet using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "cloneofsimo/hotshot-xl-lora-controlnet:c447ef9fc621af091e2c06d08fd2a22d9f5906389a2f8103c851a2f7cf9c4e63",
    input={
        "seed": 1002,
        "steps": 30,
        "width": 672,
        "height": 384,
        "prompt": "anime, animated establishing shot of a volcano erupting, bright sunshine and snow",
        "control_type": "depth",
        "target_width": 512,
        "video_length": 8,
        "target_height": 512,
        "original_width": 1920,
        "video_duration": 1000,
        "negative_prompt": "dark, underexposed",
        "original_height": 1080,
        "control_guidance_end": 0.7,
        "replicate_weights_url": "",
        "control_guidance_start": 0,
        "controlnet_conditioning_scale": 0.8
    }
)
print(output)
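
If you want the GIF on disk rather than printed, a small helper along these lines may help. It is a sketch under two assumptions: depending on the client version, `output` is either a file-like object with a `.read()` method or a plain URL string. The name `save_output` is hypothetical and not part of the Replicate client.

```python
import urllib.request
from pathlib import Path


def save_output(output, path: str) -> Path:
    """Write a prediction output to disk and return the destination path.

    Handles two assumed shapes of `output`:
      - a file-like object exposing .read() that returns bytes
      - a URL string pointing at the generated GIF
    """
    target = Path(path)
    if hasattr(output, "read"):
        # File-like output: copy its bytes directly.
        target.write_bytes(output.read())
    else:
        # URL output: download the file over HTTP(S).
        with urllib.request.urlopen(str(output)) as response:
            target.write_bytes(response.read())
    return target
```

After the `replicate.run(...)` call above, this would be used as `save_output(output, "output.gif")`.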

To learn more, take a look at the guide on getting started with Python.

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run cloneofsimo/hotshot-xl-lora-controlnet using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "cloneofsimo/hotshot-xl-lora-controlnet:c447ef9fc621af091e2c06d08fd2a22d9f5906389a2f8103c851a2f7cf9c4e63",
    "input": {
      "seed": 1002,
      "steps": 30,
      "width": 672,
      "height": 384,
      "prompt": "anime, animated establishing shot of a volcano erupting, bright sunshine and snow",
      "control_type": "depth",
      "target_width": 512,
      "video_length": 8,
      "target_height": 512,
      "original_width": 1920,
      "video_duration": 1000,
      "negative_prompt": "dark, underexposed",
      "original_height": 1080,
      "control_guidance_end": 0.7,
      "replicate_weights_url": "",
      "control_guidance_start": 0,
      "controlnet_conditioning_scale": 0.8
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.

Output

{
  "completed_at": "2023-10-10T09:52:09.874760Z",
  "created_at": "2023-10-10T09:51:46.452704Z",
  "data_removed": false,
  "error": null,
  "id": "irlqj5lb2ilnbbdmrpxmjtutxu",
  "input": {
    "gif": null,
    "seed": 1002,
    "steps": 30,
    "width": 672,
    "height": 384,
    "prompt": "anime, animated establishing shot of a volcano erupting, bright sunshine and snow",
    "hf_lora_url": null,
    "control_type": "depth",
    "target_width": 512,
    "video_length": 8,
    "target_height": 512,
    "original_width": 1920,
    "video_duration": 1000,
    "negative_prompt": "dark, underexposed",
    "original_height": 1080,
    "control_guidance_end": 0.7,
    "replicate_weights_url": "",
    "control_guidance_start": 0,
    "controlnet_conditioning_scale": 0.8
  },
  "logs": "Warning - setting num_images_per_prompt = 1 because video_length = 8\n  0%|          | 0/30 [00:00<?, ?it/s]\n  3%|▎         | 1/30 [00:00<00:19,  1.45it/s]\n  7%|▋         | 2/30 [00:01<00:19,  1.45it/s]\n 10%|█         | 3/30 [00:02<00:18,  1.45it/s]\n 13%|█▎        | 4/30 [00:02<00:17,  1.45it/s]\n 17%|█▋        | 5/30 [00:03<00:17,  1.45it/s]\n 20%|██        | 6/30 [00:04<00:16,  1.45it/s]\n 23%|██▎       | 7/30 [00:04<00:15,  1.46it/s]\n 27%|██▋       | 8/30 [00:05<00:15,  1.46it/s]\n 30%|███       | 9/30 [00:06<00:14,  1.46it/s]\n 33%|███▎      | 10/30 [00:06<00:13,  1.46it/s]\n 37%|███▋      | 11/30 [00:07<00:13,  1.45it/s]\n 40%|████      | 12/30 [00:08<00:12,  1.45it/s]\n 43%|████▎     | 13/30 [00:08<00:11,  1.45it/s]\n 47%|████▋     | 14/30 [00:09<00:10,  1.45it/s]\n 50%|█████     | 15/30 [00:10<00:10,  1.45it/s]\n 53%|█████▎    | 16/30 [00:11<00:09,  1.45it/s]\n 57%|█████▋    | 17/30 [00:11<00:08,  1.45it/s]\n 60%|██████    | 18/30 [00:12<00:08,  1.45it/s]\n 63%|██████▎   | 19/30 [00:13<00:07,  1.45it/s]\n 67%|██████▋   | 20/30 [00:13<00:06,  1.45it/s]\n 70%|███████   | 21/30 [00:14<00:06,  1.45it/s]\n 73%|███████▎  | 22/30 [00:15<00:05,  1.45it/s]\n 77%|███████▋  | 23/30 [00:15<00:04,  1.45it/s]\n 80%|████████  | 24/30 [00:16<00:04,  1.45it/s]\n 83%|████████▎ | 25/30 [00:17<00:03,  1.45it/s]\n 87%|████████▋ | 26/30 [00:17<00:02,  1.45it/s]\n 90%|█████████ | 27/30 [00:18<00:02,  1.45it/s]\n 93%|█████████▎| 28/30 [00:19<00:01,  1.45it/s]\n 97%|█████████▋| 29/30 [00:19<00:00,  1.45it/s]\n100%|██████████| 30/30 [00:20<00:00,  1.45it/s]\n100%|██████████| 30/30 [00:20<00:00,  1.45it/s]\n  0%|          | 0/8 [00:00<?, ?it/s]\n 50%|█████     | 4/8 [00:00<00:00, 35.64it/s]\n100%|██████████| 8/8 [00:00<00:00, 19.58it/s]\n100%|██████████| 8/8 [00:00<00:00, 20.99it/s]",
  "metrics": {
    "predict_time": 23.417482,
    "total_time": 23.422056
  },
  "output": "https://replicate.delivery/pbxt/RU9CI33SMCKMFBFQplELLexGPsOGNIU42VpauosBZZLkhW2IA/tmp.gif",
  "started_at": "2023-10-10T09:51:46.457278Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/irlqj5lb2ilnbbdmrpxmjtutxu",
    "cancel": "https://api.replicate.com/v1/predictions/irlqj5lb2ilnbbdmrpxmjtutxu/cancel"
  },
  "version": "c447ef9fc621af091e2c06d08fd2a22d9f5906389a2f8103c851a2f7cf9c4e63"
}
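
The timestamps in this response can be used to cross-check the `metrics` block. The sketch below parses the three timestamps and derives queue, run, and total times. It assumes the timestamp format shown above (microsecond precision with a trailing `Z`); `prediction_timing` is an illustrative helper, not a client function.

```python
from datetime import datetime

# Format of the timestamps in the response above (assumed to be stable).
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def prediction_timing(prediction: dict) -> dict:
    """Derive queue, run, and total durations (in seconds) from a prediction."""
    created = datetime.strptime(prediction["created_at"], TIMESTAMP_FORMAT)
    started = datetime.strptime(prediction["started_at"], TIMESTAMP_FORMAT)
    completed = datetime.strptime(prediction["completed_at"], TIMESTAMP_FORMAT)
    return {
        "queue_seconds": (started - created).total_seconds(),
        "run_seconds": (completed - started).total_seconds(),
        "total_seconds": (completed - created).total_seconds(),
    }


# Timestamps copied from the example response on this page.
example = {
    "created_at": "2023-10-10T09:51:46.452704Z",
    "started_at": "2023-10-10T09:51:46.457278Z",
    "completed_at": "2023-10-10T09:52:09.874760Z",
}
print(prediction_timing(example))
```

For this response, the run and total durations agree with `predict_time` and `total_time` in `metrics`.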

Generated in 23.4 seconds

Run time and cost

This model costs approximately $0.036 to run on Replicate, or 27 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
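
The "runs per $1" figure is just the budget divided by the per-run cost, rounded down. A minimal sketch for estimating other budgets, assuming the approximate $0.036 figure quoted above holds for your inputs (it may not, since cost varies with inputs):

```python
import math

COST_PER_RUN_USD = 0.036  # approximate figure quoted on this page


def runs_for_budget(budget_usd: float, cost_per_run: float = COST_PER_RUN_USD) -> int:
    """Estimate how many whole runs fit in a budget at a fixed per-run cost."""
    if cost_per_run <= 0:
        raise ValueError("cost_per_run must be positive")
    return math.floor(budget_usd / cost_per_run)


print(runs_for_budget(1.0))
```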

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 37 seconds. The predict time for this model varies significantly based on the inputs.

Readme

✨ Hotshot-XL: State-of-the-art AI text-to-GIF model trained to work alongside Stable Diffusion XL

hotshot.co

Hotshot-XL can generate GIFs with any fine-tuned SDXL model. This means two things:

1. You’ll be able to make GIFs with any existing or newly fine-tuned SDXL model you may want to use.
2. If you’d like to make GIFs of personalized subjects, you can load your own SDXL-based LoRAs without having to fine-tune Hotshot-XL. This is awesome because it’s usually much easier to find suitable images for training data than it is to find videos. It also hopefully fits into everyone’s existing LoRA usage/workflows :)

Hotshot-XL is compatible with SDXL ControlNet, so you can make GIFs in the composition/layout you’d like.

Hotshot-XL was trained to generate 1-second GIFs at 8 FPS.

Hotshot-XL was trained on various aspect ratios. For best results with the base Hotshot-XL model, we recommend using it with an SDXL model that has been fine-tuned with 512x512 images. An SDXL model we fine-tuned for 512x512 resolution is available here:

https://huggingface.co/hotshotco/SDXL-512