hvision-nku / storydiffusion

Consistent Self-Attention for Long-Range Image and Video Generation

Cold

Public
72.2K runs
L40S
GitHub
Paper
License

Run with an API

Playground API Examples README Versions

Input

sd_model

string

Choose a model

Default: "Unstable"

ref_image

file

Reference image for the character

character_description

string

Shift + Return to add a new line

General description of the character. If ref_image above is provided, making sure to follow the class word you want to customize with the trigger word 'img', such as: 'man img' or 'woman img' or 'girl img'

Default: "a man, wearing black suit"

negative_prompt

string

Shift + Return to add a new line

bad anatomy, bad hands, missing fingers, extra fingers, three hands, three legs, bad arms, missing legs, missing arms, poorly drawn face, bad face, fused face, cloned face, three crus, fused feet, fused thigh, extra crus, ugly fingers, horn, cartoon, cg, 3d, unreal, animate, amputation, disconnected limbsbad anatomy, bad hands, missing fingers, extra fingers, three hands, three legs, bad arms, missing legs, missing arms, poorly drawn face, bad face, fused face, cloned face, three crus, fused feet, fused thigh, extra crus, ugly fingers, horn, cartoon, cg, 3d, unreal, animate, amputation, disconnected limbs

Describe things you do not want to see in the output

Default: "bad anatomy, bad hands, missing fingers, extra fingers, three hands, three legs, bad arms, missing legs, missing arms, poorly drawn face, bad face, fused face, cloned face, three crus, fused feet, fused thigh, extra crus, ugly fingers, horn, cartoon, cg, 3d, unreal, animate, amputation, disconnected limbs"

comic_description

string

Shift + Return to add a new line

at home, read new paper #at home, The newspaper says there is a treasure house in the forest.
on the road, near the forest
[NC] The car on the road, near the forest #He drives to the forest in search of treasure.
[NC]A tiger appeared in the forest, at night 
very frightened, open mouth, in the forest, at night
running very fast, in the forest, at night
[NC] A house in the forest, at night #Suddenly, he discovers the treasure house!
in the house filled with  treasure, laughing, at night #He is overjoyed inside the house.at home, read new paper #at home, The newspaper says there is a treasure house in the forest.
on the road, near the forest
[NC] The car on the road, near the forest #He drives to the forest in search of treasure.
[NC]A tiger appeared in the forest, at night 
very frightened, open mouth, in the forest, at night
running very fast, in the forest, at night
[NC] A house in the forest, at night #Suddenly, he discovers the treasure house!
in the house filled with  treasure, laughing, at night #He is overjoyed inside the house.

Comic Description. Each frame is divided by a new line. Only the first 10 prompts are valid for demo speed! For comic_description NOT using ref_image: (1) Support Typesetting Style and Captioning. By default, the prompt is used as the caption for each image. If you need to change the caption, add a '#' at the end of each line. Only the part after the '#' will be added as a caption to the image. (2) The [NC] symbol is used as a flag to indicate that no characters should be present in the generated scene images. If you want do that, prepend the '[NC]' at the beginning of the line.

Default: "at home, read new paper #at home, The newspaper says there is a treasure house in the forest.\non the road, near the forest\n[NC] The car on the road, near the forest #He drives to the forest in search of treasure.\n[NC]A tiger appeared in the forest, at night \nvery frightened, open mouth, in the forest, at night\nrunning very fast, in the forest, at night\n[NC] A house in the forest, at night #Suddenly, he discovers the treasure house!\nin the house filled with treasure, laughing, at night #He is overjoyed inside the house."

style_name

string

Style template

Default: "Japanese Anime"

comic_style

string

Select the comic style for the combined comic

Default: "Classic Comic Style"

style_strength_ratio

integer

(minimum: 15, maximum: 50)

Style strength of Ref Image (%), only used if ref_image is provided

Default: 20

image_width

integer

Width of output image

Default: 768

image_height

integer

Height of output image

Default: 768

num_steps

integer

(minimum: 20, maximum: 50)

Number of sample steps

Default: 25

guidance_scale

number

(minimum: 0.1, maximum: 10)

Scale for classifier-free guidance

Default: 5

seed

integer

Random seed. Leave blank to randomize the seed

sa32_setting

number

(minimum: 0, maximum: 1)

The degree of Paired Attention at 32 x 32 self-attention layers

Default: 0.5

sa64_setting

number

(minimum: 0, maximum: 1)

The degree of Paired Attention at 64 x 64 self-attention layers

Default: 0.5

num_ids

integer

Number of id images in total images. This should not exceed total number of line-separated prompts

Default: 3

output_format

string

Format of the output images

Default: "webp"

output_quality

integer

(minimum: 0, maximum: 100)

Quality of the output images, from 0 to 100. 100 is best quality, 0 is lowest quality

Default: 80

Run this model in Node.js with one line of code:

npx create-replicate --model=hvision-nku/storydiffusion

or set up a project from scratch

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run hvision-nku/storydiffusion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "hvision-nku/storydiffusion:39c85f153f00e4e9328cb3035b94559a8ec66170eb4c0618c07b16528bf20ac2",
  {
    input: {
      num_ids: 3,
      sd_model: "Unstable",
      num_steps: 25,
      style_name: "Japanese Anime",
      comic_style: "Classic Comic Style",
      image_width: 768,
      image_height: 768,
      sa32_setting: 0.5,
      sa64_setting: 0.5,
      output_format: "webp",
      guidance_scale: 5,
      output_quality: 80,
      negative_prompt: "bad anatomy, bad hands, missing fingers, extra fingers, three hands, three legs, bad arms, missing legs, missing arms, poorly drawn face, bad face, fused face, cloned face, three crus, fused feet, fused thigh, extra crus, ugly fingers, horn, cartoon, cg, 3d, unreal, animate, amputation, disconnected limbs",
      comic_description: "at home, read new paper #at home, The newspaper says there is a treasure house in the forest.\non the road, near the forest\n[NC] The car on the road, near the forest #He drives to the forest in search of treasure.\n[NC]A tiger appeared in the forest, at night \nvery frightened, open mouth, in the forest, at night\nrunning very fast, in the forest, at night\n[NC] A house in the forest, at night #Suddenly, he discovers the treasure house!\nin the house filled with  treasure, laughing, at night #He is overjoyed inside the house.",
      style_strength_ratio: 20,
      character_description: "a man, wearing black suit"
    }
  }
);

console.log(output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run hvision-nku/storydiffusion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "hvision-nku/storydiffusion:39c85f153f00e4e9328cb3035b94559a8ec66170eb4c0618c07b16528bf20ac2",
    input={
        "num_ids": 3,
        "sd_model": "Unstable",
        "num_steps": 25,
        "style_name": "Japanese Anime",
        "comic_style": "Classic Comic Style",
        "image_width": 768,
        "image_height": 768,
        "sa32_setting": 0.5,
        "sa64_setting": 0.5,
        "output_format": "webp",
        "guidance_scale": 5,
        "output_quality": 80,
        "negative_prompt": "bad anatomy, bad hands, missing fingers, extra fingers, three hands, three legs, bad arms, missing legs, missing arms, poorly drawn face, bad face, fused face, cloned face, three crus, fused feet, fused thigh, extra crus, ugly fingers, horn, cartoon, cg, 3d, unreal, animate, amputation, disconnected limbs",
        "comic_description": "at home, read new paper #at home, The newspaper says there is a treasure house in the forest.\non the road, near the forest\n[NC] The car on the road, near the forest #He drives to the forest in search of treasure.\n[NC]A tiger appeared in the forest, at night \nvery frightened, open mouth, in the forest, at night\nrunning very fast, in the forest, at night\n[NC] A house in the forest, at night #Suddenly, he discovers the treasure house!\nin the house filled with  treasure, laughing, at night #He is overjoyed inside the house.",
        "style_strength_ratio": 20,
        "character_description": "a man, wearing black suit"
    }
)
print(output)

To learn more, take a look at the guide on getting started with Python.

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run hvision-nku/storydiffusion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "hvision-nku/storydiffusion:39c85f153f00e4e9328cb3035b94559a8ec66170eb4c0618c07b16528bf20ac2",
    "input": {
      "num_ids": 3,
      "sd_model": "Unstable",
      "num_steps": 25,
      "style_name": "Japanese Anime",
      "comic_style": "Classic Comic Style",
      "image_width": 768,
      "image_height": 768,
      "sa32_setting": 0.5,
      "sa64_setting": 0.5,
      "output_format": "webp",
      "guidance_scale": 5,
      "output_quality": 80,
      "negative_prompt": "bad anatomy, bad hands, missing fingers, extra fingers, three hands, three legs, bad arms, missing legs, missing arms, poorly drawn face, bad face, fused face, cloned face, three crus, fused feet, fused thigh, extra crus, ugly fingers, horn, cartoon, cg, 3d, unreal, animate, amputation, disconnected limbs",
      "comic_description": "at home, read new paper #at home, The newspaper says there is a treasure house in the forest.\\non the road, near the forest\\n[NC] The car on the road, near the forest #He drives to the forest in search of treasure.\\n[NC]A tiger appeared in the forest, at night \\nvery frightened, open mouth, in the forest, at night\\nrunning very fast, in the forest, at night\\n[NC] A house in the forest, at night #Suddenly, he discovers the treasure house!\\nin the house filled with  treasure, laughing, at night #He is overjoyed inside the house.",
      "style_strength_ratio": 20,
      "character_description": "a man, wearing black suit"
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.

Output

comic

individual_images

{
  "completed_at": "2024-05-04T23:24:09.982389Z",
  "created_at": "2024-05-04T23:23:26.107000Z",
  "data_removed": false,
  "error": null,
  "id": "m69pbmv63drgm0cf8qnskc27nr",
  "input": {
    "num_ids": 3,
    "sd_model": "Unstable",
    "num_steps": 25,
    "style_name": "Japanese Anime",
    "comic_style": "Classic Comic Style",
    "image_width": 768,
    "image_height": 768,
    "sa32_setting": 0.5,
    "sa64_setting": 0.5,
    "output_format": "webp",
    "guidance_scale": 5,
    "output_quality": 80,
    "negative_prompt": "bad anatomy, bad hands, missing fingers, extra fingers, three hands, three legs, bad arms, missing legs, missing arms, poorly drawn face, bad face, fused face, cloned face, three crus, fused feet, fused thigh, extra crus, ugly fingers, horn, cartoon, cg, 3d, unreal, animate, amputation, disconnected limbs",
    "comic_description": "at home, read new paper #at home, The newspaper says there is a treasure house in the forest.\non the road, near the forest\n[NC] The car on the road, near the forest #He drives to the forest in search of treasure.\n[NC]A tiger appeared in the forest, at night \nvery frightened, open mouth, in the forest, at night\nrunning very fast, in the forest, at night\n[NC] A house in the forest, at night #Suddenly, he discovers the treasure house!\nin the house filled with  treasure, laughing, at night #He is overjoyed inside the house.",
    "style_strength_ratio": 20,
    "character_description": "a man, wearing black suit"
  },
  "logs": "['at home, read new paper #at home, The newspaper says there is a treasure house in the forest.', 'on the road, near the forest', '[NC] The car on the road, near the forest #He drives to the forest in search of treasure.', '[NC]A tiger appeared in the forest, at night ', 'very frightened, open mouth, in the forest, at night', 'running very fast, in the forest, at night', '[NC] A house in the forest, at night #Suddenly, he discovers the treasure house!', 'in the house filled with  treasure, laughing, at night #He is overjoyed inside the house.']\n['a man, wearing black suit,at home, read new paper #at home, The newspaper says there is a treasure house in the forest.', 'a man, wearing black suit,on the road, near the forest', ' The car on the road, near the forest #He drives to the forest in search of treasure.', 'A tiger appeared in the forest, at night ', 'a man, wearing black suit,very frightened, open mouth, in the forest, at night', 'a man, wearing black suit,running very fast, in the forest, at night', ' A house in the forest, at night #Suddenly, he discovers the treasure house!', 'a man, wearing black suit,in the house filled with  treasure, laughing, at night #He is overjoyed inside the house.']\n['a man, wearing black suit,at home, read new paper', 'a man, wearing black suit,on the road, near the forest', 'The car on the road, near the forest', 'A tiger appeared in the forest, at night', 'a man, wearing black suit,very frightened, open mouth, in the forest, at night', 'a man, wearing black suit,running very fast, in the forest, at night', 'A house in the forest, at night', 'a man, wearing black suit,in the house filled with  treasure, laughing, at night']\nUsing seed: 58753\nSuccessfully load paired self-attention\nNumber of the processor : 36\n  0%|          | 0/25 [00:00<?, ?it/s]\n  4%|▍         | 1/25 [00:00<00:07,  3.11it/s]\n  8%|▊         | 2/25 [00:00<00:07,  3.10it/s]\n 12%|█▏        | 3/25 [00:00<00:07,  3.09it/s]\n 16%|█▌        | 4/25 [00:01<00:06,  3.08it/s]\n 20%|██        | 5/25 [00:01<00:06,  3.08it/s]\n 24%|██▍       | 6/25 [00:02<00:07,  2.60it/s]\n 28%|██▊       | 7/25 [00:02<00:07,  2.35it/s]\n 32%|███▏      | 8/25 [00:03<00:07,  2.23it/s]\n 36%|███▌      | 9/25 [00:03<00:07,  2.23it/s]\n 40%|████      | 10/25 [00:04<00:06,  2.19it/s]\n 44%|████▍     | 11/25 [00:04<00:06,  2.13it/s]\n 48%|████▊     | 12/25 [00:05<00:06,  2.12it/s]\n 52%|█████▏    | 13/25 [00:05<00:05,  2.04it/s]\n 56%|█████▌    | 14/25 [00:06<00:05,  2.01it/s]\n 60%|██████    | 15/25 [00:06<00:05,  1.97it/s]\n 64%|██████▍   | 16/25 [00:07<00:04,  2.00it/s]\n 68%|██████▊   | 17/25 [00:07<00:03,  2.06it/s]\n 72%|███████▏  | 18/25 [00:08<00:03,  2.05it/s]\n 76%|███████▌  | 19/25 [00:08<00:02,  2.06it/s]\n 80%|████████  | 20/25 [00:08<00:02,  2.08it/s]\n 84%|████████▍ | 21/25 [00:09<00:01,  2.03it/s]\n 88%|████████▊ | 22/25 [00:10<00:01,  1.97it/s]\n 92%|█████████▏| 23/25 [00:10<00:01,  1.96it/s]\n 96%|█████████▌| 24/25 [00:11<00:00,  1.97it/s]\n100%|██████████| 25/25 [00:11<00:00,  1.93it/s]\n100%|██████████| 25/25 [00:11<00:00,  2.15it/s]\n  0%|          | 0/25 [00:00<?, ?it/s]\n  4%|▍         | 1/25 [00:00<00:03,  6.46it/s]\n  8%|▊         | 2/25 [00:00<00:03,  6.44it/s]\n 12%|█▏        | 3/25 [00:00<00:03,  6.46it/s]\n 16%|█▌        | 4/25 [00:00<00:03,  6.45it/s]\n 20%|██        | 5/25 [00:00<00:03,  6.44it/s]\n 24%|██▍       | 6/25 [00:00<00:03,  5.70it/s]\n 28%|██▊       | 7/25 [00:01<00:03,  5.33it/s]\n 32%|███▏      | 8/25 [00:01<00:03,  5.27it/s]\n 36%|███▌      | 9/25 [00:01<00:03,  5.09it/s]\n 40%|████      | 10/25 [00:01<00:02,  5.08it/s]\n 44%|████▍     | 11/25 [00:02<00:02,  4.98it/s]\n 48%|████▊     | 12/25 [00:02<00:02,  5.07it/s]\n 52%|█████▏    | 13/25 [00:02<00:02,  4.97it/s]\n 56%|█████▌    | 14/25 [00:02<00:02,  4.78it/s]\n 60%|██████    | 15/25 [00:02<00:02,  4.86it/s]\n 64%|██████▍   | 16/25 [00:03<00:01,  4.80it/s]\n 68%|██████▊   | 17/25 [00:03<00:01,  4.79it/s]\n 72%|███████▏  | 18/25 [00:03<00:01,  4.94it/s]\n 76%|███████▌  | 19/25 [00:03<00:01,  4.89it/s]\n 80%|████████  | 20/25 [00:03<00:01,  4.98it/s]\n 84%|████████▍ | 21/25 [00:04<00:00,  4.73it/s]\n 88%|████████▊ | 22/25 [00:04<00:00,  4.59it/s]\n 92%|█████████▏| 23/25 [00:04<00:00,  4.58it/s]\n 96%|█████████▌| 24/25 [00:04<00:00,  4.48it/s]\n100%|██████████| 25/25 [00:05<00:00,  4.48it/s]\n100%|██████████| 25/25 [00:05<00:00,  5.00it/s]\n  0%|          | 0/25 [00:00<?, ?it/s]\n  4%|▍         | 1/25 [00:00<00:03,  6.51it/s]\n  8%|▊         | 2/25 [00:00<00:03,  6.48it/s]\n 12%|█▏        | 3/25 [00:00<00:03,  6.44it/s]\n 16%|█▌        | 4/25 [00:00<00:03,  6.42it/s]\n 20%|██        | 5/25 [00:00<00:03,  6.42it/s]\n 24%|██▍       | 6/25 [00:00<00:03,  5.99it/s]\n 28%|██▊       | 7/25 [00:01<00:03,  5.59it/s]\n 32%|███▏      | 8/25 [00:01<00:03,  5.15it/s]\n 36%|███▌      | 9/25 [00:01<00:03,  5.12it/s]\n 40%|████      | 10/25 [00:01<00:02,  5.13it/s]\n 44%|████▍     | 11/25 [00:01<00:02,  5.06it/s]\n 48%|████▊     | 12/25 [00:02<00:02,  4.99it/s]\n 52%|█████▏    | 13/25 [00:02<00:02,  4.95it/s]\n 56%|█████▌    | 14/25 [00:02<00:02,  4.94it/s]\n 60%|██████    | 15/25 [00:02<00:02,  4.89it/s]\n 64%|██████▍   | 16/25 [00:03<00:01,  4.82it/s]\n 68%|██████▊   | 17/25 [00:03<00:01,  4.95it/s]\n 72%|███████▏  | 18/25 [00:03<00:01,  4.89it/s]\n 76%|███████▌  | 19/25 [00:03<00:01,  4.73it/s]\n 80%|████████  | 20/25 [00:03<00:01,  4.67it/s]\n 84%|████████▍ | 21/25 [00:04<00:00,  4.54it/s]\n 88%|████████▊ | 22/25 [00:04<00:00,  4.44it/s]\n 92%|█████████▏| 23/25 [00:04<00:00,  4.51it/s]\n 96%|█████████▌| 24/25 [00:04<00:00,  4.44it/s]\n100%|██████████| 25/25 [00:05<00:00,  4.39it/s]\n100%|██████████| 25/25 [00:05<00:00,  4.97it/s]\n  0%|          | 0/25 [00:00<?, ?it/s]\n  4%|▍         | 1/25 [00:00<00:03,  6.52it/s]\n  8%|▊         | 2/25 [00:00<00:03,  6.46it/s]\n 12%|█▏        | 3/25 [00:00<00:03,  6.43it/s]\n 16%|█▌        | 4/25 [00:00<00:03,  6.42it/s]\n 20%|██        | 5/25 [00:00<00:03,  6.42it/s]\n 24%|██▍       | 6/25 [00:00<00:03,  5.95it/s]\n 28%|██▊       | 7/25 [00:01<00:03,  5.50it/s]\n 32%|███▏      | 8/25 [00:01<00:03,  5.28it/s]\n 36%|███▌      | 9/25 [00:01<00:02,  5.35it/s]\n 40%|████      | 10/25 [00:01<00:02,  5.02it/s]\n 44%|████▍     | 11/25 [00:02<00:02,  4.83it/s]\n 48%|████▊     | 12/25 [00:02<00:02,  4.83it/s]\n 52%|█████▏    | 13/25 [00:02<00:02,  4.89it/s]\n 56%|█████▌    | 14/25 [00:02<00:02,  4.90it/s]\n 60%|██████    | 15/25 [00:02<00:02,  4.77it/s]\n 64%|██████▍   | 16/25 [00:03<00:01,  4.72it/s]\n 68%|██████▊   | 17/25 [00:03<00:01,  4.94it/s]\n 72%|███████▏  | 18/25 [00:03<00:01,  4.94it/s]\n 76%|███████▌  | 19/25 [00:03<00:01,  4.78it/s]\n 80%|████████  | 20/25 [00:03<00:01,  4.66it/s]\n 84%|████████▍ | 21/25 [00:04<00:00,  4.62it/s]\n 88%|████████▊ | 22/25 [00:04<00:00,  4.52it/s]\n 92%|█████████▏| 23/25 [00:04<00:00,  4.42it/s]\n 96%|█████████▌| 24/25 [00:04<00:00,  4.39it/s]\n100%|██████████| 25/25 [00:05<00:00,  4.36it/s]\n100%|██████████| 25/25 [00:05<00:00,  4.94it/s]\n  0%|          | 0/25 [00:00<?, ?it/s]\n  4%|▍         | 1/25 [00:00<00:03,  6.49it/s]\n  8%|▊         | 2/25 [00:00<00:03,  6.43it/s]\n 12%|█▏        | 3/25 [00:00<00:03,  6.42it/s]\n 16%|█▌        | 4/25 [00:00<00:03,  6.42it/s]\n 20%|██        | 5/25 [00:00<00:03,  6.42it/s]\n 24%|██▍       | 6/25 [00:00<00:03,  5.68it/s]\n 28%|██▊       | 7/25 [00:01<00:03,  5.58it/s]\n 32%|███▏      | 8/25 [00:01<00:03,  5.31it/s]\n 36%|███▌      | 9/25 [00:01<00:03,  5.07it/s]\n 40%|████      | 10/25 [00:01<00:02,  5.06it/s]\n 44%|████▍     | 11/25 [00:01<00:02,  5.14it/s]\n 48%|████▊     | 12/25 [00:02<00:02,  5.06it/s]\n 52%|█████▏    | 13/25 [00:02<00:02,  5.08it/s]\n 56%|█████▌    | 14/25 [00:02<00:02,  5.08it/s]\n 60%|██████    | 15/25 [00:02<00:02,  4.97it/s]\n 64%|██████▍   | 16/25 [00:02<00:01,  4.97it/s]\n 68%|██████▊   | 17/25 [00:03<00:01,  4.83it/s]\n 72%|███████▏  | 18/25 [00:03<00:01,  4.72it/s]\n 76%|███████▌  | 19/25 [00:03<00:01,  4.70it/s]\n 80%|████████  | 20/25 [00:03<00:01,  4.76it/s]\n 84%|████████▍ | 21/25 [00:04<00:00,  4.64it/s]\n 88%|████████▊ | 22/25 [00:04<00:00,  4.60it/s]\n 92%|█████████▏| 23/25 [00:04<00:00,  4.54it/s]\n 96%|█████████▌| 24/25 [00:04<00:00,  4.49it/s]\n100%|██████████| 25/25 [00:04<00:00,  4.59it/s]\n100%|██████████| 25/25 [00:04<00:00,  5.03it/s]\n  0%|          | 0/25 [00:00<?, ?it/s]\n  4%|▍         | 1/25 [00:00<00:03,  6.48it/s]\n  8%|▊         | 2/25 [00:00<00:03,  6.44it/s]\n 12%|█▏        | 3/25 [00:00<00:03,  6.42it/s]\n 16%|█▌        | 4/25 [00:00<00:03,  6.42it/s]\n 20%|██        | 5/25 [00:00<00:03,  6.41it/s]\n 24%|██▍       | 6/25 [00:00<00:03,  5.72it/s]\n 28%|██▊       | 7/25 [00:01<00:03,  5.46it/s]\n 32%|███▏      | 8/25 [00:01<00:03,  5.19it/s]\n 36%|███▌      | 9/25 [00:01<00:03,  5.16it/s]\n 40%|████      | 10/25 [00:01<00:02,  5.12it/s]\n 44%|████▍     | 11/25 [00:01<00:02,  5.31it/s]\n 48%|████▊     | 12/25 [00:02<00:02,  5.03it/s]\n 52%|█████▏    | 13/25 [00:02<00:02,  5.00it/s]\n 56%|█████▌    | 14/25 [00:02<00:02,  4.92it/s]\n 60%|██████    | 15/25 [00:02<00:02,  4.98it/s]\n 64%|██████▍   | 16/25 [00:03<00:01,  4.95it/s]\n 68%|██████▊   | 17/25 [00:03<00:01,  4.84it/s]\n 72%|███████▏  | 18/25 [00:03<00:01,  4.75it/s]\n 76%|███████▌  | 19/25 [00:03<00:01,  4.83it/s]\n 80%|████████  | 20/25 [00:03<00:01,  4.96it/s]\n 84%|████████▍ | 21/25 [00:04<00:00,  4.75it/s]\n 88%|████████▊ | 22/25 [00:04<00:00,  4.58it/s]\n 92%|█████████▏| 23/25 [00:04<00:00,  4.55it/s]\n 96%|█████████▌| 24/25 [00:04<00:00,  4.52it/s]\n100%|██████████| 25/25 [00:04<00:00,  4.47it/s]\n100%|██████████| 25/25 [00:04<00:00,  5.02it/s]\n4 [[<PIL.Image.Image image mode=RGB size=788x788 at 0x7F85B45EB850>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85B436F5D0>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A5916450>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A5916C10>]]\n0 [[<PIL.Image.Image image mode=RGB size=788x788 at 0x7F85B45EB850>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85B436F5D0>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A5916450>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A5916C10>], [<PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A7D82950>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A575A6D0>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A7C18790>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A7C18BD0>]]\n[[<PIL.Image.Image image mode=RGB size=788x788 at 0x7F85B45EB850>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85B436F5D0>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A5916450>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A5916C10>], [<PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A7D82950>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A575A6D0>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A7C18790>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A7C18BD0>]]\n1 (7, 650)\n0 (124, 721)\n1 (56, 636)\n0 (89, 717)",
  "metrics": {
    "predict_time": 43.83848,
    "total_time": 43.875389
  },
  "output": {
    "comic": "https://replicate.delivery/pbxt/3YTlIyPjEKrkI1aXozvAuaw8Z4fS1nVaiWbfbk78TdUWWHxSA/comic.webp",
    "individual_images": [
      "https://replicate.delivery/pbxt/5fFxtbwTjqwfeINf9ELZyH7RhndQGmeuE9JBI7jFIFq5y6IWC/out-0.webp",
      "https://replicate.delivery/pbxt/TYsTSU7VHRooMlhStPdK6W5YUmtETxo7CPcCj5QDrPOm1RsE/out-1.webp",
      "https://replicate.delivery/pbxt/acrWGHqqtqKaCpmjehzeFcghZj3YxmOieLxiTmz0Z1BxsOilA/out-2.webp",
      "https://replicate.delivery/pbxt/3PNbSoXffrnVCU0SW4SulmtTPeyWls6breoZ3dsEJdrjZdELB/out-3.webp",
      "https://replicate.delivery/pbxt/stdforbmgz10LSfOT7EJuS5yPFv5jn006HAKRJefgqIhZdELB/out-4.webp",
      "https://replicate.delivery/pbxt/QbNDf0CycEzwDKbSpscNBXLfic3bvTmWUcGX1AfdIEkysOilA/out-5.webp",
      "https://replicate.delivery/pbxt/hTuyxKCKY2JbGBnVCv2RzXOXSvknpGf2bexActYGJZDZWHxSA/out-6.webp",
      "https://replicate.delivery/pbxt/DKBlGMFJeWz0H6EOXIhtar6SMusbxN86bciM3xzS8ejZWHxSA/out-7.webp"
    ]
  },
  "started_at": "2024-05-04T23:23:26.143909Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/m69pbmv63drgm0cf8qnskc27nr",
    "cancel": "https://api.replicate.com/v1/predictions/m69pbmv63drgm0cf8qnskc27nr/cancel"
  },
  "version": "39c85f153f00e4e9328cb3035b94559a8ec66170eb4c0618c07b16528bf20ac2"
}

Generated in

43.8 seconds

Tweak it ShareReport

['at home, read new paper #at home, The newspaper says there is a treasure house in the forest.', 'on the road, near the forest', '[NC] The car on the road, near the forest #He drives to the forest in search of treasure.', '[NC]A tiger appeared in the forest, at night ', 'very frightened, open mouth, in the forest, at night', 'running very fast, in the forest, at night', '[NC] A house in the forest, at night #Suddenly, he discovers the treasure house!', 'in the house filled with  treasure, laughing, at night #He is overjoyed inside the house.']
['a man, wearing black suit,at home, read new paper #at home, The newspaper says there is a treasure house in the forest.', 'a man, wearing black suit,on the road, near the forest', ' The car on the road, near the forest #He drives to the forest in search of treasure.', 'A tiger appeared in the forest, at night ', 'a man, wearing black suit,very frightened, open mouth, in the forest, at night', 'a man, wearing black suit,running very fast, in the forest, at night', ' A house in the forest, at night #Suddenly, he discovers the treasure house!', 'a man, wearing black suit,in the house filled with  treasure, laughing, at night #He is overjoyed inside the house.']
['a man, wearing black suit,at home, read new paper', 'a man, wearing black suit,on the road, near the forest', 'The car on the road, near the forest', 'A tiger appeared in the forest, at night', 'a man, wearing black suit,very frightened, open mouth, in the forest, at night', 'a man, wearing black suit,running very fast, in the forest, at night', 'A house in the forest, at night', 'a man, wearing black suit,in the house filled with  treasure, laughing, at night']
Using seed: 58753
Successfully load paired self-attention
Number of the processor : 36
  0%|          | 0/25 [00:00<?, ?it/s]
  4%|▍         | 1/25 [00:00<00:07,  3.11it/s]
  8%|▊         | 2/25 [00:00<00:07,  3.10it/s]
 12%|█▏        | 3/25 [00:00<00:07,  3.09it/s]
 16%|█▌        | 4/25 [00:01<00:06,  3.08it/s]
 20%|██        | 5/25 [00:01<00:06,  3.08it/s]
 24%|██▍       | 6/25 [00:02<00:07,  2.60it/s]
 28%|██▊       | 7/25 [00:02<00:07,  2.35it/s]
 32%|███▏      | 8/25 [00:03<00:07,  2.23it/s]
 36%|███▌      | 9/25 [00:03<00:07,  2.23it/s]
 40%|████      | 10/25 [00:04<00:06,  2.19it/s]
 44%|████▍     | 11/25 [00:04<00:06,  2.13it/s]
 48%|████▊     | 12/25 [00:05<00:06,  2.12it/s]
 52%|█████▏    | 13/25 [00:05<00:05,  2.04it/s]
 56%|█████▌    | 14/25 [00:06<00:05,  2.01it/s]
 60%|██████    | 15/25 [00:06<00:05,  1.97it/s]
 64%|██████▍   | 16/25 [00:07<00:04,  2.00it/s]
 68%|██████▊   | 17/25 [00:07<00:03,  2.06it/s]
 72%|███████▏  | 18/25 [00:08<00:03,  2.05it/s]
 76%|███████▌  | 19/25 [00:08<00:02,  2.06it/s]
 80%|████████  | 20/25 [00:08<00:02,  2.08it/s]
 84%|████████▍ | 21/25 [00:09<00:01,  2.03it/s]
 88%|████████▊ | 22/25 [00:10<00:01,  1.97it/s]
 92%|█████████▏| 23/25 [00:10<00:01,  1.96it/s]
 96%|█████████▌| 24/25 [00:11<00:00,  1.97it/s]
100%|██████████| 25/25 [00:11<00:00,  1.93it/s]
100%|██████████| 25/25 [00:11<00:00,  2.15it/s]
  0%|          | 0/25 [00:00<?, ?it/s]
  4%|▍         | 1/25 [00:00<00:03,  6.46it/s]
  8%|▊         | 2/25 [00:00<00:03,  6.44it/s]
 12%|█▏        | 3/25 [00:00<00:03,  6.46it/s]
 16%|█▌        | 4/25 [00:00<00:03,  6.45it/s]
 20%|██        | 5/25 [00:00<00:03,  6.44it/s]
 24%|██▍       | 6/25 [00:00<00:03,  5.70it/s]
 28%|██▊       | 7/25 [00:01<00:03,  5.33it/s]
 32%|███▏      | 8/25 [00:01<00:03,  5.27it/s]
 36%|███▌      | 9/25 [00:01<00:03,  5.09it/s]
 40%|████      | 10/25 [00:01<00:02,  5.08it/s]
 44%|████▍     | 11/25 [00:02<00:02,  4.98it/s]
 48%|████▊     | 12/25 [00:02<00:02,  5.07it/s]
 52%|█████▏    | 13/25 [00:02<00:02,  4.97it/s]
 56%|█████▌    | 14/25 [00:02<00:02,  4.78it/s]
 60%|██████    | 15/25 [00:02<00:02,  4.86it/s]
 64%|██████▍   | 16/25 [00:03<00:01,  4.80it/s]
 68%|██████▊   | 17/25 [00:03<00:01,  4.79it/s]
 72%|███████▏  | 18/25 [00:03<00:01,  4.94it/s]
 76%|███████▌  | 19/25 [00:03<00:01,  4.89it/s]
 80%|████████  | 20/25 [00:03<00:01,  4.98it/s]
 84%|████████▍ | 21/25 [00:04<00:00,  4.73it/s]
 88%|████████▊ | 22/25 [00:04<00:00,  4.59it/s]
 92%|█████████▏| 23/25 [00:04<00:00,  4.58it/s]
 96%|█████████▌| 24/25 [00:04<00:00,  4.48it/s]
100%|██████████| 25/25 [00:05<00:00,  4.48it/s]
100%|██████████| 25/25 [00:05<00:00,  5.00it/s]
  0%|          | 0/25 [00:00<?, ?it/s]
  4%|▍         | 1/25 [00:00<00:03,  6.51it/s]
  8%|▊         | 2/25 [00:00<00:03,  6.48it/s]
 12%|█▏        | 3/25 [00:00<00:03,  6.44it/s]
 16%|█▌        | 4/25 [00:00<00:03,  6.42it/s]
 20%|██        | 5/25 [00:00<00:03,  6.42it/s]
 24%|██▍       | 6/25 [00:00<00:03,  5.99it/s]
 28%|██▊       | 7/25 [00:01<00:03,  5.59it/s]
 32%|███▏      | 8/25 [00:01<00:03,  5.15it/s]
 36%|███▌      | 9/25 [00:01<00:03,  5.12it/s]
 40%|████      | 10/25 [00:01<00:02,  5.13it/s]
 44%|████▍     | 11/25 [00:01<00:02,  5.06it/s]
 48%|████▊     | 12/25 [00:02<00:02,  4.99it/s]
 52%|█████▏    | 13/25 [00:02<00:02,  4.95it/s]
 56%|█████▌    | 14/25 [00:02<00:02,  4.94it/s]
 60%|██████    | 15/25 [00:02<00:02,  4.89it/s]
 64%|██████▍   | 16/25 [00:03<00:01,  4.82it/s]
 68%|██████▊   | 17/25 [00:03<00:01,  4.95it/s]
 72%|███████▏  | 18/25 [00:03<00:01,  4.89it/s]
 76%|███████▌  | 19/25 [00:03<00:01,  4.73it/s]
 80%|████████  | 20/25 [00:03<00:01,  4.67it/s]
 84%|████████▍ | 21/25 [00:04<00:00,  4.54it/s]
 88%|████████▊ | 22/25 [00:04<00:00,  4.44it/s]
 92%|█████████▏| 23/25 [00:04<00:00,  4.51it/s]
 96%|█████████▌| 24/25 [00:04<00:00,  4.44it/s]
100%|██████████| 25/25 [00:05<00:00,  4.39it/s]
100%|██████████| 25/25 [00:05<00:00,  4.97it/s]
  0%|          | 0/25 [00:00<?, ?it/s]
  4%|▍         | 1/25 [00:00<00:03,  6.52it/s]
  8%|▊         | 2/25 [00:00<00:03,  6.46it/s]
 12%|█▏        | 3/25 [00:00<00:03,  6.43it/s]
 16%|█▌        | 4/25 [00:00<00:03,  6.42it/s]
 20%|██        | 5/25 [00:00<00:03,  6.42it/s]
 24%|██▍       | 6/25 [00:00<00:03,  5.95it/s]
 28%|██▊       | 7/25 [00:01<00:03,  5.50it/s]
 32%|███▏      | 8/25 [00:01<00:03,  5.28it/s]
 36%|███▌      | 9/25 [00:01<00:02,  5.35it/s]
 40%|████      | 10/25 [00:01<00:02,  5.02it/s]
 44%|████▍     | 11/25 [00:02<00:02,  4.83it/s]
 48%|████▊     | 12/25 [00:02<00:02,  4.83it/s]
 52%|█████▏    | 13/25 [00:02<00:02,  4.89it/s]
 56%|█████▌    | 14/25 [00:02<00:02,  4.90it/s]
 60%|██████    | 15/25 [00:02<00:02,  4.77it/s]
 64%|██████▍   | 16/25 [00:03<00:01,  4.72it/s]
 68%|██████▊   | 17/25 [00:03<00:01,  4.94it/s]
 72%|███████▏  | 18/25 [00:03<00:01,  4.94it/s]
 76%|███████▌  | 19/25 [00:03<00:01,  4.78it/s]
 80%|████████  | 20/25 [00:03<00:01,  4.66it/s]
 84%|████████▍ | 21/25 [00:04<00:00,  4.62it/s]
 88%|████████▊ | 22/25 [00:04<00:00,  4.52it/s]
 92%|█████████▏| 23/25 [00:04<00:00,  4.42it/s]
 96%|█████████▌| 24/25 [00:04<00:00,  4.39it/s]
100%|██████████| 25/25 [00:05<00:00,  4.36it/s]
100%|██████████| 25/25 [00:05<00:00,  4.94it/s]
  0%|          | 0/25 [00:00<?, ?it/s]
  4%|▍         | 1/25 [00:00<00:03,  6.49it/s]
  8%|▊         | 2/25 [00:00<00:03,  6.43it/s]
 12%|█▏        | 3/25 [00:00<00:03,  6.42it/s]
 16%|█▌        | 4/25 [00:00<00:03,  6.42it/s]
 20%|██        | 5/25 [00:00<00:03,  6.42it/s]
 24%|██▍       | 6/25 [00:00<00:03,  5.68it/s]
 28%|██▊       | 7/25 [00:01<00:03,  5.58it/s]
 32%|███▏      | 8/25 [00:01<00:03,  5.31it/s]
 36%|███▌      | 9/25 [00:01<00:03,  5.07it/s]
 40%|████      | 10/25 [00:01<00:02,  5.06it/s]
 44%|████▍     | 11/25 [00:01<00:02,  5.14it/s]
 48%|████▊     | 12/25 [00:02<00:02,  5.06it/s]
 52%|█████▏    | 13/25 [00:02<00:02,  5.08it/s]
 56%|█████▌    | 14/25 [00:02<00:02,  5.08it/s]
 60%|██████    | 15/25 [00:02<00:02,  4.97it/s]
 64%|██████▍   | 16/25 [00:02<00:01,  4.97it/s]
 68%|██████▊   | 17/25 [00:03<00:01,  4.83it/s]
 72%|███████▏  | 18/25 [00:03<00:01,  4.72it/s]
 76%|███████▌  | 19/25 [00:03<00:01,  4.70it/s]
 80%|████████  | 20/25 [00:03<00:01,  4.76it/s]
 84%|████████▍ | 21/25 [00:04<00:00,  4.64it/s]
 88%|████████▊ | 22/25 [00:04<00:00,  4.60it/s]
 92%|█████████▏| 23/25 [00:04<00:00,  4.54it/s]
 96%|█████████▌| 24/25 [00:04<00:00,  4.49it/s]
100%|██████████| 25/25 [00:04<00:00,  4.59it/s]
100%|██████████| 25/25 [00:04<00:00,  5.03it/s]
  0%|          | 0/25 [00:00<?, ?it/s]
  4%|▍         | 1/25 [00:00<00:03,  6.48it/s]
  8%|▊         | 2/25 [00:00<00:03,  6.44it/s]
 12%|█▏        | 3/25 [00:00<00:03,  6.42it/s]
 16%|█▌        | 4/25 [00:00<00:03,  6.42it/s]
 20%|██        | 5/25 [00:00<00:03,  6.41it/s]
 24%|██▍       | 6/25 [00:00<00:03,  5.72it/s]
 28%|██▊       | 7/25 [00:01<00:03,  5.46it/s]
 32%|███▏      | 8/25 [00:01<00:03,  5.19it/s]
 36%|███▌      | 9/25 [00:01<00:03,  5.16it/s]
 40%|████      | 10/25 [00:01<00:02,  5.12it/s]
 44%|████▍     | 11/25 [00:01<00:02,  5.31it/s]
 48%|████▊     | 12/25 [00:02<00:02,  5.03it/s]
 52%|█████▏    | 13/25 [00:02<00:02,  5.00it/s]
 56%|█████▌    | 14/25 [00:02<00:02,  4.92it/s]
 60%|██████    | 15/25 [00:02<00:02,  4.98it/s]
 64%|██████▍   | 16/25 [00:03<00:01,  4.95it/s]
 68%|██████▊   | 17/25 [00:03<00:01,  4.84it/s]
 72%|███████▏  | 18/25 [00:03<00:01,  4.75it/s]
 76%|███████▌  | 19/25 [00:03<00:01,  4.83it/s]
 80%|████████  | 20/25 [00:03<00:01,  4.96it/s]
 84%|████████▍ | 21/25 [00:04<00:00,  4.75it/s]
 88%|████████▊ | 22/25 [00:04<00:00,  4.58it/s]
 92%|█████████▏| 23/25 [00:04<00:00,  4.55it/s]
 96%|█████████▌| 24/25 [00:04<00:00,  4.52it/s]
100%|██████████| 25/25 [00:04<00:00,  4.47it/s]
100%|██████████| 25/25 [00:04<00:00,  5.02it/s]
4 [[<PIL.Image.Image image mode=RGB size=788x788 at 0x7F85B45EB850>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85B436F5D0>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A5916450>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A5916C10>]]
0 [[<PIL.Image.Image image mode=RGB size=788x788 at 0x7F85B45EB850>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85B436F5D0>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A5916450>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A5916C10>], [<PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A7D82950>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A575A6D0>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A7C18790>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A7C18BD0>]]
[[<PIL.Image.Image image mode=RGB size=788x788 at 0x7F85B45EB850>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85B436F5D0>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A5916450>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A5916C10>], [<PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A7D82950>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A575A6D0>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A7C18790>, <PIL.Image.Image image mode=RGB size=788x788 at 0x7F85A7C18BD0>]]
1 (7, 650)
0 (124, 721)
1 (56, 636)
0 (89, 717)

Examples

View more examples

Run time and cost

This model costs approximately $0.069 to run on Replicate, or 14 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 71 seconds. The predict time for this model varies significantly based on the inputs.

Readme

Demo Video

https://github.com/HVision-NKU/StoryDiffusion/assets/49511209/d5b80f8f-09b0-48cd-8b10-daff46d422af

🌠 Key Features:

StoryDiffusion can create a magic story by generating consistent images and videos. Our work mainly has two parts: 1. Consistent self-attention for character-consistent image generation over long-range sequences. It is hot-pluggable and compatible with all SD1.5 and SDXL-based image diffusion models. For the current implementation, the user needs to provide at least 3 text prompts for the consistent self-attention module. We recommend at least 5 - 6 text prompts for better layout arrangement. 2. Motion predictor for long-range video generation, which predicts motion between Condition Images in a compressed image semantic space, achieving larger motion prediction.

Disclaimer

This project strives to impact the domain of AI-driven image and video generation positively. Users are granted the freedom to create images and videos using this tool, but they are expected to comply with local laws and utilize it responsibly. The developers do not assume any responsibility for potential misuse by users.

BibTeX

If you find StoryDiffusion useful for your research and applications, please cite using this BibTeX:

@article{Zhou2024storydiffusion,
  title={StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation},
  author={Zhou, Yupeng and Zhou, Daquan and Cheng, Ming-Ming and Feng, Jiashi and Hou, Qibin},
  year={2024}
}