adirik/syncdiffusion – Run with an API on Replicate

adirik / syncdiffusion

Generate panoramic images with text prompts

121 runs
L40S

Input

prompt

*string

natural landscape in anime style illustration

Prompt to generate from

negative_prompt

string

Prompt for negative conditioning

Default: ""

loop_closure

boolean

Use loop closure. Enable it to simulate a 360 degree panorama

Default: false

width

integer

(minimum: 512, maximum: 3072)

Width of the output image. Width must be divisible by the stride multiplied by 8

Default: 2048

height

integer

(minimum: 512, maximum: 3072)

Height of the output image. Height must be divisible by the stride multiplied by 8

Default: 512

stride

integer

(minimum: 8, maximum: 64)

Window stride for the diffusion. Must be divisible by 8

Default: 16

guidance_scale

number

(minimum: 0, maximum: 20)

Scale of the guidance image

Default: 7.5

sync_weight

number

(minimum: 0, maximum: 30)

Weight of the SyncDiffusion

Default: 20

sync_decay_rate

number

(minimum: 0, maximum: 1)

SyncDiffusion weight scheduler decay rate

Default: 0.99

sync_freq

integer

Frequency for the SyncDiffusion

Default: 1

sync_threshold

integer

(minimum: 0, maximum: 50)

Maximum number of steps applied with SyncDiffusion

Default: 5

num_inference_steps

integer

(minimum: 1, maximum: 200)

Number of inference steps for the diffusion

Default: 50

seed

integer

(minimum: 0, maximum: 65535)

Seed for the SyncDiffusion. Leave blank to randomize the seed
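
The size constraints listed above can be checked locally before a request is sent. The following is a minimal sketch, not part of the model or of Replicate's client: `validate_size` is a hypothetical helper, and it assumes that the "divisible by the stride multiplied by 8" rule applies to both width and height.

```python
def validate_size(width: int, height: int, stride: int) -> list[str]:
    """Return a list of problems; an empty list means the values pass these checks."""
    problems = []

    # stride: documented range is 8..64 and it must be divisible by 8.
    stride_ok = 8 <= stride <= 64 and stride % 8 == 0
    if not stride_ok:
        problems.append(f"stride={stride} must be in 8..64 and divisible by 8")

    for name, value in (("width", width), ("height", height)):
        # width/height: documented range is 512..3072.
        if not 512 <= value <= 3072:
            problems.append(f"{name}={value} is outside 512..3072")
        # Assumption: each dimension must be a multiple of stride * 8.
        # Only checked when the stride itself is valid.
        elif stride_ok and value % (stride * 8) != 0:
            problems.append(f"{name}={value} is not divisible by stride*8={stride * 8}")

    return problems


if __name__ == "__main__":
    print(validate_size(2048, 512, 16))   # the documented defaults
    print(validate_size(512, 2048, 16))   # a vertical panorama
    print(validate_size(2048, 600, 16))   # 600 is not a multiple of 128
```

Under these assumptions the default 2048x512 request and its vertical counterpart 512x2048 both pass, while a height of 600 with stride 16 is reported as a problem.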

Run this model in Node.js with one line of code:

npx create-replicate --model=adirik/syncdiffusion

or set up a project from scratch

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run adirik/syncdiffusion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "adirik/syncdiffusion:f430b8f8c80d9b6edc79e659adcb1f7d270127db3e9f1dc47e6be3a5205e59eb",
  {
    input: {
      seed: 2,
      width: 2048,
      height: 512,
      prompt: "natural landscape in anime style illustration",
      stride: 16,
      sync_freq: 1,
      sync_weight: 20,
      loop_closure: false,
      guidance_scale: 7.5,
      sync_threshold: 5,
      negative_prompt: "",
      sync_decay_rate: 0.99,
      num_inference_steps: 50
    }
  }
);

// To access the file URL:
console.log(output.url()); //=> "http://example.com"

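// To write the file to disk. This uses the promise-based fs API, so add
// `import { writeFile } from "node:fs/promises";` next to the other import.
await writeFile("my-image.png", output);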

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run adirik/syncdiffusion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "adirik/syncdiffusion:f430b8f8c80d9b6edc79e659adcb1f7d270127db3e9f1dc47e6be3a5205e59eb",
    input={
        "seed": 2,
        "width": 2048,
        "height": 512,
        "prompt": "natural landscape in anime style illustration",
        "stride": 16,
        "sync_freq": 1,
        "sync_weight": 20,
        "loop_closure": False,
        "guidance_scale": 7.5,
        "sync_threshold": 5,
        "negative_prompt": "",
        "sync_decay_rate": 0.99,
        "num_inference_steps": 50
    }
)
print(output)

To learn more, take a look at the guide on getting started with Python.
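
The Python snippet above only prints the result. A small sketch of saving the image to disk follows. It assumes that, depending on the installed client version, `output` is either a file-like object exposing `read()` (as in recent releases of the `replicate` package) or a plain URL string (as in older releases); `save_output` is a hypothetical helper name, not part of the client.

```python
import urllib.request
from pathlib import Path


def save_output(output, path: str) -> Path:
    """Write a prediction output to `path` and return the path."""
    target = Path(path)
    if hasattr(output, "read"):
        # Assumption: file-like output object (newer client versions).
        data = output.read()
    else:
        # Assumption: the output is a URL string (older client versions).
        with urllib.request.urlopen(str(output)) as response:
            data = response.read()
    target.write_bytes(data)
    return target
```

After a successful `replicate.run(...)` call, `save_output(output, "panorama.png")` would write the PNG next to the script.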

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run adirik/syncdiffusion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "adirik/syncdiffusion:f430b8f8c80d9b6edc79e659adcb1f7d270127db3e9f1dc47e6be3a5205e59eb",
    "input": {
      "seed": 2,
      "width": 2048,
      "height": 512,
      "prompt": "natural landscape in anime style illustration",
      "stride": 16,
      "sync_freq": 1,
      "sync_weight": 20,
      "loop_closure": false,
      "guidance_scale": 7.5,
      "sync_threshold": 5,
      "negative_prompt": "",
      "sync_decay_rate": 0.99,
      "num_inference_steps": 50
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
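
The `Prefer: wait` header in the request above holds the connection open only for a limited time (Replicate's HTTP documentation gave 60 seconds as the upper bound at the time of writing; treat that figure as an assumption to verify). The example prediction on this page took longer than that, so the response may come back with a non-final status. A minimal polling sketch in Python follows: `wait_for_prediction` and `fetch_prediction` are hypothetical helper names, and the set of final statuses is taken from Replicate's API documentation as best recalled.

```python
import json
import time
import urllib.request

# Statuses after which a prediction no longer changes (assumed from the API docs).
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def fetch_prediction(get_url: str, token: str) -> dict:
    """GET the prediction JSON from its `urls.get` endpoint."""
    request = urllib.request.Request(
        get_url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_prediction(get_url, token, fetch=fetch_prediction,
                        interval=2.0, timeout=600.0) -> dict:
    """Poll until the prediction reaches a final status or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        prediction = fetch(get_url, token)
        if prediction["status"] in TERMINAL_STATUSES:
            return prediction
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction still {prediction['status']!r}")
        time.sleep(interval)
```

The `get_url` argument is the `urls.get` value from the JSON returned by the initial POST request. The `fetch` parameter exists only so the loop can be exercised without network access.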

Output

{
  "completed_at": "2024-01-29T10:12:49.546770Z",
  "created_at": "2024-01-29T10:09:30.073441Z",
  "data_removed": false,
  "error": null,
  "id": "zpkqvwtbaiz7imbdnasc43uoda",
  "input": {
    "seed": 2,
    "width": 2048,
    "height": 512,
    "prompt": "natural landscape in anime style illustration",
    "stride": 16,
    "sync_freq": 1,
    "sync_weight": 20,
    "loop_closure": false,
    "guidance_scale": 7.5,
    "sync_threshold": 5,
    "negative_prompt": "",
    "sync_decay_rate": 0.99,
    "num_inference_steps": 50
  },
  "logs": "[INFO] number of views to process: 13\n/src/syncdiffusion/syncdiffusion_model.py:150: FutureWarning: Accessing config attribute `in_channels` directly via 'UNet2DConditionModel' object attribute is deprecated. Please access 'in_channels' over 'UNet2DConditionModel's config object instead, e.g. 'unet.config.in_channels'.\nlatent = torch.randn((1, self.unet.in_channels, height // 8, width // 8))\n[INFO] using exponential decay scheduler with decay rate 0.99\n  0%|          | 0/50 [00:00<?, ?it/s]\n  2%|▏         | 1/50 [00:06<05:04,  6.21s/it]\n  4%|▍         | 2/50 [00:11<04:34,  5.72s/it]\n  6%|▌         | 3/50 [00:16<04:21,  5.56s/it]\n  8%|▊         | 4/50 [00:22<04:12,  5.48s/it]\n 10%|█         | 5/50 [00:27<04:04,  5.44s/it]\n 12%|█▏        | 6/50 [00:28<02:52,  3.91s/it]\n 14%|█▍        | 7/50 [00:29<02:06,  2.94s/it]\n 16%|█▌        | 8/50 [00:30<01:36,  2.30s/it]\n 18%|█▊        | 9/50 [00:31<01:16,  1.88s/it]\n 20%|██        | 10/50 [00:32<01:03,  1.59s/it]\n 22%|██▏       | 11/50 [00:33<00:54,  1.39s/it]\n 24%|██▍       | 12/50 [00:34<00:47,  1.25s/it]\n 26%|██▌       | 13/50 [00:35<00:42,  1.16s/it]\n 28%|██▊       | 14/50 [00:36<00:39,  1.09s/it]\n 30%|███       | 15/50 [00:37<00:36,  1.05s/it]\n 32%|███▏      | 16/50 [00:38<00:34,  1.02s/it]\n 34%|███▍      | 17/50 [00:38<00:32,  1.01it/s]\n 36%|███▌      | 18/50 [00:39<00:31,  1.02it/s]\n 38%|███▊      | 19/50 [00:40<00:29,  1.04it/s]\n 40%|████      | 20/50 [00:41<00:28,  1.04it/s]\n 42%|████▏     | 21/50 [00:42<00:27,  1.05it/s]\n 44%|████▍     | 22/50 [00:43<00:26,  1.05it/s]\n 46%|████▌     | 23/50 [00:44<00:25,  1.05it/s]\n 48%|████▊     | 24/50 [00:45<00:24,  1.06it/s]\n 50%|█████     | 25/50 [00:46<00:23,  1.06it/s]\n 52%|█████▏    | 26/50 [00:47<00:22,  1.06it/s]\n 54%|█████▍    | 27/50 [00:48<00:21,  1.06it/s]\n 56%|█████▌    | 28/50 [00:49<00:20,  1.06it/s]\n 58%|█████▊    | 29/50 [00:50<00:19,  1.06it/s]\n 60%|██████    | 30/50 [00:51<00:18,  1.06it/s]\n 62%|██████▏   | 31/50 [00:52<00:17,  1.06it/s]\n 64%|██████▍   | 32/50 [00:53<00:16,  1.06it/s]\n 66%|██████▌   | 33/50 [00:54<00:16,  1.06it/s]\n 68%|██████▊   | 34/50 [00:54<00:15,  1.06it/s]\n 70%|███████   | 35/50 [00:55<00:14,  1.06it/s]\n 72%|███████▏  | 36/50 [00:56<00:13,  1.06it/s]\n 74%|███████▍  | 37/50 [00:57<00:12,  1.06it/s]\n 76%|███████▌  | 38/50 [00:58<00:11,  1.06it/s]\n 78%|███████▊  | 39/50 [00:59<00:10,  1.06it/s]\n 80%|████████  | 40/50 [01:00<00:09,  1.06it/s]\n 82%|████████▏ | 41/50 [01:01<00:08,  1.06it/s]\n 84%|████████▍ | 42/50 [01:02<00:07,  1.06it/s]\n 86%|████████▌ | 43/50 [01:03<00:06,  1.06it/s]\n 88%|████████▊ | 44/50 [01:04<00:05,  1.06it/s]\n 90%|█████████ | 45/50 [01:05<00:04,  1.06it/s]\n 92%|█████████▏| 46/50 [01:06<00:03,  1.06it/s]\n 94%|█████████▍| 47/50 [01:07<00:02,  1.06it/s]\n 96%|█████████▌| 48/50 [01:08<00:01,  1.06it/s]\n 98%|█████████▊| 49/50 [01:09<00:00,  1.06it/s]\n100%|██████████| 50/50 [01:10<00:00,  1.06it/s]\n100%|██████████| 50/50 [01:10<00:00,  1.40s/it]\n[INFO] Done!",
  "metrics": {
    "predict_time": 72.613723,
    "total_time": 199.473329
  },
  "output": "https://replicate.delivery/pbxt/EArYhWvUBobJPtU3oHeRkx92iiSvH4iejFvZ1Nq55e5BhlikA/output.png",
  "started_at": "2024-01-29T10:11:36.933047Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/zpkqvwtbaiz7imbdnasc43uoda",
    "cancel": "https://api.replicate.com/v1/predictions/zpkqvwtbaiz7imbdnasc43uoda/cancel"
  },
  "version": "a2c97d1c34b88c075e38899a38a371e2016917ed358f4b8618b32101f4897a1d"
}
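
The three timestamps in this response account for both values under `metrics`. As a worked example, the sketch below recomputes them from `created_at`, `started_at` and `completed_at`; `timing_breakdown` is a hypothetical helper, and reading the gap before `started_at` as queueing or model start-up time is an interpretation rather than something the response states.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' on any Python 3.7+."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def timing_breakdown(prediction: dict) -> dict:
    """Split the total time into the wait before start and the run itself."""
    created = parse_ts(prediction["created_at"])
    started = parse_ts(prediction["started_at"])
    completed = parse_ts(prediction["completed_at"])
    return {
        "wait_before_start": (started - created).total_seconds(),
        "predict": (completed - started).total_seconds(),
        "total": (completed - created).total_seconds(),
    }


if __name__ == "__main__":
    example = {
        "created_at": "2024-01-29T10:09:30.073441Z",
        "started_at": "2024-01-29T10:11:36.933047Z",
        "completed_at": "2024-01-29T10:12:49.546770Z",
    }
    for name, seconds in timing_breakdown(example).items():
        print(f"{name}: {seconds:.6f} s")
```

For the response above this gives about 126.86 s before the run started, 72.61 s of prediction time (matching `predict_time`) and 199.47 s overall (matching `total_time`).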

Generated in 72.6 seconds

This output was created using a different version of the model, adirik/syncdiffusion:a2c97d1c.

Run time and cost

This model costs approximately $0.19 to run on Replicate, or 5 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 4 minutes.

Readme

SyncDiffusion

SyncDiffusion builds on Stable Diffusion 2.0 to generate seamless panoramas. Unlike conventional methods, which often produce disjointed montages that blend different scenes inappropriately, SyncDiffusion synchronizes the generation process through gradient descent on a perceptual similarity loss. The resulting panoramas are coherent and visually consistent while remaining faithful to the input prompt.

For further information and technical details, you can refer to the original project page, paper and repository.

How to use the API

To use SyncDiffusion, you need to provide a text prompt. You can create horizontal or vertical panoramic images. The output file is in .png format. The API input arguments are as follows:

prompt: Provide a descriptive prompt for the image you want to generate. This is the primary driver of the content in the generated panorama.
negative_prompt: Use this to specify what you don’t want in the image, helping to refine the results.
width: Set the width of the output image.
height: Set the height of the output image.
guidance_scale: Sets the guidance scale. Higher values lead to closer adherence to the prompt.
sync_weight: Determines the weight of the sync diffusion in the image generation process.
sync_decay_rate: Sets the decay rate of the SyncDiffusion weight scheduler.
sync_freq: Specifies the frequency for the gradient descent of the sync diffusion process.
sync_threshold: Defines the maximum number of steps for the sync diffusion.
num_inference_steps: Sets the number of inference steps for the diffusion process.
stride: Determines the window stride in the latent space for the diffusion.
seed: Provides a seed for the sync diffusion to control randomness.
loop_closure: Enable or disable the use of loop closure in the panorama image generation.
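
To make the interaction of `sync_weight`, `sync_decay_rate`, `sync_freq` and `sync_threshold` more concrete, here is a sketch of one plausible per-step weight schedule. It is an assumption, not the model's confirmed implementation: it supposes an exponential decay of the form `sync_weight * sync_decay_rate ** step` (the logs on this page only say that an exponential decay scheduler is used), applied every `sync_freq` steps and only while `step < sync_threshold`. `sync_schedule` is a hypothetical name.

```python
def sync_schedule(sync_weight: float, sync_decay_rate: float, sync_freq: int,
                  sync_threshold: int, num_inference_steps: int) -> list[float]:
    """Return the assumed synchronization weight for each inference step."""
    if sync_freq < 1:
        raise ValueError("sync_freq must be at least 1")
    schedule = []
    for step in range(num_inference_steps):
        # Assumption: synchronization runs only in the first `sync_threshold`
        # steps, and only on every `sync_freq`-th step.
        active = step < sync_threshold and step % sync_freq == 0
        # Assumption: exponential decay of the initial weight with the step index.
        weight = sync_weight * sync_decay_rate ** step if active else 0.0
        schedule.append(weight)
    return schedule


if __name__ == "__main__":
    weights = sync_schedule(20, 0.99, 1, 5, 50)
    print([round(w, 4) for w in weights[:7]])
```

With the default inputs, only the first five of the fifty steps receive a non-zero weight under this assumption, which is consistent with the first five steps in the example logs being much slower than the rest.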

Important Notes

SyncDiffusion is applied in the early inference steps, as determined by “sync_threshold.” Consequently, the estimated duration for generation shown in the logs (at the beginning) is significantly longer than the actual duration. This discrepancy arises because SyncDiffusion steps are more computationally intensive compared to normal diffusion steps. More SyncDiffusion steps lead to more consistent images, but they also result in longer processing times.
Generating square images (e.g., 2048x2048) with SyncDiffusion takes substantially longer compared to creating vertical or horizontal panoramic images.
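
Both notes can be illustrated with a little arithmetic. The sketch below rests on assumptions: it supposes that the panorama is covered by sliding windows of 64 latent units (512 pixels divided by 8) moved by `stride` latent units, a choice made because it reproduces the "number of views to process: 13" line in the example logs for a 2048x512 image with stride 16. The per-step durations are read off the same logs (about 5.44 s per synchronized step, about 1.06 steps per second afterwards). `count_views` and `estimate_seconds` are hypothetical helpers, and the estimate ignores `sync_freq`.

```python
LATENT_WINDOW = 64  # assumption: 512-pixel window expressed in latent units


def count_views(width: int, height: int, stride: int,
                window: int = LATENT_WINDOW) -> int:
    """Number of sliding windows needed to cover the latent image."""
    latent_w, latent_h = width // 8, height // 8
    cols = (latent_w - window) // stride + 1
    rows = (latent_h - window) // stride + 1
    return rows * cols


def estimate_seconds(num_inference_steps: int, sync_threshold: int,
                     sync_step_s: float, plain_step_s: float) -> float:
    """Rough run time: slow synchronized steps first, plain steps afterwards."""
    sync_steps = min(sync_threshold, num_inference_steps)
    plain_steps = num_inference_steps - sync_steps
    return sync_steps * sync_step_s + plain_steps * plain_step_s


if __name__ == "__main__":
    print(count_views(2048, 512, 16))    # horizontal panorama
    print(count_views(2048, 2048, 16))   # square image
    print(round(estimate_seconds(50, 5, 5.44, 1 / 1.06), 1))
```

Under these assumptions a 2048x512 panorama needs 13 views while a 2048x2048 image needs 13 x 13 = 169, which would explain the much longer run time for square images. The time estimate comes to roughly 70 seconds for the example run, close to the 1:10 shown at the end of the progress bar and far below the 5:04 projected after the first step.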

References

@article{lee2023syncdiffusion,
  title={SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions},
  author={Yuseung Lee and Kunho Kim and Hyunjin Kim and Minhyuk Sung},
  journal={arXiv preprint arXiv:2306.05178},
  year={2023}
}