Input
Run this model in Node.js with one line of code:
npm install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
import Replicate from "replicate";
import fs from "node:fs/promises";
const replicate = new Replicate({
auth: process.env.REPLICATE_API_TOKEN,
});
Run zsxkib/diffbir using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
"zsxkib/diffbir:51ed1464d8bbbaca811153b051d3b09ab42f0bdeb85804ae26ba323d7a66a4ac",
{
input: {
seed: 231,
input: "https://replicate.delivery/pbxt/JgdmREudlAXBDFZnIvZjfgSxwxtNd3aHk7gXHScaLGFltLGe/0427.png",
steps: 50,
tiled: false,
tile_size: 512,
has_aligned: true,
tile_stride: 256,
repeat_times: 1,
use_guidance: false,
color_fix_type: "wavelet",
guidance_scale: 0,
guidance_space: "latent",
guidance_repeat: 5,
only_center_face: false,
guidance_time_stop: -1,
guidance_time_start: 1001,
background_upsampler: "RealESRGAN",
face_detection_model: "retinaface_resnet50",
upscaling_model_type: "faces",
restoration_model_type: "general_scenes",
super_resolution_factor: 1,
disable_preprocess_model: false,
reload_restoration_model: false,
background_upsampler_tile: 400,
background_upsampler_tile_stride: 400
}
}
);
// To access the file URL:
console.log(output[0].url()); //=> "http://example.com"
// To write the file to disk:
await fs.writeFile("my-image.png", output[0]);
To learn more, take a look at the guide on getting started with Node.js.
pip install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
import replicate
Run zsxkib/diffbir using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
"zsxkib/diffbir:51ed1464d8bbbaca811153b051d3b09ab42f0bdeb85804ae26ba323d7a66a4ac",
input={
"seed": 231,
"input": "https://replicate.delivery/pbxt/JgdmREudlAXBDFZnIvZjfgSxwxtNd3aHk7gXHScaLGFltLGe/0427.png",
"steps": 50,
"tiled": False,
"tile_size": 512,
"has_aligned": True,
"tile_stride": 256,
"repeat_times": 1,
"use_guidance": False,
"color_fix_type": "wavelet",
"guidance_scale": 0,
"guidance_space": "latent",
"guidance_repeat": 5,
"only_center_face": False,
"guidance_time_stop": -1,
"guidance_time_start": 1001,
"background_upsampler": "RealESRGAN",
"face_detection_model": "retinaface_resnet50",
"upscaling_model_type": "faces",
"restoration_model_type": "general_scenes",
"super_resolution_factor": 1,
"disable_preprocess_model": False,
"reload_restoration_model": False,
"background_upsampler_tile": 400,
"background_upsampler_tile_stride": 400
}
)
print(output)
To learn more, take a look at the guide on getting started with Python.
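The run() call blocks until the prediction finishes and returns the model's output, which for this model is a list containing the restored image. Below is a minimal sketch for saving the first result to disk; it assumes each output item is a URL string (newer versions of the client may return file-like objects instead, whose string form is still the URL).

import urllib.request

# `output` is the value returned by replicate.run(...) above.
# Assumption: output is a list and each item's str() is a downloadable URL.
result_url = str(output[0])
urllib.request.urlretrieve(result_url, "restored_face.png")
print(f"Saved {result_url} to restored_face.png")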
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Run zsxkib/diffbir using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
-H "Authorization: Bearer $REPLICATE_API_TOKEN" \
-H "Content-Type: application/json" \
-H "Prefer: wait" \
-d $'{
"version": "51ed1464d8bbbaca811153b051d3b09ab42f0bdeb85804ae26ba323d7a66a4ac",
"input": {
"seed": 231,
"input": "https://replicate.delivery/pbxt/JgdmREudlAXBDFZnIvZjfgSxwxtNd3aHk7gXHScaLGFltLGe/0427.png",
"steps": 50,
"tiled": false,
"tile_size": 512,
"has_aligned": true,
"tile_stride": 256,
"repeat_times": 1,
"use_guidance": false,
"color_fix_type": "wavelet",
"guidance_scale": 0,
"guidance_space": "latent",
"guidance_repeat": 5,
"only_center_face": false,
"guidance_time_stop": -1,
"guidance_time_start": 1001,
"background_upsampler": "RealESRGAN",
"face_detection_model": "retinaface_resnet50",
"upscaling_model_type": "faces",
"restoration_model_type": "general_scenes",
"super_resolution_factor": 1,
"disable_preprocess_model": false,
"reload_restoration_model": false,
"background_upsampler_tile": 400,
"background_upsampler_tile_stride": 400
}
}' \
https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
Output
{ "input": "https://replicate.delivery/pbxt/JgdmREudlAXBDFZnIvZjfgSxwxtNd3aHk7gXHScaLGFltLGe/0427.png", "outut": "https://replicate.delivery/pbxt/tjBj5e8QUiSAHaJhYwLUV2Sb5fmmp9VuIvfb6X4fG6UCHp1GB/tmpbr7p39dy0427.png" }
{
"completed_at": "2023-10-12T13:19:45.432677Z",
"created_at": "2023-10-12T13:17:41.439299Z",
"data_removed": false,
"error": null,
"id": "77euyklbgcyarhaczq7uwxulai",
"input": {
"seed": 231,
"input": "https://replicate.delivery/pbxt/JgdmREudlAXBDFZnIvZjfgSxwxtNd3aHk7gXHScaLGFltLGe/0427.png",
"steps": 50,
"tiled": false,
"tile_size": 512,
"has_aligned": true,
"tile_stride": 256,
"repeat_times": 1,
"use_guidance": false,
"color_fix_type": "wavelet",
"guidance_scale": 0,
"guidance_space": "latent",
"guidance_repeat": 5,
"only_center_face": false,
"guidance_time_stop": -1,
"guidance_time_start": 1001,
"background_upsampler": "RealESRGAN",
"face_detection_model": "retinaface_resnet50",
"upscaling_model_type": "faces",
"restoration_model_type": "general_scenes",
"super_resolution_factor": 1,
"disable_preprocess_model": false,
"reload_restoration_model": false,
"background_upsampler_tile": 400,
"background_upsampler_tile_stride": 400
},
"logs": "ckptckptckpt weights/face_full_v1.ckpt\nSwitching from mode 'FULL' to 'FACE'...\nBuilding and loading 'FACE' mode model...\nControlLDM: Running in eps-prediction mode\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. 
Query dim is 320, context_dim is 1024 and using 5 heads.\nDiffusionWrapper has 865.91 M params.\nmaking attention of type 'vanilla-xformers' with 512 in_channels\nbuilding MemoryEfficientAttnBlock with 512 in_channels...\nWorking with z of shape (1, 4, 32, 32) = 4096 dimensions.\nmaking attention of type 'vanilla-xformers' with 512 in_channels\nbuilding MemoryEfficientAttnBlock with 512 in_channels...\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]\nLoading model from: /root/.pyenv/versions/3.9.18/lib/python3.9/site-packages/lpips/weights/v0.1/alex.pth\nreload swinir model from weights/general_swinir_v1.ckpt\nENABLE XFORMERS!\nModel successfully switched to 'FACE' mode.\n{'bg_tile': 400,\n'bg_tile_stride': 400,\n'bg_upsampler': 'RealESRGAN',\n'ckpt': 'weights/face_full_v1.ckpt',\n'color_fix_type': 'wavelet',\n'config': 'configs/model/cldm.yaml',\n'detection_model': 'retinaface_resnet50',\n'device': 'cuda',\n'disable_preprocess_model': False,\n'g_repeat': 5,\n'g_scale': 0.0,\n'g_space': 'latent',\n 'g_t_start': 1001,\n 'g_t_stop': -1,\n 'has_aligned': True,\n'image_size': 512,\n'input': '/tmp/tmpbr7p39dy0427.png',\n 'only_center_face': False,\n 'output': '.',\n 'reload_swinir': False,\n'repeat_times': 1,\n 'seed': 231,\n 'show_lq': False,\n 'skip_if_exist': False,\n 'sr_scale': 1,\n'steps': 50,\n 'swinir_ckpt': 'weights/general_swinir_v1.ckpt',\n'tile_size': 512,\n'tile_stride': 256,\n 'tiled': False,\n 'use_guidance': False}\nGlobal seed set to 231\n/root/.pyenv/versions/3.9.18/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. 
The current behavior is equivalent to passing `weights=None`.\nwarnings.warn(msg)\nDownloading: \"https://github.com/xinntao/facexlib/releases/download/v0.1.0/detection_Resnet50_Final.pth\" to /root/.pyenv/versions/3.9.18/lib/python3.9/site-packages/facexlib/weights/detection_Resnet50_Final.pth\n 0%| | 0.00/104M [00:00<?, ?B/s]\n 4%|▎ | 3.81M/104M [00:00<00:02, 39.8MB/s]\n 8%|▊ | 8.60M/104M [00:00<00:02, 45.9MB/s]\n 14%|█▎ | 14.1M/104M [00:00<00:01, 51.3MB/s]\n 20%|█▉ | 20.6M/104M [00:00<00:01, 57.8MB/s]\n 27%|██▋ | 28.1M/104M [00:00<00:01, 65.5MB/s]\n 34%|███▍ | 35.7M/104M [00:00<00:01, 70.4MB/s]\n 43%|████▎ | 45.0M/104M [00:00<00:00, 79.3MB/s]\n 53%|█████▎ | 54.9M/104M [00:00<00:00, 86.8MB/s]\n 63%|██████▎ | 65.8M/104M [00:00<00:00, 95.7MB/s]\n 74%|███████▍ | 77.1M/104M [00:01<00:00, 103MB/s] \n 85%|████████▌ | 89.2M/104M [00:01<00:00, 110MB/s]\n 97%|█████████▋| 102M/104M [00:01<00:00, 116MB/s] \n100%|██████████| 104M/104M [00:01<00:00, 89.6MB/s]\nDownloading: \"https://github.com/xinntao/facexlib/releases/download/v0.2.2/parsing_parsenet.pth\" to /root/.pyenv/versions/3.9.18/lib/python3.9/site-packages/facexlib/weights/parsing_parsenet.pth\n 0%| | 0.00/81.4M [00:00<?, ?B/s]\n 5%|▌ | 4.19M/81.4M [00:00<00:01, 43.6MB/s]\n 13%|█▎ | 10.6M/81.4M [00:00<00:01, 57.4MB/s]\n 22%|██▏ | 18.3M/81.4M [00:00<00:00, 67.9MB/s]\n 36%|███▌ | 29.2M/81.4M [00:00<00:00, 86.4MB/s]\n 53%|█████▎ | 43.3M/81.4M [00:00<00:00, 108MB/s] \n 68%|██████▊ | 55.1M/81.4M [00:00<00:00, 114MB/s]\n 83%|████████▎ | 67.5M/81.4M [00:00<00:00, 119MB/s]\n100%|██████████| 81.4M/81.4M [00:00<00:00, 107MB/s]\nLoading RealESRGAN_x2plus.pth for background upsampling...\ntimesteps used in spaced sampler:\n[0, 20, 41, 61, 82, 102, 122, 143, 163, 183, 204, 224, 245, 265, 285, 306, 326, 347, 367, 387, 408, 428, 449, 469, 489, 510, 530, 550, 571, 591, 612, 632, 652, 673, 693, 714, 734, 754, 775, 795, 816, 836, 856, 877, 897, 917, 938, 958, 979, 999]\nSpaced Sampler: 0%| | 0/50 [00:00<?, ?it/s]\nSpaced Sampler: 2%|▏ | 1/50 [00:00<00:10, 4.78it/s]\nSpaced Sampler: 6%|▌ | 3/50 [00:00<00:05, 8.71it/s]\nSpaced Sampler: 10%|█ | 5/50 [00:00<00:04, 10.22it/s]\nSpaced Sampler: 14%|█▍ | 7/50 [00:00<00:03, 11.02it/s]\nSpaced Sampler: 18%|█▊ | 9/50 [00:00<00:03, 11.47it/s]\nSpaced Sampler: 22%|██▏ | 11/50 [00:01<00:03, 11.76it/s]\nSpaced Sampler: 26%|██▌ | 13/50 [00:01<00:03, 11.94it/s]\nSpaced Sampler: 30%|███ | 15/50 [00:01<00:02, 12.07it/s]\nSpaced Sampler: 34%|███▍ | 17/50 [00:01<00:02, 12.15it/s]\nSpaced Sampler: 38%|███▊ | 19/50 [00:01<00:02, 12.16it/s]\nSpaced Sampler: 42%|████▏ | 21/50 [00:01<00:02, 12.19it/s]\nSpaced Sampler: 46%|████▌ | 23/50 [00:02<00:02, 12.22it/s]\nSpaced Sampler: 50%|█████ | 25/50 [00:02<00:02, 12.23it/s]\nSpaced Sampler: 54%|█████▍ | 27/50 [00:02<00:01, 12.26it/s]\nSpaced Sampler: 58%|█████▊ | 29/50 [00:02<00:01, 12.27it/s]\nSpaced Sampler: 62%|██████▏ | 31/50 [00:02<00:01, 12.27it/s]\nSpaced Sampler: 66%|██████▌ | 33/50 [00:02<00:01, 12.27it/s]\nSpaced Sampler: 70%|███████ | 35/50 [00:02<00:01, 12.24it/s]\nSpaced Sampler: 74%|███████▍ | 37/50 [00:03<00:01, 12.14it/s]\nSpaced Sampler: 78%|███████▊ | 39/50 [00:03<00:00, 12.18it/s]\nSpaced Sampler: 82%|████████▏ | 41/50 [00:03<00:00, 12.19it/s]\nSpaced Sampler: 86%|████████▌ | 43/50 [00:03<00:00, 12.19it/s]\nSpaced Sampler: 90%|█████████ | 45/50 [00:03<00:00, 12.20it/s]\nSpaced Sampler: 94%|█████████▍| 47/50 [00:03<00:00, 12.17it/s]\nSpaced Sampler: 98%|█████████▊| 49/50 [00:04<00:00, 12.12it/s]\nSpaced Sampler: 100%|██████████| 50/50 [00:04<00:00, 11.86it/s]\nFace image 
tmpbr7p39dy0427 saved to ./..",
"metrics": {
"predict_time": 36.889872,
"total_time": 123.993378
},
"output": [
"https://replicate.delivery/pbxt/tjBj5e8QUiSAHaJhYwLUV2Sb5fmmp9VuIvfb6X4fG6UCHp1GB/tmpbr7p39dy0427.png"
],
"started_at": "2023-10-12T13:19:08.542805Z",
"status": "succeeded",
"urls": {
"get": "https://api.replicate.com/v1/predictions/77euyklbgcyarhaczq7uwxulai",
"cancel": "https://api.replicate.com/v1/predictions/77euyklbgcyarhaczq7uwxulai/cancel"
},
"version": "51ed1464d8bbbaca811153b051d3b09ab42f0bdeb85804ae26ba323d7a66a4ac"
}
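The urls.get endpoint in the example response above can be polled until the prediction's status becomes succeeded (or failed or canceled), which is useful when you create predictions without the Prefer: wait header or the model runs longer than the wait window. A minimal polling sketch in Python, assuming the third-party requests package is installed and reusing the prediction ID from the example response:

import os
import time
import requests

prediction_url = "https://api.replicate.com/v1/predictions/77euyklbgcyarhaczq7uwxulai"
headers = {"Authorization": f"Bearer {os.environ['REPLICATE_API_TOKEN']}"}

while True:
    prediction = requests.get(prediction_url, headers=headers).json()
    # Terminal states for a Replicate prediction
    if prediction["status"] in ("succeeded", "failed", "canceled"):
        break
    time.sleep(2)  # wait briefly between polls

print(prediction["status"], prediction.get("output"))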