astramlco/diffbir | Run with an API on Replicate

Cold

DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior

Public

515 runs

Run with an API

Playground API Examples README Versions

Input

input

*file

Path to the input image you want to enhance.

upscaling_model_type

string

Choose the type of model best suited for the primary content of the image: 'faces' for portraits and 'general_scenes' for everything else.

Default: "general_scenes"

restoration_model_type

string

Select the restoration model that aligns with the content of your image. This model is responsible for image restoration which removes degradations.

Default: "general_scenes"

reload_restoration_model

boolean

Reload the image restoration model (SwinIR) if set to True. This can be useful if you've updated or changed the underlying SwinIR model.

Default: false

steps

integer

(minimum: 1, maximum: 100)

The number of enhancement iterations to perform. More steps might result in a clearer image but can also introduce artifacts.

Default: 50

super_resolution_factor

integer

(minimum: 1, maximum: 4)

Factor by which the input image resolution should be increased. For instance, a factor of 4 will make the resolution 4 times greater in both height and width.

Default: 4

repeat_times

integer

(minimum: 1, maximum: 10)

Number of times the enhancement process is repeated by feeding the output back as input. This can refine the result but might also introduce over-enhancement issues.

Default: 1

disable_preprocess_model

boolean

Disables the initial preprocessing step using SwinIR. Turn this off if your input image is already of high quality and doesn't require restoration.

Default: false

tiled

boolean

Whether to use patch-based sampling. This can be useful for very large images to enhance them in smaller chunks rather than all at once.

Default: false

tile_size

integer

Size of each tile (or patch) when 'tiled' option is enabled. Determines how the image is divided during patch-based enhancement.

Default: 512

tile_stride

integer

Distance between the start of each tile when the image is divided for patch-based enhancement. A smaller stride means more overlap between tiles.

Default: 256

use_guidance

boolean

Use latent image guidance for enhancement. This can help in achieving more accurate and contextually relevant enhancements.

Default: false

guidance_scale

number

For 'general_scenes': Scale factor for the guidance mechanism. Adjusts the influence of guidance on the enhancement process.

Default: 0

guidance_time_start

integer

For 'general_scenes': Specifies when (at which step) the guidance mechanism starts influencing the enhancement.

Default: 1001

guidance_time_stop

integer

For 'general_scenes': Specifies when (at which step) the guidance mechanism stops influencing the enhancement.

Default: -1

guidance_space

string

For 'general_scenes': Determines in which space (RGB or latent) the guidance operates. 'latent' can often provide more subtle and context-aware enhancements.

Default: "latent"

guidance_repeat

integer

For 'general_scenes': Number of times the guidance process is repeated during enhancement.

Default: 5

color_fix_type

string

Method used for color correction post enhancement. 'wavelet' and 'adain' offer different styles of color correction, while 'none' skips this step.

Default: "wavelet"

seed

integer

Random seed to ensure reproducibility. Setting this ensures that multiple runs with the same input produce the same output.

Default: 231

has_aligned

boolean

For 'faces' mode: Indicates if the input images are already cropped and aligned to faces. If not, the model will attempt to do this.

Default: false

only_center_face

boolean

For 'faces' mode: If multiple faces are detected, only enhance the center-most face in the image.

Default: false

face_detection_model

string

For 'faces' mode: Model used for detecting faces in the image. Choose based on accuracy and speed preferences.

Default: "retinaface_resnet50"

background_upsampler

string

For 'faces' mode: Model used to upscale the background in images where the primary subject is a face.

Default: "RealESRGAN"

background_upsampler_tile

integer

For 'faces' mode: Size of each tile used by the background upsampler when dividing the image into patches.

Default: 400

background_upsampler_tile_stride

integer

For 'faces' mode: Distance between the start of each tile when the background is divided for upscaling. A smaller stride means more overlap between tiles.

Default: 400

Run this model in Node.js with one line of code:

npx create-replicate --model=astramlco/diffbir

or set up a project from scratch

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run astramlco/diffbir using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "astramlco/diffbir:f7a6e7832fee8d2593be566723295b80ed14b424f8365f8647e19775f617e205",
  {
    input: {
      seed: 231,
      input: "https://replicate.delivery/pbxt/Ketx7Rc2HPkgBgsBAmZ14gWWKc6aiKsA783giEEf4qiHDtOb/C.L.A.I.R.E._everhart_s.jpg",
      steps: 50,
      tiled: false,
      tile_size: 512,
      has_aligned: false,
      tile_stride: 256,
      repeat_times: 1,
      use_guidance: false,
      color_fix_type: "wavelet",
      guidance_scale: 0,
      guidance_space: "latent",
      guidance_repeat: 5,
      only_center_face: false,
      guidance_time_stop: -1,
      guidance_time_start: 1001,
      background_upsampler: "RealESRGAN",
      face_detection_model: "retinaface_resnet50",
      upscaling_model_type: "faces",
      restoration_model_type: "faces",
      super_resolution_factor: 2,
      disable_preprocess_model: false,
      reload_restoration_model: false,
      background_upsampler_tile: 400,
      background_upsampler_tile_stride: 400
    }
  }
);

// To access the file URL:
console.log(output[0].url()); //=> "http://example.com"

// To write the file to disk:
fs.writeFile("my-image.png", output[0]);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run astramlco/diffbir using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "astramlco/diffbir:f7a6e7832fee8d2593be566723295b80ed14b424f8365f8647e19775f617e205",
    input={
        "seed": 231,
        "input": "https://replicate.delivery/pbxt/Ketx7Rc2HPkgBgsBAmZ14gWWKc6aiKsA783giEEf4qiHDtOb/C.L.A.I.R.E._everhart_s.jpg",
        "steps": 50,
        "tiled": False,
        "tile_size": 512,
        "has_aligned": False,
        "tile_stride": 256,
        "repeat_times": 1,
        "use_guidance": False,
        "color_fix_type": "wavelet",
        "guidance_scale": 0,
        "guidance_space": "latent",
        "guidance_repeat": 5,
        "only_center_face": False,
        "guidance_time_stop": -1,
        "guidance_time_start": 1001,
        "background_upsampler": "RealESRGAN",
        "face_detection_model": "retinaface_resnet50",
        "upscaling_model_type": "faces",
        "restoration_model_type": "faces",
        "super_resolution_factor": 2,
        "disable_preprocess_model": False,
        "reload_restoration_model": False,
        "background_upsampler_tile": 400,
        "background_upsampler_tile_stride": 400
    }
)

# To access the file URL:
print(output[0].url())
#=> "http://example.com"

# To write the file to disk:
with open("my-image.png", "wb") as file:
    file.write(output[0].read())

To learn more, take a look at the guide on getting started with Python.

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run astramlco/diffbir using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "astramlco/diffbir:f7a6e7832fee8d2593be566723295b80ed14b424f8365f8647e19775f617e205",
    "input": {
      "seed": 231,
      "input": "https://replicate.delivery/pbxt/Ketx7Rc2HPkgBgsBAmZ14gWWKc6aiKsA783giEEf4qiHDtOb/C.L.A.I.R.E._everhart_s.jpg",
      "steps": 50,
      "tiled": false,
      "tile_size": 512,
      "has_aligned": false,
      "tile_stride": 256,
      "repeat_times": 1,
      "use_guidance": false,
      "color_fix_type": "wavelet",
      "guidance_scale": 0,
      "guidance_space": "latent",
      "guidance_repeat": 5,
      "only_center_face": false,
      "guidance_time_stop": -1,
      "guidance_time_start": 1001,
      "background_upsampler": "RealESRGAN",
      "face_detection_model": "retinaface_resnet50",
      "upscaling_model_type": "faces",
      "restoration_model_type": "faces",
      "super_resolution_factor": 2,
      "disable_preprocess_model": false,
      "reload_restoration_model": false,
      "background_upsampler_tile": 400,
      "background_upsampler_tile_stride": 400
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.

You can run this model locally using Cog. First, install Cog:

brew install cog

If you don’t have Homebrew, there are other installation options available.

Run this to download the model and run it in your local environment:

cog predict r8.im/astramlco/diffbir@sha256:f7a6e7832fee8d2593be566723295b80ed14b424f8365f8647e19775f617e205 \
  -i 'seed=231' \
  -i 'input="https://replicate.delivery/pbxt/Ketx7Rc2HPkgBgsBAmZ14gWWKc6aiKsA783giEEf4qiHDtOb/C.L.A.I.R.E._everhart_s.jpg"' \
  -i 'steps=50' \
  -i 'tiled=false' \
  -i 'tile_size=512' \
  -i 'has_aligned=false' \
  -i 'tile_stride=256' \
  -i 'repeat_times=1' \
  -i 'use_guidance=false' \
  -i 'color_fix_type="wavelet"' \
  -i 'guidance_scale=0' \
  -i 'guidance_space="latent"' \
  -i 'guidance_repeat=5' \
  -i 'only_center_face=false' \
  -i 'guidance_time_stop=-1' \
  -i 'guidance_time_start=1001' \
  -i 'background_upsampler="RealESRGAN"' \
  -i 'face_detection_model="retinaface_resnet50"' \
  -i 'upscaling_model_type="faces"' \
  -i 'restoration_model_type="faces"' \
  -i 'super_resolution_factor=2' \
  -i 'disable_preprocess_model=false' \
  -i 'reload_restoration_model=false' \
  -i 'background_upsampler_tile=400' \
  -i 'background_upsampler_tile_stride=400'

To learn more, take a look at the Cog documentation.

Run this to download the model and run it in your local environment:

docker run -d -p 5000:5000 --gpus=all r8.im/astramlco/diffbir@sha256:f7a6e7832fee8d2593be566723295b80ed14b424f8365f8647e19775f617e205
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d $'{
    "input": {
      "seed": 231,
      "input": "https://replicate.delivery/pbxt/Ketx7Rc2HPkgBgsBAmZ14gWWKc6aiKsA783giEEf4qiHDtOb/C.L.A.I.R.E._everhart_s.jpg",
      "steps": 50,
      "tiled": false,
      "tile_size": 512,
      "has_aligned": false,
      "tile_stride": 256,
      "repeat_times": 1,
      "use_guidance": false,
      "color_fix_type": "wavelet",
      "guidance_scale": 0,
      "guidance_space": "latent",
      "guidance_repeat": 5,
      "only_center_face": false,
      "guidance_time_stop": -1,
      "guidance_time_start": 1001,
      "background_upsampler": "RealESRGAN",
      "face_detection_model": "retinaface_resnet50",
      "upscaling_model_type": "faces",
      "restoration_model_type": "faces",
      "super_resolution_factor": 2,
      "disable_preprocess_model": false,
      "reload_restoration_model": false,
      "background_upsampler_tile": 400,
      "background_upsampler_tile_stride": 400
    }
  }' \
  http://localhost:5000/predictions

To learn more, take a look at the Cog documentation.

Output

{
  "completed_at": "2024-03-30T07:49:57.477422Z",
  "created_at": "2024-03-30T07:46:10.719345Z",
  "data_removed": false,
  "error": null,
  "id": "lg6q6ydbytjofvmc7o474gdleu",
  "input": {
    "seed": 231,
    "input": "https://replicate.delivery/pbxt/Ketx7Rc2HPkgBgsBAmZ14gWWKc6aiKsA783giEEf4qiHDtOb/C.L.A.I.R.E._everhart_s.jpg",
    "steps": 50,
    "tiled": false,
    "tile_size": 512,
    "has_aligned": false,
    "tile_stride": 256,
    "repeat_times": 1,
    "use_guidance": false,
    "color_fix_type": "wavelet",
    "guidance_scale": 0,
    "guidance_space": "latent",
    "guidance_repeat": 5,
    "only_center_face": false,
    "guidance_time_stop": -1,
    "guidance_time_start": 1001,
    "background_upsampler": "RealESRGAN",
    "face_detection_model": "retinaface_resnet50",
    "upscaling_model_type": "faces",
    "restoration_model_type": "faces",
    "super_resolution_factor": 2,
    "disable_preprocess_model": false,
    "reload_restoration_model": false,
    "background_upsampler_tile": 400,
    "background_upsampler_tile_stride": 400
  },
  "logs": "ckptckptckpt weights/face_full_v1.ckpt\nSwitching from mode 'FULL' to 'FACE'...\nBuilding and loading 'FACE' mode model...\nControlLDM: Running in eps-prediction mode\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nDiffusionWrapper has 865.91 M params.\nmaking attention of type 'vanilla-xformers' with 512 in_channels\nbuilding MemoryEfficientAttnBlock with 512 in_channels...\nWorking with z of shape (1, 4, 32, 32) = 4096 dimensions.\nmaking attention of type 'vanilla-xformers' with 512 in_channels\nbuilding MemoryEfficientAttnBlock with 512 in_channels...\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]\nLoading model from: /root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/lpips/weights/v0.1/alex.pth\nreload swinir model from weights/face_swinir_v1.ckpt\nENABLE XFORMERS!\nModel successfully switched to 'FACE' mode.\n{'bg_tile': 400,\n'bg_tile_stride': 400,\n'bg_upsampler': 'RealESRGAN',\n'ckpt': 'weights/face_full_v1.ckpt',\n'color_fix_type': 'wavelet',\n'config': 'configs/model/cldm.yaml',\n'detection_model': 'retinaface_resnet50',\n'device': 'cuda',\n'disable_preprocess_model': False,\n'g_repeat': 5,\n'g_scale': 0.0,\n'g_space': 'latent',\n'g_t_start': 1001,\n'g_t_stop': -1,\n'has_aligned': False,\n'image_size': 512,\n'input': '/tmp/tmpcdoq1b8cC.L.A.I.R.E._everhart_s.jpg',\n'only_center_face': False,\n'output': '.',\n'reload_swinir': False,\n'repeat_times': 1,\n'seed': 231,\n'show_lq': False,\n'skip_if_exist': False,\n'sr_scale': 2,\n'steps': 50,\n'swinir_ckpt': 'weights/face_swinir_v1.ckpt',\n'tile_size': 512,\n'tile_stride': 256,\n'tiled': False,\n'use_guidance': False}\nGlobal seed set to 231\n/root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=None`.\nwarnings.warn(msg)\nDownloading: \"https://github.com/xinntao/facexlib/releases/download/v0.1.0/detection_Resnet50_Final.pth\" to /root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/facexlib/weights/detection_Resnet50_Final.pth\n  0%|          | 0.00/104M [00:00<?, ?B/s]\n 39%|███▉      | 40.5M/104M [00:00<00:00, 425MB/s]\n 78%|███████▊  | 81.1M/104M [00:00<00:00, 98.1MB/s]\n100%|██████████| 104M/104M [00:00<00:00, 127MB/s]\nDownloading: \"https://github.com/xinntao/facexlib/releases/download/v0.2.2/parsing_parsenet.pth\" to /root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/facexlib/weights/parsing_parsenet.pth\n  0%|          | 0.00/81.4M [00:00<?, ?B/s]\n 54%|█████▍    | 43.8M/81.4M [00:00<00:00, 459MB/s]\n100%|██████████| 81.4M/81.4M [00:00<00:00, 464MB/s]\nLoading RealESRGAN_x2plus.pth for background upsampling...\nDownloading: \"https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.1/RealESRGAN_x2plus.pth\" to /src/weights/realesrgan/RealESRGAN_x2plus.pth\n  0%|          | 0.00/64.0M [00:00<?, ?B/s]\n 52%|█████▏    | 33.4M/64.0M [00:00<00:00, 351MB/s]\n100%|██████████| 64.0M/64.0M [00:00<00:00, 386MB/s]\ntimesteps used in spaced sampler:\n[0, 20, 41, 61, 82, 102, 122, 143, 163, 183, 204, 224, 245, 265, 285, 306, 326, 347, 367, 387, 408, 428, 449, 469, 489, 510, 530, 550, 571, 591, 612, 632, 652, 673, 693, 714, 734, 754, 775, 795, 816, 836, 856, 877, 897, 917, 938, 958, 979, 999]\nSpaced Sampler:   0%|          | 0/50 [00:00<?, ?it/s]\nSpaced Sampler:   2%|▏         | 1/50 [00:00<00:10,  4.79it/s]\nSpaced Sampler:   6%|▌         | 3/50 [00:00<00:05,  8.70it/s]\nSpaced Sampler:  10%|█         | 5/50 [00:00<00:04, 10.19it/s]\nSpaced Sampler:  14%|█▍        | 7/50 [00:00<00:03, 10.93it/s]\nSpaced Sampler:  18%|█▊        | 9/50 [00:00<00:03, 11.34it/s]\nSpaced Sampler:  22%|██▏       | 11/50 [00:01<00:03, 11.59it/s]\nSpaced Sampler:  26%|██▌       | 13/50 [00:01<00:03, 11.81it/s]\nSpaced Sampler:  30%|███       | 15/50 [00:01<00:02, 11.96it/s]\nSpaced Sampler:  34%|███▍      | 17/50 [00:01<00:02, 12.07it/s]\nSpaced Sampler:  38%|███▊      | 19/50 [00:01<00:02, 12.13it/s]\nSpaced Sampler:  42%|████▏     | 21/50 [00:01<00:02, 12.17it/s]\nSpaced Sampler:  46%|████▌     | 23/50 [00:02<00:02, 12.14it/s]\nSpaced Sampler:  50%|█████     | 25/50 [00:02<00:02, 12.13it/s]\nSpaced Sampler:  54%|█████▍    | 27/50 [00:02<00:01, 12.11it/s]\nSpaced Sampler:  58%|█████▊    | 29/50 [00:02<00:01, 12.10it/s]\nSpaced Sampler:  62%|██████▏   | 31/50 [00:02<00:01, 12.07it/s]\nSpaced Sampler:  66%|██████▌   | 33/50 [00:02<00:01, 12.08it/s]\nSpaced Sampler:  70%|███████   | 35/50 [00:03<00:01, 12.06it/s]\nSpaced Sampler:  74%|███████▍  | 37/50 [00:03<00:01, 12.05it/s]\nSpaced Sampler:  78%|███████▊  | 39/50 [00:03<00:00, 11.95it/s]\nSpaced Sampler:  82%|████████▏ | 41/50 [00:03<00:00, 11.97it/s]\nSpaced Sampler:  86%|████████▌ | 43/50 [00:03<00:00, 12.01it/s]\nSpaced Sampler:  90%|█████████ | 45/50 [00:03<00:00, 12.06it/s]\nSpaced Sampler:  94%|█████████▍| 47/50 [00:04<00:00, 12.12it/s]\nSpaced Sampler:  98%|█████████▊| 49/50 [00:04<00:00, 12.17it/s]\nSpaced Sampler: 100%|██████████| 50/50 [00:04<00:00, 11.77it/s]\nupsampling the background image using RealESRGAN...\nFace image tmpcdoq1b8cC.L.A.I.R.E._everhart_s saved to ./..",
  "metrics": {
    "predict_time": 43.867453,
    "total_time": 226.758077
  },
  "output": [
    "https://replicate.delivery/pbxt/ZebgzumoHLy2ZCL5ppfJnICvW5SBKGZ2emNQxPeS35KNidVKB/tmpcdoq1b8cC.L.A.I.R.E._everhart_s_00.png",
    "https://replicate.delivery/pbxt/e1wupnukQOzFYaCzdEemITU2qoHPTUcDRHyfiWepxncNidVKB/tmpcdoq1b8cC.L.A.I.R.E._everhart_s_00.png",
    "https://replicate.delivery/pbxt/a35MV4BcyI7wEdbFav8hrbX8CuOt3IU8jEr39S6E9geSsrSJA/tmpcdoq1b8cC.L.A.I.R.E._everhart_s.png"
  ],
  "started_at": "2024-03-30T07:49:13.609969Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/lg6q6ydbytjofvmc7o474gdleu",
    "cancel": "https://api.replicate.com/v1/predictions/lg6q6ydbytjofvmc7o474gdleu/cancel"
  },
  "version": "f7a6e7832fee8d2593be566723295b80ed14b424f8365f8647e19775f617e205"
}

Generated in

43.9 seconds

Tweak it Iterate in playgroundReport View full prediction

ckptckptckpt weights/face_full_v1.ckpt
Switching from mode 'FULL' to 'FACE'...
Building and loading 'FACE' mode model...
ControlLDM: Running in eps-prediction mode
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
DiffusionWrapper has 865.91 M params.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
Loading model from: /root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/lpips/weights/v0.1/alex.pth
reload swinir model from weights/face_swinir_v1.ckpt
ENABLE XFORMERS!
Model successfully switched to 'FACE' mode.
{'bg_tile': 400,
'bg_tile_stride': 400,
'bg_upsampler': 'RealESRGAN',
'ckpt': 'weights/face_full_v1.ckpt',
'color_fix_type': 'wavelet',
'config': 'configs/model/cldm.yaml',
'detection_model': 'retinaface_resnet50',
'device': 'cuda',
'disable_preprocess_model': False,
'g_repeat': 5,
'g_scale': 0.0,
'g_space': 'latent',
'g_t_start': 1001,
'g_t_stop': -1,
'has_aligned': False,
'image_size': 512,
'input': '/tmp/tmpcdoq1b8cC.L.A.I.R.E._everhart_s.jpg',
'only_center_face': False,
'output': '.',
'reload_swinir': False,
'repeat_times': 1,
'seed': 231,
'show_lq': False,
'skip_if_exist': False,
'sr_scale': 2,
'steps': 50,
'swinir_ckpt': 'weights/face_swinir_v1.ckpt',
'tile_size': 512,
'tile_stride': 256,
'tiled': False,
'use_guidance': False}
Global seed set to 231
/root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=None`.
warnings.warn(msg)
Downloading: "https://github.com/xinntao/facexlib/releases/download/v0.1.0/detection_Resnet50_Final.pth" to /root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/facexlib/weights/detection_Resnet50_Final.pth
  0%|          | 0.00/104M [00:00<?, ?B/s]
 39%|███▉      | 40.5M/104M [00:00<00:00, 425MB/s]
 78%|███████▊  | 81.1M/104M [00:00<00:00, 98.1MB/s]
100%|██████████| 104M/104M [00:00<00:00, 127MB/s]
Downloading: "https://github.com/xinntao/facexlib/releases/download/v0.2.2/parsing_parsenet.pth" to /root/.pyenv/versions/3.9.19/lib/python3.9/site-packages/facexlib/weights/parsing_parsenet.pth
  0%|          | 0.00/81.4M [00:00<?, ?B/s]
 54%|█████▍    | 43.8M/81.4M [00:00<00:00, 459MB/s]
100%|██████████| 81.4M/81.4M [00:00<00:00, 464MB/s]
Loading RealESRGAN_x2plus.pth for background upsampling...
Downloading: "https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.1/RealESRGAN_x2plus.pth" to /src/weights/realesrgan/RealESRGAN_x2plus.pth
  0%|          | 0.00/64.0M [00:00<?, ?B/s]
 52%|█████▏    | 33.4M/64.0M [00:00<00:00, 351MB/s]
100%|██████████| 64.0M/64.0M [00:00<00:00, 386MB/s]
timesteps used in spaced sampler:
[0, 20, 41, 61, 82, 102, 122, 143, 163, 183, 204, 224, 245, 265, 285, 306, 326, 347, 367, 387, 408, 428, 449, 469, 489, 510, 530, 550, 571, 591, 612, 632, 652, 673, 693, 714, 734, 754, 775, 795, 816, 836, 856, 877, 897, 917, 938, 958, 979, 999]
Spaced Sampler:   0%|          | 0/50 [00:00<?, ?it/s]
Spaced Sampler:   2%|▏         | 1/50 [00:00<00:10,  4.79it/s]
Spaced Sampler:   6%|▌         | 3/50 [00:00<00:05,  8.70it/s]
Spaced Sampler:  10%|█         | 5/50 [00:00<00:04, 10.19it/s]
Spaced Sampler:  14%|█▍        | 7/50 [00:00<00:03, 10.93it/s]
Spaced Sampler:  18%|█▊        | 9/50 [00:00<00:03, 11.34it/s]
Spaced Sampler:  22%|██▏       | 11/50 [00:01<00:03, 11.59it/s]
Spaced Sampler:  26%|██▌       | 13/50 [00:01<00:03, 11.81it/s]
Spaced Sampler:  30%|███       | 15/50 [00:01<00:02, 11.96it/s]
Spaced Sampler:  34%|███▍      | 17/50 [00:01<00:02, 12.07it/s]
Spaced Sampler:  38%|███▊      | 19/50 [00:01<00:02, 12.13it/s]
Spaced Sampler:  42%|████▏     | 21/50 [00:01<00:02, 12.17it/s]
Spaced Sampler:  46%|████▌     | 23/50 [00:02<00:02, 12.14it/s]
Spaced Sampler:  50%|█████     | 25/50 [00:02<00:02, 12.13it/s]
Spaced Sampler:  54%|█████▍    | 27/50 [00:02<00:01, 12.11it/s]
Spaced Sampler:  58%|█████▊    | 29/50 [00:02<00:01, 12.10it/s]
Spaced Sampler:  62%|██████▏   | 31/50 [00:02<00:01, 12.07it/s]
Spaced Sampler:  66%|██████▌   | 33/50 [00:02<00:01, 12.08it/s]
Spaced Sampler:  70%|███████   | 35/50 [00:03<00:01, 12.06it/s]
Spaced Sampler:  74%|███████▍  | 37/50 [00:03<00:01, 12.05it/s]
Spaced Sampler:  78%|███████▊  | 39/50 [00:03<00:00, 11.95it/s]
Spaced Sampler:  82%|████████▏ | 41/50 [00:03<00:00, 11.97it/s]
Spaced Sampler:  86%|████████▌ | 43/50 [00:03<00:00, 12.01it/s]
Spaced Sampler:  90%|█████████ | 45/50 [00:03<00:00, 12.06it/s]
Spaced Sampler:  94%|█████████▍| 47/50 [00:04<00:00, 12.12it/s]
Spaced Sampler:  98%|█████████▊| 49/50 [00:04<00:00, 12.17it/s]
Spaced Sampler: 100%|██████████| 50/50 [00:04<00:00, 11.77it/s]
upsampling the background image using RealESRGAN...
Face image tmpcdoq1b8cC.L.A.I.R.E._everhart_s saved to ./..

Examples

View more examples

Run time and cost

This model costs approximately $0.22 to run on Replicate, or 4 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 4 minutes. The predict time for this model varies significantly based on the inputs.

Readme

This model doesn't have a readme.