✨DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior
Prediction

zsxkib/diffbir:51ed1464
ID: bmqkjzlb4dtxi3plebcqg6sffu
Status: Succeeded
Source: Web
Hardware: A40 (Large)
Created by: @zsxkib

Input
- seed: 231
- steps: 50
- tiled: false
- tile_size: 512
- has_aligned: false
- tile_stride: 256
- repeat_times: 1
- use_guidance: false
- color_fix_type: wavelet
- guidance_scale: 0
- guidance_space: latent
- guidance_repeat: 5
- only_center_face: false
- guidance_time_stop: -1
- guidance_time_start: 1001
- background_upsampler: DiffBIR
- face_detection_model: retinaface_resnet50
- upscaling_model_type: faces
- restoration_model_type: general_scenes
- super_resolution_factor: 2
- disable_preprocess_model: false
- reload_restoration_model: false
- background_upsampler_tile: 400
- background_upsampler_tile_stride: 400
{
  "seed": 231,
  "input": "https://replicate.delivery/pbxt/JgdLVwRXXl4oaGqmF4Wdl7vOapnTlay32dE7B3UNgxSwylvQ/Audrey_Hepburn.jpg",
  "steps": 50,
  "tiled": false,
  "tile_size": 512,
  "has_aligned": false,
  "tile_stride": 256,
  "repeat_times": 1,
  "use_guidance": false,
  "color_fix_type": "wavelet",
  "guidance_scale": 0,
  "guidance_space": "latent",
  "guidance_repeat": 5,
  "only_center_face": false,
  "guidance_time_stop": -1,
  "guidance_time_start": 1001,
  "background_upsampler": "DiffBIR",
  "face_detection_model": "retinaface_resnet50",
  "upscaling_model_type": "faces",
  "restoration_model_type": "general_scenes",
  "super_resolution_factor": 2,
  "disable_preprocess_model": false,
  "reload_restoration_model": false,
  "background_upsampler_tile": 400,
  "background_upsampler_tile_stride": 400
}
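For reference, the same input payload can be built as a plain Python dict and serialized with the standard library before handing it to any of the clients below. This sketch only restates the example values from this prediction; nothing here is specific to the Replicate client.

```python
import json

# Input payload for zsxkib/diffbir, mirroring the example prediction above.
# JSON true/false map to Python True/False; "input" is the source image URL.
diffbir_input = {
    "seed": 231,
    "input": "https://replicate.delivery/pbxt/JgdLVwRXXl4oaGqmF4Wdl7vOapnTlay32dE7B3UNgxSwylvQ/Audrey_Hepburn.jpg",
    "steps": 50,
    "tiled": False,
    "tile_size": 512,
    "has_aligned": False,
    "tile_stride": 256,
    "repeat_times": 1,
    "use_guidance": False,
    "color_fix_type": "wavelet",
    "guidance_scale": 0,
    "guidance_space": "latent",
    "guidance_repeat": 5,
    "only_center_face": False,
    "guidance_time_stop": -1,
    "guidance_time_start": 1001,
    "background_upsampler": "DiffBIR",
    "face_detection_model": "retinaface_resnet50",
    "upscaling_model_type": "faces",
    "restoration_model_type": "general_scenes",
    "super_resolution_factor": 2,
    "disable_preprocess_model": False,
    "reload_restoration_model": False,
    "background_upsampler_tile": 400,
    "background_upsampler_tile_stride": 400,
}

# Serializing the dict produces the JSON body shown above.
payload_json = json.dumps(diffbir_input)
```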
Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run zsxkib/diffbir using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "zsxkib/diffbir:51ed1464d8bbbaca811153b051d3b09ab42f0bdeb85804ae26ba323d7a66a4ac",
  {
    input: {
      seed: 231,
      input: "https://replicate.delivery/pbxt/JgdLVwRXXl4oaGqmF4Wdl7vOapnTlay32dE7B3UNgxSwylvQ/Audrey_Hepburn.jpg",
      steps: 50,
      tiled: false,
      tile_size: 512,
      has_aligned: false,
      tile_stride: 256,
      repeat_times: 1,
      use_guidance: false,
      color_fix_type: "wavelet",
      guidance_scale: 0,
      guidance_space: "latent",
      guidance_repeat: 5,
      only_center_face: false,
      guidance_time_stop: -1,
      guidance_time_start: 1001,
      background_upsampler: "DiffBIR",
      face_detection_model: "retinaface_resnet50",
      upscaling_model_type: "faces",
      restoration_model_type: "general_scenes",
      super_resolution_factor: 2,
      disable_preprocess_model: false,
      reload_restoration_model: false,
      background_upsampler_tile: 400,
      background_upsampler_tile_stride: 400
    }
  }
);
console.log(output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Import the client:

import replicate
Run zsxkib/diffbir using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "zsxkib/diffbir:51ed1464d8bbbaca811153b051d3b09ab42f0bdeb85804ae26ba323d7a66a4ac",
    input={
        "seed": 231,
        "input": "https://replicate.delivery/pbxt/JgdLVwRXXl4oaGqmF4Wdl7vOapnTlay32dE7B3UNgxSwylvQ/Audrey_Hepburn.jpg",
        "steps": 50,
        "tiled": False,
        "tile_size": 512,
        "has_aligned": False,
        "tile_stride": 256,
        "repeat_times": 1,
        "use_guidance": False,
        "color_fix_type": "wavelet",
        "guidance_scale": 0,
        "guidance_space": "latent",
        "guidance_repeat": 5,
        "only_center_face": False,
        "guidance_time_stop": -1,
        "guidance_time_start": 1001,
        "background_upsampler": "DiffBIR",
        "face_detection_model": "retinaface_resnet50",
        "upscaling_model_type": "faces",
        "restoration_model_type": "general_scenes",
        "super_resolution_factor": 2,
        "disable_preprocess_model": False,
        "reload_restoration_model": False,
        "background_upsampler_tile": 400,
        "background_upsampler_tile_stride": 400
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
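The prediction record on this page returns its output as a list of image URLs. A small sketch for turning those URLs into local filenames before downloading; the `local_name` helper is ours, not part of the Replicate client, and the example URL is the one from this prediction:

```python
from urllib.parse import urlparse
from pathlib import PurePosixPath

def local_name(url: str) -> str:
    """Derive a local filename from the last path segment of a delivery URL."""
    return PurePosixPath(urlparse(url).path).name

# Output URLs as returned by the prediction on this page.
output = [
    "https://replicate.delivery/pbxt/NlSQp8BS4WLxL13eERn20OJzbMYfKpDx4usqAkywlgZY2ZtRA/tmpwg3l1z7wAudrey_Hepburn.png",
]
for url in output:
    name = local_name(url)
    # urllib.request.urlretrieve(url, name)  # uncomment to actually download
```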
Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Run zsxkib/diffbir using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "51ed1464d8bbbaca811153b051d3b09ab42f0bdeb85804ae26ba323d7a66a4ac",
    "input": {
      "seed": 231,
      "input": "https://replicate.delivery/pbxt/JgdLVwRXXl4oaGqmF4Wdl7vOapnTlay32dE7B3UNgxSwylvQ/Audrey_Hepburn.jpg",
      "steps": 50,
      "tiled": false,
      "tile_size": 512,
      "has_aligned": false,
      "tile_stride": 256,
      "repeat_times": 1,
      "use_guidance": false,
      "color_fix_type": "wavelet",
      "guidance_scale": 0,
      "guidance_space": "latent",
      "guidance_repeat": 5,
      "only_center_face": false,
      "guidance_time_stop": -1,
      "guidance_time_start": 1001,
      "background_upsampler": "DiffBIR",
      "face_detection_model": "retinaface_resnet50",
      "upscaling_model_type": "faces",
      "restoration_model_type": "general_scenes",
      "super_resolution_factor": 2,
      "disable_preprocess_model": false,
      "reload_restoration_model": false,
      "background_upsampler_tile": 400,
      "background_upsampler_tile_stride": 400
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
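The Prefer: wait header above makes the request block until the prediction finishes. Without it, the API responds immediately and you poll the prediction's urls.get endpoint until the status becomes terminal. A minimal polling loop, sketched with an injectable fetch callable so it runs without a live API call; the fetcher is a stand-in you would replace with an authenticated HTTP GET:

```python
import time

# Statuses after which a Replicate prediction will not change again.
TERMINAL = {"succeeded", "failed", "canceled"}

def wait_for_prediction(fetch, interval=1.0, max_polls=600):
    """Call fetch() until the returned prediction dict has a terminal status."""
    for _ in range(max_polls):
        prediction = fetch()
        if prediction["status"] in TERMINAL:
            return prediction
        time.sleep(interval)
    raise TimeoutError("prediction did not finish in time")

# With the real API, fetch would be something like:
#   lambda: requests.get(prediction["urls"]["get"],
#                        headers={"Authorization": f"Bearer {token}"}).json()
```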
You can run this model locally using Cog. First, install Cog:

brew install cog
If you don’t have Homebrew, there are other installation options available.
Run this to download the model and run it in your local environment:
cog predict r8.im/zsxkib/diffbir@sha256:51ed1464d8bbbaca811153b051d3b09ab42f0bdeb85804ae26ba323d7a66a4ac \
  -i 'seed=231' \
  -i 'input="https://replicate.delivery/pbxt/JgdLVwRXXl4oaGqmF4Wdl7vOapnTlay32dE7B3UNgxSwylvQ/Audrey_Hepburn.jpg"' \
  -i 'steps=50' \
  -i 'tiled=false' \
  -i 'tile_size=512' \
  -i 'has_aligned=false' \
  -i 'tile_stride=256' \
  -i 'repeat_times=1' \
  -i 'use_guidance=false' \
  -i 'color_fix_type="wavelet"' \
  -i 'guidance_scale=0' \
  -i 'guidance_space="latent"' \
  -i 'guidance_repeat=5' \
  -i 'only_center_face=false' \
  -i 'guidance_time_stop=-1' \
  -i 'guidance_time_start=1001' \
  -i 'background_upsampler="DiffBIR"' \
  -i 'face_detection_model="retinaface_resnet50"' \
  -i 'upscaling_model_type="faces"' \
  -i 'restoration_model_type="general_scenes"' \
  -i 'super_resolution_factor=2' \
  -i 'disable_preprocess_model=false' \
  -i 'reload_restoration_model=false' \
  -i 'background_upsampler_tile=400' \
  -i 'background_upsampler_tile_stride=400'
To learn more, take a look at the Cog documentation.
Run this to download the model and run it in your local environment:
docker run -d -p 5000:5000 --gpus=all r8.im/zsxkib/diffbir@sha256:51ed1464d8bbbaca811153b051d3b09ab42f0bdeb85804ae26ba323d7a66a4ac
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d $'{
    "input": {
      "seed": 231,
      "input": "https://replicate.delivery/pbxt/JgdLVwRXXl4oaGqmF4Wdl7vOapnTlay32dE7B3UNgxSwylvQ/Audrey_Hepburn.jpg",
      "steps": 50,
      "tiled": false,
      "tile_size": 512,
      "has_aligned": false,
      "tile_stride": 256,
      "repeat_times": 1,
      "use_guidance": false,
      "color_fix_type": "wavelet",
      "guidance_scale": 0,
      "guidance_space": "latent",
      "guidance_repeat": 5,
      "only_center_face": false,
      "guidance_time_stop": -1,
      "guidance_time_start": 1001,
      "background_upsampler": "DiffBIR",
      "face_detection_model": "retinaface_resnet50",
      "upscaling_model_type": "faces",
      "restoration_model_type": "general_scenes",
      "super_resolution_factor": 2,
      "disable_preprocess_model": false,
      "reload_restoration_model": false,
      "background_upsampler_tile": 400,
      "background_upsampler_tile_stride": 400
    }
  }' \
  http://localhost:5000/predictions
To learn more, take a look at the Cog documentation.
Output
{
  "input": "https://replicate.delivery/pbxt/JgdLVwRXXl4oaGqmF4Wdl7vOapnTlay32dE7B3UNgxSwylvQ/Audrey_Hepburn.jpg",
  "output": "https://replicate.delivery/pbxt/NlSQp8BS4WLxL13eERn20OJzbMYfKpDx4usqAkywlgZY2ZtRA/tmpwg3l1z7wAudrey_Hepburn.png"
}
{ "completed_at": "2023-10-12T12:50:32.973606Z", "created_at": "2023-10-12T12:49:17.922019Z", "data_removed": false, "error": null, "id": "bmqkjzlb4dtxi3plebcqg6sffu", "input": { "seed": 231, "input": "https://replicate.delivery/pbxt/JgdLVwRXXl4oaGqmF4Wdl7vOapnTlay32dE7B3UNgxSwylvQ/Audrey_Hepburn.jpg", "steps": 50, "tiled": false, "tile_size": 512, "has_aligned": false, "tile_stride": 256, "repeat_times": 1, "use_guidance": false, "color_fix_type": "wavelet", "guidance_scale": 0, "guidance_space": "latent", "guidance_repeat": 5, "only_center_face": false, "guidance_time_stop": -1, "guidance_time_start": 1001, "background_upsampler": "DiffBIR", "face_detection_model": "retinaface_resnet50", "upscaling_model_type": "faces", "restoration_model_type": "general_scenes", "super_resolution_factor": 2, "disable_preprocess_model": false, "reload_restoration_model": false, "background_upsampler_tile": 400, "background_upsampler_tile_stride": 400 }, "logs": "ckptckptckpt weights/face_full_v1.ckpt\nSwitching from mode 'FULL' to 'FACE'...\nBuilding and loading 'FACE' mode model...\nControlLDM: Running in eps-prediction mode\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. 
Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. 
Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nDiffusionWrapper has 865.91 M params.\nmaking attention of type 'vanilla-xformers' with 512 in_channels\nbuilding MemoryEfficientAttnBlock with 512 in_channels...\nWorking with z of shape (1, 4, 32, 32) = 4096 dimensions.\nmaking attention of type 'vanilla-xformers' with 512 in_channels\nbuilding MemoryEfficientAttnBlock with 512 in_channels...\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. 
Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]\nLoading model from: /root/.pyenv/versions/3.9.18/lib/python3.9/site-packages/lpips/weights/v0.1/alex.pth\nreload swinir model from weights/general_swinir_v1.ckpt\nENABLE XFORMERS!\nModel successfully switched to 'FACE' mode.\n{'bg_tile': 400,\n'bg_tile_stride': 400,\n'bg_upsampler': 'DiffBIR',\n'ckpt': 'weights/face_full_v1.ckpt',\n'color_fix_type': 'wavelet',\n'config': 'configs/model/cldm.yaml',\n'detection_model': 'retinaface_resnet50',\n'device': 'cuda',\n'disable_preprocess_model': False,\n'g_repeat': 5,\n'g_scale': 0.0,\n'g_space': 'latent',\n'g_t_start': 1001,\n'g_t_stop': -1,\n'has_aligned': False,\n'image_size': 512,\n'input': '/tmp/tmpwg3l1z7wAudrey_Hepburn.jpg',\n'only_center_face': False,\n'output': '.',\n'reload_swinir': False,\n'repeat_times': 1,\n'seed': 231,\n'show_lq': False,\n'skip_if_exist': False,\n'sr_scale': 2,\n'steps': 50,\n'swinir_ckpt': 'weights/general_swinir_v1.ckpt',\n'tile_size': 512,\n'tile_stride': 256,\n'tiled': False,\n'use_guidance': False}\nGlobal seed set to 231\n/root/.pyenv/versions/3.9.18/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. 
The current behavior is equivalent to passing `weights=None`.\nwarnings.warn(msg)\nDownloading: \"https://github.com/xinntao/facexlib/releases/download/v0.1.0/detection_Resnet50_Final.pth\" to /root/.pyenv/versions/3.9.18/lib/python3.9/site-packages/facexlib/weights/detection_Resnet50_Final.pth\n 0%| | 0.00/104M [00:00<?, ?B/s]\n 37%|███▋ | 38.6M/104M [00:00<00:00, 405MB/s]\n 76%|███████▋ | 79.8M/104M [00:00<00:00, 421MB/s]\n100%|██████████| 104M/104M [00:00<00:00, 423MB/s]\nDownloading: \"https://github.com/xinntao/facexlib/releases/download/v0.2.2/parsing_parsenet.pth\" to /root/.pyenv/versions/3.9.18/lib/python3.9/site-packages/facexlib/weights/parsing_parsenet.pth\n 0%| | 0.00/81.4M [00:00<?, ?B/s]\n 37%|███▋ | 30.4M/81.4M [00:00<00:00, 319MB/s]\n 87%|████████▋ | 70.5M/81.4M [00:00<00:00, 378MB/s]\n100%|██████████| 81.4M/81.4M [00:00<00:00, 378MB/s]\nControlLDM: Running in eps-prediction mode\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. 
Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. 
Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nDiffusionWrapper has 865.91 M params.\nmaking attention of type 'vanilla-xformers' with 512 in_channels\nbuilding MemoryEfficientAttnBlock with 512 in_channels...\nWorking with z of shape (1, 4, 32, 32) = 4096 dimensions.\nmaking attention of type 'vanilla-xformers' with 512 in_channels\nbuilding MemoryEfficientAttnBlock with 512 in_channels...\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.\nSetting up MemoryEfficientCrossAttention. 
Query dim is 1280, context_dim is 1024 and using 20 heads.\nSetting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]\nLoading model from: /root/.pyenv/versions/3.9.18/lib/python3.9/site-packages/lpips/weights/v0.1/alex.pth\nreload swinir model from weights/general_swinir_v1.ckpt\ntimesteps used in spaced sampler:\n[0, 20, 41, 61, 82, 102, 122, 143, 163, 183, 204, 224, 245, 265, 285, 306, 326, 347, 367, 387, 408, 428, 449, 469, 489, 510, 530, 550, 571, 591, 612, 632, 652, 673, 693, 714, 734, 754, 775, 795, 816, 836, 856, 877, 897, 917, 938, 958, 979, 999]\nSpaced Sampler: 0%| | 0/50 [00:00<?, ?it/s]\nSpaced Sampler: 2%|▏ | 1/50 [00:00<00:10, 4.82it/s]\nSpaced Sampler: 6%|▌ | 3/50 [00:00<00:05, 8.76it/s]\nSpaced Sampler: 10%|█ | 5/50 [00:00<00:04, 10.31it/s]\nSpaced Sampler: 14%|█▍ | 7/50 [00:00<00:03, 11.08it/s]\nSpaced Sampler: 18%|█▊ | 9/50 [00:00<00:03, 11.51it/s]\nSpaced Sampler: 22%|██▏ | 11/50 [00:01<00:03, 11.78it/s]\nSpaced Sampler: 26%|██▌ | 13/50 [00:01<00:03, 11.95it/s]\nSpaced Sampler: 30%|███ | 15/50 [00:01<00:02, 12.06it/s]\nSpaced Sampler: 34%|███▍ | 17/50 [00:01<00:02, 12.11it/s]\nSpaced Sampler: 38%|███▊ | 19/50 [00:01<00:02, 12.16it/s]\nSpaced Sampler: 42%|████▏ | 21/50 [00:01<00:02, 12.20it/s]\nSpaced Sampler: 46%|████▌ | 23/50 [00:01<00:02, 12.23it/s]\nSpaced Sampler: 50%|█████ | 25/50 [00:02<00:02, 12.25it/s]\nSpaced Sampler: 54%|█████▍ | 27/50 [00:02<00:01, 12.27it/s]\nSpaced Sampler: 58%|█████▊ | 29/50 [00:02<00:01, 12.26it/s]\nSpaced Sampler: 62%|██████▏ | 31/50 [00:02<00:01, 12.25it/s]\nSpaced Sampler: 66%|██████▌ | 33/50 [00:02<00:01, 12.25it/s]\nSpaced Sampler: 70%|███████ | 35/50 [00:02<00:01, 12.26it/s]\nSpaced Sampler: 74%|███████▍ | 37/50 [00:03<00:01, 12.27it/s]\nSpaced Sampler: 78%|███████▊ | 39/50 [00:03<00:00, 12.27it/s]\nSpaced Sampler: 82%|████████▏ | 41/50 [00:03<00:00, 12.26it/s]\nSpaced Sampler: 86%|████████▌ | 43/50 [00:03<00:00, 12.24it/s]\nSpaced Sampler: 90%|█████████ | 45/50 [00:03<00:00, 12.24it/s]\nSpaced 
Sampler: 94%|█████████▍| 47/50 [00:03<00:00, 12.24it/s]\nSpaced Sampler: 98%|█████████▊| 49/50 [00:04<00:00, 12.25it/s]\nSpaced Sampler: 100%|██████████| 50/50 [00:04<00:00, 11.91it/s]\nupsampling the background image using DiffBIR...\ntimesteps used in spaced sampler:\n[0, 20, 41, 61, 82, 102, 122, 143, 163, 183, 204, 224, 245, 265, 285, 306, 326, 347, 367, 387, 408, 428, 449, 469, 489, 510, 530, 550, 571, 591, 612, 632, 652, 673, 693, 714, 734, 754, 775, 795, 816, 836, 856, 877, 897, 917, 938, 958, 979, 999]\nSpaced Sampler: 0%| | 0/50 [00:00<?, ?it/s]\nSpaced Sampler: 2%|▏ | 1/50 [00:00<00:44, 1.11it/s]\nSpaced Sampler: 4%|▍ | 2/50 [00:01<00:28, 1.67it/s]\nSpaced Sampler: 6%|▌ | 3/50 [00:01<00:23, 1.98it/s]\nSpaced Sampler: 8%|▊ | 4/50 [00:02<00:21, 2.18it/s]\nSpaced Sampler: 10%|█ | 5/50 [00:02<00:19, 2.30it/s]\nSpaced Sampler: 12%|█▏ | 6/50 [00:02<00:18, 2.38it/s]\nSpaced Sampler: 14%|█▍ | 7/50 [00:03<00:17, 2.44it/s]\nSpaced Sampler: 16%|█▌ | 8/50 [00:03<00:16, 2.48it/s]\nSpaced Sampler: 18%|█▊ | 9/50 [00:04<00:16, 2.51it/s]\nSpaced Sampler: 20%|██ | 10/50 [00:04<00:15, 2.53it/s]\nSpaced Sampler: 22%|██▏ | 11/50 [00:04<00:15, 2.54it/s]\nSpaced Sampler: 24%|██▍ | 12/50 [00:05<00:14, 2.55it/s]\nSpaced Sampler: 26%|██▌ | 13/50 [00:05<00:14, 2.55it/s]\nSpaced Sampler: 28%|██▊ | 14/50 [00:05<00:14, 2.56it/s]\nSpaced Sampler: 30%|███ | 15/50 [00:06<00:13, 2.56it/s]\nSpaced Sampler: 32%|███▏ | 16/50 [00:06<00:13, 2.56it/s]\nSpaced Sampler: 34%|███▍ | 17/50 [00:07<00:12, 2.56it/s]\nSpaced Sampler: 36%|███▌ | 18/50 [00:07<00:12, 2.56it/s]\nSpaced Sampler: 38%|███▊ | 19/50 [00:07<00:12, 2.56it/s]\nSpaced Sampler: 40%|████ | 20/50 [00:08<00:11, 2.56it/s]\nSpaced Sampler: 42%|████▏ | 21/50 [00:08<00:11, 2.56it/s]\nSpaced Sampler: 44%|████▍ | 22/50 [00:09<00:10, 2.56it/s]\nSpaced Sampler: 46%|████▌ | 23/50 [00:09<00:10, 2.56it/s]\nSpaced Sampler: 48%|████▊ | 24/50 [00:09<00:10, 2.56it/s]\nSpaced Sampler: 50%|█████ | 25/50 [00:10<00:09, 2.56it/s]\nSpaced Sampler: 
52%|█████▏ | 26/50 [00:10<00:09, 2.56it/s]\nSpaced Sampler: 54%|█████▍ | 27/50 [00:11<00:08, 2.56it/s]\nSpaced Sampler: 56%|█████▌ | 28/50 [00:11<00:08, 2.56it/s]\nSpaced Sampler: 58%|█████▊ | 29/50 [00:11<00:08, 2.56it/s]\nSpaced Sampler: 60%|██████ | 30/50 [00:12<00:07, 2.56it/s]\nSpaced Sampler: 62%|██████▏ | 31/50 [00:12<00:07, 2.56it/s]\nSpaced Sampler: 64%|██████▍ | 32/50 [00:12<00:07, 2.56it/s]\nSpaced Sampler: 66%|██████▌ | 33/50 [00:13<00:06, 2.56it/s]\nSpaced Sampler: 68%|██████▊ | 34/50 [00:13<00:06, 2.56it/s]\nSpaced Sampler: 70%|███████ | 35/50 [00:14<00:05, 2.56it/s]\nSpaced Sampler: 72%|███████▏ | 36/50 [00:14<00:05, 2.55it/s]\nSpaced Sampler: 74%|███████▍ | 37/50 [00:14<00:05, 2.56it/s]\nSpaced Sampler: 76%|███████▌ | 38/50 [00:15<00:04, 2.55it/s]\nSpaced Sampler: 78%|███████▊ | 39/50 [00:15<00:04, 2.55it/s]\nSpaced Sampler: 80%|████████ | 40/50 [00:16<00:03, 2.55it/s]\nSpaced Sampler: 82%|████████▏ | 41/50 [00:16<00:03, 2.55it/s]\nSpaced Sampler: 84%|████████▍ | 42/50 [00:16<00:03, 2.55it/s]\nSpaced Sampler: 86%|████████▌ | 43/50 [00:17<00:02, 2.55it/s]\nSpaced Sampler: 88%|████████▊ | 44/50 [00:17<00:02, 2.55it/s]\nSpaced Sampler: 90%|█████████ | 45/50 [00:18<00:01, 2.55it/s]\nSpaced Sampler: 92%|█████████▏| 46/50 [00:18<00:01, 2.55it/s]\nSpaced Sampler: 94%|█████████▍| 47/50 [00:18<00:01, 2.55it/s]\nSpaced Sampler: 96%|█████████▌| 48/50 [00:19<00:00, 2.55it/s]\nSpaced Sampler: 98%|█████████▊| 49/50 [00:19<00:00, 2.55it/s]\nSpaced Sampler: 100%|██████████| 50/50 [00:20<00:00, 2.55it/s]\nSpaced Sampler: 100%|██████████| 50/50 [00:20<00:00, 2.49it/s]\nFace image tmpwg3l1z7wAudrey_Hepburn saved to ./..", "metrics": { "predict_time": 73.053379, "total_time": 75.051587 }, "output": [ "https://replicate.delivery/pbxt/3hkSakaS9qpMPxMMmfjYdr8ZLRRKiUkGwdNIlS0r5bcM7s2IA/tmpwg3l1z7wAudrey_Hepburn_00.png", "https://replicate.delivery/pbxt/boZvG5okpewhK6FY1YUeO2sehoy1FJoXW9IxBPifL42iZn1GB/tmpwg3l1z7wAudrey_Hepburn_00.png", 
"https://replicate.delivery/pbxt/NlSQp8BS4WLxL13eERn20OJzbMYfKpDx4usqAkywlgZY2ZtRA/tmpwg3l1z7wAudrey_Hepburn.png" ], "started_at": "2023-10-12T12:49:19.920227Z", "status": "succeeded", "urls": { "get": "https://api.replicate.com/v1/predictions/bmqkjzlb4dtxi3plebcqg6sffu", "cancel": "https://api.replicate.com/v1/predictions/bmqkjzlb4dtxi3plebcqg6sffu/cancel" }, "version": "51ed1464d8bbbaca811153b051d3b09ab42f0bdeb85804ae26ba323d7a66a4ac" }
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. 
Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. DiffusionWrapper has 865.91 M params. making attention of type 'vanilla-xformers' with 512 in_channels building MemoryEfficientAttnBlock with 512 in_channels... Working with z of shape (1, 4, 32, 32) = 4096 dimensions. making attention of type 'vanilla-xformers' with 512 in_channels building MemoryEfficientAttnBlock with 512 in_channels... Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. 
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads. Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads. Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off] Loading model from: /root/.pyenv/versions/3.9.18/lib/python3.9/site-packages/lpips/weights/v0.1/alex.pth reload swinir model from weights/general_swinir_v1.ckpt timesteps used in spaced sampler: [0, 20, 41, 61, 82, 102, 122, 143, 163, 183, 204, 224, 245, 265, 285, 306, 326, 347, 367, 387, 408, 428, 449, 469, 489, 510, 530, 550, 571, 591, 612, 632, 652, 673, 693, 714, 734, 754, 775, 795, 816, 836, 856, 877, 897, 917, 938, 958, 979, 999] Spaced Sampler: 0%| | 0/50 [00:00<?, ?it/s] Spaced Sampler: 2%|▏ | 1/50 [00:00<00:10, 4.82it/s] Spaced Sampler: 6%|▌ | 3/50 [00:00<00:05, 8.76it/s] Spaced Sampler: 10%|█ | 5/50 [00:00<00:04, 10.31it/s] Spaced Sampler: 14%|█▍ | 7/50 [00:00<00:03, 11.08it/s] Spaced Sampler: 18%|█▊ | 9/50 [00:00<00:03, 11.51it/s] Spaced Sampler: 22%|██▏ | 11/50 [00:01<00:03, 11.78it/s] Spaced Sampler: 26%|██▌ | 13/50 [00:01<00:03, 11.95it/s] Spaced Sampler: 30%|███ | 15/50 [00:01<00:02, 12.06it/s] Spaced Sampler: 34%|███▍ | 17/50 [00:01<00:02, 12.11it/s] Spaced Sampler: 38%|███▊ | 19/50 
[00:01<00:02, 12.16it/s] Spaced Sampler: 42%|████▏ | 21/50 [00:01<00:02, 12.20it/s] Spaced Sampler: 46%|████▌ | 23/50 [00:01<00:02, 12.23it/s] Spaced Sampler: 50%|█████ | 25/50 [00:02<00:02, 12.25it/s] Spaced Sampler: 54%|█████▍ | 27/50 [00:02<00:01, 12.27it/s] Spaced Sampler: 58%|█████▊ | 29/50 [00:02<00:01, 12.26it/s] Spaced Sampler: 62%|██████▏ | 31/50 [00:02<00:01, 12.25it/s] Spaced Sampler: 66%|██████▌ | 33/50 [00:02<00:01, 12.25it/s] Spaced Sampler: 70%|███████ | 35/50 [00:02<00:01, 12.26it/s] Spaced Sampler: 74%|███████▍ | 37/50 [00:03<00:01, 12.27it/s] Spaced Sampler: 78%|███████▊ | 39/50 [00:03<00:00, 12.27it/s] Spaced Sampler: 82%|████████▏ | 41/50 [00:03<00:00, 12.26it/s] Spaced Sampler: 86%|████████▌ | 43/50 [00:03<00:00, 12.24it/s] Spaced Sampler: 90%|█████████ | 45/50 [00:03<00:00, 12.24it/s] Spaced Sampler: 94%|█████████▍| 47/50 [00:03<00:00, 12.24it/s] Spaced Sampler: 98%|█████████▊| 49/50 [00:04<00:00, 12.25it/s] Spaced Sampler: 100%|██████████| 50/50 [00:04<00:00, 11.91it/s] upsampling the background image using DiffBIR... 
timesteps used in spaced sampler: [0, 20, 41, 61, 82, 102, 122, 143, 163, 183, 204, 224, 245, 265, 285, 306, 326, 347, 367, 387, 408, 428, 449, 469, 489, 510, 530, 550, 571, 591, 612, 632, 652, 673, 693, 714, 734, 754, 775, 795, 816, 836, 856, 877, 897, 917, 938, 958, 979, 999] Spaced Sampler: 0%| | 0/50 [00:00<?, ?it/s] Spaced Sampler: 2%|▏ | 1/50 [00:00<00:44, 1.11it/s] Spaced Sampler: 4%|▍ | 2/50 [00:01<00:28, 1.67it/s] Spaced Sampler: 6%|▌ | 3/50 [00:01<00:23, 1.98it/s] Spaced Sampler: 8%|▊ | 4/50 [00:02<00:21, 2.18it/s] Spaced Sampler: 10%|█ | 5/50 [00:02<00:19, 2.30it/s] Spaced Sampler: 12%|█▏ | 6/50 [00:02<00:18, 2.38it/s] Spaced Sampler: 14%|█▍ | 7/50 [00:03<00:17, 2.44it/s] Spaced Sampler: 16%|█▌ | 8/50 [00:03<00:16, 2.48it/s] Spaced Sampler: 18%|█▊ | 9/50 [00:04<00:16, 2.51it/s] Spaced Sampler: 20%|██ | 10/50 [00:04<00:15, 2.53it/s] Spaced Sampler: 22%|██▏ | 11/50 [00:04<00:15, 2.54it/s] Spaced Sampler: 24%|██▍ | 12/50 [00:05<00:14, 2.55it/s] Spaced Sampler: 26%|██▌ | 13/50 [00:05<00:14, 2.55it/s] Spaced Sampler: 28%|██▊ | 14/50 [00:05<00:14, 2.56it/s] Spaced Sampler: 30%|███ | 15/50 [00:06<00:13, 2.56it/s] Spaced Sampler: 32%|███▏ | 16/50 [00:06<00:13, 2.56it/s] Spaced Sampler: 34%|███▍ | 17/50 [00:07<00:12, 2.56it/s] Spaced Sampler: 36%|███▌ | 18/50 [00:07<00:12, 2.56it/s] Spaced Sampler: 38%|███▊ | 19/50 [00:07<00:12, 2.56it/s] Spaced Sampler: 40%|████ | 20/50 [00:08<00:11, 2.56it/s] Spaced Sampler: 42%|████▏ | 21/50 [00:08<00:11, 2.56it/s] Spaced Sampler: 44%|████▍ | 22/50 [00:09<00:10, 2.56it/s] Spaced Sampler: 46%|████▌ | 23/50 [00:09<00:10, 2.56it/s] Spaced Sampler: 48%|████▊ | 24/50 [00:09<00:10, 2.56it/s] Spaced Sampler: 50%|█████ | 25/50 [00:10<00:09, 2.56it/s] Spaced Sampler: 52%|█████▏ | 26/50 [00:10<00:09, 2.56it/s] Spaced Sampler: 54%|█████▍ | 27/50 [00:11<00:08, 2.56it/s] Spaced Sampler: 56%|█████▌ | 28/50 [00:11<00:08, 2.56it/s] Spaced Sampler: 58%|█████▊ | 29/50 [00:11<00:08, 2.56it/s] Spaced Sampler: 60%|██████ | 30/50 [00:12<00:07, 
2.56it/s] Spaced Sampler: 62%|██████▏ | 31/50 [00:12<00:07, 2.56it/s] Spaced Sampler: 64%|██████▍ | 32/50 [00:12<00:07, 2.56it/s] Spaced Sampler: 66%|██████▌ | 33/50 [00:13<00:06, 2.56it/s] Spaced Sampler: 68%|██████▊ | 34/50 [00:13<00:06, 2.56it/s] Spaced Sampler: 70%|███████ | 35/50 [00:14<00:05, 2.56it/s] Spaced Sampler: 72%|███████▏ | 36/50 [00:14<00:05, 2.55it/s] Spaced Sampler: 74%|███████▍ | 37/50 [00:14<00:05, 2.56it/s] Spaced Sampler: 76%|███████▌ | 38/50 [00:15<00:04, 2.55it/s] Spaced Sampler: 78%|███████▊ | 39/50 [00:15<00:04, 2.55it/s] Spaced Sampler: 80%|████████ | 40/50 [00:16<00:03, 2.55it/s] Spaced Sampler: 82%|████████▏ | 41/50 [00:16<00:03, 2.55it/s] Spaced Sampler: 84%|████████▍ | 42/50 [00:16<00:03, 2.55it/s] Spaced Sampler: 86%|████████▌ | 43/50 [00:17<00:02, 2.55it/s] Spaced Sampler: 88%|████████▊ | 44/50 [00:17<00:02, 2.55it/s] Spaced Sampler: 90%|█████████ | 45/50 [00:18<00:01, 2.55it/s] Spaced Sampler: 92%|█████████▏| 46/50 [00:18<00:01, 2.55it/s] Spaced Sampler: 94%|█████████▍| 47/50 [00:18<00:01, 2.55it/s] Spaced Sampler: 96%|█████████▌| 48/50 [00:19<00:00, 2.55it/s] Spaced Sampler: 98%|█████████▊| 49/50 [00:19<00:00, 2.55it/s] Spaced Sampler: 100%|██████████| 50/50 [00:20<00:00, 2.55it/s] Spaced Sampler: 100%|██████████| 50/50 [00:20<00:00, 2.49it/s] Face image tmpwg3l1z7wAudrey_Hepburn saved to ./..
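The log above traces the 'FACE' pipeline: faces are found with retinaface_resnet50, each crop is restored by the diffusion model, the background is upsampled separately (here by DiffBIR itself), and the restored crops are pasted back. A minimal Python sketch of that control flow; every stage function here is an injected, hypothetical callable, not a name from the DiffBIR codebase:

```python
def restore_portrait(image, detect_faces, restore_face, upsample_background, paste):
    """Sketch of a face-restoration pipeline: upsample the whole frame,
    restore each detected face crop, then composite the crops back."""
    # The "background_upsampler" stage runs over the full image first.
    result = upsample_background(image)
    # Each detected face box is restored independently and pasted back.
    for box in detect_faces(image):
        restored = restore_face(image, box)
        result = paste(result, restored, box)
    return result
```

With stub stages this composes exactly as the log suggests: one background pass, then one restoration pass per detected face.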
Prediction
zsxkib/diffbir:51ed1464
ID: y3pp3vdb7h6jyccilsfukcundi
Status: Succeeded
Source: Web
Hardware: A40 (Large)
Created by @zsxkib
Input
- seed: 231
- steps: 50
- tiled: false
- tile_size: 512
- has_aligned: false
- tile_stride: 256
- repeat_times: 1
- use_guidance: false
- color_fix_type: wavelet
- guidance_scale: 0
- guidance_space: latent
- guidance_repeat: 5
- only_center_face: false
- guidance_time_stop: -1
- guidance_time_start: 1001
- background_upsampler: RealESRGAN
- face_detection_model: retinaface_resnet50
- upscaling_model_type: general_scenes
- restoration_model_type: general_scenes
- super_resolution_factor: 4
- disable_preprocess_model: false
- reload_restoration_model: false
- background_upsampler_tile: 400
- background_upsampler_tile_stride: 400
{ "seed": 231, "input": "https://replicate.delivery/pbxt/JgdU22O42XDQO6PmXbkzwiYqBWui2hxdG8TATTNgGpBG78E7/49.jpg", "steps": 50, "tiled": false, "tile_size": 512, "has_aligned": false, "tile_stride": 256, "repeat_times": 1, "use_guidance": false, "color_fix_type": "wavelet", "guidance_scale": 0, "guidance_space": "latent", "guidance_repeat": 5, "only_center_face": false, "guidance_time_stop": -1, "guidance_time_start": 1001, "background_upsampler": "RealESRGAN", "face_detection_model": "retinaface_resnet50", "upscaling_model_type": "general_scenes", "restoration_model_type": "general_scenes", "super_resolution_factor": 4, "disable_preprocess_model": false, "reload_restoration_model": false, "background_upsampler_tile": 400, "background_upsampler_tile_stride": 400 }
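When tiled is enabled, tile_size and tile_stride describe an overlapping sweep over the image (512-pixel tiles placed every 256 pixels in these inputs). A hedged sketch of how such overlapping tile origins might be enumerated; tile_origins is a hypothetical helper for illustration, not DiffBIR's actual code:

```python
def tile_origins(length, tile_size, stride):
    """Return start offsets of overlapping tiles covering [0, length).
    The final tile is clamped so it ends exactly at the image edge."""
    if length <= tile_size:
        return [0]  # image smaller than one tile: a single pass suffices
    origins = list(range(0, length - tile_size + 1, stride))
    # Add a clamped last tile if the stride grid stops short of the edge.
    if origins[-1] + tile_size < length:
        origins.append(length - tile_size)
    return origins
```

For a 1024-pixel axis with tile_size=512 and tile_stride=256 this yields origins 0, 256, 512, so every pixel is covered by at least one tile and interior pixels by two, which is what lets tiled sampling blend seams away.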
Install Replicate’s Node.js client library: npm install replicate
Set the REPLICATE_API_TOKEN environment variable: export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Import and set up the client:
import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run zsxkib/diffbir using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "zsxkib/diffbir:51ed1464d8bbbaca811153b051d3b09ab42f0bdeb85804ae26ba323d7a66a4ac",
  {
    input: {
      seed: 231,
      input: "https://replicate.delivery/pbxt/JgdU22O42XDQO6PmXbkzwiYqBWui2hxdG8TATTNgGpBG78E7/49.jpg",
      steps: 50,
      tiled: false,
      tile_size: 512,
      has_aligned: false,
      tile_stride: 256,
      repeat_times: 1,
      use_guidance: false,
      color_fix_type: "wavelet",
      guidance_scale: 0,
      guidance_space: "latent",
      guidance_repeat: 5,
      only_center_face: false,
      guidance_time_stop: -1,
      guidance_time_start: 1001,
      background_upsampler: "RealESRGAN",
      face_detection_model: "retinaface_resnet50",
      upscaling_model_type: "general_scenes",
      restoration_model_type: "general_scenes",
      super_resolution_factor: 4,
      disable_preprocess_model: false,
      reload_restoration_model: false,
      background_upsampler_tile: 400,
      background_upsampler_tile_stride: 400
    }
  }
);
console.log(output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library: pip install replicate
Set the REPLICATE_API_TOKEN environment variable: export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Import the client:
import replicate
Run zsxkib/diffbir using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "zsxkib/diffbir:51ed1464d8bbbaca811153b051d3b09ab42f0bdeb85804ae26ba323d7a66a4ac",
    input={
        "seed": 231,
        "input": "https://replicate.delivery/pbxt/JgdU22O42XDQO6PmXbkzwiYqBWui2hxdG8TATTNgGpBG78E7/49.jpg",
        "steps": 50,
        "tiled": False,
        "tile_size": 512,
        "has_aligned": False,
        "tile_stride": 256,
        "repeat_times": 1,
        "use_guidance": False,
        "color_fix_type": "wavelet",
        "guidance_scale": 0,
        "guidance_space": "latent",
        "guidance_repeat": 5,
        "only_center_face": False,
        "guidance_time_stop": -1,
        "guidance_time_start": 1001,
        "background_upsampler": "RealESRGAN",
        "face_detection_model": "retinaface_resnet50",
        "upscaling_model_type": "general_scenes",
        "restoration_model_type": "general_scenes",
        "super_resolution_factor": 4,
        "disable_preprocess_model": False,
        "reload_restoration_model": False,
        "background_upsampler_tile": 400,
        "background_upsampler_tile_stride": 400
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
Set the REPLICATE_API_TOKEN environment variable: export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Run zsxkib/diffbir using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "51ed1464d8bbbaca811153b051d3b09ab42f0bdeb85804ae26ba323d7a66a4ac",
    "input": {
      "seed": 231,
      "input": "https://replicate.delivery/pbxt/JgdU22O42XDQO6PmXbkzwiYqBWui2hxdG8TATTNgGpBG78E7/49.jpg",
      "steps": 50,
      "tiled": false,
      "tile_size": 512,
      "has_aligned": false,
      "tile_stride": 256,
      "repeat_times": 1,
      "use_guidance": false,
      "color_fix_type": "wavelet",
      "guidance_scale": 0,
      "guidance_space": "latent",
      "guidance_repeat": 5,
      "only_center_face": false,
      "guidance_time_stop": -1,
      "guidance_time_start": 1001,
      "background_upsampler": "RealESRGAN",
      "face_detection_model": "retinaface_resnet50",
      "upscaling_model_type": "general_scenes",
      "restoration_model_type": "general_scenes",
      "super_resolution_factor": 4,
      "disable_preprocess_model": false,
      "reload_restoration_model": false,
      "background_upsampler_tile": 400,
      "background_upsampler_tile_stride": 400
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
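The request above uses the Prefer: wait header to block until the prediction finishes. Without it, the API responds immediately and you poll the prediction's urls.get endpoint until status leaves the in-progress states. A sketch of that loop with the HTTP call injected as a callable, so it stays library-agnostic; the names fetch, interval, and wait_for_prediction are illustrative, not part of any Replicate client:

```python
import time

# Statuses after which a Replicate prediction will not change again.
TERMINAL = {"succeeded", "failed", "canceled"}

def wait_for_prediction(fetch, interval=1.0, sleep=time.sleep):
    """Poll fetch() (a callable returning the prediction JSON as a dict)
    until the prediction reaches a terminal status, then return it."""
    while True:
        prediction = fetch()
        if prediction["status"] in TERMINAL:
            return prediction
        sleep(interval)
```

In practice fetch would be a GET on https://api.replicate.com/v1/predictions/{id} with the same Authorization header as the POST.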
You can run this model locally using Cog. First, install Cog: brew install cog
If you don’t have Homebrew, there are other installation options available.
Run this to download the model and run it in your local environment:
cog predict r8.im/zsxkib/diffbir@sha256:51ed1464d8bbbaca811153b051d3b09ab42f0bdeb85804ae26ba323d7a66a4ac \
  -i 'seed=231' \
  -i 'input="https://replicate.delivery/pbxt/JgdU22O42XDQO6PmXbkzwiYqBWui2hxdG8TATTNgGpBG78E7/49.jpg"' \
  -i 'steps=50' \
  -i 'tiled=false' \
  -i 'tile_size=512' \
  -i 'has_aligned=false' \
  -i 'tile_stride=256' \
  -i 'repeat_times=1' \
  -i 'use_guidance=false' \
  -i 'color_fix_type="wavelet"' \
  -i 'guidance_scale=0' \
  -i 'guidance_space="latent"' \
  -i 'guidance_repeat=5' \
  -i 'only_center_face=false' \
  -i 'guidance_time_stop=-1' \
  -i 'guidance_time_start=1001' \
  -i 'background_upsampler="RealESRGAN"' \
  -i 'face_detection_model="retinaface_resnet50"' \
  -i 'upscaling_model_type="general_scenes"' \
  -i 'restoration_model_type="general_scenes"' \
  -i 'super_resolution_factor=4' \
  -i 'disable_preprocess_model=false' \
  -i 'reload_restoration_model=false' \
  -i 'background_upsampler_tile=400' \
  -i 'background_upsampler_tile_stride=400'
To learn more, take a look at the Cog documentation.
Run this to download the model and run it in your local environment:
docker run -d -p 5000:5000 --gpus=all r8.im/zsxkib/diffbir@sha256:51ed1464d8bbbaca811153b051d3b09ab42f0bdeb85804ae26ba323d7a66a4ac
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d $'{
    "input": {
      "seed": 231,
      "input": "https://replicate.delivery/pbxt/JgdU22O42XDQO6PmXbkzwiYqBWui2hxdG8TATTNgGpBG78E7/49.jpg",
      "steps": 50,
      "tiled": false,
      "tile_size": 512,
      "has_aligned": false,
      "tile_stride": 256,
      "repeat_times": 1,
      "use_guidance": false,
      "color_fix_type": "wavelet",
      "guidance_scale": 0,
      "guidance_space": "latent",
      "guidance_repeat": 5,
      "only_center_face": false,
      "guidance_time_stop": -1,
      "guidance_time_start": 1001,
      "background_upsampler": "RealESRGAN",
      "face_detection_model": "retinaface_resnet50",
      "upscaling_model_type": "general_scenes",
      "restoration_model_type": "general_scenes",
      "super_resolution_factor": 4,
      "disable_preprocess_model": false,
      "reload_restoration_model": false,
      "background_upsampler_tile": 400,
      "background_upsampler_tile_stride": 400
    }
  }' \
  http://localhost:5000/predictions
To learn more, take a look at the Cog documentation.
Output
{ "input": "https://replicate.delivery/pbxt/JgdU22O42XDQO6PmXbkzwiYqBWui2hxdG8TATTNgGpBG78E7/49.jpg", "output": "https://replicate.delivery/pbxt/LHdTSK0QrV5UHZ8dfbq9F4vBCGjR0O3ZHCugtVdJ6v75eZtRA/._0.png" }
{ "completed_at": "2023-10-12T12:58:27.796509Z", "created_at": "2023-10-12T12:58:16.320003Z", "data_removed": false, "error": null, "id": "y3pp3vdb7h6jyccilsfukcundi", "input": { "seed": 231, "input": "https://replicate.delivery/pbxt/JgdU22O42XDQO6PmXbkzwiYqBWui2hxdG8TATTNgGpBG78E7/49.jpg", "steps": 50, "tiled": false, "tile_size": 512, "has_aligned": false, "tile_stride": 256, "repeat_times": 1, "use_guidance": false, "color_fix_type": "wavelet", "guidance_scale": 0, "guidance_space": "latent", "guidance_repeat": 5, "only_center_face": false, "guidance_time_stop": -1, "guidance_time_start": 1001, "background_upsampler": "RealESRGAN", "face_detection_model": "retinaface_resnet50", "upscaling_model_type": "general_scenes", "restoration_model_type": "general_scenes", "super_resolution_factor": 4, "disable_preprocess_model": false, "reload_restoration_model": false, "background_upsampler_tile": 400, "background_upsampler_tile_stride": 400 }, "logs": "ckptckptckpt weights/general_full_v1.ckpt\nSwitching from mode 'FACE' to 'FULL'...\nLoading 'FULL' mode model...\nFreezing the 'FULL' mode model and moving to the desired device...\nENABLE XFORMERS!\nModel successfully switched to 'FULL' mode.\n{'bg_tile': 400,\n'bg_tile_stride': 200,\n'bg_upsampler': 'RealESRGAN',\n'ckpt': 'weights/general_full_v1.ckpt',\n'color_fix_type': 'wavelet',\n'config': 'configs/model/cldm.yaml',\n'detection_model': 'retinaface_resnet50',\n'device': 'cuda',\n'disable_preprocess_model': False,\n'g_repeat': 5,\n'g_scale': 0.0,\n'g_space': 'latent',\n'g_t_start': 1001,\n'g_t_stop': -1,\n'has_aligned': False,\n'image_size': 512,\n'input': '/tmp/tmpl7qh3cnr49.jpg',\n'only_center_face': False,\n'output': '.',\n'reload_swinir': False,\n'repeat_times': 1,\n'seed': 231,\n'show_lq': False,\n'skip_if_exist': False,\n'sr_scale': 4,\n'steps': 50,\n'swinir_ckpt': 'weights/general_swinir_v1.ckpt',\n'tile_size': 512,\n'tile_stride': 256,\n'tiled': False,\n'use_guidance': False}\nGlobal seed set to 231\ntimesteps 
used in spaced sampler:\n[0, 20, 41, 61, 82, 102, 122, 143, 163, 183, 204, 224, 245, 265, 285, 306, 326, 347, 367, 387, 408, 428, 449, 469, 489, 510, 530, 550, 571, 591, 612, 632, 652, 673, 693, 714, 734, 754, 775, 795, 816, 836, 856, 877, 897, 917, 938, 958, 979, 999]\nSpaced Sampler: 0%| | 0/50 [00:00<?, ?it/s]\nSpaced Sampler: 2%|▏ | 1/50 [00:00<00:14, 3.33it/s]\nSpaced Sampler: 4%|▍ | 2/50 [00:00<00:09, 5.20it/s]\nSpaced Sampler: 6%|▌ | 3/50 [00:00<00:07, 6.34it/s]\nSpaced Sampler: 8%|▊ | 4/50 [00:00<00:06, 7.06it/s]\nSpaced Sampler: 10%|█ | 5/50 [00:00<00:05, 7.53it/s]\nSpaced Sampler: 12%|█▏ | 6/50 [00:00<00:05, 7.84it/s]\nSpaced Sampler: 14%|█▍ | 7/50 [00:01<00:05, 8.06it/s]\nSpaced Sampler: 16%|█▌ | 8/50 [00:01<00:05, 8.20it/s]\nSpaced Sampler: 18%|█▊ | 9/50 [00:01<00:04, 8.30it/s]\nSpaced Sampler: 20%|██ | 10/50 [00:01<00:04, 8.37it/s]\nSpaced Sampler: 22%|██▏ | 11/50 [00:01<00:04, 8.43it/s]\nSpaced Sampler: 24%|██▍ | 12/50 [00:01<00:04, 8.46it/s]\nSpaced Sampler: 26%|██▌ | 13/50 [00:01<00:04, 8.47it/s]\nSpaced Sampler: 28%|██▊ | 14/50 [00:01<00:04, 8.49it/s]\nSpaced Sampler: 30%|███ | 15/50 [00:01<00:04, 8.50it/s]\nSpaced Sampler: 32%|███▏ | 16/50 [00:02<00:03, 8.51it/s]\nSpaced Sampler: 34%|███▍ | 17/50 [00:02<00:03, 8.51it/s]\nSpaced Sampler: 36%|███▌ | 18/50 [00:02<00:03, 8.51it/s]\nSpaced Sampler: 38%|███▊ | 19/50 [00:02<00:03, 8.52it/s]\nSpaced Sampler: 40%|████ | 20/50 [00:02<00:03, 8.52it/s]\nSpaced Sampler: 42%|████▏ | 21/50 [00:02<00:03, 8.52it/s]\nSpaced Sampler: 44%|████▍ | 22/50 [00:02<00:03, 8.52it/s]\nSpaced Sampler: 46%|████▌ | 23/50 [00:02<00:03, 8.53it/s]\nSpaced Sampler: 48%|████▊ | 24/50 [00:02<00:03, 8.51it/s]\nSpaced Sampler: 50%|█████ | 25/50 [00:03<00:02, 8.51it/s]\nSpaced Sampler: 52%|█████▏ | 26/50 [00:03<00:02, 8.51it/s]\nSpaced Sampler: 54%|█████▍ | 27/50 [00:03<00:02, 8.51it/s]\nSpaced Sampler: 56%|█████▌ | 28/50 [00:03<00:02, 8.50it/s]\nSpaced Sampler: 58%|█████▊ | 29/50 [00:03<00:02, 8.50it/s]\nSpaced Sampler: 60%|██████ | 
30/50 [00:03<00:02, 8.51it/s]\nSpaced Sampler: 62%|██████▏ | 31/50 [00:03<00:02, 8.50it/s]\nSpaced Sampler: 64%|██████▍ | 32/50 [00:03<00:02, 8.50it/s]\nSpaced Sampler: 66%|██████▌ | 33/50 [00:04<00:02, 8.49it/s]\nSpaced Sampler: 68%|██████▊ | 34/50 [00:04<00:01, 8.49it/s]\nSpaced Sampler: 70%|███████ | 35/50 [00:04<00:01, 8.50it/s]\nSpaced Sampler: 72%|███████▏ | 36/50 [00:04<00:01, 8.50it/s]\nSpaced Sampler: 74%|███████▍ | 37/50 [00:04<00:01, 8.50it/s]\nSpaced Sampler: 76%|███████▌ | 38/50 [00:04<00:01, 8.49it/s]\nSpaced Sampler: 78%|███████▊ | 39/50 [00:04<00:01, 8.50it/s]\nSpaced Sampler: 80%|████████ | 40/50 [00:04<00:01, 8.49it/s]\nSpaced Sampler: 82%|████████▏ | 41/50 [00:04<00:01, 8.49it/s]\nSpaced Sampler: 84%|████████▍ | 42/50 [00:05<00:00, 8.49it/s]\nSpaced Sampler: 86%|████████▌ | 43/50 [00:05<00:00, 8.49it/s]\nSpaced Sampler: 88%|████████▊ | 44/50 [00:05<00:00, 8.50it/s]\nSpaced Sampler: 90%|█████████ | 45/50 [00:05<00:00, 8.49it/s]\nSpaced Sampler: 92%|█████████▏| 46/50 [00:05<00:00, 8.49it/s]\nSpaced Sampler: 94%|█████████▍| 47/50 [00:05<00:00, 8.49it/s]\nSpaced Sampler: 96%|█████████▌| 48/50 [00:05<00:00, 8.49it/s]\nSpaced Sampler: 98%|█████████▊| 49/50 [00:05<00:00, 8.49it/s]\nSpaced Sampler: 100%|██████████| 50/50 [00:06<00:00, 8.49it/s]\nSpaced Sampler: 100%|██████████| 50/50 [00:06<00:00, 8.26it/s]\nsave to ./._0.png", "metrics": { "predict_time": 11.545806, "total_time": 11.476506 }, "output": [ "https://replicate.delivery/pbxt/LHdTSK0QrV5UHZ8dfbq9F4vBCGjR0O3ZHCugtVdJ6v75eZtRA/._0.png" ], "started_at": "2023-10-12T12:58:16.250703Z", "status": "succeeded", "urls": { "get": "https://api.replicate.com/v1/predictions/y3pp3vdb7h6jyccilsfukcundi", "cancel": "https://api.replicate.com/v1/predictions/y3pp3vdb7h6jyccilsfukcundi/cancel" }, "version": "51ed1464d8bbbaca811153b051d3b09ab42f0bdeb85804ae26ba323d7a66a4ac" }
Generated in
ckptckptckpt weights/general_full_v1.ckpt
Switching from mode 'FACE' to 'FULL'...
Loading 'FULL' mode model...
Freezing the 'FULL' mode model and moving to the desired device...
ENABLE XFORMERS!
Model successfully switched to 'FULL' mode.
{'bg_tile': 400, 'bg_tile_stride': 200, 'bg_upsampler': 'RealESRGAN', 'ckpt': 'weights/general_full_v1.ckpt', 'color_fix_type': 'wavelet', 'config': 'configs/model/cldm.yaml', 'detection_model': 'retinaface_resnet50', 'device': 'cuda', 'disable_preprocess_model': False, 'g_repeat': 5, 'g_scale': 0.0, 'g_space': 'latent', 'g_t_start': 1001, 'g_t_stop': -1, 'has_aligned': False, 'image_size': 512, 'input': '/tmp/tmpl7qh3cnr49.jpg', 'only_center_face': False, 'output': '.', 'reload_swinir': False, 'repeat_times': 1, 'seed': 231, 'show_lq': False, 'skip_if_exist': False, 'sr_scale': 4, 'steps': 50, 'swinir_ckpt': 'weights/general_swinir_v1.ckpt', 'tile_size': 512, 'tile_stride': 256, 'tiled': False, 'use_guidance': False}
Global seed set to 231
timesteps used in spaced sampler: [0, 20, 41, 61, 82, 102, 122, 143, 163, 183, 204, 224, 245, 265, 285, 306, 326, 347, 367, 387, 408, 428, 449, 469, 489, 510, 530, 550, 571, 591, 612, 632, 652, 673, 693, 714, 734, 754, 775, 795, 816, 836, 856, 877, 897, 917, 938, 958, 979, 999]
[per-step progress updates elided]
Spaced Sampler: 100%|██████████| 50/50 [00:06<00:00, 8.26it/s]
save to ./._0.png
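The "timesteps used in spaced sampler" line in the logs shows 50 steps spread evenly across the model's 1000 training timesteps. One plausible way such a schedule can be derived; this arithmetic reproduces the logged list but is not lifted from the DiffBIR source:

```python
def spaced_timesteps(num_train_timesteps=1000, num_steps=50):
    """Pick num_steps timesteps evenly spaced over [0, num_train_timesteps)."""
    top = num_train_timesteps - 1      # last usable timestep index (999)
    step = top / (num_steps - 1)       # ~20.39 between samples for 50 steps
    # Round each evenly spaced position to the nearest integer timestep.
    return [int(i * step + 0.5) for i in range(num_steps)]
```

The sampler then visits these 50 timesteps instead of all 1000, which is why each prediction completes in 50 iterations.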
Prediction
zsxkib/diffbir:51ed1464 · ID: 77euyklbgcyarhaczq7uwxulai · Status: Succeeded · Source: Web · Hardware: A40 (Large)
Input
- seed: 231
- steps: 50
- tiled: false
- tile_size: 512
- has_aligned: true
- tile_stride: 256
- repeat_times: 1
- use_guidance: false
- color_fix_type: wavelet
- guidance_scale: 0
- guidance_space: latent
- guidance_repeat: 5
- only_center_face: false
- guidance_time_stop: -1
- guidance_time_start: 1001
- background_upsampler: RealESRGAN
- face_detection_model: retinaface_resnet50
- upscaling_model_type: faces
- restoration_model_type: general_scenes
- super_resolution_factor: 1
- disable_preprocess_model: false
- reload_restoration_model: false
- background_upsampler_tile: 400
- background_upsampler_tile_stride: 400
{
  "seed": 231,
  "input": "https://replicate.delivery/pbxt/JgdmREudlAXBDFZnIvZjfgSxwxtNd3aHk7gXHScaLGFltLGe/0427.png",
  "steps": 50,
  "tiled": false,
  "tile_size": 512,
  "has_aligned": true,
  "tile_stride": 256,
  "repeat_times": 1,
  "use_guidance": false,
  "color_fix_type": "wavelet",
  "guidance_scale": 0,
  "guidance_space": "latent",
  "guidance_repeat": 5,
  "only_center_face": false,
  "guidance_time_stop": -1,
  "guidance_time_start": 1001,
  "background_upsampler": "RealESRGAN",
  "face_detection_model": "retinaface_resnet50",
  "upscaling_model_type": "faces",
  "restoration_model_type": "general_scenes",
  "super_resolution_factor": 1,
  "disable_preprocess_model": false,
  "reload_restoration_model": false,
  "background_upsampler_tile": 400,
  "background_upsampler_tile_stride": 400
}
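Every client example below sends this same input payload. As a sketch, a small helper (hypothetical — `DEFAULT_INPUT` and `build_input` are not part of any Replicate client library) can assemble it from the defaults shown above while letting you override individual fields per call:

```python
# Hypothetical convenience helper: merges the default DiffBIR inputs shown
# above with any per-call overrides. Not part of the Replicate client.
DEFAULT_INPUT = {
    "seed": 231,
    "steps": 50,
    "tiled": False,
    "tile_size": 512,
    "has_aligned": True,
    "tile_stride": 256,
    "repeat_times": 1,
    "use_guidance": False,
    "color_fix_type": "wavelet",
    "guidance_scale": 0,
    "guidance_space": "latent",
    "guidance_repeat": 5,
    "only_center_face": False,
    "guidance_time_stop": -1,
    "guidance_time_start": 1001,
    "background_upsampler": "RealESRGAN",
    "face_detection_model": "retinaface_resnet50",
    "upscaling_model_type": "faces",
    "restoration_model_type": "general_scenes",
    "super_resolution_factor": 1,
    "disable_preprocess_model": False,
    "reload_restoration_model": False,
    "background_upsampler_tile": 400,
    "background_upsampler_tile_stride": 400,
}

def build_input(image_url, **overrides):
    """Return a full input dict: the defaults above, plus the image URL,
    plus any explicit keyword overrides."""
    payload = dict(DEFAULT_INPUT)
    payload["input"] = image_url
    payload.update(overrides)
    return payload
```

For example, `build_input("https://example.com/face.png", steps=25)` keeps every default except the step count.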
Install Replicate’s Node.js client library:
npm install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Import and set up the client:
import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run zsxkib/diffbir using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "zsxkib/diffbir:51ed1464d8bbbaca811153b051d3b09ab42f0bdeb85804ae26ba323d7a66a4ac",
  {
    input: {
      seed: 231,
      input: "https://replicate.delivery/pbxt/JgdmREudlAXBDFZnIvZjfgSxwxtNd3aHk7gXHScaLGFltLGe/0427.png",
      steps: 50,
      tiled: false,
      tile_size: 512,
      has_aligned: true,
      tile_stride: 256,
      repeat_times: 1,
      use_guidance: false,
      color_fix_type: "wavelet",
      guidance_scale: 0,
      guidance_space: "latent",
      guidance_repeat: 5,
      only_center_face: false,
      guidance_time_stop: -1,
      guidance_time_start: 1001,
      background_upsampler: "RealESRGAN",
      face_detection_model: "retinaface_resnet50",
      upscaling_model_type: "faces",
      restoration_model_type: "general_scenes",
      super_resolution_factor: 1,
      disable_preprocess_model: false,
      reload_restoration_model: false,
      background_upsampler_tile: 400,
      background_upsampler_tile_stride: 400,
    },
  }
);
console.log(output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:
pip install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Import the client:
import replicate
Run zsxkib/diffbir using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "zsxkib/diffbir:51ed1464d8bbbaca811153b051d3b09ab42f0bdeb85804ae26ba323d7a66a4ac",
    input={
        "seed": 231,
        "input": "https://replicate.delivery/pbxt/JgdmREudlAXBDFZnIvZjfgSxwxtNd3aHk7gXHScaLGFltLGe/0427.png",
        "steps": 50,
        "tiled": False,
        "tile_size": 512,
        "has_aligned": True,
        "tile_stride": 256,
        "repeat_times": 1,
        "use_guidance": False,
        "color_fix_type": "wavelet",
        "guidance_scale": 0,
        "guidance_space": "latent",
        "guidance_repeat": 5,
        "only_center_face": False,
        "guidance_time_stop": -1,
        "guidance_time_start": 1001,
        "background_upsampler": "RealESRGAN",
        "face_detection_model": "retinaface_resnet50",
        "upscaling_model_type": "faces",
        "restoration_model_type": "general_scenes",
        "super_resolution_factor": 1,
        "disable_preprocess_model": False,
        "reload_restoration_model": False,
        "background_upsampler_tile": 400,
        "background_upsampler_tile_stride": 400,
    },
)
print(output)
To learn more, take a look at the guide on getting started with Python.
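The Python call above returns a list of output URLs. As a minimal sketch using only the standard library (the `output_filename` and `save_output` helper names are illustrative, not part of the Replicate client), you could fetch the restored image to disk:

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

def output_filename(url):
    # Derive a local filename from the last path segment of the output URL,
    # e.g. ".../tmpbr7p39dy0427.png" -> "tmpbr7p39dy0427.png".
    return Path(urlparse(url).path).name

def save_output(url, dest_dir="."):
    # Download the restored image into dest_dir and return the local path.
    dest = Path(dest_dir) / output_filename(url)
    urllib.request.urlretrieve(url, dest)
    return dest
```

For example, `save_output(output[0])` after the `replicate.run(...)` call above would write the first restored image into the current directory.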
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Run zsxkib/diffbir using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "51ed1464d8bbbaca811153b051d3b09ab42f0bdeb85804ae26ba323d7a66a4ac",
    "input": {
      "seed": 231,
      "input": "https://replicate.delivery/pbxt/JgdmREudlAXBDFZnIvZjfgSxwxtNd3aHk7gXHScaLGFltLGe/0427.png",
      "steps": 50,
      "tiled": false,
      "tile_size": 512,
      "has_aligned": true,
      "tile_stride": 256,
      "repeat_times": 1,
      "use_guidance": false,
      "color_fix_type": "wavelet",
      "guidance_scale": 0,
      "guidance_space": "latent",
      "guidance_repeat": 5,
      "only_center_face": false,
      "guidance_time_stop": -1,
      "guidance_time_start": 1001,
      "background_upsampler": "RealESRGAN",
      "face_detection_model": "retinaface_resnet50",
      "upscaling_model_type": "faces",
      "restoration_model_type": "general_scenes",
      "super_resolution_factor": 1,
      "disable_preprocess_model": false,
      "reload_restoration_model": false,
      "background_upsampler_tile": 400,
      "background_upsampler_tile_stride": 400
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
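Without the `Prefer: wait` header the API returns immediately, and the prediction's `urls.get` endpoint (shown in the response JSON further down) must be polled until the status becomes terminal. A minimal polling sketch — the `wait_for_prediction` helper and its injectable `get_status` callable are assumptions for illustration, not part of the Replicate API:

```python
import time

# Statuses after which a Replicate prediction will not change again.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}

def wait_for_prediction(get_status, poll_interval=1.0, timeout=300.0):
    """Repeatedly call `get_status` (e.g. a function that GETs the
    prediction's `urls.get` endpoint and parses the JSON) until the
    returned dict has a terminal "status", or raise on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        prediction = get_status()
        if prediction["status"] in TERMINAL_STATUSES:
            return prediction
        if time.monotonic() + poll_interval > deadline:
            raise TimeoutError("prediction did not finish before the timeout")
        time.sleep(poll_interval)
```

Injecting `get_status` as a callable keeps the loop testable without any network access.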
You can run this model locally using Cog. First, install Cog:
brew install cog
If you don’t have Homebrew, there are other installation options available.
Run this to download the model and run it in your local environment:
cog predict r8.im/zsxkib/diffbir@sha256:51ed1464d8bbbaca811153b051d3b09ab42f0bdeb85804ae26ba323d7a66a4ac \
  -i 'seed=231' \
  -i 'input="https://replicate.delivery/pbxt/JgdmREudlAXBDFZnIvZjfgSxwxtNd3aHk7gXHScaLGFltLGe/0427.png"' \
  -i 'steps=50' \
  -i 'tiled=false' \
  -i 'tile_size=512' \
  -i 'has_aligned=true' \
  -i 'tile_stride=256' \
  -i 'repeat_times=1' \
  -i 'use_guidance=false' \
  -i 'color_fix_type="wavelet"' \
  -i 'guidance_scale=0' \
  -i 'guidance_space="latent"' \
  -i 'guidance_repeat=5' \
  -i 'only_center_face=false' \
  -i 'guidance_time_stop=-1' \
  -i 'guidance_time_start=1001' \
  -i 'background_upsampler="RealESRGAN"' \
  -i 'face_detection_model="retinaface_resnet50"' \
  -i 'upscaling_model_type="faces"' \
  -i 'restoration_model_type="general_scenes"' \
  -i 'super_resolution_factor=1' \
  -i 'disable_preprocess_model=false' \
  -i 'reload_restoration_model=false' \
  -i 'background_upsampler_tile=400' \
  -i 'background_upsampler_tile_stride=400'
To learn more, take a look at the Cog documentation.
Run this to download the model and run it in your local environment:
docker run -d -p 5000:5000 --gpus=all r8.im/zsxkib/diffbir@sha256:51ed1464d8bbbaca811153b051d3b09ab42f0bdeb85804ae26ba323d7a66a4ac
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d $'{
    "input": {
      "seed": 231,
      "input": "https://replicate.delivery/pbxt/JgdmREudlAXBDFZnIvZjfgSxwxtNd3aHk7gXHScaLGFltLGe/0427.png",
      "steps": 50,
      "tiled": false,
      "tile_size": 512,
      "has_aligned": true,
      "tile_stride": 256,
      "repeat_times": 1,
      "use_guidance": false,
      "color_fix_type": "wavelet",
      "guidance_scale": 0,
      "guidance_space": "latent",
      "guidance_repeat": 5,
      "only_center_face": false,
      "guidance_time_stop": -1,
      "guidance_time_start": 1001,
      "background_upsampler": "RealESRGAN",
      "face_detection_model": "retinaface_resnet50",
      "upscaling_model_type": "faces",
      "restoration_model_type": "general_scenes",
      "super_resolution_factor": 1,
      "disable_preprocess_model": false,
      "reload_restoration_model": false,
      "background_upsampler_tile": 400,
      "background_upsampler_tile_stride": 400
    }
  }' \
  http://localhost:5000/predictions
To learn more, take a look at the Cog documentation.
Output
{
  "input": "https://replicate.delivery/pbxt/JgdmREudlAXBDFZnIvZjfgSxwxtNd3aHk7gXHScaLGFltLGe/0427.png",
  "output": "https://replicate.delivery/pbxt/tjBj5e8QUiSAHaJhYwLUV2Sb5fmmp9VuIvfb6X4fG6UCHp1GB/tmpbr7p39dy0427.png"
}
{
  "completed_at": "2023-10-12T13:19:45.432677Z",
  "created_at": "2023-10-12T13:17:41.439299Z",
  "data_removed": false,
  "error": null,
  "id": "77euyklbgcyarhaczq7uwxulai",
  "input": {
    "seed": 231,
    "input": "https://replicate.delivery/pbxt/JgdmREudlAXBDFZnIvZjfgSxwxtNd3aHk7gXHScaLGFltLGe/0427.png",
    "steps": 50,
    "tiled": false,
    "tile_size": 512,
    "has_aligned": true,
    "tile_stride": 256,
    "repeat_times": 1,
    "use_guidance": false,
    "color_fix_type": "wavelet",
    "guidance_scale": 0,
    "guidance_space": "latent",
    "guidance_repeat": 5,
    "only_center_face": false,
    "guidance_time_stop": -1,
    "guidance_time_start": 1001,
    "background_upsampler": "RealESRGAN",
    "face_detection_model": "retinaface_resnet50",
    "upscaling_model_type": "faces",
    "restoration_model_type": "general_scenes",
    "super_resolution_factor": 1,
    "disable_preprocess_model": false,
    "reload_restoration_model": false,
    "background_upsampler_tile": 400,
    "background_upsampler_tile_stride": 400
  },
  "logs": "…(full log output reproduced below)…",
  "metrics": {
    "predict_time": 36.889872,
    "total_time": 123.993378
  },
  "output": [
    "https://replicate.delivery/pbxt/tjBj5e8QUiSAHaJhYwLUV2Sb5fmmp9VuIvfb6X4fG6UCHp1GB/tmpbr7p39dy0427.png"
  ],
  "started_at": "2023-10-12T13:19:08.542805Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/77euyklbgcyarhaczq7uwxulai",
    "cancel": "https://api.replicate.com/v1/predictions/77euyklbgcyarhaczq7uwxulai/cancel"
  },
  "version": "51ed1464d8bbbaca811153b051d3b09ab42f0bdeb85804ae26ba323d7a66a4ac"
}
Generated in 36.9 seconds
ckptckptckpt weights/face_full_v1.ckpt
Switching from mode 'FULL' to 'FACE'...
Building and loading 'FACE' mode model...
ControlLDM: Running in eps-prediction mode
Setting up MemoryEfficientCrossAttention for each attention block (query dims 320/640/1280 with 5/10/20 heads; context_dim None or 1024 — repeated per-block lines condensed)
DiffusionWrapper has 865.91 M params.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
Loading model from: /root/.pyenv/versions/3.9.18/lib/python3.9/site-packages/lpips/weights/v0.1/alex.pth
reload swinir model from weights/general_swinir_v1.ckpt
ENABLE XFORMERS!
Model successfully switched to 'FACE' mode.
{'bg_tile': 400, 'bg_tile_stride': 400, 'bg_upsampler': 'RealESRGAN', 'ckpt': 'weights/face_full_v1.ckpt', 'color_fix_type': 'wavelet', 'config': 'configs/model/cldm.yaml', 'detection_model': 'retinaface_resnet50', 'device': 'cuda', 'disable_preprocess_model': False, 'g_repeat': 5, 'g_scale': 0.0, 'g_space': 'latent', 'g_t_start': 1001, 'g_t_stop': -1, 'has_aligned': True, 'image_size': 512, 'input': '/tmp/tmpbr7p39dy0427.png', 'only_center_face': False, 'output': '.', 'reload_swinir': False, 'repeat_times': 1, 'seed': 231, 'show_lq': False, 'skip_if_exist': False, 'sr_scale': 1, 'steps': 50, 'swinir_ckpt': 'weights/general_swinir_v1.ckpt', 'tile_size': 512, 'tile_stride': 256, 'tiled': False, 'use_guidance': False}
Global seed set to 231
(torchvision UserWarning: arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13)
Downloading detection_Resnet50_Final.pth (104M) and parsing_parsenet.pth (81.4M) from the facexlib releases to the facexlib weights directory... done.
Loading RealESRGAN_x2plus.pth for background upsampling...
timesteps used in spaced sampler:
[0, 20, 41, 61, 82, 102, 122, 143, 163, 183, 204, 224, 245, 265, 285, 306, 326, 347, 367, 387, 408, 428, 449, 469, 489, 510, 530, 550, 571, 591, 612, 632, 652, 673, 693, 714, 734, 754, 775, 795, 816, 836, 856, 877, 897, 917, 938, 958, 979, 999]
Spaced Sampler: 100%|██████████| 50/50 [00:04<00:00, 11.86it/s]
Face image tmpbr7p39dy0427 saved to ./..
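Judging from the values alone, the 50 sampler timesteps in the log appear to be 50 evenly spaced, rounded indices over the model's 1000 training timesteps. A sketch of that spacing — the `spaced_timesteps` name is illustrative, not DiffBIR's actual implementation:

```python
def spaced_timesteps(num_train_steps=1000, num_sample_steps=50):
    # Spread num_sample_steps indices evenly over [0, num_train_steps - 1]
    # and round each to the nearest integer timestep.
    last = num_train_steps - 1
    return [round(i * last / (num_sample_steps - 1)) for i in range(num_sample_steps)]
```

With the defaults, this reproduces the logged list: it starts [0, 20, 41, 61, ...] and ends at 999.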