zsyoaoa / invsr

Arbitrary-steps Image Super-resolution via Diffusion Inversion

Run with an API

Input

Run this model in Node.js with one line of code:

npx create-replicate --model=zsyoaoa/invsr

Or set up a project from scratch:

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run zsyoaoa/invsr using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "zsyoaoa/invsr:37eebabfb6cdc4be2892b884b96b361d6fedc9f6a934d2fa3c1a2f85f004b0f0",
  {
    input: {
      seed: 12345,
      in_path: "https://replicate.delivery/pbxt/M8qhJrY5aD7tG40HumHd3gIIR3LXjMKThkOCNB1oSfGrimcu/32.jpg",
      num_steps: 1,
      chopping_size: 128
    }
  }
);
console.log(output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run zsyoaoa/invsr using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "zsyoaoa/invsr:37eebabfb6cdc4be2892b884b96b361d6fedc9f6a934d2fa3c1a2f85f004b0f0",
    input={
        "seed": 12345,
        "in_path": "https://replicate.delivery/pbxt/M8qhJrY5aD7tG40HumHd3gIIR3LXjMKThkOCNB1oSfGrimcu/32.jpg",
        "num_steps": 1,
        "chopping_size": 128
    }
)
print(output)

To learn more, take a look at the guide on getting started with Python.
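Depending on the client version, `replicate.run` may return either a URL string or a file-like object with a `read()` method. The sketch below handles both cases when saving the upscaled image. The helper name `save_output` and the default filename are illustrative assumptions, not part of the Replicate client.

```python
import urllib.request
from pathlib import Path


def save_output(output, dest="out.png"):
    """Write a prediction output to disk and return the destination path.

    `output` is assumed to be either a URL string or an object whose
    `read()` method returns bytes (hypothetical helper, not part of the
    Replicate client library).
    """
    dest = Path(dest)
    if hasattr(output, "read"):
        # File-like output: read the bytes directly.
        data = output.read()
    else:
        # Plain URL string: download the file.
        with urllib.request.urlopen(str(output)) as response:
            data = response.read()
    dest.write_bytes(data)
    return dest
```

With the example above, `save_output(output, "32_upscaled.png")` would store the result next to the script.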

To call the HTTP API directly with curl, set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run zsyoaoa/invsr using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "37eebabfb6cdc4be2892b884b96b361d6fedc9f6a934d2fa3c1a2f85f004b0f0",
    "input": {
      "seed": 12345,
      "in_path": "https://replicate.delivery/pbxt/M8qhJrY5aD7tG40HumHd3gIIR3LXjMKThkOCNB1oSfGrimcu/32.jpg",
      "num_steps": 1,
      "chopping_size": 128
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
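The `Prefer: wait` header in the curl request asks the API to hold the connection open until the prediction finishes or a wait limit is reached. For runs that take longer, the prediction's `urls.get` endpoint (shown in the prediction JSON) can be polled instead. The following is a minimal sketch of that loop; the function names, the polling interval, and the timeout are assumptions chosen for illustration.

```python
import json
import os
import time
import urllib.request

# Statuses after which a prediction no longer changes.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def fetch_json(url):
    """GET a Replicate API URL with the bearer token and decode the JSON body."""
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {os.environ['REPLICATE_API_TOKEN']}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_prediction(get_url, fetch=fetch_json, interval=2.0, timeout=600.0):
    """Poll `get_url` until the prediction reaches a terminal status.

    `fetch` is injectable so the loop can be exercised without network access.
    """
    deadline = time.monotonic() + timeout
    while True:
        prediction = fetch(get_url)
        if prediction["status"] in TERMINAL_STATUSES:
            return prediction
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction still {prediction['status']!r}")
        time.sleep(interval)
```

Passing the `urls.get` value from the initial response to `wait_for_prediction` returns the final prediction object, whose `output` field holds the result URL on success.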

You can run this model locally using Cog. First, install Cog:

brew install cog

If you don’t have Homebrew, there are other installation options available.

Run this to download the model and run it in your local environment:

cog predict r8.im/zsyoaoa/invsr@sha256:37eebabfb6cdc4be2892b884b96b361d6fedc9f6a934d2fa3c1a2f85f004b0f0 \
  -i 'seed=12345' \
  -i 'in_path="https://replicate.delivery/pbxt/M8qhJrY5aD7tG40HumHd3gIIR3LXjMKThkOCNB1oSfGrimcu/32.jpg"' \
  -i 'num_steps=1' \
  -i 'chopping_size=128'

To learn more, take a look at the Cog documentation.

Alternatively, run the model container directly with Docker and send a prediction request to its local HTTP endpoint:

docker run -d -p 5000:5000 --gpus=all r8.im/zsyoaoa/invsr@sha256:37eebabfb6cdc4be2892b884b96b361d6fedc9f6a934d2fa3c1a2f85f004b0f0
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d $'{
    "input": {
      "seed": 12345,
      "in_path": "https://replicate.delivery/pbxt/M8qhJrY5aD7tG40HumHd3gIIR3LXjMKThkOCNB1oSfGrimcu/32.jpg",
      "num_steps": 1,
      "chopping_size": 128
    }
  }' \
  http://localhost:5000/predictions


Output


{
  "input": "https://replicate.delivery/pbxt/M8qhJrY5aD7tG40HumHd3gIIR3LXjMKThkOCNB1oSfGrimcu/32.jpg",
  "output": "https://replicate.delivery/czjl/BqklqAF5Wu5XOxH2WQJ8lV5HzkMruQ9V74VacSfYecMUdv6TA/out.png"
}

{
  "completed_at": "2024-12-14T08:44:04.962719Z",
  "created_at": "2024-12-14T08:42:11.447000Z",
  "data_removed": false,
  "error": null,
  "id": "j2xw7errexrge0ckrhatgy909g",
  "input": {
    "seed": 12345,
    "in_path": "https://replicate.delivery/pbxt/M8qhJrY5aD7tG40HumHd3gIIR3LXjMKThkOCNB1oSfGrimcu/32.jpg",
    "num_steps": 1,
    "chopping_size": 128
  },
  "logs": "Setting timesteps for inference: [200]\nDownloading: \"https://huggingface.co/OAOA/InvSR/resolve/main/noise_predictor_sd_turbo_v5.pth\" to /src/weights/noise_predictor_sd_turbo_v5.pth\n  0%|          | 0.00/129M [00:00<?, ?B/s]\n  8%|▊         | 9.88M/129M [00:00<00:01, 102MB/s]\n 26%|██▌       | 33.6M/129M [00:00<00:00, 188MB/s]\n 40%|███▉      | 51.6M/129M [00:00<00:01, 45.9MB/s]\n 48%|████▊     | 62.5M/129M [00:01<00:01, 44.9MB/s]\n 55%|█████▍    | 70.8M/129M [00:01<00:01, 44.3MB/s]\n 60%|██████    | 77.5M/129M [00:01<00:01, 44.0MB/s]\n 65%|██████▍   | 83.4M/129M [00:01<00:01, 42.5MB/s]\n 69%|██████▊   | 88.5M/129M [00:01<00:01, 42.3MB/s]\n 72%|███████▏  | 93.2M/129M [00:02<00:00, 42.4MB/s]\n 76%|███████▌  | 97.9M/129M [00:02<00:00, 43.0MB/s]\n 79%|███████▉  | 102M/129M [00:02<00:00, 43.1MB/s] \n 83%|████████▎ | 107M/129M [00:02<00:00, 42.7MB/s]\n 86%|████████▌ | 111M/129M [00:02<00:00, 42.8MB/s]\n 89%|████████▉ | 115M/129M [00:02<00:00, 43.0MB/s]\n 93%|█████████▎| 120M/129M [00:02<00:00, 42.8MB/s]\n 96%|█████████▌| 124M/129M [00:02<00:00, 40.1MB/s]\n 99%|█████████▉| 128M/129M [00:02<00:00, 40.8MB/s]\n100%|██████████| 129M/129M [00:02<00:00, 46.3MB/s]\nFetching 12 files:   0%|          | 0/12 [00:00<?, ?it/s]\nFetching 12 files:  17%|█▋        | 2/12 [00:00<00:02,  4.21it/s]\nFetching 12 files:  33%|███▎      | 4/12 [00:32<01:17,  9.65s/it]\nFetching 12 files:  83%|████████▎ | 10/12 [01:23<00:17,  8.72s/it]\nFetching 12 files: 100%|██████████| 12/12 [01:23<00:00,  6.92s/it]\nLoading pipeline components...:   0%|          | 0/5 [00:00<?, ?it/s]\nLoading pipeline components...:  20%|██        | 1/5 [00:00<00:03,  1.19it/s]\nLoading pipeline components...:  40%|████      | 2/5 [00:01<00:03,  1.01s/it]\nLoading pipeline components...:  80%|████████  | 4/5 [00:02<00:00,  2.43it/s]\nLoading pipeline components...: 100%|██████████| 5/5 [00:02<00:00,  2.96it/s]\nLoading pipeline components...: 100%|██████████| 5/5 [00:02<00:00,  2.22it/s]\nYou have disabled 
the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .\nYou have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_inversion_sr.StableDiffusionInvEnhancePipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .\nActivating gradient checkpoing for vae...\nLoading started model from ./weights/noise_predictor_sd_turbo_v5.pth...\n/src/sampler_invsr.py:101: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. 
Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.\nstate = torch.load(ckpt_path, map_location=f\"cuda\")\nLoading Done\n/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/configuration_utils.py:140: FutureWarning: Accessing config attribute `vae_latent_channels` directly via 'VaeImageProcessor' object attribute is deprecated. Please access 'vae_latent_channels' over 'VaeImageProcessor's config object instead, e.g. 'scheduler.config.vae_latent_channels'.\ndeprecate(\"direct config name access\", \"1.0.0\", deprecation_message, standard_warn=False)\n  0%|          | 0/1 [00:00<?, ?it/s]\n100%|██████████| 1/1 [00:00<00:00,  4.53it/s]\n100%|██████████| 1/1 [00:00<00:00,  4.52it/s]\n  0%|          | 0/1 [00:00<?, ?it/s]\n100%|██████████| 1/1 [00:00<00:00, 21.85it/s]\n  0%|          | 0/1 [00:00<?, ?it/s]\n100%|██████████| 1/1 [00:00<00:00, 10.38it/s]\nProcessing done, enjoy the results in invsr_output",
  "metrics": {
    "predict_time": 96.037241225,
    "total_time": 113.515719
  },
  "output": "https://replicate.delivery/czjl/BqklqAF5Wu5XOxH2WQJ8lV5HzkMruQ9V74VacSfYecMUdv6TA/out.png",
  "started_at": "2024-12-14T08:42:28.925478Z",
  "status": "succeeded",
  "urls": {
    "stream": "https://stream.replicate.com/v1/files/fddq-2q5gkhdzdt4tnofrkfrdklb4e3uv7jv6stgo56xqhojelc2254rq",
    "get": "https://api.replicate.com/v1/predictions/j2xw7errexrge0ckrhatgy909g",
    "cancel": "https://api.replicate.com/v1/predictions/j2xw7errexrge0ckrhatgy909g/cancel"
  },
  "version": "37eebabfb6cdc4be2892b884b96b361d6fedc9f6a934d2fa3c1a2f85f004b0f0"
}
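The timestamps in the prediction object show how the total time splits between queueing (including cold start) and the run itself. The sketch below recomputes that split from `created_at`, `started_at`, and `completed_at`; the helper names are illustrative. The run and total durations should agree with `metrics.predict_time` and `metrics.total_time` to within timestamp precision.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp with a trailing 'Z' (UTC)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def timing_breakdown(prediction):
    """Return (queue_seconds, run_seconds, total_seconds) for a prediction."""
    created = parse_timestamp(prediction["created_at"])
    started = parse_timestamp(prediction["started_at"])
    completed = parse_timestamp(prediction["completed_at"])
    return (
        (started - created).total_seconds(),
        (completed - started).total_seconds(),
        (completed - created).total_seconds(),
    )


# Timestamps copied from the prediction shown above.
prediction = {
    "created_at": "2024-12-14T08:42:11.447000Z",
    "started_at": "2024-12-14T08:42:28.925478Z",
    "completed_at": "2024-12-14T08:44:04.962719Z",
}
queue, run, total = timing_breakdown(prediction)
print(f"queued {queue:.1f}s, ran {run:.1f}s, total {total:.1f}s")
# prints "queued 17.5s, ran 96.0s, total 113.5s"
```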

Generated in 1 minute 36 seconds

Setting timesteps for inference: [200]
Downloading: "https://huggingface.co/OAOA/InvSR/resolve/main/noise_predictor_sd_turbo_v5.pth" to /src/weights/noise_predictor_sd_turbo_v5.pth
  0%|          | 0.00/129M [00:00<?, ?B/s]
  8%|▊         | 9.88M/129M [00:00<00:01, 102MB/s]
 26%|██▌       | 33.6M/129M [00:00<00:00, 188MB/s]
 40%|███▉      | 51.6M/129M [00:00<00:01, 45.9MB/s]
 48%|████▊     | 62.5M/129M [00:01<00:01, 44.9MB/s]
 55%|█████▍    | 70.8M/129M [00:01<00:01, 44.3MB/s]
 60%|██████    | 77.5M/129M [00:01<00:01, 44.0MB/s]
 65%|██████▍   | 83.4M/129M [00:01<00:01, 42.5MB/s]
 69%|██████▊   | 88.5M/129M [00:01<00:01, 42.3MB/s]
 72%|███████▏  | 93.2M/129M [00:02<00:00, 42.4MB/s]
 76%|███████▌  | 97.9M/129M [00:02<00:00, 43.0MB/s]
 79%|███████▉  | 102M/129M [00:02<00:00, 43.1MB/s] 
 83%|████████▎ | 107M/129M [00:02<00:00, 42.7MB/s]
 86%|████████▌ | 111M/129M [00:02<00:00, 42.8MB/s]
 89%|████████▉ | 115M/129M [00:02<00:00, 43.0MB/s]
 93%|█████████▎| 120M/129M [00:02<00:00, 42.8MB/s]
 96%|█████████▌| 124M/129M [00:02<00:00, 40.1MB/s]
 99%|█████████▉| 128M/129M [00:02<00:00, 40.8MB/s]
100%|██████████| 129M/129M [00:02<00:00, 46.3MB/s]
Fetching 12 files:   0%|          | 0/12 [00:00<?, ?it/s]
Fetching 12 files:  17%|█▋        | 2/12 [00:00<00:02,  4.21it/s]
Fetching 12 files:  33%|███▎      | 4/12 [00:32<01:17,  9.65s/it]
Fetching 12 files:  83%|████████▎ | 10/12 [01:23<00:17,  8.72s/it]
Fetching 12 files: 100%|██████████| 12/12 [01:23<00:00,  6.92s/it]
Loading pipeline components...:   0%|          | 0/5 [00:00<?, ?it/s]
Loading pipeline components...:  20%|██        | 1/5 [00:00<00:03,  1.19it/s]
Loading pipeline components...:  40%|████      | 2/5 [00:01<00:03,  1.01s/it]
Loading pipeline components...:  80%|████████  | 4/5 [00:02<00:00,  2.43it/s]
Loading pipeline components...: 100%|██████████| 5/5 [00:02<00:00,  2.96it/s]
Loading pipeline components...: 100%|██████████| 5/5 [00:02<00:00,  2.22it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_inversion_sr.StableDiffusionInvEnhancePipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
Activating gradient checkpoing for vae...
Loading started model from ./weights/noise_predictor_sd_turbo_v5.pth...
/src/sampler_invsr.py:101: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
state = torch.load(ckpt_path, map_location=f"cuda")
Loading Done
/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/diffusers/configuration_utils.py:140: FutureWarning: Accessing config attribute `vae_latent_channels` directly via 'VaeImageProcessor' object attribute is deprecated. Please access 'vae_latent_channels' over 'VaeImageProcessor's config object instead, e.g. 'scheduler.config.vae_latent_channels'.
deprecate("direct config name access", "1.0.0", deprecation_message, standard_warn=False)
  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00,  4.53it/s]
100%|██████████| 1/1 [00:00<00:00,  4.52it/s]
  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 21.85it/s]
  0%|          | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 10.38it/s]
Processing done, enjoy the results in invsr_output

Run time and cost

This model costs approximately $0.055 to run on Replicate, or 18 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
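As a rough budgeting aid, the per-run figure quoted above can be turned into a batch estimate. This sketch simply assumes every run costs the approximate $0.055 average, which will not hold for unusually large or small inputs; the function names are illustrative.

```python
import math

# Approximate average cost per run quoted on this page; actual cost varies.
COST_PER_RUN_USD = 0.055


def runs_per_dollar(cost_per_run=COST_PER_RUN_USD):
    """Whole number of runs that fit in one dollar."""
    return math.floor(1 / cost_per_run)


def estimated_cost(num_runs, cost_per_run=COST_PER_RUN_USD):
    """Estimated total cost in USD for `num_runs` average runs."""
    return num_runs * cost_per_run


print(runs_per_dollar())                # 18
print(f"${estimated_cost(1000):.2f}")   # $55.00
```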

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 5 minutes. The predict time for this model varies significantly based on the inputs.

Readme

InvSR Model Card

This model card focuses on the models associated with the InvSR project, which is available here.

Model Details

Developed by: Zongsheng Yue
Model type: Arbitrary-steps Image Super-resolution via Diffusion Inversion
Model Description: This is the model used in Paper.
Resources for more information: GitHub Repository.
Cite as:

@article{yue2024invSR,
  author  = {Zongsheng Yue, Kang Liao, Chen Change Loy},
  title   = {Arbitrary-steps Image Super-resolution via Diffusion Inversion},
  journal = {arXiv preprint arXiv:2412.09013},
  year    = {2024},
}

Limitations

InvSR processes high-resolution images in tiles, which can substantially increase inference time.
Because it is generative, InvSR cannot always preserve full fidelity to the input.
InvSR may fail to produce accurate fine details in complex real-world scenes.
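To illustrate why tiling increases inference time, the sketch below counts the overlapping tiles needed to cover an image. It is not InvSR's actual implementation: it assumes `chopping_size` acts as the tile edge length, and the stride of 96 is an arbitrary illustrative value.

```python
def tile_starts(length, size, stride):
    """Start offsets of tiles of `size` that cover [0, length) with step `stride`.

    Assumes 1 <= stride <= size so that consecutive tiles touch or overlap.
    """
    if length <= size:
        return [0]
    starts = list(range(0, length - size, stride))
    starts.append(length - size)  # final tile sits flush with the far edge
    return starts


def tile_boxes(height, width, size=128, stride=96):
    """Return (top, left, bottom, right) boxes that cover the whole image."""
    return [
        (top, left, min(top + size, height), min(left + size, width))
        for top in tile_starts(height, size, stride)
        for left in tile_starts(width, size, stride)
    ]


for side in (128, 256, 512):
    print(side, len(tile_boxes(side, side)))
# prints "128 1", "256 9", "512 25"
```

Under these assumptions the tile count, and hence the number of model passes, grows roughly with image area.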

Training

Training Data: The model developers used the following datasets for training the model:

Our model is fine-tuned on LSDIR plus 20K samples from the FFHQ dataset.

Training Procedure: InvSR performs image super-resolution by applying a diffusion inversion technique to SD-Turbo. The detailed training pipeline can be found in our GitHub repo.

We currently provide the following checkpoints:

noise_predictor_sd_turbo_v5.pth: Noise estimation network trained for SD-Turbo.

Evaluation Results

See Paper for details.