zsxkib / patch-fusion

Super High Quality Depth Maps 🗺️: An End-to-End Tile-Based Framework 🏗️ for High-Resolution Monocular Metric Depth Estimation 🔍📏

- Public
- 360 runs
- L40S
- GitHub
- Paper
- License

Input

| Input | Type | Range | Default |
|-------|------|-------|---------|
| Input image (`input_image`) | file | | |
| Prompt | string | | "A cozy cottage in an oil painting, with rich textures and vibrant green foliage" |
| Added prompt | string | | "best quality, extremely detailed" |
| Negative prompt | string | | "worst quality, low quality, lose details" |
| ControlNet image resolution | integer | 256-896 | 896 |
| Number of steps | integer | 1-50 | 20 |
| Guess Mode | boolean | | false |
| Control strength | number | 0-2 | 1 |
| Guidance scale | number | 0.1-50 | 9 |
| Random seed (leave blank to randomize) | integer | | |
| Eta (DDIM) | number | | 0 |
| Tiling mode | string | | "P49" |
| Number of random patches | integer | 1-256 | 256 |
| Processing resolution height | integer | 256-2700 | 2160 |
| Processing resolution width | integer | 256-4800 | 3840 |
| Patch size height | integer | 256-675 | 540 |
| Patch size width | integer | 256-1200 | 960 |
| Colormap used to render depth map | string | | "magma" |
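For reference, here is a sketch of assembling these inputs in Python before sending them to the model. The ranges and defaults are transcribed from the listing above, but the dictionary keys are my guesses from the field labels, not the model's verified API schema — check the model's API tab for the exact names. The `replicate.run` call is commented out because it needs an API token and a model version hash.

```python
# Ranges transcribed from the input listing above (key names are assumptions).
RANGES = {
    "controlnet_resolution": (256, 896),
    "num_steps": (1, 50),
    "control_strength": (0, 2),
    "guidance_scale": (0.1, 50),
    "patch_number": (1, 256),
    "resolution_h": (256, 2700),
    "resolution_w": (256, 4800),
    "patch_size_h": (256, 675),
    "patch_size_w": (256, 1200),
}

# Defaults transcribed from the listing above.
DEFAULTS = {
    "prompt": "A cozy cottage in an oil painting, with rich textures and vibrant green foliage",
    "a_prompt": "best quality, extremely detailed",
    "n_prompt": "worst quality, low quality, lose details",
    "controlnet_resolution": 896,
    "num_steps": 20,
    "guess_mode": False,
    "control_strength": 1,
    "guidance_scale": 9,
    "eta": 0,
    "mode": "P49",
    "patch_number": 256,
    "resolution_h": 2160,
    "resolution_w": 3840,
    "patch_size_h": 540,
    "patch_size_w": 960,
    "color_map": "magma",
}

def build_input(image_path, **overrides):
    """Merge overrides into the defaults and check every ranged field."""
    payload = dict(DEFAULTS, input_image=open(image_path, "rb"), **overrides)
    for key, (lo, hi) in RANGES.items():
        if not lo <= payload[key] <= hi:
            raise ValueError(f"{key}={payload[key]} outside [{lo}, {hi}]")
    return payload

# import replicate
# output = replicate.run("zsxkib/patch-fusion:<version>", input=build_input("photo.png"))
```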

Output


Run time and cost

This model costs approximately $0.15 to run on Replicate, or 6 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 3 minutes. The predict time for this model varies significantly based on the inputs.

Readme

PatchFusion

An End-to-End Tile-Based Framework
for High-Resolution Monocular Metric Depth Estimation

Zhenyu Li, Shariq Farooq Bhat, Peter Wonka.
KAUST


Demo

Our official Hugging Face demo is available here! You can test it with your own high-resolution images, even without a local GPU! Depth prediction plus ControlNet generation takes only about a minute!

Thanks for the kind support from hysts!

Environment setup

The project depends on:

- pytorch (main framework)
- timm (backbone helper for MiDaS)
- ZoeDepth (main baseline)
- ControlNet (for potential applications)
- pillow, matplotlib, scipy, h5py, opencv (utilities)

Install the environment using environment.yml:

Using mamba (fastest):

mamba env create -n patchfusion --file environment.yml
mamba activate patchfusion

Using conda:

conda env create -n patchfusion --file environment.yml
conda activate patchfusion

Pre-Trained Models

Download our pre-trained model here and place the checkpoint at nfs/patchfusion_u4k.pt before running the steps below.

If you want to try the ControlNet demo, also download the pre-trained ControlNet model here and place the checkpoint at nfs/control_sd15_depth.pth.
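The checkpoints can be moved into place with a couple of shell commands — a sketch assuming both files were downloaded into the current directory (adjust the source paths to wherever your browser saved them):

```shell
# Create the checkpoint directory expected by the scripts below.
mkdir -p nfs

# Move the downloaded checkpoints into place (skips files that are absent).
for f in patchfusion_u4k.pt control_sd15_depth.pth; do
  if [ -f "$f" ]; then mv "$f" "nfs/$f"; fi
done
```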

Gradio Demo

We provide a UI demo built using gradio. To get started, install the UI requirements:

pip install -r ui_requirements.txt

Launch the gradio UI for depth estimation or image-to-3D:

python ./ui_prediction.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json

Launch the gradio UI for depth-guided image generation with ControlNet:

python ./ui_generative.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json

User Inference

  1. Put your images in a folder, e.g. path/to/your/folder

  2. Run:

     python ./infer_user.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json --rgb_dir path/to/your/folder --show --show_path path/to/show --save --save_path path/to/save --mode r128 --boundary 0 --blur_mask

  3. Check visualization results in path/to/show and depth results in path/to/save, respectively.
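The depth results saved to path/to/save can be rendered with any matplotlib colormap (the hosted demo defaults to magma). A minimal sketch, assuming the depth map is loaded as a 2-D NumPy array — the function and file names here are illustrative:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt

def colorize_depth(depth, cmap="magma"):
    """Normalize a 2-D depth array to [0, 1] and map it to uint8 RGB."""
    d = depth.astype(np.float32)
    d = (d - d.min()) / max(float(d.max() - d.min()), 1e-8)
    rgb = plt.get_cmap(cmap)(d)[..., :3]  # colormap returns RGBA; drop alpha
    return (rgb * 255).astype(np.uint8)

# Stand-in for a depth map loaded from path/to/save:
depth = np.random.rand(270, 480)
plt.imsave("depth_magma.png", depth, cmap="magma")  # one-line alternative
```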

Args

- We recommend using --blur_mask to reduce patch artifacts, though we didn't use it in our standard evaluation process.
- --mode: select from p16, p49, and rn, where n is the number of randomly added patches.
- Please refer to infer_user.py for more details.
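As a reading aid for the --mode convention, here is a tiny helper that splits a mode flag into its kind (fixed tiling vs. random patches) and count. This is purely illustrative and not taken from infer_user.py:

```python
def parse_mode(mode: str):
    """Split a --mode flag like 'p16', 'p49', or 'r128' into (kind, count).

    'p' selects a fixed tiling; 'r' requests randomly added patches.
    """
    kind, count = mode[0].lower(), int(mode[1:])
    if kind not in ("p", "r"):
        raise ValueError(f"unknown mode {mode!r}")
    return kind, count
```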

Citation

If you find our work useful for your research, please consider citing the paper:

@article{li2023patchfusion,
    title={PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation},
    author={Zhenyu Li and Shariq Farooq Bhat and Peter Wonka},
    year={2023},
    eprint={2312.02284},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}