zsxkib / patch-fusion

Super High Quality Depth Maps 🗺️: An End-to-End Tile-Based Framework 🏗️ for High-Resolution Monocular Metric Depth Estimation 🔍📏

- Public
- 360 runs
- L40S
- GitHub
- Paper
- License

Input

| Input | Type | Range | Default |
|-------|------|-------|---------|
| Input image (`input_image`) | file | | |
| Prompt | string | | "A cozy cottage in an oil painting, with rich textures and vibrant green foliage" |
| Added prompt | string | | "best quality, extremely detailed" |
| Negative prompt | string | | "worst quality, low quality, lose details" |
| ControlNet image resolution | integer | 256-896 | 896 |
| Number of steps | integer | 1-50 | 20 |
| Guess Mode | boolean | | false |
| Control strength | number | 0-2 | 1 |
| Guidance scale | number | 0.1-50 | 9 |
| Random seed (leave blank to randomize) | integer | | |
| Eta (DDIM) | number | | 0 |
| Tiling mode | string | | "P49" |
| Number of random patches | integer | 1-256 | 256 |
| Processing resolution height | integer | 256-2700 | 2160 |
| Processing resolution width | integer | 256-4800 | 3840 |
| Patch size height | integer | 256-675 | 540 |
| Patch size width | integer | 256-1200 | 960 |
| Colormap used to render depth map | string | | "magma" |
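For reference, here is a sketch of assembling these inputs in Python before sending them to the model. The ranges and defaults are transcribed from the listing above, but the dictionary keys are my guesses from the field labels, not the model's verified API schema — check the model's API tab for the exact names. The `replicate.run` call is commented out because it needs an API token and a model version hash.

```python
# Ranges transcribed from the input listing above (key names are assumptions).
RANGES = {
    "controlnet_resolution": (256, 896),
    "num_steps": (1, 50),
    "control_strength": (0, 2),
    "guidance_scale": (0.1, 50),
    "patch_number": (1, 256),
    "resolution_h": (256, 2700),
    "resolution_w": (256, 4800),
    "patch_size_h": (256, 675),
    "patch_size_w": (256, 1200),
}

# Defaults transcribed from the listing above.
DEFAULTS = {
    "prompt": "A cozy cottage in an oil painting, with rich textures and vibrant green foliage",
    "a_prompt": "best quality, extremely detailed",
    "n_prompt": "worst quality, low quality, lose details",
    "controlnet_resolution": 896,
    "num_steps": 20,
    "guess_mode": False,
    "control_strength": 1,
    "guidance_scale": 9,
    "eta": 0,
    "mode": "P49",
    "patch_number": 256,
    "resolution_h": 2160,
    "resolution_w": 3840,
    "patch_size_h": 540,
    "patch_size_w": 960,
    "color_map": "magma",
}

def build_input(image_path, **overrides):
    """Merge overrides into the defaults and check every ranged field."""
    payload = dict(DEFAULTS, input_image=open(image_path, "rb"), **overrides)
    for key, (lo, hi) in RANGES.items():
        if not lo <= payload[key] <= hi:
            raise ValueError(f"{key}={payload[key]} outside [{lo}, {hi}]")
    return payload

# import replicate
# output = replicate.run("zsxkib/patch-fusion:<version>", input=build_input("photo.png"))
```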

Output


Run time and cost

This model costs approximately $0.15 to run on Replicate, or 6 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 3 minutes. The predict time for this model varies significantly based on the inputs.

Readme

PatchFusion

An End-to-End Tile-Based Framework
for High-Resolution Monocular Metric Depth Estimation

Zhenyu Li, Shariq Farooq Bhat, Peter Wonka.
KAUST


Demo

Our official Hugging Face demo is available here! You can test it with your own high-resolution images, even without a local GPU! Depth prediction plus ControlNet generation takes only about a minute!

Thanks for the kind support from hysts!

Environment setup

The project depends on:

- pytorch (main framework)
- timm (backbone helper for MiDaS)
- ZoeDepth (main baseline)
- ControlNet (for potential applications)
- pillow, matplotlib, scipy, h5py, opencv (utilities)

Install the environment using environment.yml:

Using mamba (fastest):

mamba env create -n patchfusion --file environment.yml
mamba activate patchfusion

Using conda:

conda env create -n patchfusion --file environment.yml
conda activate patchfusion

Pre-Trained Models

Download our pre-trained model here and place the checkpoint at nfs/patchfusion_u4k.pt before running the steps below.

If you want to try the ControlNet demo, also download the pre-trained ControlNet model here and place the checkpoint at nfs/control_sd15_depth.pth.
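The checkpoints can be moved into place with a couple of shell commands — a sketch assuming both files were downloaded into the current directory (adjust the source paths to wherever your browser saved them):

```shell
# Create the checkpoint directory expected by the scripts below.
mkdir -p nfs

# Move the downloaded checkpoints into place (skips files that are absent).
for f in patchfusion_u4k.pt control_sd15_depth.pth; do
  if [ -f "$f" ]; then mv "$f" "nfs/$f"; fi
done
```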

Gradio Demo

We provide a UI demo built using gradio. To get started, install the UI requirements:

pip install -r ui_requirements.txt

Launch the gradio UI for depth estimation or image-to-3D:

python ./ui_prediction.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json

Launch the gradio UI for depth-guided image generation with ControlNet:

python ./ui_generative.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json

User Inference

  1. Put your images in a folder, e.g. path/to/your/folder

  2. Run:

     python ./infer_user.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json --rgb_dir path/to/your/folder --show --show_path path/to/show --save --save_path path/to/save --mode r128 --boundary 0 --blur_mask

  3. Check visualization results in path/to/show and depth results in path/to/save, respectively.
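The depth results saved to path/to/save can be rendered with any matplotlib colormap (the hosted demo defaults to magma). A minimal sketch, assuming the depth map is loaded as a 2-D NumPy array — the function and file names here are illustrative:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt

def colorize_depth(depth, cmap="magma"):
    """Normalize a 2-D depth array to [0, 1] and map it to uint8 RGB."""
    d = depth.astype(np.float32)
    d = (d - d.min()) / max(float(d.max() - d.min()), 1e-8)
    rgb = plt.get_cmap(cmap)(d)[..., :3]  # colormap returns RGBA; drop alpha
    return (rgb * 255).astype(np.uint8)

# Stand-in for a depth map loaded from path/to/save:
depth = np.random.rand(270, 480)
plt.imsave("depth_magma.png", depth, cmap="magma")  # one-line alternative
```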

Args

- We recommend using --blur_mask to reduce patch artifacts, though we didn't use it in our standard evaluation process.
- --mode: select from p16, p49, and rn, where n is the number of randomly added patches.
- Please refer to infer_user.py for more details.
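As a reading aid for the --mode convention, here is a tiny helper that splits a mode flag into its kind (fixed tiling vs. random patches) and count. This is purely illustrative and not taken from infer_user.py:

```python
def parse_mode(mode: str):
    """Split a --mode flag like 'p16', 'p49', or 'r128' into (kind, count).

    'p' selects a fixed tiling; 'r' requests randomly added patches.
    """
    kind, count = mode[0].lower(), int(mode[1:])
    if kind not in ("p", "r"):
        raise ValueError(f"unknown mode {mode!r}")
    return kind, count
```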

Citation

If you find our work useful for your research, please consider citing the paper:

@article{li2023patchfusion,
    title={PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation},
    author={Zhenyu Li and Shariq Farooq Bhat and Peter Wonka},
    year={2023},
    eprint={2312.02284},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}