zsxkib / patch-fusion

Super High Quality Depth Maps ๐Ÿ—บ๏ธ: An End-to-End Tile-Based Framework ๐Ÿ—๏ธ for High-Resolution Monocular Metric Depth Estimation ๐Ÿ”๐Ÿ“

  • Public
  • 301 runs
  • GitHub
  • Paper
  • License

Input

Output

Run time and cost

This model runs on Nvidia A40 (Large) GPU hardware. Predictions typically complete within 3 minutes. The predict time for this model varies significantly based on the inputs.

Readme

PatchFusion

An End-to-End Tile-Based Framework
for High-Resolution Monocular Metric Depth Estimation

Zhenyu Li, Shariq Farooq Bhat, Peter Wonka.
KAUST

<center> </center>

<center> </center>

DEMO

Our official huggingface demo is available here! You can test with your own high-resolution image, even without a local GPU! It only takes 1 minute for depth prediction plus ControlNet generation!

Thanks for the kind support from hysts!

Environment setup

The project depends on : - pytorch (Main framework) - timm (Backbone helper for MiDaS) - ZoeDepth (Main baseline) - ControlNet (For potential application) - pillow, matplotlib, scipy, h5py, opencv (utilities)

Install environment using environment.yml :

Using mamba (fastest):

mamba env create -n patchfusion --file environment.yml
mamba activate patchfusion

Using conda :

conda env create -n patchfusion --file environment.yml
conda activate patchfusion

Pre-Train Model

Download our pre-trained model here, and put this checkpoint at nfs/patchfusion_u4k.pt as preparation for the following steps.

If you want to play the ControlNet demo, please download the pre-trained ControlNet model here, and put this checkpoint at nfs/control_sd15_depth.pth.

Gradio Demo

We provide a UI demo built using gradio. To get started, install UI requirements:

pip install -r ui_requirements.txt

Launch the gradio UI for depth estimation or image to 3D:

python ./ui_prediction.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json

Launch the gradio UI for depth-guided image generation with ControlNet:

python ./ui_generative.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json

User Inference

  1. Put your images in folder path/to/your/folder

  2. Run codes: bash python ./infer_user.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json --rgb_dir path/to/your/folder --show --show_path path/to/show --save --save_path path/to/save --mode r128 --boundary 0 --blur_mask

  3. Check visualization results in path/to/show and depth results in path/to/save, respectively.

Args - We recommend using --blur_mask to reduce patch artifacts, though we didn’t use it in our standard evaluation process. - --mode: select from p16, p49, and rn, where n is the number of random added patches. - Please refer to infer_user.py for more details.

Citation

If you find our work useful for your research, please consider citing the paper

@article{li2023patchfusion,
    title={PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation}, 
    author={Zhenyu Li and Shariq Farooq Bhat and Peter Wonka},
    year={2023},
    eprint={2312.02284},
    archivePrefix={arXiv},
    primaryClass={cs.CV}}