zsxkib / patch-fusion

Super High Quality Depth Maps: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation

  • Public
  • 310 runs
  • GitHub
  • Paper
  • License

Run time and cost

This model costs approximately $0.11 to run on Replicate, or 9 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A40 (Large) GPU hardware. Predictions typically complete within 3 minutes, but prediction time varies significantly with the inputs.
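
If you prefer to call the hosted model programmatically, the Replicate Python client can do so in a few lines. This is a minimal sketch: the input field name ("image") and the shape of the output are assumptions, so check the model's API schema on Replicate for the exact parameters.

import replicate

# Minimal sketch (assumed input name "image"; verify against the model's API schema on Replicate).
output = replicate.run(
    "zsxkib/patch-fusion",
    input={"image": open("your_photo.jpg", "rb")},
)
print(output)  # typically a URL, or list of URLs, pointing to the generated depth map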

Readme

PatchFusion

An End-to-End Tile-Based Framework
for High-Resolution Monocular Metric Depth Estimation

Zhenyu Li, Shariq Farooq Bhat, Peter Wonka.
KAUST

DEMO

Our official Hugging Face demo is available here! You can test it with your own high-resolution images, even without a local GPU! Depth prediction plus ControlNet generation takes only about a minute!

Thanks to hysts for the kind support!

Environment setup

The project depends on:

- pytorch (Main framework)
- timm (Backbone helper for MiDaS)
- ZoeDepth (Main baseline)
- ControlNet (For potential applications)
- pillow, matplotlib, scipy, h5py, opencv (utilities)

Install the environment using environment.yml:

Using mamba (fastest):

mamba env create -n patchfusion --file environment.yml
mamba activate patchfusion

Using conda:

conda env create -n patchfusion --file environment.yml
conda activate patchfusion
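
After activating the environment, a quick import check (an illustrative snippet, not part of the repository) confirms that the core dependencies are installed and a CUDA-capable GPU is visible:

import torch
import timm

# Print versions and confirm a CUDA device is available for inference.
print("torch:", torch.__version__)
print("timm:", timm.__version__)
print("CUDA available:", torch.cuda.is_available())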

Pre-trained Model

Download our pre-trained model here and place the checkpoint at nfs/patchfusion_u4k.pt before running the following steps.

If you want to try the ControlNet demo, also download the pre-trained ControlNet model here and place it at nfs/control_sd15_depth.pth.
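
Before launching the demos, a small check like the one below (an illustrative helper, not part of the repository) confirms that both checkpoints sit where the commands in this README expect them:

from pathlib import Path

# Expected checkpoint locations used by the commands below.
expected = {
    "PatchFusion": Path("nfs/patchfusion_u4k.pt"),
    "ControlNet (optional)": Path("nfs/control_sd15_depth.pth"),
}

for name, path in expected.items():
    status = "found" if path.is_file() else "missing"
    print(f"{name}: {path} -> {status}")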

Gradio Demo

We provide a UI demo built using gradio. To get started, install UI requirements:

pip install -r ui_requirements.txt

Launch the gradio UI for depth estimation or image-to-3D:

python ./ui_prediction.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json

Launch the gradio UI for depth-guided image generation with ControlNet:

python ./ui_generative.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json

User Inference

  1. Put your images in a folder, e.g. path/to/your/folder

  2. Run:

python ./infer_user.py --model zoedepth_custom --ckp_path nfs/patchfusion_u4k.pt --model_cfg_path ./zoedepth/models/zoedepth_custom/configs/config_zoedepth_patchfusion.json --rgb_dir path/to/your/folder --show --show_path path/to/show --save --save_path path/to/save --mode r128 --boundary 0 --blur_mask

  3. Check the visualization results in path/to/show and the depth results in path/to/save.

Args

- We recommend using --blur_mask to reduce patch artifacts, though we did not use it in our standard evaluation process.
- --mode: select from p16, p49, and rn, where n is the number of randomly added patches.
- Please refer to infer_user.py for more details.
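
To inspect a saved result, something like the snippet below works with the utility libraries listed above; the file name and on-disk format are assumptions here, so check infer_user.py for what is actually written to path/to/save.

import cv2
import matplotlib.pyplot as plt

# Illustrative only: adjust the path and extension to an actual file written by infer_user.py.
depth = cv2.imread("path/to/save/example_depth.png", cv2.IMREAD_UNCHANGED)
assert depth is not None, "point this at a real output file"

print("dtype:", depth.dtype, "shape:", depth.shape, "min/max:", depth.min(), depth.max())

plt.imshow(depth, cmap="magma")
plt.colorbar(label="depth (raw saved values)")
plt.axis("off")
plt.show()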

Citation

If you find our work useful for your research, please consider citing the paper:

@article{li2023patchfusion,
    title={PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation}, 
    author={Zhenyu Li and Shariq Farooq Bhat and Peter Wonka},
    year={2023},
    eprint={2312.02284},
    archivePrefix={arXiv},
    primaryClass={cs.CV}}