chenxwh / lotus

Diffusion-based Visual Foundation Model for High-quality Dense Prediction

  • Public
  • 93 runs
  • GitHub
  • Weights
  • Paper
  • License

Run time and cost

This model runs on Nvidia A40 (Large) GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

lotus Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction

We present Lotus, a diffusion-based visual foundation model for dense geometry prediction. With minimal training data, Lotus achieves SoTA performance in two key geometry perception tasks, i.e., zero-shot depth and normal estimation. “Avg. Rank” indicates the average ranking across all metrics, where lower values are better. Bar length represents the amount of training data used.

🎓 Citation

If you find our work useful in your research, please consider citing our paper:

@article{he2024lotus,
    title={Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction},
    author={He, Jing and Li, Haodong and Yin, Wei and Liang, Yixun and Li, Leheng and Zhou, Kaiqiang and Liu, Hongbo and Liu, Bingbing and Chen, Ying-Cong},
    journal={arXiv preprint arXiv:2409.18124},
    year={2024}
}