chenxwh / docentr

End-to-End Document Image Enhancement Transformer

  • Public
  • 2.3K runs
  • GitHub
  • Paper
  • License

Input

Output

Run time and cost

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 2 seconds.

Readme

This is a cog implementation of https://github.com/dali92002/DocEnTR

DocEnTR

Use Python version 3.8.12

Description

Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer. This model is implemented on top of the vit-pytorch vision transformers library. The proposed model can be used to enhance (binarize) degraded document images, as shown in the following samples.

Degraded Images Our Binarization
1 2
1 2

Citation

If you find this useful for your research, please cite it as follows:

@inproceedings{souibgui2022docentr,
  title={DocEnTr: An end-to-end document image enhancement transformer},
  author={ Souibgui, Mohamed Ali and Biswas, Sanket and  Jemni, Sana Khamekhem and Kessentini, Yousri and Forn{\'e}s, Alicia and Llad{\'o}s, Josep and Pal, Umapada},
  booktitle={2022 26th International Conference on Pattern Recognition (ICPR)},
  year={2022}
}

Authors

Conclusion

There should be no bugs in this code, but if there is, we are sorry for that :’) !!