cjwbw / docentr

End-to-End Document Image Enhancement Transformer

Demo API Examples Versions (0bc45621)


View more examples

Run time and cost

Predictions run on Nvidia T4 GPU hardware. Predictions typically complete within 15 seconds. The predict time for this model varies significantly based on the inputs.

This is a cog implementation of https://github.com/dali92002/DocEnTR


Use Python version 3.8.12


Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer. This model is implemented on top of the vit-pytorch vision transformers library. The proposed model can be used to enhance (binarize) degraded document images, as shown in the following samples.

Degraded Images Our Binarization
1 2
1 2


If you find this useful for your research, please cite it as follows:

  title={DocEnTr: An end-to-end document image enhancement transformer},
  author={ Souibgui, Mohamed Ali and Biswas, Sanket and  Jemni, Sana Khamekhem and Kessentini, Yousri and Forn{\'e}s, Alicia and Llad{\'o}s, Josep and Pal, Umapada},
  booktitle={2022 26th International Conference on Pattern Recognition (ICPR)},



There should be no bugs in this code, but if there is, we are sorry for that :’) !!