cjwbw / docentr

End-to-End Document Image Enhancement Transformer

  • Public
  • 3.7K runs
  • GitHub
  • Paper
  • License

Input

Output

Run time and cost

This model costs approximately $0.00056 to run on Replicate, or 1785 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 3 seconds.

Readme

This is a cog implementation of https://github.com/dali92002/DocEnTR

DocEnTR

Use Python version 3.8.12

Description

Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer. This model is implemented on top of the vit-pytorch vision transformers library. The proposed model can be used to enhance (binarize) degraded document images, as shown in the following samples.

Degraded Images Our Binarization
1 2
1 2

Citation

If you find this useful for your research, please cite it as follows:

@inproceedings{souibgui2022docentr,
  title={DocEnTr: An end-to-end document image enhancement transformer},
  author={ Souibgui, Mohamed Ali and Biswas, Sanket and  Jemni, Sana Khamekhem and Kessentini, Yousri and Forn{\'e}s, Alicia and Llad{\'o}s, Josep and Pal, Umapada},
  booktitle={2022 26th International Conference on Pattern Recognition (ICPR)},
  year={2022}
}

Authors

Conclusion

There should be no bugs in this code, but if there is, we are sorry for that :’) !!