End-to-End Document Image Enhancement Transformer

Run time and cost

Predictions run on NVIDIA T4 GPU hardware and typically complete within 15 seconds. Prediction time varies significantly with the inputs.

This is a Cog implementation of https://github.com/dali92002/DocEnTR


PyTorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer. The model is built on top of the vit-pytorch vision transformer library. It can be used to enhance (binarize) degraded document images, as shown in the following samples.
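To make "binarize" concrete, here is a minimal sketch of what a binarized output looks like: every pixel becomes either ink (0) or background (255). It uses classical Otsu thresholding in pure Python. This is not the DocEnTr method, which is a learned transformer model; the function names `otsu_threshold` and `binarize` are hypothetical and exist only for this illustration.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a sequence of 8-bit grayscale values."""
    # Build a 256-bin intensity histogram.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of their intensities
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance; Otsu picks the threshold that maximizes it.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """Map a grayscale image (list of rows) to 0 (ink) / 255 (background)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 255 for p in row] for row in image]


if __name__ == "__main__":
    # Tiny made-up example: dark strokes (30-40) on a light page (200-220).
    page = [[30, 40, 200],
            [210, 35, 220]]
    for row in binarize(page):
        print(row)
```

A single global threshold like this tends to fail on stains, bleed-through, and uneven lighting, which is the kind of degradation a learned enhancement model is meant to handle.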

