A simple implementation of Donut 🍩 (Document Understanding Transformer) in Cog.
…
Model description
Donut 🍩 (Document Understanding Transformer) is a new method of document understanding that uses an OCR-free, end-to-end Transformer model. Donut does not require off-the-shelf OCR engines or APIs, yet it achieves state-of-the-art performance on various visual document understanding tasks, such as visual document classification and information extraction (a.k.a. document parsing). (from the official repo README)
Intended use
This model uses the [pretrained weights from Hugging Face](https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v2/tree/official), fine-tuned on CORD, a document (receipt) parsing dataset. The task is fixed to cord-v2, so the model only performs CORD-style parsing.
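For document parsing, Donut's decoder emits a sequence of XML-like field tags (for example `<s_nm>…</s_nm>`, with `<sep/>` between repeated items), and that sequence is then converted to JSON. The sketch below is a simplified, hypothetical re-implementation of that conversion. It follows the general idea of `DonutProcessor.token2json` in Hugging Face Transformers but is not its actual code. `tags_to_json`, `parse_items` and `parse_value` are illustrative names. The sketch assumes the leading `<s_cord-v2>` task token has already been stripped and that a field never nests inside another field with the same name. The sample receipt values are made up.

```python
import re
from typing import Dict, List, Union

# An opening field tag such as <s_menu>; group 1 captures the field name ("menu").
OPEN_TAG = re.compile(r"<s_([^<>/]+)>")
# Separator placed between repeated items (e.g. several menu lines).
SEP = "<sep/>"

Value = Union[str, list, dict]


def parse_items(fragment: str) -> List[Dict[str, Value]]:
    """Walk the top-level fields of a fragment, starting a new item at each <sep/>."""
    items: List[Dict[str, Value]] = [{}]
    pos = 0
    while pos < len(fragment):
        if fragment.startswith(SEP, pos):
            items.append({})  # top-level separator: begin the next repeated item
            pos += len(SEP)
            continue
        match = OPEN_TAG.match(fragment, pos)
        if match is None:
            pos += 1  # skip whitespace and stray tokens such as </s>
            continue
        key = match.group(1)
        close = f"</s_{key}>"
        end = fragment.find(close, match.end())
        if end == -1:
            break  # unterminated field: stop here (no error recovery in this sketch)
        items[-1][key] = parse_value(fragment[match.end():end])
        pos = end + len(close)
    return items


def parse_value(content: str) -> Value:
    """Nested tags become a dict (or a list of dicts); plain text becomes a string."""
    if OPEN_TAG.search(content):
        items = parse_items(content)
        return items[0] if len(items) == 1 else items
    leaves = [part.strip() for part in content.split(SEP)]
    return leaves[0] if len(leaves) == 1 else leaves


def tags_to_json(sequence: str) -> Value:
    """Hypothetical helper: convert a Donut-style tag sequence to nested Python data."""
    items = parse_items(sequence.strip())
    return items[0] if len(items) == 1 else items


if __name__ == "__main__":
    # Made-up CORD-style output with two menu lines and a total.
    sample = (
        "<s_menu><s_nm>Latte</s_nm><s_cnt>1</s_cnt><sep/>"
        "<s_nm>Cake</s_nm><s_cnt>2</s_cnt></s_menu>"
        "<s_total><s_total_price>9.00</s_total_price></s_total></s>"
    )
    print(tags_to_json(sample))
```

Returning a list only when `<sep/>` yields more than one item matches how CORD receipts mix single fields (such as `total`) with repeated line items (such as `menu`).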
Cover image by AI