willywongi / donut

Extract structured data from receipt images using Donut 🍩 (Document Understanding Transformer)




Run time and cost

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 5 minutes. The predict time for this model varies significantly based on the inputs.


A simple Cog implementation of Donut 🍩: Document Understanding Transformer.

Model description

Donut 🍩, Document understanding transformer, is a new method of document understanding that utilizes an OCR-free end-to-end Transformer model. Donut does not require off-the-shelf OCR engines/APIs, yet it shows state-of-the-art performances on various visual document understanding tasks, such as visual document classification or information extraction (a.k.a. document parsing). (from the official repo README)

Intended use

This model uses the pretrained weights from Hugging Face (https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v2/tree/official), fine-tuned on CORD, a document parsing dataset. The task is fixed to cord-v2.
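
To illustrate what a fixed cord-v2 task means in practice, here is a minimal sketch of receipt parsing with the same checkpoint. It is an assumption, not necessarily how this Cog implementation works: it loads the checkpoint through the Hugging Face `transformers` port (`DonutProcessor`, `VisionEncoderDecoderModel`) rather than from the `official` branch linked above. The function names `parse_receipt` and `clean_sequence` are hypothetical helpers introduced only for this example.

```python
import re

# Checkpoint named in the link above; loaded here via the transformers port (assumption).
CHECKPOINT = "naver-clova-ix/donut-base-finetuned-cord-v2"
# The task prompt is fixed: the decoder always starts from the cord-v2 token.
TASK_PROMPT = "<s_cord-v2>"


def clean_sequence(sequence: str, eos_token: str, pad_token: str) -> str:
    """Strip EOS/PAD tokens and the leading task-prompt tag from decoder output."""
    sequence = sequence.replace(eos_token, "").replace(pad_token, "")
    # Remove only the first tag, which is the task prompt itself.
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def parse_receipt(image_path: str) -> dict:
    """Hypothetical helper: run Donut on one receipt image and return parsed fields."""
    # Heavy dependencies are imported lazily so the helper above stays importable alone.
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(CHECKPOINT)
    model = VisionEncoderDecoderModel.from_pretrained(CHECKPOINT)

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values
    decoder_input_ids = processor.tokenizer(
        TASK_PROMPT, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    outputs = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )

    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = clean_sequence(
        sequence, processor.tokenizer.eos_token, processor.tokenizer.pad_token
    )
    # Convert the tagged token sequence into a nested dictionary.
    return processor.token2json(sequence)
```

Calling `parse_receipt("receipt.jpg")` (a hypothetical local file) downloads the checkpoint on first use and returns a nested dictionary of receipt fields. Because the prompt is fixed, images that are not receipts will still be parsed against the CORD schema.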

Cover image by AI