willywongi / donut

Extract structured data from receipt images using Donut 🍩 (Document Understanding Transformer)

Run time and cost

This model costs approximately $0.054 to run on Replicate, or 18 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
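As a rough budgeting aid, the per-run figure above can be turned into a quick estimate. This is a minimal sketch: the $0.054 value is the approximate cost quoted on this page, the function names are hypothetical, and actual cost varies with your inputs.

```python
import math

# Approximate per-run cost quoted on this page (USD); real costs vary with inputs.
COST_PER_RUN_USD = 0.054


def runs_per_budget(budget_usd: float, cost_per_run: float = COST_PER_RUN_USD) -> int:
    """Number of whole runs that fit in a budget (hypothetical helper)."""
    return math.floor(budget_usd / cost_per_run)


def estimated_cost(runs: int, cost_per_run: float = COST_PER_RUN_USD) -> float:
    """Estimated total cost in USD for a number of runs (hypothetical helper)."""
    return runs * cost_per_run


if __name__ == "__main__":
    print(runs_per_budget(1.0))               # whole runs per $1
    print(round(estimated_cost(100), 2))      # estimated cost of 100 runs
```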

This model runs on NVIDIA T4 GPU hardware. Predictions typically complete within 5 minutes, but prediction time varies significantly with the input.

Readme

A simple Cog implementation of Donut 🍩 (Document Understanding Transformer).

Model description

Donut 🍩, Document understanding transformer, is a new method of document understanding that utilizes an OCR-free end-to-end Transformer model. Donut does not require off-the-shelf OCR engines/APIs, yet it shows state-of-the-art performances on various visual document understanding tasks, such as visual document classification or information extraction (a.k.a. document parsing). (from the official repo README)

Intended use

This model uses the [pretrained weights from Hugging Face](https://huggingface.co/naver-clova-ix/donut-base-finetuned-cord-v2/tree/official), fine-tuned on CORD, a document parsing dataset. The task is fixed to cord-v2, so the model only performs receipt parsing.
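For readers who want to try the same checkpoint outside Replicate, the sketch below shows one way to parse a receipt locally. It is not the code behind this Cog wrapper: it assumes the Hugging Face `transformers` port of the checkpoint (the default branch of the repository, not the `official` branch linked above), along with `torch` and `Pillow` being installed. The helper names `clean_sequence` and `parse_receipt` and the file name `receipt.jpg` are hypothetical.

```python
import re

# Checkpoint named on this page; the task prompt for the cord-v2 task.
MODEL_ID = "naver-clova-ix/donut-base-finetuned-cord-v2"
TASK_PROMPT = "<s_cord-v2>"


def clean_sequence(sequence: str, eos_token: str, pad_token: str) -> str:
    """Drop EOS/PAD tokens and the first tag (the task start token)."""
    sequence = sequence.replace(eos_token, "").replace(pad_token, "")
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def parse_receipt(image_path: str) -> dict:
    """Run Donut on one receipt image and return the parsed fields (sketch)."""
    # Heavy imports are kept local so the helpers above work without them.
    import torch
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(MODEL_ID)
    model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)

    # The decoder is primed with the fixed task prompt instead of OCR text.
    decoder_input_ids = processor.tokenizer(
        TASK_PROMPT, add_special_tokens=False, return_tensors="pt"
    ).input_ids.to(device)

    with torch.no_grad():
        outputs = model.generate(
            pixel_values,
            decoder_input_ids=decoder_input_ids,
            max_length=model.decoder.config.max_position_embeddings,
            pad_token_id=processor.tokenizer.pad_token_id,
            eos_token_id=processor.tokenizer.eos_token_id,
            use_cache=True,
            bad_words_ids=[[processor.tokenizer.unk_token_id]],
            return_dict_in_generate=True,
        )

    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = clean_sequence(
        sequence, processor.tokenizer.eos_token, processor.tokenizer.pad_token
    )
    # token2json converts the tag sequence into a nested dict.
    return processor.token2json(sequence)


if __name__ == "__main__":
    # "receipt.jpg" is a placeholder path; substitute your own image.
    print(parse_receipt("receipt.jpg"))
```

Because the task prompt is fixed, other Donut tasks such as document classification or visual question answering would need a different checkpoint and prompt.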

Cover image by AI