adirik / nougat

Nougat: Neural Optical Understanding for Academic Documents


Run time and cost

This model runs on NVIDIA A40 GPU hardware. Predictions typically complete within 35 seconds, although the prediction time varies significantly with the input.

Readme

Nougat

Nougat (Neural Optical Understanding for Academic Documents) is a Visual Transformer model that performs an Optical Character Recognition (OCR) task for processing scientific documents into a markup language.

How to Use Nougat

To use Nougat, upload a scanned document as a PDF or image file (.png, .jpeg, .jpg, or .tiff). Two options are available: enable early_stopping for faster inference, and enable postprocess to ensure full Markdown compatibility.
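For programmatic use, the upload step can be sketched in Python. This is a minimal sketch, not the model's documented API: the option names early_stopping and postprocess come from this README, but the input key "document", the helper names, the False defaults, and the model_ref argument (the model identifier and version string) are assumptions that should be checked against the model's input schema.

```python
from pathlib import Path

# File types this README lists as accepted uploads.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpeg", ".jpg", ".tiff"}


def build_nougat_options(path, early_stopping=False, postprocess=False):
    """Validate the file type and collect the two optional switches.

    The False defaults are an assumption made for this sketch.
    """
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    return {"early_stopping": early_stopping, "postprocess": postprocess}


def run_nougat(path, model_ref, **options):
    """Send the document to the hosted model (hypothetical helper).

    Requires the third-party `replicate` client and an API token in
    the REPLICATE_API_TOKEN environment variable.
    """
    import replicate  # imported lazily so validation works without it

    inputs = build_nougat_options(path, **options)
    with open(path, "rb") as handle:
        # Assumption: the file input is named "document".
        inputs["document"] = handle
        return replicate.run(model_ref, input=inputs)
```

Keeping the validation separate from the network call means an unsupported file is rejected locally, before any prediction time is spent.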

Early Stopping

Nougat generates tokens until the model emits an “end of sentence” token, up to a maximum of 4096 tokens. For efficiency, Nougat also offers an optional early stopping heuristic based on the variance of the logits over a sliding window of 15 tokens. Note that enabling early stopping might degrade the quality of the output.
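The idea behind a variance-based stopping rule can be illustrated with a short Python sketch. It is not Nougat's actual implementation: the window size of 15 and the 4096-token cap come from this README, while the threshold value, the use of one score per token (for example, the largest logit), and the function names are assumptions chosen for illustration.

```python
from collections import deque
from statistics import pvariance

WINDOW_SIZE = 15           # from the README: sliding window of 15 tokens
MAX_TOKENS = 4096          # from the README: generation cap
VARIANCE_THRESHOLD = 0.05  # assumption: illustrative value only


def should_stop_early(window_scores, threshold=VARIANCE_THRESHOLD):
    """Return True once a full window of scores has very low variance."""
    if len(window_scores) < WINDOW_SIZE:
        return False
    return pvariance(list(window_scores)) < threshold


def generate(step_fn, eos_id, use_early_stopping=False):
    """Collect tokens until EOS, the token cap, or the heuristic fires.

    step_fn(tokens) stands in for the model: it returns the next token
    id together with one score summarising that step's logits.
    """
    tokens = []
    window = deque(maxlen=WINDOW_SIZE)  # keeps only the latest scores
    for _ in range(MAX_TOKENS):
        token_id, score = step_fn(tokens)
        tokens.append(token_id)
        if token_id == eos_id:
            break
        window.append(score)
        if use_early_stopping and should_stop_early(window):
            break
    return tokens
```

A run of near-constant scores is treated here as a sign that the model is stuck repeating itself, which is why stopping saves time but can also cut off legitimate output.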

Output Format

Nougat outputs text (.txt) files containing MultiMarkdown (.mmd) content. MultiMarkdown is an extension of Markdown that supports mathematical notation, tables, and footnotes. To render the output, copy and paste the contents of the text file into a Markdown viewer.

Model Details