Readme

Nougat

Nougat (Neural Optical Understanding for Academic Documents) is a Visual Transformer model that performs an Optical Character Recognition (OCR) task for processing scientific documents into a markup language.

How to Use Nougat

To use Nougat, upload a scanned document as a PDF or image file (.png, .jpeg, .jpg, .tiff). Optionally, enable early_stopping for faster inference and postprocess to ensure full Markdown compatibility.

Early Stopping

Nougat generates output until the model emits an end-of-sequence token, up to a maximum of 4096 tokens. For efficiency, Nougat also offers an optional early stopping heuristic based on the variance of the logits over a sliding window of 15 tokens. Note that enabling early stopping might degrade model performance.
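
The idea behind a variance-based stopping rule can be sketched in a few lines of Python. This is a minimal illustration, not Nougat's actual implementation: the function name should_stop, the choice of one score per generated token (for example the largest logit at that step), and the THRESHOLD value are all assumptions made for the example. Only the window size of 15 comes from the description above.

```python
from collections import deque
from statistics import pvariance

WINDOW = 15        # sliding window size stated in this README
THRESHOLD = 1e-3   # hypothetical cutoff, chosen only for illustration


def should_stop(scores, window=WINDOW, threshold=THRESHOLD):
    """Return the first step at which generation would be halted, or None.

    `scores` holds one number per generated token (assumed here to be a
    summary of that step's logits, such as the maximum logit).
    """
    recent = deque(maxlen=window)  # keeps only the last `window` scores
    for step, score in enumerate(scores):
        recent.append(score)
        # Only test once the window is full; near-constant scores
        # (very low variance) are treated as a sign of degenerate output.
        if len(recent) == window and pvariance(recent) < threshold:
            return step
    return None


if __name__ == "__main__":
    varied = [0.0, 1.0] * 10      # scores that keep changing
    stuck = [2.0] * 15            # scores that stop changing
    print(should_stop(varied + stuck))
```

A rule of this kind trades accuracy for speed: it can cut generation short on legitimately uniform content, which is consistent with the warning that early stopping might degrade results.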

Output Format

Nougat outputs text (.txt) files with MultiMarkdown (.mmd) content, an extension of Markdown that supports mathematical notation, tables, and footnotes. To render the output, copy and paste the content of the output text file into a Markdown viewer.
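
Some editors and viewers pick a renderer based on the file extension, so it can be convenient to keep a copy of the output with an .mmd suffix. The helper below is a small sketch using only the Python standard library; the function name save_as_mmd and the file name paper.txt are hypothetical and not part of Nougat.

```python
from pathlib import Path


def save_as_mmd(txt_path):
    """Copy a .txt output file to a sibling file with the .mmd extension."""
    src = Path(txt_path)
    dst = src.with_suffix(".mmd")  # e.g. paper.txt -> paper.mmd
    # The content is copied unchanged; only the extension differs.
    dst.write_text(src.read_text(encoding="utf-8"), encoding="utf-8")
    return dst


if __name__ == "__main__":
    # "paper.txt" is a placeholder for a downloaded output file.
    if Path("paper.txt").exists():
        print(save_as_mmd("paper.txt"))
```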

Model Details

Developed by: Meta Research
Model type: Transformer-based OCR model
Code license: MIT License
Model weights license: CC-BY-NC License
Resources for more information: Check out the original GitHub Repository, project page and Hugging Face Demo.