Nougat
Nougat (Neural Optical Understanding for Academic Documents) is a Visual Transformer model that performs an Optical Character Recognition (OCR) task for processing scientific documents into a markup language.
How to Use Nougat
To use Nougat, upload a scanned document as a PDF or image file (.png, .jpeg, .jpg, .tiff). Optionally, enable early_stopping for faster inference and postprocess to ensure full Markdown compatibility.
Early Stopping
Nougat generates at most 4096 tokens, stopping sooner if the model emits an “end of sentence” token. For efficiency, Nougat also offers an optional early stopping heuristic based on the variance of the logits over a sliding window of 15 tokens. Note that enabling early stopping might degrade model performance.
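The stopping behavior described above can be illustrated with a minimal sketch. This is not Nougat's actual implementation: the function names, the use of one score per generated token, the threshold value, and the assumption that low variance is what triggers the stop are all hypothetical. Only the 4096-token cap and the 15-token window come from the description above.

```python
from statistics import pvariance

MAX_NEW_TOKENS = 4096       # generation cap stated in this README
WINDOW = 15                 # sliding-window length stated in this README
VARIANCE_THRESHOLD = 0.05   # hypothetical cutoff, not taken from Nougat


def should_stop_early(token_scores, window=WINDOW, threshold=VARIANCE_THRESHOLD):
    """Return True when the variance of the last `window` scores is below `threshold`.

    Assumption: each entry of `token_scores` is one number per generated
    token (for example its largest logit), and low variance is treated
    as the signal to stop.
    """
    if len(token_scores) < window:
        return False  # not enough history to fill the window yet
    return pvariance(token_scores[-window:]) < threshold


def generate(next_token, eos_token, early_stopping=False,
             max_new_tokens=MAX_NEW_TOKENS):
    """Toy decoding loop; `next_token(tokens)` must return (token, score)."""
    tokens, scores = [], []
    for _ in range(max_new_tokens):
        token, score = next_token(tokens)
        tokens.append(token)
        if token == eos_token:
            break  # the model ended the output on its own
        scores.append(score)
        if early_stopping and should_stop_early(scores):
            break  # the heuristic ended generation before the cap
    return tokens
```

With early_stopping left off, the loop in this sketch only ends at the end token or at the token cap, which is why enabling the heuristic can shorten inference at the risk of cutting off valid output.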
Output Format
Nougat outputs text (.txt) files with MultiMarkdown (.mmd) content, an extension of Markdown that supports mathematical notation, tables, and footnotes. To render the output, copy and paste the contents of the output text file into a Markdown viewer.
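Some viewers pick a renderer by file extension, so it can help to keep a copy of the output with an .mmd suffix. The helper below is a small sketch using only the Python standard library; the function name save_as_mmd and the example file name are hypothetical, and it assumes the output file is UTF-8 encoded.

```python
from pathlib import Path


def save_as_mmd(txt_path):
    """Copy a Nougat .txt output to a sibling file with an .mmd suffix."""
    src = Path(txt_path)
    dst = src.with_suffix(".mmd")
    # The content is already MultiMarkdown, so it is copied unchanged.
    dst.write_text(src.read_text(encoding="utf-8"), encoding="utf-8")
    return dst
```

For example, save_as_mmd("output.txt") would write output.mmd next to the original file and return its path.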
Model Details
- Developed by: Meta Research
- Model type: Transformer-based OCR model
- Code license: MIT License
- Model weights license: CC-BY-NC License
- Resources for more information: Check out the original GitHub Repository, project page and Hugging Face Demo.