These models perform optical character recognition, extracting text from images. They can help digitize text from scanned documents, photos, and other visual media.
For most OCR tasks, we recommend the abiruyt/text-extract-ocr model. It extracts plain text from a wide variety of images.
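As a rough illustration of calling this model from code, here is a minimal sketch that assumes the third-party `replicate` Python client is installed and a `REPLICATE_API_TOKEN` environment variable is set. The input field name `image` is an assumption, not something this page confirms, and the model may need a pinned version suffix appended to its name; check the model's own page for both. The `extract_text` helper and its `run` and `input_key` parameters are hypothetical names introduced for this example.

```python
from typing import Any, Callable, Optional

# Model name taken from the text above. Some models must be referenced
# as "owner/name:version"; the version hash is deliberately not guessed here.
MODEL_REF = "abiruyt/text-extract-ocr"


def extract_text(
    image_path: str,
    run: Optional[Callable[..., Any]] = None,
    input_key: str = "image",  # ASSUMPTION: the model's input field name
) -> str:
    """Send one local image to the OCR model and return the text it produces."""
    if run is None:
        # Imported lazily so the helper can be exercised without the client.
        import replicate  # third-party: pip install replicate

        run = replicate.run

    # The client accepts an open binary file handle as a file input.
    with open(image_path, "rb") as image_file:
        output = run(MODEL_REF, input={input_key: image_file})

    # Depending on the model, output may be a single string or a sequence
    # of text chunks; normalize both cases to one stripped string.
    if isinstance(output, str):
        return output.strip()
    return "".join(str(chunk) for chunk in output).strip()
```

Usage would look like `print(extract_text("receipt.png"))`. Passing a custom `run` callable makes it possible to swap in a stub for testing, or a wrapper that adds retries, without changing the helper.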
To get clean Markdown or JSON from PDF, EPUB, or other document formats, use Marker. It's a pipeline of models that supports all languages, removes headers and footers, formats equations and code blocks, and more. It can also OCR PDFs whose pages are stored as scanned images.
Some other useful models for your text extraction pipeline:
Recommended Models

bytedance/dolphin
Document Image Parsing via Heterogeneous Anchor Prompting
Updated 3 days, 15 hours ago
965 runs


lucataco/deepseek-ocr
Convert documents to markdown, extract raw text, and locate specific content
Updated 5 days, 14 hours ago
3.5K runs

datalab-to/ocr
Detect and transcribe text in images with accurate bounding boxes, layout analysis, reading order, and table recognition, in 90 languages
Updated 1 week ago
415 runs


pbevan1/llama-3.1-8b-ocr-correction
LLaMA 3.1-8B, fine-tuned on a synthetic OCR dataset for superior OCR correction.
Updated 1 year, 2 months ago
52 runs


cuuupid/glm-4v-9b
GLM-4V is a multimodal model released by Tsinghua University that is competitive with GPT-4o and establishes a new SOTA on several benchmarks, including OCR.
Updated 1 year, 3 months ago
92.3K runs


cudanexus/ocr-surya
Surya is a document OCR toolkit that does:
Updated 1 year, 7 months ago
6.5K runs


cuuupid/marker
Convert scanned or electronic documents to markdown, very very very fast
Updated 1 year, 10 months ago
2.9K runs


mickeybeurskens/latex-ocr
Optical character recognition that turns images of LaTeX equations into LaTeX source.
Updated 1 year, 11 months ago
871 runs


abiruyt/text-extract-ocr
A simple OCR model that can easily extract text from an image.
Updated 2 years ago
90M runs


awilliamson10/meta-nougat
Nougat: Neural Optical Understanding for Academic Documents
Updated 2 years, 1 month ago
4.8K runs


kshitijagrwl/pii-extractor-llm
PII Data Extraction from Text
Updated 2 years, 3 months ago
167 runs


willywongi/donut
Extract structured data from receipt images using Donut 🍩 (Document Understanding Transformer)
Updated 2 years, 6 months ago
2.2K runs


cjwbw/docentr
End-to-End Document Image Enhancement Transformer
Updated 3 years, 1 month ago
4.7K runs