DeepSeek OCR
Extract text from images and convert documents into clean, structured markdown.
What it does
DeepSeek OCR reads text from images and turns it into markdown. Upload a screenshot, PDF, scanned document, or photo with text, and it’ll extract everything while preserving structure such as tables, headings, and formatting.
This model handles:
Documents and PDFs: Academic papers, financial reports, textbooks, newspapers, and handwritten notes across 100 languages
Complex layouts: Multi-column documents, tables, forms, and receipts with the structure intact
Mathematical content: Equations and formulas converted to LaTeX format
Scientific content: Chemical formulas and geometric figures
Charts and visualizations: Data extraction from graphs and charts into structured formats
How it works differently
Most optical character recognition tools just find text and spit it back out. DeepSeek OCR understands the entire document as a visual sequence. It sees the layout, interprets the structure, and generates markdown like a person would write it.
The result is markdown you can immediately use in notebooks, documentation, or feed into other AI models without cleanup.
What makes it interesting
DeepSeek OCR compresses visual information incredibly efficiently. A document that would normally need 700-800 text tokens can be processed using just 100 visual tokens while maintaining 97% accuracy. This breakthrough approach treats documents as compressed visual data rather than just extracting characters.
The model adapts its compression based on document complexity. Simple slides might use 64 tokens, while dense newspapers automatically switch to a higher-detail mode using around 800 tokens.
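The compression figures above can be sanity-checked with simple arithmetic. This is a rough sketch using the numbers quoted in this section; actual token counts vary by document.

```python
# Compression ratio implied by the figures above:
# ~700-800 text tokens represented by ~100 vision tokens.
text_tokens = 750        # midpoint of the 700-800 text-token range
vision_tokens = 100      # vision tokens after compression
ratio = text_tokens / vision_tokens
print(f"compression ratio: ~{ratio:.1f}x")   # ~7.5x

# Adaptive modes quoted above: token budget scales with density.
modes = {"simple slide": 64, "dense newspaper": 800}
for doc, tokens in modes.items():
    print(f"{doc}: {tokens} vision tokens")
```

So the headline claim amounts to roughly a 7-8x reduction in tokens at about 97% accuracy.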
Performance
On a single A100 GPU, DeepSeek OCR can process over 200,000 pages per day at around 2,500 tokens per second. It achieves state-of-the-art accuracy on document parsing benchmarks while using fewer tokens than other models.
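The two throughput numbers are consistent with each other. Here is a back-of-the-envelope check; the ~1,000 tokens per page used below is an illustrative assumption, not a published figure.

```python
# Does 2,500 tokens/s support 200,000+ pages/day?
tokens_per_second = 2500
seconds_per_day = 24 * 60 * 60
tokens_per_day = tokens_per_second * seconds_per_day   # 216,000,000
tokens_per_page = 1000                                 # assumed average
pages_per_day = tokens_per_day // tokens_per_page
print(f"{pages_per_day:,} pages/day")                  # 216,000 pages/day
```

At that assumed page size, sustained generation at 2,500 tokens/s works out to just over 200,000 pages per day.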
Example outputs
The model excels at preserving document structure:
Tables: Extracts rows and columns into proper HTML or markdown table format, maintaining alignment and relationships between cells
Equations: Recognizes mathematical expressions and outputs them as properly formatted LaTeX
Multi-language documents: Handles mixed scripts within the same document, like Korean and English side-by-side
Handwritten notes: Digitizes handwritten text while attempting to preserve the original structure
Common use cases
Document digitization: Convert scanned papers, books, or archival materials into searchable, structured text
Data extraction: Pull tables and figures from reports for analysis
Invoice processing: Extract line items, totals, and structured data from receipts and invoices
Academic research: Convert PDFs of papers into markdown for note-taking or further processing
Training data generation: Process large volumes of documents to create datasets for training other AI models
Technical details
The model combines a visual encoder (around 380 million parameters) with a small mixture-of-experts language model decoder (3 billion parameters with 570 million activated). The encoder uses both local window attention for fine details and global attention for broader context understanding.
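The mixture-of-experts numbers above can be made concrete with a little parameter accounting. This is a rough sketch based only on the figures quoted in this section.

```python
# Parameter accounting for the architecture described above.
encoder_params = 380e6    # visual encoder
decoder_total = 3e9       # full MoE decoder
decoder_active = 570e6    # parameters activated per token

# Only the routed experts that are selected run for each token,
# so the active fraction of the decoder is small.
active_fraction = decoder_active / decoder_total
print(f"active fraction of decoder: {active_fraction:.0%}")   # 19%

# Roughly how much compute-relevant weight is engaged per step.
print(f"~{(encoder_params + decoder_active) / 1e6:.0f}M params engaged")  # ~950M
```

This is what makes the model fast at inference: it stores 3B parameters of capacity but pays for well under a third of them per generated token.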
DeepSeek OCR was trained on 30 million PDF pages covering approximately 100 languages, plus synthetic data including 10 million charts, 5 million chemical formulas, and 1 million geometric figures.
Learn more
For technical details and the research behind the model, check out the DeepSeek OCR paper and GitHub repository.
Try the model yourself on Replicate Playground.