datalab-to/marker

Convert PDF to markdown + JSON quickly with high accuracy

Marker converts documents to markdown, JSON, chunks, and HTML quickly and accurately.

  • Converts PDF, image, PPTX, DOCX, XLSX, HTML, EPUB files in all languages
  • Formats tables, forms, equations, inline math, links, references, and code blocks
  • Extracts and saves images
  • Removes headers/footers/other artifacts
  • Extensible with your own formatting and logic
  • Performs structured extraction against a JSON schema you supply (beta)
  • Optionally boosts accuracy with LLMs (and your own prompt)
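
To make the structured-extraction bullet concrete, here is a minimal sketch of the kind of JSON schema one might supply. The invoice fields and the `missing_required` helper are hypothetical and chosen only for illustration. How the schema is actually passed to Marker is not covered here, so check the project documentation for that step.

```python
import json

# Hypothetical schema for an invoice document. The field names are
# illustrative assumptions, not something Marker defines.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "amount"],
            },
        },
    },
    "required": ["invoice_number", "total"],
}


def missing_required(schema, record):
    """Return the top-level required keys that are absent from record."""
    return [key for key in schema.get("required", []) if key not in record]


# Serialize the schema so it can be saved to a file or sent with a request.
schema_json = json.dumps(invoice_schema, indent=2)
```

A quick check such as `missing_required(invoice_schema, extracted)` can flag extraction results that lack required fields before they reach downstream code. It only inspects top-level keys, so a full JSON Schema validator is the better choice for nested data.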

For more information, visit https://datalab.to