Extract text from images

These models perform optical character recognition, extracting text from images. They can help digitize text from scanned documents, photos, and other visual media.

Best for image-to-text extraction: abiruyt/text-extract-ocr

For most OCR tasks, we recommend the abiruyt/text-extract-ocr model. It extracts plain text from a wide variety of images.
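As a rough sketch of calling this model from Python, the snippet below wraps a single prediction in a helper. It assumes the `replicate` client package with its `replicate.run` function. The helper name `extract_text`, the input key `"image"`, and the shape of the output are assumptions made for illustration, not details confirmed by the model's documentation. Depending on the client version, the model reference may also need a version suffix.

```python
from typing import Any, Callable, Optional

OCR_MODEL = "abiruyt/text-extract-ocr"  # model name taken from the text above


def extract_text(image_url: str, run: Optional[Callable[..., Any]] = None) -> str:
    """Send one image to the OCR model and return the recognized text.

    `run` is a hypothetical injection point so the helper can be exercised
    without network access; by default it falls back to `replicate.run`.
    """
    if run is None:
        # Imported lazily so the sketch loads even without the client installed.
        import replicate  # requires REPLICATE_API_TOKEN in the environment

        run = replicate.run

    # ASSUMPTION: the model accepts its picture under an input key named "image".
    output = run(OCR_MODEL, input={"image": image_url})

    # ASSUMPTION: the output is either one string or an iterable of text chunks.
    if isinstance(output, str):
        return output.strip()
    return "".join(str(chunk) for chunk in output).strip()
```

Check the model's own input schema before relying on the `"image"` key, since parameter names differ between models.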

Best for document extraction: cuuupid/marker

To get clean Markdown or JSON from PDF, EPUB, or other document formats, use Marker. It’s a pipeline of models that supports all languages, removes headers and footers, formats equations and code blocks, and more. It can also OCR text from PDFs saved in image format.
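The two recommendations above amount to a simple routing rule: plain images go to the OCR model, documents go to Marker. A minimal sketch of that rule follows. The function name `pick_model` and the extension sets are illustrative assumptions, not a list of formats either model officially supports.

```python
from pathlib import PurePosixPath

OCR_MODEL = "abiruyt/text-extract-ocr"  # recommended above for images
DOCUMENT_MODEL = "cuuupid/marker"       # recommended above for documents

# ASSUMPTION: illustrative extension sets, not official lists of supported formats.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}
DOCUMENT_EXTENSIONS = {".pdf", ".epub"}


def pick_model(filename: str) -> str:
    """Return the recommended model name for a file, based on its extension."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix in DOCUMENT_EXTENSIONS:
        # Covers image-only PDFs too, since Marker can OCR them.
        return DOCUMENT_MODEL
    if suffix in IMAGE_EXTENSIONS:
        return OCR_MODEL
    raise ValueError(f"no recommended model for file type: {suffix or filename}")
```

For example, `pick_model("report.pdf")` selects Marker, while `pick_model("receipt.jpg")` selects the OCR model.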

Other utilities

Some other useful models for your text extraction pipeline: