vwtyler / ocr-pdf

Simple PDF-to-text extraction from a URL using Tesseract

OCR-PDF Project

Overview

This project extracts text from PDF files using Tesseract Optical Character Recognition (OCR). It downloads a PDF from a given URL, converts each page into an image, and runs Tesseract on each image to recover the text. The project is on GitHub.
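The three steps described above (download, render pages to images, OCR each image) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the third-party `pdf2image` package (which needs Poppler installed) and `pytesseract` (which needs the Tesseract binary), and the function names `download_pdf`, `render_pages`, `recognize_page`, and `pdf_to_text` are hypothetical.

```python
import urllib.request
from typing import Callable, List


def download_pdf(url: str, timeout: float = 30.0) -> bytes:
    """Fetch the raw bytes of the PDF at `url` (standard library only)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def render_pages(pdf_bytes: bytes, dpi: int = 300) -> List:
    """Rasterize each PDF page to a PIL image.

    Assumption: pdf2image and Poppler are installed. The import is done
    lazily so the rest of the module loads without them.
    """
    from pdf2image import convert_from_bytes
    return convert_from_bytes(pdf_bytes, dpi=dpi)


def recognize_page(image) -> str:
    """Run Tesseract on one page image.

    Assumption: pytesseract and the tesseract binary are installed.
    """
    import pytesseract
    return pytesseract.image_to_string(image)


def pdf_to_text(
    pdf_bytes: bytes,
    render: Callable = render_pages,
    recognize: Callable = recognize_page,
    separator: str = "\n\n",
) -> str:
    """Convert PDF bytes to text, one OCR pass per page.

    `render` and `recognize` are injectable so the pipeline can be
    exercised without Poppler or Tesseract present.
    """
    pages = render(pdf_bytes)
    # Strip trailing whitespace that OCR output often carries on each page.
    return separator.join(recognize(page).strip() for page in pages)
```

With the dependencies installed, a call such as `pdf_to_text(download_pdf(url))` would return the text of every page joined by blank lines. Higher `dpi` values generally help OCR accuracy on small print at the cost of slower rendering.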

Usage

Provide the URL of a PDF, and the model returns the text extracted from it.

License

This project is licensed under the MIT License.