vwtyler / ocr-pdf

Simple PDF-to-text extraction from a URL using Tesseract (updated 10 months ago)


OCR-PDF Project

Overview

This project extracts text from PDF files using Tesseract Optical Character Recognition (OCR). It downloads a PDF from a given URL, converts each page into an image, and then runs Tesseract on each image to extract the text. The source code is on GitHub.
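The three steps above (download, page-to-image conversion, OCR) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the third-party packages pdf2image (which needs Poppler installed) and pytesseract (which needs the Tesseract binary), and every function name here is hypothetical.

```python
import urllib.request


def download_pdf(url, timeout=30.0):
    """Fetch the raw bytes of the PDF at `url` (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def pdf_to_images(pdf_bytes, dpi=300):
    """Render each PDF page to a PIL image.

    Assumption: uses pdf2image, which requires Poppler on the system.
    """
    from pdf2image import convert_from_bytes
    return convert_from_bytes(pdf_bytes, dpi=dpi)


def image_to_text(image):
    """Run Tesseract on one page image.

    Assumption: uses pytesseract, which requires the tesseract binary.
    """
    import pytesseract
    return pytesseract.image_to_string(image)


def ocr_pages(pages, recognize=image_to_text):
    """OCR every page and join the results with a blank line between pages."""
    return "\n\n".join(recognize(page).strip() for page in pages)


def ocr_pdf_from_url(url):
    """Full pipeline: download, convert pages to images, extract text."""
    return ocr_pages(pdf_to_images(download_pdf(url)))
```

Passing the recognizer as a parameter to `ocr_pages` keeps the page-joining logic testable without Tesseract installed. The 300 DPI default is an assumption; higher resolution generally helps OCR accuracy at the cost of speed and memory.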

Usage

Provide the URL of a PDF, and the model returns the text extracted from it.

License

This project is licensed under the MIT License.