cuuupid / markitdown

Microsoft's tool to convert Office documents, PDFs, images, audio, and more to LLM-ready markdown.

MarkItDown is a utility library for converting a variety of file types to Markdown, for uses such as indexing and text analysis.

It presently supports:

  • PDF (.pdf)
  • PowerPoint (.pptx)
  • Word (.docx)
  • Excel (.xlsx)
  • Images (EXIF metadata and OCR)
  • Audio (EXIF metadata and speech transcription)
  • HTML (special handling of Wikipedia, etc.)
  • Various other text-based formats (csv, json, xml, etc.)
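
As a rough illustration of how the list above might be used from Python, here is a minimal sketch. It assumes the `markitdown` package is installed and relies on its `MarkItDown().convert(...)` call, which returns a result with a `text_content` attribute. The `SUPPORTED` table and the `format_of` and `to_markdown` helpers are hypothetical names introduced for this example, and the `.html` key is an assumption, since the list names no extension for HTML. Image and audio files are left out because the list gives no extensions for them.

```python
from pathlib import Path

# Hypothetical lookup table built from the list above.
# ".html" is an assumed extension; image and audio types are omitted
# because the list does not name specific extensions for them.
SUPPORTED = {
    ".pdf": "PDF",
    ".pptx": "PowerPoint",
    ".docx": "Word",
    ".xlsx": "Excel",
    ".html": "HTML",
    ".csv": "text",
    ".json": "text",
    ".xml": "text",
}


def format_of(path):
    """Return the format label for a path, or None if it is not in the table."""
    return SUPPORTED.get(Path(path).suffix.lower())


def to_markdown(path):
    """Convert one file to Markdown text, rejecting types not in the table."""
    if format_of(path) is None:
        raise ValueError(f"unsupported file type: {path}")
    # Imported lazily so format_of() works without the package installed.
    from markitdown import MarkItDown

    return MarkItDown().convert(str(path)).text_content
```

Calling `to_markdown("slides.pptx")` would then return the Markdown text for that file, provided the file exists and the package is available. The extension check is only a convenience for failing early; the library may accept more types than this table lists.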