MarkItDown

Python utility tool from Microsoft for converting various files to Markdown

14 Dec 2024

Very surprising (and exciting) to see Microsoft coming out with an open-source Python utility tool for converting various files to Markdown!

My own attempts at writing scripts for this have produced messy results, so I'm excited to test this out.

"The MarkItDown library is a utility tool for converting various files to Markdown (e.g., for indexing, text analysis, etc.)

It presently supports:

PDF (.pdf)
PowerPoint (.pptx)
Word (.docx)
Excel (.xlsx)
Images (EXIF metadata, and OCR)
Audio (EXIF metadata, and speech transcription)
HTML (special handling of Wikipedia, etc.)
Various other text-based formats (csv, json, xml, etc.)"

Install it with pip:

pip install markitdown

The API is simple:

from markitdown import MarkItDown

markitdown = MarkItDown()
result = markitdown.convert("test.xlsx")
print(result.text_content)
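
Since the point is to replace my messy one-off scripts, here is a rough sketch of how the same three calls could be wrapped to convert a whole folder. The helper names (convert_folder, markitdown_converter), the suffix list, and the "<name>.md" output naming are my own assumptions, not part of the library; only MarkItDown(), convert() and text_content come from the snippet above. The converter is passed in as a plain callable so the folder logic can be tried without the package installed.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def convert_folder(
    src_dir: str,
    out_dir: str,
    to_markdown: Callable[[str], str],
    suffixes: Iterable[str] = (".pdf", ".docx", ".pptx", ".xlsx"),
) -> List[Path]:
    """Convert every matching file in src_dir and write <name>.md into out_dir.

    to_markdown is any callable that takes a file path and returns Markdown text.
    The suffix list is an illustrative assumption, not the library's full list.
    """
    wanted = {s.lower() for s in suffixes}
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    for path in sorted(src.iterdir()):
        # Skip sub-directories and file types we did not ask for.
        if not path.is_file() or path.suffix.lower() not in wanted:
            continue
        target = out / (path.name + ".md")
        target.write_text(to_markdown(str(path)), encoding="utf-8")
        written.append(target)
    return written


def markitdown_converter() -> Callable[[str], str]:
    """Hypothetical helper: build a path -> Markdown callable backed by MarkItDown."""
    from markitdown import MarkItDown  # imported lazily; needs `pip install markitdown`

    md = MarkItDown()
    return lambda path: md.convert(path).text_content
```

Calling convert_folder("docs", "docs_md", markitdown_converter()) should then write one Markdown file per supported document in docs (both folder names are placeholders).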
