LiteParse | Nic's notes

Table of Contents

What it does
Language
Install
Key features
Supported formats
Value

run-llama/liteparse

A fast, helpful, and open-source document parser

What it does

LiteParse is an open-source document parser from the LlamaIndex team. It performs spatial text parsing with bounding boxes entirely locally, with no cloud dependencies. It is built on PDF.js for fast native PDF parsing, with optional Tesseract.js OCR for scanned documents.
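
To make "spatial text parsing" concrete, here is a minimal sketch of how text items with bounding boxes can be grouped back into reading-order lines. The `TextItem` shape, the `groupIntoLines` helper, and the `tolerance` parameter are all hypothetical names made up for illustration; this is not LiteParse's actual schema or algorithm, just the general idea.

```typescript
// Hypothetical shape for one parsed text item; not LiteParse's actual schema.
interface TextItem {
  text: string;
  x: number; // left edge, in page units
  y: number; // top edge, in page units (grows downward)
  width: number;
  height: number;
}

// Group items into visual lines: an item joins the current line when its
// vertical center is within `tolerance` of that line's first item.
function groupIntoLines(items: TextItem[], tolerance = 2): string[] {
  const centerY = (it: TextItem): number => it.y + it.height / 2;
  const sorted = [...items].sort((a, b) => centerY(a) - centerY(b));

  const lines: TextItem[][] = [];
  for (const item of sorted) {
    const current = lines[lines.length - 1];
    if (current && Math.abs(centerY(item) - centerY(current[0])) <= tolerance) {
      current.push(item);
    } else {
      lines.push([item]);
    }
  }

  // Within each line, read left to right.
  return lines.map((line) =>
    line
      .sort((a, b) => a.x - b.x)
      .map((it) => it.text)
      .join(" ")
  );
}

const sample: TextItem[] = [
  { text: "world", x: 60, y: 10, width: 30, height: 10 },
  { text: "Hello", x: 10, y: 11, width: 40, height: 10 },
  { text: "Second", x: 10, y: 30, width: 45, height: 10 },
];
console.log(groupIntoLines(sample)); // [ 'Hello world', 'Second' ]
```

A fixed tolerance is the simplest choice; a real parser would more likely scale it with font size or handle multi-column layouts separately.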

Language

TypeScript (71.9%) with Python components (26.5%) for the OCR server.

Install

# npm
npm i -g @llamaindex/liteparse

# Homebrew (macOS/Linux)
brew install llamaindex-liteparse

It can also be built from source.

Key features

Fast text parsing using PDF.js
Flexible OCR with built-in Tesseract.js support
Multiple output formats - JSON and text
Precise bounding boxes for text positioning
Screenshot generation for LLM agents
Multi-platform - Linux, macOS, Windows
No cloud required - runs fully standalone

Supported formats

Beyond native PDFs, LiteParse handles automatic conversion for:

Office documents (Word, PowerPoint, spreadsheets)
Images (JPG, PNG, GIF, TIFF, WebP)

Value

LiteParse is a solid alternative to cloud-based document parsers such as LlamaParse (also from the LlamaIndex team, but cloud-hosted). The local-first approach suits privacy-sensitive workflows and air-gapped environments. The bounding box output is useful for layout-aware RAG pipelines where you need to know where text sits on the page, not just what it says.
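
As a rough illustration of the layout-aware angle, the sketch below filters chunks by where they sit on the page, for example to pull only header-zone text before indexing. The `BBox` and `Chunk` types, the `chunksInRegion` helper, and the 612-unit page width are assumptions for the example; LiteParse's real JSON output may be shaped differently.

```typescript
// Hypothetical types for illustration; LiteParse's real JSON schema may differ.
interface BBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Chunk {
  text: string;
  page: number;
  bbox: BBox;
}

// True when two axis-aligned rectangles overlap (touching edges do not count).
function intersects(a: BBox, b: BBox): boolean {
  return (
    a.x < b.x + b.width &&
    b.x < a.x + a.width &&
    a.y < b.y + b.height &&
    b.y < a.y + a.height
  );
}

// Keep only the chunks on `page` whose box overlaps `region`.
function chunksInRegion(chunks: Chunk[], page: number, region: BBox): Chunk[] {
  return chunks.filter((c) => c.page === page && intersects(c.bbox, region));
}

const chunks: Chunk[] = [
  { text: "Quarterly Report", page: 1, bbox: { x: 50, y: 20, width: 200, height: 24 } },
  { text: "Revenue grew this quarter.", page: 1, bbox: { x: 50, y: 300, width: 400, height: 60 } },
  { text: "Appendix", page: 2, bbox: { x: 50, y: 20, width: 100, height: 24 } },
];

// Top 100 units of page 1, assuming a 612-unit-wide page.
const headerZone: BBox = { x: 0, y: 0, width: 612, height: 100 };
console.log(chunksInRegion(chunks, 1, headerZone).map((c) => c.text)); // [ 'Quarterly Report' ]
```

The same box can be stored as metadata next to each embedded chunk, so a retrieved passage can be traced back to its position on the page.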