Prerequisites
- Ensure pdfgrep is installed via Homebrew (brew install pdfgrep).
Searching Inside Text-Based PDFs
find ~/ -type f -name "*.pdf" -exec pdfgrep -il "MY_SEARCH_STRING" {} + 2>/dev/null
Explanation
- find ~/: Searches within the user's home directory.
- -type f: Looks for regular files.
- -name "*.pdf": Filters for files with a .pdf extension.
- -exec pdfgrep -il "MY_SEARCH_STRING" {} +: Runs pdfgrep on the PDF files found, passing many file names to each invocation:
  - -i makes the search case-insensitive.
  - -l only outputs the names of matching files.
- 2>/dev/null: Discards error output, such as permission errors from directories the user cannot access.
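The `{} +` terminator makes find append many file names to one command line, whereas `{} \;` launches the command once per file. The sketch below makes the difference visible by substituting echo for pdfgrep and counting output lines. The function name count_exec_calls is hypothetical, and the count assumes no file name contains a newline.

```shell
# Hypothetical demo: count how many times find invokes the -exec command.
# echo prints one line per invocation, so the line count equals the call count.
count_exec_calls() {
  local dir="$1" mode="$2"
  if [ "$mode" = "batch" ]; then
    # {} + : file names are grouped into as few invocations as possible.
    find "$dir" -type f -name "*.pdf" -exec echo {} + | wc -l | tr -d ' '
  else
    # {} \; : one invocation per file.
    find "$dir" -type f -name "*.pdf" -exec echo {} \; | wc -l | tr -d ' '
  fi
}
```

For a folder with a handful of PDFs, `count_exec_calls DIR batch` should report a single call, while `count_exec_calls DIR single` reports one call per file, which is why the batched form is usually faster on large directory trees.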
Notes
- If the search needs to be limited to a specific folder, replace ~/ with the desired directory path.
- This method does not search inside image-based PDFs that require OCR processing.
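To avoid editing the path and search string by hand each time, the command can be wrapped in a small shell function. This is a minimal sketch, assuming pdfgrep is on the PATH. The function name pdf_search and its argument order are hypothetical choices, not part of pdfgrep.

```shell
# Hypothetical helper: search text-based PDFs under a directory (default: $HOME).
pdf_search() {
  local term="$1"
  local dir="${2:-$HOME}"
  if [ -z "$term" ]; then
    echo "usage: pdf_search TERM [DIRECTORY]" >&2
    return 2
  fi
  # -i: case-insensitive match; -l: print only the names of matching files.
  # 2>/dev/null discards error output such as "Permission denied".
  find "$dir" -type f -name "*.pdf" -exec pdfgrep -il "$term" {} + 2>/dev/null
}

# Example calls (not executed here):
#   pdf_search "MY_SEARCH_STRING"               # searches the home directory
#   pdf_search "MY_SEARCH_STRING" ~/Documents   # searches one folder only
```

Quoting "$dir" and "$term" keeps folder names and search strings that contain spaces intact.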
Searching Inside Image-Based PDFs
If the PDFs contain scanned images rather than selectable text, use OCR to extract text before searching:
Convert PDFs to Text with OCR
First, install the necessary tools: tesseract for OCR, and poppler, which provides pdftotext:
brew install tesseract poppler
Then, use the following command to search for the term "MY_SEARCH_STRING" in PDFs:
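find ~/ -type f -name "*.pdf" -print0 2>/dev/null | while IFS= read -r -d '' file; do
  pdftotext "$file" - 2>/dev/null | grep -iq "MY_SEARCH_STRING" && echo "$file"
done
Explanation
- find ... -print0 | while IFS= read -r -d '' file: Passes each path to the loop separated by a NUL character, so file names containing spaces are handled correctly.
- pdftotext "$file" -: Extracts the PDF's embedded text and writes it to standard output. pdftotext does not run OCR itself, so pages that exist only as scanned images produce no text.
- grep -iq "MY_SEARCH_STRING": Searches the extracted text case-insensitively and quietly; echo prints the file name only when a match is found.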
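pdftotext reads only the text already embedded in a PDF, so a scanned page with no text layer has to be rendered to an image and recognized by tesseract before grep can see its words. The following is a minimal sketch of that step, assuming pdftoppm (part of poppler) and tesseract are installed. The function name pdf_ocr_contains and the 300 dpi rendering resolution are illustrative assumptions, not taken from the original instructions.

```shell
# Hypothetical helper: OCR one PDF and report whether it contains a term.
# Returns 0 if the term is found, 1 if not, 2 if no temp directory could be made.
pdf_ocr_contains() {
  local term="$1" pdf="$2" tmpdir page found=1
  tmpdir=$(mktemp -d) || return 2
  # Render every page to a PNG at 300 dpi; files are named page-1.png, page-2.png, ...
  pdftoppm -r 300 -png "$pdf" "$tmpdir/page" 2>/dev/null
  for page in "$tmpdir"/page*.png; do
    [ -e "$page" ] || continue   # no pages rendered: the glob did not match
    # "stdout" makes tesseract print the recognized text instead of writing a file.
    if tesseract "$page" stdout 2>/dev/null | grep -iq "$term"; then
      found=0
      break
    fi
  done
  rm -rf "$tmpdir"
  return "$found"
}

# Example use over the home directory (not executed here):
#   find ~/ -type f -name "*.pdf" -print0 2>/dev/null |
#     while IFS= read -r -d '' f; do
#       pdf_ocr_contains "MY_SEARCH_STRING" "$f" && echo "$f"
#     done
```

OCR is much slower than reading an existing text layer, so it is usually worth running the text-based search first and applying this only to the files it cannot read.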
Notes
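- Ensure pdfgrep, tesseract, and poppler (which provides pdftotext) are installed.
- If the search needs to be limited to a specific folder, replace ~/ with the desired directory path.
- This method searches any PDF that has a text layer, including scanned PDFs that have already been OCR-processed; pages that are purely images must first be rendered to images and passed through tesseract before they can be searched.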