Searching for PDFs by content using the command line

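Prerequisites

pdfgrep must be installed. With Homebrew:

brew install pdfgrep

Then search every PDF under the home directory: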

find ~/ -type f -name "*.pdf" -exec pdfgrep -il "MY_SEARCH_STRING" {} + 2>/dev/null

find ~/ - Searches within the user's home directory.
-type f - Looks for regular files.
-name "*.pdf" - Filters for files with a .pdf extension.
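-exec pdfgrep -il "MY_SEARCH_STRING" {} + - Runs pdfgrep on the PDF files found; the + terminator passes many file paths to each pdfgrep invocation instead of starting one process per file:
    -i makes the search case-insensitive.
    -l outputs only the names of matching files.
2>/dev/null - Discards all error output, including permission errors from directories the user cannot access.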
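The effect of the + terminator can be checked without pdfgrep by substituting echo for it. The following is a minimal sketch, assuming bash and a throwaway directory; the names demo_dir, batched and per_file are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical demo: compare how find passes paths with "+" versus "\;".
demo_dir=$(mktemp -d)
touch "$demo_dir/a.pdf" "$demo_dir/b.pdf" "$demo_dir/notes.txt"

# With "+", find appends the matching paths to a single echo invocation,
# so both PDF paths come out on one line.
batched=$(find "$demo_dir" -type f -name "*.pdf" -exec echo {} + | wc -l | tr -d ' ')

# With "\;", find starts echo once per matching file, one line each.
per_file=$(find "$demo_dir" -type f -name "*.pdf" -exec echo {} \; | wc -l | tr -d ' ')

echo "lines printed with +: $batched"
echo "lines printed with \\;: $per_file"
rm -r "$demo_dir"
```

The line counts stand in for the number of times the command was started. Fewer process launches is the reason the + form is usually preferred when a tool accepts several file arguments, as pdfgrep does.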

If the search needs to be limited to a specific folder, replace ~/ with the desired directory path.
This method does not search inside image-based PDFs that require OCR processing.
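If the search is run often against different folders, the command can be wrapped in a small shell function. This is only a sketch, assuming bash or zsh; search_pdfs is a hypothetical helper name and is not part of pdfgrep.

```shell
#!/usr/bin/env bash
# search_pdfs is a hypothetical helper name, not part of pdfgrep.
# Usage: search_pdfs "term" [directory]
search_pdfs() {
    local term=$1
    # Fall back to the home directory when no directory is given.
    local dir=${2:-$HOME}
    # Same find/pdfgrep pipeline as above, with the directory and term as parameters.
    find "$dir" -type f -name "*.pdf" -exec pdfgrep -il "$term" {} + 2>/dev/null
}
```

For example, search_pdfs "MY_SEARCH_STRING" ~/Documents would list the matching PDFs under ~/Documents only.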

If the PDFs contain scanned images rather than selectable text, the text must be recovered with OCR before it can be searched.

First, install the necessary tools. tesseract is the OCR engine; poppler provides pdftotext:

brew install tesseract poppler

Then, use the following command to search for the term "MY_SEARCH_STRING" in PDFs:

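find ~/ -type f -name "*.pdf" -print0 2>/dev/null | while IFS= read -r -d '' file; do
    pdftotext "$file" - 2>/dev/null | grep -qi "MY_SEARCH_STRING" && echo "$file"
done

find ... -print0 | while IFS= read -r -d '' file - Reads one NUL-terminated path at a time, so file names containing spaces are not split.
pdftotext "$file" - - Extracts the text embedded in the PDF and writes it to standard output (the trailing -).
grep -qi "MY_SEARCH_STRING" - Searches the extracted text case-insensitively and reports a match through its exit status only, so each matching file name is printed once by echo.
2>/dev/null - Discards error output from find and pdftotext.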

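Ensure pdfgrep (first method) and poppler, which provides pdftotext (second method), are installed; tesseract is needed only when OCR is applied to scanned PDFs.
If the search needs to be limited to a specific folder, replace ~/ with the desired directory path.
pdftotext reads only the text layer embedded in a PDF, so this loop finds text-based PDFs and scans that already carry an OCR text layer. A purely image-based PDF must first be rendered to images and passed through tesseract.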
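For a PDF that has no text layer, the pages can be rendered to images and passed through tesseract before searching. The following is a minimal sketch, assuming bash, pdftoppm from poppler and tesseract on the PATH; ocr_pdf_contains is a hypothetical helper name, and the 300 dpi resolution is an assumed starting point rather than a requirement.

```shell
#!/usr/bin/env bash
# ocr_pdf_contains is a hypothetical helper name.
# Usage: ocr_pdf_contains file.pdf "term"   (exit status 0 when the term is found)
ocr_pdf_contains() {
    local pdf=$1 term=$2 workdir page found=1
    workdir=$(mktemp -d) || return 2
    # Render every page to a PNG named page-<number>.png inside the work directory.
    pdftoppm -r 300 -png "$pdf" "$workdir/page" 2>/dev/null
    for page in "$workdir"/page*.png; do
        [ -e "$page" ] || continue   # nothing was rendered
        # "stdout" makes tesseract print the recognized text instead of writing a file.
        if tesseract "$page" stdout 2>/dev/null | grep -qi "$term"; then
            found=0
            break   # stop at the first page that matches
        fi
    done
    rm -r "$workdir"
    return "$found"
}
```

A call such as ocr_pdf_contains scan.pdf "MY_SEARCH_STRING" && echo scan.pdf prints the file name when any page matches. OCR is far slower than pdftotext, so it may be worth running it only on files where pdftotext returns no text.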