Server OCR and PDF extraction check
This page shows whether the server can extract text from PDFs automatically, including scanned/image-only PDFs.
Text PDFs: Limited
Scanned PDFs: Needs OCR tools
Images: Needs Tesseract
| Tool | Purpose | Status | Path |
|---|---|---|---|
| pdftotext | Extracts selectable text from normal PDF files | Not detected | - |
| pdftoppm | Converts scanned PDF pages to images before OCR | Not detected | - |
| tesseract | Reads text from scanned images/PDF pages | Not detected | - |
| shell_exec | Allows PHP to call the above server tools | Enabled | - |
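
The detection results in the table can be cross-checked from a shell on the server. The following is a minimal sketch, not the app's own check: `tool_status` is a hypothetical helper name, and it assumes the tools would be installed somewhere on `PATH`. PHP's `shell_exec` may run with a different `PATH` than an interactive login, so a tool found here could still show as "Not detected" on this page.

```shell
#!/bin/sh
# Hypothetical helper: print the resolved path of a command,
# or "Not detected" if it cannot be found on PATH.
tool_status() {
    if path=$(command -v "$1" 2>/dev/null); then
        printf '%s\n' "$path"
    else
        printf 'Not detected\n'
    fi
}

# Report on the three external tools listed in the table.
for tool in pdftotext pdftoppm tesseract; do
    printf '%-10s %s\n' "$tool" "$(tool_status "$tool")"
done
```

Running the same commands as the web server user gives the closest match to what PHP sees.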
What to ask your hosting/server team to install
For normal text-layer PDFs, the best tool is Poppler pdftotext. For scanned/image-only PDFs, the app needs Poppler pdftoppm and Tesseract OCR.
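The two paths described above can be sketched as one shell function. This is an illustration under stated assumptions, not the app's actual implementation: `extract_pdf_text` is a hypothetical name, and the 300 DPI resolution and English language (`-l eng`) are assumed defaults. It tries `pdftotext` first and falls back to `pdftoppm` plus `tesseract` only when no text layer is found.

```shell
#!/bin/sh
# Hypothetical helper: extract text from a PDF, falling back to OCR.
# Usage: extract_pdf_text input.pdf output.txt
extract_pdf_text() {
    pdf=$1
    out=$2

    # 1) Try the text layer first. -layout keeps the physical page layout.
    pdftotext -layout "$pdf" "$out" || return 1

    # If the result contains any non-whitespace character, we are done.
    if grep -q '[^[:space:]]' "$out"; then
        return 0
    fi

    # 2) No text layer: render pages to PNG (assumed 300 DPI), then OCR each.
    tmp=$(mktemp -d) || return 1
    pdftoppm -r 300 -png "$pdf" "$tmp/page" || { rm -rf "$tmp"; return 1; }

    : > "$out"
    for img in "$tmp"/page-*.png; do
        # "stdout" as the output base makes tesseract print text to stdout.
        tesseract "$img" stdout -l eng >> "$out" 2>/dev/null
    done
    rm -rf "$tmp"
}
```

A call such as `extract_pdf_text scan.pdf scan.txt` (file names are placeholders) leaves the extracted or recognized text in `scan.txt`. Checking the text layer first avoids the much slower OCR step for PDFs that already contain selectable text.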
On a Debian or Ubuntu Linux server, the required packages are normally (names can differ on other distributions):
poppler-utils tesseract-ocr
On shared cPanel hosting, the host may not allow these packages or may disable shell_exec. In that case, the fallback is still available: upload/store the scanned PDF, then paste OCR text into the Parse Documents tab.