---
title: "Troubleshooting \"Unable to extract PDF content\""
date: "2022-02-25T17:18:28+00:00"
summary:
image:
type: "article"
url: "/acquia-cloud-platform/help/93751-troubleshooting-unable-extract-pdf-content"
id: "768c789f-be51-408a-8dea-92ab90eb5a76"
---


Issue
-----

While indexing files, you receive this error:

> `Caused by: org.apache.tika.exception.TikaException: Unable to extract PDF content`

Resolution
----------

You can check what Solr sees by running the Tika extractor manually:

1.  Install Java.
2.  Download the Tika app from [https://archive.apache.org/dist/tika/tika-app-0.10.jar](https://archive.apache.org/dist/tika/tika-app-0.10.jar).
3.  Download the PDF in question.
4.  Run the following command, replacing `{filename-to-test}` with the path to the PDF:
    *   `java -jar tika-app-0.10.jar {filename-to-test}`
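
To test several files in one pass, the same command can be wrapped in a small shell function. This is a minimal sketch, not an Acquia-provided tool: the function name `tika_check` and the `TIKA_JAR` variable are hypothetical, and it assumes that the Tika app exits with a non-zero status when extraction fails.

```shell
#!/bin/sh
# Hypothetical helper: run the Tika app against each file and report the result.
# Assumes tika-app-0.10.jar is in the current directory unless TIKA_JAR is set.
TIKA_JAR="${TIKA_JAR:-tika-app-0.10.jar}"

tika_check() {
    for f in "$@"; do
        # Discard the extracted text and any stack trace; keep only the exit status.
        # Assumption: a failed extraction makes java exit with a non-zero status.
        if java -jar "$TIKA_JAR" "$f" >/dev/null 2>&1; then
            echo "OK: $f"
        else
            echo "FAILED: $f"
        fi
    done
}
```

For example, `tika_check *.pdf` prints one line per file. For any file reported as `FAILED`, rerun the single command from step 4 to see the full exception.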

Cause
-----

Tika can return this error for many reasons. Common ones include:

*   The PDF could be password-protected.
*   The file could be too large.
*   The file could be in a format or PDF version that Tika cannot parse.
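
A quick way to narrow down which of these applies is to inspect the raw file before involving Tika. The sketch below is a heuristic under stated assumptions, not a PDF parser: `pdf_precheck` is a hypothetical name, it assumes `head -c` is available on your system, and it treats the presence of an `/Encrypt` entry anywhere in the file as a sign of password protection.

```shell
#!/bin/sh
# Hypothetical pre-check for the three causes listed above.
# Heuristic only: it reads raw bytes and does not parse the PDF.
pdf_precheck() {
    f="$1"

    # PDF files start with a header such as "%PDF-1.7".
    header=$(head -c 8 "$f")
    case "$header" in
        %PDF-*) echo "version: ${header#"%PDF-"}" ;;
        *)      echo "version: unknown (no %PDF- header)" ;;
    esac

    # Encrypted PDFs reference an /Encrypt dictionary in the file trailer.
    if LC_ALL=C grep -q '/Encrypt' "$f"; then
        echo "encrypted: likely"
    else
        echo "encrypted: no /Encrypt entry found"
    fi

    # Size in bytes; what counts as too large depends on your environment.
    echo "size: $(wc -c < "$f" | tr -d ' ') bytes"
}
```

Running `pdf_precheck report.pdf` prints three lines (version, encryption hint, size). A missing `%PDF-` header suggests that the file is not a PDF at all, or that it is corrupted.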

To rule out a version incompatibility, convert the PDF that generates the error to an earlier PDF version. For example, the following sample [Ghostscript](https://www.ghostscript.com/) command rewrites `input.pdf` as `output.pdf` at compatibility level 1.5:

    $ gs                        \
       -sDEVICE=pdfwrite        \
       -dCompatibilityLevel=1.5 \
       -o output.pdf            \
       input.pdf
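
To confirm that the conversion produced the version you asked for, you can read the header of the output file afterwards. The wrapper below is a minimal sketch that reuses only the Ghostscript options shown above; the function name `pdf_downgrade` and its argument order are hypothetical, and it assumes `head -c` is available.

```shell
#!/bin/sh
# Hypothetical wrapper around the Ghostscript command above.
# Usage: pdf_downgrade input.pdf output.pdf [level]
pdf_downgrade() {
    in_file="$1"
    out_file="$2"
    level="${3:-1.5}"   # default matches the sample command

    gs -sDEVICE=pdfwrite -dCompatibilityLevel="$level" \
       -o "$out_file" "$in_file" >/dev/null || return 1

    # The first 8 bytes of a PDF carry its version, for example "%PDF-1.5".
    echo "$out_file: $(head -c 8 "$out_file")"
}
```

After a successful run, test the converted file with the Tika command from the Resolution section. If extraction now works, the original PDF version was the likely cause.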