Acquia Search is a complex platform for hosting Solr indexes, and is not infinitely scalable. There are challenges both on the Drupal side and the server side that must be identified and considered when indexing large files.
Note: In general, indexing large files is not recommended.
Solr (including Acquia Search) completes the following three steps when indexing attachments:
POST request to Solr at a special URL with the original document binary data.
Either of these modules will return text. See the issues and limitations.
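The POST step above can be sketched as follows. This is a minimal illustration only: it assumes the stock Solr Cell handler path `/update/extract` with the `extractOnly=true` parameter, and the host, core name, and helper name `build_extract_request` are hypothetical. The URL and authentication that Acquia Search actually uses are not described in this article, so the sketch builds the request without sending it.

```python
import urllib.parse
import urllib.request


def build_extract_request(solr_base_url: str, file_bytes: bytes,
                          content_type: str = "application/pdf") -> urllib.request.Request:
    """Build (but do not send) a POST carrying the raw document to the extract handler."""
    # extractOnly=true asks Solr Cell (Tika) to return the extracted text
    # instead of indexing the document directly.
    query = urllib.parse.urlencode({"extractOnly": "true", "wt": "json"})
    url = f"{solr_base_url.rstrip('/')}/update/extract?{query}"
    return urllib.request.Request(
        url,
        data=file_bytes,  # the original document's binary data is the request body
        headers={"Content-Type": content_type},
        method="POST",
    )


# Hypothetical local Solr core; replace with your own endpoint.
request = build_extract_request("http://localhost:8983/solr/mycore", b"%PDF-1.4 ...")
print(request.get_method(), request.full_url)
```

Because the whole file travels in the request body, the request size grows with the file size, which is where the limitations described in this article come into play.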
This section describes some of the limitations to the indexing process when dealing with large amounts of data.
If your Solr instance is external to your website, like Acquia Search is, and you communicate with it using HTTP, you can encounter the following problems when the file is being sent to the Tika service:
Websites not hosted by Acquia can work around these issues by running a locally hosted Java Tika application. Acquia Cloud does not run Java.
Your database may not be able to handle large files. The returned text needs to be stored in the database. This can greatly increase database table size and risks errors or data truncation if the table's field size is not large enough.
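As a rough way to reason about the truncation risk, the following sketch checks which MySQL text column type is large enough for a given piece of extracted text. It assumes a MySQL or MariaDB database; the column type your module actually uses is not stated in this article, and the helper name `smallest_fitting_column` is hypothetical.

```python
from typing import Optional

# Maximum byte capacities of MySQL text column types, smallest first.
MYSQL_TEXT_LIMITS = {
    "TEXT": 2**16 - 1,        # 65,535 bytes
    "MEDIUMTEXT": 2**24 - 1,  # 16,777,215 bytes
    "LONGTEXT": 2**32 - 1,    # 4,294,967,295 bytes
}


def smallest_fitting_column(extracted_text: str) -> Optional[str]:
    """Return the smallest MySQL text type that can hold the text, or None if none can."""
    # Limits are in bytes, so measure the UTF-8 encoded size, not the character count.
    size = len(extracted_text.encode("utf-8"))
    for column_type, limit in MYSQL_TEXT_LIMITS.items():
        if size <= limit:
            return column_type
    return None


# 20,000 five-byte tokens are 100,000 bytes, which no longer fits in a TEXT column.
print(smallest_fitting_column("word " * 20_000))  # MEDIUMTEXT
```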
When you're attempting to index a large file, your website can have additional problems:
Solr limits the length of indexed fields through a setting in solrconfig.xml, for example:
<maxFieldLength>20000</maxFieldLength>
Even if everything else works, if the extracted text has more words than the maxFieldLength value, Solr truncates the indexed data to that limit. Even relatively small PDF files (such as a 5 MB PDF file) can contain far more than 20,000 words.
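The effect of this limit can be approximated with a short sketch. It treats whitespace-separated words as tokens, which is a simplification: Solr's analyzers tokenize differently, so real counts will vary. The function name `truncate_to_max_field_length` is hypothetical.

```python
def truncate_to_max_field_length(text: str, max_field_length: int = 20_000):
    """Keep only the first max_field_length whitespace-separated tokens.

    Returns the truncated text and the number of tokens that were dropped.
    """
    tokens = text.split()
    kept = tokens[:max_field_length]
    dropped = len(tokens) - len(kept)
    return " ".join(kept), dropped


# A synthetic 25,000-word document stands in for text extracted from a large file.
extracted = " ".join(f"word{i}" for i in range(25_000))
indexed, dropped = truncate_to_max_field_length(extracted)
print(f"{dropped} tokens would not be searchable")  # 5000 tokens would not be searchable
```

Anything past the limit is silently absent from the index, so a search for a term that appears only near the end of a long document returns no match.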
Limits when using Acquia Search for Tika extraction and searching
Acquia Search has several non-configurable limits:
maxFieldLength in solrconfig.xml is set to a maximum of 20,000 tokens in our platform. A dedicated search farm is required to go above this limit. Contact your account manager if you require a dedicated farm.
If you absolutely need to index large files, there are several options available, including the following:
The Search API attachments module has this feature. To use it:
There is a similar option for the D7 apachesolr.module.
Wed Oct 22 2025 08:59:29 GMT+0000 (Coordinated Universal Time)