Acquia Search with non-Latin languages

This documentation page describes features and procedures for a limited availability release, and its contents may change at any time. Acquia does not recommend bookmarking this page as its location may change without notice.

Although the default Acquia Search schema is optimized for English searches, Acquia also provides schemas for other languages. For more information, see Selecting a language. Because most Latin-based languages share grammatical and linguistic similarities, you can use their schemas in a straightforward fashion.

Non-Latin languages, such as Chinese, Japanese, and Korean (CJK), follow different stemming and word-spacing rules than Latin-based languages. As a result, the Solr / Lucene search engine must handle non-Latin languages differently than Latin-based languages.
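To illustrate the spacing difference, the following minimal Python sketch compares splitting text on whitespace with splitting it into overlapping two-character tokens (bigrams), which is the general idea behind bigram-based CJK analysis. It is an illustration only and not Solr code: the function names `whitespace_tokens` and `character_bigrams` are hypothetical, and a real Solr analyzer chain performs additional steps that are not modeled here.

```python
def whitespace_tokens(text: str) -> list:
    """Split on whitespace, as a naive tokenizer for space-delimited text would."""
    return text.split()


def character_bigrams(text: str) -> list:
    """Return overlapping two-character tokens, ignoring whitespace.

    Hypothetical helper for illustration; it does not reproduce Solr's
    exact tokenization behavior.
    """
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        # A single character (or nothing) cannot form a bigram.
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


english = "search engine"
chinese = "搜索引擎"  # "search engine", written without spaces

print(whitespace_tokens(english))  # ['search', 'engine']
print(whitespace_tokens(chinese))  # ['搜索引擎']
print(character_bigrams(chinese))  # ['搜索', '索引', '引擎']
```

Splitting on whitespace leaves the Chinese phrase as a single token, so a query for 搜索 ("search") alone would not match it. With bigrams, both 搜索 and 引擎 ("engine") become searchable terms, at the cost of also indexing the overlapping pair 索引.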

For instructions on how to add support for the CJK languages, see Custom Solr configuration.

Note

The SmartChineseSentenceTokenizerFactory and SmartChineseWordTokenFilterFactory classes are available if your Acquia Search core uses Solr 3.5. If your Acquia Search core uses Solr 4.5.1, use CJKTokenizerFactory instead.

Custom Solr configurations are not yet supported with Solr 7. Once they are supported, Acquia Search with Solr 7 will use CJKBigramFilterFactory. For more information, see the More information section.

After you have developed and tested your configuration changes, contact Acquia Support. Acquia will review the changed files and deploy them to your Acquia Search index.

More information

For more background information about indexing and searching CJK languages, see the following resources: