Language support and configuration

You can now configure a specific language for standard attributes and for queries.

Additional supported languages

Besides English, Endeca Server now supports record and value searches in 21 other languages, such as German, French, Chinese, Japanese, and Russian. For a list of supported language codes, see the Oracle Endeca Server Developer's Guide.

The SearchFilter and ValueSearchConfig types now have a Language attribute that specifies the language of the search. The Dgraph uses this per-query language code to select the appropriate dictionary for the query.
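As an illustration of where the per-query language code goes, the following minimal sketch builds a SearchFilter element that carries a Language attribute. Only the SearchFilter type name and the Language attribute come from the text above. The helper name build_search_filter, the placement of the search terms as element text, the omission of namespaces and other attributes, and the "fr" code for French are assumptions made for the example; check the Oracle Endeca Server Developer's Guide for the actual request schema and the list of supported language codes.

```python
import xml.etree.ElementTree as ET


def build_search_filter(terms, language):
    """Build a bare SearchFilter element tagged with a language code.

    Hypothetical helper: a real request needs the service's namespaces
    and any other attributes that the schema requires.
    """
    element = ET.Element("SearchFilter", {"Language": language})
    element.text = terms  # assumption: search terms are the element text
    return element


# "fr" is assumed here to be the code for French.
search_filter = build_search_filter("vin rouge", "fr")
xml_text = ET.tostring(search_filter, encoding="unicode")
print(xml_text)
```

The same pattern would apply to a ValueSearchConfig element, since the text states that both types expose the Language attribute.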

Language property for PDRs

Each PDR (Property Description Record) now has an mdex-property_Language field that specifies the language of that standard attribute. The value is either set explicitly by the user at ingest time or, if it is not set, defaults to the global PDR language code for the system. The global PDR language code can be set by the setPropertyDefaultLanguage operation described in the next section.

It is recommended that you explicitly set the mdex-property_Language field for each of your standard attributes at ingest time.

Setting the global PDR language code in the Configuration Web Service

The Configuration Web Service has been updated to include two new operations: setPropertyDefaultLanguage and getPropertyDefaultLanguage.

The setPropertyDefaultLanguage operation sets the default language for new standard attributes (PDRs) that are created automatically by the Data Ingest Web Service or the Bulk Load Interface. The default language is also used if the mdex-property_Language field is not explicitly set when a PDR is created by the Data Ingest Web Service or the Bulk Load Interface. (Note that PDRs created by the Configuration Web Service's putProperties and import operations must be fully and explicitly specified.)

The getPropertyDefaultLanguage operation returns the default language code that is used for PDRs. The returned language code is either unknown (the default) or the code that was set by a previous setPropertyDefaultLanguage operation.
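The interaction between the two operations and the mdex-property_Language field can be summarized with a small in-memory model. This is a sketch of the rules described above, not a client for the Configuration Web Service: the class PdrLanguageModel, its method names, and the "de" and "fr" language codes are hypothetical and exist only for illustration.

```python
UNKNOWN = "unknown"  # the default language code stated in the text


class PdrLanguageModel:
    """Toy model of the global PDR language default (hypothetical)."""

    def __init__(self):
        # Until a default is set, the global code is "unknown".
        self._default_language = UNKNOWN

    def set_property_default_language(self, code):
        # Mirrors the role of setPropertyDefaultLanguage.
        self._default_language = code

    def get_property_default_language(self):
        # Mirrors the role of getPropertyDefaultLanguage.
        return self._default_language

    def language_for_new_pdr(self, pdr_fields):
        # An explicit mdex-property_Language wins; otherwise the
        # global default applies to the newly created PDR.
        explicit = pdr_fields.get("mdex-property_Language")
        return explicit if explicit else self._default_language


model = PdrLanguageModel()
before = model.get_property_default_language()   # "unknown"
model.set_property_default_language("de")         # "de" is an assumed code
implicit = model.language_for_new_pdr({})         # falls back to "de"
explicit = model.language_for_new_pdr({"mdex-property_Language": "fr"})
print(before, implicit, explicit)
```

The model applies only to PDRs created through the Data Ingest Web Service or the Bulk Load Interface. As noted above, PDRs created with putProperties or import must be fully specified, so no fallback applies to them.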

Diacritic folding behavior

In version 7.4.x, diacritic folding was turned off by default. However, you could use the Dgraph --latin1 flag to enable diacritic folding.

In version 7.5.x, diacritic folding is the default behavior for all supported languages (including "unknown") during record searches. In addition, the Dgraph --latin1 flag has been removed. Note that you cannot disable this diacritic folding behavior.
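To show what diacritic folding means for matching, the following sketch folds accented characters to their base letters by using Unicode decomposition. It illustrates the general technique only. The Dgraph's actual folding rules are not described here and may differ; for example, this approach leaves characters without a canonical decomposition, such as "ø" or "ß", unchanged. The function names are hypothetical.

```python
import unicodedata


def fold_diacritics(text):
    """Strip combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def folded_match(query, indexed_term):
    """Compare two terms after diacritic folding and case folding."""
    return (fold_diacritics(query).casefold()
            == fold_diacritics(indexed_term).casefold())


print(fold_diacritics("café"))          # cafe
print(folded_match("zurich", "Zürich"))  # True
```

With folding applied, a record search for "cafe" and a search for "café" are treated as the same term.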

Thesaurus and stop word support

The thesaurus feature is supported in version 7.5.x, as it was in version 7.4.x. Note, however, that only one global thesaurus is supported for an Endeca data domain. In other words, language-specific thesauruses (such as one thesaurus for English, a second for French, and so on) are not supported.

Stop words are supported only for searches that are marked with the "unknown" language identifier.

See the Oracle Endeca Server Developer's Guide for more information on the thesaurus and stop words features.