Oracle Endeca Server support for the Unicode Standard version 4.0 allows an Endeca data domain to process and serve data in many of the world’s languages.
Either at data ingest time or later, via a Configuration Web Service operation, you can specify that a given standard attribute will use internationalized data when it is provided in a native encoding. At query time, you can specify the language to be used for the record search or value search.
For more information about the Unicode Standard and character encoding, see http://unicode.org.
Feature | Language support |
---|---|
Auto-correction spelling | Language-specific automatic spelling correction is available for all supported languages (i.e., a spelling dictionary is available for each supported language). |
Stemming | Language-specific stemming is available for all supported languages. |
Did You Mean (DYM) suggestions | Language-specific DYM is available for all supported languages. |
Snippeting | Available for all supported languages. |
Thesaurus | One language-agnostic thesaurus is available for use with queries in any of the supported languages (i.e., language-specific thesauruses are not supported). |
Search characters | Available only for the unknown language identifier. |
Stop words | Available only for the unknown language identifier. |
Language auto-detection | Auto-detection of languages at ingest or query time is not supported. The user must explicitly specify the language for the PDR (Property Description Record) or the query. |
Language collation | Language-specific collation (sorting) is not available for the supported languages. |
Diacritic folding is the default behavior for all supported languages (including "unknown") during record searches. This feature automatically maps ISO-Latin1 international characters to their ASCII equivalents in record search queries. In effect, character accents are ignored, so that search queries containing international characters match Anglicized result text. For example, an English query for "café" will match "cafe" in records. Note that you cannot disable diacritic folding.
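The general idea of diacritic folding can be illustrated with a minimal Python sketch. This is not the Endeca Server implementation, whose internals are not described here; it assumes folding can be approximated by Unicode decomposition followed by removal of combining marks. The function names `fold_diacritics` and `folded_match` are hypothetical and exist only for this illustration.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip accents by decomposing characters and dropping combining marks.

    Assumption: this approximates diacritic folding in general; it is not
    the mapping table used by Endeca Server.
    """
    # NFD splits an accented character into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def folded_match(query: str, record_text: str) -> bool:
    """Return True if the folded query occurs in the folded record text."""
    return fold_diacritics(query).lower() in fold_diacritics(record_text).lower()


if __name__ == "__main__":
    print(fold_diacritics("café"))                       # accent removed
    print(folded_match("café", "Corner cafe, open late"))  # accented query, plain text
    print(folded_match("cafe", "Le Café de Paris"))        # plain query, accented text
```

Because both the query and the record text are folded before comparison, matching works in either direction in this sketch. Characters with no canonical decomposition, such as "ø" or "ß", pass through unchanged here; a production mapping would need an explicit table for them.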