The following topics describe how to specify language identifiers
for your application.
In an Oracle Endeca Server application, language can be specified in two
places:
- The language of a standard or managed attribute can be specified in the
PDR (property description record) of that attribute.
- The language of a search
query can be specified with search configuration options.
Keep in mind that the following language-identification scenarios are
not supported:
- A global language identifier
(for all of your data and queries) is not supported. However, you can set a
global PDR language code that is used when PDRs are automatically created by
the DIWS (Data Ingest Web Service) and Bulk Load interfaces.
- A per-record language
identifier is not supported. Language codes can be set only on attributes, not
on records.
- Multiple language identifiers for a single search query are not supported.
Each query can have at most one language identifier, although the language can
differ from one query to the next. If the language varies from query to query,
your front-end application should set the language identifier on each query.
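The rules above can be summarized in a small toy model: language lives on attribute PDRs (never on records), a global default applies only to PDRs that are auto-created during ingest, and each query carries at most one language code. This is a hypothetical sketch for illustration only; the class and method names (`PDR`, `Record`, `SearchQuery`, `Schema`, `define_attribute`, `ingest`, `language_of`) are assumptions and are not part of the Oracle Endeca Server, DIWS, or Bulk Load APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical toy model of the language rules described above.
# None of these names belong to the Oracle Endeca Server API.


@dataclass
class PDR:
    """Property description record for one standard or managed attribute."""
    name: str
    language: str  # the only place a document-side language is stored


@dataclass
class Record:
    """A record holds attribute values; it has no language field of its own."""
    values: Dict[str, str] = field(default_factory=dict)


@dataclass
class SearchQuery:
    """A search carries at most one language identifier (a single code, not a list)."""
    terms: str
    language: Optional[str] = None


class Schema:
    def __init__(self, default_pdr_language: str) -> None:
        # Global default used only for PDRs created automatically at ingest time;
        # it is not a global language for all data and queries.
        self.default_pdr_language = default_pdr_language
        self.pdrs: Dict[str, PDR] = {}

    def define_attribute(self, name: str, language: str) -> None:
        # Explicitly created PDR with its own language code
        self.pdrs[name] = PDR(name, language)

    def ingest(self, record: Record) -> None:
        # Attributes without a PDR get an auto-created one that inherits the default
        for attr in record.values:
            if attr not in self.pdrs:
                self.pdrs[attr] = PDR(attr, self.default_pdr_language)

    def language_of(self, attr: str) -> str:
        return self.pdrs[attr].language
```

For example, with `Schema("en")`, an explicitly defined `Description_fr` attribute keeps its `"fr"` code, while a `Title` attribute first seen during ingest is auto-created with the default `"en"`. A per-query language is set on each `SearchQuery` individually rather than globally.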
Language effect on documents and searches
For every document, the language of the corresponding PDR determines:
- How it is tokenized
- How it is normalized
- In what language word forms are returned for its terms
- To which language's word-form expansion indexes the returned forms
contribute
- Which language's spelling
dictionary its terms contribute to
For every search, the language configured on the search determines:
- How it is tokenized
- How it is normalized
- In what language word forms are returned for its terms
- Which language's spelling
dictionary to use for spelling-related re-queries
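One practical consequence of these two lists is that spelling correction is partitioned by language: document terms feed the dictionary of their PDR's language, and a search consults only the dictionary for its own language. The sketch below is a hypothetical toy model, not Endeca internals; the `SpellingDictionaries` class is an assumption, lowercasing plus whitespace splitting stands in for real tokenization and normalization, and Python's `difflib.get_close_matches` stands in for the engine's actual spelling algorithm.

```python
import difflib
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical toy model, not Endeca internals: one spelling dictionary per language.
# Lowercasing + whitespace splitting stands in for real tokenization/normalization.


class SpellingDictionaries:
    def __init__(self) -> None:
        self._by_language: Dict[str, Set[str]] = defaultdict(set)

    def add_document_terms(self, pdr_language: str, text: str) -> None:
        # A document's terms feed only the dictionary of its attribute's PDR language
        for term in text.lower().split():
            self._by_language[pdr_language].add(term)

    def suggest(self, query_language: str, term: str) -> List[str]:
        # A search consults only the dictionary for the language configured on it
        words = sorted(self._by_language.get(query_language, set()))
        return difflib.get_close_matches(term.lower(), words, n=3, cutoff=0.8)
```

After indexing "Bonjour le monde" under `"fr"` and "hello world" under `"en"`, a French query for `bonjur` is corrected to `bonjour`, but the same misspelling in an English query gets no suggestion, because French terms never entered the English dictionary.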