
CHAPTER 13. Linguistic Specifications
This chapter provides reference information for ConText Option linguistic processing.
This chapter covers the following topics: predefined linguistic setting configurations and the structure of the Linguistic Services output tables.
Predefined Linguistic Setting Configurations
ConText Option provides the following predefined setting configurations that can be used for theme indexing, theme queries, and Linguistic Services requests.
Note: The style parser and the theme parser have been combined into a single parser that delivers the quality of the style parser at the performance of the theme parser.
As a result, the 'S' setting configurations do not provide any advantage over the other setting configurations and should not be used.
If the 'S' setting configurations are used, ConText Option successfully processes documents through the linguistics (theme indexing or Linguistic Services), but returns an error message for each document. These error messages can be ignored.
Label | Definition | Notes
GENERIC | Full theme parsing; Gist generation | This configuration utilizes information at the word, sentence, and paragraph level to produce theme information. It also accumulates the paragraph information required for creating Gists. This configuration is the default for each ConText Server that you start with a Linguistics personality.
S | Style parsing; Gist generation | This configuration represents the broadest settings for the system. It analyzes text at the intra-document level, using word position, context, sentence structure, and word relationships to produce in-depth linguistic output. This configuration by default supports Gist generation, because, during style parsing, the system automatically accumulates the paragraph information required for creating Gists. This configuration is best used on books, magazines, newspapers, and other edited documents.
SA | Style parsing; Gist generation; Case-sensitivity | This configuration is identical to "S", except it processes the text of each document through the case-sensitivity routines. It should be used only when text is all-uppercase or all-lowercase, or where you are not sure of the accuracy of the case.
P | Limited theme parsing | This configuration represents the fastest mode that you can specify for the Linguistic Services. It analyzes text only at the document level to produce its theme information. In this mode, the paragraph information required for generating Gists is not accumulated, so no Gists are generated. This configuration is best used for web pages, e-mail, lists, and other documents that are not edited and may not contain complete, grammatical sentences.
PP | Limited theme parsing; Gist generation | This configuration is identical to "P" except it accumulates the paragraph information required to create Gists.
PS | Full theme parsing | This configuration is similar to "P"; however, it utilizes information at the word, sentence, and paragraph level to produce theme information.
PSP | Full theme parsing; Gist generation | This configuration is identical to "PS" except it accumulates the paragraph information necessary to create Gists.
PSA | Full theme parsing; Case-sensitivity | This configuration is identical to "PS", except it processes the text of each document through the case-sensitivity routines. It should be used only when text is all-uppercase or all-lowercase, or where you are not sure of the accuracy of the case.
PSAP | Full theme parsing; Gist generation; Case-sensitivity | This configuration is identical to "PSP", except it processes the text of each document through the case-sensitivity routines. It should be used only when text is all-uppercase or all-lowercase, or where you are not sure of the accuracy of the case.
Table 13 - 1. Linguistic Services Predefined Setting Configurations
For more information about setting the label for a setting configuration, see "Specifying Settings and Error Handling" in "Using the Linguistic Services (Chapter 7)".
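The features listed in Table 13 - 1 can be condensed into a small lookup, which may help when an application has to choose a label from the features it needs. The Python sketch below is illustrative only: `CONFIGS` and `pick_labels` are hypothetical names that are not part of ConText Option, and the flags are simply transcribed from the Definition column of the table.

```python
# Feature flags transcribed from Table 13 - 1.
# Hypothetical structure for illustration; ConText Option exposes no such object.
# Each value is (parsing mode, Gist generation, case-sensitivity routines).
CONFIGS = {
    "GENERIC": ("full", True, False),
    "S": ("style", True, False),
    "SA": ("style", True, True),
    "P": ("limited", False, False),
    "PP": ("limited", True, False),
    "PS": ("full", False, False),
    "PSP": ("full", True, False),
    "PSA": ("full", False, True),
    "PSAP": ("full", True, True),
}


def pick_labels(parsing, gist, case_routines):
    """Return the labels whose features match the request.

    The 'S' configurations are skipped because the chapter advises
    against using them.
    """
    wanted = (parsing, gist, case_routines)
    return [
        label
        for label, features in CONFIGS.items()
        if features == wanted and features[0] != "style"
    ]


# Which labels give full theme parsing with Gists on mixed-case text?
print(pick_labels("full", True, False))
```

In this reading of the table, GENERIC and PSP list the same features, so both are returned for that request.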
Linguistic Services Output Table Structure
The output tables store the results returned by the Linguistic Services. They serve only as temporary holding areas for the Linguistic Services output, which you can then modify, augment, or truncate into the form best suited to your application.
For more information about generating linguistic output, see "Using the Linguistic Services (Chapter 7)".
The theme results table stores one row for each theme generated by CTX_LING.REQUEST_THEMES.
The table can be named anything, but must include the following columns (with names and datatypes as specified):
Name | Type | Description
CID | NUMBER | Policy ID
PK | VARCHAR2(64) | Primary key (textkey) for the text table
THEME | VARCHAR2(256) | Theme phrase
WEIGHT | NUMBER | Weight of theme phrase, relative to other theme phrases for the document
Composite Textkey Theme Tables
You can use CTX_LING.REQUEST_THEMES to generate themes for a document contained in a composite textkey table. The schema of the resulting theme table is the same as for a single-column textkey table, except that the composite textkey result table has additional PK columns.
The number of textkey columns in the theme table matches the number of textkey columns in the original text table. The textkey columns in the theme table are named PK1, PK2, PK3, ..., PKN, where N is the number of textkeys in the original text table. N is always less than or equal to 16.
For example, if you request themes on a text table that has four textkeys, the schema of the output table is (CID, PK1, PK2, PK3, PK4, THEME, WEIGHT).
The resulting textkey columns in the theme table are populated in the same order as they were registered.
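The column rules for theme tables can be sketched as a small Python helper that builds the CREATE TABLE text for either a single or a composite textkey. This is a sketch under stated assumptions, not a ConText Option utility: the function name and the table name `ctx_themes` are hypothetical, and the VARCHAR2(64) type for PK1 through PKN is assumed to match the single PK column, which this chapter does not state explicitly.

```python
def theme_table_ddl(table_name, n_textkeys=1):
    """Build CREATE TABLE text for a theme result table.

    n_textkeys is the number of textkey columns in the original text table.
    """
    if not 1 <= n_textkeys <= 16:
        raise ValueError("number of textkeys must be between 1 and 16")
    if n_textkeys == 1:
        pk_cols = ["PK VARCHAR2(64)"]
    else:
        # Assumption: composite key columns use the same type as PK.
        pk_cols = [f"PK{i} VARCHAR2(64)" for i in range(1, n_textkeys + 1)]
    cols = ["CID NUMBER", *pk_cols, "THEME VARCHAR2(256)", "WEIGHT NUMBER"]
    return f"CREATE TABLE {table_name} (\n  " + ",\n  ".join(cols) + "\n)"


# Theme table for a text table with four textkeys (hypothetical table name).
print(theme_table_ddl("ctx_themes", 4))
```

The generated statement lists the columns in the order (CID, PK1, PK2, PK3, PK4, THEME, WEIGHT), matching the four-textkey example above.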
The Gist result table stores one row for each Gist generated by CTX_LING.REQUEST_GIST.
The table can be named anything, but must include the following columns (with names and datatypes as specified):
Name | Type | Description
CID | NUMBER | Policy ID
PK | VARCHAR2(64) | Primary key (textkey) for the text table
POV | VARCHAR2(256) | Document point-of-view (theme)
GIST | LONG | Text (ASCII) of Gist
The value in the POV column for a point-of-view (POV) Gist is a string which identifies the theme used to generate the POV Gist for the document.
The value in the POV column for a generic Gist is the term GENERIC.
Note: GENERIC is the only value that is consistently in all-uppercase. For all other themes in the POV column, the case depends on how the themes were used in the document.
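Because the POV column mixes the fixed value GENERIC with theme strings whose case varies, application code that reads a Gist table usually has to treat the two differently. The following Python sketch shows one way to do that on rows that have already been fetched. The function name and the sample rows are made up for illustration, and lowercasing the theme for lookups is a design choice of this sketch, not something ConText Option does.

```python
def split_gists(rows):
    """Separate generic Gists from point-of-view (POV) Gists.

    rows: iterable of (cid, pk, pov, gist) tuples read from a Gist table
    with a single textkey column.
    """
    generic = []
    by_theme = {}
    for cid, pk, pov, gist in rows:
        if pov == "GENERIC":
            generic.append((pk, gist))
        else:
            # Theme case follows the document, so normalize it for lookups.
            by_theme.setdefault(pov.lower(), []).append((pk, gist))
    return generic, by_theme


# Made-up sample rows, for illustration only.
sample = [
    (1, "doc1", "GENERIC", "generic gist text"),
    (1, "doc1", "Banking", "gist about banking"),
    (1, "doc2", "banking", "another banking gist"),
]
generic, by_theme = split_gists(sample)
print(len(generic), sorted(by_theme))
```

Normalizing the theme means that "Banking" and "banking" land under the same key; keep the original string instead if your application needs to preserve case.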
Composite Textkey Gist Tables
You can use CTX_LING.REQUEST_GIST to generate Gists for a document contained in a composite textkey table. The schema of the resulting Gist table is the same as for a single-column textkey table, except that the composite textkey result table has additional PK columns.
The number of textkey columns in the Gist table matches the number of textkey columns in the original text table. The textkey columns in the Gist table are named PK1, PK2, PK3, ..., PKN, where N is the number of textkeys in the original text table. N is always less than or equal to 16.
For example, if you request a Gist on a text table that has four textkeys, the schema of the resulting Gist table is (CID, PK1, PK2, PK3, PK4, POV, GIST).
The resulting textkey columns in the Gist table are populated in the same order as they were registered.