Oracle ConText Option Application Developer's Guide

CHAPTER 13. Linguistic Specifications


This chapter includes reference information for ConText Option linguistic processing.

This chapter covers two topics: the predefined linguistic setting configurations and the structure of the Linguistic Services output tables.

Predefined Linguistic Setting Configurations

ConText Option provides the following predefined setting configurations that can be used for theme indexing, theme queries, and Linguistic Services requests.

Note: The style parser and the theme parser have been combined into a single parser that delivers the high quality of the style parser with enhanced performance that equals the performance of the theme parser.

As a result, the 'S' setting configurations do not provide any advantage over the other setting configurations and should not be used.

If the 'S' setting configurations are used, ConText Option still processes documents successfully (for theme indexing or Linguistic Services requests), but returns an error message for each document. These error messages can be ignored.

| Label | Definition | Notes |
|---|---|---|
| GENERIC | Full theme parsing; Gist generation | This configuration utilizes information at the word, sentence, and paragraph level to produce theme information. It also accumulates the paragraph information required for creating Gists. This configuration is the default for each ConText Server that you start with a Linguistics personality. |
| S | Style parsing; Gist generation | This configuration represents the broadest settings for the system. It analyzes text at the intra-document level, using word position, context, sentence structure, and word relationships to produce in-depth linguistic output. This configuration by default supports Gist generation, because, during style parsing, the system automatically accumulates the paragraph information required for creating Gists. This configuration is best used on books, magazines, newspapers, and other edited documents. |
| SA | Style parsing; Gist generation; Case-sensitivity | This configuration is identical to "S", except it processes the text of each document through the case-sensitivity routines. It should be used only when text is all-uppercase or all-lowercase, or where you are not sure of the accuracy of the case. |
| P | Limited theme parsing | This configuration represents the fastest mode that you can specify for the Linguistic Services. It analyzes text only at the document level to produce its theme information. In this mode, the paragraph information required for generating Gists is not accumulated, so no Gists are generated. This configuration is best used for web pages, e-mail, lists, and other documents that are not edited and may not contain complete, grammatical sentences. |
| PP | Limited theme parsing; Gist generation | This configuration is identical to "P", except it accumulates the paragraph information required to create Gists. |
| PS | Full theme parsing | This configuration is similar to "P"; however, it utilizes information at the word, sentence, and paragraph level to produce theme information. |
| PSP | Full theme parsing; Gist generation | This configuration is identical to "PS", except it accumulates the paragraph information necessary to create Gists. |
| PSA | Full theme parsing; Case-sensitivity | This configuration is identical to "PS", except it processes the text of each document through the case-sensitivity routines. It should be used only when text is all-uppercase or all-lowercase, or where you are not sure of the accuracy of the case. |
| PSAP | Full theme parsing; Gist generation; Case-sensitivity | This configuration is identical to "PSP", except it processes the text of each document through the case-sensitivity routines. It should be used only when text is all-uppercase or all-lowercase, or where you are not sure of the accuracy of the case. |

Table 13-1. Linguistic Services Predefined Setting Configurations
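
The feature combinations in Table 13-1 can be captured in a small lookup structure, which may help when an application has to pick a label from its requirements. The following is only an illustrative sketch: the names SettingConfig, SETTING_CONFIGS, and choose_labels are hypothetical and are not part of ConText Option. The helper skips the style ('S') configurations, following the note above that they should not be used.

```python
from typing import NamedTuple


class SettingConfig(NamedTuple):
    parsing: str         # "full theme", "limited theme", or "style"
    gist: bool           # True if the configuration generates Gists
    case_routines: bool  # True if text passes through the case-sensitivity routines


# Transcribed from Table 13-1. This lookup is illustrative, not a ConText API.
SETTING_CONFIGS = {
    "GENERIC": SettingConfig("full theme", True, False),
    "S": SettingConfig("style", True, False),
    "SA": SettingConfig("style", True, True),
    "P": SettingConfig("limited theme", False, False),
    "PP": SettingConfig("limited theme", True, False),
    "PS": SettingConfig("full theme", False, False),
    "PSP": SettingConfig("full theme", True, False),
    "PSA": SettingConfig("full theme", False, True),
    "PSAP": SettingConfig("full theme", True, True),
}


def choose_labels(parsing, gist, case_routines):
    """Return the labels whose features match, skipping style configurations."""
    wanted = SettingConfig(parsing, gist, case_routines)
    return [
        label
        for label, config in SETTING_CONFIGS.items()
        if config == wanted and config.parsing != "style"
    ]
```

For example, choose_labels("full theme", True, False) returns both GENERIC and PSP, since the table lists the same features for the two labels.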



For more information about setting the label for a setting configuration, see "Specifying Settings and Error Handling" in Chapter 7, "Using the Linguistic Services".

Linguistic Services Output Table Structure

The output tables store the results returned by the Linguistic Services. They serve only as temporary holding areas: you can modify, augment, or truncate the output into the form best suited to your application.

For more information about generating linguistic output, see Chapter 7, "Using the Linguistic Services".

Theme Table

The theme results table stores one row for each theme generated by CTX_LING.REQUEST_THEMES.

The table can be named anything, but must include the following columns (with names and datatypes as specified):

| Name | Type | Description |
|---|---|---|
| CID | NUMBER | Policy ID |
| PK | VARCHAR2(64) | Primary key (textkey) for the text table |
| THEME | VARCHAR2(256) | Theme phrase |
| WEIGHT | NUMBER | Weight of theme phrase, relative to other theme phrases for the document |

Composite Textkey Theme Tables

You can use CTX_LING.REQUEST_THEMES to generate themes for a document stored in a table with a composite textkey. The resulting theme table has the same schema as the theme table for a single-column textkey, except that it has one PK column for each textkey column.

The number of textkey columns in the theme table matches the number of textkey columns in the original text table. The textkey columns in the theme table are named PK1, PK2, PK3, ..., PKN, where N is the number of textkeys in the original text table. N is always less than or equal to 16.

For example, if you request themes for a text table that has four textkeys, the schema of the output table is (CID, PK1, PK2, PK3, PK4, THEME, WEIGHT).

The resulting textkey columns in the theme table are populated in the same order as they were registered.
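
The naming rule for textkey columns can be sketched as a small helper that builds a CREATE TABLE statement for a theme result table. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the datatype of the PK1 ... PKN columns is assumed to be the same VARCHAR2(64) as the single PK column, which the text above does not state.

```python
MAX_TEXTKEYS = 16  # the text states that N never exceeds 16


def textkey_columns(n_textkeys):
    """Name the textkey columns: PK for one textkey, PK1..PKN for a composite."""
    if not 1 <= n_textkeys <= MAX_TEXTKEYS:
        raise ValueError(f"n_textkeys must be between 1 and {MAX_TEXTKEYS}")
    if n_textkeys == 1:
        return ["PK"]
    return [f"PK{i}" for i in range(1, n_textkeys + 1)]


def theme_table_ddl(table_name, n_textkeys=1):
    """Build a CREATE TABLE statement for a theme result table."""
    columns = [("CID", "NUMBER")]
    # Assumption: each composite textkey column uses the same VARCHAR2(64)
    # type as the single PK column; the source text does not confirm this.
    columns += [(name, "VARCHAR2(64)") for name in textkey_columns(n_textkeys)]
    columns += [("THEME", "VARCHAR2(256)"), ("WEIGHT", "NUMBER")]
    body = ",\n".join(f"  {name} {sqltype}" for name, sqltype in columns)
    return f"CREATE TABLE {table_name} (\n{body}\n)"
```

Calling theme_table_ddl with a table name of your choice and n_textkeys=4 yields a statement whose columns follow the (CID, PK1, PK2, PK3, PK4, THEME, WEIGHT) layout from the example above.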

Gist Table

The Gist result table stores one row for each Gist generated by CTX_LING.REQUEST_GIST.

The table can be named anything, but must include the following columns (with names and datatypes as specified):

| Name | Type | Description |
|---|---|---|
| CID | NUMBER | Policy ID |
| PK | VARCHAR2(64) | Primary key (textkey) for the text table |
| POV | VARCHAR2(256) | Document point-of-view (theme) |
| GIST | LONG | Text (ASCII) of Gist |

The value in the POV column for a point-of-view (POV) Gist is a string which identifies the theme used to generate the POV Gist for the document.

The value in the POV column for a generic Gist is the term GENERIC.

Note: GENERIC is the only value that is consistently in all-uppercase. For all other themes in the POV column, the case depends on how the themes were used in the document.
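
Because the POV column distinguishes generic Gists from point-of-view Gists, an application that has fetched rows from a Gist table might separate the two as in the following sketch. The function name and the row layout, (CID, PK, POV, GIST) tuples for a single-column textkey, are assumptions made for illustration. The comparison against GENERIC is exact and case-sensitive, in line with the note above.

```python
GENERIC_POV = "GENERIC"  # value stored in the POV column for a generic Gist


def split_gists(rows):
    """Separate generic Gists from point-of-view Gists.

    rows is an iterable of (cid, pk, pov, gist) tuples, as they might be
    fetched from a Gist result table with a single-column textkey.
    Returns two dicts: generic Gists keyed by (cid, pk), and POV Gists
    keyed by (cid, pk, pov).
    """
    generic = {}
    by_theme = {}
    for cid, pk, pov, gist in rows:
        if pov == GENERIC_POV:  # exact, case-sensitive match
            generic[(cid, pk)] = gist
        else:
            # Theme case is kept as stored; it depends on the document.
            by_theme[(cid, pk, pov)] = gist
    return generic, by_theme
```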

Composite Textkey Gist Tables

You can use CTX_LING.REQUEST_GIST to generate Gists for a document stored in a table with a composite textkey. The resulting Gist table has the same schema as the Gist table for a single-column textkey, except that it has one PK column for each textkey column.

The number of textkey columns in the Gist table matches the number of textkey columns in the original text table. The textkey columns in the Gist table are named PK1, PK2, PK3, ..., PKN, where N is the number of textkeys in the original text table. N is always less than or equal to 16.

For example, if you request a Gist for a text table that has four textkeys, the schema of the resulting Gist table is (CID, PK1, PK2, PK3, PK4, POV, GIST).

The resulting textkey columns in the Gist table are populated in the same order as they were registered.




Copyright © 1996 Oracle Corporation.
All Rights Reserved.