Skip Headers
Oracle® Warehouse Builder User's Guide
10g Release 2 (10.2.0.2)

Part Number B28223-05
Go to Documentation Home
Home
Go to Book List
Book List
Go to Table of Contents
Contents
Go to Index
Index
Go to Feedback page
Contact Us

Go to previous page
Previous
Go to next page
Next
View PDF

21 Data Quality Operators

Because reporting on erroneous data wastes time and money, data quality is a key element of Business Intelligence. Include Warehouse Builder data quality operators in your mappings to load clean, accurate records to your targets.

This chapter discusses the Warehouse Builder mapping operators that help you achieve data quality. This chapter contains the following topics:

Using the Match-Merge Operator

This section includes information and examples on how to use the Match-Merge operator in a mapping. The Match-Merge operator can be used with the Name and Address operator to support householding, which is the process of identifying unique households in name and address data.

Example of Matching and Merging Customer Data

Consider how you could utilize the Match-Merge operator to manage a customer mailing list. Use matching to find records that refer to the same person in a table of customer data containing 10,000 rows. For example, you can define a match rule that screens records that have similar first and last names. Through matching you may discover that 5 rows refer to the same person. You can merge those records into one new record. For example, you can create a merge rule to retain the values from the one of the five matched records with the longest address. The newly merged table now contains one record for each customer.

Table 21-1 shows records that refer to the same person prior to using the Match-Merge operator.

Table 21-1 Sample Records

Row First Name Last Name SSN Address Unit Zip

1

Jane

Doe

NULL

123 Main Street

NULL

22222

2

Jane

Doe

111111111

NULL

NULL

22222

3

J.

Doe

NULL

123 Main Street

Apt 4

22222

4

NULL

Smith

111111111

123 Main Street

Apt 4

22222

5

Jane

Smith-Doe

111111111

NULL

NULL

22222


Table 21-2 shows the single record for Jane Doe after using the Match-Merge operator. Notice that the new record retrieves data from different rows in the sample.

Table 21-2 Jane Doe Record After Using Match-Merge Operator

First Name Last Name SSN Address Unit Zip

Jane

Doe

111111111

123 Main Street

Apt 4

22222


Understanding Matching Concepts

When you use Warehouse Builder to match records, you can define a single match rule or multiple match rules. If you create more than one match rule, Warehouse Builder determines two rows match if those rows satisfy any of the match rules. In other words, Warehouse Builder evaluates multiple match rules using OR logic.

Example of Multiple Match Rules

The following example illustrates how Warehouse Builder evaluates multiple match rules using OR logic.

In the top portion of the Match Rules tab, create two match rules as described in Table 21-3:

Table 21-3

Name Position Rule Type Usage Description

Rule_1

1

Conditional

Active

Match SSN

Rule _2

2

Conditional

Active

Match Last Name and PHN


In the lower portion of the tab, assign the details to Rule_1 as described in Table 21-4:

Table 21-4

Attribute Position Algorithm Similarity Score Blank Matching

SSN

1

Exact

0

Do not match if either is blank


For Rule_2, assign the details as described in Table 21-5:

Table 21-5

Attribute Position Algorithm Similarity Score Blank Matching

LastName

1

Exact

0

Do not match if either is blank

PHN

2

Exact

0

Do not match if either is blank


Assume you have the data listed in Table 21-6:

Table 21-6

Row First Name Last Name PHN SSN

A

John

Doe

650-123-1111

NULL

B

Jonathan

Doe

650-123-1111

555-55-5555

C

John

Dough

650-123-1111

555-55-5555


According to Rule_1, rows B and C match. According to Rule_2, rows A and B match. Therefore, since Warehouse Builder handles match rules using OR logic, all three records match.

Example of Transitive Matching

The general rule is, if A matches B, and B matches C, then A matches C. Assign a conditional match rule based on similarity such as described in Table 21-7:

Table 21-7 Conditional Match Rule

Attribute Position Algorithm Similarity Score Blank Matching

LastName

1

Similarity

80

Do not match if either is blank


Assume you have the data listed in Table 21-8:

Table 21-8 Sample Data

Row First Name Last Name PHN SSN

A

John

Jones

650-123-1111

NULL

B

Jonathan

James

650-123-1111

555-55-5555

C

John

Jamos

650-123-1111

555-55-5555


Jones matches James with a similarity of 80, and James matches Jamos with a similarity of 80. Jones does not match Jamos because the similarity is 60, which is less than the threshold of 80. However, because Jones matches James, and James matches Jamos, all three records match (Jones, James, and Jamos).

Designing Mappings with a Match-Merge Operator

Figure 21-1 shows a mapping you can design using a Match-Merge operator. Notice that the Match-Merge operator is preceded by a Name and Address operator, NAMEADDR, and a staging table, CLN_CUSTOMERS. You can design your mapping with or without a Name and Address operator. Preceding the Match-Merge operator with a Name and Address operator provides clean and standardized data before launching time consuming match and merge operations.

Figure 21-1 Match-Merge Operator in a Mapping

Screen capture of Match-Merge operator in a mapping
Description of "Figure 21-1 Match-Merge Operator in a Mapping"

Whether you include a Name and Address operator or not, be aware of the following considerations as you design your mapping:

  • Operating modes: Warehouse Builder operators may accept either set-based or row-based input and generate either set-based or row-based output. SQL is set-based, so a set of records is processed at one time. PL/SQL is row-based, so each row in processed separately. When the Match-Merge operator matches records, it compares each row with the subsequent row in the source and generates row-based code only. A mapping that contains a Match-Merge operator can only run in row-based mode. For more information about set-based and row-based operators, refer to Chapter 8, "Understanding Performance and Advanced ETL Concepts".

  • SQL based operators before Match-Merge: The Match-Merge operator accepts set-based SQL input, but generates only row-based PL/SQL output. Any operators that generate only SQL code must precede the Match-Merge operator. For example, the Joiner, Key Lookup, and Set operators generate set-based SQL output, so they must precede Match-Merge. If set-based operators appear after Match-Merge, then the mapping is invalid.

  • PL/SQL input: The Match-Merge operator requires SQL input except from another Match-Merge operator, as described in "Using Two Match-Merge Operators". If you want to precede a Match-Merge with an operator that generates only PL/SQL output such as the Name and Address operator, you must first load the data into a staging table.

  • Refining Data from Match-Merge operators: To achieve greater data refinement, map the XREF output from one Match-Merge operator into another Match-Merge operator. This scenario is the one exception to the SQL input rule for Match-Merge operators. With additional design elements, the second Match-Merge operator accepts PL/SQL. For more information, see "Using Two Match-Merge Operators".

Using Two Match-Merge Operators

Most match-merge operations can be performed by a single match-merge operator. However, if you are directing the output to two different targets, then you may need to use two match-merge operators in succession.

For example, when householding name and address data, you may need to merge the data first for addresses and then again for names. Assuming you map the MERGE output to a target table, you can map the XREF group to another Match-Merge operator. Although you could map the XREF group to a staging table, this practice can lead to significant loss of performance.

Figure 21-2 shows a mapping that uses two match-merge operators. The XREF group from MM is mapped directly to MM_1. For this mapping to be valid, you must assign the Match ID generated for the first XREF group as the Match Bin rule on the second Match-Merge operator.

Figure 21-2 Householding Data: XREF Group Mapped to Second Match-Merge Operator

Mapping with two Match-Merge operators
Description of "Figure 21-2 Householding Data: XREF Group Mapped to Second Match-Merge Operator"

Match-Merge Wizard and Editor: Name

Use the Name page to specify a name and optional description for the operator. By default, the wizard names the Match-Merge operator MATCHMERGE.

Match-Merge Wizard and Editor: Groups

Use the Groups page to enter customized names and descriptions of the input and output groups.

Group

Lists the predefined input and output groups. You can rename the groups, but you cannot add or delete groups in the Match-Merge operator.

Direction

Identifies whether the group is for input or output. The Match-Merge operator accepts one SQL input group and generates two PL/SQL output groups. The MERGE group has the merged data. The XREF group is an optional group for documenting the merge process.

Description

Enter an optional description.

Match-Merge Wizard and Editor: Input Connections

Use the Input Connections page to select attributes for the input group (INGRP1).

To complete the Input connections page for an operator:

  1. Select complete groups or individual attributes from the left panel.

    To search for a specific attribute or group by name, type the text in Search for and click Go. To find the next match, click Go again.

    Hold the Shift key down to select multiple groups or attributes. If you want to select attributes from different groups, you must first combine the groups with a Joiner or Set operator.

  2. Use the left to right arrow button in the middle of the page to move your selections to the right side of the wizard page.

    You can use the right to left arrow to move groups or attributes from the input connections list. Warehouse Builder removes the selection from the input group and removes the mapping line between the source operator and the current operator.

Match-Merge Wizard and Editor: Input Attributes

Use the Input Attributes page to assign input roles to each input attribute.

Attribute

Automatically lists the attributes that you selected on the Input Connections page. Click Add to add a new input attribute.

Input Role

Input roles indicate what kind of information resides in a line of data. For each attribute, select the input role that most closely matches the data contained in the source attribute.

Data Type, Length, Precision, Scale, Seconds Precision

Attributes are set automatically to NUMBER.

Description

Enter an optional description of the input attributes.

Add

Adds a row so that you can define a new attribute.

Delete

Deletes the selected attribute.

Match-Merge Wizard and Editor: Merge Output

Use this page to specify the attributes for the output MERGE group. The MERGE group produces a consolidated record from the attributes you selected.

Source Attributes

Lists all attributes defined for this Match-Merge operator. Use the shuttle buttons to move selected attributes to the Output Attributes field.

Output Attributes

Lists attributes selected for the output MERGE group.

Match-Merge Wizard and Editor: Cross-Reference Output

Use the Cross-Reference Output page to optionally select attributes for the XREF group. Although the Match-Merge operator creates the XREF group by default, you have the option of adding attributes to the group or leaving it empty.

You can use the XREF group to document the merge process. Create a foreign key relationship between the original data set and the new merged data set. Then send the attributes from the XREF group to a table that records the corresponding source row for each merged row.

Alternatively, you can use the XREF group as input to a second Match-Merge operator. By using two operators, you can direct the merged output from a set of attributes to two different target. For an example, refer to "Using Two Match-Merge Operators".

Source Attributes

Lists the input attributes (INGRP1) and an XREF output attribute for each MATCH output attribute. The XREF attributes are distinguished by a prefix, which has a default value of MM_.

Output Attributes

The attributes that you want to cross-reference. To move selected attributes between the Source Attributes and Output Attributes lists, use the shuttle keys.

Merge Prefix

The prefix used to distinguish cross-reference output from data output.

Set Prefix

Changes the prefix on XREF attributes in both lists to the value displayed in the Merge Prefix field.

Match-Merge Wizard and Editor: Match Bins

Use the Match Bins page to limit the number of rows to be compared. When Warehouse Builder matches the rows, it compares each row with the subsequent row for all rows within the same grouping. Limiting the number of rows can greatly enhance performance, because Warehouse Builder searches for matches only within a bin and not throughout the entire data set.

Ideally, keep the number of rows in each grouping under 2000. The number of comparisons Warehouse Builder performs is based on the following formula:

n=(b*(b-1))/2 

where n is the number of comparisons, and b is the number of records in a bin.

For example, matching 5 records requires 10 comparisons, matching 50 records requires 1,225 comparisons, and matching 500 records requires 124,750 comparisons.

While you want to define Match Bins that separate rows into manageable groupings, you also want to avoid separating rows that should be matched. The attributes you select for grouping similar rows depends on your data. For example, if you have a table of customer addresses with a million rows, you may want to group the data by partial street name, city name, and zip code.

Available Attributes

Lists all input attributes, from which you can select the ones to use for binning.

Selected Attributes

One or more attributes that must match for a row to be included in a particular bin. To move attributes between the Source Attributes and Output Attributes lists, select one or more attributes and click the arrow keys located between the two lists. Use the arrow keys at the right to order the attributes from the more general at the top (such as Country) to the more specific at the bottom (such as Street).

Match New Records Only

After the first deployment, you can choose whether to match and merge all records or only new records. You do not want to match and merge the same data twice because of the impact on performance. Instead, you can just match and merge the new, uncleansed data. This option enables you to add uncleansed data into the data warehouse.

New Record Condition

Displays the conditional expression used to identify new records. Click the Ellipsis button to display the Match New Record Condition Editor (also known as the Expression Builder User Interface).

Match-Merge Wizard and Editor: Match Rules

The Match Rules are used to identify duplicate records, even though some fields may have different values. You can define match rules for a single attribute or multiple attributes in the operator. On the Match Rules tab, create match rules at the top of the page. In the lower portion of Match Rules tab, specify the details for each match rule. Table 21-9 describes the match rules.

If you create more than one match rule, Warehouse Builder determines a match when those two rows satisfy any of the match rules. In other words, Warehouse Builder evaluates multiple match rules using OR logic. This is indicated by the OR icon to the left of each additional row. For more information, see "Understanding Matching Concepts".

Name

An arbitrary name for the rule. Warehouse Builder creates a default name such as MA_0 and MA_1 for each match rule. You can replace these names with meaningful ones. Meaningful names are particularly helpful when referencing the rules from a custom PL/SQL program.

Position

The order of execution. You can change the position of a rule by clicking on the row header and dragging the row to its new location. The row headers are the boxes to the left of the Name column.

Rule Type

Assign one of the rule types listed in Table 21-9. When you select a rule type, the lower portion of the Match Rules tab activates and you can enter details for the match rule.

Usage

You can designate a match rule as either active or passive.

  • Active: Warehouse Builder executes a match rule if you designate it as active. If more than one match rule is active, each is evaluated in turn until a match is found or until all rules are evaluated. If any rule identifies a match, those records are considered a match.

  • Passive: Warehouse Builder does not directly execute passive match rules. It only executes passive rules when they are called through an active custom match rule. All defined match rules appear in a list of available functions in the Custom Match Rule Editor.

Description

Displays an optional description, which you can enter.

Descriptions of Match Rules

Table 21-9 describes the types of match rules.

Table 21-9 Match Rule Descriptions

Match Rule Description

All Match

Matches all the rows within the match bin.

None Match

Turns off matching. No rows match within the match bin.

Conditional

Matches rows based on an algorithm you select. For details, see "Conditional Match Rule".

Weight

Matches rows based on scores that you assign to the attributes. For details, see "Weight Match Rule".

Person

Matches records based on people's names. For details, see "Person Match Rule".

Firm

Matches records based on business names. For details, see "Firm Match Rule".

Address

Matches records based on postal addresses. For details, see "Address Match Rule".

Custom

Create a custom comparison algorithm. Select Edit to launch the Custom Match Rule Editor. For more information, see "Custom Match Rule".


Conditional Match Rule

Use the Conditional Match Rule to combine multiple attribute comparisons into one composite rule. When you assign multiple attributes for comparison, all the comparisons must be true for the records to be considered a match. Warehouse Builder displays the AND icon in the left-most column of subsequent conditions.

Attribute

Identifies the attribute that will be tested for a particular condition. You can select from any input attribute (INGRP1).

Position

The order of execution. You can change the position of a rule by clicking on the row header and dragging the row to its new location. The row headers are the boxes to the left of the Attribute column.

Algorithm

A list of methods that can be used to determine a match. Table 21-10 describes the algorithms.

Similarity Score

The minimum similarity value required for two strings to match, as calculated by the Edit Distance, Standardized Edit Distance, Jaro-Winkler, or Standardized Jaro-Winkler algorithms. Enter a value between 0 and 100. A value of 100 indicates an exact match, and a value of 0 indicates no similarity.

Blank Matching

Lists options for handling empty strings in a match.

Defining a Conditional Match Rule

To define a Conditional match rule, complete the following steps:

  1. On the top portion of the Match Rules tab, select Conditional for the rule type.

    The operator displays a Details section at the bottom of the tab.

  2. Click Add to add a new row.

  3. Select an attribute.

  4. Select an algorithm. Refer to Table 21-10 for descriptions.

  5. Specify a similarity score for the Edit Distance, Standardized Edit Distance, Jaro-Winkler, or Standardized Jaro-Winkler algorithms.

  6. Select a method for handling blanks.

Algorithms for Conditional Match Rules

Table 21-10 describes the algorithms available for defining conditional match rules.

Table 21-10 Algorithms for Conditional Match Rules

Algorithm Description

Exact

Matches values only when they are exactly the same.

For example, 'Dog' and 'dog!' do not match because the second string is not capitalized and contains an extra character. For numeric, date, and other non-character data types, this is the only type of comparison allowed.

Standardized Exact

Eliminates case, spaces, and non-alphanumeric characters before comparing for an exact match.

For example, 'Dog' and 'dog!' do match.

Soundex

Matches phonetically similar stings. The operator converts the strings to phonetic codes. If the codes match, then the two strings match. This algorithm is often used for matching names. It is an older phonetic algorithm than Double Metaphone.

The Soundex algorithm ignores case and spaces. It basically retains the first letter, eliminates vowels, and replaces consonants with numbers. Consonants with similar sounds have the same numeric value.

Edit Distance

Matches strings with a similarity value equal to or greater than the Similarity Score that you specify. This algorithm is often used for correcting typographical errors such as transposed characters.

The Edit Distance algorithm calculates the number of deletions, insertions, or substitutions required to transform one string into another. A similarity value of 100 indicates that the two strings are identical. A value of zero indicates no similarity whatsoever.

For example, if the string 'tootle' is compared with the string 'tootles', then the edit distance is 1. The length of the string 'tootles' is 7. The similarity value is therefore 6/7*100 or 85.

Standardized Edit Distance

Eliminates case, spaces, and non-alphanumeric characters before using the Edit Distance algorithm to determine a match.

Partial Name

Matches strings when one string is contained in the other, starting with the first word. The algorithm performs a Standardized Exact comparison on the entire string before attempting to match a partial name.

For example, 'Midtown Power' matches 'Midtown Power and Light,' but would not match 'Northern Midtown Power'.

Abbreviation

Matches strings when one string contains words that are abbreviations of corresponding words in the other.

The operator first performs a Standardized Exact comparison on the entire string. It then looks for abbreviations for each word. If the larger of the words contains all of the letters from the shorter word and the letters appear in the same order as the shorter word, then the words are considered a match.

For example, 'Intl. Business Products' matches 'International Bus Prd'.

Acronym

Matches strings if one string is an acronym for the other.

The operator first performs a Standardized Edit Distance comparison on the entire string. If no match is found, then each word of one string is compared to the corresponding word in the other string. If the entire word does not match, each character of the word in one string is compared to the first character of each remaining word in the other string. If the characters are the same, the names match.

For example, 'Chase Manhattan Bank NA' matches 'CMB North America.' The comparison ignores case, non-alphanumeric characters, and noise words such as 'and' and 'the.'

Jaro-Winkler

Matches strings based on their similarity value using an improved comparison system over the Edit Distance algorithm. It accounts for the length of the strings and penalizes more for errors at the beginning. It also recognizes common typographical errors. The strings match when their similarity value is equal to or greater than the Similarity Score that you specify.

A similarity value of 100 indicates that the two strings are identical. A value of zero indicates no similarity whatsoever. Note that the value actually calculated by the algorithm (0.0 to 1.0) is multiplied by 100 to correspond to the Edit Distance scores.

Standardized Jaro-Winkler

Eliminates case, spaces, and non-alphanumeric characters before using the Jaro-Winkler algorithm to determine a match.

Double Metaphone

Matches phonetically similar strings using an improved coding system over the Soundex algorithm. It generates two codes for strings that could be pronounced in multiple ways. If the primary codes match for the two strings, or if the secondary codes match, then the strings match.

The Double Metaphone algorithm accounts for alternate pronunciations in Italian, Spanish, French, and Germanic and Slavic languages. Unlike the Soundex algorithm, Double Metaphone encodes the first letter, so that 'Kathy' and 'Cathy' evaluate to the same phonetic code.


Weight Match Rule

Use this rule to match rows based on a weight value. A weighted match rule is most useful when comparing a large number of attributes. This rule can prevent a single attribute from invalidating a match, which results from the AND logic of conditional rules.

Warehouse Builder compares each attribute using a similarity algorithm that returns a score between 0 and 100 to represent the similarity between the rows. For two rows to be considered a match, the total counts must be greater than the overall score you designate.

Similarity Algorithm

The method used to determine a match. Choose from these algorithms:

  • Edit Distance: Calculates the number of deletions, insertions, or substitutions required to transform one string into another.

  • Jaro-Winkler: Uses an improved comparison system over the Edit Distance algorithm. It accounts for the length of the strings and penalizes more for errors at the beginning. It also recognizes common typographical errors.

Attribute

Identifies the attribute that will be tested for a particular condition. You can select from any input attribute (INGRP1).

Maximum Score

The weight value for the attribute. This value should be greater than the value of Required Score to Match.

Score When Blank

The similarity value when one of the records is empty.

Required Score to Match

A value that represents the similarity required for a match. A value of 100 indicates that the two values are identical. A value of zero indicates there is no similarity.

Using the Weight Match Rule

To use the Weight match rule, complete the following steps:

  1. On the Match Rules tab, select Weight as the Rule Type.

    The Details tab is displayed at the bottom of the page.

  2. Select Add at the bottom of the page to add a new row.

  3. For each row, select an attribute to add to the rule.

  4. In Maximum Score, assign a weight to each attribute. Warehouse Builder compares each attribute using a similarity algorithm that returns a score between 0 and 100 to represent the similarity between the rows.

  5. In Required score to match, assign an overall score for the match.

    For two rows to be considered a match, the total counts must be greater than the Required score.

Example of a Weight Match Rule

Assume you want to apply the Weight match rule to the data in Table 21-11.

Table 21-11 Example Records for Matching

Record Number Attr_1 Attr_2

Rec_1

CA

QQ

Rec_2

CA

QQ

Rec_3

CA

QR


For Maximum score, assign a value of 50 to both Att_1 and Att_2. Assign a value of 80 for the Required score to match. You can expect the following results:

  • Rec_1 is the new record. The operator reads it first.

  • In Rec_2, the value for Attr_1 is CA. That value has a similarity of 100 with the value in the new record, Rec_1. Since the weight value for Attr_1 is 50, its score is 50 (100% of 50).

  • In Rec_2, the value for Attr_2 is QQ and has a similarity of 100. The weight value for Attr_2 is also 50 and its score is therefore 50 (100% of 50). The total maximum score is 100 (50 + 50). This equals or exceeds the value of the Required score for match, so Rec_2 and Rec_1 match.

  • In Rec_3, Attr_1 is CA and has a similarity of 100 with Rec_1. Since the weight value for Attr_1 is 50, its weighted score is 50 (100% of 50).

  • In Rec_3, the value for Attr_2 is QR and that has a similarity of 50. The maximum value for Attr_2 is 50, so its score is 25 (50% of 50). The total weighted score is 75 (50+25). This is less than the value of the Required score to match. Therefore, Rec_3 and Rec_1 do not match.

Person Match Rule

Use the Person match rule to match records based on names. Matching by names is most effective when you first correct the address data using the Name and Address operator before the Match-Merge operator.

When you select the Person match rule, the Match Rules page displays the Person Attributes and Details tabs.

Person Attributes Tab

  • Eligible Attributes: Lists all input attributes.

  • Attributes: The attributes containing parts of names. Use the shuttle keys to move attributes from Eligible Attributes to the Attributes column of Name Roles.

  • Name Roles: Lists the roles for different parts of a name. Select the appropriate role for each attribute. Table 21-12 describes the roles.

Table 21-12 Name Roles for Person Match Rules

Role Description

Prename

The operator compares prenames only if First Name Standardized is blank for one of the records, the 'Mrs.' option is selected, and the Last Name and any Middle Name role match. Given that criteria, the operator matches the record 'Mrs. William Webster' with 'Mrs. Webster'.

First Name Standardized

First names match if both are blank. A blank first name will not match a non-blank first name unless the Prename role has been assigned and the 'Mrs. Match' option is set.

Middle Name Standardized, Middle Name 2 Standardized, Middle Name 3 Standardized

The operator compares and cross compares any of the middle names assigned. By default, the middle names must match exactly. Middle names match if either or both are blank. To assign any of the middle name roles, you must also assign the First Name Standardized role.

Last Name

The operator assigns last names as matching if both are blank and not matching if only one is blank.

Maturity Post Name

This is the same as post names such as 'Jr.' and 'Sr.'. The operator assigns these as matching if the values are exact or if either is blank.


Person Details Tab

Use the Details tab to set options for determining a match. Table 21-13 lists the rule options you can select for each component of the name. For a description of the algorithms, refer to Table 21-10.

Table 21-13 Options for Person Match Rule

Option Description

Detect switched name order

Detects switched name orders such as matching 'Elmer Fudd' to 'Fudd Elmer'. You can select this option if you selected First Name and Last Name roles for attributes on the Person Attributes tab.

Match on initials

Matches initials to names such as 'R.' and 'Robert'. You can select this option for first name and middle name roles.

Match on substrings

Matches substrings to names such as 'Rob' to 'Robert'. You can select this option for first name and middle name roles.

Similarity Score

Uses a similarity score to determine a match, as calculated by the Edit Distance or Jaro-Winkler algorithms. Enter a value between 0 and 100 as the minimum similarity value required for a match. A value of 100 requires an exact match, and a value of 0 requires no similarity whatsoever.

Match on Phonetic Codes

Determines a match using either the Soundex or the Double Metaphone algorithms.

Detect compound name

Matches compound names to names such as 'De Anne' to 'Deanne'. You can select this option for the first name role.

"Mrs" Match

Matches prenames to first and last names such as 'Mrs. Washington' to 'George Washington'. You can select this option for the prename role.

Match hyphenated names

The operator matches hyphenated names to unhyphenated names such as 'Reese-Jones' to 'Reese'. You can select this option for the last name role.

Detect missing hyphen

The operator detects missing hyphens, such as matching 'Hillary Rodham Clinton' to 'Hillary Rodham-Clinton'. You can select this option for the last name role.


Defining a Person Match Rule

To define a Person match rule, complete the following steps:

  1. On the Match Rules tab, select Person as the Rule Type.

    The Person Attributes tab and Details tab are displayed at the bottom of the page.

  2. In the left panel of the Person Attributes tab, select the attributes that describe a full name and use the arrow key to move them to Name Roles Attributes.

  3. For each attribute, select the role it plays in a name. You must define either the Last Name or First Name Standardized for the match rule to be effective. See Table 21-12 for the types of roles you can assign.

  4. Select the Details tab and select the applicable options as listed in Table 21-13.

Firm Match Rule

Use the Firm match rule to match records by business name. This type of match is most effective when you first correct the address data using the Name and Address operator before the Match-Merge operator.

When you select the Firm match rule, the Match Rules page displays the Firm Attributes and Details tabs.

Firm Attributes tab

Use the Firm Attributes tab to identify attributes that contain business names.

  • Eligible Attributes: Lists all input attributes.

  • Attributes: The attributes containing the names of businesses. Move one or two attributes containing business names from Eligible Attributes to the Attributes column of Firm Roles.

  • Firm Roles: Lists two roles, Firm 1 and Firm 2. If you selected one attribute, then designate Firm 1 as its role. If you selected two attributes, then designate one of them as Firm 1 and the other as Firm 2.

Firm Details Page

Use the Details tab to set options for determining a match. By default, the operator compares the values in Firm 1 for exact matches.

Table 21-14 lists the rule options you can select for each component of the name. For a description of the algorithms, refer to Table 21-10.

Table 21-14 Options for Firm Rules

Option Description

Strip noise words

Removes words such as 'and' and 'the.'

Cross-match firm 1 and firm 2

Attempts to find matches between the Firm 1 attribute and the Firm 2 attribute.

Match on partial firm name

Uses the Partial Name algorithm to determine a match.

Match on abbreviations

Uses the Abbreviation algorithm to determine a match.

Match on acronyms

Uses the Acronym algorithm to determine a match.

Similarity score

Uses a similarity score to determine a match, as calculated by the Edit Distance or Jaro-Winkler algorithms. Enter a value between 0 and 100 as the minimum similarity value required for a match. A value of 100 requires an exact match, and a value of 0 requires no similarity whatsoever.


Defining a Firm Match Rule

To define a Firm match rule, complete the following steps:

  1. On the Match Rules tab, select Firm as the Rule Type.

    The Firm Attributes tab and Details tab are displayed at the bottom of the page.

  2. In the left panel of the Firm Attributes tab, select one or two attributes that represent the firm name and click the right shuttle button.

    The attributes move to the Firm Roles box.

  3. For each attribute, click Role Required. From the list, select Firm 1 for the first attribute, and Firm 2 for the second attribute, if it exists.

  4. On the Details tab, select the applicable options.

Address Match Rule

Use the Address match rule to match records based on postal addresses.

Matching by address is most effective when you first correct the address data using the Name and Address operator before the Match-Merge operator. The Name and Address operator identifies addresses as existing in a postal matching database and designates the records with the Is Found flag. The Match-Merge operator processes addresses with the Is Found role faster because the data is known to be syntactically correct, legal, and existing.

Address Attributes tab

Use the Address Attributes tab to identify attributes that contain addresses.

  • Eligible Attributes: Lists all input attributes.

  • Attributes: The attributes containing parts of addresses. Use the shuttle keys to move attributes containing address information from Eligible Attributes to the Attributes column of Address Roles.

  • Address Roles: Lists the various parts of an address. Select the one that most closely matches each attribute. Table 21-15 describes the address roles.

Table 21-15 Address Roles

Role Description

Primary Address

A street address such as 100 Main Street or a post office box such as PO Box 100. Assign this role to one attribute; otherwise the match rule is invalid.

Unit Number

Suite numbers, floor numbers, or apartment numbers for the Primary Address.

For addresses with matching primary addresses, the operator compares unit numbers. If both unit numbers are blank, then they match. If only one unit number is blank, they match only when you selected the Match on blank secondary address option.

PO Box

A post office box number in the Primary Address.

The operator compares the post office box number with the number portion of the primary address, when the primary address is a PO Box. When the primary address represents a street address, the PO Box number is blank.

Dual Primary Address

A second address, which may represent either an additional location or a former location.

For addresses with matching primary addresses, the operator compares the dual primary addresses

Dual Unit Number

Suite numbers, floor numbers, or apartment numbers for the Dual Primary Address.

The operator compares the Dual Unit Number in one record with the Unit Number and Dual Unit Number of another record. Unit numbers match when one or both are blank.

Dual PO Box

A post office box number in the Dual Primary Address.

The operator compares the Dual PO Box in one record with both the PO Box and Dual PO Box of another record.

City

Assign this role only when also assigning the State role.

For uncorrected address data, the operator compares each City.

For addresses already corrected by the Name and Address operator, Match-Merge only compares City when the postal codes do not match. If both City and State match, then the operator compares the address roles. Cities match when both are blank but not when only one is blank.

State

Assign this role only when also assigning the City role.

For uncorrected address data, the operator compares each State.

For addresses already corrected by the Name and Address operator, Match-Merge only compares States when the postal codes do not match. If both City and State match, then the operator compares the address roles. States match when both are blank but not when only one is blank.

Postal Code

For uncorrected address data, the operator does not use Postal Code.

For addresses already corrected by the Name and Address operator, Match-Merge only compares each Postal Code. If the codes match, then the operator compares the address roles. If the codes do not match, then the operator compares City and State to determine if it should compare address roles such as Primary Address.

Is Found

Assign this role to an address that you previously cleansed and standardized with the Name and Address operator. The Name and Address operator marks records with the Is Found flag when it identifies the address as part of a country postal matching database.


Address Details Page

Use the Details tab to set options for determining a match. Table 21-16 describes the options you can assign to Address Roles.

Table 21-16 Options for Address Roles

Option Description

Allow differing secondary address

Matches addresses with different unit numbers.

Match on blank secondary address

Matches addresses with one blank unit number.

Match on either street or post office box

Matches records if either the street address or the post office box match.

Address line similarity

Uses a similarity score to determine a match in the address lines, as calculated by the Edit Distance or Jaro-Winkler algorithms. Enter a value between 0 and 100 as the minimum similarity value required for a match. A value of 100 requires an exact match, and a value of 0 requires no similarity whatsoever.

In an address, the street name is the Primary Address and the unit number is the Secondary Address. Address Line Similarity evaluates both the primary and secondary addresses.

Last line similarity

Uses a similarity score to determine a match in the City, State, and Postal Code attributes, as calculated by the Edit Distance or Jaro-Winkler algorithms. Enter a value between 0 and 100 as the minimum similarity value required for a match. A value of 100 requires an exact match, and a value of 0 requires no similarity whatsoever.


Defining an Address Match Rule

To define an Address match rule, complete the following steps:

  1. On the Match Rules tab, select Address as the Rule Type.

    The Address Attributes tab and Details tab are displayed at the bottom of the page.

  2. In the left panel of the Address Attributes tab, select the attribute that represents the primary address. Use the right shuttle key to move it to the Address Roles Attributes column.

  3. Click Role Required and designate that attribute as the Primary Address.

    You must perform this step. If you do not assign the Primary Address role, the match rule is invalid.

  4. Add other attributes and designate their roles as necessary. See Table 21-15 for the types of roles you can assign.

  5. Select the Details tab and select the applicable options as listed in Table 21-16.

Custom Match Rule

Use the Custom match rule to execute your own PL/SQL program as a comparison algorithm.

Match Rules Detail

Displays the PL/SQL code composing your custom algorithm. You can edit code directly in this field or use the Custom Match Rule Editor.

Edit

Displays the Custom Match Rule Editor.

Custom Match Rule Editor

Use the Custom Match Rule Editor to develop custom PL/SQL programs to use as comparison algorithms.

The editor provides basic program development support. It consists of these components:

Menu Bar

The menu bar contains the following menus:

  • Code: Enables you to read from and write to files on your local computer network, save your editing changes, and so forth.

  • Edit: Provides basic cut, copy, and paste functionality.

  • Search: Provides basic search and replace functionality.

  • Test: Checks for syntax errors in the code entered in the Implementation field.

  • Help: Displays this topic.

Search For Field

All or part of a function name you wish to find in the navigation tree. Click Go to search for the first instance, click Go again to find the next instance, and so forth.

Navigation Tree

Lists input parameters and transformations that you can include in your program. The navigation tree contains these folders:

  • Match Functions: List both active and passive rules that you have defined. You can call these rules the same as any other PL/SQL function.

  • Parameters: Lists all input attributes under two subfolders named THIS_ and THAT_. Your program can compare two rows in the same attribute or different ones.

  • Transformation Library: Lists all Warehouse Builder transformations.

To insert a function, parameter, or transformation into your code at the cursor, double-click or drag-and-drop it into the Implementation field.

Implementation Field

Displays your program code.

Definition Tab

Displays the signature of the selected function.

Messages Tab

Displays information about the success or failure of validating the code.

Defining a Custom Match Rule

To define a Custom match rule, complete the following steps:

  1. On the Match Rules tab, select Custom as the Rule Type.

    A Details field is displayed at the bottom of the page with the skeleton of a PL/SQL program.

  2. Click Edit to open the Custom Match Rules Editor.

  3. To enter PL/SQL code, do any combination of the following:

    • To read in a file, choose Open File from the Code menu.

    • To enter text, first position the cursor using the mouse or arrow keys, then begin typing. You can also use the commands on the Edit and Search menus.

    • To reference any function, parameter, or transformation in the navigation tree, first position the cursor, then double-click or drag-and-drop the object onto the Implementation field.

  4. To validate your code, choose Validate from the Test menu.

    The validation results appear on the Messages tab.

  5. To save your code, choose Save from the Code menu.

  6. To close the Custom Match Rules Editor, choose Close from the Code menu.

Merge Rules Page

Use the Merge Rules page to select values for the attributes in the merged record.

Name

An arbitrary name for the rule. Warehouse Builder creates a default name such as ME_0 for each merge rule. You can replace these names with meaningful ones. Meaningful names are particularly helpful when calling rules from a PL/SQL custom rule.

Position

The order of execution. You can change the position of a rule by clicking on the row header and dragging the row to its new location. The row headers are the boxes to the left of the Name column.

Rule Type

Assign one of the rule types listed in Table 21-17. When you select a rule type, the lower portion of the Merge Rules tab activates and you can enter details for the merge rule.

Attribute

Identifies the attribute whose values are selected by the merge rule. Provides a list of all merged attributes.

When you select a Record rule, Warehouse Builder disables the list of attributes. You select multiple attributes on the Attributes tab, which appears on the lower portion of the page.

Description

An optional description of the rule, which you can enter.

Descriptions of Merge Rules

When you define a merge rule, you can define one rule for all the attributes in the merged record or define a rule for each attribute. For instance, if the merged record is a customer record, it may have attributes such as ADDRESS1, ADDRESS2, CITY, STATE, and ZIP. You can write five rules that select the value of each attribute from up to five different records, or one Record rule that selects that values of all five attributes from one record. Use record rules when multiple attributes compose a logical unit, such as an address. For example, City, State, and Zip Code might be three different attributes, but the data for these attributes should all come from the same record.

Table 21-17 describes the types of merge rules.

Table 21-17 Merge Rule Types

Merge Rule Description

Any

Uses the first non-blank value.

Match ID

Merges records that have already been output from another Match-Merge operator. See Match ID Merge Rule for details.

Rank

Uses the ranked values in a second attribute to select the preferred value. See Rank and Rank Record Merge Rules for details.

Sequence

Uses the values in a sequence to generate unique keys as the data is loaded. See Sequence Merge Rule for details.

Min Max

Uses the first value based on the order of another attribute. See Min Max and Min Max Record Merge Rules for details.

Copy

Uses the values from another merged attribute. See Copy Merge Rule for details.

Custom

Uses the PL/SQL code that you provide as the criteria for merging records. See Custom and Custom Record Merge Rules for details.

Any Record

Identical to the Any rule, except that an Any Record rule applies to multiple attributes.

Rank Record

Identical to the Rank rule, except that a Rank Record rule applies to multiple attributes

Min Max Record

Identical to the Min Max rule, except that a Min Max Record rule applies to multiple attributes.

Custom Record

Identical to the Custom rule, except that a Custom Record rule applies to multiple attributes.


Match ID Merge Rule

Use the Match ID merge rule to merge records that have already been output in the XREF group from another Match-Merge operator. No other operator is valid for this type of input. For more information, refer to "Using Two Match-Merge Operators".

Next Value of the Sequence

Identifies the sequence that will be used by the rule.

sequences list

Lists all sequences defined in the current project.

Select Sequence

Sets the sequence for the rule to the sequence currently selected in the list. Move a sequence from the sequences list to Select Sequence.

Rank and Rank Record Merge Rules

Use the Rank and Rank Record rules when merging data from multiple sources. These rules enable you to identify your preference for certain sources. Your data must have a second input attribute on which the rule is based.

For example, the second attribute might identify the data source, and these data sources are ranked in order of reliability. The most reliable value would be used in the merged record. The merge rule might look like this:

INGRP1.SOURCE = 'Order Entry'

Name

An arbitrary name for the rule. Warehouse Builder creates a default name such as RULE_0 for each rank merge rule. You can replace these names with meaningful ones.

Position

The order of execution. You can change the position of a rule by clicking on the row header and dragging the row to its new location. The row headers are the boxes to the left of the Name column.

Expression Record Selection

The custom SQL expression used in the ranking. Click the Ellipsis button to display the Rank Rule Editor (also called the Expression Builder User Interface). Use this editor to develop the ranking expression.

Sequence Merge Rule

The Sequence rule uses the next value in a sequence.

Next Value of the Sequence

Identifies the sequence that will be used by the rule.

sequences list

Lists all sequences defined in the current project.

Select Sequence

Sets the sequence for the rule to the sequence currently selected in the list.

Min Max and Min Max Record Merge Rules

The Min Max and Min Max Record rules select an attribute value based on the size of another attribute value in the record.

For example, you might select the First Name value from the record in each bin that contains the longest Last Name value.

Selecting Attribute

Lists all input attributes. Select the attribute whose values provide the order.

Attribute Relation

Select the characteristic for choosing a value in the selected attribute.

  • Minimum. Selects the smallest numeric value or the oldest date value.

  • Maximum. Selects the largest numeric value or the most recent date value.

  • Shortest. Selects the shortest character value.

  • Longest. Selects the longest character value.

Copy Merge Rule

The Copy rule uses the values from another merged attribute.

Merged Attribute

Lists the other merged attributes, which you selected on the Merge Attributes page.

Custom and Custom Record Merge Rules

The Custom and Custom Record rules use PL/SQL code that you provide to merge the records. The following is an example of a Custom merge rule, which returns the value of the TAXID attribute for record 1.

BEGIN
RETURN M_MATCHES(1)."TAXID";
END;

Following is an example of a Custom Record merge rule, which returns a record for record 1:

BEGIN
RETURN M_MATCHES(1);
END;

Merge Rules Detail

Displays the PL/SQL code composing your custom algorithm. You can edit code directly in this field or use the Custom Merge Rule Editor.

Edit

Displays the Custom Merge Rule Editor.

Custom Merge Rule Editor

Use the Custom Merge Rule Editor to develop custom PL/SQL programs to use as merging algorithms.

The editor provides basic program development support. It consists of these components:

Menu Bar

The menu bar contains the following menus:

  • Code: Enables you to read from and write to files on your local computer network, save your editing changes, and so forth.

  • Edit: Provides basic cut, copy, and paste functionality.

  • Search: Provides basic search and replace functionality.

  • Test: Checks for syntax errors in the code entered in the Implementation field.

  • Help: Displays this topic.

Search For Field

All or part of a function name you wish to find in the navigation tree. Click Go to search for the first instance, click Go again to find the next instance, and so forth.

Navigation Tree

Lists input parameters and transformations that you can include in your program. The navigation tree contains these folders:

  • Parameters: Lists all input attributes under the M_MATCHES folder and the merged attributes under the M_MERGE folder.

  • Transformation Library: Lists all Warehouse Builder transformations.

To insert a parameter or a transformation into your code at the cursor, double-click or drag-and-drop it into the Implementation field.

Implementation Field

Displays your program code.

Definition Tab

Displays the signature of the selected function.

Messages Tab

Displays information about the success or failure of validating the code.

Using the Name and Address Operator in a Mapping

The Name and Address operator accepts one PL/SQL input and generates one PL/SQL output.

If you experience time-out errors, you may need to increase the socket time-out setting of the Name and Address Server. The time-out setting is the number of seconds the server will wait for a parsing request from a mapping before the server drops a connection. The default setting is 600 seconds (10 minutes). After the server drops a connection because of inactivity, subsequent parsing requests fail with a NAS-00021 error.

For most mappings, long time lapses between parsing requests are rare. However, maps operating in row based mode with a filter operator may have long time lapses between record parsing requests, because of the inefficiency of filtering records in row based mode. For this type of mapping, you may need to increase the socket time-out value to prevent connections from being dropped.

To increase the socket time-out setting, refer to "Configuring the Name and Address Server".

Name and Address Wizard and Editor: General

Use the General page to specify a name and optional description for the operator. By default, the wizard names the Name and Address operator NAMEADDR.

Name and Address Wizard and Editor: Definitions

Characterize the nature of your input data by assigning general definitions to this Name and Address operator. In the Definitions page, select a Parsing Type, Primary Country, and Dual Address Assignment.

Parsing Type

Select one of the following parsing types from the drop-down list:

Note:

You can only specify the parsing type when you first add the Name and Address operator to your mapping. You cannot modify the parsing type in the editor.
  • Name Only: Select this option when the input data contains only name data. Names can include both personal and business names. Selecting this option instead of the more generic Name and Address option may improve performance and accuracy, depending on the adapter.

  • Address Only: Select this option when the input data contains only address data and no name data. Selecting this option instead of the more generic Name and Address option may improve performance and accuracy, depending on the adapter.

  • Name and Address: Select this option when the input data contains both name and address data.

Primary Country

Select the country that best represents the country distribution of your data. The primary country is used by some providers of name and address cleansing software as a hint for the appropriate parser or parsing rules to use on the initial parse of the record. For other name and address service providers, external configuration of their installation controls this behavior.

Dual Address Assignment

A dual address contains both a Post Office (PO) box and a street address for the same address record. For records that have dual addresses, your selection determines which address becomes the normal address and which address becomes the dual address. A sample dual address is:

PO Box 2589
4439 Mormon Coulee Rd
La Crosse WI 54601-8231

Note that the choice for Dual Address Assignment affects which postal codes are assigned during postal code correction, because the street address and PO box address may correspond to different postal codes.

  • Street Assignment: The street address is the normal address and the PO Box address is the dual address. This means that the Address component is assigned the street address. In the preceding example, the Address is 4439 MORMON COULEE RD. This choice corrects the postal code to 54601-8220.

  • PO Box Assignment: The PO Box address is the normal address and the street address is the dual address. This means that the Address component is assigned the Post Office (PO) box address. In the preceding example, the Address is PO BOX 2589. This choice corrects the postal code to 54602-2589.

  • Closest to Last Line: Whichever address occurs closest to the last line is the normal address; the other is the dual address. This means that the Address component is assigned the address line closest to the last line. In the preceding example, the Address is 4439 MORMON COULEE RD. This choice corrects the postal code to 54601-8220.

This option has no effect for records having a single street or PO box address.

Note:

Dual Address Assignment may not be supported by all name and address cleansing software providers.

Name and Address Wizard and Editor: Groups

The Groups page lists the input and output groups defined for an operator. By definition, the Name and Address operator has one input group and one output group. You cannot edit, add, or delete groups in the Name and Address operator. The input group is called INGRP1 and the output group is OUTGRP1. You can edit these names. If the input data requires multiple groups, create a separate Name and Address operator for each group.

Name and Address Wizard and Editor: Input Connections

Use the Input Connections page to select attributes from any operator in your mapping that you want to copy and map into the operator. The Available Attributes box lists the available attributes. The Mapped Attributes box lists the attributes that will be processed by the Name and Address operator. You can move an entire group of attributes, or individual attributes from a single group.

If you have not created one or more operators for the source data yet, the Available Attributes column will be empty.

To complete the Input Connections page for an operator:

  1. Select complete groups or individual attributes from the Available Attributes panel. The Available Attributes panel enables you to select attributes from any operator in your mapping.

    To search for a specific attribute or group by name, type the text in Search for and click Go. To find the next match, click Go again.

    Hold the Shift key down to select multiple groups or attributes. If you want to select attributes from different groups, you must first combine the groups with a Joiner or Set operator.

  2. Use the left to right arrow button between the two panels to move your selections to the Mapped Attributes panel.

Name and Address Wizard and Editor: Input Attributes

Use the Input Attributes page to assign input roles to each input attribute.

Attribute

Automatically lists the attributes that you selected on the Input Connections page. Otherwise, the list may be empty. Click Add to add each input attribute.

Map all attributes through the Name and Address operator, including those that you do not wish to process. Bypassing the Name and Address operator was valid in earlier releases of Warehouse Builder, but may cause problems now. Instead, assign these attributes the Pass Through input role in the Name and Address operator.

Input Role

Input roles indicate what kind of name or address information resides in a line of data. For each attribute, select the input role that most closely matches the data contained in the source attribute. Refer to "Input Role Descriptions" for a complete list of input roles and their descriptions.

Whenever possible, choose discrete roles (such as City, State, and Postal Code) rather than non-discrete, line-oriented roles (such as Last Line). Discrete roles give the Name and Address operator more information about the data content and result in better parsing.

Data Type, Length, Precision, Scale, Seconds Precision

Set the data type and related parameters for attributes given the Pass Through input role. Attributes with other input roles are set automatically to VARCHAR2 and cannot be changed.

Description

Enter an optional description of the input attributes.

Input Role Descriptions

Table 21-18 describes the input roles for the Name and Address Operator.

Table 21-18 Name and Address Operator Input Roles

Input Role Description

Pass Through

Any attribute that requires no processing.

First Name

First name, nickname, or shortened version of the first name.

Middle Name

Middle name or initial. Use when there is only one middle name, or for the first of several middle names; for example, 'May' in Ethel May Roberta Louise Mertz.

Middle Name 2

Second middle name; for example, 'Roberta' in Ethel May Roberta Louise Mertz.

Middle Name 3

Third middle name; for example, 'Louise' in Ethel May Roberta Louise Mertz.

Last Name

Last name or surname.

First Part Name

First part of the Person name, including:

  • Pre name

  • First name

  • Middle name(s)

Use when these components are contained in one source column.

Last Part Name

Last part of Person Name, including:

  • Last name

  • Post Name

Use when these components are all contained in one source column.

Pre Name

Information that precedes and qualifies the name; for example, Ms., Mr., or Dr.

Post Name

Generation or other information qualifying the name; for example, Jr. or Ph.D.

Person

Full person name, including:

  • First Part Name (consisting of Pre Name, First Name, and Middle Names)

  • Last Part Name (consisting of Last Name and Post Name)

Use when these components are all contained in one source column.

Person 2

Designates a second person if the input includes multiple personal contacts.

Person 3

Designates a third person if the input includes multiple personal contacts.

Firm Name

Name of the company or organization.

Primary Address

Box, route, or street address, including:

  • Street name

  • House number

  • City map grid direction; for example, SW or N

  • Street type; for example, Avenue, Street, or Road.

This does not include the Unit Designator or the Unit Number.

Secondary Address

The second part of the street address, including:

  • Unit Designator

  • Unit Number

For example, in a secondary address of Suite 2100, the Unit Designator is STE (a standardization of 'Suite') and the Unit Number is 2100.

Address

Full address line, including:

  • Primary Address

  • Secondary Address

Use when these components share one column.

Address 2

Generic address line.

Neighborhood

Neighborhood or barrio, common in South and Latin American addresses.

Locality Name

The city (shi) or island (shima) in Japan.

Locality 2

The ward (ku) in Japan.

Locality 3

The district (machi) or village (mura) in Japan.

Locality 4

The subdistrict (aza, bu, chiwari, or sen) in Japan.

City

Name of city.

State

Name of state or province.

Postal Code

Postal code, such as a ZIP code in the United States or a postal code in Canada.

Country Name

Full country name.

Country Code

The ISO 3166-1993 (E) two- or three-character country code. For example, US or USA for United States; CA or CAN for Canada.

Last Line

Last address line, including:

  • City

  • State or province

  • Postal code

Use when these components are all contained in one source column.

Last Line 2

For Japanese adaptors, specifies additional line information that appears at the end of an address.

Line1... Line10

Use for free-form name, business, personal, and address text of any type. These roles do not provide the parser with any information about the data content. Whenever possible, use the discrete input roles provided instead.


Name and Address Wizard and Editor: Output Attributes

Use the Output Attributes page to define output attributes that determine how the Name and Address operator handles parsed data. Specifically, the output attribute properties characterize the data extracted from the parser output.

The Output Attributes page is empty initially. You can create and edit attributes.

Note:

The attributes for output components with the Pass Through role cannot be changed.

Add button

To create a new output attribute, click Add.

Attribute column

A new attribute has a default name such as OUTPUT1. This default name changes when you select an output component to a default descriptive name, such as Primary_Address. It does not change if you already replaced the name. Click the name to replace it with a new name.

Output Component column

Select an output component for every output attribute. Click the Ellipsis button to the right of the cell to open the Output Attribute Components dialog. See "Descriptions of Output Components" for a complete list of output components.

Be sure to add error handling flags, such as Is Parsed, Is Good Name, and Is Good Address. These flags can be used with the Splitter operator to separate good records from records with errors, and load them into different targets.

Data Type column

Pass Through output components retain their input data type. All other output component types are VARCHAR2. This column is read-only.

Length column

Adjust the field length to match the length of the target attribute to which you intend to map the output attribute. This practice helps prevent data truncation warnings during code generation, or errors during execution.

Output Attribute Components Dialog Box

Use the Output Attribute Components dialog to select a component for each output attribute. The output components indicate which component an attribute constitutes, such as the first name, last name, street name, city, or state.

Select an Output Component

Select a component from the navigation tree to apply to the output attribute. See "Descriptions of Output Components" for a description of these components.

Output Component

Identifies the component currently selected from the tree. If this field is empty, the current selection is a folder. Expand the folder and select a valid component. Note that some folders are valid components.

Address Type

Available only for dual addresses, and not supported by all name and address cleansing software providers. The Dual Address Assignment option you specified in the Definitions Page page determines whether the street address or the PO box address is used as the dual address. Select either NORMAL or DUAL. For more information on dual addresses, see "Dual Address Assignment".

Instance

Specify which instance of an output component to use when there are multiple occurrences of the same attribute in a single record. The instance control applies to all name components and several address components, such as Miscellaneous Address and Complex. This setting enables you to extract numerous attributes of the same nature. The number of instances allowed for various components depends on the third-party name and address cleansing software.

For example, an input record containing John and Jane Doe has two name occurrences: John Doe and Jane Doe. You can extract John Doe by assigning Instance 1 to the First Name and Last Name components. Similarly, you can extract Jane Doe by assigning Instance 2 to the First Name and Last Name components.

Descriptions of Output Components

Output components are grouped in the following categories:

Pass Through

The Pass Through output component is for any attribute that requires no processing. When you create a Pass Through input role, the corresponding Pass Through output component is created automatically. You cannot edit a Pass Through output component, but you can edit the corresponding input role.

Name

Table 21-19 describes the Name output components. Many components can be used multiple times to process a record, as noted in the table. For example, in records with two occurrences of Firm Name, you can extract both by adding two output attributes. Assign one as the First instance, and the other as the Second instance.

Table 21-19 Name Output Components

Subfolder Output Component Description

None

Pre Name

Title or salutation appearing before a name; for example, Ms. or Dr.

Can be used multiple times.

None

First Name Standardized

Standard version of first name; for example, Theodore for Ted or James for Jim. Can be used multiple times.

None

Middle Name Standardized

Standardized version of the middle name; for example, Theodore for Ted or James for Jim. Use when there is only one middle name, or for the first of several middle names. Can be used multiple times.

None

Middle Name 2 Standardized

Standardized version of the second middle name; for example, Theodore for Ted or James for Jim. Can be used multiple times.

None

Middle Name 3 Standardized

Standardized version of the third middle name; for example, Theodore for Ted or James for Jim. Can be used multiple times.

None

Post Name

Name suffix indicating generation; for example, Sr., Jr., or III. Can be used multiple times.

None

Other Post Name

Name suffix indicating certification, academic degree, or affiliation; for example, Ph.D., M.D., or R.N.

Can be used multiple times.

None

Title

Personal title, for example, Manager.

None

Name Designator

Personal name designation; for example, ATTN (to the attention of) or C/O (care of). Can be used multiple times.

None

Relationship

Information related to another person; for example, Trustee For. Can be used multiple times.

None

SSN

Social security number.

None

Email Address

E-mail address.

None

Phone Number

Telephone number.

None

Name/Firm Extra

Extra information associated with the firm or personal name.

None

Person

First name, middle name, and last name. Can be used multiple times.

Person

First Name

The first name found in the input name. Can be used multiple times.

Person

Middle Name

Middle name or initial. Use this for a single middle name, or for the first of several middle names; for example, 'May' in Ethel May Roberta Louise Mertz. Can be used multiple times.

Person

Middle Name 2

Second middle name; for example, 'Roberta' in Ethel May Roberta Louise Mertz. Can be used multiple times.

Person

Middle Name 3

Third middle name; for example, 'Louise' in Ethel May Roberta Louise Mertz. Can be used multiple times.

Person

Last Name

Last name or surname. Can be used multiple times.

Derived

Gender

Probable gender:

  • M = Male

  • F = Female

  • N = Neutral (either male or female)

  • Blank = Unknown

Can be used multiple times.

Derived

Person Count

Number of persons the record references; for example, a record with a Person name of 'John and Jane Doe' has a Person Count of 2.

Business

Firm Name

Name of the company or organization, including divisions. Can be used multiple times.

Business

Firm Count

Number of firms referenced in the record. Can be used multiple times.

Business

Firm Location

Location within a firm; for example, Accounts Payable


Address

Table 21-20 describes the Address output components. In records with dual addresses, you can specify which line is used as the Normal Address (and thus assigned to the Address component) and which is used as the Dual Address for many output components, as noted in the table.

Table 21-20 Address Output Components

Subfolder Output Component Description

None

Address

Full address line, including:

  • Primary Address

  • Secondary Address

Can be used as the Normal Address or the Dual Address.

None

Primary Address

Box, route, or street address, including:

  • Street name

  • House number

  • City map grid direction; for example, SW or N

  • Street type; for example, Avenue, Street, or Road.

Does not include the Unit Designator or the Unit Number. Can be used as the Normal Address or the Dual Address.

Primary Address

Street Number

Number that identifies the address, such as a house or building number, sometimes referred to as the primary range. For example, in 200 Oracle Parkway, the Street Number value is 200. Can be used as the Normal Address or the Dual Address.

Primary Address

Pre Directional

Street directional indicator appearing before the street name; for example, in 100 N University Drive, the Pre Directional value is 'N'. Can be used as the Normal Address or the Dual Address.

Primary Address

Street Name

Name of street. Can be used as the Normal Address or the Dual Address.

Primary Address

Primary Name 2

Second street name, often used for addresses at a street intersection.

Primary Address

Street Type

Street identifier; for example, ST, AVE, RD, DR, or HWY. Can be used as the Normal Address or the Dual Address.

Primary Address

Post Directional

Street directional indicator appearing after the street name; for example, in 100 15th Ave. S., the Post Directional value is 'S'. Can be used as the Normal Address or the Dual Address.

None

Secondary Address

The second part of the street address, including:

  • Unit Designator

  • Unit Number

For example, in a secondary address of Suite 2100, Unit Designator is 'STE' (a standardization of 'Suite') and Unit Number is '2100'. Can be used as the Normal Address or the Dual Address.

Secondary Address

Unit Designator

Type of secondary address, such as APT or STE. For example, in a secondary address of Suite 2100, Unit Designator is 'STE' (a standardization of 'Suite'). Can be used as the Normal Address or the Dual Address.

Secondary Address

Unit Number

A number that identifies the secondary address, such as the apartment or suite number. For example, in a secondary address of Suite 2100, Unit Number is '2100'. Can be used as the Normal Address or the Dual Address.

Secondary Address

Non-postal Secondary Address

A secondary address that is not in official postal format.

Secondary Address

Non-postal Unit Designator

A unit designator that is not in official postal format.

Secondary Address

Non-postal Unit Number

A unit number that is not in official postal format.

Address

Last Line

Final address line, including:

  • City

  • state, province, or county

  • Formatted postal code if the address was fully assigned

Last Line

Neighborhood

Neighborhood or barrio, common in South and Latin American addresses.

Last Line

City

Name of city. The U.S. city names may be converted to United States Postal Service preferred names.

Last Line

City Abbreviated

Abbreviated city name, composed of 13 characters for the United States.

Last Line

City Abbreviated 2

Alternative abbreviation for the city name.

Last Line

Alternate City

An alternate name for a city that may be referenced by more than one name. In the United States, a city may be referenced by its actual name or the name of a larger urban area. For example, Brighton Massachusetts may have Boston as an alternate city name.

Last Line

Locality Code

The last three digits of the International Mailsort Code, which represents a geographical region or locality within each country. Locality Codes are numeric in the range 000-999.

Last Line

Locality Name

In the United Kingdom, the following address is assigned Locality Name KNAPHILL:

Chobham Rd
Knaphill
Woking GU21 2TZ

Last Line

Locality 2

The ward (ku) in Japan.

Last Line

Locality 3

The district (machi) or village (mura) in Japan.

Last Line

Locality 4

The subdistrict (aza, bu, chiwari, or sen) in Japan.

Last Line

County Name

The name of a county in the United Kingdom, United States, or other country.

Last Line

State

Name of state or province.

Last Line

Postal Code

Full postal code with spaces and other non-alphanumeric characters removed.

Last Line

Postal Code Formatted

Formatted version of postal code that includes spaces and other non-alphanumeric characters, such as dashes.

Last Line

Delivery Point

A designation used in the United States and Australia.

  • For the United States, this is the two-digit postal delivery point, which is combined with a full nine-digit postal code and check digit to form a delivery point bar code.

  • For Australia, this is a nine-digit delivery point.

Last Line

Country Code

The ISO 3166-1993 (E) two-character country code, as defined by the International Organization for Standardization; for example, 'US' for United States or 'CA' for Canada.

Last Line

Country Code 3

The ISO 3166-1993 (E) three-character country code, as defined by the International Organization for Standardization; for example, 'USA' for United States, 'FRA' for France, or 'UKR' for Ukraine.

Last Line

Country Name

The full country name.

Address

Address 2

A second address line, typically used for Hong Kong addresses that have both a street address and a building or floor address.

Address

Last Line 2

Additional information that appears at the end of an address in Japan.

Other Address Line

Box Name

The name for a post office box address; for example, for 'PO Box 95', the Box Name is 'PO BOX'. Can be used as the Normal Address or the Dual Address.

Other Address Line

Box Number

The number for a post office box address; for example, for 'PO Box 95', the Box Number is '95'. Can be used as the Normal Address or the Dual Address.

Other Address Line

Route Name

Route name for a rural route address. For an address of 'Route 5 Box 10', the Route Name is 'RTE' (a standardization of 'Route'). Can be used as the Normal Address or the Dual Address.

Other Address Line

Route Number

Route number for a rural route address. For an address of 'Route 5 Box 10', the Route Number is '5'. Can be used as the Normal Address or the Dual Address.

Other Address Line

Building Name

Building name, such as 'Cannon Bridge House'. Building names are common in the United Kingdom.

Other Address Line

Complex

Building, campus, or other complex. For example,

USS John F. Kennedy
Shadow Green Apartments
Cedarvale Gardens
Concordia College

You can use an the Instance field in the Output Components dialog to specify which complex should be returned in cases where an address has more than one complex.

Other Address Line

Miscellaneous Address

Miscellaneous address information.

In records with multiple miscellaneous fields, you can extract them by specifying which instance to use in the Output Components page.

Geography

Latitude

Latitude in degrees north of the equator: Positive for north of the equator; negative for south (always positive for North America).

Geography

Longitude

Longitude in degrees east of the Greenwich Meridian: positive for east of GM; negative for west (always negative for North America).

Geography

Geo Match Precision

Indicates how closely the location identified by the latitude and longitude matches the address.


Extra Vendor

Twenty components are open for vendor-specified uses.

Error Status

Table 21-21 describes the Error Status output components. Refer to "Handling Errors in Name and Address Data" for usages notes on the Error Status components.

Table 21-21 Error Status Output Components

Subfolders Output Component Description

Name and Address

Is Good Group

Indicates whether the name group, address group, or name and address group was processed successfully.

  • T =

    For name groups, the name has been successfully parsed.

    For address groups, the address has been found in a postal matching database if one is available, or has been successfully parsed if no postal database is installed.

    For name and address groups, both the name and the address have been successfully processed.

  • F = The group was not parsed successfully.

Using this flag in conjunction with another flag, such as the Is Parsed flag, followed by the Splitter operator, enables you to isolate unsuccessfully parsed records in their own target, where you can address them separately.

Name and Address

Is Parsed

Indicates whether the name or address was parsed:

  • T = The name or address was parsed successfully, although some warning conditions may have been flagged.

  • F = The name or address cannot be parsed.

Check the status of warning flags such as Name Warning or City Warning.

Name and Address

Parse Status

Postal matching software parse status code.

Name and Address

Parse Status Description

Text description of the postal matching software parse status.

Name Only

Is Good Name

Indicates whether the name was parsed successfully:

  • T = The name was parsed successfully, although some warning conditions may have been flagged.

  • F = The name cannot be parsed.

Name Only

Name Warning

Indicates whether the parser found unusual or possibly erroneous data in a name:

  • T = The parser had difficulty parsing a name or found unusual data. Check the Parse Status component for the cause of the warning.

  • F = No difficulty parsing name.

Address Only

Is Good Address

Indicates whether the address was processed successfully:

  • T = Successfully processed. Either the address was found in the postal matching database or, if no postal matching database is installed for the country indicated by the address, the address was successfully parsed.

  • F = Not successfully processed. If a postal matching database is installed for the country indicated by the address, the address was not found in the database. If no postal matching database is available for the country, the address cannot be parsed.

Use this component when you have a mix of records from both postal-matched and non-postal-matched countries.

Address Only

Is Found

Indicates whether the address is listed in the postal matching database for the country indicated by the address:

  • T = The address was found in a postal matching database.

  • F = The address was not found in a postal matching database. This status may indicate either that the address is not a legal address, or that postal matching is not available for the country.

This flag is true only if all of the other 'Found' flags are true. If postal matching is available, this flag is the best indicator of record quality.

Address Only: Is Found

City Found

T = The postal matcher found the city; otherwise, F.

Address Only: Is Found

Street Name Found

T = The postal matcher found the street name; otherwise, F.

Address Only: Is Found

Street Number Found

T = The postal matcher found the street number within a valid range of numbers for the named street, otherwise, F.

Address Only: Is Found

Street Components Found

T = The postal matcher found the street components, such as the Pre Directional or Post Directional; otherwise, F.

Address Only: Is Found

Non-ambiguous Match Found

Indicates whether the postal matcher found a matching address in the postal database:

  • T = The postal matcher found a match between the input record and a single entry in the postal database.

  • F = The address is ambiguous. The postal matcher found that the address matched several postal database entries and could not make a selection. For example, if the input address is '100 4th Avenue,' but the postal database contains '100 4th Ave N' and '100 4th Ave S,' the input's missing directional causes the match to fail.

Address Only

City Warning

T = The parser found unusual or possibly erroneous data in a city; otherwise, F.

Address Only

Street Warning

T = The parser found unusual or possibly erroneous data in a street address otherwise, F.

Address Only

Is Address Verifiable

T = Postal matching is available for the country of the address; otherwise, F.

F does not indicate whether or not a postal matching database is installed for the country in the address. It only indicates that matching is not available for a particular address.

Address Only

Address Corrected

Indicates whether the address was corrected in any way during matching. Standardization is not considered correction in this case.

  • T = Some component of the address was changed, aside from standardization. One of the other Corrected flags must also be true.

  • F = No components of the address were changed, with the possible exception of standardization.

Address Only: Address Corrected

Postal Code Corrected

T = The postal code was corrected during matching, possibly by the addition of a postal extension; otherwise, F.

Address Only: Address Corrected

City Corrected

T = The city name was corrected during matching; otherwise, F.

Postal code input is used to determine the city name preferred by the postal service.

Address Only: Address Corrected

Street Corrected

T = The street name was corrected during matching; otherwise, F.

Some correct street names may be changed to an alternate name preferred by the postal service.

Address Only: Address Corrected

Street Components Corrected

T = One or more street components, such as Pre Directional or Post Directional, were corrected during matching.

Address Only

Address Type

Type of address. The following are common examples; actual values vary with vendors of postal matching software:

  • B= Box

  • F = Firm

  • G= General Delivery

  • H= High-rise apartment or office building

  • HD= High-rise default, where a single Zip+4 postal code applies to the entire building. The Name and Address operator can detect a finer level of postal code assignment if a floor or suite address is provided, in which case the record is treated as an H type, with a more specific Zip+4 code for that floor or suite.

  • M= Military

  • P= Post Office Box

  • R= Rural Code

  • S= Street

Address Only

Parsing Country

Country parser that was used for the final parse of the record.


Country-Specific

Table 21-22 describes the output components that are specific to a particular country.

Table 21-22 Country-Specific Output Components

Subfolder Output Component Description

United States

ZIP5

The five-digit United States postal code.

United States

ZIP4

The four-digit suffix that is added to the five-digit United States postal code to further specify location.

United States

Urbanization Name

Urban unit name used in Puerto Rico.

United States

LACS Flag

T = Address requires a LACS conversion and should be submitted to a LACS vendor; otherwise, F.

The Locatable Address Conversion System (LACS) provides new addresses when a 911 emergency system has been implemented. 911 address conversions typically involve changing rural-style addresses to city-style street addresses, but they may involve renaming or renumbering existing city-style addresses.

United States

CART

Four-character USPS Carrier route.

United States

DPBC Check Digit

Check digit for forming a delivery point bar code.

United States

Automated Zone Indicator

T = The mail in this zip code is sorted by bar code sorting equipment; otherwise, F.

United States

Urban Indicator

T = An address is located within an urban area; otherwise, F.

United States

Line of Travel

United States Postal Service (USPS) line of travel

United States

Line of Travel Order

United States Postal Service (USPS) line of travel order

United States: Census/Geography

Metropolitan Statistical Area

Metropolitan Statistical Area (MSA) number. For example, '0000' indicates that the address does not lie within any MSA, and typically indicates a rural area.

United States: Census/Geography

Minor Census District

Minor Census District.

United States: Census/Geography

CBSA Code

A 5-digit Core Based Statistical Area code that identifies metropolitan and micropolitan areas.

United States: Census/Geography

CBSA Descriptor

Indicates whether the CBSA is metropolitan (population of 50,000 or more) or micropolitan (population of 10,000 to 49,999).

United States: Census/Geography

FIPS Code

The complete (state plus county) code assigned to the county by the Federal Information Processing Standard (FIPS). Because FIPS county codes are unique within a state, a complete FIPS Code includes the two-digit state code followed by the three-digit county code.

United States: Census/Geography

FIPS County

The three-digit county code as defined by the Federal Information Processing Standard (FIPS).

United States: Census/Geography

FIPS Place Code

The five-digit place code as defined by the Federal Information Processing Standard (FIPS).

United States: Geography

Census ID

United States Census tract and block-group number. The first six digits are the tract number; the final digit is the block-group number within the tract. These codes are used for matching to demographic-coding databases.

Canada

Installation Type

A type of Canadian postal installation:

  • STN= Station

  • RPO = Retail Postal Outlet

For example, for the address, 'PO Box 7010, Scarborough ON M1S 3C6,' the Installation Type is 'STN'.

Canada

Installation Name

Name of a Canadian postal installation. For example, for the address, 'PO Box 7010, Scarborough ON M1S 3C6,' the Installation Name is 'AGINCOURT'.

Hong Kong

Delivery Office Code

A mailing code used in Hong Kong. For example, the following address is assigned the Delivery Office Code 'WCH':

Oracle
39/F The Lee Gardens
33 Hysan Ave
Causeway Bay

Hong Kong

Delivery Beat Code

A mailing code used in Hong Kong. For example, the following address is assigned the Delivery Beat Code 'S06':

Oracle
39/F The Lee Gardens
33 Hysan Ave
Causeway Bay

Name and Address Wizard and Editor: Postal Reporting Page

Postal reporting applies only to countries that support address correction and postal matching. Country certification varies with different vendors of name and address cleansing software. The most common country certifications are United States, Canada, and Australia. The process provides mailers a common platform to measure the quality of address-matching software, focusing on the accuracy of postal codes (in the case of the United States, of five-digit ZIP Codes and ZIP+4 Codes), delivery point codes, and carrier route codes applied to all mail. Some vendors of name and address cleansing software may ignore these parameters and require external setup for generating postal reports. For more information, see "About Postal Reporting".

Postal Report

Select Yes for a postal report for the Primary Country you chose in the Definitions Page page. Only one postal report can be active.

Processor Name

The use of this field varies with vendors of name and address cleansing software. Typically, this value appears on the United States Coding Accuracy Support System (CASS) report.

List Name

An optional reference field that appears on the United States and United Kingdom reports under the List Name section, but is not included in other reports. The list name provides a reference for tracking multiple postal reports; for example, 'July 2005 Promotional Campaign'.

Processor Address Lines

These address lines may appear on various postal reports. Various name and address cleansing software vendors use these fields differently. They often contain the full address of your company.

About Postal Reporting

All address lists used to produce mailings for discounted automation postal rates must be matched by postal report-certified software. Certifications depend on the third-party vendors of name and address software and data. The certifications may include the following:

  • United States Postal Service: Coding Accuracy Support System (CASS)

  • Canada Post: Software Evaluation and Recognition Program (SERP)

  • Australia Post: Address Matching Approval System (AMAS)

United States Postal Service CASS Certification

The Coding Accuracy Support System (CASS) was developed by the United States Postal Service (USPS) in cooperation with the mailing industry. The system provides mailers a common platform to measure the quality of address-matching software, focusing on the accuracy of five-digit ZIP Codes, ZIP+4 Codes, delivery point codes, and carrier route codes applied to all mail. All address lists used to produce mailings for automation rates must be matched by CASS-certified software.

To meet USPS requirements, the mailer must submit a CASS report in its original form to the USPS.

Canada Post SERP Certification

Canada Post developed a testing program called Software Evaluation and Recognition Program (SERP), which evaluates software packages for their ability to validate, or validate and correct, mailing lists to Canada Post requirements. Postal programs that meet SERP requirements are listed on the Canada Post Web site.

Canadian postal customers who use Incentive Lettermail, Addressed Admail, and Publications Mail must meet the Address Accuracy Program requirements. Customers can obtain a Statement of Accuracy by comparing their databases to Canada Post's address data.

Australia Post AMAS Certification

The Address Matching Approval System (AMAS) was developed by Australia Post to improve the quality of addressing. It provides a standard by which to test and measure the ability of address-matching software to:

  • Correct and match addresses against the Postal Address File (PAF)

  • Append a unique Delivery Point Identifier (DPID) to each address record, which is a step toward barcoding mail.

AMAS allows companies to develop address matching software which:

  • Prepares addresses for barcode creation

  • Ensures quality addressing

  • Enables qualification for discounts on PreSort letter lodgements

PreSort Letter Service prices are conditional upon customers using AMAS Approved Software with Delivery Point Identifiers (DPIDs) being current against the latest version of the PAF.

A declaration that the mail was prepared appropriately must be made when using the Presort Lodgement Document, available from post offices.

Managing the Name and Address Server

An external Name and Address server provides an interface between Oracle Database and third-party name and address processing libraries. This section discusses methods of configuring, starting, and stopping the Name and Address Server.

Configuring the Name and Address Server

The Name and Address operator generates PL/SQL code, which calls the UTL_NAME_ADDR package installed in the Runtime Schema. A private synonym, NAME_ADDR, is defined in the target schema to reference the UTL_NAME_ADDR package. The UTL_NAME_ADDR package calls Java packages, which send processing requests to an external Name and Address server, which then interfaces with third-party Name and Address processing libraries, such as Trillium.

You can use the server property file, NameAddr.properties, to configure server options. This file is located in owb/bin/admin under the Oracle home of your Oracle Warehouse Builder server-side installation. The following code illustrates several important properties with their default settings.

TraceLevel=0
SocketTimeout=180
ClientThreads=4
Port=4040

The TraceLevel property is often changed to perform diagnostics on server communication and view output from the postal matching program parser. Other properties are rarely changed.

  • TraceLevel: Enables output of file NASvrTrace.log in the owb/bin/admin folder. This file shows all incoming and outgoing data, verifies that your mapping is communicating with the Name and Address Server, and that the Name and Address Server is receiving output from the service provider. The trace log shows all server input and output and is most useful for determining whether any parsing requests are being made by an executing mapping. Set TraceLevel=1 to enable logging. However, tracing degrades performance and creates a large log file. Set TraceLevel=0 to disable logging for production.

  • SocketTimeOut: Specifies the number of seconds the name/address server will wait for a parsing request before closing the connection. You can increase this time to 1800 (30 minutes) when running concurrent mappings to prevent timing out.

  • ClientThreads: Specifies the number of threads used to service client connections. One client connection is made for each database session or slave session if a map is parallelized. Most maps are parallelized, and the number of parallel processes is proportional to the number of processors. On a single processor computer, two parallel processes are spawned for large maps. On a four processor computer, up to eight processes may be spawned. Parallelism may also be controlled by database initialization settings such as Sessions.

    For the best performance, set ClientThreads to the maximum number of clients that will be connected simultaneously. The actual number of connected clients is recorded in NASvr.log after a map run. You should increase the value of ClientThreads when the number of client connections shown in the log is greater.

    When the number of clients exceeds the number of threads, all clients are still serviced because the threads are shared among clients.

  • Port: Specifies the port on which the server listens and was initially assigned by the installer. This value may be changed if the default port conflicts with another process. If the port is changed, the port attribute must also be changed in the runtime_schema.nas_connection table to enable the utl_name_addr package to establish a connection.

Starting and Stopping the Name and Address Server

Whenever you edit the properties file or perform table maintenance, you must stop and restart the Name and Address Server for the changes to take effect.

To manually stop the Name and Addresss Server:

  • In Windows, run OWB_ORACLE_HOME/owb/bin/win32/NAStop.bat.

  • In UNIX, run OWB_ORACLE_HOME/owb/bin/unix/NAStop.sh.

You can automatically restart the Name and Address Server by invoking a mapping in Warehouse Builder. You can also restart the server manually.

To manually restart the Name and Address Server:

  • In Windows, run OWB_ORACLE_HOME/owb/bin/win32/NAStart.bat.

  • In UNIX, run OWB_ORACLE_HOME/owb/bin/unix/NAStart.sh.