Skip Headers
Oracle® Warehouse Builder User's Guide
10g Release 1 (10.1)

Part Number B12146-02
Go to Documentation Home
Home
Go to Book List
Book List
Go to Table of Contents
Contents
Go to Index
Index
Go to Feedback page
Contact Us

Go to previous page
Previous
Go to next page
Next
View PDF

20 Data Quality: Name and Address Cleansing

This chapter contains additional information about using the Name and Address operator, discussed in Chapter 8, "Using Mapping Operators". This chapter contains the following topics:

Input Roles

Input roles indicate what kind of name or address information resides in a line of data. Each input attribute in the Name and Address operator must have an input role that most closely matches the data contained in the source attribute. Input roles can either be non-discrete, or line oriented, input roles for free-form data (such as 'Line1'); or they can be discrete roles for specific input attributes (such as 'Person', 'First Name', 'Primary Address', or 'City'). Discrete roles give the Name and Address operator more information about the content of the source attribute.

Table 20-1 lists input roles and descriptions for the Name and Address Operator.

Table 20-1 Name and Address Operator Input Roles

Input Role Description

First Name

First name. This can be a nickname or shortened version of the first name.

Middle Name

Middle name or initial. Use when there is only one middle name, or for the first of several middle names (such as ÒHerbertÓ in George Herbert Walker Bush).

Middle Name 2

Second middle name; for example, ÒWalkerÓ in George Herbert Walker Bush.

Middle Name 3

Third middle name; for example, ÒLouiseÓ in Ethel May Roberta Louise Mertz.

Last Name

Last name, or surname.

First Part Name

First part of the Person name, including:

  • Pre name

  • First name

  • Middle name(s)

Use when these components are contained in one source column.

Last Part Name

Last part of Person Name, including:

  • Last name

  • Post Name

Use when these components are all contained in one source column.

Pre Name

Information that precedes and qualifies the name; for example, Ms., Mr., or Dr.

Post Name

Generation or other information qualifying the name; for example, Jr. or Ph.D.

Person

Full person name, including:

  • First Part Name (consisting of Pre Name, First Name, and Middle Names)

  • Last Part Name (consisting of Last Name and Post Name)

Use when these components are all contained in one source column.

Person 2

Designates a second person if the input includes multiple personal contacts.

Person 3

Designates a third person if the input includes multiple personal contacts.

Firm Name

Name of the company or organization.

Primary Address

Box, route, or street address, including:

  • Street name

  • House number

  • City map grid direction; for example, SW or N

  • Street type; for example, avenue, street, or road.

This does not include the Unit Designator or the Unit Number.

Secondary Address

The second part of the street address, including:

  • Unit Designator

  • Unit Number

For example, in a secondary address of Suite 2100, the Unit Designator is 'STE' (a standardization of ÒSuiteÓ) and the Unit Number is '2100'.

Address

Full address line, including:

  • Primary Address

  • Secondary Address

Use when these components share one column.

Address 2

Generic address line.

Neighborhood

Neighborhood or barrio, common in South and Latin American addresses.

City

Name of city.

State

Name of state or province.

Postal Code

Postal code, such as a ZIP code in the United States or a Postal Code in Canada.

Country Name

Full country name.

Country Code

The ISO 3166-1993 (E) two- or three-character country code. For example, 'US' or 'USA' for United States; 'CA' or 'CAN' for Canada.

Last Line

Last address line, including:

  • City

  • State or province

  • Postal code

Use when these components are all contained in one source column.

Line1, Line2, Line3, Line4, Line5, Line6, Line7, Line8, Line9, Line10

Non-discrete roles intended for use as free form name, business, personal, and address text. These generic selections can store any type of name and address data. They do not provide the parser with any information about the data content. Whenever possible, use the discrete input roles provided instead.


Output Components

Each output component represents a discrete name or address entity, such as a title, a standardized first name, a street number, a street name, or a street type. An output component is assigned to the corresponding output attribute, and identifies the portion of a name or address that the attribute constitutes. Table 20-2 lists the Name and Address operator output components and their descriptions.

Table 20-2 Name and Address Operator Output Components

Parent Node(s) Output Component Description

Name

Pre NameFoot 1 

Title or salutation appearing before a name; for example, Ms. or Dr.

Name

First Name StandardizedFootref 1

Standard version of first name; for example, Theodore for Ted or James for Jim.

Name

Middle Name StandardizedFootref 1

Standardized version of the middle name; for example, Theodore for Ted or James for Jim. Used when there is only one middle name, or for the first of several middle names (such as ÒHerbertÓ in George Herbert Walker Bush).

Name

Middle Name 2 StandardizedFootref 1

Standardized version of the second middle name; for example, Theodore for Ted or James for Jim.

Name

Middle Name 3 StandardizedFootref 1

Standardized version of the third middle name; for example, Theodore for Ted or James for Jim.

Name

Post NameFootref 1

Name suffix indicating generation; for example, Sr., Jr., or III.

Name

Other Post NameFootref 1

Name suffix indicating certification, academic degree, or affiliation; for example, Ph.D., M.D., or R.N.

Name

Name DesignatorFootref 1

Personal name designation; for example, ÒATTNÓ (to the attention of) or ÒC/OÓ (care of).

Name

RelationshipFootref 1

Information related to another person; for example, ÒTrustee for.Ó

Name

PersonFootref 1

First name, middle name, and last name.

Name: Person

First NameFootref 1

The first name found in the input name.

Name: Person

Middle NameFootref 1

Middle name or initial. Use this for a single middle name, or for the first of several middle names (such as ÒHerbertÓ in George Herbert Walker Bush).

Name: Person

Middle Name 2Footref 1

Second middle name; for example, ÒWalkerÓ in George Herbert Walker Bush.

Name: Person

Middle Name 3Footref 1

Third middle name; for example, ÒLouiseÓ in Ethel May Roberta Louise Mertz.

Name: Person

Last NameFootref 1

Last name, or surname.

Name: Derived

GenderFootref 1

Probable gender:

  • M= Male

  • F= Female

  • N= Neutral (either Male or Female)

  • Blank=unknown.

Name: Derived

Person Count

Number of persons the record references. For example, a record with a Person name of ÒJohn and Jane DoeÓ has a Person Count of 2.

Name: Business

Firm NameFootref 1

Name of the company or organization, including divisions.

Name: Business

Firm CountFootref 1

Number of firms referenced in the record.

Address

AddressFoot 2 

Full address line, including:

  • Primary Address

  • Secondary Address

Address

Primary AddressFootref 2

Box, route, or street address, including:

  • Street name

  • House number

  • City map grid direction; for example, SW or N

  • Street type; for example, avenue, street, or road.

This does not include the Unit Designator or the Unit Number.

Address: Primary Address

Street NumberFootref 2

Number that identifies the address, such as a house or building number, sometimes referred to as the primary range. For example, in 200 Oracle Parkway, the Street Number is 200.

Address: Primary Address

Pre DirectionalFootref 2

Street directional indicator appearing before the street name; for example, in 100 N University Drive, the Pre Directional is 'N'.

Address: Primary Address

Street NameFootref 2

Name of street.

Address: Primary Address

Street TypeFootref 2

Street identifier; for example, ST, AVE, RD, DR, or HWY.

Address: Primary Address

Post DirectionalFootref 2

Street directional indicator appearing after the street name; for example, in 100 15th Ave. S., the Post Directional is 'S'.

Address

Secondary AddressFootref 2

The second part of the street address, including:

  • Unit Designator

  • Unit Number

For example, in a secondary address of Suite 2100, the Unit Designator is 'STE' (a standardization of ÒSuiteÓ) and the Unit Number is '2100'.

Address: Secondary Address

Unit DesignatorFootref 2

Type of secondary address, such as APT or STE. For example, in a secondary address of Suite 2100, the Unit Designator is 'STE' (a standardization of ÒSuiteÓ).

Address: Secondary Address

Unit NumberFootref 2

Number that identifies the secondary address, such as the apartment or suite number. For example, in a secondary address of Suite 2100, the Unit Number is '2100'.

Address

Last Line

Final address line, including:

  • City

  • State or province (if state contains a county instead of a state or province, it is not included)

  • Formatted postal code if address was fully assigned

Address: Last Line

Neighborhood

Neighborhood or barrio, common in South and Latin American addresses.

Address: Last Line

City

Name of city. The US city names may be converted to United States Postal Service preferred names.

Address: Last Line

State

Name of state or province.

Address: Last Line

Postal Code

Full postal code with spaces and other non-alphanumeric characters removed.

Address: Last Line

Postal Code Formatted

Formatted version of postal code that includes spaces and other non-alphanumeric characters, such as dashes.

Address: Last Line

Delivery Point

Applies to United States and Australia.

  • For the United States, this is the two-digit postal delivery point, which is combined with a full nine-digit postal code and check digit to form a delivery point bar code.

  • For Australia, this is a nine-digit delivery point.

Address: Last Line

Country Code

The ISO 3166-1993 (E) two-character country code, as defined by the International Organization for Standardization; for example, 'US' for United States or 'CA' for Canada.

Address: Last Line

Country Code 3

The ISO 3166-1993 (E) three-character country code, as defined by the International Organization for Standardization; for example, 'USA' for United States, 'FRA' for France, or 'UKR' for Ukraine.

Address: Last Line

Country Name

The full country name.

Address: Other Address Line

Box NameFootref 2

The name for a post office box address; for example, for ÒPO Box 95,Ó the Box Name is 'PO BOX'.

Address: Other Address Line

Box NumberFootref 2

The number for a post office box address; for example, for ÒPO Box 95,Ó the Box Number is '95'.

Address: Other Address Line

Route NameFootref 2

Route name for a rural route address. For an address of ÒRoute 5 Box 10,Ó the Route Name is 'RTE' (a standardization of ÒRouteÓ).

Address: Other Address Line

Route NumberFootref 2

Route number for a rural route address. For an address of ÒRoute 5 Box 10,Ó the Route Number is '5'.

Address: Other Address Line

Building Name

Building name, such as ÒCannon Bridge House.Ó Building names are common in the United Kingdom.

Address: Other Address Line

Complex

Building, campus, or other complex. For example, USS John F. Kennedy, Shadow Green Apartments, Cedarvale Gardens, Concordia College. You can use an the Instance field in the Output Components dialog to specify which complex should be returned in cases where an address has more than one complex.

Address: Other Address Line

Miscellaneous Address

Miscellaneous address information, such as a telephone number or an e-mail address.

In records with multiple miscellaneous fields, you can extract several by specifying which instance to use in the Name and Address operator Output Components screen. For example, in a record that has both an e-mail address and a telephone number, you can extract both items of information by using Miscellaneous Address 1 and Miscellaneous Address 2.

Error Status: Name and Address

Is Good Group

Possible values: T or F. Indicates whether the name group, address group, or name and address group was processed successfully:

  • For name groups, T = The name has been successfully parsed.

  • For address groups, T = The address has been found in a postal matching database if one is available, or has been successfully parsed if no postal database is installed.

  • For name and address groups, T = Both the name and the address have been successfully processed.

  • F = The group was not parsed successfully.

Using this flag in conjunction with another flag, such as the Is Parsed Flag, followed by the Splitter operator, enables you to isolate unsuccessfully parsed records in their own target, where you can address them separately.

Error Status: Name and Address

Is Parsed

Indicates whether the name or address was parsed:

  • T= The name or address was parsed successfully. A name or address is considered to be successfully parsed even if some parsing warnings exist.

  • F= The name or address may not be parsed.

Check the status of parsing warning flags (such as Name Warning, or City Warning).

Error Status: Name and Address

Parse Status

Postal matching software parse status code.

Error Status: Name and Address

Parse Status Description

Text description of the postal matching software parse status.

Error Status: Name Only

Is Good Name

Indicates whether the name was parsed successfully:

  • T= The name was parsed successfully. Note that a name is considered to be successfully parsed even if some parsing warnings exist.

  • F= The name may not be parsed.

Error Status: Name Only

Name Warning

Indicates whether the parser found unusual or possibly erroneous data in a name:

  • T= The parser had difficulty parsing a name or found unusual data. Check the Parse Status component for the cause of the warning.

  • F= No difficulty parsing name.

Error Checking: Address Only

Is Good Address

Indicates whether the address was processed successfully:

  • T= Successfully processed: either the address was found in the postal matching database or, if no postal matching database is installed for the country indicated by the address, the address was successfully parsed.

  • F= Not successfully processed: if a postal matching database is installed for the country indicated by the address, this indicates that address was not found in the database. If no postal matching database is available for the country, this indicates that the address may not be parsed.

This flag is easier to use if you have a mix of records from both postal matched and non postal matched countries.

Error Checking: Address Only

Is Found

Indicates whether the address is listed in the postal matching database for the country indicated by the address:

  • T= The address was found in a postal matching database.

  • F= The address was not found in a postal matching database. This can mean that the address is not a legal address, and also that postal matching is not available for the country.

This flag is true only if all of the flags listed in the following section are true. If postal matching is available, this flag is the best indicator of record quality.

Error Checking: Address Only: Is Found

City Found

Indicates whether the postal matcher found the city:

  • T= The postal matcher found the city.

  • F= The postal matcher did not find the city.

Error Checking: Address Only: Is Found

Street Name Found

Indicates whether the postal matcher found the street name:

  • T= The postal matcher found the street name.

  • F= The postal matcher did not find the street name.

Error Checking: Address Only: Is Found

Street Number Found

Indicates whether the postal matcher found the street number within a valid range of numbers for the named street:

  • T= The postal matcher found the street number.

  • F= The postal matcher did not find the street number within a valid range of numbers for the named street.

Error Checking: Address Only: Is Found

Street Components Found

Indicates whether the postal matcher found the street components, such as the Pre Directional or Post Directional:

  • T= The postal matcher found the street components.

  • F= The postal matcher did not find the street components.

Error Checking: Address Only: Is Found

Non-ambiguous Match Found

Indicates whether the postal matcher found a matching address in the postal database:

  • T= The postal matcher found a match between the input record and a single entry in the postal database.

  • F= The address is ambiguous. The postal matcher found that the address matched several postal database entries and could not make a selection. For example, if the input address is Ò100 4th Avenue,Ó but the postal database contains Ò100 4th Ave NÓ and Ò100 4th Ave S,Ó the input's missing directional causes the match to fail.

Error Checking: Address Only

City Warning

Indicates whether the parser found unusual or possibly erroneous data in a city:

  • T= The parser had difficulty parsing a city.

  • F= No difficulty parsing city.

Error Checking: Address Only

Street Warning

Indicates whether the parser found unusual or possibly erroneous data in a street address:

  • T= The parser had difficulty parsing a street address.

  • F= No difficulty parsing street address.

Error Checking: Address Only

Is Address Verifiable

Indicates whether postal matching is available for the country of the address, but does not indicate the outcome of the matching operation:

  • T= Postal matching is available for the country address.

  • F= Matching is not available for the country of the address. This indicates that a postal matching database is installed for the country indicated by the address but matching is not available for the address. No postal matching database is installed for the country indicated by the address.

Error Checking: Address Only

Address Corrected

Indicates whether the address was corrected in any way during matching. Standardization is not considered correction in this case.

  • T= Some component of the address was changed, aside from standardization. One of the other Corrected flags, indicating the component that was changed, must also be True.

  • F= No components of the address were changed, with the possible exception of standardization.

Error Checking: Address Only: Address Corrected

Postal Code Corrected

Indicates whether the postal code was corrected during matching. Correction can include the addition of a postal extension:

  • T= The postal code was corrected.

  • F= The postal code was not corrected.

Error Checking: Address Only: Address Corrected

City Corrected

Indicates whether the city name was corrected during matching. Postal code input is used to determine the city name preferred by the postal service.

  • T= The city name was corrected during matching.

  • F= The city name was not corrected.

Error Checking: Address Only: Address Corrected

Street Corrected

Indicates whether the street name was corrected during matching. Some correct street names may be changed to an alternate name preferred by the postal service.

  • T= The street name was corrected during matching.

  • F= The street name was not corrected.

Error Checking: Address Only: Address Corrected

Street Components Corrected

Indicates whether any of the street components, such as the Pre Directional or Post Directional, were corrected during matching:

  • T= Street components were corrected.

  • F= Street components were not corrected.

Error Checking: Address Only

Address Type

Type of address. The following are common examples; actual values vary with vendors of postal matching software:

  • F= Firm

  • G= General Delivery

  • H= High-rise apartment or office building

  • HD= High-rise default, where a single Zip+4 postal code applies to the entire building. The Name and Address operator can detect a finer level of postal code assignment if a floor or suite address is provided, in which case the record is treated as an H type, with a more specific Zip+4 code for that floor or suite.

  • B= Box

  • R= Rural Code

  • S= Street

  • M= Military

  • P= Post Office Box

Error Checking: Address Only

Parsing Country

Country parser that was used for the final parse of the record.

Country Specific: United States

ZIP 5

The five-digit United States postal code. This applies to United States addresses only.

Country Specific: United States

ZIP 4

The four-digit suffix that is added to the five-digit United States postal code to further specify location. This applies to United States addresses only.

Country Specific: United States

Urbanization Name

Urban unit name used in Puerto Rico. This applies to United States (Puerto Rico) addresses only.

Country Specific: United States

LACS Flag

Indicates whether the address requires a LACS conversion.

The Locatable Address Conversion System (LACS) provides new addresses when a 911 emergency system has been implemented. 911 address conversions typically involve changing rural-style addresses to city-style addresses, but on occasion they may involve renaming or renumbering existing city-style addresses.

  • T= Address needs 9-1-1 conversion (from old rural route to new street address) and should be submitted to a LACS vendor

  • F= Address does not require conversion.

This applies to United States addresses only.

Country Specific: United States

CART

Four-character USPS Carrier route. This applies to United States addresses only.

Country Specific: United States

DPBC Check Digit

Check digit for forming a delivery point bar code. This applies to United States addresses only.

Country Specific: United States: Geography

Metropolitan Statistical Area

Metropolitan Statistical Area (MSA) number. For example, '0000' indicates that the address does not lie within any MSA, and typically indicates a rural area. This applies to United States addresses only.

Country Specific: United States: Geography

Minor Census District

Minor Census District. This applies to United States addresses only.

Country Specific: United States: Geography

Latitude

Latitude in degrees north of the equator: positive for north of the equator; negative for south (always positive for North America).

Country Specific: United States: Geography

Longitude

Longitude in degrees east of the Greenwich Meridian: positive for east of GM; negative for west (always negative for North America).

Country Specific: United States: Geography

FIPS County

The three-digit county code as defined by the Federal Information Processing Standard (FIPS). This applies to United States addresses only.

Country Specific: United States: Geography

FIPS Code

The complete (state plus county) code assigned to the county by the Federal Information Processing Standard (FIPS). Because FIPS county codes are unique within a state, a complete FIPS Code includes the two-digit state code followed by the three-digit county code. This applies to United States addresses only.

Country Specific: United States: Geography

Census ID

United States Census tract and block-group number. The first six digits are the tract number; the final digit is the block-group number within the tract. These codes are used for matching to demographic-coding databases. This applies to United States addresses only.

Country Specific: United Kingdom

Locality Code

This applies to United Kingdom addresses only. For example, the following address is assigned Locality Code 23591:

Chobham Rd Knaphill Woking GU21 2TZ

Country Specific: United Kingdom

Locality Name

This applies to United Kingdom addresses only. For example, the following address is assigned Locality Name KNAPHILL:

Chobham Rd Knaphill Woking GU21 2TZ

Country Specific: United Kingdom

County Name

This applies to United Kingdom addresses only.

Country Specific: Canada

Installation Type

Type of Canadian postal installation:

  • STN= Station

  • RPO= Retail Postal Outlet

For example, for the address, ÒPO Box 7010, Scarborough ON M1S 3C6,Ó the Installation Type is 'STN'.

Country Specific: Canada

Installation Name

Name of Canadian postal installation. For example, for the address, ÒPO Box 7010, Scarborough ON M1S 3C6,Ó the Installation Name is 'AGINCOURT'.

Country Specific: Hong Kong

Delivery Office Code

This applies to Hong Kong addresses only. For example, the following address is assigned the Delivery Office Code WCH:

Oracle 39/F The Lee Gardens 33 Hysan Ave Causeway Bay

Country Specific: Hong Kong

Delivery Beat Code

This applies to Hong Kong addresses only. For example, the following address is assigned the Delivery Beat Code S06:

Oracle 39/F The Lee Gardens 33 Hysan Ave Causeway Bay

Country Specific: Hong Kong

Address 2

The second address line, assigned when both a street address and a building or floor address are present. This applies to Hong Kong only.


Footnote 1 In records where this component occurs multiple times, you can specify which instance to use in the Name and Address operator Output Attributes components dialog. For example, in records with two occurrences of Firm Name, you can extract both by adding two output attributes for Firm1 and Firm2, and then assigning a First instance to one, and then a Second instance to the other, from the Instances drop list in the components dialog.

Footnote 2 In records with dual addresses, you can specify which line is used as the Normal Address (and thus assigned to the Address component) and which is used as the Dual Address. You can specify this in the Name and Address operator Output Attributes components dialog.

Name and Address Operator Supported Countries

Table 20-3 lists supported countries for some vendors of name and address cleansing software. This list varies by vendor.

Table 20-3 Countries Supported by the Name and Address Operator

Name Postal Code Matching Software Available for Use with the Name and Address Operator

Argentina

No

Australia

Yes

Belgium

No

Brazil

Yes

Canada

Yes

Chile

No

Colombia

No

Denmark

No

France

Yes

Germany

Yes

Hong Kong

Yes

India

No

Ireland

No

Italy

No

Mexico

Yes

Malaysia

No

Netherlands

Yes

New Zealand

No

Peru

No

Philippines

No

Portugal

Yes

Singapore

Yes

South Africa

No

Spain

Yes

Sweden

No

Switzerland

No

United Arab Emirates

No

United Kingdom

Great Britain Only

United States

Yes

Venezuela

No


Country Postal Certifications

The postal matching software used by Oracle Warehouse Builder Name and Address can be certified for various countries. The certification depends on the vendor of the postal matching software you use. Possible certifications include:

United States Postal Service CASS Certification

The Coding Accuracy Support System (CASS) is a process developed by the United States Postal Service (USPS) in cooperation with the mailing industry. The process provides mailers a common platform to measure the quality of address-matching software, focusing on the accuracy of five-digit ZIP Codes, ZIP+4 Codes, delivery point codes, and carrier route codes applied to all mail. All address lists used to produce mailings for automation rates must be matched by CASS-certified software. Oracle9i Pure Name and Address is CASS-certified. The CASS report is a text file specified by the USPS and produced by Oracle9i Pure Name and Address. To meet USPS requirements, the mailer must submit a CASS report in its original form to the USPS.

Canada Post SERP Certification

Canada Post has developed a testing program called Software Evaluation and Recognition Program (SERP) which evaluates software packages for their ability to validate or validate and correct mailing lists to Canada Post requirements. Postal programs that meet SERP requirements are listed on the Canada Post Web site.

Customers who utilize Incentive Lettermail, Addressed Admail and Publications Mail must meet the Address Accuracy Program requirements. Customers can obtain a Statement of Accuracy by comparing their database to Canada Post's address data.

Australia Post AMAS Certification

The Address Matching Approval System (AMAS®) is a software approval program that has been developed by Australia Post to improve the quality of addressing. It provides a standard by which to test and measure the ability of address-matching software to:

  • Correct and match addresses against the Postal Address File (PAF)

  • Append a unique Delivery Point Identifier (DPID)® to each address record, which is a step toward the barcoding of mail.

AMAS allows companies to develop address matching software which:

  • Prepares addresses for barcode creation

  • Ensures quality addressing

  • Enables qualification for discounts on PreSort letter lodgements

PreSort Letter Service prices are conditional upon customers using AMAS Approved Software with Delivery Point Identifiers (DPIDs) being current against the latest version of the PAF.

A declaration that your mail was prepared appropriately must be made when using the Presort Lodgement Document, available from Post Offices.

You can only produce one postal report for each mapping. The postal report created is for the primary country specified in the Name and Address operator definition. Ideally, all mapped addresses should be located in the primary country selected in the Definitions page of the Name and Address Wizard. For more information, see "Definitions".

Configuring the Name and Address Server

The Name and Address operator generates PL/SQL code, which calls the UTL_NAME_ADDR package installed in the Runtime Schema. A private synonym, NAME_ADDR, is defined in the target schema to reference the UTL_NAME_ADDR package. The UTL_NAME_ADDR package calls Java packages, which send processing requests to an external Name and Address server, which then interfaces with third-party Name and Address processing libraries, such as Trillium.

You can use the server property file, NameAddr.properties, to configure server options. This file is located under the ORACLE_HOME of your Oracle Warehouse Builder Server Side Install at owb/bin/admin. The following code illustrates several important properties with their default settings.

TraceLevel=0
SocketTimeout=120
ClientThreads=16
Port=4040

The TraceLevel property is often changed to perform diagnostics on server communication and view output from the postal matching program parser. Other properties are rarely changed.

Starting and Stopping the Name and Address Server

Whenever you edit the properties file or perform table maintenance, you must stop and restart the Name and Address Server for the changes to take effect. You must stop and restart the server manually if the auto shutdown property is FALSE and the server is running when you edit the file or perform the table maintenance.

To manually stop the Name and Addresss Server:

You can automatically restart the Name and Address Server by invoking a mapping in Warehouse Builder. You can also restart the server manually.

To manually restart the Name and Address Server:

Best Practices for Using the Name and Address Operator in a Mapping

Use the following best practice tips to define and use the Name and Address Operator in your mapping.

Correcting Errors in Name and Address Data

Oracle Warehouse Builder Name and Address is built on name and address software and data supplied by third-party software vendors who specialize in name and address cleansing. This section discusses additional considerations that pertain to these products, as well as general parsing issues.

Name and Address parsing and correction can provide great improvements in data quality, but can degrade quality if not applied carefully. Name and Address parsing, postal matching, or any cleansing of incorrect data is error-prone because the problem domain of dirty data is not bounded.

Name and Address parsing, like any other type of parsing, depends on identification of keywords and patterns containing those keywords. Unlike computer languages, which contain a small set of keywords, free form Name and Address data of any region is very difficult to parse because the keyword set is so large and the set is never 100% complete. For example, one name and address cleansing software provider's pattern table for USA contains over 80,000 defined keywords and over 5,000 patterns. Keyword sets are built by analyzing millions of records, but each new data set is likely to contain some undefined keywords.

Because most free-form Name and Address record lines contain common patterns of numbers, single letters, and alphanumeric strings, parsing can often be performed based on just the alphanumeric patterns. In more difficult cases, alphanumeric patterns may be ambiguous, with possible interpretation as either an address line or name line. In other cases, a particular pattern may not be found. With computer languages, compilation errors are reported and an executable image is not produced; with Name and Address parsing errors, parsing status codes are set and Name and Address elements may not be parsed.

Using Status Codes

Because failure of the parsing engine at some point is highly probable, perform all Name and Address processing with error processing in mind. Use status codes to control the data mapping. Since the criteria for quality vary between applications, numerous flags are available to help you determine the quality of a particular record. For countries with postal matching support, the Is Good Group flag is the best measure of quality because it verifies that an address is a valid entry in a postal database. For CASS or SERP certified mailings, this flag may be the best criterion for acceptance or rejection of a record.

If you do not perform postal reporting, an address does not have to be found in a postal database to be acceptable. For example, it may not be possible to locate street intersection addresses or addresses using building names may not be locatable in a postal database, but they may still be deliverable. If the parse status is good, some benefit will be derived from standardization of address elements. If the Is Good Group flag indicates postal matching failure, several parser warning/error flags are also available to help determine the parsing status. The Is Parsed flag indicates success or failure of the parsing process. If Is Parsed indicates parsing success, you may still wish to check the parser warning flags, which indicate unusual data. If parser warnings are present, it may be desirable to check those records manually. If Is Parsed indicates parsing failure, preservation of your original data is necessary to prevent data loss.

Using Discrete Input Roles

A wide range of input roles is available to match the granularity of input data. Line based input roles are available for line oriented input, and discrete roles are available for atomic attributes such as First Name. Whenever possible, use discrete roles because they give the parser more information about the data content, and result in better parsing. Some of the discrete input roles overlap less discrete roles; for example, the combination of city, state, and postal code redefines the Last Line role. Such redefining assignments should be avoided. Where they occur, the more discrete roles take precedence because they are more specific.

Map discrete Name and Address attributes whose original value you want to retain directly to the target (map them around the Name and Address operator). This is very common for first, middle, and last names because it is not a good practice to change a first name to a standardized first name (for example, changing Peggy to Margaret). This direct mapping prevents losing name data if a personal name is parsed as a firm name. To obtain a standardized first or middle name for matching and merging (data deduplication), you can map the same attributes into the Name and Address operator. However, only the recoded outputs, such as title, gender, or standardized name, are used as output from the operator. There is no advantage to using first, middle, or last name output because it mirrors the input. This recommendation only applies to discrete data where the first, middle, and last name is already known.

Separating Good Records from Bad Records with the Splitter Operator

Consider using the splitter operator to map good records to one target and bad records to another target. If the percentage of bad records is very low, or if the data set is small, you may consider manual correction of bad records. You can flag each record manually corrected record for later mapping to the target schema. If the IS_GOOD_GRP flag is F and the records are mapped to a special target for manual correction, you can map additional components that indicate the success level of the postal match. Components are available to indicate matching success at the city, street name, street number range, or street component (for example, the pre-directional, post-directional, or street type) level.