
Setting installation and data transfer settings

You set installation and data transfer settings in the etl.properties file. The values that you specify determine how the ETL utility is installed and how your data is moved from the source database to the Empirica Healthcare Analysis database.

You must enclose all values in single quotation marks, for example, 'Y'. To provide a blank value, type two single quotation marks ('').

Do not comment out any lines in the etl.properties file.
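You can check the quoting and no-comment rules mechanically before you run the installation. The following Python sketch is a hypothetical helper, not part of the ETL utility. It assumes that each setting in etl.properties is written on its own line as name='value' and that a commented-out line begins with #. This section specifies neither detail, so adjust the pattern to match your copy of template_etl.properties.

```python
import re

# Assumed line format: name='value'. The template's exact syntax is not
# documented in this section; adjust the pattern if your file differs.
SETTING = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*?)\s*$")


def parse_etl_properties(text):
    """Return (settings, errors) for the text of an etl.properties file.

    settings maps each property name to its value without the quotes.
    """
    settings, errors = {}, []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            # Settings must not be commented out; flag commented assignments.
            if "=" in stripped:
                errors.append(f"line {lineno}: setting is commented out")
            continue
        match = SETTING.match(line)
        if not match:
            errors.append(f"line {lineno}: not a name='value' setting")
            continue
        name, raw = match.groups()
        # Every value, including a blank one (''), must be in single quotes.
        if len(raw) < 2 or not (raw.startswith("'") and raw.endswith("'")):
            errors.append(f"line {lineno}: value for {name} is not in single quotes")
            continue
        settings[name] = raw[1:-1]
    return settings, errors
```

For example, a line such as load_drug_era=Y is reported as unquoted, and source_connect='' is accepted as a blank value.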

  1. In a command shell, set your current directory to the location where you unpacked the installation ZIP file, Healthcare-CDM4_ETL_xxx.zip, where xxx refers to the build number.

    For example:

    $ cd /u01/stage/cdm4etl

  2. Type the following command to copy the template_etl.properties file to a new file named etl.properties:

    $ cp template_etl.properties etl.properties

  3. Open the etl.properties file in a text editor, such as vi.
  4. Provide values for the following settings. These settings are used when the ETL utility is installed.

    Note: If you provide a non-blank value for the source_connect property, the installation attempts to use a database link for the import, even if you specify non-blank values for properties required by a file import. If source_connect is blank, the installation attempts to import using data files.
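    The rule in the note can be expressed as a small decision function. This Python sketch is illustrative only. It assumes that you have already read the settings into a dictionary of property names and unquoted values, and the name import_mode is hypothetical. The file-import properties it checks (data_filename, vocab_filename, and import_directory) are described with the import settings later in this procedure.

```python
# Properties that a file import needs (see the import settings table).
FILE_SETTINGS = ("data_filename", "vocab_filename", "import_directory")


def import_mode(settings):
    """Return (mode, warnings), where mode is 'database link' or 'data files'."""
    warnings = []
    if settings.get("source_connect", ""):
        # A non-blank source_connect selects a database link, even when
        # file-import settings are also filled in.
        unused = [n for n in FILE_SETTINGS if settings.get(n, "")]
        if unused:
            warnings.append("not used for a database-link import: " + ", ".join(unused))
        return "database link", warnings
    # Blank source_connect: a file import needs all file settings non-blank.
    missing = [n for n in FILE_SETTINGS if not settings.get(n, "")]
    if missing:
        warnings.append("required for a file import: " + ", ".join(missing))
    return "data files", warnings
```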

    Installation settings

    Property

    Non-blank value required

    Value to provide

    source_connect

    Yes, if you plan to install using a database link.

    Connect string from the Empirica Healthcare Analysis database to the source database.

    The connect string must be one of the following:

    • A net service name that is specified in the tnsnames.ora file on the Empirica Healthcare Analysis database server and that indicates how to connect to the source server.
    • An EZCONNECT connection string in the following format: hostname:port/service

    If you provide a non-blank value, the ETL utility imports data using a database link.

    Example for importing using a database link: 'ORCLSRC'

    Value to use for importing using files: ''

    source_global_name

    Yes, if you plan to install using a database link, and if the global_names initialization parameter in the Empirica Healthcare Analysis database is set to TRUE.

    Global name of the source database.

    Example when global_names is TRUE: 'ORCL.world'

    Value to use for importing using a database link when the global_names initialization parameter in the Empirica Healthcare Analysis database is set to FALSE: ''

    Value to use for importing using files: ''

    source_schema

    Yes

    Name of the account that contains the data tables in the source database.

    Example: 'CDM_SOURCE'

    vocab_schema

    Yes

    Name of the account that contains the CONCEPT table in the source database.

    If the CONCEPT table is in the same account as the data, provide the name of the account that contains the data tables.

    Example: 'CDM_VOCABULARY'

    source_tablespaces

    Yes

    Comma-delimited list of the tablespace(s) that are in the source database and that contain the data and vocabulary accounts that you are importing.

    Example: 'CDM4DATA, CDM4VOC'

    dest_connect

    Yes

    Connect string for the Empirica Healthcare Analysis database.

    Use the TNS connection that you use to connect to the Empirica Healthcare Analysis database from the machine where you install the ETL utility.

    Example: 'ORCLDEST'

    dest_schema

    Yes

    Name of the destination account that will be created in the Empirica Healthcare Analysis database to hold the data and vocabulary information from the source database.

    The name must:

    • Not already exist.
    • Contain 30 or fewer characters.
    • Begin with an alphabetic character.

    The name can contain only the following characters:

    • Alphanumeric characters
    • Underscores (_)
    • Dollar signs ($)
    • Number signs (#)

    Example: 'CDM_DEST'

    dest_tablespace

    Yes

    Name of the default tablespace for the destination account.

    Example: 'CDMDEST'

    healthcare_master

    Yes

    Name of the existing account for the Empirica Healthcare Analysis database. You created this account when you installed the Empirica Healthcare Analysis application.

    This account is different from the destination account that you are creating to hold the data that you transfer. You specify the name of the destination account using the dest_schema property.

    Example: 'HEALTHCARE'
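    Before you install, you can pre-check the installation settings against the rules in the table above. The following Python sketch is a hypothetical helper, not part of the ETL utility. It assumes ASCII letters and digits for "alphanumeric" and a case-insensitive comparison against existing account names, which you must look up in the destination database yourself. You also supply the global_names flag, which reflects the global_names initialization parameter in the Empirica Healthcare Analysis database.

```python
import re

# dest_schema rules from the table: 30 or fewer characters, begins with a
# letter, and contains only letters, digits, _, $, and #. ASCII is assumed.
NAME_RULE = re.compile(r"[A-Za-z][A-Za-z0-9_$#]{0,29}")

# Installation settings that the table marks as always requiring a value.
REQUIRED_INSTALL = (
    "source_schema", "vocab_schema", "source_tablespaces",
    "dest_connect", "dest_schema", "dest_tablespace", "healthcare_master",
)


def check_dest_schema(name, existing_accounts=()):
    """Return a list of problems with a proposed dest_schema value."""
    problems = []
    if not NAME_RULE.fullmatch(name):
        problems.append("must be 1-30 characters, start with a letter, "
                        "and use only letters, digits, _, $, or #")
    # existing_accounts is a hypothetical list of account names from the
    # destination database; the comparison ignores case.
    if name.upper() in {a.upper() for a in existing_accounts}:
        problems.append("account already exists")
    return problems


def missing_install_settings(settings, global_names=False):
    """List installation settings that are blank but required.

    global_names mirrors the global_names initialization parameter
    (True or False) in the destination database.
    """
    required = list(REQUIRED_INSTALL)
    # source_global_name matters only for a database link with global_names TRUE.
    if settings.get("source_connect", "") and global_names:
        required.append("source_global_name")
    return [n for n in required if not settings.get(n, "")]
```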

  5. Provide values for the following settings. These settings are used when data is imported into the Empirica Healthcare Analysis database.

    The PERSON and OBSERVATION_PERIOD tables are always imported.
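    The following Python sketch shows which source tables a given set of load_* flags selects, with PERSON and OBSERVATION_PERIOD always included. The function name is hypothetical. The sketch assumes that the flag values are read without their quotes and are the uppercase letters 'Y' or 'N'.

```python
# Optional tables and the properties that control them (see the table below).
OPTIONAL_TABLES = {
    "load_drug_era": "DRUG_ERA",
    "load_drug_exposure": "DRUG_EXPOSURE",
    "load_condition_era": "CONDITION_ERA",
    "load_condition_occurrence": "CONDITION_OCCURRENCE",
    "load_procedure_occurrence": "PROCEDURE_OCCURRENCE",
}


def tables_to_import(settings):
    """Return the source tables that the import would read, in table order."""
    tables = ["PERSON", "OBSERVATION_PERIOD"]  # always imported
    tables += [t for flag, t in OPTIONAL_TABLES.items() if settings.get(flag) == "Y"]
    return tables
```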

    Import settings

    Property

    Non-blank value required

    Value to provide

    data_filename

    Yes, if you plan to import using data files.

    Name of the file containing the data to be moved to the Empirica Healthcare Analysis database.

    Example: 'data_export.dmp'

    Value to use for importing using a database link: ''

    vocab_filename

    Yes, if you plan to import using data files.

    Name of the file containing the vocabulary information. If the CONCEPT table is contained in the data export file, use the value that you provided for data_filename.

    Example: 'voc_export.dmp'

    Value to use for importing using a database link: ''

    import_directory

    Yes, if you plan to import using data files.

    Name of the directory from which you are importing data.

    Example: 'ETL_DIR'

    Value to use for importing using a database link: ''

    load_drug_era

    Yes

    'Y' or 'N', indicating whether to import the DRUG_ERA table.

    You are required to import at least one drug table. Therefore, at least one of the load_drug_era and load_drug_exposure properties must be set to 'Y'.

    load_drug_exposure

    Yes

    'Y' or 'N', indicating whether to import the DRUG_EXPOSURE table.

    You are required to import at least one drug table. At least one of the load_drug_era and load_drug_exposure properties must be set to 'Y'.

    load_condition_era

    Yes

    'Y' or 'N', indicating whether to import the CONDITION_ERA table.

    You are required to import exactly one condition table. Therefore, if load_condition_era is 'Y', load_condition_occurrence must be 'N', and vice versa.

    load_condition_occurrence

    Yes

    'Y' or 'N', indicating whether to import the CONDITION_OCCURRENCE table.

    You are required to import exactly one condition table. Therefore, if load_condition_occurrence is 'Y', load_condition_era must be 'N', and vice versa.

    load_procedure_occurrence

    Yes

    'Y' or 'N', indicating whether to import the PROCEDURE_OCCURRENCE table.
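    The constraints in the import settings table (at least one drug table, exactly one condition table) can be checked before the import starts. This Python sketch is hypothetical and assumes uppercase 'Y' and 'N' values read without their quotes.

```python
FLAGS = (
    "load_drug_era", "load_drug_exposure",
    "load_condition_era", "load_condition_occurrence",
    "load_procedure_occurrence",
)


def check_table_flags(settings):
    """Return a list of violations of the import-table rules."""
    problems = []
    for name in FLAGS:
        if settings.get(name) not in ("Y", "N"):
            problems.append(f"{name} must be 'Y' or 'N'")
    # At least one drug table must be imported.
    if "Y" not in (settings.get("load_drug_era"), settings.get("load_drug_exposure")):
        problems.append("set at least one of load_drug_era and load_drug_exposure to 'Y'")
    # Exactly one condition table must be imported: the two flags must differ.
    era = settings.get("load_condition_era") == "Y"
    occurrence = settings.get("load_condition_occurrence") == "Y"
    if era == occurrence:
        problems.append("set exactly one of load_condition_era and "
                        "load_condition_occurrence to 'Y'")
    return problems
```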

  6. Provide values for the following settings. These settings are used when data is transformed for use in the Empirica Healthcare Analysis database.

    Transform settings

    Property

    Non-blank value required

    Value to provide

    commit_count

    Yes

    Size of the batches in which records in each table are processed, grouped by the PERSON_ID column. The value must be a number greater than or equal to 100.

    A smaller value can increase the total data transfer time, but lets you check progress more frequently in the ETL_STATUS table. Consider setting this value to approximately 1 percent of the number of rows in the PERSON table.

    Example: '100000'

    compute_age

    Yes

    'Y' or 'N', indicating whether to populate the AGE and AGE_GROUP columns in the PERSON table.

    • If the value is 'Y', values for the AGE column are populated during the transformation process in one of the following ways:
      • Values are derived using the default age calculation provided by the ETL utility when age_column is ''.

        The default age calculation subtracts YEAR_OF_BIRTH from the year part of ENROLLMENT_START_DATE.

      • Values are populated using an Age column that you specify using the age_column property.

        If your PERSON table deviates from the default Common Data Model Version 4 and includes a column for age, you can use that column instead of the age calculation. Specify the column name in the age_column property, which appears next in this table.

    • If the value is 'Y', values for the AGE_GROUP column are derived based upon the value in the AGE column.

    Note: If the value is 'N' and you specify a column for age_column, the AGE column is populated with values from the column, but AGE_GROUP is not calculated.

    age_column

    No

    Name of the column in the PERSON table that contains age. Age values must be in years.

    If you do not have an age column but want the configuration to include a calculated age column, set compute_age to 'Y' and set age_column to ''. The transformation process then creates the AGE column and derives its values, in years.

    If you provide a non-blank value:

    • The column datatype must be numeric.
    • The column name must:
      • Contain 30 or fewer characters.
      • Begin with an alphabetic character.

    The column name can contain only the following characters:

    • Alphanumeric characters
    • Underscores (_)
    • Dollar signs ($)
    • Number signs (#)

    Example: 'AGE_AT_ENROLLMENT'

    parallel_update

    Yes

    'Y' or 'N', indicating whether to use parallel updates during the transformation process.

    The update statements used to perform the data transformation generally perform better using parallel execution. However, in some cases, bottlenecks can occur, so the utility allows you to turn off parallel execution.
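    The numeric and age rules in the transform settings table can be sketched in Python as follows. All function names are hypothetical. commit_count is checked as the unquoted string read from the file. The suggested value applies the 1 percent guideline but never drops below the documented minimum of 100. default_age reproduces only the documented default calculation, the year part of ENROLLMENT_START_DATE minus YEAR_OF_BIRTH. The AGE_GROUP bands are not described in this section, so the sketch does not attempt them.

```python
from datetime import date


def check_commit_count(value):
    """commit_count must be a whole number greater than or equal to 100."""
    return value.isascii() and value.isdecimal() and int(value) >= 100


def suggested_commit_count(person_rows):
    """About 1 percent of the PERSON rows, but never below the minimum of 100."""
    return max(100, person_rows // 100)


def default_age(year_of_birth: int, enrollment_start_date: date) -> int:
    """Default calculation: year of ENROLLMENT_START_DATE minus YEAR_OF_BIRTH."""
    return enrollment_start_date.year - year_of_birth


def age_outcome(compute_age, age_column):
    """Summarize how AGE and AGE_GROUP are populated for a pair of settings."""
    if compute_age == "Y":
        source = f"column {age_column}" if age_column else "default calculation"
        return {"AGE": source, "AGE_GROUP": "derived from AGE"}
    if age_column:
        return {"AGE": f"column {age_column}", "AGE_GROUP": "not calculated"}
    # compute_age 'N' with a blank age_column is not described in this section.
    return {"AGE": "not described", "AGE_GROUP": "not described"}
```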

  7. Provide values for the following settings. These settings are used when configurations are created in the Empirica Healthcare Analysis database.

    Configuration settings

    Property

    Non-blank value required

    Value to provide

    config_name_era

    Yes

    Name to use for the Drug Era configuration. The name must not be used by an existing configuration.

    The Drug Era configuration is created only when the load_drug_era property is set to 'Y'.

    If load_drug_era is set to 'Y', you must provide a non-blank value.

    The values for config_name_era and config_name_exposure must:

    • Be different.
    • Contain 100 or fewer characters.

    The values can contain only the following characters:

    • Alphanumeric characters
    • Underscores (_)
    • Parentheses ( and )
    • Spaces
    • Plus signs (+) and minus signs (-)

    Example: 'CDM Era'

    config_name_exposure

    Yes

    Name to use for the Drug Exposure configuration. The name must not be used by an existing configuration.

    The Drug Exposure configuration is created only when the load_drug_exposure property is set to 'Y'.

    If load_drug_exposure is set to 'Y', you must provide a non-blank value.

    The values for config_name_era and config_name_exposure must:

    • Be different.
    • Contain 100 or fewer characters.

    The values can contain only the following characters:

    • Alphanumeric characters
    • Underscores (_)
    • Parentheses ( and )
    • Spaces
    • Plus signs (+) and minus signs (-)

    Example: 'CDM Exposure'

    parallel_ddl

    Yes

    'Y' or 'N', indicating whether to use parallel processes for creating tables and indexes.

    The performance for creating tables and indexes is generally better when you use parallel execution. However, in some cases, bottlenecks can occur, so the utility allows you to turn off parallel execution.
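    The configuration-name rules can be pre-checked with a sketch like the following. It is a hypothetical helper that assumes ASCII alphanumerics and a case-sensitive comparison with existing configuration names, which you would look up yourself. It enforces a name only when the matching load_* flag is 'Y', following the text of each entry. Because the table marks both properties as requiring a non-blank value, providing both names is the safer choice.

```python
import re

# Allowed: letters, digits, _, ( ), space, +, and -; 1 to 100 characters.
CONFIG_NAME = re.compile(r"[A-Za-z0-9_() +\-]{1,100}")

PAIRS = (
    ("load_drug_era", "config_name_era"),
    ("load_drug_exposure", "config_name_exposure"),
)


def check_config_names(settings, existing_names=()):
    """Return a list of problems with config_name_era and config_name_exposure."""
    problems = []
    taken = set(existing_names)  # hypothetical list of existing configurations
    for flag, prop in PAIRS:
        name = settings.get(prop, "")
        if settings.get(flag) == "Y" and not name:
            problems.append(f"{prop} is required when {flag} is 'Y'")
        if name and not CONFIG_NAME.fullmatch(name):
            problems.append(f"{prop} has invalid characters or is over 100 characters")
        if name and name in taken:
            problems.append(f"{prop} is already used by an existing configuration")
    era = settings.get("config_name_era", "")
    exposure = settings.get("config_name_exposure", "")
    if era and era == exposure:
        problems.append("config_name_era and config_name_exposure must be different")
    return problems
```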

Copyright © 2015 Oracle and/or its affiliates. All rights reserved.