Skip Headers

Oracle® Database Utilities
10g Release 1 (10.1)

Part Number B10825-01
Go to Documentation Home
Home
Go to Book List
Book List
Go to Table of Contents
Contents
Go to Index
Index
Go to Master Index
Master Index
Go to Feedback page
Feedback

Go to previous page
Previous
Go to next page
Next
View PDF

1 Overview of Oracle Data Pump

Oracle Data Pump technology enables very high-speed movement of data and metadata from one database to another.

This chapter discusses the following topics:

Data Pump Components

Oracle Data Pump is made up of three distinct parts:

The Data Pump clients, expdp and impdp, invoke the Data Pump Export utility and Data Pump Import utility, respectively. They provide a user interface that closely resembles the original export (exp) and import (imp) utilities.

The expdp and impdp clients use the procedures provided in the DBMS_DATAPUMP PL/SQL package to execute export and import commands, using the parameters entered at the command-line. These parameters enable the exporting and importing of data and metadata for a complete database or subsets of a database.


Note:

All Data Pump Export and Import processing, including the reading and writing of dump files, is done on the server. This means that the data base administrator (DBA) must create directory objects. See Default Locations for Dump, Log, and SQL Files for more information about directory objects.

When data is moved, Data Pump automatically uses either direct path load (or unload) or the external tables mechanism, or a combination of both. When metadata is moved, Data Pump uses functionality provided by the DBMS_METADATA PL/SQL package. The DBMS_METADATA package provides a centralized facility for the extraction, manipulation, and resubmission of dictionary metadata.

The DBMS_DATAPUMP and DBMS_METADATA PL/SQL packages can be used independently of the Data Pump clients.


See Also:

PL/SQL Packages and Types Reference for descriptions of the DBMS_DATAPUMP and DBMS_METADATA packages

What New Features Do Data Pump Export and Import Provide?

The new Data Pump Export and Import utilities (invoked with the expdp and impdp commands, respectively) have a similar look and feel to the original Export (exp) and Import (imp) utilities, but they are completely separate. Dump files generated by the new Data Pump Export utility are not compatible with dump files generated by the original Export utility. Therefore, files generated by the original Export (exp) utility cannot be imported with the Data Pump Import (impdp) utility.

Oracle recommends that you use the new Data Pump Export and Import utilities because they support all Oracle Database 10g features, except for XML schemas. Original Export and Import support the full set of Oracle database release 9.2 features. Also, the design of Data Pump Export and Import results in greatly enhanced data movement performance over the original Export and Import utilities.


Note:

See Chapter 20, " Original Export and Import" for information about situations in which you should still use the original Export and Import utilities.

The following are the major new features that provide this increased performance, as well as enhanced ease of use:

The remainder of this chapter discusses Data Pump technology as it is implemented in the Data Pump Export and Import utilities. To make full use of Data Pump technology, you must be a privileged user. Privileged users have the EXP_FULL_DATABASE and IMP_FULL_DATABASE roles. Nonprivileged users have neither.

Privileged users can do the following:

How Does Data Pump Access Data?

Data Pump supports two access methods to load and unload table row data: direct path and external tables. Because both methods support the same external data representation, data that is unloaded with one method can be loaded using the other method. Data Pump automatically chooses the fastest method appropriate for each table.

Data Pump also uses functionality provided in the DBMS_METADATA PL/SQL package to handle all operations involving metadata, including complete extraction, transformation, and re-creation of all database object definitions.

Direct Path Loads and Unloads

The Oracle database has provided direct path unload capability for export operations since Oracle release 7.3 and a direct path API for OCI since Oracle8i. Data Pump technology enhances direct path technology in the following ways:

  • Support of a direct path, proprietary format unload.

  • Improved performance through elimination of unnecessary conversions. This is possible because the direct path internal stream format is used as the format stored in the Data Pump dump files.

  • Support of additional datatypes and transformations.

Data Pump uses direct path load and unload when the structure of a table allows it.

In the following circumstances, Data Pump cannot use direct path loading:

  • A global index on multipartition tables exists during a single-partition load. This includes object tables that are partitioned.

  • A domain index exists for a LOB column.

  • A table is in a cluster.

  • A table has an active trigger.

  • A table has fine-grained access control enabled in insert mode.

  • A table contains BFILE columns or columns of opaque types.

  • A referential integrity constraint is present.

  • A table contains VARRAY columns with an embedded opaque type.

If any of these conditions exist for a table, Data Pump uses external tables rather than direct path to move the data for that table.

External Tables

The Oracle database has provided an external tables capability since Oracle9i that allows reading of data sources external to the database. As of Oracle Database 10g, the external tables feature also supports writing database data to destinations external to the database. Data Pump provides an external tables access driver (ORACLE_DATAPUMP) that reads and writes files. The format of the files is the same format used with the direct path method. This allows for high-speed loading and unloading of database tables. Data Pump uses external tables as the data access mechanism in the following situations:

  • Loading and unloading very large tables and partitions in situations where parallel SQL can be used to advantage

  • Loading tables with global or domain indexes defined on them, including partitioned object tables

  • Loading tables with active triggers or clustered tables

  • Loading and unloading tables with encrypted columns

  • Loading tables with fine-grained access control enabled for inserts

  • Loading tables that are partitioned differently at load time and unload time

Accessing Data Over a Network

You can perform Data Pump exports and imports over the network, rather than locally.

When you perform an import over the network, the source is another database, not a dump file set.

When you perform an export over the network, the source can be a read-only database on another system. Dump files are written out on the local system just as they are with a local (non-networked) export.


See Also:

  • NETWORK_LINK for information about performing exports over the network

  • NETWORK_LINK for information about performing imports over the network


What Happens During Execution of a Data Pump Job?

Data Pump jobs use a master table, a master process, and worker processes to perform the work and keep track of progress.

Coordination of a Job

For every Data Pump Export job and Data Pump Import job, a master process is created. The master process controls the entire job, including communicating with the clients, creating and controlling a pool of worker processes, and performing logging operations.

Tracking Progress Within a Job

While the data and metadata are being transferred, a master table is used to track the progress within a job. The master table is implemented as a user table within the database. The specific function of the master table for export and import jobs is as follows:

  • For export jobs, the master table records the location of database objects within a dump file set. Export builds and maintains the master table for the duration of the job. At the end of an export job, the content of the master table is written to a file in the dump file set.

  • For import jobs, the master table is loaded from the dump file set and is used to control the sequence of operations for locating objects that need to be imported into the target database.

The master table is created in the schema of the current user performing the export or import operation. Therefore, that user must have sufficient tablespace quota for its creation. The name of the master table is the same as the name of the job that created it. Therefore, you cannot explicitly give a Data Pump job the same name as a preexisting table or view.

For all operations, the information in the master table is used to restart a job.

The master table is either retained or dropped, depending on the circumstances, as follows:

  • Upon successful job completion, the master table is dropped.

  • If a job is stopped using the STOP_JOB interactive command, the master table is retained for use in restarting the job.

  • If a job is killed using the KILL_JOB interactive command, the master table is dropped and the job cannot be restarted.

  • If a job terminates unexpectedly, the master table is retained. You can delete it if you do not intend to restart the job.


See Also:

JOB_NAME for more information about how job names are formed.

Filtering Data During a Job

Within the master table, specific objects are assigned attributes such as name or owning schema. Objects also belong to a class of objects (such as TABLE, INDEX, or DIRECTORY). The class of an object is called its object type. You can use the EXCLUDE and INCLUDE parameters to restrict the types of objects that are exported and imported. The objects can be based upon the name of the object or the name of the schema that owns the object.

Transforming Metadata During a Job

When you are moving data from one database to another, it is often useful to perform transformations on the metadata for remapping storage between tablespaces or redefining the owner of a particular set of objects. This is done using the following Data Pump Import parameters: REMAP_DATAFILE, REMAP_SCHEMA, REMAP_TABLESPACE, and TRANSFORM.

Maximizing Job Performance

To improve throughput of a job, you can use the PARALLEL parameter to set a degree of parallelism that takes maximum advantage of current conditions. For example, to limit the effect of a job on a production system, the database administrator (DBA) might wish to restrict the parallelism. The degree of parallelism can be reset at any time during a job. For example, PARALLEL could be set to 2 during production hours to restrict a particular job to only two degrees of parallelism, and during nonproduction hours it could be reset to 8. The parallelism setting is enforced by a master process, which allocates work to be executed to a set of worker processes that perform the data and metadata processing within an operation. These worker processes operate in parallel.


Note:

The ability to adjust the degree of parallelism is available only in the Enterprise Edition of Oracle Database.

Loading and Unloading of Data

The worker processes are the ones that actually unload and load metadata and table data in parallel. The number of worker processes created is equal to the value supplied for the PARALLEL command-line parameter. The number of worker processes can be reset throughout the life of a job.


Note:

The value of PARALLEL is restricted to 1 in the Standard Edition of Oracle Database 10g.

When a worker process is assigned the task of loading or unloading a very large table or partition, it may choose to use the external tables access method to make maximum use of parallel execution. In such a case, the worker process becomes a parallel execution coordinator. The actual loading and unloading work is divided among some number of parallel I/O execution processes (sometimes called slaves) allocated from the instancewide pool of parallel I/O execution processes.

Monitoring Job Status

During the execution of a job, a log file will be optionally written. The log file summarizes the progress of the job and any errors that were encountered along the way. Whereas the log file records the completion status of the job, real-time status can be obtained by using the STATUS command in interactive mode of Data Pump Export or Import. Cumulative status for the job is returned, along with a description of the current operation. In addition, an estimate for the completion percentage of the current job is also returned. If the job is done, the state will be listed as Stopped or Completed.


See Also:

  • STATUS for information about the STATUS command in interactive mode of Data Pump Export

  • STATUS for information about the STATUS command in interactive mode of Data Pump Import


An alternative way to determine job status or to get other information about Data Pump jobs, would be to query the DBA_DATAPUMP_JOBS, USER_DATAPUMP_JOBS, or DBA_DATAPUMP_SESSIONS views.

The DBA_DATAPUMP_JOBS and USER_DATAPUMP_JOBS Views

The DBA_DATAPUMP_JOBS and USER_DATAPUMP_JOBS views identify all active Data Pump jobs, regardless of their state, on an instance (or on all instances for Real Application Clusters). They also show all Data Pump master tables not currently associated with an active job. You can use the job information to attach to an active job. Once you are attached to the job, you can stop it, change its parallelism, or monitor its progress. You can use the master table information to restart a stopped job or to remove any master tables that are no longer needed.

Table 1-1 describes the columns in the DBA_DATAPUMP_JOBS view and the USER_DATAPUMP_JOBS view.

Table 1-1 DBA_DATAPUMP_JOBS View and USER_DATAPUMP_JOBS View

Column Datatype Description
OWNER_NAME
VARCHAR2(30)
User who initiated the job (valid only for DBA_DATAPUMP_JOBS)
JOB_NAME
VARCHAR2(30)
User-supplied name for the job (or the default name generated by the server)
OPERATION
VARCHAR2(30)
Type of job
JOB_MODE
VARCHAR2(30)
Mode of job
STATE
VARCHAR2(30)
State of the job
DEGREE
NUMBER
Number of worker processes performing the operation
ATTACHED_SESSIONS
NUMBER
Number of sessions attached to the job


Note:

The information returned is obtained from dynamic performance views associated with the executing jobs and from the database schema information concerning the master tables. A query on these views can return multiple rows for a single Data Pump job (same owner and job name) if the query is executed while the job is transitioning between an Executing state and the Not Running state.

The DBA_DATAPUMP_SESSIONS View

The DBA_DATAPUMP_SESSIONS view identifies the user sessions that are attached to a job. The information in this view is useful for determining why a stopped operation has not gone away.

Table 1-2 describes the columns in the DBA_DATAPUMP_SESSIONS view.

Table 1-2 The DBA_DATAPUMP_SESSIONS View

Column Datatype Description
OWNER_NAME
VARCHAR2(30)
User who initiated the job.
JOB_NAME
VARCHAR2(30)
User-supplied name for the job (or the default name generated by the server).
SADDR
RAW(4) (RAW(8) on 64-bit systems)
Address of session attached to the job. Can be used with V$SESSION view.

Monitoring the Progress of Executing Jobs

Data Pump operations that transfer table data (export and import) maintain an entry in the V$SESSION_LONGOPS dynamic performance view indicating the job progress (in megabytes of table data transferred). The entry contains the estimated transfer size and is periodically updated to reflect the actual amount of data transferred.


Note:

The usefulness of the estimate value for export operations depends on the type of estimation requested when the operation was initiated, and it is updated as required if exceeded by the actual transfer amount. The estimate value for import operations is exact.

The V$SESSION_LONGOPS columns that are relevant to a Data Pump job are as follows:

  • USERNAME - job owner

  • OPNAME - job name

  • TARGET_DESC - job operation

  • SOFAR - megabytes (MB) transferred thus far during the job

  • TOTALWORK - estimated number of megabytes (MB) in the job

  • UNITS - 'MB'

  • MESSAGE - a formatted status message of the form:

    '<job_name>: <operation_name> : nnn out of mmm MB done'
    

File Allocation

There are three types of files managed by Data Pump jobs:

An understanding of how Data Pump allocates and handles these files will help you to use Export and Import to their fullest advantage.

Specifying Files and Adding Additional Dump Files

For export operations, you can specify dump files at the time the job is defined, as well as at a later time during the operation. For example, if you discover that space is running low during an export operation, you can add additional dump files by using the Data Pump Export ADD_FILE command in interactive mode.

For import operations, all dump files must be specified at the time the job is defined.

Log files and SQL files will overwrite previously existing files. Dump files will never overwrite previously existing files. Instead, an error will be generated.

Default Locations for Dump, Log, and SQL Files

Because Data Pump is server-based, rather than client-based, dump files, log files, and SQL files are accessed relative to server-based directory paths. Data Pump requires you to specify directory paths as directory objects. A directory object maps a name to a directory path on the file system.

For example, the following SQL statement creates a directory object named dpump_dir1 that is mapped to a directory located at /usr/apps/datafiles.

SQL> CREATE DIRECTORY dpump_dir1 AS '/usr/apps/datafiles';

The reason that a directory object is required is to ensure data security and integrity. For example:

  • If you were allowed to specify a directory path location for an input file, you might be able to read data that the server has access to, but to which you should not.

  • If you were allowed to specify a directory path location for an output file, the server might overwrite a file that you might not normally have privileges to delete.

Before you can run Data Pump Export or Data Pump Import, a directory object must be created by a database administrator (DBA) or by any user with the CREATE ANY DIRECTORY privilege. Then, when you are using Export or Import, you specify the directory object with the DIRECTORY parameter.

After a directory is created, the user creating the directory object needs to grant READ or WRITE permission on the directory to other users. For example, to allow the Oracle database to read and write files on behalf of user hr in the directory named by dpump_dir1, the DBA must execute the following command:

SQL> GRANT READ, WRITE ON DIRECTORY dpump_dir1 TO hr;

Note that READ or WRITE permission to a directory object only means that the Oracle database will read or write that file on your behalf. You are not given direct access to those files outside of the Oracle database unless you have the appropriate operating system privileges. Similarly, the Oracle database requires permission from the operating system to read and write files in the directories.

Data Pump Export and Import use the following order of precedence to determine a file's location:

  1. If a directory object is specified as part of the file specification, then the location specified by that directory object is used. (The directory object must be separated from the filename by a colon.)

  2. If a directory object is not specified for a file, then the directory object named by the DIRECTORY parameter is used.

  3. If a directory object is not specified, and if no directory object was named by the DIRECTORY parameter, then the value of the environment variable, DATA_PUMP_DIR, is used. This environment variable is defined using operating system commands on the client system where the Data Pump Export and Import utilities are run. The value assigned to this client-based environment variable must be the name of a server-based directory object, which must first be created on the server system by a DBA. For example, the following SQL statement creates a directory object on the server system. The name of the directory object is DUMP_FILES1, and it is located at '/usr/apps/dumpfiles1'.

    SQL> CREATE DIRECTORY DUMP_FILES1 AS '/usr/apps/dumpfiles1';
    
    

    Then, a user on a UNIX-based client system using csh can assign the value DUMP_FILES1 to the environment variable DATA_PUMP_DIR. The DIRECTORY parameter can then be omitted from the command line. The dump file employees.dmp, as well as the log file export.log, will be written to '/usr/apps/dumpfiles1'.

    %setenv DATA_PUMP_DIR DUMP_FILES1
    %expdp hr/hr TABLES=employees DUMPFILE=employees.dmp
    
    
  4. If none of the previous three conditions yields a directory object and you are a privileged user, then Data Pump attempts to use the value of the default server-based directory object, DATA_PUMP_DIR. It is important to understand that Data Pump does not create the DATA_PUMP_DIR directory object; it merely attempts to use its value when a privileged user has not provided a directory object using any of the mechanisms previously described. This default directory object must first be created by a DBA. Do not confuse this with the client-based environment variable of the same name.

Using Directory Objects When Automatic Storage Management Is Enabled

If you use Data Pump Export or Import with Automatic Storage Management (ASM) enabled, you must define the directory object used for the dump file so that the ASM disk-group name is used (instead of an operating system directory path). A separate directory object, which points to an operating system directory path, should be used for the log file. For example, you would create a directory object for the ASM dump file as follows:

SQL> CREATE or REPLACE DIRECTORY dpump_dir as '+DATAFILES/';

Then you would create a separate directory object for the log file:

SQL> CREATE or REPLACE DIRECTORY dpump_log as '/homedir/user1/';

To enable user hr to have access to these directory objects, you would assign the necessary privileges, for example:

SQL> GRANT READ, WRITE ON DIRECTORY dpump_dir TO hr;
SQL> GRANT READ, WRITE ON DIRECTORY dpump_log TO hr;

You would then use the following Data Pump Export command:

> expdp hr/hr DIRECTORY=dpump_dir DUMPFILE=hr.dmp LOGFILE=dpump_log:hr.log

See Also:


Setting Parallelism

For export and import operations, the parallelism setting (specified with the PARALLEL parameter) should be less than or equal to the number of dump files in the dump file set. If there are not enough dump files, the performance will not be optimal because multiple threads of execution will be trying to access the same dump file.

The PARALLEL parameter is valid only in the Enterprise Edition of Oracle Database 10g.

Using Substitution Variables

Instead of, or in addition to, listing specific filenames, you can use the DUMPFILE parameter during export operations to specify multiple dump files, by using a substitution variable (%U) in the filename. This is called a dump file template. The new dump files are created as they are needed, beginning with 01 for %U, then using 02, 03, and so on. Enough dump files are created to allow all processes specified by the current setting of the PARALLEL parameter to be active. If one of the dump files becomes full because its size has reached the maximum size specified by the FILESIZE parameter, it is closed, and a new dump file (with a new generated name) is created to take its place.

If multiple dump file templates are provided, they are used to generate dump files in a round-robin fashion. For example, if expa%U, expb%U, and expc%U were all specified for a job having a parallelism of 6, the initial dump files created would be expa01.dmp, expb01.dmp, expc01.dmp, expa02.dmp, expb02.dmp, and expc02.dmp.

For import and SQLFILE operations, if dump file specifications expa%U, expb%U, and expc%U are specified, then the operation will begin by attempting to open the dump files expa01.dmp, expb01.dmp, and expc01.dmp. If the dump file containing the master table is not found in this set, the operation expands its search for dump files by incrementing the substitution variable and looking up the new filenames (for example, expa02.dmp, expb02.dmp, and expc02.dmp). The search continues until the dump file containing the master table is located. If a dump file does not exist, the operation stops incrementing the substitution variable for the dump file specification that was in error. For example, if expb01.dmp and expb02.dmp are found but expb03.dmp is not found, then no more files are searched for using the expb%U specification. Once the master table is found, it is used to determine whether all dump files in the dump file set have been located.

Original Export and Import Versus Data Pump Export and Import

If you are familiar with the original Export (exp) and Import (imp) utilities, it is important to understand that many of the concepts behind them do not apply to Data Pump Export (expdp) and Data Pump Import (impdp). In particular: