What Happens During Execution of a Data Pump Job?
Data Pump jobs
use a master table, a master process, and worker processes to perform the work
and keep track of progress.
For every Data Pump Export job and Data Pump Import job, a master process
is created. The master process controls the entire job, including communicating
with the clients, creating and controlling a pool of worker processes, and
performing logging operations.
While the data and metadata are being transferred, a master table is used to track the progress within a
job. The master table is implemented as a user table within the database. The
specific function of the master table for export and import jobs is as follows:
- For export
     jobs, the master table records the location of database objects within a
     dump file set. Export builds and maintains the master table for the duration
     of the job. At the end of an export job, the content of the master table
     is written to a file in the dump file set.
- For import jobs, the master table is
     loaded from the dump file set and is used to control the sequence of
     operations for locating objects that need to be imported into the target
     database.
The master table is created in the schema of the current user performing
the export or import operation. Therefore, that user must have sufficient
tablespace quota for its creation. The
name of the master table is the same as the name of the job that created it.
Therefore, you cannot explicitly give a Data Pump job the same name as a
preexisting table or view. For all operations, the information in the master
table is used to restart a job.
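Because the master table is created in the job owner's schema, a DBA may want to check that user's tablespace quota before a job starts. The statements below are a minimal sketch: the user hr, the tablespace name users, and the 100M figure are assumptions chosen for illustration, not sizing guidance.

```sql
-- Show the current quota for HR in each tablespace.
-- A MAX_BYTES value of -1 means the quota is unlimited.
SELECT tablespace_name, bytes, max_bytes
  FROM dba_ts_quotas
 WHERE username = 'HR';

-- Assumed tablespace name and size; adjust both for your environment.
ALTER USER hr QUOTA 100M ON users;
```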
The master table
is either retained or dropped, depending on the circumstances, as follows:
- Upon successful job completion, the
     master table is dropped.
- If a job is stopped using the STOP_JOB interactive command, the master table is retained for use in restarting the job.
- If a job is killed using the KILL_JOB interactive command, the master table is dropped and the job cannot be restarted.
- If a job terminates unexpectedly, the
     master table is retained. You can delete it if you do not intend to
     restart the job.
- If a job stops before it starts
     running (that is, it is in the Defining state), the master table is
     dropped.
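The retention rules above mean that a master table can be left behind after a stopped or failed job. One way to find such tables, sketched here, is to list jobs that are not running and then drop the table that shares the job's name. The job name SYS_EXPORT_SCHEMA_01 and the owner hr are hypothetical; confirm the name in the query output first, and drop the table only if you do not intend to restart the job.

```sql
-- List Data Pump jobs that are defined but not currently running.
-- Each retained master table has the same name as its job.
SELECT owner_name, job_name, operation, job_mode, state
  FROM dba_datapump_jobs
 WHERE state = 'NOT RUNNING';

-- Hypothetical owner and job name taken from the query above.
-- Dropping the master table makes the job impossible to restart.
DROP TABLE hr.sys_export_schema_01;
```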
Filtering Data and Metadata During a Job
Within the master table, specific objects are assigned attributes such as
name or owning schema. Objects also belong to a class of objects (such as 
TABLE, INDEX, or DIRECTORY). The
class of an object is called its object type. You can use the EXCLUDE and INCLUDE parameters to restrict the types of objects that are exported and
imported. The filter can be based on the name of the object or the name of
the schema that owns the object. You can also specify data-specific filters to
restrict the rows that are exported and imported.
Transforming Metadata During a Job
When you are moving data from one database to another, it is often useful
to perform transformations on the metadata for remapping storage between
tablespaces or redefining the owner of a particular set of objects. This is
done using the following Data Pump Import parameters: 
REMAP_DATAFILE, REMAP_SCHEMA, REMAP_TABLESPACE, and TRANSFORM.
Maximizing Job Performance
To improve throughput of a job, you can use the 
PARALLEL parameter to set a degree of parallelism
that takes maximum advantage of current conditions. For example, to limit the
effect of a job on a production system, the database administrator (DBA) might
wish to restrict the parallelism. The degree of parallelism can be reset at any
time during a job. For example, PARALLEL could be
set to 2 during production hours to restrict a particular job to only two
degrees of parallelism, and during nonproduction hours it could be reset to 8. The parallelism setting is enforced by the
master process, which allocates work to be executed to worker processes
that perform the data and metadata processing within an operation. These worker
processes operate in parallel. In general, the degree of parallelism should be
set to no more than twice the number of CPUs on an instance.
The worker processes are the ones that actually unload and load metadata
and table data in parallel. Worker processes are created as needed until the
number of worker processes is equal to the value supplied for the 
PARALLEL command-line parameter. The number of active
worker processes can be reset throughout the life of a job.
When a worker process is assigned the task
of loading or unloading a very large table or partition, it may choose to use
the external tables access method to make maximum use of parallel execution. In
such a case, the worker process becomes a parallel execution coordinator. The
actual loading and unloading work is divided among some number of parallel I/O
execution processes (sometimes called slaves) allocated from the instance-wide
pool of parallel I/O execution processes.
Monitoring Job Status
The Data Pump Export and Import utilities can be attached to a job in
either interactive-command mode or logging mode. In logging mode, real-time detailed status about the job is
automatically displayed during job execution. The information displayed can
include the job and parameter descriptions, an estimate of the amount of data
to be exported, a description of the current operation or item being processed,
files used during the job, any errors encountered, and the final job state
(Stopped or Completed). 
Job status can be displayed on request in interactive-command mode. The information displayed can include the
job description and state, a description of the current operation or item being
processed, files being written, and a cumulative status.
A log file can also be optionally written during the execution of a job.
The log file summarizes the progress of the job, lists any errors that were
encountered along the way, and records the completion status of the job. An
alternative way to determine job status or to get other information about Data
Pump jobs is to query the
DBA_DATAPUMP_JOBS, USER_DATAPUMP_JOBS, or DBA_DATAPUMP_SESSIONS views.
Data Pump operations that transfer table data (export and import) maintain
an entry in the 
V$SESSION_LONGOPS dynamic performance
view indicating the job progress (in megabytes of table data transferred). The
entry contains the estimated transfer size and is periodically updated to
reflect the actual amount of data transferred.
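The views mentioned above can be queried along the following lines. This is a sketch rather than a complete monitoring script: the job name SYS_EXPORT_SCHEMA_01 is hypothetical, and the second query assumes that the OPNAME column of V$SESSION_LONGOPS carries the Data Pump job name.

```sql
-- Overview of all Data Pump jobs known to the database.
SELECT owner_name, job_name, operation, job_mode,
       state, degree, attached_sessions
  FROM dba_datapump_jobs;

-- Progress of one job in megabytes of table data transferred.
-- TOTALWORK holds the estimate; SOFAR is updated periodically.
SELECT opname, target_desc, sofar, totalwork, units,
       ROUND(sofar / totalwork * 100, 1) AS pct_done
  FROM v$session_longops
 WHERE opname = 'SYS_EXPORT_SCHEMA_01'   -- hypothetical job name
   AND totalwork > 0;                    -- avoid division by zero
```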
File Allocation
There are three
types of files managed by Data Pump jobs:
- Dump files to contain the data and
     metadata that is being moved
- Log files to record the messages
     associated with an operation
- SQL files to record the output of a
     SQLFILE operation. A SQLFILE operation is invoked using the Data Pump
     Import SQLFILE parameter and writes to a SQL file all of the SQL DDL that Import would execute based on the other parameters.
For export operations, you can specify dump files at the time the job is
defined, as well as at a later time during the operation. For example, if you
discover that space is running low during an export operation, you can add
additional dump files by using the Data Pump Export 
ADD_FILE command in interactive mode. For import operations, all dump files must be specified at the time
the job is defined. Log files
and SQL files will overwrite previously existing files. Dump files will never
overwrite previously existing files. Instead, an error will be generated.
Because Data Pump is server-based, rather than client-based, dump files,
log files, and SQL files are accessed relative to server-based directory paths.
Data Pump requires you to specify directory paths as directory objects. A
directory object maps a name to a directory path on the file system. For
example, the following SQL statement creates a directory object named
dpump_dir1 that is mapped to a directory located at /usr/apps/datafiles.
SQL> CREATE DIRECTORY dpump_dir1 AS '/usr/apps/datafiles';
The reason that a
directory object is required is to ensure data security and integrity. For
example:
- If you
     were allowed to specify a directory path location for an input file, you
     might be able to read data that the server has access to, but to which you
     should not.
- If you
     were allowed to specify a directory path location for an output file, the
     server might overwrite a file that you might not normally have privileges
     to delete.
On Unix and Windows NT systems, a default directory object, 
DATA_PUMP_DIR, is created at database creation or whenever the
database dictionary is upgraded. By default, it is available only to privileged
users. If you are not a privileged user, before you can run Data Pump Export or
Data Pump Import, a directory object must be created by a database
administrator (DBA) or by any user with the CREATE ANY DIRECTORY privilege.
After a directory is created, the user creating the directory object needs
to grant 
READ or WRITE permission
on the directory to other users. For example, to allow the Oracle database to
read and write files on behalf of user hr in the directory named by dpump_dir1, the
DBA must execute the following command:
SQL> GRANT READ, WRITE ON DIRECTORY dpump_dir1 TO hr;
Note that 
READ or WRITE permission to a directory object only means that the Oracle database will
read or write that file on your behalf. You are not given direct access to
those files outside of the Oracle database unless you have the appropriate
operating system privileges. Similarly, the Oracle database requires permission
from the operating system to read and write files in the directories.
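To confirm that a grant on a directory object took effect, a DBA can look it up in the data dictionary. The query below is a small sketch that assumes the dpump_dir1 directory object and the hr user from the example above; directory object grants are listed in DBA_TAB_PRIVS, with the directory name in the TABLE_NAME column.

```sql
-- Expect one row each for READ and WRITE if the grant succeeded.
SELECT grantee, privilege
  FROM dba_tab_privs
 WHERE table_name = 'DPUMP_DIR1'
   AND grantee = 'HR';
```

This checks only the database-level privilege. Operating system permissions on the underlying directory still have to be verified separately.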
Data Pump Export
and Import use the following order of precedence to determine a file's location:
- If a directory object is specified as part of the file specification,
     then the location specified by that directory object is used. (The
     directory object must be separated from the filename by a colon.)
- If a directory object is not specified for a file, then the directory object
     named by the DIRECTORY parameter is used.
- If a directory object is not specified, and if no directory object was named
     by the DIRECTORY parameter, then the value of the environment variable
     DATA_PUMP_DIR is used. This environment variable is defined using
     operating system commands on the client system where the Data Pump Export
     and Import utilities are run. The value assigned to this client-based
     environment variable must be the name of a server-based directory object,
     which must first be created on the server system by a DBA. For example,
     the following SQL statement creates a directory object on the server
     system. The name of the directory object is DUMP_FILES1, and it is
     located at '/usr/apps/dumpfiles1'.
SQL> CREATE DIRECTORY DUMP_FILES1 AS '/usr/apps/dumpfiles1';
     Then, a user on a UNIX-based client system using csh can assign the value
     DUMP_FILES1 to the environment variable DATA_PUMP_DIR. The DIRECTORY
     parameter can then be omitted from the command line. The dump file
     employees.dmp, as well as the log file export.log, will be written to
     '/usr/apps/dumpfiles1'.
% setenv DATA_PUMP_DIR DUMP_FILES1
% expdp hr/hr TABLES=employees DUMPFILE=employees.dmp
- If none of the previous three conditions yields a directory object and you
     are a privileged user, then Data Pump attempts to use the value of the
     default server-based directory object, DATA_PUMP_DIR. This directory
     object is automatically created at database creation or when the database
     dictionary is upgraded. You can use the following SQL query to see the
     path definition for DATA_PUMP_DIR:
SQL> SELECT directory_name, directory_path FROM dba_directories WHERE directory_name='DATA_PUMP_DIR';
If you are not a
privileged user, access to the 
DATA_PUMP_DIR
directory object must have previously been granted to you by a DBA.  Do not confuse the
default DATA_PUMP_DIR
directory object with the client-based environment
variable of the same name.
If you use Data Pump Export or Import with Automatic Storage Management
(ASM) enabled, you must define the directory object used for the dump file so
that the ASM disk-group name is used (instead of an operating system directory
path). A separate directory object, which points to an operating system
directory path, should be used for the log file. For example, you would create
a directory object for the ASM dump file as follows:
SQL> CREATE or REPLACE DIRECTORY dpump_dir as '+DATAFILES/';
Then you would
create a separate directory object for the log file:
SQL> CREATE or REPLACE DIRECTORY dpump_log as '/homedir/user1/';
To enable user hr to have access to these directory objects, you
would assign the necessary privileges, for example:
SQL> GRANT READ, WRITE ON DIRECTORY dpump_dir TO hr;
SQL> GRANT READ, WRITE ON DIRECTORY dpump_log TO hr;
You would then
use the following Data Pump Export command:
> expdp hr/hr DIRECTORY=dpump_dir DUMPFILE=hr.dmp LOGFILE=dpump_log:hr.log
Setting Parallelism
For export and import operations, the parallelism setting (specified with
the 
PARALLEL parameter) should be less than or equal to the
number of dump files in the dump file set. If there are not enough dump files,
the performance will not be optimal because multiple threads of execution will
be trying to access the same dump file.
Instead of, or in addition to, listing specific filenames, you can use the 
DUMPFILE parameter during export operations to specify
multiple dump files, by using a substitution variable (%U) in the filename. This is called a dump file
template. The new dump files are created as they are needed, beginning with 01 for %U, then using 02, 03, and so on.
Enough dump files are created to allow all processes specified by the current
setting of the PARALLEL parameter to be active. If one of the dump files
becomes full because its size has reached the maximum size specified by the FILESIZE parameter, it is
closed, and a new dump file (with a new generated name) is created to take its
place.
If multiple dump file templates are provided, they are used to generate
dump files in a round-robin fashion.
For example, if 
expa%U, expb%U, and expc%U were all specified for a job having a parallelism
of 6, the initial dump files created would be expa01.dmp, expb01.dmp, expc01.dmp, expa02.dmp, expb02.dmp, and expc02.dmp.
For import and SQLFILE operations, if dump file specifications 
expa%U, expb%U, and expc%U are specified, then the operation will begin by
attempting to open the dump files expa01.dmp, expb01.dmp, and expc01.dmp. If the dump file containing the master table is
not found in this set, the operation expands its search for dump files by
incrementing the substitution variable and looking up the new filenames (for
example, expa02.dmp, expb02.dmp, and expc02.dmp). The search continues until the dump file
containing the master table is located. If a dump file does not exist, the
operation stops incrementing the substitution variable for the dump file
specification that was in error. For example, if expb01.dmp and expb02.dmp are found but expb03.dmp is not found, then no more files are searched for
using the expb%U specification. Once the master table is found, it is used to determine whether all dump
files in the dump file set have been located.
Moving Data Between Different Database Versions
Because most Data Pump operations are performed on the server side, if you
are using any version of the database other than
COMPATIBLE, you must provide the server with specific
version information. Otherwise, errors may occur. To specify version
information, use the VERSION parameter.
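Before choosing a value for VERSION, it can help to check the compatibility setting and release of both the source and target databases. The queries below are a minimal sketch to be run on each database in turn.

```sql
-- Current setting of the COMPATIBLE initialization parameter.
SELECT value AS compatible_setting
  FROM v$parameter
 WHERE name = 'compatible';

-- Release of the running instance.
SELECT version
  FROM v$instance;
```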
Keep the
following information in mind when you are using Data Pump Export and Import to
move data between different database versions:
- If you
     specify a database version that is older than the current database
     version, certain features may be unavailable. For example, specifying VERSION=10.1 will cause an error if data compression is also specified for the job, because compression was not supported in 10.1.
- On a Data
     Pump export, if you specify a database version that is older than the
     current database version, then a dump file set is created that you can
     import into that older version of the database. However, the dump file set
     will not contain any objects that the older database version does not support.
     For example, if you export from a version 10.2 database to a version 10.1
     database, comments on indextypes will
     not be exported into the dump file set.
- Data Pump Import can always read dump
     file sets created by older versions of the database.
- Data Pump Import
     cannot read dump file sets created by a database version that is newer
     than the current database version, unless those dump file sets were
     created with the VERSION parameter set to the version of the target
     database. Therefore, the best way to perform a downgrade is to perform
     your Data Pump export with the VERSION parameter set to the version of the target database.
- When
     operating across a network link, Data Pump requires that the remote
     database version be either the same as the local database or one version
     older, at the most. For example, if the local database is version 10.2,
     the remote database must be either version 10.1 or 10.2. If the local
     database is version 10.1, then 10.1 is the only version supported for the
     remote database.
Original Export and Import Versus Data Pump Export and Import
If you are familiar with the original Export (exp) and Import (imp) utilities, it is important to understand
that many of the concepts behind them do not apply to Data Pump Export (expdp) and Data Pump Import (impdp). In particular:
- Data Pump Export and Import operate
     on a group of files called a dump file set rather than on a single
     sequential dump file.
- Data Pump Export and Import access
     files on the server rather than on the client. This results in improved
     performance. It also means that directory objects are required when you
     specify file locations.
- The Data
     Pump Export and Import modes operate symmetrically, whereas original
     export and import did not always exhibit this behavior. For example,
     suppose you perform an export with FULL=Y, followed by an import using SCHEMAS=HR. This will produce the same results as if you performed an export with SCHEMAS=HR, followed by an import with FULL=Y.
- Data Pump
     Export and Import use parallel execution rather than a single stream of
     execution, for improved performance. This means that the order of data
     within dump file sets and the information in the log files is more
     variable.
- Data Pump
     Export and Import represent metadata in the dump file set as XML documents
     rather than as DDL commands. This provides improved flexibility for
     transforming the metadata at import time.
- Data Pump Export and Import are
     self-tuning utilities. Tuning parameters that were used in original Export
     and Import, such as BUFFER and RECORDLENGTH, are neither required nor supported by Data Pump Export and Import.
- At import
     time there is no option to perform interim commits during the restoration
     of a partition. This was provided by the COMMIT parameter in original Import.
- There is no option to merge extents
     when you re-create tables. In original Import, this was provided by the COMPRESS parameter. Instead, extents are reallocated according to storage parameters for the target table.
- Sequential media, such as tapes and
     pipes, are not supported.
- The Data Pump method for moving data
     between different database versions is different from the method used by
     original Export/Import. With original Export, you had to run an older
     version of Export (exp) to produce a dump file that was compatible with an older database version. With Data Pump, you can use the current Export (expdp) version and simply use the VERSION parameter to specify the target database version.
- When you
     are importing data into an existing table using either APPEND or TRUNCATE, if any row violates an active constraint, the load is discontinued and no data is loaded. This is different from original Import, which logs any rows that are in violation and continues with the load.
- Data Pump
     Export and Import consume more undo tablespace than
     original Export and Import. This is due to additional metadata queries
     during export and some relatively long-running master table queries during
     import. As a result, for databases with large amounts of metadata, you may
     receive an ORA-01555: snapshot too old error. To avoid this, consider adding additional undo tablespace or increasing the value of the UNDO_RETENTION initialization parameter for the database.
- If a table
     has compression enabled, Data Pump Import attempts to compress the data being loaded. By contrast, the
     original Import utility loaded data in such a way that even if a table had
     compression enabled, the data was not compressed upon import.
- Data Pump supports character set conversion for both direct path and external tables. Most of the
     restrictions that exist for character set conversions in the original
     Import utility do not apply to Data Pump. The one case in which character
     set conversions are not supported under Data Pump is when using
     transportable tablespaces.
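For the ORA-01555 item in the list above, the current undo retention can be inspected and raised as sketched below. The value 3600 is an arbitrary illustration, not a recommendation; a suitable figure depends on how long the metadata and master table queries run and on how much undo tablespace is available.

```sql
-- Current undo retention target, in seconds.
SELECT name, value
  FROM v$parameter
 WHERE name = 'undo_retention';

-- Illustrative value only (one hour); requires the ALTER SYSTEM privilege.
ALTER SYSTEM SET undo_retention = 3600;
```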
 