
Troubleshooting
This chapter describes several common problems that you may encounter when using the symmetric replication facility, and suggested resolutions. The topics include:
Note: Often when diagnosing a replication problem, you will need to consult one or more data dictionary views. For the replication catalog views, always use the SYS.DBA_name views if you are authorized. Otherwise, use either the ALL_name or USER_name views. The views for deferred remote procedure calls are all owned by SYS and have no prefix.
Diagnosing Problems with DBA_REPCATLOG Entries
The DBA_REPCATLOG view shows the interim and final status for asynchronous administrative activities. You should examine this table before enabling a replication environment with RESUME_MASTER_ACTIVITY. You should also examine it whenever you suspect replication administration problems. The master definition site uses its DBA_REPCATLOG view to record both local and remote activities. Each of these activities is explained below.
For each local activity, there is a row in the master definition site's DBA_REPCATLOG view. The STATUS column in this row begins with the value READY. If the activity completes normally, the row is deleted from the DBA_REPCATLOG view. If the activity encounters a problem, the Oracle error number is captured in the ERRNUM column and the error message is captured in the MESSAGE column. These columns are helpful when diagnosing symmetric replication problems.
For a remote activity, the symmetric replication facility creates two rows that appear in the DBA_REPCATLOG view: one at the master definition site with a STATUS value of AWAIT_CALLBACK, and one at the remote master with a STATUS value of READY. What happens to these two log rows depends on whether the remote activity completes normally.
- If the remote activity completes normally, the remote master updates its row in the DBA_REPCATLOG view to DO_CALLBACK and commits. When it establishes communication with the master definition site, the remote master deletes the associated row from its DBA_REPCATLOG view and the corresponding row at the master definition site and commits.
- If the remote activity encounters a problem, the remote master updates the ERRNUM and MESSAGE columns and sets the STATUS column to ERROR for the associated row in the DBA_REPCATLOG view for the remote master. When it establishes communication with the master definition site, the remote master updates the row in the DBA_REPCATLOG view at the master definition site with the local ERRNUM, MESSAGE, and STATUS values. Then the remote master deletes the associated row in its DBA_REPCATLOG view and commits.
DO_DEFERRED_REPCAT_ADMIN executes the requests in the local DBA_REPCATLOG submitted by the user that invoked DO_DEFERRED_REPCAT_ADMIN in the order determined by the ID column. When DO_DEFERRED_REPCAT_ADMIN is executed at a master that is not the master definition site, it does as much as possible. Some asynchronous activities such as populating a replicated table require communication with the master definition site. If this communication is not possible, DO_DEFERRED_REPCAT_ADMIN stops executing rows from DBA_REPCATLOG to avoid executing DBA_REPCATLOG rows out of order. Some communication with the master definition site, such as the final step of updating or deleting a DBA_REPCATLOG row at the master definition site, can be deferred and will not prevent DO_DEFERRED_REPCAT_ADMIN from executing additional rows in the DBA_REPCATLOG.
Entries Not Removed from Log
Occasionally, you may notice that an entry in the DBA_REPCATLOG view is not removed as anticipated, yet the STATUS is not ERROR. Here are some items to check if the symmetric replication facility does not appear to be working properly.
- Ensure that there is not a problem with the execution of the job queue. Job queue errors are described
.
Submit a trivial job at the master site to ensure that it runs as expected. In a newly created database, jobs are not automatically enabled until the database is shut down and restarted. You can call DBMS_IJOB.SET_ENABLED(TRUE) to avoid restarting the database. (Note the I, for internal, in DBMS_IJOB.)
Additionally, check the LOG_USER column in the DBA_JOBS view to ensure that the replication job is being run on behalf of the replication administrator. Check the USERID column of the DBA_REPCATLOG view to ensure that the replication administrator was the user that submitted the request. DBMS_REPCAT.DO_DEFERRED_REPCAT_ADMIN only performs those administrative requests submitted by the user that calls this procedure.
- Ensure the relevant databases are running and communication is possible.
- Ensure that you have the necessary private database link for the symmetric replication facility, and that the user designated in the CONNECT TO clause (generally the surrogate replication administrator) has the necessary privileges, as described
. Note that database links are required in both directions, and that privileges must be granted at both sites.
- Ensure that you have the necessary private database link for the replication administrator, and that the replication administrator has been granted the necessary privileges, as described
. Note that database links are required in both directions, and that privileges must be granted at both sites.
Disabling Job Queues
When diagnosing a replication problem, you may find it useful to disable the job queue at one or more masters. To do this, shut down the master and restart it with a value of zero for JOB_QUEUE_PROCESSES. To avoid restarting the database, you can call DBMS_IJOB.SET_ENABLED(FALSE). (Note the I, for internal, in DBMS_IJOB.) You must then connect to the master site and execute the procedure DO_DEFERRED_REPCAT_ADMIN, to execute asynchronous administrative activities at that master. This lets you have better control over the execution time of administrative activities.
Diagnosing Problems with Job Queues
If a job is not executed as expected, check the following:
- Check the NEXT_DATE value in the DBA_JOBS view to ensure that the job was properly scheduled for execution.
- After determining that the job execution interval has passed, check the FAILURES and BROKEN values in the DBMS_JOBS view.
- If the job failed to execute, check the alert log and trace files for error information and fix the error. If the job was not broken, it will ultimately be re-executed. If the job was broken, or if you want to force immediate re-execution of the job, call DBMS_JOB.RUN after fixing the job.
- If the job was never executed, there may be a problem with the availability of background processes. Check the initialization parameter JOB_QUEUE_PROCESSES to determine the maximum number of background processes available and JOB_QUEUE_INTERVAL to determine how frequently each background processes wakes up. The DBA_JOBS_RUNNING view describes what jobs these processes are currently running (you may have a problem with a runaway job), and the alert log and trace file can provide you with additional information about potential problems with the background process.
Diagnosing Problems with Master Sites
Replicated Objects Not Created at New Master Site
If you add a new master site to your replicated environment, and the appropriate replicated objects are not created at the new site, try the following:
- Ensure that the necessary private database links exist between the new master site and the existing master sites, as described
. You must have links both to the new site from each existing site, and from the new site to each existing site.
- Re-execute DBMS_REPCAT.DO_DEFERRED_REPCAT_ADMIN at the new master site.
DDL Changes Not Propagated to Master Site
If you call a procedure in the DBMS_REPCAT package to make a schema-level change at the master definition site that is not propagated to a master site, try the following:
- Examine the DBA_REPCATLOG at the master site and at the master definition site.
- Call DBMS_REPCAT.DO_DEFERRED_REPCAT_ADMIN at the master site to force this change to be applied at the new site.
DDL submitted to repcat executes on behalf of the user who submits the DDL. When a DDL statement applies to an object in a schema other than the submitter's schema, the submitter needs appropriate privileges to execute the statement. In addition, the statement must explicitly name the schema. For example, assume that you, the replication administrator, supply the following as the ddl_text parameter to the DBMS_REPCAT.CREATE_MASTER_REPOBJECT procedure:
CREATE TABLE scott.new_emp AS SELECT * FROM hr.emp WHERE...;
Because each table name contains a schema name, this statement works whether the replication administrator is SCOTT, HR, or another user--as long as the administrator has the required privileges.
Suggestion: Qualify the name of every schema object with the appropriate schema.
DML Changes Not Asynchronously Propagated to Other Sites
If you make an update to your data at a master site, and that change is not properly asynchronously propagated to the other sites in your replicated environment, try the following:
- Check the NEXT_DATE value in the DefSchedule view to determine if the changes have been scheduled to be propagated.
- Call DBMS_DEFER_SYS.EXECUTE to force the execution of the DML, or call DBMS_DEFER_SYS.SCHEDULE_EXECUTION to schedule execution at a periodic interval. If the scheduled execution interval has passed, check the DefSchedule view for the job number and follow the instructions for diagnosing problems with job queues
.
- If you determine that the change was not propagated because of an error, you can also check the DefError view at the destination site to determine the cause of the error. You might also check the DefTran and DefCall views for more information. Additional instructions on how to determine the cause of an error logged in the DefError view are included
.
DML Cannot be Applied to Replicated Table
If you receive the DEFERRED_RPC_QUIESCE exception when you attempt to modify a replicated table, one or more replicated object groups at your local site are "quiescing" or "quiesced". To proceed, your replication administrator must either call DBMS_REPCAT.RESUME_MASTER_ACTIVITY, or DBMS_REPCAT.DROP_MASTER_REPSCHEMA for each quiesced, replicated object group.
Bulk Updates and Constraint Violations
A single update statement applied to a replicated table can update zero or more rows. The update statement causes zero or more update requests to be queued for deferred execution, one for each row updated. This distinction is important when constraints are involved, because Oracle effectively performs constraint checking at the end of each statement. While a bulk update might not violate a uniqueness constraint, for example, some equivalent sequence of individual updates might violate uniqueness.
If the ordering of updates is important, update one row at a time in an appropriate order. This lets you define the order of the update requests in the deferred RPC queue.
Re-creating a Replicated Object
If you replicate an object that already exists at the master definition site with DBMS_REPCAT.CREATE_MASTER_REPOBJECT, the status of the object must be VALID. If the status is INVALID, recompile the object, or drop and recreate the object. Then invoke CREATE_MASTER_REPOBJECT with the RETRY argument set to TRUE.
Unable to Generate Replication Support for a Table
When you call GENERATE_REPLICATION_SUPPORT for a replicated table, Oracle generates a trigger at the local site. If the table will be propagating changes asynchronously, this trigger uses the DBMS_DEFER package to build the calls that are placed in the local deferred transaction queue. EXECUTE privileges for most of the packages involved with symmetric replication, such as DBMS_REPCAT and DBMS_DEFER, need to be granted to replication administrators and users that own replicated objects. The DBMS_REPCAT_ADMIN package performs the grants needed by the replication administrators for many typical replication scenarios. When the owner of a replicated object is not a replication administrator, however, you must explicitly grant EXECUTE privilege on DBMS_DEFER to the object owner.
Problems with Replicated Procedures or Triggers
If you discover an unexpected unresolved conflict, and you were mixing procedural and row-level replication on a table, carefully review the procedure to ensure that the replicated procedure did not cause the conflict. Ensure that ordering conflicts between procedural and row-level updates are not possible. Check if the replicated procedure locks the table in EXCLUSIVE mode before performing updates (or uses some other mechanism of avoiding conflicts with row-level updates). Check that row-level replication is disabled at the start of the replicated procedure and re-enabled at the end. Ensure that row-level replication is re-enabled even if exceptions occur when the procedure executes. In addition, check to be sure that the replicated procedure executed at all master sites. You should perform similar checks on any replicated triggers that you have defined on replicated tables.
Diagnosing Problems with the Deferred Transaction Queue
When you call DBMS_DEFER_SYS.SCHEDULE_EXECUTION, Oracle adds this job to the job queue. If you have scheduled your transaction queue to be pushed at a periodic interval, and you encounter a problem, you should first be certain that you are not experiencing a problem with the job queue. For information on diagnosing job queue problems, see page 9 - 4.
When the symmetric replication facility pushes a deferred transaction to a remote site, it uses a distributed transaction to ensure that the transaction has been properly committed at the remote site before the transaction is removed from the queue at the local site.
For information on diagnosing problems with distributed transactions (two-phase commit), see Oracle7 Server Distributed Systems, Volume I.
If you notice that transactions are not being pushed to a given remote site, you may have a problem with how you have specified the destination for the transaction. If you specify a destination database when you call DBMS_DEFER_SYS.SCHEDULE_EXECUTION (using the DBLINK parameter), or DBMS_DEFER_SYS.EXECUTE using the DESTINATION parameter), you must provide the full database link. These procedures do not expand the database link name.
Having the wrong view definitions can lead to erroneous deferred transaction behavior. The DEFCALLDEST and DEFTRANDEST views are defined differently in CATDEFER.SQL and CATREPC.SQL. The definitions in CATREPC.SQL should be used whenever symmetric replication is used. If CATDEFER.SQL is ever (re)loaded, ensure that the view definitions in CATREPC.SQL are subsequently loaded.
Diagnosing Problems with Snapshots
Problems Creating Replicated Objects at Snapshot Site
If you unsuccessfully attempt to create a new replicated object at a snapshot site, try the following:
- For updatable snapshots, check that the associated master table has a master snapshot log.
- Make sure that you have the necessary privileges to create the object. The privileges to create an updatable snapshot as a replicated object are listed
.
- If you are still unsuccessful, try passing the CREATE SNAPSHOT text to the CREATE_SNAPSHOT_REPOBJECT procedure as the DDL_TEXT parameter, instead of precreating the snapshot.
Problems with Snapshot Refresh
If you have a problem refreshing a snapshot, try the following:
- Check the NEXT value in the DBA_SNAPSHOTS view to determine if the refresh has been scheduled.
- If the refresh interval has passed, check the DBA_REFRESH view for the associated job number for the snapshot refresh and then follow the instructions for diagnosing a problem with job queues
.
- You may also encounter an error if you attempt to define a master detail relationship between two snapshots. You should define master detail relationships only on the master tables by using declarative referential integrity constraints; the related snapshots should then be placed in the same refresh group to preserve this relationship.
- If you encounter a situation where your snapshots are being continually refreshed, you should check the refresh interval that you specified. This interval is evaluated before the snapshot is refreshed. If the interval that you specify is less than the amount of time it takes to refresh the snapshot, the snapshot will be refreshed each time the SNP background process checks the queue of outstanding jobs.
- If there are any outstanding conflicts recorded in the DefError view at the master site for the snapshots, you can only refresh the snapshots by setting the parameter REFRESH_AFTER_ERRORS to TRUE. This parameter can be set when you call DBMS_SNAPSHOT.REFRESH, DBMS_REFRESH.MAKE, or DBMS_REFRESH.CHANGE.
- If your snapshot logs are growing too large, see the section on managing snapshot space log use
.
- Snapshots in the same refresh groups will have their rows updated in a single transaction. Such a transaction can be very large, requiring either a large rollback segment at the snapshot site (with the rollback segment specified to be used during refresh) or you will need to use more frequent refreshes to reduce the transaction size.
- If Oracle error ORA-12004 occurs, the master site may have run out of rollback segments when trying to maintain the snapshot log, or the snapshot log may be out of date (for example, it may have been purged or recreated, see page 3 - 9).
- Complete refreshes of a single table internally us the TRUNCATE feature to increase speed and reduce rollback segment requirements. However, until the snapshot refresh is complete, users may temporarily see no data in the snapshot. Refreshes of multiple snapshots (for example, refresh groups) do not use the TRUNCATE feature.
- Reorganization of the master table (for example, to reclaim system resources) should TRUNCATE the master table to force snapshots to do complete refreshes, otherwise, the snapshots will have incorrect references to master table ROWIDs. For more information see page 3 - 6
See also page 3 - 18 for additional troubleshooting information.