3 Administering Oracle Clusterware Components

The Oracle Clusterware includes two important components: the voting disk and the Oracle Cluster Registry (OCR). The voting disk is a file that manages information about node membership and the OCR is a file that manages cluster and Oracle Real Application Clusters (Oracle RAC) database configuration information. This chapter describes how to administer the voting disks and the Oracle Cluster Registry (OCR) under the following topics:

Administering Voting Disks in Oracle Real Application Clusters

Oracle recommends that you select the option to configure multiple voting disks during Oracle Clusterware installation to improve availability. After installation, use the following procedures to regularly back up your voting disks and to recover them as needed:

Backing up Voting Disks

Run the following command to back up a voting disk. Repeat the command for every voting disk as needed. In the command, voting_disk_name is the name of the active voting disk and backup_file_name is the name of the file to which you want to back up the voting disk contents:

dd if=voting_disk_name of=backup_file_name

Note:

You can use the ocopy command in Windows environments or use the crsctl commands described in the following note.
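
The backup command above can be wrapped in a small function that also rejects an empty result. This is a minimal sketch for UNIX-based systems only; the function name backup_voting_disk and the non-empty check are illustrative assumptions, not part of the Oracle procedure.

```sh
# Hypothetical helper: copy one voting disk to a backup file with dd
# and treat a failed copy or a zero-length backup as an error.
backup_voting_disk() (
    voting_disk_name=$1     # name of the active voting disk
    backup_file_name=$2     # file that receives the voting disk contents

    # Same dd invocation as in the procedure above; dd statistics are hidden.
    dd if="$voting_disk_name" of="$backup_file_name" 2>/dev/null || exit 1

    # -s is true only if the backup file exists and is not empty.
    [ -s "$backup_file_name" ]
)
```

You could call the function once for each voting disk, for example backup_voting_disk voting_disk_name backup_file_name, and check the exit status before relying on the backup.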

Recovering Voting Disks

Run the following command to recover a voting disk where backup_file_name is the name of the voting disk backup file and voting_disk_name is the name of the active voting disk:

dd if=backup_file_name of=voting_disk_name

Note:

If you have multiple voting disks, then you can remove the voting disks and add them back into your environment using the crsctl delete css votedisk path and crsctl add css votedisk path commands respectively, where path is the complete path of the location on which the voting disk resides.

Changing the Voting Disk Configuration after Installing Oracle Real Application Clusters

You can add and remove voting disks after installing Oracle Real Application Clusters. Do this using the following commands where path is the fully qualified path for the additional voting disk. Run the following command as the root user to add a voting disk:

crsctl add css votedisk path

Run the following command as the root user to remove a voting disk:

crsctl delete css votedisk path

Note:

Bring down ocssd using the -force option prior to modifying the voting disk configuration with either of these commands to avoid interacting with active Oracle Clusterware daemons. Note also that using the -force option while any cluster node is active may corrupt your configuration.
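
Because both commands take a fully qualified path, it can help to build and review the command line before running it as root. The following is a minimal sketch that only prints the command; the function name votedisk_cmd is hypothetical, and the check for a leading slash assumes a UNIX-based system.

```sh
# Hypothetical dry-run helper: print the crsctl command that adds or
# removes a voting disk. Nothing is executed.
votedisk_cmd() {
    action=$1   # "add" or "delete"
    path=$2     # fully qualified path of the voting disk

    case $action in
        add|delete) ;;
        *) echo "action must be add or delete" >&2; return 1 ;;
    esac

    # Assumption: on UNIX-based systems a fully qualified path starts with "/".
    case $path in
        /*) ;;
        *) echo "path must be fully qualified: $path" >&2; return 1 ;;
    esac

    printf 'crsctl %s css votedisk %s\n' "$action" "$path"
}
```

For example, votedisk_cmd add followed by a path prints the matching crsctl add css votedisk line, which you can then run as the root user.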

Administering the Oracle Cluster Registry in Oracle Real Application Clusters

This section describes how to administer the OCR. The OCR contains information about the cluster node list, instance-to-node mapping information, and information about Oracle Clusterware resource profiles for applications that you have customized as described in Chapter 14, "Making Applications Highly Available Using Oracle Clusterware". The procedures discussed in this section are:

See Also:

Appendix A, "Troubleshooting" for information about the ocrdump utility

Adding, Replacing, Repairing, and Removing the OCR

The Oracle installation process for Oracle RAC gives you the option of automatically mirroring the OCR. This creates a second OCR to duplicate the original OCR. You can put the mirrored OCR on an Oracle cluster file system disk, on a shared raw device, or on a shared raw logical volume.

You can also manually mirror the OCR if you:

  • Upgraded to release 10.2 but did not choose to mirror the OCR during the upgrade

  • Created only one OCR during the Oracle Clusterware installation

Note:

Oracle strongly recommends that you use mirrored OCRs if the underlying storage is not RAID. This prevents the OCR from becoming a single point of failure.

In addition to mirroring the OCR, you can also replace the OCR if Oracle displays an OCR failure alert in Enterprise Manager or in the Oracle Clusterware alert log file. You can also repair an OCR location if there is a misconfiguration or other type of OCR error. In addition, you can remove an OCR location if, for example, your system experiences a performance degradation due to OCR processing or if you transfer your OCR to RAID storage devices and choose to no longer use multiple OCRs. Use the following procedures to perform these tasks:

Note:

The operations in this section affect the OCR cluster-wide: they change the OCR configuration information in the ocr.loc file on UNIX-based systems and the Registry keys on Windows-based systems. However, the ocrconfig command cannot modify OCR configuration information for nodes that are shut down or for nodes on which Oracle Clusterware is not running.

Adding an Oracle Cluster Registry

You can add an OCR location after an upgrade or after completing the Oracle RAC installation. If you already mirror the OCR, then you do not need to add an OCR location; Oracle automatically manages two OCRs when it mirrors the OCR. Oracle RAC environments do not support more than two OCRs, a primary OCR and a second OCR.

Note:

If your OCR resides on a cluster file system file or if the OCR is on a network file system, then create the target OCR file before performing the procedures in this section.

Run the following command to add an OCR location using either destination_file or disk to designate the target location of the additional OCR:

ocrconfig -replace ocr destination_file or disk

Run the following command to add an OCR mirror location using either destination_file or disk to designate the target location of the additional OCR:

ocrconfig -replace ocrmirror destination_file or disk

Note:

You must be root user to run ocrconfig commands.

Replacing an Oracle Cluster Registry

You can replace a mirrored OCR using the following procedure as long as one OCR-designated file remains online:

  1. Verify that the OCR that you are not going to replace is online.

  2. Verify that Oracle Clusterware is running on the node on which you are going to perform the replace operation.

    Note:

    The OCR that you are replacing can be either online or offline. In addition, if your OCR resides on a cluster file system file or if the OCR is on a network file system, then create the target OCR file before continuing with this procedure.
  3. Run the following command to replace the OCR using either destination_file or disk to indicate the target OCR:

    ocrconfig -replace ocr destination_file or disk
    

    Run the following command to replace an OCR mirror location using either destination_file or disk to indicate the target OCR:

    ocrconfig -replace ocrmirror destination_file or disk
    
  4. If any node that is part of your current Oracle RAC environment is shut down, then run the command ocrconfig -repair on any node that is stopped to enable that node to rejoin the cluster after you restart the stopped node.

Repairing an Oracle Cluster Registry Configuration on a Local Node

You may need to repair an OCR configuration on a particular node if your OCR configuration changes while that node is stopped. For example, you may need to repair the OCR on a node that was not up while you were adding, replacing, or removing an OCR. Use the following procedure to repair an OCR configuration:

  1. Run the following command to stop Oracle Clusterware on all nodes:

    crsctl stop crs
    
  2. Run the following command on one node to take a backup of the OCR configuration:

    ocrconfig -export export_filename
    

    In the preceding command, export_filename is the name of the file to which you backed up the OCR. You import this file after you repair the OCR configuration.

  3. Run the following command on all nodes to repair the OCR configuration:

    ocrconfig -repair
    
  4. Run the following command to import the backup to the repaired OCR configuration:

    ocrconfig -import export_filename
    
  5. Run the following command on one node to overwrite the OCR configuration on disk:

    ocrconfig -overwrite
    
  6. Run the following command on one node to verify the OCR configuration:

    ocrcheck
    

Note:

You cannot perform this operation on a node on which the Oracle Clusterware daemon is running.

Removing an Oracle Cluster Registry

To remove an OCR location, at least one other OCR must be online. You can remove an OCR location to reduce OCR-related overhead or to stop mirroring your OCR because you moved your OCR to redundant storage such as RAID. Perform the following procedure to remove an OCR location from your Oracle RAC environment:

  1. Ensure that at least one OCR other than the OCR that you are removing is online.

    Caution:

    Do not perform this OCR removal procedure unless there is at least one other active OCR online.
  2. Run the following command on any node in the cluster to remove the OCR:

    ocrconfig -replace ocr
    

    Run the following command on any node in the cluster to remove the mirrored OCR:

    ocrconfig -replace ocrmirror
    

    These commands update the OCR configuration on all of the nodes on which Oracle Clusterware is running.

    See Also:

    You can also use the -backuploc option to move the OCR to another location as described in Appendix D, "Oracle Cluster Registry Configuration Tool Command Syntax"

Note:

When removing an OCR location, the remaining OCR must be online. If you remove a primary OCR, then the mirrored OCR becomes the primary OCR.
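
The add, replace, and remove procedures all use the same ocrconfig -replace option: supplying a destination adds or replaces a location, and omitting it removes the location. The following minimal sketch makes that distinction explicit by printing the command instead of running it. The function name ocr_replace_cmd is hypothetical.

```sh
# Hypothetical dry-run helper: print the ocrconfig -replace command for
# the primary OCR ("ocr") or the mirror ("ocrmirror"). Nothing is executed.
ocr_replace_cmd() {
    target=$1            # "ocr" or "ocrmirror"
    destination=${2:-}   # destination file or disk; empty means remove

    case $target in
        ocr|ocrmirror) ;;
        *) echo "target must be ocr or ocrmirror" >&2; return 1 ;;
    esac

    if [ -n "$destination" ]; then
        # With a destination: add or replace the OCR location.
        printf 'ocrconfig -replace %s %s\n' "$target" "$destination"
    else
        # Without a destination: remove the OCR location.
        printf 'ocrconfig -replace %s\n' "$target"
    fi
}
```

Printing the command first gives you a chance to confirm that a missing destination is intentional before you remove an OCR location as the root user.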

Managing Backups and Recovering the OCR Using OCR Backup Files

This section describes two methods for copying OCR content and using it for recovery. The first method uses automatically generated OCR file copies and the second method uses manually created OCR export files.

The Oracle Clusterware automatically creates OCR backups every four hours. At any one time, Oracle always retains the last three backup copies of the OCR. The CRSD process that creates the backups also creates and retains an OCR backup for each full day and at the end of each week.

You cannot customize the backup frequencies or the number of files that Oracle retains. You can use any backup software to copy the automatically generated backup files at least once daily to a different device from where the primary OCR resides.

The default location for generating backups on UNIX-based systems is CRS_home/cdata/cluster_name where cluster_name is the name of your cluster. The Windows-based default location for generating backups uses the same path structure.
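
Copying the automatically generated backups to another device can be done with any backup software. As one possibility, the following minimal sketch uses plain cp on a UNIX-based system. The function name copy_ocr_backups and both directory arguments are assumptions; substitute the backup location of your cluster and a target directory that resides on a different device.

```sh
# Hypothetical helper: copy every regular file from the OCR backup
# directory to a target directory, preserving timestamps and modes.
copy_ocr_backups() (
    backup_dir=$1    # for example CRS_home/cdata/cluster_name
    target_dir=$2    # assumed to reside on a different device

    [ -d "$backup_dir" ] || { echo "no such directory: $backup_dir" >&2; exit 1; }
    mkdir -p "$target_dir" || exit 1

    for f in "$backup_dir"/*; do
        # Skip subdirectories and the unexpanded pattern of an empty directory.
        [ -f "$f" ] || continue
        cp -p "$f" "$target_dir"/ || exit 1
    done
)
```

Scheduling a call to this function at least once daily would match the copy frequency recommended above.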

Note:

You must be root user to run ocrconfig commands.

Restoring the Oracle Cluster Registry from Automatically Generated OCR Backups

If an application fails, then before attempting to restore the OCR, restart the application. To verify definitively that the OCR failed, run the ocrcheck command. If the command returns a failure message, then both the primary OCR and the OCR mirror have failed. Attempt to correct the problem using one of the following platform-specific OCR restoration procedures.

Note:

You cannot restore your configuration from an automatically created OCR backup file using the -import option, which is explained in "Administering the Oracle Cluster Registry with OCR Exports". You must instead use the -restore option as described in the following sections.

Restoring the Oracle Cluster Registry on UNIX-Based Systems

Use the following procedure to restore the OCR on UNIX-based systems:

  1. Identify the OCR backups using the ocrconfig -showbackup command. Review the contents of the backup using ocrdump -backupfile file_name where file_name is the name of the backup file.

  2. Stop Oracle Clusterware on all the nodes in your Oracle RAC cluster by running the following command as root:

    # crsctl stop crs
    

    Repeat this command on each node in your Oracle RAC cluster.

    Note:

    Prior to running the crsctl start crs command in step 4, run the following command to verify that all processes except init.cssd fatal are inactive:
    ps -ef|grep cssd
    
  3. Perform the restore by applying an OCR backup file that you identified in Step 1 using the following command where file_name is the name of the OCR that you want to restore. Make sure that the OCR devices that you specify in the OCR configuration exist and that these OCR devices are valid before running this command.

    ocrconfig -restore file_name
    
  4. Start Oracle Clusterware on all the nodes in your Oracle RAC cluster by running the following command as root:

    # crsctl start crs
    

    Repeat this command on each node in your Oracle RAC cluster.

  5. Run the following command to verify the OCR integrity where the -n all argument retrieves a listing of all of the cluster nodes that are configured as part of your cluster:

    cluvfy comp ocr -n all [-verbose]
    

See Also:

Appendix A, "Troubleshooting" for more information about enabling and using CVU
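
The process check in the note for step 2 can be scripted so that the restore continues only when no cssd process other than init.cssd fatal remains. This is a minimal sketch: the function name only_init_cssd_fatal is hypothetical, and it assumes that the init.cssd fatal process appears in the ps -ef output with exactly that text.

```sh
# Hypothetical helper: read "ps -ef" output on standard input and succeed
# only if no cssd process other than "init.cssd fatal" is listed.
only_init_cssd_fatal() {
    # Drop the grep process itself and the init.cssd fatal line,
    # then fail if any remaining line still mentions cssd.
    if grep -v 'grep' | grep -v 'init.cssd fatal' | grep -q 'cssd'; then
        return 1
    fi
    return 0
}

# Usage: ps -ef | only_init_cssd_fatal && echo "no other cssd processes"
```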

Restoring the Oracle Cluster Registry on Windows-Based Systems

Use the following procedure to restore the OCR on Windows-based systems:

  1. Identify the OCR backups using the ocrconfig -showbackup command. Review the contents of the backup using ocrdump -backupfile file_name where file_name is the name of the backup file.

  2. On all of the remaining nodes, disable the following OCR clients and stop them using the Service Control Panel: OracleClusterVolumeService, OracleCSService, OracleCRService, and the OracleEVMService.

  3. Execute the restore by applying an OCR backup file that you identified in Step 1 with the ocrconfig -restore file_name command. Make sure that the OCR devices that you specify in the OCR configuration exist and that these OCR devices are valid.

  4. Start all of the services that were stopped in step 2. Restart all of the nodes and resume operations in cluster mode.

  5. Run the following command to verify the OCR integrity where the -n all argument retrieves a listing of all of the cluster nodes that are configured as part of your cluster:

    cluvfy comp ocr -n all [-verbose]
    

    See Also:

    "Cluster Verification Utility Oracle Clusterware Component Verifications" for more information about enabling and using CVU

Diagnosing OCR Problems with the OCRDUMP and OCRCHECK Utilities

You can use the OCRDUMP and OCRCHECK utilities to diagnose OCR problems as described under the following topics:

Using the OCRDUMP Utility

Use the OCRDUMP utility to write the OCR contents to a file so that you can examine the OCR content.

See Also:

"OCRDUMP Utility Syntax and Options" for more information about the OCRDUMP utility

Using the OCRCHECK Utility

Use the OCRCHECK utility to verify the OCR integrity.

See Also:

Using the OCRCHECK Utility for more information about the OCRCHECK utility

Overriding the Oracle Cluster Registry Data Loss Protection Mechanism

The OCR has a mechanism that prevents data loss due to accidental overwrites. If you configure a mirrored OCR and if the OCR cannot access the two mirrored OCR locations and also cannot verify that the available OCR contains the most recent configuration, then the OCR prevents further modification to the available OCR. The OCR prevents overwriting by prohibiting Oracle Clusterware from starting on the node on which the OCR resides. In such cases, Oracle displays an alert message in either Enterprise Manager, the Oracle Clusterware alert log files, or both.

Sometimes this problem is local to only one node and you can use other nodes to start your cluster database. In such cases, Oracle displays an alert message in Enterprise Manager, the Oracle Clusterware alert log, or both.

However, if you are unable to start any cluster nodes in your environment and if you cannot repair the OCR, then you can override the protection mechanism. This procedure enables you to start the cluster using the available OCR, thus enabling you to use the updated OCR file to start your cluster. However, this can result in the loss of data that was not available at the time that the previous known good state was created.

Note:

Overriding the OCR using this procedure can result in the loss of OCR updates that were made between the time of the last known good OCR update made to the currently-accessible OCR and the time at which you performed this procedure. In other words, running the ocrconfig -overwrite command can result in data loss if the OCR that you are using to perform the overwrite does not contain the latest configuration updates for your cluster environment.

Perform the following procedure to overwrite the OCR if a node cannot start up and if the alert log contains a CLSD-1009 or CLSD-1011 message.

  1. Attempt to resolve the cause of the CLSD-1009 or CLSD-1011 message. Do this by comparing the node's OCR configuration (ocr.loc on UNIX-based systems and the Registry on Windows-based systems) with other nodes on which Oracle Clusterware is running. If the configurations do not match, then run ocrconfig -repair. If the configurations match, then ensure that the node can access all of the configured OCRs by running an ls command on UNIX-based systems or a dir command on Windows-based systems. Oracle issues a warning when one of the configured OCR locations is not available or if the configuration is incorrect.

  2. Ensure that the most recent OCR contains the latest OCR updates. Do this by taking output from the ocrdump command and determine whether it has your latest updates.

  3. If you cannot resolve the CLSD message, then run the command ocrconfig -overwrite to bring up the node.
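
The comparison in step 1 can be sketched for UNIX-based systems as follows. The sketch assumes that you have already copied the ocr.loc file from a node on which Oracle Clusterware is running to the failing node; how you copy it, and where each file resides, is left to you. The function name ocr_loc_matches is hypothetical.

```sh
# Hypothetical helper: compare the local ocr.loc with a reference copy
# taken from a node on which Oracle Clusterware is running.
ocr_loc_matches() {
    local_copy=$1
    reference_copy=$2

    if cmp -s "$local_copy" "$reference_copy"; then
        echo "configurations match: check access to each configured OCR"
    else
        echo "configurations differ: consider running ocrconfig -repair"
        # Show the differing lines; diff exits nonzero when files differ,
        # so its status is ignored here.
        diff "$local_copy" "$reference_copy" || true
        return 1
    fi
}
```

The two messages correspond to the two branches of step 1: matching configurations lead to the ls access check, and differing configurations lead to ocrconfig -repair.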

Administering the Oracle Cluster Registry with OCR Exports

In addition to using the automatically created OCR backup files, you should also export the OCR contents before and after making significant configuration changes, such as adding or deleting nodes from your environment, modifying Oracle Clusterware resources, or creating a database. Do this by using the ocrconfig -export command. This exports the OCR content to a file format.

Using the ocrconfig -export command enables you to restore the OCR using the -import option if your configuration changes cause errors. For example, if you have unresolvable configuration problems, or if you are unable to restart your clusterware after such changes, then restore your configuration using one of the following platform-specific procedures:

Note:

Most configuration changes that you make not only change the OCR contents but also cause file and database object creation. Some of these changes are often not restored when you restore the OCR. Do not perform an OCR restore as a correction to revert to previous configurations if some of these configuration changes fail. Doing so may result in an OCR whose contents do not match the state of the rest of your system.
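
When you export before and after each significant change, a consistent file naming scheme makes it easier to pick the right export later. The following minimal sketch prints an ocrconfig -export command with a labeled, timestamped file name. The function name ocr_export_cmd, the directory argument, and the ocr_label_timestamp.exp naming pattern are all assumptions for illustration.

```sh
# Hypothetical dry-run helper: print an ocrconfig -export command that
# writes to a labeled, timestamped export file. Nothing is executed.
ocr_export_cmd() {
    export_dir=$1                  # assumed directory for export files
    label=$2                       # short tag such as before_addnode
    stamp=$(date +%Y%m%d_%H%M%S)   # for example 20060131_235959

    printf 'ocrconfig -export %s/ocr_%s_%s.exp\n' "$export_dir" "$label" "$stamp"
}
```

You would run the printed command as the root user, once before the configuration change and once after it.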

Importing Oracle Cluster Registry Content on UNIX-Based Systems

Use the following procedure to import the OCR on UNIX-based systems:

  1. Identify the OCR export file that you want to import by identifying the OCR export file that you previously created using the ocrconfig -export file_name command.

  2. Stop Oracle Clusterware on all of the nodes in your Oracle RAC database by executing the init.crs stop command on all of the nodes.

  3. Perform the import by applying an OCR export file that you identified in Step 1 using the following command where file_name is the name of the OCR file from which you want to import OCR information:

    ocrconfig -import file_name
    
  4. Restart Oracle Clusterware on all of the nodes in your cluster by restarting each node.

  5. Run the following Cluster Verification Utility (CVU) command to verify the OCR integrity where the -n all argument retrieves a listing of all of the cluster nodes that are configured as part of your cluster:

    cluvfy comp ocr -n all [-verbose]
    

Note:

You cannot restore an exported OCR file using the -restore option, which is described in "Managing Backups and Recovering the OCR Using OCR Backup Files". You must instead use the -import option as described in this section.

See Also:

Appendix A, "Troubleshooting" for more information about enabling and using CVU

Importing Oracle Cluster Registry Content on Windows-Based Systems

Use the following procedure to import the OCR on Windows-based systems:

  1. Identify the OCR export file that you want to import by running the ocrconfig -showbackup command.

  2. Stop the following OCR clients on each node in your Oracle RAC environment using the Service Control Panel: OracleClusterVolumeService, OracleCMService, OracleEVMService, OracleCSService, and the OracleCRService.

  3. Import an OCR export file using the ocrconfig -import command from one node.

  4. Restart all of the affected services on all nodes.

  5. Run the following Cluster Verification Utility (CVU) command to verify the OCR integrity where the -n all argument retrieves a listing of all of the cluster nodes that are configured as part of your cluster:

    cluvfy comp ocr -n all [-verbose] 
    

    See Also:

    Appendix A, "Troubleshooting" for more information about enabling and using CVU

Implementing the Oracle Hardware Assisted Resilient Data Initiative for the OCR

The Oracle Hardware Assisted Resilient Data (HARD) initiative prevents data corruptions from being written to permanent storage. If you enable HARD, then the OCR writes HARD-compatible blocks. To determine whether the device used by the OCR supports HARD and then enable it, review the Oracle HARD white paper at:

http://www.oracle.com/technology/deploy/availability/htdocs/HARD.html

Upgrading and Downgrading the OCR Configuration in Oracle RAC

When you install Oracle Clusterware, Oracle automatically runs the ocrconfig -upgrade command. To downgrade, follow the downgrade instructions for each component and also downgrade the OCR using the ocrconfig -downgrade command. If you are upgrading the OCR to Oracle Database 10g release 10.2, then you can use the cluvfy command to verify the integrity of the OCR. If you are downgrading, you cannot use the Cluster Verification Utility (CVU) commands to verify the OCR for pre-10.2 release formats.

HARD-Compatible OCR Blocks in Oracle9i

In Oracle9i, the OCR did not write HARD-compatible blocks. If the device used by OCR is enabled for HARD, then use the method described in the HARD white paper to disable HARD for the OCR before downgrading your OCR. If you do not disable HARD, then the downgrade operation fails.

Administering Multiple Cluster Interconnects on UNIX-Based Platforms

In Oracle RAC environments that run on UNIX-based platforms, you can use the CLUSTER_INTERCONNECTS initialization parameter to specify an alternative interconnect for the private network.

The CLUSTER_INTERCONNECTS initialization parameter requires the IP address of the interconnect instead of the device name. It enables you to specify multiple IP addresses, separated by colons. Oracle RAC network traffic is distributed between the specified IP addresses.

The CLUSTER_INTERCONNECTS initialization parameter is useful only in UNIX-based environments where UDP IPC is enabled. The parameter enables you to specify an interconnect for all IPC traffic, including Oracle Global Cache Service (GCS), Global Enqueue Service (GES), and Interprocessor Parallel Query (IPQ).

Overall cluster stability and performance may improve when you force Oracle GCS, GES, and IPQ over a different interconnect by setting the CLUSTER_INTERCONNECTS initialization parameter. For example, to use the network interface whose IP address is 129.34.137.212 for all GCS, GES, and IPQ IPC traffic, set the CLUSTER_INTERCONNECTS parameter as follows:

CLUSTER_INTERCONNECTS=129.34.137.212

Use the ifconfig or netstat command to display the IP address of a device. This command provides a map between device names and IP addresses. For example, to determine the IP address of a device, run the following command as the root user:

# /usr/sbin/ifconfig -a 
fta0: flags=c63<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST,SIMPLEX> 
      inet 129.34.137.212 netmask fffffc00 broadcast 129.34.139.255 ipmtu 1500

lo0:  flags=100c89<UP,LOOPBACK,NOARP,MULTICAST,SIMPLEX,NOCHECKSUM> 
      inet 127.0.0.1 netmask ff000000 ipmtu 4096 

ics0:  flags=1100063<UP,BROADCAST,NOTRAILERS,RUNNING,NOCHECKSUM,CLUIF> 
      inet 10.0.0.1 netmask ffffff00 broadcast 10.0.0.255 ipmtu 7000 

sl0:  flags=10<POINTOPOINT>

tun0: flags=80<NOARP>

In the preceding example, the interface fta0: has an IP address of 129.34.137.212 and the interface ics0: has an IP address of 10.0.0.1.
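
To read the device-to-address map without scanning the output by eye, you can filter it with awk. The following minimal sketch assumes the output layout shown in the preceding example, in which each interface name starts in the first column and ends with a colon, and the address follows the word inet on an indented line. Other platforms format ifconfig output differently. The function name ifconfig_ip_map is hypothetical.

```sh
# Hypothetical helper: read "ifconfig -a" output on standard input and
# print one "device address" pair for each interface that has an inet line.
ifconfig_ip_map() {
    awk '
        # A line that starts in column one with "name:" begins an interface.
        /^[^ ]/ && $1 ~ /:$/ { dev = $1; sub(/:$/, "", dev); next }
        # An indented inet line carries the address of that interface.
        $1 == "inet" && dev != "" { print dev, $2 }
    '
}

# Usage: /usr/sbin/ifconfig -a | ifconfig_ip_map
```

Interfaces without an inet line, such as sl0 and tun0 in the example, produce no output.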

Bear in mind the following important points when using the CLUSTER_INTERCONNECTS initialization parameter:

  • The IP addresses specified for the different instances of the same database on different nodes must belong to network adapters that connect to the same network. If you do not follow this rule, then inter-node traffic may pass through bridges and routers or there may not be a path between the two nodes at all.

  • Specify the CLUSTER_INTERCONNECTS initialization parameter in the parameter file, setting a different value for each database instance.

  • If you specify multiple IP addresses for this parameter, then list them in the same order for all instances of the same database. For example, if the parameter for instance 1 on node 1 lists the IP addresses of the alt0:, fta0:, and ics0: devices in that order, then the parameter for instance 2 on node 2 must list the IP addresses of the equivalent network adapters in the same order.

  • If the interconnect IP address specified is incorrect or does not exist on the system, then Oracle Database uses the default cluster interconnect device. On Tru64 UNIX for example, the default device is ics0:.
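
Because multiple addresses are separated by colons and must appear in the same order for every instance, it can help to generate the parameter line from an ordered list of addresses. The following minimal sketch does only that. The function name cluster_interconnects_value is hypothetical, and the unquoted output form follows the single-address example shown earlier; confirm the exact parameter file syntax for your release.

```sh
# Hypothetical helper: join the given IP addresses with colons, in the
# order supplied, and print a CLUSTER_INTERCONNECTS parameter line.
cluster_interconnects_value() {
    # At least one address is required.
    [ "$#" -gt 0 ] || { echo "no IP address given" >&2; return 1; }

    value=""
    for ip in "$@"; do
        if [ -z "$value" ]; then
            value=$ip
        else
            value="$value:$ip"
        fi
    done

    printf 'CLUSTER_INTERCONNECTS=%s\n' "$value"
}
```

For instance 1 you would pass the addresses of its adapters in a chosen order, and for instance 2 the addresses of the equivalent adapters in that same order.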

Failover and Failback and CLUSTER_INTERCONNECTS

Some operating systems support run-time failover and failback. However, if you use the CLUSTER_INTERCONNECTS initialization parameter, then failover and failback are disabled.

Note:

Failover and failback and CLUSTER_INTERCONNECTS are not supported on AIX systems.