A Troubleshooting the Oracle Clusterware Installation Process

This appendix provides troubleshooting information for installing Oracle Clusterware.

A.1 Install OS Watcher and RACDDT

To address troubleshooting issues, Oracle recommends that you install OS Watcher, and if you intend to install an Oracle RAC database, RACDDT. You must have access to OracleMetaLink to download OS Watcher and RACDDT.

OS Watcher (OSW) is a collection of UNIX/Linux shell scripts that collect and archive operating system and network metrics to help Oracle Support diagnose system and performance issues. OSW operates as a set of background processes on the server and gathers operating system data at regular intervals. The scripts use common utilities such as vmstat, netstat, and iostat.

RACDDT is a data collection tool designed and configured specifically for gathering diagnostic data related to Oracle RAC technology. RACDDT is a set of scripts and configuration files that is run on one or more nodes of an Oracle RAC cluster. The main script is written in Perl, while a number of proxy scripts are written using Korn shell. RACDDT will run on all supported UNIX and Linux platforms, but is not supported on any Windows platforms.

OSW is also included in the RACDDT script file, but is not installed by RACDDT. OSW must be installed on each node where data is to be collected.

To download binaries for OS Watcher and RACDDT, go to the following URL:

https://metalink.oracle.com

Download OSW by searching for OS Watcher, and downloading the binaries from the User Guide bulletin. Installation instructions for OSW are provided in the user guide. Download RACDDT by searching for RACDDT, and downloading the binaries from the RACDDT User Guide bulletin.

A.2 General Installation Issues

The following errors can occur during installation:

An error occurred while trying to get the disks
Failed to connect to server, Connection refused by server, or Can't open display
MEMORY_TARGET not supported on this system
Nodes unavailable for selection from the OUI Node Selection screen
Node nodename is unreachable
PROT-8: Failed to import data from specified file to the cluster registry
Time stamp is in the future
YPBINDPROC_DOMAIN: Domain not bound

An error occurred while trying to get the disks

Cause: There is an entry in /etc/oratab pointing to a nonexistent Oracle home. The OUI error file should show the following error: "java.io.IOException: /home/oracle/OraHome//bin/kfod: not found" (OracleMetaLink bulletin 276454.1).

Action: Remove the entry in /etc/oratab that points to the nonexistent Oracle home.

Failed to connect to server, Connection refused by server, or Can't open display

Cause: These are typical of X Window display errors on Windows or UNIX systems, where xhost is not properly configured.

Action: In a local terminal window, log in as the user that started the X Window session, and enter the following command:

$ xhost fully_qualified_remote_host_name

For example:

$ xhost somehost.example.com

Then, enter the following commands, where workstation_name is the host name or IP address of your workstation.

Bourne, Bash, or Korn shell:

$ DISPLAY=workstation_name:0.0
$ export DISPLAY

To determine whether X Window applications display correctly on the local system, enter the following command:

$ xclock

The X clock should appear on your monitor. If the X clock appears, then close the X clock and start Oracle Universal Installer again.
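Before starting the installer, it can help to confirm that DISPLAY holds a value of the expected form. The following is a minimal sketch, not an Oracle utility; check_display is a hypothetical helper name, and it only inspects the format of the value, not whether the X server accepts connections.

```shell
# Hypothetical helper: classify a DISPLAY value by its form only.
# Prints "ok" for host:display (or :display), "unset" for an empty
# value, and "malformed" for anything else.
check_display() {
    case "$1" in
        "")       echo "unset" ;;
        *:[0-9]*) echo "ok" ;;
        *)        echo "malformed" ;;
    esac
}

# Usage:
#   check_display "$DISPLAY"
```

A result of "ok" does not replace the xclock test; it only rules out a missing or mistyped DISPLAY setting.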

MEMORY_TARGET not supported on this system

Cause: On Linux systems, insufficient /dev/shm size for PGA and SGA.

If you are installing on a Linux system, note that Memory Size (SGA and PGA), which sets the initialization parameter MEMORY_TARGET or MEMORY_MAX_TARGET, cannot be greater than the shared memory file system (/dev/shm) on your operating system.

Action: Increase the /dev/shm mountpoint size. For example:

# mount -t tmpfs shmfs -o size=4g /dev/shm

Also, to make this change persistent across system restarts, add an entry in /etc/fstab similar to the following:

shmfs /dev/shm tmpfs size=4g 0 0
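To see whether /dev/shm is large enough before choosing a memory size, the file system size can be compared with the intended target. The following is a minimal sketch under stated assumptions: shm_size_kb and shm_fits are hypothetical helper names, sizes are expressed in KB, and the 4 GB figure in the usage comment is only an example.

```shell
# Hypothetical helpers, not Oracle tools.

# Print the total size of a mounted file system in KB.
# df -P guarantees one output line per file system; column 2 is the size.
shm_size_kb() {
    df -Pk "${1:-/dev/shm}" | awk 'NR == 2 { print $2 }'
}

# Succeed when the intended memory target (KB) does not exceed
# the file system size (KB).
shm_fits() {
    target_kb=$1
    size_kb=$2
    [ "$target_kb" -le "$size_kb" ]
}

# Usage (4 GB target expressed in KB):
#   if shm_fits 4194304 "$(shm_size_kb /dev/shm)"; then
#       echo "/dev/shm is large enough"
#   else
#       echo "increase the size of /dev/shm"
#   fi
```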

Nodes unavailable for selection from the OUI Node Selection screen

Cause: Oracle Clusterware is either not installed, or the Oracle Clusterware services are not up and running.

Action: Install Oracle Clusterware, or review the status of your Oracle Clusterware. Consider restarting the nodes, as doing so may resolve the problem.

Node nodename is unreachable

Cause: Unavailable IP host

Action: Attempt the following:

Run the shell command ifconfig -a. Compare the output of this command with the contents of the /etc/hosts file to ensure that the node IP is listed.
Run the shell command nslookup to verify that the host name resolves, and then run ping to see if the host is reachable.
As the oracle user, attempt to connect to the node with ssh or rsh. If you are prompted for a password, then user equivalence is not set up properly. Review the section "Configuring SSH on All Cluster Nodes".
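Two of the checks above can be scripted. The following is a minimal sketch, with hypothetical helper names: host_in_hosts_file looks for a host name or alias in a hosts file, and ssh_equivalence_ok attempts a login that fails instead of prompting for a password. It assumes OpenSSH, whose BatchMode and ConnectTimeout options are used here.

```shell
# Hypothetical helpers, not part of Oracle Clusterware.

# Succeed if the name appears as a host name or alias on a
# non-comment line of the hosts file (default: /etc/hosts).
host_in_hosts_file() {
    awk -v h="$1" '
        /^[ \t]*#/ { next }                 # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # ignore trailing comments
                if ($i == h) found = 1
            }
        }
        END { exit found ? 0 : 1 }
    ' "${2:-/etc/hosts}"
}

# Succeed only if ssh can log in without prompting.
# BatchMode=yes makes ssh fail rather than ask for a password.
ssh_equivalence_ok() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Usage, run as the oracle user (node2 is an example name):
#   host_in_hosts_file node2 || echo "node2 is not listed in /etc/hosts"
#   ssh_equivalence_ok node2 || echo "user equivalence is not set up for node2"
```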

PROT-8: Failed to import data from specified file to the cluster registry

Cause: Insufficient space in an existing Oracle Cluster Registry device partition, which causes a migration failure while running rootupgrade.sh. To confirm, look for the error "utopen:12:Not enough space in the backing store" in the log file $ORA_CRS_HOME/log/hostname/client/ocrconfig_pid.log.

Action: Identify a storage device that has 280 MB or more available space. Locate the existing raw device name from /var/opt/oracle/srvConfig.loc, and copy the contents of this raw device to the new device using the command dd.
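To confirm the cause before moving the registry, the ocrconfig log file can be searched for the quoted error text. The following is a minimal sketch; ocr_space_error_in_log is a hypothetical helper name, and the caller supplies the path to the log file, because the host name and process ID in that path differ on every system.

```shell
# Hypothetical helper: succeed if the given log file contains the
# backing-store error that accompanies PROT-8.
# grep -F treats the pattern as a fixed string; -q suppresses output.
ocr_space_error_in_log() {
    grep -F -q 'utopen:12:Not enough space in the backing store' "$1"
}

# Usage (replace the argument with the actual log file path):
#   if ocr_space_error_in_log /path/to/ocrconfig_log_file; then
#       echo "OCR device partition is too small"
#   fi
```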

Time stamp is in the future

Cause: One or more nodes has a different clock time than the local node. If this is the case, then you may see output similar to the following:

time stamp 2005-04-04 14:49:49 is 106 s in the future

Action: Ensure that all member nodes of the cluster have the same clock time.
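Clock differences between nodes can be estimated from the local node. The following is a minimal sketch under stated assumptions: clock_skew is a hypothetical helper name, node1 and node2 are example node names, ssh user equivalence is already configured, and the date command on each node supports the +%s format (seconds since the epoch). The ssh round trip adds a small delay, so treat differences of a second or two as noise.

```shell
# Hypothetical helper: print the absolute difference, in seconds,
# between two epoch timestamps.
clock_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( -diff ))
    fi
    echo "$diff"
}

# Usage (node names are examples):
#   for node in node1 node2; do
#       remote=$(ssh "$node" date +%s)
#       echo "$node: $(clock_skew "$(date +%s)" "$remote") s"
#   done
```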

YPBINDPROC_DOMAIN: Domain not bound

Cause: This error can occur during postinstallation testing when a node public network interconnect is pulled out, and the VIP does not fail over. Instead, the node hangs, and users are unable to log in to the system. This error occurs when the Oracle home, listener.ora, Oracle log files, or any action scripts are located on an NAS device or NFS mount, and the name service cache daemon nscd has not been activated.

Action: Enter the following command on all nodes in the cluster to start the nscd service:

# /sbin/service nscd start

A.3 Missing Operating System Packages On Linux

You have missing operating system packages on your system if you receive error messages such as the following during Oracle Clusterware, Oracle RAC, or Oracle Database installation:

libstdc++.so.5: cannot open shared object file: No such file or directory
libXp.so.6: cannot open shared object file: No such file or directory

Typically, errors such as these occur if you did not confirm during preinstallation that all required operating system packages were installed. Run Cluster Verification Utility (CVU), either from the shiphome mount point (runcluvfy.sh), or from an installation directory (CRS_home/bin). Cluster Verification Utility reports which required packages are missing.

If you have a Linux support network configured, such as the Red Hat network or Oracle Unbreakable Linux support, then you can also use the up2date command to determine the name of the package. For example:

# up2date --whatprovides libstdc++.so.5
compat-libstdc++-33-3.2.3-47.3
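When an installation log contains many such messages, the library names can be extracted in one pass and then looked up one at a time. The following is a minimal sketch; missing_libs is a hypothetical helper name, and it assumes the messages use the "cannot open shared object file" wording shown above.

```shell
# Hypothetical helper: read error output on standard input and print
# each library named in a "cannot open shared object file" message,
# once per library.
missing_libs() {
    awk 'match($0, /[^ \t:]+: cannot open shared object file/) {
        name = substr($0, RSTART, RLENGTH)
        sub(/: cannot open shared object file$/, "", name)
        print name
    }' | sort -u
}

# Usage (install.log is an example file name):
#   missing_libs < install.log
```

Each name printed can then be passed to the package lookup command for your support network to find the package that provides it.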

A.4 Performing Cluster Diagnostics During Oracle Clusterware Installations

If Oracle Universal Installer (OUI) does not display the Node Selection page, then perform clusterware diagnostics by running the olsnodes -v command from the binary directory in your Oracle Clusterware home (CRS_home/bin on Linux and UNIX-based systems, and CRS_home\BIN on Windows-based systems) and analyzing its output. Refer to your clusterware documentation if the detailed output indicates that your clusterware is not running.

In addition, use the following command syntax to check the integrity of the Cluster Manager:

cluvfy comp clumgr -n node_list -verbose

In the preceding syntax example, the variable node_list is the list of nodes in your cluster, separated by commas.
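If the node names are held as separate values in a script, they can be joined into the comma-separated form that the command expects. The following is a minimal sketch; join_nodes is a hypothetical helper name, and node1 and node2 are example node names.

```shell
# Hypothetical helper: join the arguments with commas.
# "$*" joins the arguments with the first character of IFS; the
# subshell keeps the IFS change from leaking into the caller.
join_nodes() {
    ( IFS=,; printf '%s\n' "$*" )
}

# Usage (node names are examples):
#   cluvfy comp clumgr -n "$(join_nodes node1 node2)" -verbose
```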

Note:

If you encounter unexplained installation errors during or after a period when cron jobs are run, then a cron job may have deleted temporary files before the installation finished. Oracle recommends that you complete installation before daily cron jobs are run, or disable daily cron jobs that perform cleanup until after the installation is completed.

A.5 Interconnect Errors

If you use more than one NIC for the interconnect, then you must use NIC bonding, or the interconnect will fail.

If you install Oracle Clusterware and Oracle RAC, then they must use the same NIC or bonded NICs for the interconnect.

If you use bonded NICs, then they must be on the same subnet.
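Whether two interconnect addresses fall in the same subnet can be checked by applying the netmask to each address and comparing the results. The following is a minimal sketch for IPv4 only; ip_to_int and same_subnet are hypothetical helper names, the addresses in the usage comment are examples, and octets must be written without leading zeros because the shell would read them as octal.

```shell
# Hypothetical helpers, not part of Oracle Clusterware.

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed when both addresses have the same network part under
# the given netmask.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}

# Usage (example addresses and netmask):
#   same_subnet 192.168.10.1 192.168.10.2 255.255.255.0 \
#       || echo "interconnect addresses are on different subnets"
```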