4 Configuring Oracle Enterprise Manager for Active and Passive Environments

Active and Passive environments, also known as Cold Failover Cluster (CFC) environments, refer to one type of high availability solution that allows an application to run on one node at a time. These environments generally use a combination of cluster software to provide a logical host name and IP address, along with interconnected host and storage systems to share information to provide a measure of high availability for applications.

This chapter contains the following sections:

  • Configuring Oracle Enterprise Management Agents for Use in Active and Passive Environments

  • Using Virtual Host Names for Active and Passive High Availability Environments in Enterprise Manager Database Control

4.1 Configuring Oracle Enterprise Management Agents for Use in Active and Passive Environments

In a Cold Failover Cluster environment, one host is considered the active node where applications are run, accessing the data contained on the shared storage. The second node is considered the standby node, ready to run the same applications currently hosted on the primary node in the event of a failure. The cluster software is configured to present a Logical Host Name and IP address. This address provides a generic location for running applications that is not tied to either the active node or the standby node.

In the event of a failure of the active node, applications can be terminated either by the hardware failure or by the cluster software. These applications can then be restarted on the passive node using the same logical host name and IP address to access the new node, resuming operations with little disruption. Automating failover of the virtual host name and IP address, along with starting the applications on the passive node, requires third-party cluster software. Several Oracle partner vendors provide high availability solutions in this area.
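For illustration, assume the cluster presents the logical name lxdb.acme.com on a floating address such as 192.0.2.10 (the address is a hypothetical placeholder). Applications and Enterprise Manager components reference only the logical name; which physical node answers for it changes during a failover:

    # Hypothetical check: the logical host name always resolves to the floating IP
    nslookup lxdb.acme.com     # returns the floating address (for example, 192.0.2.10)
    ping lxdb.acme.com         # succeeds as long as one cluster node owns the address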

4.1.1 Installation and Configuration

Enterprise Manager can be configured to support this Cold Failover Cluster configuration using the existing Management Agents communicating to the Oracle Management Service processes.

If your application is running in an Active and Passive environment, the clusterware brings up the passive or standby database instance if the active database goes down. For Enterprise Manager to continue monitoring the application instance in such a scenario, the existing Management Agents need additional configuration.

The additional configuration steps for this environment involve:

  • Installing an additional Management Agent using the logical host name and IP address generated through the cluster software.

  • Modifying the targets monitored by each Management Agent once the third Management Agent is installed.

In summary, this configuration results in the installation of three Management Agents: one for each hardware node and one for the IP address generated by the cluster software. In theory, if the cluster software can generate multiple virtual IP addresses to support multiple high availability environments, the solution outlined here should scale accordingly.

The following table documents the steps required to configure Management Agents in a CFC environment; an example of the Management Agent verification step follows the table.

Table 4-1 Steps Required to Configure Management Agents in a Cold Failover Cluster Environment

Each step below lists the action to take, the method used to perform it, the expected description/outcome, and how to verify the result.

Action: Install the vendor-specific cluster software.

  Method: The installation method varies depending on the cluster vendor.

  Description/Outcome: The minimum requirement is a two-node cluster that supports virtual or floating IP addresses and shared storage.

  Verification: Use the ping command to verify the existence of the floating IP address. Use nslookup or an equivalent command to verify that the virtual host name resolves to the correct IP address in your environment.

Action: Install Management Agents on each physical node of the cluster, using the physical IP address or host name as the node name.

  Method: Use the Oracle Universal Installer (OUI) to install a Management Agent on each node of the cluster.

  Description/Outcome: When complete, the OUI will have installed Management Agents on each node that are visible through the Grid Control console.

  Verification: Check that the Management Agent, host, and targets are visible in the Enterprise Manager environment.

Action: Delete the targets that will be configured for high availability using the cluster software.

  Method: Using the Grid Control console, delete all targets discovered during the previous installation step that are managed by the cluster software, except for the Management Agent and the host.

  Description/Outcome: The Grid Control console displays the Management Agent, the hardware, and any target that is not configured for high availability.

  Verification: Inspect the Grid Control console and verify that all targets that will be assigned to the Management Agent running on the floating IP address have been deleted from the Management Agents monitoring the fixed IP addresses.

Action: Install a third Management Agent on the cluster, using the logical IP address or logical host name as the host specified in the OUI at install time. Note: This installation should not detect or install to more than one node.

  Method: This Management Agent must follow the same conventions as any application using the cluster software to move between nodes (that is, it must be installed on the shared storage using the logical IP address). The installation requires an additional option at the command line: the HOSTNAME flag must be set, as in the following example:

    ./runInstaller HOSTNAME=<Logical IP address or hostname>

  Description/Outcome: A third Management Agent is installed, currently monitoring all targets discovered on the host running the physical IP.

  Verification: To verify that the Management Agent is configured correctly, type emctl status agent at the command line and confirm that it is using the logical IP virtual host name. Also verify that the Management Agent is set to the correct Management Service URL and that it is uploading files. When the Management Agent is running and uploading data, use the Grid Control console to verify that it has correctly discovered the targets that will move to the standby node during a failover operation.

Action: Delete from the Management Agent monitoring the logical IP any targets that will not switch to the passive node during failover.

  Method: Use the Grid Control console to delete any targets that will not move between hosts in a switchover or failover scenario. These might be targets that are not attached to this logical IP address for failover or that are not configured for redundancy.

  Description/Outcome: The Grid Control console now shows three Management Agents. Any target that is configured for switchover using the cluster software is monitored by a Management Agent that will transition during switchover or failover operations.

  Verification: Inspect the Grid Control console. All targets that will move between nodes should be monitored by the Management Agent running on the virtual host name. All remaining targets should be monitored by a Management Agent running on an individual node.

Action: Add the new logical host to the cluster definition.

  Method: Using the All Targets tab in the Grid Control console, find the cluster target and add the newly discovered logical host to the existing cluster target definition. It is also possible (but not required) to use the Add Cluster Target option on the All Targets tab to create a new composite target using the nodes of the cluster.

  Description/Outcome: The Grid Control console now correctly displays all the hosts associated with the cluster.

Action: Place the Management Agent process running on the logical IP under the control of the cluster software.

  Method: This will vary based on the cluster software vendor. A suggested order of operations is covered in the next section.

  Description/Outcome: The Management Agent transitions between nodes along with the applications.

  Verification: Verify that the Management Agent can be stopped and restarted on the standby node using the cluster software.
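For example, the Management Agent verification described in the table might look like the following, run from the Management Agent home installed on the shared storage (the path shown is a hypothetical placeholder):

    # Run from the Management Agent home on the shared storage (hypothetical path)
    /shared/oracle/agent10g/bin/emctl status agent
    # In the output, confirm that the agent is bound to the logical (virtual) host
    # name rather than a physical node name, that the Management Service URL is
    # correct, and that uploads are succeeding.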


4.1.2 Switchover Steps

Each cluster vendor implements the wrapper around the steps required to perform a switchover or failover in a different fashion. The steps themselves are generic and are listed here:

  • Shut down the Management Agent

  • Shut down all the applications running on the virtual IP and shared storage

  • Switch the IP and shared storage to the new node

  • Restart the applications

  • Restart the Management Agent

Stopping the Management Agent first, and restarting it after the other applications have started, prevents Enterprise Manager from triggering any false target down alerts that would otherwise occur during a switchover or failover.
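A minimal shell sketch of such a wrapper is shown below; the agent home, the application scripts, and the cluster_relocate command are hypothetical placeholders for your vendor's equivalents:

    #!/bin/sh
    # Illustrative switchover wrapper; replace the placeholders with the
    # commands supplied by your cluster software vendor.
    AGENT_HOME=/shared/oracle/agent10g       # Management Agent home on shared storage (hypothetical)

    $AGENT_HOME/bin/emctl stop agent         # 1. stop the Management Agent first
    stop_applications                        # 2. stop applications using the virtual IP and shared storage (placeholder)
    cluster_relocate                         # 3. move the virtual IP and shared storage to the new node (placeholder)
    start_applications                       # 4. restart the applications (placeholder)
    $AGENT_HOME/bin/emctl start agent        # 5. restart the Management Agent last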

4.1.3 Performance Implications

While it is logical to assume that running two Management Agent processes on the active host may have performance implications, this was not observed during testing. Keep in mind that if the Management Agents are configured as described in this chapter, the Management Agent monitoring the physical host IP has only two targets to monitor. Therefore, the only additional overhead is the two Management Agent processes themselves and the commands they issue to monitor a Management Agent and the operating system. During testing, an overhead of 1 to 2 percent of CPU usage was observed.
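If you want to observe this overhead yourself, the CPU consumption of the agent processes on the active node can be sampled with standard operating system tools; the process name shown is typical for a 10g Management Agent but may differ in your installation:

    # Sample CPU usage of the Management Agent processes (process name assumed to be emagent)
    ps -eo pid,pcpu,args | grep "[e]magent"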

4.1.4 Summary

Generically, configuring Enterprise Manager to support Cold Failover Cluster environments encompasses the following steps:

  • Install a Management Agent for each virtual host name that is presented by the cluster and ensure that the Management Agent is correctly communicating with the Management Service.

  • Configure the Management Agent that will move between nodes to monitor the appropriate highly available targets.

  • Verify that the Management Agent can be stopped on the primary node and restarted on the secondary node automatically by the cluster software in the event of a switchover or failover.

4.2 Using Virtual Host Names for Active and Passive High Availability Environments in Enterprise Manager Database Control

This section provides information to database administrators about configuring Oracle Database 10g in Cold Failover Cluster environments using Enterprise Manager Database Control.

The following conditions must be met for Database Control to service a database instance after failing over to a different host in the cluster:

  • The installation of the database must be done using a Virtual IP address.

  • The installation must be done on a shared disk or volume that holds the binaries, the configuration, and the runtime data (including the database).

  • Configuration data and metadata must also fail over to the surviving node.

  • The inventory location must fail over to the surviving node.

  • Software owner and time zone parameters must be the same on all cluster member nodes that will host this database.

The following items are configuration and installation points you should consider before getting started.

  • To override the physical host name of the cluster member with a virtual host name, software must be installed using the parameter ORACLE_HOSTNAME.

  • For the inventory pointer, the software must be installed using the command-line parameter -invPtrLoc to point to the shared inventory location file, which contains the path to the shared inventory location.

  • The database software installation, the database configuration, and the Database Control configuration must all be done on a shared volume.

4.2.1 Set Up the Alias for the Virtual Host Name and Virtual IP Address

You can set up the alias for the virtual host name and virtual IP address either by allowing the clusterware to set it up automatically or by setting it up manually before installation and startup of Oracle services. The virtual host name must be static and resolvable consistently on the network. All nodes participating in the setup must resolve the virtual IP address to the same host name. Standard TCP/IP tools such as nslookup and traceroute can be used to verify the setup.
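For example, you can run the same lookup from every node that may host the services and compare the results (the node names below are hypothetical):

    # Each node must resolve the virtual host name to the same virtual IP address
    for node in node1 node2; do
        ssh $node nslookup lxdb.acme.com
    done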

4.2.2 Set Up Shared Storage

Shared storage can be managed by the clusterware that is in use, or you can use any shared file system volume, as long as it is a supported type. The most common shared file system is NFS.

4.2.3 Set Up the Environment

Some operating system versions require specific operating system patches to be applied before installing Oracle Database 10g Release 2. You must also have sufficient kernel resources available when you conduct the installation.

Before you launch the installer, specific environment variables must be verified. Each of the following variables must be set identically for the account you use to install the software, on all machines participating in the cluster; a quick check is shown after this list.

  • Operating system variable TZ, time zone setting. You should unset this prior to the installation.

  • PERL variables. Variables like PERL5LIB should be unset to prevent the installation and Database Control from picking up the incorrect set of PERL libraries.

  • Paths used for dynamic libraries. Based on the operating system, the variables can be LD_LIBRARY_PATH, LIBPATH, SHLIB_PATH, or DYLD_LIBRARY_PATH. These variables should only point to directories that are visible and usable on each node of the cluster.
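A quick way to verify these settings in a Bourne-style shell is shown below; which library path variable applies depends on your platform:

    # Run as the installing user on every cluster node; the results should match on all nodes
    unset TZ                  # the time zone setting should not be set during installation
    unset PERL5LIB            # avoid picking up an incorrect set of PERL libraries
    echo $LD_LIBRARY_PATH     # or LIBPATH, SHLIB_PATH, DYLD_LIBRARY_PATH, depending on the platform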

4.2.4 Ensure That the Oracle USERNAME, ID, and GROUP NAME Are Synchronized on All Cluster Members

The user and group of the software owner should be defined identically on all nodes of the cluster. You can verify this using the following command:

$ id -a
uid=1234(oracle) gid=5678(dba) groups=5678(dba)
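To confirm that the definitions match across the cluster, the same check can be run on each node (the node names are hypothetical), for example:

    ssh node1 id oracle
    ssh node2 id oracle
    # Both commands should report identical uid, gid, and group values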

4.2.5 Ensure That Inventory Files Are on the Shared Storage

To ensure that inventory files are on the shared storage, follow these steps (a consolidated example follows the list):

  • Create your new ORACLE_HOME directory.

  • Create the Oracle Inventory directory under the new Oracle home

    cd <shared oracle home>
    mkdir oraInventory
    
    
  • Create the oraInst.loc file. This file contains the Inventory directory path information required by the Universal Installer:

    1. vi oraInst.loc

    2. Enter the path to the Oracle Inventory directory and specify the group of the software owner (the dba group). For example:

      inventory_loc=/app/oracle/product/10.2/oraInventory
      inst_group=dba
      
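Taken together, and using the shared Oracle home path from the example above, the inventory setup might look like this:

    # Create the shared inventory directory and its pointer file
    cd /app/oracle/product/10.2
    mkdir oraInventory
    {
        echo "inventory_loc=/app/oracle/product/10.2/oraInventory"
        echo "inst_group=dba"
    } > oraInst.loc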

4.2.6 Start the Installer

To start the installer, point it to the inventory location file oraInst.loc and specify the virtual host name with the ORACLE_HOSTNAME parameter. The debug parameter in the example below is optional:

./runInstaller -invPtrLoc /app/oracle/share1/oraInst.loc ORACLE_HOSTNAME=lxdb.acme.com -debug

4.2.7 Start Services

You must start the services in the following order:

  1. Establish IP address on the active node

  2. Start the TNS listener

  3. Start the database

  4. Start dbconsole

  5. Test functionality

In the event of a failover to the standby node, follow these steps:

  1. Establish the IP address on the failover node

  2. Start TNS listener

    lsnrctl start
    
    
  3. Start the database

    dbstart
    
    
  4. Start Database Control

    emctl start dbconsole
    
    
  5. Test functionality

To manually stop or shut down a service, follow these steps (a consolidated start/stop sketch follows the list):

  1. Stop the application.

  2. Stop Database Control

    emctl stop dbconsole
    
    
  3. Stop TNS listener

    lsnrctl stop
    
    
  4. Stop the database

    dbshut
    
    
  5. Stop IP
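Cluster frameworks typically invoke a start and a stop entry point for each managed resource. The sketch below wraps the ordering above in one script; the ORACLE_HOME and ORACLE_SID values are placeholders, and the virtual IP address and shared storage are assumed to be brought online and offline by the cluster software itself:

    #!/bin/sh
    # Illustrative start/stop wrapper for Database Control services in a CFC setup
    ORACLE_HOME=/app/oracle/product/10.2      # shared Oracle home (placeholder)
    ORACLE_SID=orcl                           # placeholder SID
    PATH=$ORACLE_HOME/bin:$PATH
    export ORACLE_HOME ORACLE_SID PATH

    case "$1" in
      start)
        lsnrctl start               # start the TNS listener
        dbstart                     # start the database
        emctl start dbconsole       # start Database Control
        ;;
      stop)
        emctl stop dbconsole        # stop Database Control
        lsnrctl stop                # stop the TNS listener
        dbshut                      # stop the database
        ;;
    esac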