
Oracle Real Application Clusters Guard Concepts and Administration Guide
Release 3.2.1 for Windows NT and Windows 2000

Part Number A95197-01

2
Concepts

Oracle Real Application Clusters software provides a high level of availability through its multi-instance implementation of the Oracle database server. Oracle Real Application Clusters Guard helps you to configure Oracle Real Application Clusters databases into an MSCS cluster. When you do so, Oracle Real Application Clusters Guard, along with Oracle Real Application Clusters and MSCS, works to monitor and maintain the availability of nodes and cluster resources that you configure into an MSCS cluster in units called groups. Oracle Real Application Clusters supports two types of deployments:

The concepts in this chapter apply to both methods of deployment; special considerations for primary/secondary deployments are discussed in Section 2.4.

Before you begin to configure an Oracle Real Application Clusters database into an MSCS cluster, it is helpful to understand the concepts and policies that govern how configuration enhances high availability. This chapter discusses the following topics concerning cluster concepts and policies for maintaining high availability:

Topic                                                     Reference
Cluster Resources, Groups, and Virtual Addresses          Section 2.1
Monitoring the State of Cluster Components                Section 2.2
Maintaining Availability When Components Fail             Section 2.3
Considerations for Primary/Secondary Instance Deployment  Section 2.4

2.1 Cluster Resources, Groups, and Virtual Addresses

When a cluster node becomes unavailable, its cluster resources (for example, shared-nothing cluster disks, Oracle database instances and applications, and IP addresses) are failed over (moved) to an available node in units called groups. Clients request access to the resources in those groups at a node-independent network address called a virtual address. The following sections describe cluster resources, groups, and virtual addresses.

2.1.1 Cluster Resources

An MSCS cluster resource is any physical or logical component that is available to a computing system and has the following characteristics:

Because an MSCS resource exists on only one node at a time, Oracle Real Application Clusters databases are not considered cluster resources; however, the database instances and the components used by the database (listener and network name) are.

2.1.2 Resource Types

Each cluster resource is associated with a resource type, and each resource type (Oracle Real Application Clusters database instance, listener, network name, and IP address) is associated with a resource dynamic-link library (DLL) and is managed in the cluster environment using this resource DLL. There are standard MSCS resource DLLs as well as custom Oracle resource DLLs. The same resource DLL may support several different resource types.

For example, when you use Oracle Real Application Clusters Guard to configure an Oracle Real Application Clusters database into an MSCS cluster, Oracle Real Application Clusters Guard creates several database instance resources (one for each instance associated with the database) and Oracle listener resources.

The Oracle Real Application Clusters database instance resource DLLs (FsResOdbs.dll and FsResOPSInstEx.dll) provide functions that allow MSCS to check the status of the database instances, bring them online, or take them offline, and display their properties in MSCS.

See also:

2.1.3 Groups

A group is a logical collection of cluster resources that forms a minimal unit of failover. During a group failover, a group of cluster resources is moved from one cluster node to another cluster node. A group is owned by only one cluster node at a time. All cluster resources required for a given workload reside in the same group. Oracle Real Application Clusters Guard provides the Configure Database Wizard to help you to configure each Oracle Real Application Clusters database instance into a group. Each group created for an Oracle Real Application Clusters database instance includes the following resources:

The Oracle Real Application Clusters Guard Manager displays two types of group folders: Groups and Instance Groups. MSCS creates a group for each disk resource and for the cluster. Oracle Real Application Clusters Guard creates an instance group for each instance associated with the database when you configure an Oracle Real Application Clusters database into an MSCS cluster. Groups include both the groups included in the Instance Groups folder and groups created by MSCS. Commands and property sheets for both types of groups are the same.

Note that the raw disk partitions that Oracle Real Application Clusters databases use to store data, redo, and log files are not considered cluster resources. These disks must be accessible to all database instances (and therefore, all cluster nodes) concurrently; cluster resources are accessible to only one cluster node at a time.

2.1.3.1 Preferred Owner Node

Each group has a preferred owner node. The preferred owner node for a group containing an Oracle Real Application Clusters database instance (an instance group) is the node on which the instance exists and is the only node on the cluster on which the instance can come online. There is one and only one preferred owner node for a group containing an Oracle Real Application Clusters database instance. Therefore, if the group containing a database instance fails over, the instance is not brought online on the failover node.

2.1.3.2 Possible Owner Nodes

Each group also has a set of possible owner nodes. The possible owner nodes for a group containing an Oracle Real Application Clusters database instance are all cluster nodes where Oracle Services for MSCS is installed, less any nodes you explicitly remove from the set using Oracle Real Application Clusters Guard Manager.

2.1.3.3 Resource Dependencies

When you configure an Oracle Real Application Clusters database into an MSCS cluster, Oracle Real Application Clusters Guard Manager helps you to create a group for each database instance, requests information about and adds one or more node-independent addresses (called virtual addresses), and automatically adds an Oracle Net listener to each group for you. When Oracle Real Application Clusters Guard adds these resources to the group, it sets up a relationship among them called resource dependencies. The resource dependencies define the order in which the cluster software brings the resources offline and online.

As shown in Figure 2-1, in a group containing an Oracle Real Application Clusters database instance, there is a dependency between the instance and the IP address. In addition, the listener has a dependency on the network name, which has a dependency on the IP address. Therefore, if a node fails, the Oracle database instance resource and the listener resource will be brought offline first, followed by the network name resource, and then the IP address resource. On the node to which the group fails over (the failover node), the order is reversed; MSCS brings the IP address resource online first, followed by the network name resource. Neither the database instance nor the listener is brought online on the failover node, because each can run only on the node on which it was created. (However, if a group were simply taken offline and then placed online on its current node, then the database instance resource and listener resource would be brought online after the IP address and network name resources had been brought online.)
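The ordering described above can be sketched as a topological sort over the dependency relationships. This is only an illustration: the resource labels are made up for the example and are not MSCS identifiers, and the sketch does not model the rule that the instance and listener stay offline on a failover node.

```python
from graphlib import TopologicalSorter

# Resource -> set of resources it depends on, following the description of
# Figure 2-1. The labels are illustrative, not real MSCS resource names.
DEPENDS_ON = {
    "ip_address": set(),
    "network_name": {"ip_address"},
    "listener": {"network_name"},
    "database_instance": {"ip_address"},
}

def online_order(depends_on):
    """Return an order in which every resource follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())

def offline_order(depends_on):
    """Taking resources offline reverses the online order."""
    return list(reversed(online_order(depends_on)))

if __name__ == "__main__":
    print("online :", online_order(DEPENDS_ON))
    print("offline:", offline_order(DEPENDS_ON))
```

The relative order of the instance and the listener is not fixed by the dependencies, so the sketch may place either one first.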

Figure 2-1 Oracle Real Application Clusters Guard Resource Dependencies



2.1.4 Virtual Addresses

A virtual address is a network address at which running resources in a group can be located, regardless of the cluster node hosting those resources. A virtual address provides a constant node-independent network location that allows clients to easily locate resources without needing to know which physical cluster node is hosting those resources.

In an operation called failover, a group moves from an unavailable node to an available one after a node fails, or after a virtual address fails and cannot be restarted on its current node. You identify a virtual address for a group in Oracle Real Application Clusters Guard Manager by specifying a unique network name and IP address for each group. The Configure Database Wizard in Oracle Real Application Clusters Guard Manager helps you to specify one or more virtual addresses for each database instance. Figure 2-2 shows the dialog box that helps you add one or more virtual addresses to a group.

Figure 2-2 Configure Database Wizard - Virtual Address Dialog Box



The virtual addresses in the group make the group a virtual server. Although at least one virtual address per group is required for client access, you can assign multiple virtual addresses to a group. You might assign multiple virtual addresses to provide increased bandwidth.

Each group appears to users and client applications as a highly available virtual server, independent of the physical identity of one particular node. To access the resources in a group, clients always access the group by connecting to the virtual address of a group. To the client, the virtual server is the interface to the cluster resources and looks like a physical node.

Figure 2-3 shows a four-node cluster with one instance group configured on each node. Clients access these groups through Virtual Server A, B, C, and D. By accessing the cluster resources through the virtual address of a group, as opposed to the physical address of an individual node, you ensure a quick connection to an available database instance even when the requested instance is not available. The process by which a quick remote connection is ensured is described in Section 2.3.1.1.

Figure 2-3 Accessing Cluster Resources Through a Virtual Server



See Section 3.7 for details on the network configuration and virtual address for Oracle Real Application Clusters databases configured in an MSCS cluster.

2.2 Monitoring the State of Cluster Components

Monitoring the state of components in an MSCS cluster is key to maintaining high availability. MSCS monitors the state of cluster nodes and cluster resources. The data it collects on Oracle Real Application Clusters database instances is communicated to Oracle Real Application Clusters Guard, so that Oracle Real Application Clusters Guard can monitor and evaluate the state of the database overall. The following sections describe the following:

2.2.1 How Cluster Nodes Are Monitored

The Windows systems that are members of a cluster are called cluster nodes. The cluster nodes are joined together through a shared storage interconnect as well as an internode network connection.

The private interconnect, sometimes referred to as a heartbeat connection or an internode network connection, allows one node to detect the availability or unavailability of another node. Typically, a private interconnect (that is distinct from the public network connection used for user and client application access) is used for this communication. If one node fails, the cluster software immediately fails over the groups from the unavailable node to an available node, and restarts the group's virtual address on an available node. Clients reconnect to a database instance through connect-time failover.

2.2.2 How Cluster Resources Are Monitored

MSCS monitors the state of cluster resources (Oracle Real Application Clusters database instances, listeners, IP addresses, and network names) by polling the resources at regular intervals to determine if they are running, failed, or, in the case of database instances, possibly hung. As shown in Figure 2-4 and Figure 2-5, you can set the parameters for how often each type of polling is performed and the amount of time that can pass without a response from the poll before it is considered to have failed, as follows:

In addition, for non-database resources (resources other than database instances), you can specify the restart policy for the resource, which is defined when you select one of the following:

You cannot change the resource failover policy. It is set as required by Oracle Real Application Clusters Guard to maintain high availability of all components. See Section 2.3 for details.

The restart policy for database instances is specified for all database instances rather than one instance at a time, so that the policy can be evaluated and applied across all database instances as a whole. (The resource failover policy for an Oracle Real Application Clusters database instance is always "If the resource is not restarted, do not fail over the group.")

Figure 2-4 shows the Policies property page for a non-database cluster resource, namely, an Oracle TNS listener. (Specifying the "Use value from resource type" option indicates that you want to use the default values that are set in MSCS. To view the default values, open MSCS Cluster Administrator, select Resource Types from the tree view, right-click Oracle Real Application Clusters Instance in the right pane, and then click Properties.)

Figure 2-5 shows the Policies property page for an Oracle Real Application Clusters database instance.

MSCS provides the results of Is Alive polling of each database instance to Oracle Real Application Clusters Guard so that it can monitor the status of the Oracle Real Application Clusters database as a whole. Section 2.2.3 describes how Oracle Real Application Clusters databases are monitored. Section 2.3.2 describes how the restart policy for database instances is specified and applied.

Figure 2-4 Policies Property Page for Non-Database Resources



Figure 2-5 Policies Property Page for an Oracle Real Application Clusters Database Instance



2.2.3 How Oracle Real Application Clusters Databases Are Monitored

A global monitor component of Oracle Real Application Clusters Guard manages issues and policies that affect the database instances as a whole, such as policies that determine if and when failed instances are restarted and parameters for database instance hang detection and termination of hung instances. MSCS communicates the status of each Oracle Real Application Clusters database instance to the monitor so that the monitor has a global view of all of the database instances on the system.

Figure 2-6 shows a three-node cluster that includes nodes ntclu41, ntclu42, and ntclu43. An Oracle Real Application Clusters database, MyDB, has been configured into the cluster using Oracle Real Application Clusters Guard. The status of each database instance contained within a group is reported to the global monitor, currently on ntclu42.

Figure 2-6 Group Configuration for Oracle Real Application Clusters



If one or more of the instances fails or hangs as detected through MSCS Is Alive polling, the problem is reported to the global monitor. The database hang detection, termination, and restart policies determine what should be done with an unresponsive or failed instance. Section 2.3.2 describes how these policies are applied to Oracle Real Application Clusters database instances.

2.3 Maintaining Availability When Components Fail

As with monitoring, the response to an unavailable node, a non-database cluster resource, or an Oracle Real Application Clusters database instance is handled a little differently in each case. However, the objective in all cases is to maintain the availability of the Oracle Real Application Clusters database to clients. The following sections describe how failures are handled and availability is restored when any one of these components becomes unavailable.

2.3.1 Listener, Virtual Address, or Cluster Node Failure

Availability to the database instance associated with a listener, virtual address, or cluster node is maintained by failing over the group containing the resources, rerouting the client request using an operation called connect-time failover, or both, as follows:

2.3.1.1 Connect-Time Failover

A connect-time failover is a process by which a client connect request is forwarded to another listener if the first listener is not responding or if the database instance associated with that listener is unavailable. Clients that want to connect to any instance of an unconfigured Oracle Real Application Clusters database can take advantage of connect-time failover to ensure that they can connect to the database as long as at least one instance is running.

However, a significant delay can occur during connect-time failover for an unconfigured Oracle Real Application Clusters database due to TCP/IP timeout. If a node fails and new connection requests are made to that node's IP address, the connection request will wait the duration of the TCP/IP timeout interval to connect to an instance on a running node.

When you configure an Oracle Real Application Clusters database into an MSCS cluster using Oracle Real Application Clusters Guard, the TCP/IP timeout is avoided for new connections as long as the virtual address associated with the instance is available. If the virtual address is up and running, new requests for an instance on a failed node do not wait the duration of the timeout period. Requests for the connection are refused immediately and are routed transparently to another instance.

Oracle Real Application Clusters Guard keeps the virtual address associated with an instance running as follows:

Oracle Real Application Clusters Guard, therefore, provides much faster connect-time failover by ensuring that the virtual address is available, thus eliminating TCP/IP timeout delays.

Figure 2-7 illustrates a connection request to database db.us.acme.com, when an entry in tnsnames.ora file appears as follows:

db.us.acme.com= 
 (description= 
  (load_balance=on)
  (failover=on)
  (address_list=
   (address=(protocol=tcp)(host=138.2.26.155)(port=1521))
   (address=(protocol=tcp)(host=138.2.26.156)(port=1521))) 
  (connect_data=
     (service_name=op.us.acme.com)))

  1. Because virtual address 138.2.26.155 appears first in the tnsnames.ora address list, the client sends a connection request to virtual server ntclu-155.

  2. Because the listener and associated instance have failed, the request for a connection is immediately rejected.

  3. The client request is rerouted to the next address in the tnsnames.ora address list.

  4. Listener for instance 2 receives the request.

  5. The request is routed to instance 2.
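The five steps above can be sketched as a loop over the address list. This is a simulation only, not Oracle Net code: `ConnectionRefused`, `connect_with_failover`, and `fake_listener` are hypothetical names introduced for the example, and the sketch ignores the `load_balance=on` setting by always starting with the first address, as the steps do.

```python
class ConnectionRefused(Exception):
    """Hypothetical error raised when a listener rejects a request."""

def connect_with_failover(addresses, try_connect):
    """Try each address in order; return the first session and the hosts refused."""
    refused = []
    for host, port in addresses:
        try:
            return try_connect(host, port), refused
        except ConnectionRefused:
            # An immediate refusal lets the client move on without waiting
            # for a TCP/IP timeout.
            refused.append(host)
    raise ConnectionRefused(f"no listener accepted the request: {refused}")

# Addresses taken from the tnsnames.ora entry shown earlier.
ADDRESS_LIST = [("138.2.26.155", 1521), ("138.2.26.156", 1521)]

def fake_listener(host, port):
    # Simulated scenario: the listener and instance behind .155 have failed,
    # but the virtual address is still up, so the request is refused at once.
    if host == "138.2.26.155":
        raise ConnectionRefused(host)
    return f"session via {host}:{port}"

if __name__ == "__main__":
    session, refused = connect_with_failover(ADDRESS_LIST, fake_listener)
    print(session, "after refusals from", refused)
```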

Figure 2-7 Enhanced Connect-Time Failover



2.3.1.2 Group Failover Policy

The group failover policy specifies the number of times during a given time period that the cluster software should allow the group to fail over before that group is taken offline. The failover policy provides a means to prevent a group from failing over repeatedly.

Values for group failover policy options are set to default values when you use the Oracle Real Application Clusters Guard Manager Configure Database Wizard. However, you can reset the values in these policy options with the Group Failover property page, shown in Figure 2-8. (To access this page, select the group of interest in the Oracle Real Application Clusters Guard Manager tree view and then click the Failover tab.)

Figure 2-8 Group Failover Property Page



Figure 2-8 shows the page for setting group failover policy.

The group failover policy consists of a failover threshold and a failover period:

For example, if the group failover threshold is 3 and the failover period is 5, the cluster software allows the group to fail over 3 times within 5 hours before discontinuing failovers for that group.

When the first group failover occurs, a timer to measure the failover period is set to 0 and a counter to measure the number of failovers is set to 1. The timer is not reset to 0 when the failover period expires. Instead, the timer is reset to 0 when the first failover occurs after the failover period has expired.

For example, assume again that the group failover period is 5 hours and the failover threshold is 3. As shown in Figure 2-9, when the first group failover occurs at point A, the timer is set to 0. Assume a second group failover occurs 4.5 hours later at point B, and the third group failover occurs at point C. The failover period has been exceeded when the third group failover occurs. Therefore, at point C, group failovers are allowed to continue, the timer is reset to 0, and the counter is reset to 1.

Figure 2-9 Failover Threshold and Failover Period Timeline



Assume that another group failover occurs at point D. If you look at the entire timeline, you might expect that group failovers will be discontinued. The group failovers at points B, C, and D have occurred within a 5-hour timeframe. However, because the timer for measuring the failover period was reset to 0 at point C, the failover threshold has not been exceeded, and the cluster software allows the group to fail over.

Assume that another group failover occurs at point E. When a problem that ordinarily results in a group failover occurs at point F, the cluster software does not fail over the group. Three failovers have occurred during the 5-hour period that has passed since the timer was reset to 0 at point C. The cluster software leaves the group on the current node in a failed state.
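The timer and counter behavior in this example can be sketched as a small state machine. This illustrates the rules as described here and is not the MSCS implementation. The class name is hypothetical, the times assigned to points C through F are assumed values chosen to fit the narrative, and treating the period as expired only when strictly more than 5 hours have passed is also an assumption.

```python
class GroupFailoverPolicy:
    """Tracks failovers against a threshold within a failover period."""

    def __init__(self, threshold, period_hours):
        self.threshold = threshold
        self.period_hours = period_hours
        self.timer_start = None  # time of the failover that started the timer
        self.count = 0           # failovers counted since the timer started

    def request_failover(self, now_hours):
        """Return True if the group may fail over at time now_hours."""
        expired = (self.timer_start is None
                   or now_hours - self.timer_start > self.period_hours)
        if expired:
            # First failover ever, or first one after the period expired:
            # restart the timer and the counter.
            self.timer_start = now_hours
            self.count = 1
            return True
        if self.count < self.threshold:
            self.count += 1
            return True
        return False  # threshold reached: group stays on the node, failed

if __name__ == "__main__":
    policy = GroupFailoverPolicy(threshold=3, period_hours=5)
    # Assumed times in hours for points A through F.
    timeline = {"A": 0.0, "B": 4.5, "C": 5.5, "D": 7.0, "E": 8.0, "F": 9.0}
    for point, hours in timeline.items():
        print(point, policy.request_failover(hours))
```

With these assumed times, the requests at points A through E are allowed and the request at point F is refused, matching the timeline described above.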

2.3.1.3 Repeated Failovers

Sometimes group failovers occur more frequently than desired. For example, suppose a Northeast database instance resource is in a group called Customers_NodeA, and you specify the following:

2.3.2 Instance Failure or Hang

Oracle Real Application Clusters Guard uses a series of database policies to determine if a database instance has failed or is hanging, and if so, how to resolve the problem.

A quick check on the instance is done through Looks Alive polling. Looks Alive polling checks the health of the instance by confirming that the service is running. A more thorough check of the database is also performed at regular intervals (and if Looks Alive polling fails), as follows:

The flow chart in Figure 2-10 illustrates this process.

Figure 2-10 Failure and Hang Detection and Resolution



The following sections describe the restart, hang detection, and termination policies in detail. These policies are the same regardless of whether the deployment is a default n-node deployment or a primary/secondary deployment. However, the interpretation of the restart policy is different for a primary/secondary deployment. For details on how the instance restart policy is interpreted for a primary/secondary deployment, see Section 2.4.1.


Note:

You can write a script that Oracle Real Application Clusters Guard will run when certain database instance state changes occur, such as when a database is placed online, taken offline, or terminated. For more information, see Section 3.6.


2.3.2.1 Instance Restart Policy

If Is Alive polling for an instance returns a failure status, Oracle Real Application Clusters Guard assumes that the instance has failed (or been terminated) and applies the database restart policy. The restart policy for Oracle Real Application Clusters database instances configured into an MSCS cluster with Oracle Real Application Clusters Guard is specified and managed at the database level, rather than at the instance resource level. This means that Oracle Real Application Clusters Guard, rather than the cluster software (MSCS), manages the Oracle Real Application Clusters database instance restart policy.

Therefore, when Is Alive polling detects an instance failure, the instance is left in a failed state by the cluster software. However, the Oracle Real Application Clusters Guard global monitor is notified of the problem, and it examines the overall database restart policy to determine if the instance should be restarted. If so, the global monitor calls the cluster software to attempt to bring the instance online. (Under no circumstance of instance failure or hang is the group failed over to another node.)

As shown in Figure 2-11, there are three database restart options:

Figure 2-11 Instance Restart Policy Page at Database Level



2.3.2.2 Hang Detection

When an instance is unresponsive (as defined by a lack of response from the Is Alive query within the Pending timeout period), Oracle Real Application Clusters Guard checks several parameters to determine whether the unresponsiveness is due to a database instance hang, or an event that is more processing-intensive than most. If Oracle Real Application Clusters Guard determines that an Oracle Real Application Clusters database instance is hung, Oracle Real Application Clusters Guard may terminate one or more instances (in an effort to resolve the problem) based on the database termination policy. Oracle Real Application Clusters Guard uses the termination policy to determine if, when, and how many instances it can terminate in an effort to resolve instance hangs.

Oracle Real Application Clusters Guard checks for several processing-intensive events by executing a query designed to determine if a specified event is occurring. Events that Oracle Real Application Clusters Guard checks for include logon storms, parse storms, instance recovery, lock remastering, and stuck archiver.

In the Oracle Real Application Clusters Guard Manager Hang Detection property page, you can specify the parameters for what is considered a logon storm or parse storm, as follows:

You can also specify whether Oracle Real Application Clusters Guard should check for each of the listed processing-intensive events by selecting or clearing the check box next to each event. Each event has an associated timeout value that you can adjust in the Oracle Real Application Clusters Guard Manager Hang Detection property page, as shown in Figure 2-12. If a query for an event does not return success or failure within the specified timeout period, then Oracle Real Application Clusters Guard checks for the next selected event. The following list describes each of the timeout events:

Figure 2-12 Oracle Real Application Clusters Hang Detection Property Page



When you set timeout values, consider that if success or failure is not returned for any event presented in the preceding list, an instance can be unresponsive for a time period equal to the sum of all the timeout values (maximum total timeout) before Oracle Real Application Clusters Guard takes further action. For example, if the timeout value for each event is 300 seconds (5 minutes), it is possible that the instance (or instances) will be unresponsive for 1500 seconds (25 minutes) before Oracle Real Application Clusters Guard applies the database termination policy. Conversely, if the timeout value for each event is set too low, an instance might be erroneously deemed hung and terminated when it is not hung.
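The worst-case arithmetic in this example can be checked with a short calculation. The event names come from the list of processing-intensive events in this section. That exactly these five events carry a timeout is inferred from the 1500-second total, and the function name is hypothetical.

```python
# Timeout in seconds for each processing-intensive event check, using the
# 300-second value from the example in the text.
EVENT_TIMEOUTS_SECONDS = {
    "logon storm": 300,
    "parse storm": 300,
    "instance recovery": 300,
    "lock remastering": 300,
    "stuck archiver": 300,
}

def max_total_timeout(timeouts, selected=None):
    """Sum the timeouts of the selected events (all events by default)."""
    names = timeouts.keys() if selected is None else selected
    return sum(timeouts[name] for name in names)

if __name__ == "__main__":
    total = max_total_timeout(EVENT_TIMEOUTS_SECONDS)
    print(f"{total} seconds = {total / 60:.0f} minutes")
```

Clearing the check box for an event removes its timeout from the sum, which is one way to shorten the worst case without lowering individual timeouts.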

If Oracle Real Application Clusters Guard determines that an instance is hung, then it applies the database termination policy, as described in Section 2.3.2.3.

2.3.2.3 Terminating Hung Instances - Termination Policy

Once Oracle Real Application Clusters Guard determines that an instance is hung, the monitor applies the termination policy set for the Oracle Real Application Clusters database, as shown in Figure 2-13.

Figure 2-13 Termination Policy



There are two basic termination policy options:

2.3.3 Client Reconnection After Failures

Failures affect those users and applications:

Client applications that are cluster-aware experience a brief interruption in service; to the client applications, it appears that a node was quickly rebooted. In most cases, the means to connect to a running instance is provided automatically--without operator intervention.

See Section 3.11 for information about cluster-aware applications.

2.4 Considerations for Primary/Secondary Instance Deployment

Oracle Real Application Clusters supports a primary/secondary instance deployment. The primary/secondary instance deployment lets you configure a basic two-node high-availability system for Oracle Real Application Clusters. An instance designated as the primary instance on one node accepts user connections, while an instance designated as the secondary instance on the other node accepts connections when the primary node fails, or when specifically selected through the INSTANCE_ROLE parameter in the CONNECT_DATA portion of the tnsnames.ora file.

You specify the primary/secondary deployment by setting the ACTIVE_INSTANCE_COUNT parameter in each instance's initialization parameter file (init<sid>.ora) to 1. In a primary/secondary deployment, the instance that mounts the database first assumes the role of primary instance. The second instance to mount the database assumes the role of secondary instance. If the primary instance is shut down or fails, the secondary instance automatically assumes the primary role. When the failed instance returns to active status, it assumes the role of secondary instance. Figure 2-14 shows the Oracle Real Application Clusters Guard Manager property page that displays the role of an instance.
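The role assignment rules in this paragraph can be sketched as a small model of a two-instance database. The class and method names are hypothetical and the model covers only the role changes described here. It does not represent how Oracle Real Application Clusters assigns roles internally.

```python
class PrimarySecondaryPair:
    """Toy model of instance roles in a two-instance deployment."""

    def __init__(self):
        self.roles = {}  # instance name -> "primary" or "secondary"

    def mount(self, instance):
        """The first instance to mount becomes primary, the next secondary."""
        if instance in self.roles:
            return self.roles[instance]
        role = "secondary" if "primary" in self.roles.values() else "primary"
        self.roles[instance] = role
        return role

    def fail(self, instance):
        """Remove a failed instance; a surviving secondary is promoted."""
        role = self.roles.pop(instance, None)
        if role == "primary":
            for other in self.roles:
                self.roles[other] = "primary"

if __name__ == "__main__":
    pair = PrimarySecondaryPair()
    print(pair.mount("A"), pair.mount("B"))  # roles at startup
    pair.fail("A")
    print(pair.roles)                        # B takes over the primary role
    print(pair.mount("A"))                   # A returns as secondary
```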

Figure 2-14 Instance Role



The Oracle Net listener enforces the routing of work requests to the primary and secondary instances by using the INSTANCE_ROLE parameter in the CONNECT_DATA portion of the tnsnames.ora file.

All locks are mastered by the primary instance only, which minimizes communication between nodes and improves performance.

2.4.1 Primary/Secondary Deployments and Instance Restart Policy

The instance restart policy for a primary/secondary instance deployment is as described in Section 2.3.2.1. However, the interpretation of this policy is complicated by the instance roles (primary instance role and secondary instance role) and failover operations that might occur in this configuration. The following sections provide examples to describe how the instance roles and instance restart policy interact.

2.4.1.1 Failure of the Primary Instance

During typical operations, the nodes running the primary and secondary instances are up and operational. Group A, containing instance A in the primary role, is running on node A. Group B, containing instance B in the secondary role, is running on node B. If the primary instance fails, but the secondary instance is still running, then the following occurs:

  1. Instance B becomes the primary instance as controlled by Oracle Real Application Clusters, as shown in Figure 2-15. If instance A is restarted, it will assume the secondary role.

  2. Oracle Real Application Clusters Guard initiates the restart policy for the database as follows:

    • Do not restart instances

      Oracle Real Application Clusters Guard leaves instance A in a failed state and stops its associated listener. Instance B keeps the primary role.

    • Restart if no other instance is online

      Because instance B is still running, Oracle Real Application Clusters Guard leaves instance A in a failed state and stops its associated listener. Instance B keeps the primary role. (However, if instance B were to fail, then Oracle Real Application Clusters Guard would restart instance B because there would be no other instance online.)

    • Always restart any instance

      Oracle Real Application Clusters Guard restarts instance A in the secondary instance role. Instance B has the primary role.
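The three policy outcomes listed above for a primary instance failure can be sketched as a decision function. This is a minimal illustration under stated assumptions, not the product's implementation: the `RestartPolicy` enum, `should_restart`, and `after_primary_failure` are hypothetical names, and only the policy labels come from this guide.

```python
from enum import Enum


class RestartPolicy(Enum):
    DO_NOT_RESTART = "Do not restart instances"
    RESTART_IF_NONE_ONLINE = "Restart if no other instance is online"
    ALWAYS_RESTART = "Always restart any instance"


def should_restart(policy: RestartPolicy, other_instance_online: bool) -> bool:
    """Return True if the failed instance would be restarted under the policy."""
    if policy is RestartPolicy.ALWAYS_RESTART:
        return True
    if policy is RestartPolicy.RESTART_IF_NONE_ONLINE:
        return not other_instance_online
    return False


def after_primary_failure(policy: RestartPolicy) -> dict:
    """Outcome when instance A (primary) fails while instance B keeps running."""
    # The role failover to instance B happens regardless of the restart policy.
    outcome = {"B": "primary"}
    restarted = should_restart(policy, other_instance_online=True)
    # A restarted instance A comes back in the secondary role.
    outcome["A"] = "secondary" if restarted else "failed"
    return outcome


for policy in RestartPolicy:
    print(f"{policy.value}: {after_primary_failure(policy)}")
```

Separating the role failover from the restart decision mirrors the two numbered steps: the first is controlled by Oracle Real Application Clusters, the second by Oracle Real Application Clusters Guard.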

Figure 2-15 Primary Instance Failure in Primary/Secondary Configuration



2.4.1.2 Failure of the Node Running the Primary Instance

During typical operations, the nodes running the primary and secondary instances are up and operational. Group A, containing instance A in the primary role, is running on Node A. Group B, containing instance B in the secondary role, is running on Node B. If Node A fails, but instance B is still running, then the following occurs:

  1. Instance B assumes the role of primary instance as controlled by Oracle Real Application Clusters.

  2. Group A fails over to Node B, as shown in Figure 2-16, and Oracle Real Application Clusters Guard stops the listener associated with instance A.

  3. Oracle Real Application Clusters Guard initiates the restart policy for the database as follows:

    • Do not restart instances

      Instance A is left in a failed state and Oracle Real Application Clusters Guard stops its associated listener. Using the preceding example, instance B now has the primary role and instance A is in a failed state on Node B.

    • Restart if no other instance is online

      Because the instance on Node B is still running, instance A is left in a failed state on Node B.

    • Always restart any instance

      Oracle Real Application Clusters Guard takes Group A offline, moves it back to Node A, and brings the group back online. Instance A is restarted with the secondary instance role. The role of each instance is the reverse of its original role.
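The node failure sequence above differs from an instance failure in that the group itself changes nodes. The following sketch traces where Group A ends up under each policy. It is a simplified model with hypothetical names (`Group`, `node_a_fails`); it assumes instance B stays online throughout and follows the steps exactly as listed, without modeling when Node A becomes available again.

```python
from dataclasses import dataclass

POLICIES = (
    "Do not restart instances",
    "Restart if no other instance is online",
    "Always restart any instance",
)


@dataclass
class Group:
    name: str
    node: str            # node currently hosting the group
    instance_state: str  # "primary", "secondary", or "failed"


def node_a_fails(policy: str):
    """Trace Group A and Group B after Node A fails (hypothetical model)."""
    if policy not in POLICIES:
        raise ValueError(f"unknown restart policy: {policy!r}")

    group_a = Group("Group A", "Node A", "primary")
    group_b = Group("Group B", "Node B", "secondary")

    # Step 1: instance B assumes the primary role.
    group_b.instance_state = "primary"

    # Step 2: Group A fails over to Node B; instance A's listener is stopped.
    group_a.node = "Node B"
    group_a.instance_state = "failed"

    # Step 3: apply the restart policy. Because instance B is online, only
    # "Always restart any instance" brings instance A back.
    if policy == "Always restart any instance":
        group_a.node = "Node A"              # group is moved back and placed online
        group_a.instance_state = "secondary"

    return group_a, group_b


for policy in POLICIES:
    a, b = node_a_fails(policy)
    print(f"{policy}: instance A is {a.instance_state} on {a.node}; "
          f"instance B is {b.instance_state} on {b.node}")
```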

Figure 2-16 Node or Virtual Address Failure in Primary/Secondary Configuration



2.4.1.3 Failure of the Secondary Instance

During typical operations, the nodes running the primary and secondary instances are up and operational. Group A, containing instance A in the primary role, is running on Node A. Group B, containing instance B in the secondary role, is running on Node B. If the secondary instance fails, but the primary instance is still running, then Oracle Real Application Clusters leaves the roles as they are. Oracle Real Application Clusters Guard applies the restart policy described in Section 2.3.2.1 to determine whether to restart the failed secondary instance.

2.4.2 Managing Primary/Secondary Deployments

This section describes the Oracle Real Application Clusters Guard Manager commands that are available for managing a primary/secondary Oracle Real Application Clusters deployment. These commands allow you to move the primary role to the secondary instance, swap roles between instances, stop the secondary instance, and restore the secondary role to an instance. The commands are commonly used for planned outages (hardware and operating system upgrades) and for recovering from unplanned outages.

Table 2-1 lists the relevant commands available on the Real Application Clusters menu of Oracle Real Application Clusters Guard Manager, their effect, and some common usages.

Note that you can also use the Oracle Real Application Clusters Guard Manager Place Online and Take Offline commands with instances in a primary/secondary configuration. However, with these commands you might have to issue several commands in a specific order to achieve the desired result. For example, to swap roles between instances, you must issue the Take Offline command on the instance that holds the primary role, then issue the Place Online command on that same instance. With the commands designed specifically for managing a primary/secondary Oracle Real Application Clusters deployment, the same swap takes a single Switchover command.

Table 2-1  Commands for Managing Primary/Secondary Configurations

Command: Move Primary
Effect: Takes the primary instance offline, resulting in the primary role being moved to the secondary instance.
Common Use: To perform maintenance tasks on the primary instance node

Command: Stop Secondary
Effect: Stops the secondary instance.
Common Use: To perform maintenance tasks on the secondary instance node

Command: Switchover
Effect: Takes the primary instance offline, then brings it online. This reverses the roles assigned to each instance. Issuing this command is equivalent to issuing the Move Primary command followed by the Restore command.
Common Use: To return role assignments to their original instances

Command: Restore
Effect: Brings the secondary instance online. (If both instances are offline, this returns an error message indicating that the instance with the primary role is not online.)
Common Use:
    • To reinstate the secondary role to the secondary instance after a Stop Secondary command has been issued
    • To assign the secondary role to an instance after an unplanned role failover has occurred and the instance that previously held the primary role is now offline

The following example demonstrates some typical uses of these commands. The initial role assignments for the Oracle Real Application Clusters database are as follows:

    Sales1 - primary instance
    Sales2 - secondary instance

  1. Because you need to perform work on the node where Sales1 is running, you need to take the instance on that node offline. From Oracle Real Application Clusters Guard Manager, select the Oracle Real Application Clusters database from the tree view and issue a Move Primary command:

      Real Application Clusters -> Move Primary

    The Move Primary command takes the primary instance offline, which results in a role failover. Role assignments are now as follows:

    Sales1 - offline
    Sales2 - primary instance

  2. You complete work on the node where Sales1 was running and bring the node back online. Role assignments are now as follows:

    Sales1 - unknown
    Sales2 - primary instance

  3. To restore Sales1 as the secondary instance, from Oracle Real Application Clusters Guard Manager, you select the Oracle Real Application Clusters database from the tree view and issue a Restore command:

    Real Application Clusters -> Restore

    Role assignments are now as follows:

    Sales1 - secondary instance
    Sales2 - primary instance

    Both instances are now online, but the instance roles are the reverse of their original assignment. You can leave them as they are, particularly if both nodes are the same in terms of processing power and memory. However, for the purposes of this example, assume you want to return the instances to their original roles.

  4. To return the instances to their original roles, from Oracle Real Application Clusters Guard Manager, you select the Oracle Real Application Clusters database from the tree view and issue a Switchover command:

    Real Application Clusters -> Switchover

    Sales2 is taken offline, which results in a role failover to Sales1. Sales2 is then placed back online and assigned the secondary instance role. Each instance again holds its original role, as follows:

    Sales1 - primary instance
    Sales2 - secondary instance
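The walkthrough above can be replayed against a small state model of the Table 2-1 commands. This is a hedged sketch, not the Oracle Real Application Clusters Guard Manager API: the `PrimarySecondaryPair` class and its methods are hypothetical, and the "unknown" state shown in step 2 is simplified to "offline".

```python
class PrimarySecondaryPair:
    """Toy model of the Table 2-1 commands for a two-instance deployment."""

    def __init__(self, primary: str, secondary: str):
        self.roles = {primary: "primary", secondary: "secondary"}

    def _find(self, role: str):
        return next((n for n, r in self.roles.items() if r == role), None)

    def move_primary(self) -> None:
        # Take the primary offline; the primary role moves to the secondary.
        primary, secondary = self._find("primary"), self._find("secondary")
        if primary is None:
            raise RuntimeError("no instance holds the primary role")
        self.roles[primary] = "offline"
        if secondary is not None:
            self.roles[secondary] = "primary"

    def stop_secondary(self) -> None:
        secondary = self._find("secondary")
        if secondary is not None:
            self.roles[secondary] = "offline"

    def restore(self) -> None:
        # Bring the offline instance online in the secondary role.
        if self._find("primary") is None:
            raise RuntimeError("the instance with the primary role is not online")
        for name, role in self.roles.items():
            if role == "offline":
                self.roles[name] = "secondary"

    def switchover(self) -> None:
        # Equivalent to Move Primary followed by Restore.
        self.move_primary()
        self.restore()


pair = PrimarySecondaryPair("Sales1", "Sales2")
pair.move_primary()
print(pair.roles)  # {'Sales1': 'offline', 'Sales2': 'primary'}
pair.restore()
print(pair.roles)  # {'Sales1': 'secondary', 'Sales2': 'primary'}
pair.switchover()
print(pair.roles)  # {'Sales1': 'primary', 'Sales2': 'secondary'}
```

Expressing Switchover as Move Primary plus Restore follows the equivalence stated in Table 2-1, which is why a single command returns both instances to their original roles.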


Copyright © 2001 Oracle Corporation.

All Rights Reserved.