5 Making Applications Highly Available Using Oracle Clusterware

This chapter explains how you can extend the high availability of the Oracle Clusterware framework to your applications. You do this by wrapping your applications with Oracle Clusterware commands. That is, you can use the same high availability mechanisms that protect Oracle Database and Oracle Real Application Clusters (Oracle RAC) to make your custom applications highly available. This chapter describes how you can use Oracle Clusterware to monitor, relocate, and restart your applications.

Note:

The Oracle Clusterware API demo is not supported for Windows.

Overview of Using Oracle Clusterware Commands to Enable High Availability

Oracle Clusterware includes a high availability framework that provides an infrastructure to protect any application. Oracle Clusterware ensures that applications that it manages start when the system starts. Oracle Clusterware also monitors the applications to make sure that they are always available. For example, if a process fails, then Oracle Clusterware attempts to restart the process based on scripts that you customize. If a node in the cluster fails, then you can program processes that normally run on the failed node to restart on another node. The monitoring frequency, starting, and stopping of the applications and the application dependencies are configurable.

To make applications highly available, first create an application profile that identifies your application. The application profile uses a second component, an action program, that describes how Oracle Clusterware should monitor your application and how Oracle Clusterware should respond to changes in your application's status. Oracle Database stores application profile attributes in the OCR. The definitions of an application profile, action program, and the other primary Oracle Clusterware high availability components are as follows:

  • Action Program—A program that defines the location of your program, the monitoring frequency, and the start and stop actions that Oracle Clusterware should perform on the application. The start action starts the application, the stop action stops the application, and the monitoring or check action checks the application's status.

  • Application Profile—An Oracle Clusterware resource file that describes the attributes of your application. An application profile influences your application's behavior and it identifies other programs or scripts that Oracle Clusterware should run to perform various actions.

  • Privileges—Access and usage privileges that enable Oracle Clusterware to control all of the components of your application for high availability operations, including the right to start processes under other user identities. Oracle Clusterware must run as a privileged user to control applications with the correct start and stop processes. On Linux and UNIX platforms, this usually implies that Oracle Clusterware must run as the root user and on Windows platforms Oracle Clusterware must run as Administrator.

  • Resource—An entity that Oracle Clusterware manages for high availability such as your application.

    Note:

    A resource in the Oracle Clusterware context is not the same as a resource in the Oracle Database sense, such as "Resource Manager." In Oracle Clusterware, a resource is any entity that Oracle Clusterware manages, including application programs.

  • Resource Dependency—A relationship among resources or applications that implies an operational ordering. For example, during a start operation, parent resources are started before the resources that depend on them. You cannot stop a resource while resources that depend on it are running. However, you can force the stop with the crs_stop -f command, in which case all resources that depend on the resource being stopped are stopped first.

  • Oracle Cluster Registry (OCR)—A mechanism that stores configuration information that Oracle Clusterware and other Oracle RAC manageability systems use. The OCR uses a hierarchical namespace for key-value pairs. Keys and subkeys have enforced user, group, and other permissions.

  • Template—A text file generated by the crs_profile command that contains the default values for application profile attributes.

    See Also:

    Appendix D, "High Availability Oracle Clusterware Command-Line Reference and C API" for more information about using Oracle Clusterware commands to make your applications highly available

Overview of Managing Custom Applications with Oracle Clusterware Commands

You can use Oracle Clusterware commands to start, stop, relocate, and check the status of your custom applications. To do this, define your application with an application profile. The profile defines attributes that affect how Oracle Clusterware manages your application. Then register your application information in the OCR using the crs_register command. The following steps summarize this process:

  1. Create an application profile by running the crs_profile command. See Table 5-1 for a listing of required and optional entries for the Oracle Clusterware profiles.

  2. Register the application profile using the crs_register command.

  3. Run the crs_start command to start the application. Oracle Clusterware then runs the start entry point of the action program that the profile specifies.

  4. Oracle Clusterware periodically runs the check entry point of the action program to check the application's status.

  5. In the event of a check or node failure, Oracle Clusterware recovers the application either by restarting it on the current node or by relocating the application to another node.

  6. If you run the crs_stop command to stop the application, then Oracle Clusterware runs the stop entry point of the action program to stop it.
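Steps 1 through 3 can be scripted. The following bash sketch wraps them, plus a status check, in one function. It uses only commands and flags that appear in this chapter (crs_profile -create with -t and -a, crs_register, crs_start, and crs_stat) and assumes that these commands are on your PATH. The run and make_ha_app function names and the DRY_RUN variable are hypothetical; with DRY_RUN set, the function prints each command instead of running it.

```bash
# Print a command when DRY_RUN is set; otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create, register, and start an application resource, then show its status.
# $1: resource name, $2: action script (both placeholders for your values)
make_ha_app() {
    local name=$1 script=$2
    run crs_profile -create "$name" -t application -a "$script" || return 1
    run crs_register "$name" || return 1
    run crs_start "$name" || return 1
    run crs_stat "$name"
}
```

For example, DRY_RUN=1 make_ha_app postman postman.scr prints the four commands without touching the cluster, which is a convenient way to review the sequence before running it as the resource owner.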

You can manage application availability as follows:

  • Start resources automatically during cluster or node startup

  • Restart applications that fail

  • Relocate applications to other nodes if they cannot run in their current location

Full administrative privileges are not required when using Oracle Clusterware. Any user can create resources or applications. However, the creator or owner must grant permission before other users or user groups can use Oracle Clusterware to manage those applications. Additionally, only privileged users can modify profiles that have privileges defined. The following sections provide further details about application profiles.

Note:

Do not use Oracle Clusterware commands prefixed with crs_ (except for crs_stat) on resources that have names beginning with the prefix ora unless Oracle Support Services asks you to. Instead, use the Server Control (SRVCTL) utility on Oracle resources. You can create resources that depend on resources that Oracle has defined. When creating resources, do not use an ora prefix in the resource name. This prefix is reserved for Oracle use only.

Creating Application Profiles

Application profiles have attributes that define how Oracle Clusterware starts, manages, and monitors applications. One attribute is the location of the action program that Oracle Clusterware uses to manipulate the application. Oracle Clusterware uses the action program to monitor or check the application status and to start and stop it. Oracle Database reads application profiles from files stored in specified locations and stores the information in the OCR. You use profile attributes to designate resource dependencies and to determine what happens to an application or service when it loses access to a resource on which it depends.

The following section describes profiles in more detail. The recommended method for creating profiles is to use the crs_profile command, which is described in detail in Appendix D, "High Availability Oracle Clusterware Command-Line Reference and C API".

Application Resource Profiles

Attributes are defined by name=value entries in profile files and these entries can be in any order in the file. The following are some of the primary attributes of an application profile:

  • Resources that the application requires, defined by the REQUIRED_RESOURCES attribute. Oracle Clusterware relocates or stops an application if a required resource becomes unavailable. Required resources are defined for each node.

  • Rules for choosing the node on which to start or restart an application are defined by settings for the PLACEMENT parameter. The application must be accessible by the nodes that you nominate for placement.

  • A list of nodes, in order of preference, that Oracle Clusterware uses when it starts or fails over an application, defined by the HOSTING_MEMBERS attribute. This list is used if the placement policy defined by the PLACEMENT attribute is favored or restricted.

  • The filenames of application profiles must be in the form resource_name.cap where resource_name is the name that you or the system assigns to an application and cap is the file suffix. The Oracle Clusterware commands in profiles refer to applications by name, such as resource_name, but not by the full filename.
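Because profile entries are plain name=value lines in any order, a short script can read a single attribute back out of a .cap file, for example to inspect a generated profile. The following bash sketch is illustrative only; the profile_attr and resource_name_from_file function names are hypothetical, and the sketch does not consult the OCR, only the file that you pass it.

```bash
# Print the value of one attribute from a NAME=value profile file.
# Entries may appear in any order; the first matching line wins.
profile_attr() {
    local file=$1 attr=$2
    awk -v key="$attr" '
        index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }
    ' "$file"
}

# A profile file is named resource_name.cap, so strip the .cap suffix.
resource_name_from_file() {
    basename "$1" .cap
}
```

Matching on the attribute name followed by = keeps an attribute such as CHECK_INTERVAL from matching a longer name that merely starts with the same text.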

Required and Optional Profile Attributes

Application profiles have optional and required profile attributes. Optional profile attributes may be left unspecified in the profile. Optional profile attributes that have default values are merged at registration time with the values that are stored in the template for that resource type and for the generic template. Default values are derived from the template.

Each resource type has a template file named TYPE_resource_type.cap that is stored in the template subdirectory under the crs directory of the Oracle Clusterware home. A generic template file for values that are used in all types of resources is stored in the same location in the file named TYPE_generic.cap.
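The following bash sketch models how unspecified optional attributes pick up defaults from the templates. It assumes that a value in the profile overrides TYPE_resource_type.cap, which in turn overrides TYPE_generic.cap; that precedence is an assumption, and the sketch only illustrates the lookup order, not how Oracle Clusterware implements the merge internally. The merge_profile function name is hypothetical.

```bash
# Merge a profile with its type template and the generic template.
# Files listed first take precedence: awk keeps the first value seen
# for each attribute name and skips comment lines.
merge_profile() {
    local profile=$1 type_tpl=$2 generic_tpl=$3
    awk -F= '
        /^#/ { next }
        $1 != "" && !($1 in seen) { seen[$1] = 1; print }
    ' "$profile" "$type_tpl" "$generic_tpl"
}
```

An attribute that appears in the profile with an empty value (for example, HOSTING_MEMBERS=) counts as specified in this model and is not replaced by a template default.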

Application Profile Attributes

Table 5-1 lists the Oracle Clusterware application profile attributes in alphabetical order. For each attribute, the table shows whether the attribute is required, its default value, its valid range, and a description.

Table 5-1 Application Profile Attributes

ACTION_SCRIPT
Required: Yes; Default: None; Range: N/A

The resource-specific script for starting, stopping, and checking a resource. You can specify a full path for the action program file. Otherwise, Oracle Clusterware uses the default paths: CRS_home/crs/script for privileged resources and CRS_home/crs/public for public resources. You can also specify a path relative to these default paths.

ACTIVE_PLACEMENT
Required: No; Default: 0; Range: 0, 1

When set to 1, Oracle Clusterware reevaluates the placement of a resource when a cluster node is added or restarted.

AUTO_START
Required: No; Default: restore; Range: N/A

Indicates whether Oracle Clusterware automatically starts a resource after a cluster restart. Valid AUTO_START values are:

  • always—Restarts the resource when the node restarts, regardless of the state of the resource when the node stopped.

  • restore—Restores the resource to the state that it was in when the node went down. If the resource was offline (STATE=OFFLINE, TARGET=OFFLINE) when the node went down, then the resource remains offline when the node comes back up. The resource is started only if it was online before the node went down.

  • never—Oracle Clusterware never restarts the resource, regardless of the state of the resource when the node stopped.

Note: Oracle supports only lowercase values for always, restore, and never.

CHECK_INTERVAL
Required: No; Default: 60; Range: 0 or any positive integer

The time interval, in seconds, between repeated executions of the check entry point of a resource's action program. A low check interval causes more frequent checks and therefore more overhead. Set to 0 to disable periodic checking.

DESCRIPTION
Required: No; Default: Name of the resource; Range: N/A

A description of the resource.

FAILOVER_DELAY
Required: No; Default: 0; Range: 0 or any positive integer

The amount of time, in seconds, that Oracle Clusterware waits before attempting to restart or fail over a resource. Set to 0 for immediate failover.

FAILURE_INTERVAL
Required: No; Default: 0; Range: 0 or any positive integer

The interval, in seconds, during which Oracle Clusterware applies the failure threshold. If the value is zero (0), tracking of failures is disabled.

FAILURE_THRESHOLD
Required: No; Default: 0; Range: 0 or any positive integer

The number of failures detected within the FAILURE_INTERVAL before Oracle Clusterware marks the resource as unavailable and stops monitoring it. If a resource's check script fails this number of times, then the resource is stopped and set offline. If the value is zero (0), tracking of failures is disabled.

HOSTING_MEMBERS
Required: Sometimes; Default: None; Range: N/A

An ordered list of cluster nodes, separated by blank spaces, that can host the resource. This attribute is required only if PLACEMENT is favored or restricted. This attribute must be empty if PLACEMENT is balanced.

Enter node names as values for the HOSTING_MEMBERS attribute, not virtual host names or physical host names. Use the node names that you used when you installed Oracle Clusterware. The resources that you list should contain the node name for the node on which they run. Run the olsnodes command to see your node names. The HOSTING_MEMBERS attribute is set automatically when these Oracle Clusterware resources are created; you do not need to take further action to ensure that the attribute is set.

The node name is usually the same as the physical host name, but it can be different. For example, when vendor clusterware is present, the Oracle Clusterware nodes are named the same as the vendor clusterware nodes, and not all vendor clusterware implementations use the physical node names as node names. Use the lsnodes command to display vendor clusterware node names. When there is no vendor clusterware, the Oracle Clusterware node names must be the same as the physical host names.

NAME
Required: Yes; Default: None; Range: N/A

The name of the application. The name can contain letters (a-z and A-Z), digits (0-9), and any other platform-supported characters except the exclamation point (!), but it cannot begin with a period. By convention, the name starts with an alphanumeric prefix, such as sky1, followed by an identifier that describes the application.

OPTIONAL_RESOURCES
Required: No; Default: None; Range: N/A

An ordered list of resource names, separated by blank spaces, that this resource uses during placement decisions. You can list up to 58 user-defined resources.

PLACEMENT
Required: No; Default: balanced; Range: N/A

The placement policy (balanced, favored, or restricted), which specifies how Oracle Clusterware chooses the cluster node on which to start the resource. See also "Application Placement Policies".

REQUIRED_RESOURCES
Required: No; Default: None; Range: N/A

An ordered list of resource names, separated by blank spaces, that this resource depends on. Each resource that you list as a required resource must already be registered with Oracle Clusterware; otherwise, registration of this resource's profile fails.

RESTART_ATTEMPTS
Required: No; Default: 1; Range: 0 or any positive integer

The number of times that Oracle Clusterware attempts to restart a resource on a single cluster node before attempting to relocate the resource. A value of 1 means that Oracle Clusterware attempts to restart the resource only once on a node; a second failure causes an attempt to relocate the resource. If set to 0, Oracle Clusterware does not attempt a restart and always tries to fail over the resource.

RESTART_COUNT
Range: N/A

The counter that the Oracle Clusterware daemon maintains for the number of times that a resource has been restarted. Its value ranges from 0 to the value of RESTART_ATTEMPTS. The counter is also written to the OCR.

SCRIPT_TIMEOUT
Required: No; Default: 60; Range: Any positive integer

The maximum time, in seconds, for an action script to run. An error message is returned if the script does not complete within this time. The timeout applies to all action script entry points (start, stop, and check). If you do not specify a value, Oracle Clusterware uses a default value of 60 seconds.

START_TIMEOUT
Range: N/A

The maximum time, in seconds, that a start action script can run. An error message is returned if the script does not complete within this time. If you do not specify this attribute, or if you specify 0 seconds, then Oracle Clusterware uses the SCRIPT_TIMEOUT value.

STOP_TIMEOUT
Range: N/A

The maximum time, in seconds, that a stop action script can run. An error message is returned if the script does not complete within this time. If you do not specify this attribute, or if you specify 0 seconds, then Oracle Clusterware uses the SCRIPT_TIMEOUT value.

TYPE
Required: Yes; Default: None; Range: N/A

Must be set to application.

UPTIME_THRESHOLD
Range: Any positive integer

The length of time that a resource must be up before Oracle Clusterware considers the resource to be stable. By setting a value for the UPTIME_THRESHOLD attribute, you can indicate a resource's stability. The value has the form xd, where x is a positive integer and d is one of the following units:

s: seconds (the default)
m: minutes
h: hours
d: days
w: weeks

For example: 30 is 30 seconds (because s is the default), 10m is 10 minutes, and so on.
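Two of the rules in Table 5-1 are easy to get wrong when you compute timing by hand: the unit suffix on UPTIME_THRESHOLD, and the fallback from START_TIMEOUT or STOP_TIMEOUT to SCRIPT_TIMEOUT when the specific value is unset or 0. The following bash sketch restates both rules as functions. The uptime_to_seconds and effective_timeout names are hypothetical, and the 60-second fallback mirrors the documented SCRIPT_TIMEOUT default.

```bash
# Convert an UPTIME_THRESHOLD value such as 30, 10m, or 2h to seconds.
# Returns 1 if the value is not a positive integer with an optional s/m/h/d/w unit.
uptime_to_seconds() {
    local v=$1 n unit re='^([0-9]+)([smhdw]?)$'
    [[ $v =~ $re ]] || return 1
    n=$((10#${BASH_REMATCH[1]}))     # force base 10 so "08" is not read as octal
    unit=${BASH_REMATCH[2]:-s}       # s (seconds) is the default unit
    case $unit in
        s) echo "$n" ;;
        m) echo $((n * 60)) ;;
        h) echo $((n * 3600)) ;;
        d) echo $((n * 86400)) ;;
        w) echo $((n * 604800)) ;;
    esac
}

# START_TIMEOUT and STOP_TIMEOUT fall back to SCRIPT_TIMEOUT when unset or 0.
# $1: the specific timeout (may be empty), $2: SCRIPT_TIMEOUT (default 60)
effective_timeout() {
    local specific=${1:-0} script_timeout=${2:-60}
    if [ "$specific" -gt 0 ] 2>/dev/null; then
        echo "$specific"
    else
        echo "$script_timeout"
    fi
}
```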


Default Profile Locations

Profiles can be located anywhere and need not be on a cluster-visible file system. Oracle RAC provides the following default locations for profiles. The default location for profiles with root privileges on Linux and UNIX systems, or Administrator privileges on Windows systems, is the profile subdirectory under the crs directory of the Oracle Clusterware home. The default location for profiles with non-root or non-Administrator privileges is the public subdirectory under the crs directory of the Oracle Clusterware home. The action script must be located in the same directory on all nodes and must be identical on all nodes.
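On Linux and UNIX systems, the choice between these two default locations depends only on whether the profile is created with root privileges. The following bash sketch expresses that rule. The default_profile_dir function name is hypothetical, and the first argument stands in for your Oracle Clusterware home (CRS_home).

```bash
# Print the default profile directory described above.
# $1: the Oracle Clusterware home (CRS_home)
# $2: numeric user ID (defaults to the current user's ID)
default_profile_dir() {
    local crs_home=$1 uid=${2:-$(id -u)}
    if [ "$uid" -eq 0 ]; then
        echo "$crs_home/crs/profile"    # root (privileged) profiles
    else
        echo "$crs_home/crs/public"     # non-root profiles
    fi
}
```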

Using Entry Points to Manage Resources Using Oracle Clusterware

You can use entry points to specify how to start a resource, stop a resource, and check a resource. You can implement entry points in various ways, such as by using shell or Perl scripts, C++ functions, or Java functions. Oracle Clusterware has the following entry points:

  • START—The start (online) entry point brings the resource online.

  • STOP—The stop (offline) entry point takes the resource offline.

  • CHECK—The check (monitor) entry point monitors the health of a resource.

Example of Using Oracle Clusterware Commands to Create Application Resources

The example in this section creates an application named postman. Oracle Clusterware uses the script /opt/email/bin/crs_postman to start and stop postman and to monitor whether it is running. Oracle Clusterware checks postman every five seconds, as specified by the check_interval attribute, and restarts postman up to two times if it fails, as specified by the restart_attempts attribute. When deciding on which node to place postman, Oracle Clusterware considers the optional_resources attribute, which lists application2; if possible, Oracle Clusterware places postman on the same node as application2. Finally, for postman to run, the resource network1 must be running on the same node, as specified by the required_resources attribute. If network1 fails or is relocated to another node, then Oracle Clusterware stops or moves postman.

Note:

Do not use the Oracle Clusterware commands prefixed with crs_ (except for crs_stat) on resources that have names beginning with the prefix ora unless Oracle Support Services asks you to, or unless Oracle has certified you as described in https://metalink.oracle.com.

Instead, use the Server Control (SRVCTL) utility on Oracle resources. You can create resources that depend on resources that Oracle has defined. When creating resources, do not use an ora prefix in the resource name. This prefix is reserved for Oracle use only.

You can also use the Oracle Clusterware commands to inspect the configuration and status.

Using the crs_profile Command to Create An Application Resource Profile

To create an application profile, use the crs_profile command. Example 5-1 uses the crs_profile command to create an application profile for the postman action script, which is used to monitor email.

Example 5-1 Using the crs_profile Command to Create an Application Profile

$ crs_profile -create postman -t application -B /opt/email/bin/crs_postman \
-d "Email Application" -r network1 -l application2 \
-a postman.scr -o ci=5,ft=2,fi=12,ra=2

The contents of the application profile file that the example creates are as follows:

NAME=postman
TYPE=application
ACTION_SCRIPT=/oracle/crs/script/postman.scr
ACTIVE_PLACEMENT=0
AUTO_START=always
CHECK_INTERVAL=5
DESCRIPTION=email app
FAILOVER_DELAY=0
FAILURE_INTERVAL=12
FAILURE_THRESHOLD=2
HOSTING_MEMBERS=
OPTIONAL_RESOURCES=application2
PLACEMENT=balanced
REQUIRED_RESOURCES=network1
RESTART_ATTEMPTS=2
SCRIPT_TIMEOUT=60
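The -o option in Example 5-1 uses abbreviated attribute names. Comparing the command with the generated profile shows that ci, ft, fi, and ra correspond to CHECK_INTERVAL, FAILURE_THRESHOLD, FAILURE_INTERVAL, and RESTART_ATTEMPTS. The following bash sketch makes that mapping explicit. The expand_o_opts function name is hypothetical, and it deliberately covers only these four abbreviations; crs_profile accepts others (such as those in the VIP example later in this chapter) that this sketch does not model.

```bash
# Expand a crs_profile -o option string into NAME=value profile lines,
# using only the abbreviations demonstrated in Example 5-1.
expand_o_opts() {
    local IFS=, pair key val attr
    for pair in $1; do                 # split the option string on commas
        key=${pair%%=*}
        val=${pair#*=}
        case $key in
            ci)   attr=CHECK_INTERVAL ;;
            ft)   attr=FAILURE_THRESHOLD ;;
            "fi") attr=FAILURE_INTERVAL ;;   # quoted because fi is a shell keyword
            ra)   attr=RESTART_ATTEMPTS ;;
            *)    echo "unknown abbreviation: $key" >&2; return 1 ;;
        esac
        echo "$attr=$val"
    done
}
```

Running expand_o_opts "ci=5,ft=2,fi=12,ra=2" reproduces the CHECK_INTERVAL, FAILURE_THRESHOLD, FAILURE_INTERVAL, and RESTART_ATTEMPTS lines of the profile above.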

A good example of an action script is the xclock script, a simple action script for the xclock program, which is a standard binary on most Linux and UNIX platforms. Example 5-2 shows the contents of the xclock action script.

Example 5-2 Action Script Example: xclock

#!/bin/bash
# start/stop/check action script for the xclock example
# To test this script, set BIN_DIR to the directory where xclock is installed
# and set DISPLAY to an X server within your network.

BIN_DIR=/usr/X11R6/bin
LOG_DIR=/tmp
BIN_NAME=xclock
DISPLAY=yourhost.domain.com:0.0
export DISPLAY

if [ ! -d "$BIN_DIR" ]
then
        echo "start failed"
        exit 2
fi

# PIDs of running xclock processes, excluding the grep commands themselves
# and this action script (assumed here to be named xclock_app)
PID1=`ps -ef | grep "$BIN_NAME" | grep -v grep | grep -v xclock_app | awk '{ print $2 }'`

case $1 in
'start')
        if [ "$PID1" != "" ]
        then
           status_p1="running"
        else
           if [ -x "$BIN_DIR/$BIN_NAME" ]
           then
             umask 002
             # Redirect stderr before the & so that errors reach the log file
             "${BIN_DIR}/${BIN_NAME}" 2>"${LOG_DIR}/${BIN_NAME}.log" &
             status_p1="started"
           else
             echo `basename $0`": $BIN_NAME: Executable not found"
             exit 1
           fi
        fi

        echo "$BIN_NAME: $status_p1"
        exit 0
        ;;

'stop')
        if [ "${PID1}" != "" ]
        then
           # PID1 can hold several PIDs, so it is intentionally unquoted
           kill -9 ${PID1} || exit 1
           echo "$BIN_NAME daemon killed"
        else
           # A process that is not running still counts as a successful stop
           echo "$BIN_NAME: no running Process!"
        fi
        exit 0
        ;;
'check')
        if [ "$PID1" != "" ]
        then
           echo "running"
           exit 0
        else
           echo "not running"
           exit 1
        fi
        ;;
*)
        echo "Usage: "`basename $0`" {start|stop|check}"
        exit 1
        ;;
esac

Oracle Clusterware Required Resources List

Oracle Clusterware uses the required resources list, with the placement policy and hosting nodes list, to determine the cluster nodes that are eligible to host an application. Required resources must be ONLINE on the nodes on which the application is running or started.

The failure of a required resource on a hosting node causes Oracle Clusterware to attempt to restart the application on the current node. If RESTART_ATTEMPTS is not set to 0 and the application cannot start on the current node, then Oracle Clusterware attempts to fail the application over to another node that provides the required resource. If there is no suitable node, then Oracle Clusterware stops the application and posts a not restarting event notification.

You can also use required resource lists to start, stop, and relocate groups of interdependent applications when you use the crs_start, crs_stop, or crs_relocate command with the force (-f) option. For example, you can configure resources A, B, and C so that A and B both depend on C. If you then stop resource C with the -f option, all three resources stop. The same is true for the crs_relocate command.

In addition, the -f option can relocate an online dependency, if necessary, so that a resource can start. For example, assume that resource VIP_A, whose primary node assignment is node A, is running instead on node B. If you run crs_start on the instance resource for node A, then the operation fails: the instance on node A requires VIP_A, and although VIP_A is online, it is on node B, where that instance cannot run. Running crs_start -f on the instance resource for node A, however, forces VIP_A to relocate to node A and then starts the instance.
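The ordering that the -f option follows can be computed from the dependency pairs: required resources start before the resources that depend on them, and stop after them. The following bash sketch derives both orders with the POSIX tsort utility. It only illustrates the ordering rule for the A, B, and C example above; it is not how Oracle Clusterware schedules operations internally, and the start_order and stop_order function names are hypothetical.

```bash
# Each argument is a "required dependent" pair, for example "C A"
# meaning that resource A requires resource C. tsort prints a
# topological order, so required resources come before their dependents.
start_order() {
    printf '%s\n' "$@" | tsort
}

# Stopping runs in reverse: dependents first, the required resource last.
stop_order() {
    start_order "$@" | awk '{ line[NR] = $0 } END { for (i = NR; i > 0; i--) print line[i] }'
}
```

For example, stop_order "C A" "C B" lists A and B (in either order) before C, which matches what crs_stop -f does when you stop C.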

Creating VIPs for Applications

If your application is accessed over a network, then Oracle recommends that you create a virtual Internet Protocol (VIP) address for the application as a resource on which the application depends. In the previous example, network1 is an application VIP address. Create application VIP addresses as follows:

crs_profile -create network1 -t application \
-a CRS_home/bin/usrvip \
-o oi=eth0,ov=138.3.83.78,on=255.255.240.0

Note:

In the case of a user VIP, you must use the usrvip action script that Oracle Database provides in the CRS_home/bin directory.

In this example, CRS_home is the home directory of the Oracle Clusterware installation, eth0 is the name of the public network adapter, and 138.3.83.78 is an IP address that resolves by way of DNS to a new host name that locates your application regardless of the node on which it is running. Finally, 255.255.240.0 is the netmask for the public IP address. As the oracle user, register the VIP address with Oracle Clusterware as follows:

crs_register network1

On Linux and UNIX operating systems, the application VIP address script must run as the root user. As the root user, change the owner of the resource as follows:

crs_setperm network1 -o root

As the root user, enable the oracle user to run this script:

crs_setperm network1 -u user:oracle:r-x

As the oracle user, start the VIP address as follows:

crs_start network1
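A mistyped address or netmask in the -o string is easy to miss until the VIP fails to start. The following bash sketch builds the oi, ov, and on option string from the example above and rejects values that are not dotted-quad IPv4 addresses. The is_ipv4 and vip_opts function names are hypothetical, and the check is purely syntactic: it does not confirm that the address resolves in DNS or that the netmask suits your network.

```bash
# Return 0 if $1 is a dotted-quad IPv4 address with octets 0-255.
is_ipv4() {
    local ip=$1 octet re='^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
    [[ $ip =~ $re ]] || return 1
    local IFS=.
    for octet in $ip; do
        [ $((10#$octet)) -le 255 ] || return 1
    done
}

# Build the -o string: oi=interface, ov=VIP address, on=netmask.
vip_opts() {
    local ifname=$1 vip=$2 netmask=$3
    [ -n "$ifname" ] && is_ipv4 "$vip" && is_ipv4 "$netmask" || return 1
    echo "oi=$ifname,ov=$vip,on=$netmask"
}
```

For example, vip_opts eth0 138.3.83.78 255.255.240.0 prints oi=eth0,ov=138.3.83.78,on=255.255.240.0, which you can pass to crs_profile with -o.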

Application Placement Policies

The placement policy specifies how Oracle Clusterware selects a node on which to start an application and where to relocate the application after a node failure. Only cluster nodes on which all of the required resources listed in the application profile are available are eligible to host the application. Oracle Clusterware supports the following placement policies:

  • balanced—Oracle Clusterware favors starting or restarting the application on the node that is currently running the fewest resources. A placement that is based on optional resources is considered first. Next, the host with the fewest resources running is chosen. If no node is favored by these criteria, then any available node is chosen.

  • favored—Oracle Clusterware refers to the list of nodes in the HOSTING_MEMBERS attribute of the application profile. Only cluster nodes that are in this list and that satisfy the resource requirements are eligible for placement consideration. Placement due to optional resources is considered first. If no node is eligible based on optional resources, then the order of the hosting nodes determines which node runs the application. If none of the nodes in the hosting node list are available, then Oracle Clusterware places the application on any available node. This node may or may not be included in the HOSTING_MEMBERS list.

  • restricted—Similar to favored except that if none of the nodes on the hosting list are available, then Oracle Clusterware does not start or restart the application. A restricted placement policy ensures that the application never runs on a node that is not on the list, even if you manually relocate it to that node.

You must specify hosting nodes in the HOSTING_MEMBERS attribute to use a favored or restricted placement policy. Do not specify hosting nodes in the HOSTING_MEMBERS attribute with a balanced placement policy; if you do, the application profile fails validation and you cannot register it. If ACTIVE_PLACEMENT is set to 1, then Oracle Clusterware reevaluates the placement of the application whenever you add a node to the cluster or a cluster node restarts. This enables Oracle Clusterware to relocate applications to a preferred node after the node recovers from a failure.
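The following bash sketch is a simplified model of the three placement policies. Each candidate is given as node:count, where the node already satisfies all required resources and count is the number of resources running on it. The model ignores optional resources, and when favored placement falls back to "any available node" it picks the least-loaded one, which is an assumption. The fewest_loaded and choose_node function names are hypothetical.

```bash
# Print the candidate node with the fewest running resources.
fewest_loaded() {
    local c best="" best_count=""
    for c in "$@"; do
        if [ -z "$best" ] || [ "${c#*:}" -lt "$best_count" ]; then
            best=${c%%:*}
            best_count=${c#*:}
        fi
    done
    [ -n "$best" ] && echo "$best"
}

# $1: balanced, favored, or restricted
# $2: HOSTING_MEMBERS (space-separated, in order of preference)
# remaining arguments: eligible candidates as node:count
# Prints the chosen node, or returns 1 if the policy allows no node.
choose_node() {
    local policy=$1 hosting=$2 h c
    shift 2
    case $policy in
        balanced)
            fewest_loaded "$@" ;;
        favored|restricted)
            # The first HOSTING_MEMBERS node that is also a candidate wins.
            for h in $hosting; do
                for c in "$@"; do
                    if [ "${c%%:*}" = "$h" ]; then echo "$h"; return 0; fi
                done
            done
            # favored falls back to any available node; restricted does not start.
            [ "$policy" = favored ] && fewest_loaded "$@"
            ;;
        *) return 1 ;;
    esac
}
```

For example, choose_node restricted "node4" node1:2 node2:1 returns failure because node4 is not a candidate, whereas the same call with favored places the application on node2.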

Optional Resources in Placement Decisions

Oracle Clusterware uses optional resources to choose a hosting node based on the number of optional resources that are in an ONLINE state on the hosting node. If each node has an equal number of optional resources in an ONLINE state, then Oracle Clusterware considers the order of the optional resources as follows:

  • Oracle Clusterware compares the state of the optional resources on each node starting at the first resource that you list in the application profile and then proceeds through the list.

  • For each consecutive resource in your list, if the resource is ONLINE on one node, then any node that does not have the resource ONLINE is not considered.

  • Oracle Clusterware evaluates each resource in the list in this manner until only one node is available to host the resource.

  • The maximum number of optional resources is 58.

If this algorithm results in multiple preferred nodes, then the resource is placed on one of these nodes chosen according to its placement policy.
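The tie-breaking procedure above can be sketched in bash. The first argument is the ordered OPTIONAL_RESOURCES list; each remaining argument is node:res1,res2, naming an eligible node and the optional resources that are ONLINE on it. The function first keeps the nodes with the most ONLINE optional resources, then walks the list in order, dropping nodes that lack a resource that some remaining node has ONLINE. It prints every surviving node, leaving the final choice to the placement policy. The prefer_by_optional function name and input format are hypothetical.

```bash
prefer_by_optional() {
    local optional=$1 e r count max=-1
    shift
    local -a nodes=() keep=()
    # 1. Keep only the nodes with the highest number of ONLINE optional resources.
    for e in "$@"; do
        count=0
        for r in $optional; do
            [[ ",${e#*:}," == *",$r,"* ]] && count=$((count + 1))
        done
        if [ "$count" -gt "$max" ]; then
            max=$count; nodes=("$e")
        elif [ "$count" -eq "$max" ]; then
            nodes+=("$e")
        fi
    done
    # 2. Walk the list in order until only one node remains.
    for r in $optional; do
        [ "${#nodes[@]}" -le 1 ] && break
        keep=()
        for e in "${nodes[@]}"; do
            [[ ",${e#*:}," == *",$r,"* ]] && keep+=("$e")
        done
        # Filter only if at least one remaining node has this resource ONLINE.
        [ "${#keep[@]}" -gt 0 ] && nodes=("${keep[@]}")
    done
    for e in "${nodes[@]}"; do echo "${e%%:*}"; done
}
```

For example, prefer_by_optional "a b" node1:b node2:a prints node2: both nodes have one ONLINE optional resource, and a comes first in the list.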

Oracle Clusterware Action Program Guidelines

This section provides the following guidelines for writing Oracle Clusterware action programs that interpret Oracle Clusterware start, stop, and check commands:

  • Oracle Clusterware relies on the status code that an action program returns on exit to set the resource state to ONLINE or OFFLINE. On Windows systems, the program should be nonblocking, which may imply a Windows service or a Windows resource that does not block during console interactions.

  • Action programs must return a status code to indicate success or failure. For example, on Linux or UNIX systems, if the script is written in Bourne shell, then the program should run exit 1 to indicate failure and exit 0 to indicate success. Similarly, on Windows systems, the action program should return a status of 0 to indicate success and 1 to indicate failure.

  • After an application fails, Oracle Clusterware calls the action program with a check parameter, and the action program replies with a status of 1 to indicate failure. After receiving this failure status, Oracle Clusterware calls the action program with a stop parameter. It is important that the action program return a status of 0 to indicate a successful stop, even if the application was not running when the stop request was issued.

  • Oracle Clusterware sets the resource state to UNKNOWN if an action program's stop entry point fails to exit within the number of seconds in the SCRIPT_TIMEOUT value, or if the action program returns with a code that indicates failure. This may occur during a start, relocation, or stop operation. Ensure that the action program's stop entry point exits with a value that indicates success if the resource is successfully stopped or if the resource is not running.

  • When a daemon or service starts, it usually needs to start as a background process, depending on the platform. However, a resource started in this way always returns success during a start attempt. This means that the default scripts cannot detect failures caused by minor errors, such as misspelled command paths.

    • On Linux and UNIX systems, if a resource does not move to the background immediately upon startup, then you can start the application in the background by adding an ampersand (&) to the end of the line that starts the application.

    • On Windows systems, you can use net start to start a service that needs to start in the background.

    • When using commands to start daemons or services in the background, interactively make test runs of the commands used in the script to eliminate errors before using the script with Oracle Clusterware.
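The following bash sketch shows one way to structure an action program that follows these guidelines: check returns 1 when the application is down, stop returns 0 even if nothing was running, and start confirms that the background process survived instead of trusting the exit status of the & launch. APP_CMD, PID_FILE, and all function names are hypothetical placeholders; sleep 300 stands in for your application, and tracking the process through a PID file is a design assumption, not an Oracle requirement.

```bash
#!/bin/bash
# Hypothetical settings: the command that runs the application and the
# file in which this script records the application's process ID.
APP_CMD=${APP_CMD:-"sleep 300"}
PID_FILE=${PID_FILE:-/tmp/myapp.pid}

# Print the recorded PID only if that process is still alive.
app_pid() {
    local pid
    [ -r "$PID_FILE" ] || return 1
    pid=$(cat "$PID_FILE")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null && echo "$pid"
}

action_start() {
    app_pid >/dev/null && return 0     # already running counts as success
    $APP_CMD >/dev/null 2>&1 &         # unquoted on purpose: split into words
    echo $! >"$PID_FILE"
    sleep 1                            # a background launch always "succeeds",
    app_pid >/dev/null                 # so verify that the process survived
}

action_stop() {
    local pid i
    pid=$(app_pid) || { rm -f "$PID_FILE"; return 0; }  # not running: success
    kill "$pid" 2>/dev/null
    wait "$pid" 2>/dev/null || true    # reap if it is our child; harmless otherwise
    for i in 1 2 3 4 5; do
        kill -0 "$pid" 2>/dev/null || { rm -f "$PID_FILE"; return 0; }
        sleep 1
    done
    return 1                           # still alive after the grace period
}

action_check() {
    app_pid >/dev/null                 # 0 = running, 1 = not running
}

action_main() {
    case "$1" in
        start) action_start ;;
        stop)  action_stop ;;
        check) action_check ;;
        *) echo "Usage: $(basename "$0") {start|stop|check} resource_name" >&2
           return 1 ;;
    esac
}
```

To use the file as an action program, end it with action_main "$1" followed by exit $?, so that the function's status becomes the script's exit code. Keep the grace period in action_stop well below SCRIPT_TIMEOUT or STOP_TIMEOUT.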

How Oracle Clusterware Runs Action Programs

This section describes how Oracle Clusterware runs action programs. The first argument to an action program is the command start, stop, or check, depending on which action Oracle Clusterware is running. The second argument is the Oracle Clusterware resource name of the application, which enables a script to determine which resource Oracle Clusterware is starting, stopping, or checking. An action program can retrieve any of its Oracle Clusterware resource attributes from the environment by using $_CAA_attribute_name. For example, $_CAA_NAME contains the application name (the second argument to the script), and $_CAA_HOSTING_MEMBERS contains its HOSTING_MEMBERS attribute.
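A sketch, in Python and with a hypothetical function name, of how an action program might collect this information: the action and resource name come from the first two arguments, and every _CAA_ environment variable is gathered into a dictionary keyed by attribute name. A Bourne Shell script would read the same values from $1, $2, and variables such as $_CAA_NAME.

```python
import os
from typing import Dict, Mapping, Sequence

ACTIONS = ("start", "stop", "check")


def action_context(argv: Sequence[str],
                   env: Mapping[str, str] = os.environ) -> Dict[str, object]:
    """Collect the action, resource name, and _CAA_ attributes passed to an action program."""
    if len(argv) < 3 or argv[1] not in ACTIONS:
        raise ValueError("usage: action_program {start|stop|check} resource_name")
    # Each resource attribute is exported as _CAA_<attribute_name>.
    prefix = "_CAA_"
    attributes = {key[len(prefix):]: value
                  for key, value in env.items() if key.startswith(prefix)}
    return {"action": argv[1], "resource": argv[2], "attributes": attributes}
```

Calling action_context(sys.argv) at the top of an action program gives each entry point the resource name and its profile attributes, such as HOSTING_MEMBERS under ctx["attributes"].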

User Defined Attributes

Oracle Clusterware supports user-defined attributes in Oracle Clusterware applications; these are attributes whose names contain USR. User-defined attributes are stored in the Oracle Clusterware application profile for the application, and you can reference them in an action program by using $_USR_attributename. To add a user-defined attribute, add it to the file CRS_home/crs/template/application.tdf using the following syntax:

#
# an example user-defined attribute
#
#!===========================
attribute: USR_EXAMPLE
type: string
switch: -o example
default:
required: no

The attribute parameter contains the name of the new attribute. The type parameter defines the type of the user-defined attribute and can be one of the following:

  • string

  • boolean

  • integer—a numeric attribute

  • positive_integer—a numeric attribute that must be positive

  • name string

  • name_list—a comma-delimited list of names

The switch parameter describes how the attribute is specified for the crs_profile command. Set the required field to no for user-defined attributes.

Note:

User-defined attribute names that begin with USR_ORA are reserved for use by Oracle.
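Before you edit application.tdf, you can check a stanza against the rules in this section. The Python sketch below parses the key: value format shown above (skipping # comment lines) and reports violations: a name without USR, a name beginning with USR_ORA, a type outside the listed set, or required not set to no. The function names are hypothetical, and treating the "name string" entry as a type called name is an assumption.

```python
from typing import Dict, List

# Types listed for user-defined attributes. "name" is an assumption for the
# "name string" entry in that list.
KNOWN_TYPES = {"string", "boolean", "integer", "positive_integer", "name", "name_list"}


def parse_tdf_stanza(text: str) -> Dict[str, str]:
    """Parse 'key: value' lines, skipping blank lines and # comment lines."""
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # also skips the #!==== separator line
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_user_attribute(fields: Dict[str, str]) -> List[str]:
    """Return the rule violations found in one user-defined attribute."""
    problems = []
    name = fields.get("attribute", "")
    if "USR" not in name:
        problems.append("name must contain USR")
    if name.startswith("USR_ORA"):
        problems.append("names beginning with USR_ORA are reserved for Oracle")
    if fields.get("type") not in KNOWN_TYPES:
        problems.append("unknown type: %s" % fields.get("type"))
    if fields.get("required") != "no":
        problems.append("required must be set to no")
    return problems
```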

Windows crsuser Program

This section describes the Windows crsuser program. The syntax for the crsuser command is:

crsuser add [domain\]username

For example, on a Windows system you could issue the following command as an operating system user who is a member of both the ORA_DBA group and the local Administrators group:

C:\> crsuser add oracledomain\oracluster

Provide the user's Windows password. This creates the OracleCRSToken_user service that Oracle Clusterware needs to start Oracle Clusterware resources under the given user ID (when they are not running as the LocalSystem account). You can also use crsuser remove [domain\]username to remove a token service, and crsuser list to list registered users.

Using Oracle Clusterware Commands

This section describes how to use the Oracle Clusterware commands under the following topics:

Registering Application Resources

Each application that you manage with Oracle Clusterware must have an application profile, and the profile must be added to OCR. Use the crs_register command to add applications to OCR. For example, enter the following command to add the mail monitoring application from the previous example:

# crs_register postman

If you modify a profile, then update the OCR by running the crs_register -u command again.

Starting Application Resources

To start an application resource that is registered with Oracle Clusterware, use the crs_start command. For example:

# crs_start postman

The following text is an example of the command output:

Attempting to start 'postman' on node 'rac1'
Start of 'postman' on node 'rac1' succeeded.

The application now runs on the node named rac1.

Note:

The name of the application resource need not be the same as the name of the application.

See Also :

Appendix D, "High Availability Oracle Clusterware Command-Line Reference and C API" for examples of Oracle Clusterware command output

Each time it calls the action program, the crs_start command waits up to the number of seconds set by the SCRIPT_TIMEOUT parameter for the action program to report success or failure. You can start application resources that have stopped because they exceeded their failure threshold values. You must register a resource with crs_register before you can start it.

To start and stop the resources, use the crs_start and crs_stop commands. Manual starts or stops outside of Oracle Clusterware can invalidate the resource status. In addition, Oracle Clusterware may attempt to restart a resource on which you perform a manual stop operation.

All required resources must be online on the node where you start the resource. If the resources that the REQUIRED_RESOURCES parameter identifies are offline, then the command crs_start resource_name will start the required resources before starting the resource.

Running the crs_start command on a resource sets the resource target value to ONLINE. Oracle Clusterware attempts to change the state to match the target by running the action program with the start parameter. When a resource is running, both the target state and current state are ONLINE.
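To see which resources crs_start will start, and in what order, you can walk the REQUIRED_RESOURCES lists depth first so that each required resource comes before the resource that needs it. The sketch below is a hypothetical helper, not part of Oracle Clusterware; the resource names mailvip and mailfs are invented for the example, and resources that are already online are skipped.

```python
from typing import Dict, List, Set


def start_order(resource: str, required: Dict[str, List[str]],
                online: Set[str]) -> List[str]:
    """Return the resources to start, required resources first.

    required maps a resource name to its REQUIRED_RESOURCES list; resources
    in online are already running and are left out of the result.
    """
    order: List[str] = []
    visiting: Set[str] = set()
    done: Set[str] = set()

    def visit(name: str) -> None:
        if name in done:
            return
        if name in visiting:
            raise ValueError("dependency cycle at " + name)
        visiting.add(name)
        for dep in required.get(name, []):
            visit(dep)  # start what this resource requires first
        visiting.discard(name)
        done.add(name)
        if name not in online:
            order.append(name)

    visit(resource)
    return order
```

For example, start_order("postman", {"postman": ["mailvip", "mailfs"], "mailfs": ["mailvip"]}, set()) returns ["mailvip", "mailfs", "postman"].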

Starting an Application on an Unavailable Node

When you start an application on a cluster node that is unavailable, crs_start can give indeterminate results. In this scenario, the start section of the action program runs, but the cluster node fails before notification of the start is displayed on the command line. The crs_start command returns a failure with the error Remote start for resource_name failed on node node_name. The application is actually ONLINE, but it fails over to another node, making it appear as though it were started on the wrong node.

If a cluster node fails while you are starting a resource on that node, then use the crs_stat command to determine the state of that resource on the cluster.

Relocating Applications and Application Resources

Use the crs_relocate command to relocate applications and application resources. For example, to relocate the mail monitoring application to the node known as rac2, enter the following command:

# crs_relocate postman -c rac2

Each time it calls the action program, the crs_relocate command waits up to the number of seconds set by the SCRIPT_TIMEOUT parameter for the action program to report success or failure. A relocation attempt fails if:

  • The application has required resources that are ONLINE on the initial node

  • Applications that require the specified resource are ONLINE on the initial node

To relocate an application and its required resources, use the -f option with the crs_relocate command. Oracle Clusterware relocates or starts all resources that are required by the application regardless of their state.
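The two failure conditions listed above can be checked before you run crs_relocate. This hypothetical Python helper takes each resource's REQUIRED_RESOURCES list and the set of resources that are ONLINE on the initial node. Because this section states only that -f handles required resources, the sketch clears just that condition when force is set.

```python
from typing import Dict, List, Set


def relocation_blockers(resource: str, required: Dict[str, List[str]],
                        online_on_node: Set[str], force: bool = False) -> List[str]:
    """List the conditions that would make relocating resource fail."""
    blockers = []
    if not force:  # -f relocates or starts the required resources too
        for dep in required.get(resource, []):
            if dep in online_on_node:
                blockers.append("required resource %s is ONLINE on the initial node" % dep)
    for other, deps in required.items():
        if resource in deps and other in online_on_node:
            blockers.append("%s requires %s and is ONLINE on the initial node" % (other, resource))
    return blockers
```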

Stopping Applications and Application Resources

To stop applications and application resources, use the crs_stop command. Running crs_stop on a resource sets the resource target value to OFFLINE. Because Oracle Clusterware always attempts to match a resource's state to its target, the Oracle Clusterware subsystem stops the application, and the application state is OFFLINE immediately after the crs_stop command completes. The following example stops the mail application from the previous example:

# crs_stop postman

The following text is an example of the command output:

Attempting to stop `postman` on node `rac1`
Stop of `postman` on node `rac1` succeeded.

You cannot stop an application that is a required resource for another online application unless you use the force (-f) option. If you run crs_stop -f resource_name on an application that is required by other online resources, then Oracle Clusterware stops the application and also stops every online resource that requires it.
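You can preview the effect of crs_stop -f with a small hypothetical helper. It returns None when crs_stop without -f would refuse, and otherwise returns the set of resources that would stop. Following dependents of dependents up the chain is an assumption of this sketch; the text above states only that the online resources requiring the application are stopped.

```python
from typing import Dict, List, Optional, Set


def resources_stopped_by(resource: str, required: Dict[str, List[str]],
                         online: Set[str], force: bool = False) -> Optional[Set[str]]:
    """Return the resources that crs_stop would stop, or None if it refuses."""

    def online_dependents(name: str) -> Set[str]:
        return {other for other, deps in required.items()
                if name in deps and other in online}

    if online_dependents(resource) and not force:
        return None  # without -f, a resource required by an online resource is not stopped
    stopped = {resource}
    pending = list(online_dependents(resource))
    while pending:
        name = pending.pop()
        if name not in stopped:
            stopped.add(name)
            # Assumption: online resources that require a stopped dependent stop too.
            pending.extend(online_dependents(name))
    return stopped
```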

Note:

Oracle Clusterware can only stop applications and application resources. Oracle Clusterware cannot stop network, tape, or media changer resources.

Managing Automatic Oracle Clusterware Resource Operations for Action Scripts

This section explains how to control the way Oracle Clusterware manages restarts. You can prevent Oracle Clusterware from automatically restarting a resource by setting several action program attributes. You can also control how Oracle Clusterware manages the restart counters for your action programs. In addition, you can customize the timeout values for the start, stop, and check actions that Oracle Clusterware performs on action scripts. These topics are described under the following headings:

Preventing Automatic Restarts

When a node stops and restarts, Oracle Clusterware starts the resources as soon as the node starts. This may not be desirable because resource startup might fail if system components on which the resource depends, such as a volume manager or a file system, are not running. This is especially true if Oracle Clusterware does not manage the system components on which the resource depends. To manage automatic restarts, you can use the AUTO_START attribute to specify whether Oracle Clusterware should automatically start a resource when a node restarts.

Valid AUTO_START values are:

  • always—Causes the resource to restart when the node restarts regardless of the resource's state when the node stopped.

  • restore—Restores the resource to the state it had when the node stopped. The resource is started at restart time only if it was online when the node stopped; if it was offline (for example, STATE=OFFLINE, TARGET=OFFLINE), it is not started.

  • never—Oracle Clusterware never restarts the resource regardless of the resource's state when the node stopped.

Note:

Oracle supports only lowercase values for always, restore, and never.
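The three AUTO_START values reduce to a simple decision, sketched here as a hypothetical Python function; was_online stands for whether the resource was online when the node stopped.

```python
def should_auto_start(auto_start: str, was_online: bool) -> bool:
    """Decide whether a resource starts when its node restarts."""
    if auto_start == "always":
        return True
    if auto_start == "restore":
        return was_online  # restore the state the resource had when the node stopped
    if auto_start == "never":
        return False
    # Only the lowercase values are supported.
    raise ValueError("AUTO_START must be always, restore, or never")
```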

Automatically Manage Restart Attempts Counter for Resources

When a resource fails, Oracle Clusterware restarts the resource only the number of times specified in the profile attribute RESTART_ATTEMPTS, regardless of how often the resource fails. The CRSD process maintains an internal counter to track how often a resource has been restarted. Oracle Clusterware can also manage this restart attempts counter automatically, based on the stability of the resource. Use the UPTIME_THRESHOLD attribute to indicate resource stability.

You can specify the time for the UPTIME_THRESHOLD attribute in different units of measure, such as seconds (s), minutes (m), hours (h), days (d) or weeks (w). Examples of valid values for this attribute are: 7d for seven days, 5h for five hours, 180m for 180 minutes, and so on. Specify the time period as a numeric value and the unit of measure as the last character, s, m, h, d, or w.

After the time period that you set with UPTIME_THRESHOLD has elapsed, Oracle Clusterware resets the value for RESTART_COUNT to 0. Oracle Clusterware can alert you when the value for RESTART_COUNT reaches the value that you have set for RESTART_ATTEMPTS.
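The UPTIME_THRESHOLD format (a number followed by one unit letter) converts to seconds as follows. This is a hypothetical helper for validating profile values before registration, not an Oracle API.

```python
import re

# Seconds per unit letter accepted by UPTIME_THRESHOLD.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def uptime_threshold_seconds(value: str) -> int:
    """Convert an UPTIME_THRESHOLD value such as 7d or 180m to seconds."""
    match = re.fullmatch(r"(\d+)([smhdw])", value.strip())
    if match is None:
        raise ValueError("expected a number followed by s, m, h, d, or w: %r" % value)
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
```

For example, uptime_threshold_seconds("7d") returns 604800 and uptime_threshold_seconds("180m") returns 10800.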

Note:

Oracle Clusterware writes an alert to the CRSD log file when the value for RESTART_COUNT reaches the value that you have set for RESTART_ATTEMPTS. Oracle Clusterware does not write this message to the alert file.

RESTART_ATTEMPTS and RESTART_COUNT Failure Scenarios

Some of the failure scenarios for the RESTART_ATTEMPTS and RESTART_COUNT attributes are:

  • When a resource keeps restarting—The resource does not meet the uptime threshold criteria, so Oracle Clusterware stops it after it has been restarted the number of times set by RESTART_ATTEMPTS.

  • When a node fails or restarts—Oracle Clusterware resets the value for RESTART_COUNT to 0, whether the resource relocates or restarts on the same node.

  • If the crsd process fails—Because both RESTART_COUNT and RESTART_ATTEMPTS are stored in OCR, the behavior is not affected.
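These scenarios can be modeled with a small counter. The class below is a simplified, hypothetical model written for illustration, not the internal logic of the CRSD process: it resets RESTART_COUNT when the resource has stayed up for at least UPTIME_THRESHOLD seconds, refuses further restarts once RESTART_ATTEMPTS is reached, and resets the count after a node failure, node restart, or relocation.

```python
class RestartCounter:
    """Simplified model of RESTART_COUNT, RESTART_ATTEMPTS, and UPTIME_THRESHOLD."""

    def __init__(self, restart_attempts: int, uptime_threshold: float):
        self.restart_attempts = restart_attempts
        self.uptime_threshold = uptime_threshold  # seconds
        self.restart_count = 0
        self.started_at = 0.0

    def on_start(self, now: float) -> None:
        self.started_at = now

    def on_failure(self, now: float) -> bool:
        """Return True if the resource is restarted, False if it is left stopped."""
        if now - self.started_at >= self.uptime_threshold:
            self.restart_count = 0  # the resource was stable long enough
        if self.restart_count >= self.restart_attempts:
            return False  # limit reached: the point at which CRSD logs an alert
        self.restart_count += 1
        self.started_at = now
        return True

    def on_node_restart_or_relocate(self, now: float) -> None:
        self.restart_count = 0
        self.started_at = now
```

For example, with restart_attempts=2 and a resource that fails every few seconds, on_failure returns True twice and then False.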

Implications of Restart and Timeout Features for Previous Releases

If you have resource profiles that were created with a release earlier than Oracle Database 11g, do one of the following:

  • Add the Oracle Database 11g release 1 (11.1) attributes to your resource profiles and reregister the resources. This makes the separate timeouts effective.

  • Do not modify your resources. This retains the behavior of releases before Oracle Database 11g, which use the value in SCRIPT_TIMEOUT as the timeout for all start, stop, and check actions.
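The choice between these options determines which timeout applies to each action. In the Python sketch below, the per-action attribute names START_TIMEOUT, STOP_TIMEOUT, and CHECK_TIMEOUT are assumptions made for illustration; confirm the actual attribute names in the documentation for your release before relying on them. A profile that lacks them falls back to SCRIPT_TIMEOUT for every action, as described above.

```python
from typing import Mapping

# Hypothetical names for the separate 11.1 timeout attributes.
PER_ACTION_TIMEOUT = {"start": "START_TIMEOUT", "stop": "STOP_TIMEOUT", "check": "CHECK_TIMEOUT"}


def effective_timeout(profile: Mapping[str, str], action: str) -> int:
    """Return the timeout in seconds that applies to one action."""
    value = profile.get(PER_ACTION_TIMEOUT[action], "")
    if value.strip():
        return int(value)
    # Unmodified pre-11g profile: SCRIPT_TIMEOUT covers start, stop, and check.
    return int(profile["SCRIPT_TIMEOUT"])
```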

Unregistering Applications and Application Resources

To unregister an application or an application resource, use the crs_unregister command. You cannot unregister an application or resource that is ONLINE or required by another resource. The following example unregisters the mail application:

# crs_unregister postman

The unregistration process frees space on the OCR. Additionally, run the crs_unregister command as a clean-up step when a resource is no longer managed by Oracle Clusterware. Generally, you should unregister all permanently stopped applications.
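A clean-up script can check the two conditions that block crs_unregister before calling it. The helper below is hypothetical; states maps resource names to their current state (for example, as reported by crs_stat), and required maps each resource to its REQUIRED_RESOURCES list.

```python
from typing import Dict, List


def unregister_blockers(resource: str, states: Dict[str, str],
                        required: Dict[str, List[str]]) -> List[str]:
    """Return the reasons crs_unregister would refuse this resource."""
    reasons = []
    if states.get(resource) == "ONLINE":
        reasons.append(resource + " is ONLINE; stop it first")
    for other in sorted(required):
        if resource in required[other]:
            reasons.append(other + " requires " + resource)
    return reasons
```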

Displaying Clusterware Application and Application Resource Status Information

To display status information about applications and resources that are on cluster nodes, use the crs_stat command. The following example displays the status information for the postman application:

# crs_stat postman
NAME=postman
TYPE=application
TARGET=ONLINE
STATE=ONLINE on rac2
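Scripts that need to confirm where a resource is running, for example after the indeterminate crs_start result described earlier, can parse this KEY=value output. The Python sketch below is a hypothetical helper that assumes one KEY=value pair per line and a blank line between resources; hosting_node extracts the node name from a STATE value such as ONLINE on rac2.

```python
from typing import Dict, List, Optional


def parse_crs_stat(text: str) -> List[Dict[str, str]]:
    """Split crs_stat KEY=value output into one dictionary per resource."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:  # a blank line ends the current resource
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition("=")
        if sep:
            current[key] = value
    if current:
        records.append(current)
    return records


def hosting_node(record: Dict[str, str]) -> Optional[str]:
    """Return the node named in a STATE value such as 'ONLINE on rac2', if any."""
    _, _, node = record.get("STATE", "").partition(" on ")
    return node or None
```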

Enter the following command to view information about all applications and resources in tabular format:

# crs_stat -t

The following text is an example of the command output:

Name           Type         Target     State      Host
----------------------------------------------------------------
cluster_lockd  application  ONLINE     ONLINE     rac2
dhcp           application  OFFLINE    OFFLINE

Enter the following command to determine:

  • How many times an application resource has been restarted

  • How many times an application resource has failed within the failure interval

  • The maximum number of times that an application or resource can be restarted or fail

  • The target state of the application or resource and the normal status information

# crs_stat -v

To view verbose content in tabular format, enter the following command:

# crs_stat -v -t

The following text is an example of the command output:

Name           Type         R/RA   F/FT   Target    State     Host
----------------------------------------------------------------------
cluster_lockd  application  0/30   0/0    ONLINE    ONLINE    rac2
dhcp           application  0/1    0/0    OFFLINE   OFFLINE
named          application  0/1    0/0    OFFLINE   OFFLINE
network1       application  0/1    0/0    ONLINE    ONLINE    rac1
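The verbose tabular output can be parsed to find resources that have used up their restarts. This hypothetical Python sketch assumes whitespace-separated columns with no spaces in resource names, reads R/RA as the restart count and the RESTART_ATTEMPTS value and F/FT as the failure count and the failure threshold (matching the items listed for crs_stat -v), and treats a missing Host column as an offline resource.

```python
from typing import Dict, List


def parse_crs_stat_vt(text: str) -> List[Dict[str, object]]:
    """Parse crs_stat -v -t output into one dictionary per resource row."""
    rows: List[Dict[str, object]] = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and the dashed separator.
        if not fields or fields[0] == "Name" or fields[0].startswith("-"):
            continue
        name, rtype, r_ra, f_ft, target, state = fields[:6]
        restarts, attempts = (int(x) for x in r_ra.split("/"))
        failures, threshold = (int(x) for x in f_ft.split("/"))
        rows.append({
            "name": name, "type": rtype,
            "restart_count": restarts, "restart_attempts": attempts,
            "failure_count": failures, "failure_threshold": threshold,
            "target": target, "state": state,
            "host": fields[6] if len(fields) > 6 else None,  # offline rows have no host
        })
    return rows


def at_restart_limit(rows: List[Dict[str, object]]) -> List[str]:
    """Names of resources whose restart count has reached RESTART_ATTEMPTS."""
    return [str(r["name"]) for r in rows if r["restart_count"] >= r["restart_attempts"]]
```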

Enter the following command to view the application profile information that is stored in the OCR:

# crs_stat -p

The following text is an example of the command output:

NAME=cluster_lockd
TYPE=application
ACTION_SCRIPT=cluster_lockd.scr
ACTIVE_PLACEMENT=0
AUTO_START=restore
CHECK_INTERVAL=5
DESCRIPTION=Cluster lockd/statd
FAILOVER_DELAY=30
FAILURE_INTERVAL=60
FAILURE_THRESHOLD=1
HOSTING_MEMBERS=
OPTIONAL_RESOURCES=
PLACEMENT=balanced
REQUIRED_RESOURCES=
RESTART_ATTEMPTS=2
SCRIPT_TIMEOUT=60 ...

See the crs_stat command for more information.

See Also:

Appendix D, "High Availability Oracle Clusterware Command-Line Reference and C API" for detailed information about Oracle Clusterware commands