Conditions for Agent and Server Terminations

Table 130 lists the failure conditions that cause Essbase Agent and Essbase Server to terminate, the components assumed to be working in each scenario, the failover functionality (including OPMN reactions), and the expected service level for Essbase clients.

Table 130. Essbase Agent and Server Termination Scenarios

Each scenario below lists the failure condition, the components assumed to be working, the failover functionality, and the expected service level agreement.

Failure Condition: Essbase Server death
  • Software bug
  • Lease expired
  • Abnormal error condition

Assumed Working:
  • Essbase Agent
  • Network
  • Shared disk

Failover Functionality: OPMN is not involved. Essbase Agent restarts an Essbase Server on getting a new request; there may be a slight delay in server startup while waiting for lease availability.

Expected Service Level Agreement: Client gets a “request failed” error while the request is bound to the server. Client must reissue the request.
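
Because the recovery contract in this scenario is simply to reissue the request, a client can wrap its calls in a small retry loop. The following sketch is a minimal illustration in Python, assuming a hypothetical session object whose run_query method raises RequestFailedError when the bound server dies; none of these names come from an actual Essbase client API.

    import time

    class RequestFailedError(Exception):
        """Raised when a request fails because the bound server died."""

    def run_with_retry(session, query, retries=3, delay=2.0):
        # Reissuing the request prompts the Essbase Agent to start a new
        # Essbase Server; the delay allows for server startup and lease
        # availability.
        for attempt in range(1, retries + 1):
            try:
                return session.run_query(query)   # hypothetical client call
            except RequestFailedError:
                if attempt == retries:
                    raise
                time.sleep(delay)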

Failure Condition: Essbase Agent death
  • Software bug
  • Lease expired
  • Abnormal error condition

Assumed Working:
  • Network
  • Shared disk

Failover Functionality: OPMN ping of Essbase Agent fails. OPMN attempts to restart Essbase Agent on the same node. If Essbase Agent does not restart on the same node, OPMN initiates failover to the passive node. Essbase Agent restarts servers on getting application requests.

Expected Service Level Agreement: Client gets a network disconnect error when attempting to contact Essbase Agent using the SessionID. Servers become orphaned and terminate when the first of the following events occurs:
  1. Servers stop renewing their lease.
  2. Servers detect a pipe break.
Client must log in to Essbase again after the agent restarts and then resubmit the request.
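
The “whichever comes first” behavior amounts to a watchdog loop in each orphaned server. The following sketch illustrates that logic; lease and agent_pipe are hypothetical stand-ins for the server’s lease client and its pipe to the agent, not Essbase internals.

    import sys
    import time

    def orphan_watchdog(lease, agent_pipe, interval=5.0):
        # An orphaned server terminates on whichever happens first:
        # (1) it can no longer renew its lease, or
        # (2) it detects that the pipe to the agent has broken.
        while True:
            if not lease.renew():        # (1) lease renewal failed
                break
            if agent_pipe.is_broken():   # (2) agent pipe break detected
                break
            time.sleep(interval)         # wait one heartbeat interval
        sys.exit(1)                      # terminate the server process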

Failure Condition: Network outage (network partition; that is, the primary IP [eth0] is down on the active node)

Assumed Working:
  • Essbase Agent (on current active node)
  • Essbase Servers (on current active node)
  • Shared disk
  • Lease database (potentially, if reachable through a network interface other than eth0)

Failover Functionality: OPMN “forward ping” of Essbase Agent fails. OPMN attempts to restart Essbase Agent on the local node, which fails because eth0 is down; Essbase Agent terminates. Servers detect Essbase Agent death and terminate. The network heartbeat between ONS peers on the active and passive nodes fails (that is, a network partition). OPMN on the passive node takes over, and that node becomes the new active node. OPMN brings up the Essbase Agent on the new active node; there might be a slight delay for the Essbase Agent lease to become available. Essbase Agent brings up Essbase Servers as requests are received.

Expected Service Level Agreement: Client gets an error while trying to reuse the session, or hangs until the network outage is detected. Essbase Agent and servers become unreachable. Client must log in to the Essbase Agent again after it comes up on the passive node and then resubmit requests.
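
From the client’s perspective, recovery from a failover is a re-login followed by a resubmit. The sketch below illustrates that sequence, assuming a hypothetical client object with submit and login methods and a NetworkDisconnectError exception; these names are illustrative only, not an Essbase client API.

    class NetworkDisconnectError(Exception):
        """Raised when the old SessionID no longer reaches an agent."""

    def resubmit_after_failover(client, request, cluster_url, credentials):
        # After failover the old session is gone: log in to the Essbase
        # Agent again (now on the new active node), then resubmit.
        try:
            return client.submit(request)            # try existing session
        except NetworkDisconnectError:
            client.login(cluster_url, credentials)   # re-login after failover
            return client.submit(request)            # resubmit the request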

Failure Condition: Essbase.lck file exists after Essbase Agent death

Assumed Working: Not applicable

Failover Functionality: The Essbase.lck file is removed in failover mode.

Expected Service Level Agreement: Not applicable

Failure Condition: Essbase.sec file is corrupt after Essbase Agent death (not a unique scenario; it can be a follow-on to an Essbase Agent crash or a network partition)

Assumed Working:
  • Network
  • Shared disk

Failover Functionality: Not applicable for Essbase failover. Essbase Agent does not start until the administrator restores a good essbase.sec file from backup.

Expected Service Level Agreement: Service is unavailable until an administrator intervenes. After Essbase Agent comes back up, clients must log in again.
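
One way to automate the administrator’s recovery step is a startup guard that validates the security file and falls back to a known-good backup. The sketch below assumes a checksum recorded alongside the backup; the checksum mechanism and all function names are assumptions of this sketch, not an Essbase feature.

    import hashlib
    import shutil
    from pathlib import Path

    def verify_or_restore(sec_file: Path, backup: Path, checksum_file: Path) -> bool:
        # Compare essbase.sec against a stored checksum; on mismatch,
        # restore the known-good backup so the agent can start. The
        # checksum file is part of this sketch, not an Essbase feature.
        digest = hashlib.sha256(sec_file.read_bytes()).hexdigest()
        if digest == checksum_file.read_text().strip():
            return True                    # security file is intact
        shutil.copy2(backup, sec_file)     # restore good essbase.sec
        return False                       # a restore was performed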

Failure Condition: Disk outage (shared disk down)

Assumed Working: Network

Failover Functionality: Not applicable for Essbase failover. Customers must eliminate the single point of failure in the shared disk. This can be addressed by running the shared disk with a mirroring setup, such as a SAN with disk redundancy (a RAID 1+0 configuration).

Expected Service Level Agreement: Both active and passive nodes fail. Service is not available.

Failure Condition: Lease database outage

Assumed Working:
  • Network
  • Shared disk

Failover Functionality: Essbase Agent is unable to renew its lease and terminates. Servers are unable to renew their leases and terminate. You need to eliminate the single point of failure for the lease database; Oracle recommends running the lease database (which is relational) in cold failover cluster (CFC, active-passive) mode or in RAC (active-active) mode.

Expected Service Level Agreement: Service is unavailable. Both active and passive nodes are unable to run Essbase.
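
The termination behavior follows the usual lease pattern: a process that cannot renew its lease before it expires must stop, which is what prevents two nodes from acting as the active node at once. The sketch below illustrates the pattern; lease_db.renew and holder_id are hypothetical, not the actual lease database interface.

    import sys
    import time

    def lease_renewal_loop(lease_db, holder_id, ttl=20.0):
        # The agent (and each server) must keep renewing its lease; if
        # the lease database is unreachable and the lease expires, the
        # process terminates rather than keep running without a lease.
        deadline = time.monotonic() + ttl
        while True:
            try:
                lease_db.renew(holder_id, ttl)    # hypothetical call
                deadline = time.monotonic() + ttl
            except ConnectionError:
                if time.monotonic() >= deadline:  # lease has expired
                    sys.exit(1)                   # terminate
            time.sleep(ttl / 3)                   # renew well before expiry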

Failure Condition: Node failure (catastrophic hardware failure)

Assumed Working:
  • Network
  • Shared disk

Failover Functionality: The network heartbeat between ONS peers on the active and passive nodes fails (the current active node has crashed). OPMN on the passive node takes over, and that node becomes the new active node. OPMN brings up Essbase Agent on the new active node; there might be a slight delay for the Essbase Agent lease to become available. Essbase Agent brings up Essbase Servers as requests come in.

Expected Service Level Agreement: Client gets an error while trying to reuse the session, because the agent and servers have died. Client must log in to the Essbase Agent again after it comes up on the standby node and then resubmit requests.
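
The takeover itself follows a standard heartbeat-timeout pattern on the passive node. The following sketch illustrates it; heartbeat.received and start_agent are hypothetical placeholders for the ONS heartbeat check and the OPMN start action, not real OPMN or ONS APIs.

    import time

    def passive_node_monitor(heartbeat, start_agent, timeout=30.0):
        # When the heartbeat from the active node stops for longer than
        # the timeout, the passive node assumes the active role and
        # starts the Essbase Agent, which then acquires the agent lease
        # (possibly after a short delay).
        last_seen = time.monotonic()
        while True:
            if heartbeat.received():             # hypothetical ONS-style ping
                last_seen = time.monotonic()
            elif time.monotonic() - last_seen > timeout:
                start_agent()                    # become the new active node
                return
            time.sleep(1.0)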

Failure Condition: Shared Services Web application outage

Assumed Working:
  • Network
  • Shared disk
  • Essbase Agent
  • Essbase Servers

Failover Functionality: Not applicable to Essbase failover. As long as the LDAP provider (OpenLDAP, Oracle LDAP, or an external directory) is up, Essbase Agent can authenticate users; there is no runtime dependency on Shared Services.

Expected Service Level Agreement: Certain user operations fail; for example, creating or deleting an application fails, because these operations update the Shared Services Registry through the Shared Services Web server. Existing clients continue to work.

Failure Condition: Essbase Agent and server hang (application bug)

Assumed Working:
  • Network
  • Shared disk
  • Essbase Agent

Failover Functionality: Essbase Agent and server hangs are not explicitly handled, but the overall robustness of the agent and servers improves when using failover clusters.

Expected Service Level Agreement: As long as Essbase Agent and server are able to renew their leases, there is no change to the existing behavior.