1 Introduction to High Availability

A high availability architecture is one of the key requirements for any enterprise deployment. Oracle Fusion Middleware has an extensive set of high availability features, which protect its components and applications from unplanned downtime and minimize planned downtime.

The solutions and procedures described in this guide are designed to eliminate single points of failure for Oracle Fusion Middleware components with little or no downtime. These solutions help ensure that applications deployed with Oracle Fusion Middleware meet the availability required to achieve your business goals.

This guide discusses the architecture, interaction, and dependencies of Oracle Fusion Middleware components, and explains how they can be deployed in a high availability architecture.

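This chapter explains high availability and its importance from the perspective of Oracle Fusion Middleware. This chapter includes the following sections:

  • Section 1.1, "What is High Availability"

  • Section 1.2, "High Availability Information in Other Documentation"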

1.1 What is High Availability

High availability refers to the ability of users to access a system without loss of service. Deploying a high availability system minimizes the time when the system is down (unavailable) and maximizes the time when it is running (available). This section provides an overview of high availability from a problem-solution perspective. This section includes the following topics:

  • Section 1.1.1, "High Availability Problems"

  • Section 1.1.2, "High Availability Solutions"

1.1.1 High Availability Problems

Mission-critical computer systems need to be available 24 hours a day, 7 days a week, 365 days a year. However, part or all of the system may be down during planned or unplanned downtime. A system's availability is measured as the percentage of time it provides service out of the total time since it was deployed. Table 1-1 provides an example.

Table 1-1 Availability Percentages and Corresponding Downtime Values

Availability Percentage    Approximate Downtime Per Year
95%                        18 days
99%                        4 days
99.9%                      9 hours
99.99%                     1 hour
99.999%                    5 minutes
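The downtime values in Table 1-1 follow from multiplying the unavailable fraction of time by the length of a year. The following Python sketch reproduces the approximations, assuming a 365-day year (8,760 hours) and ignoring leap years; the function names are illustrative only.

```python
# Minimal sketch: derive yearly downtime from an availability percentage.
# Assumes a 365-day year (8,760 hours); leap years are ignored.
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(availability_pct: float) -> float:
    """Return the downtime, in hours per year, implied by availability_pct."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    # Unavailable fraction of the year multiplied by the hours in a year.
    return (1.0 - availability_pct / 100.0) * HOURS_PER_YEAR


def format_downtime(hours: float) -> str:
    """Express a duration in the largest convenient unit (days, hours, or minutes)."""
    if hours >= 24:
        return f"{hours / 24:.1f} days"
    if hours >= 1:
        return f"{hours:.1f} hours"
    return f"{hours * 60:.1f} minutes"


for pct in (95.0, 99.0, 99.9, 99.99, 99.999):
    print(f"{pct}%: {format_downtime(downtime_hours_per_year(pct))}")
```

For example, 99.99% availability allows roughly 52.6 minutes of downtime per year, which Table 1-1 rounds to 1 hour.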


System downtime may be categorized as planned or unplanned. Unplanned downtime results from any sort of unexpected failure. Planned downtime refers to scheduled operations that are known in advance and that render the system unavailable. The effect of planned downtime on end users is typically minimized by scheduling operational windows when system traffic is slow. Unplanned downtime may have a larger effect because it can happen at peak hours, when it affects more system users.

These two types of downtime (planned and unplanned) are usually considered separately when designing a system's availability requirements. A system's requirements may be very strict for unplanned downtime but very flexible for planned downtime. This is the typical case for applications with high peak loads during working hours that remain practically inactive at night and on weekends. You may choose different high availability features depending on the type of failure being addressed.
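Because planned and unplanned downtime are considered separately, availability can also be measured against a separate target for each. The following Python sketch assumes a hypothetical list of outage records, each tagged as planned or unplanned with a duration in minutes; the record type, sample outages, and targets are illustrative and not part of any Oracle product.

```python
from dataclasses import dataclass
from typing import List, Tuple

MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day measurement period


@dataclass
class Outage:
    """Hypothetical outage record: duration in minutes and whether it was scheduled."""
    minutes: float
    planned: bool


def availability_pct(outage_minutes: float, period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Percentage of the period during which the system was available."""
    return 100.0 * (1.0 - outage_minutes / period_minutes)


def split_availability(outages: List[Outage]) -> Tuple[float, float]:
    """Return (unplanned, planned) availability, each computed on its own."""
    unplanned = sum(o.minutes for o in outages if not o.planned)
    planned = sum(o.minutes for o in outages if o.planned)
    return availability_pct(unplanned), availability_pct(planned)


outages = [Outage(30, planned=False), Outage(240, planned=True), Outage(240, planned=True)]
unplanned_pct, planned_pct = split_availability(outages)
# Illustrative targets: strict for unplanned downtime, relaxed for planned downtime.
print(f"unplanned: {unplanned_pct:.4f}% (target 99.99%) -> {'met' if unplanned_pct >= 99.99 else 'missed'}")
print(f"planned:   {planned_pct:.4f}% (target 99.0%) -> {'met' if planned_pct >= 99.0 else 'missed'}")
```

Keeping the two figures apart lets an application with quiet nights and weekends accept long maintenance windows without masking a breach of its unplanned-downtime target.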

1.1.2 High Availability Solutions

High availability solutions can be categorized into local high availability solutions that provide high availability in a single data center deployment, and disaster recovery solutions, which are usually geographically distributed deployments that protect your applications from disasters such as floods or regional network outages.

Local high availability solutions protect against process, node, and media failures, as well as human errors. Geographically distributed disaster recovery solutions protect against local physical disasters that affect an entire data center.

To solve the high availability problem, a number of technologies and best practices are needed. The most important mechanism is redundancy. High availability comes from redundant systems and components. You can categorize local high availability solutions by their level of redundancy, into active-active solutions and active-passive solutions (see Figure 1-1):

  • Active-active solutions deploy two or more active system instances and can be used to improve scalability as well as provide high availability. In active-active deployments, all instances handle requests concurrently.

  • Active-passive solutions deploy an active instance that handles requests and a passive instance that is on standby. A heartbeat mechanism is set up between these two instances; this mechanism is provided and managed through operating system vendor-specific clusterware. Vendor-specific cluster agents are generally also available to monitor the instances and fail over between cluster nodes automatically: when the active instance fails, an agent shuts down the active instance completely and brings up the passive instance, so that application services can resume processing. As a result, the active and passive roles are switched. The same procedure can be performed manually for planned or unplanned downtime. Active-passive solutions are also generally referred to as cold failover clusters.

    You can use Oracle Cluster Ready Services (CRS) to manage Oracle Fusion Middleware active-passive (cold failover cluster, or CFC) solutions.

Figure 1-1 Active-Active and Active-Passive High Availability Solutions
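The active-passive sequence described above (heartbeat loss, complete shutdown of the failed active instance, startup of the passive instance, role switch) can be modeled in a few lines. This Python sketch is a simplified, hypothetical model of what clusterware does; it is not Oracle CRS code, and the class names, methods, and 10-second timeout are assumptions for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Instance:
    """Hypothetical stand-in for a server instance managed by clusterware."""
    name: str
    running: bool = False
    last_heartbeat: float = field(default_factory=time.monotonic)

    def start(self) -> None:
        self.running = True
        self.last_heartbeat = time.monotonic()

    def stop(self) -> None:
        self.running = False


class ColdFailoverPair:
    """Active-passive pair: only the active instance serves requests."""

    def __init__(self, active: Instance, passive: Instance, timeout_s: float = 10.0):
        self.active, self.passive, self.timeout_s = active, passive, timeout_s
        self.active.start()

    def heartbeat(self) -> None:
        # Called periodically while the active instance is healthy.
        self.active.last_heartbeat = time.monotonic()

    def check(self, now: Optional[float] = None) -> bool:
        """Fail over if the last heartbeat is older than timeout_s; return True if roles switched."""
        now = time.monotonic() if now is None else now
        if now - self.active.last_heartbeat <= self.timeout_s:
            return False
        self.active.stop()    # shut the failed active instance down completely
        self.passive.start()  # bring up the standby instance
        self.active, self.passive = self.passive, self.active  # roles are now switched
        return True


pair = ColdFailoverPair(Instance("node1"), Instance("node2"), timeout_s=10.0)
switched = pair.check(now=pair.active.last_heartbeat + 30)
print(switched, pair.active.name)  # True node2
```

Shutting the failed instance down before starting the standby mirrors the ordering in the text: it avoids two instances acting as active at the same time.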

In addition to architectural redundancies, the following local high availability technologies are also necessary in a comprehensive high availability system:

  • Process death detection and automatic restart

    Processes may die unexpectedly due to configuration or software problems. A proper process monitoring and restart system should monitor all system processes constantly and restart them should problems appear.

    The monitoring system should also track the number of restarts within a specified time interval, because continually restarting a process within a short period may lead to additional faults or failures. A maximum number of restarts or retries within a specified time interval should therefore be part of the design.

  • Clustering

    Clustering components of a system together allows the components to be viewed functionally as a single entity from the perspective of a client for runtime processing and manageability. A cluster is a set of processes running on one or more computers that share the same workload. Clustering and redundancy are closely related: a cluster provides redundancy for a system.

    If failover occurs during a transaction in a clustered environment, the session data is retained as long as there is at least one surviving instance available in the cluster.

  • State replication and routing

    For stateful applications, client state can be replicated to enable stateful failover of requests in the event that processes servicing these requests fail.

  • Failover

    With a load-balancing mechanism in place, the instances are redundant. If any of the instances fail, requests to the failed instance can be sent to the surviving instances.

  • Server load balancing

    When multiple instances of identical server components are available, client requests to these components can be load balanced to ensure that the instances have roughly the same workload.

  • Server Migration

    Some services can have only one instance running at any given point in time. If the active instance becomes unavailable, the service is automatically started on a different cluster member. Alternatively, the whole server process can be automatically started on a different system in the cluster.

  • Integrated High Availability

    Components depend on other components to provide services. A component should be able to recover from failures of the components it depends on without any service interruption.

  • Rolling Patching

    Patching product binaries often requires downtime. Patching a running cluster in a rolling fashion, one node at a time, can avoid downtime. Patches can be uninstalled in a rolling fashion as well.

  • Configuration management

    A clustered group of similar components often needs to share a common configuration. Proper configuration management ensures that components provide the same reply to the same incoming request, allows these components to synchronize their configurations, and provides highly available configuration management to reduce administration downtime.

  • Backup and Recovery

    User errors may cause a system to malfunction. In certain circumstances, a component or system failure may not be repairable. A backup and recovery facility should be available to back up the system at certain intervals and restore a backup when an unrepairable failure occurs.
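The restart policy described under "Process death detection and automatic restart" (restart a failed process, but cap the number of restarts within a time interval) can be sketched as a sliding-window limiter. This is a hypothetical Python model rather than the behavior of any specific Oracle process manager; the class name, the default of three restarts, and the 300-second window are assumptions.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class RestartPolicy:
    """Allow at most max_restarts restarts within any window_s-second sliding window."""

    def __init__(self, max_restarts: int = 3, window_s: float = 300.0):
        self.max_restarts = max_restarts
        self.window_s = window_s
        self._restarts: Deque[float] = deque()  # timestamps of recent restarts

    def try_restart(self, restart: Callable[[], None], now: Optional[float] = None) -> bool:
        """Run restart() if the limit allows it; return False when restarts are exhausted."""
        now = time.monotonic() if now is None else now
        # Forget restarts that have fallen out of the sliding window.
        while self._restarts and now - self._restarts[0] >= self.window_s:
            self._restarts.popleft()
        if len(self._restarts) >= self.max_restarts:
            return False  # stop retrying; the caller should escalate (for example, alert an operator)
        self._restarts.append(now)
        restart()
        return True


policy = RestartPolicy(max_restarts=3, window_s=300.0)
results = [policy.try_restart(lambda: None, now=t) for t in (0, 10, 20, 30)]
print(results)  # [True, True, True, False]
```

The fourth failure inside the five-minute window is not restarted, which keeps a process with a persistent configuration or software problem from cycling endlessly.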
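The "Failover" and "Server load balancing" items work together: requests are spread across redundant instances, and a failed instance is skipped so that its requests go to the surviving instances. The following Python sketch uses simple round-robin selection as an illustrative policy; real load balancers may use other algorithms and health checks, and the instance names are hypothetical.

```python
from itertools import cycle
from typing import Iterable, Set


class RoundRobinBalancer:
    """Round-robin over identical instances, skipping any marked as failed."""

    def __init__(self, instances: Iterable[str]):
        self._instances = list(instances)
        self._ring = cycle(self._instances)
        self.failed: Set[str] = set()

    def mark_failed(self, name: str) -> None:
        self.failed.add(name)

    def mark_recovered(self, name: str) -> None:
        self.failed.discard(name)

    def next_instance(self) -> str:
        # Examine at most one full rotation before concluding that nothing survives.
        for _ in range(len(self._instances)):
            candidate = next(self._ring)
            if candidate not in self.failed:
                return candidate
        raise RuntimeError("no surviving instance available")


lb = RoundRobinBalancer(["wls1", "wls2", "wls3"])
lb.mark_failed("wls2")
print([lb.next_instance() for _ in range(4)])  # ['wls1', 'wls3', 'wls1', 'wls3']
```

With one of three instances down, the two survivors share the load evenly; once the failed instance is marked as recovered it rejoins the rotation.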
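The "Rolling Patching" item can be illustrated as a loop that takes one node out of service, patches it, and returns it before moving on, so that the rest of the cluster keeps handling requests. This Python sketch is a hypothetical model: the function, the in-service set, and the minimum-capacity check are assumptions, and it does not invoke any real patching tool.

```python
from typing import Callable, Iterable, Set


def rolling_patch(nodes: Iterable[str], patch: Callable[[str], None],
                  in_service: Set[str], min_in_service: int = 1) -> None:
    """Patch cluster nodes one at a time while the remaining nodes keep serving requests.

    in_service is the (hypothetical) set of nodes currently receiving requests;
    it is updated in place as each node is drained, patched, and restored.
    """
    for node in nodes:
        # Refuse to take a node out if too few other nodes would remain in service.
        if len(in_service - {node}) < min_in_service:
            raise RuntimeError(f"patching {node} would leave fewer than "
                               f"{min_in_service} node(s) in service")
        in_service.discard(node)  # drain: stop routing new requests to this node
        patch(node)               # apply the patch to this node only
        in_service.add(node)      # return the patched node to service


serving = {"node1", "node2", "node3"}
rolling_patch(["node1", "node2", "node3"],
              lambda n: print(f"patching {n}; serving: {sorted(serving)}"),
              serving)
```

If patch() raises an exception, the node is left out of service, which is usually the safer outcome for a half-patched node.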

Disaster Recovery

Disaster recovery solutions typically set up two homogeneous sites, one active and one passive. Each site is a self-contained system. The active site is generally called the production site, and the passive site is called the standby site. During normal operation, the production site services requests; in the event of a site failover or switchover, the standby site takes over the production role and all requests are routed to that site. To maintain the standby site for failover, not only must the standby site contain homogeneous installations and applications, but data and configurations must also be synchronized constantly from the production site to the standby site.

Figure 1-2 Geographically Distributed Disaster Recovery
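The synchronization requirement in the disaster recovery paragraph determines what a role transition costs. The following Python sketch models a production/standby pair in which each site records the highest change sequence number it has applied: a planned switchover synchronizes the standby first, while a failover after a disaster promotes the standby as-is, so changes that were never replicated are lost. The classes, fields, and use of sequence numbers are hypothetical simplifications, not the API of any Oracle product.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical site holding data changes identified by sequence numbers."""
    name: str
    applied_seq: int = 0  # highest change sequence number applied at this site


class DisasterRecoveryPair:
    def __init__(self, production: Site, standby: Site):
        self.production, self.standby = production, standby

    def write(self) -> None:
        # A new change is committed at the production site.
        self.production.applied_seq += 1

    def replicate(self) -> None:
        # Ship outstanding changes so that the standby matches production.
        self.standby.applied_seq = self.production.applied_seq

    def lag(self) -> int:
        return self.production.applied_seq - self.standby.applied_seq

    def switchover(self) -> None:
        """Planned role change: synchronize first, so no changes are lost."""
        self.replicate()
        self._swap_roles()

    def failover(self) -> int:
        """Unplanned role change after losing production; return the number of lost changes.

        The old production site must be resynchronized before it can act as a standby.
        """
        lost = self.lag()
        self._swap_roles()
        return lost

    def _swap_roles(self) -> None:
        self.production, self.standby = self.standby, self.production


dr = DisasterRecoveryPair(Site("siteA"), Site("siteB"))
dr.write()
dr.replicate()
dr.write()
dr.write()
print(dr.lag())            # 2
print(dr.failover())       # 2
print(dr.production.name)  # siteB
```

The smaller the replication lag is kept during normal operation, the less data a failover loses, which is why the text calls for constant synchronization.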

Oracle Fusion Middleware Components Protected by High Availability Solutions

The Oracle Fusion Middleware High Availability Guide discusses high availability solutions for the following components:

  • Oracle WebLogic Server

  • Oracle SOA Suite

  • Oracle ADF

  • Oracle WebCenter

  • Oracle Identity Management Components

  • Oracle HTTP Server

  • Oracle Web Cache

  • Oracle Portal, Forms, Reports, and Discoverer

1.2 High Availability Information in Other Documentation

Table 1-2 lists Oracle Fusion Middleware guides (other than this guide) that contain high availability information. This information pertains to high availability of various Oracle Fusion Middleware components.

Table 1-2 High Availability Information in Oracle Fusion Middleware Documentation

Component
    Location of Information

Oracle SOA Suite
    Oracle Fusion Middleware Administrator's Guide for Oracle SOA Suite
    Oracle Fusion Middleware Installation Guide for Oracle SOA Suite
    Oracle Fusion Middleware Enterprise Deployment Guide for Oracle SOA Suite

Oracle WebCenter
    Oracle Fusion Middleware Administrator's Guide for Oracle WebCenter
    Oracle Fusion Middleware Installation Guide for Oracle WebCenter
    Oracle Fusion Middleware Enterprise Deployment Guide for Oracle WebCenter

Oracle ADF
    Oracle Fusion Middleware Fusion Developer's Guide for Oracle Application Development Framework
    Oracle Fusion Middleware Web User Interface Developer's Guide for Oracle Application Development Framework

Oracle Data Integrator
    Oracle Fusion Middleware Developer's Guide for Oracle Data Integrator
    Oracle Fusion Middleware Connectivity and Knowledge Modules Guide for Oracle Data Integrator
    Oracle Fusion Middleware Knowledge Module Developer's Guide for Oracle Data Integrator

Oracle WebLogic Server Clusters
    Oracle Fusion Middleware Using Clusters for Oracle WebLogic Server

Oracle Fusion Middleware Backup and Recovery
    Oracle Fusion Middleware Administrator's Guide

Oracle Web Cache
    Oracle Fusion Middleware Administrator's Guide for Oracle Web Cache

Oracle Identity Management
    Oracle Fusion Middleware Installation Guide for Oracle Identity Management
    Oracle Fusion Middleware Enterprise Deployment Guide for Oracle Identity Management

Oracle Virtual Directory
    Oracle Fusion Middleware Administrator's Guide for Oracle Virtual Directory

Oracle HTTP Server
    Oracle Fusion Middleware Administrator's Guide for Oracle HTTP Server

Oracle Internet Directory
    Oracle Fusion Middleware Administrator's Guide for Oracle Internet Directory

Oracle Access Manager
    Oracle Fusion Middleware Administrator's Guide for Oracle Access Manager

Oracle Authorization Policy Manager
    Oracle Fusion Middleware Administrator's Guide for Authorization Policy Manager

Oracle Identity Manager
    Oracle Fusion Middleware Administrator's Guide for Oracle Identity Manager

Oracle Adaptive Access Manager
    Oracle Fusion Middleware Administrator's Guide for Oracle Adaptive Access Manager

Oracle Real Application Clusters (Oracle RAC)
    Oracle Real Application Clusters Installation Guide

Oracle Enterprise Content Management Suite
    Oracle Fusion Middleware Overview Guide for Oracle Enterprise Content Management

Oracle Imaging and Process Management
    Oracle Fusion Middleware Administrator's Guide for Oracle Imaging and Process Management

Oracle Universal Content Management
    Oracle Fusion Middleware System Administrator's Guide for Content Server

Oracle Universal Records Management
    Oracle Fusion Middleware Administrator's Guide for Universal Records Management

Oracle Repository Creation Utility (RCU)
    Oracle Fusion Middleware Repository Creation Utility User's Guide

Oracle Portal
    Oracle Fusion Middleware Administrator's Guide for Oracle Portal

Oracle Forms
    Oracle Fusion Middleware Forms Services Deployment Guide

Oracle Reports
    Oracle Fusion Middleware Oracle Reports User's Guide to Building Reports

Oracle Business Intelligence Discoverer
    Oracle Fusion Middleware Administrator's Guide for Oracle Business Intelligence Discoverer

Oracle Business Intelligence Enterprise Edition
    Oracle Fusion Middleware System Administrator's Guide for Oracle Business Intelligence Enterprise Edition

Oracle Real-Time Decisions
    Oracle Fusion Middleware Administrator's Guide for Oracle Real-Time Decisions