Oracle® Enterprise Manager Advanced Configuration
10g Release 5 (10.2.0.5)

Part Number E10954-03

3 Grid Control Common Configurations

Oracle Enterprise Manager 10g Grid Control is based on a flexible architecture, which allows you to deploy the Grid Control components in the most efficient and practical manner for your organization. This chapter describes some common configurations that demonstrate how you can deploy the Grid Control architecture in various computing environments.

The common configurations are presented in a logical progression, starting with the simplest configuration and ending with a complex configuration that involves the deployment of high availability components, such as load balancers, Oracle Real Application Clusters, and Oracle Data Guard.

This chapter contains the following sections:

  • Section 3.1, "About Common Configurations"

  • Section 3.2, "Deploying Grid Control Components on a Single Host"

  • Section 3.3, "Managing Multiple Hosts and Deploying a Remote Management Repository"

  • Section 3.4, "Using Multiple Management Service Installations"

  • Section 3.5, "High Availability Configurations - Maximum Availability Architecture"

3.1 About Common Configurations

The common configurations described in this chapter are provided as examples only. The actual Grid Control configurations that you deploy in your own environment will vary depending upon the needs of your organization.

For instance, the examples in this chapter assume you are using the OracleAS Web Cache port to access the Grid Control Console. By default, when you first install Grid Control, you display the Grid Control Console by navigating to the default OracleAS Web Cache port. However, you can modify your own configuration so that administrators bypass OracleAS Web Cache and use a port that connects them directly to the Oracle HTTP Server.

For another example, in a production environment you will likely want to implement firewalls and other security considerations. The common configurations described in this chapter are not meant to show how firewalls and security policies should be implemented in your environment.

See Also:

Chapter 6, "Enterprise Manager Security" for information about securing the connections between Grid Control components

Chapter 7, "Configuring Enterprise Manager for Firewalls" for information about configuring firewalls between Grid Control components

Besides providing descriptions of common configurations, this chapter can also help you understand the architecture and the flow of data among the Grid Control components. Based on this knowledge, you can make better decisions about how to configure Grid Control for your specific management requirements.

The Grid Control architecture consists of the following software components:

  • Oracle Management Agent

  • Oracle Management Service

  • Oracle Management Repository

  • Oracle Enterprise Manager 10g Grid Control Console

The remaining sections of this chapter describe how you can deploy these components in a variety of combinations and across a single host or multiple hosts.

3.2 Deploying Grid Control Components on a Single Host

Figure 3-1 shows how each of the Grid Control components is configured to interact when you install Grid Control on a single host. This is the default configuration that results when you use the Grid Control installation procedure to install the Enterprise Manager 10g Grid Control Using a New Database installation type.

Figure 3-1 Grid Control Components Installed on a Single Host

Description of Figure 3-1 follows
Description of "Figure 3-1 Grid Control Components Installed on a Single Host"

When you install all the Grid Control components on a single host, the management data travels along the following paths:

  1. Administrators use the Grid Control Console to monitor and administer the managed targets that are discovered by the Management Agents on each host. The Grid Control Console uses the default OracleAS Web Cache port (for example, port 7777 on UNIX systems and port 80 on Windows systems) to connect to the Oracle HTTP Server. The Management Service retrieves data from the Management Repository as it is requested by the administrator using the Grid Control Console.

    See Also:

    Oracle Application Server Web Cache Administrator's Guide for more information about the benefits of using OracleAS Web Cache
  2. The Management Agent loads its data (which includes management data about all the managed targets on the host, including the Management Service and the Management Repository database) by way of the Oracle HTTP Server upload URL. The Management Agent uploads data directly to the Oracle HTTP Server and bypasses OracleAS Web Cache. The default port for the upload URL is 4889 (if it is available during the installation procedure). The upload URL is defined by the REPOSITORY_URL property in the following configuration file in the Management Agent home directory (a sample emd.properties excerpt appears at the end of this section):

    AGENT_HOME/sysman/config/emd.properties (UNIX)
    AGENT_HOME\sysman\config\emd.properties (Windows)
    

    See Also:

    "Understanding the Enterprise Manager Directory Structure" for more information about the AGENT_HOME directory
  3. The Management Service uses JDBC connections to load data into the Management Repository database and to retrieve information from the Management Repository so it can be displayed in the Grid Control Console. The Management Repository connection information is defined in the following configuration file in the Management Service home directory:

    ORACLE_HOME/sysman/config/emoms.properties (UNIX)
    ORACLE_HOME\sysman\config\emoms.properties (Windows)
    

    See Also:

    "Reconfiguring the Oracle Management Service" for more information on modifying the Management Repository connection information in the emoms.properties file
  4. The Management Service sends data to the Management Agent by way of HTTP. The Management Agent software includes a built-in HTTP listener that listens on the Management Agent URL for messages from the Management Service. As a result, the Management Service can bypass the Oracle HTTP Server and communicate directly with the Management Agent. If the Management Agent is on a remote system, no Oracle HTTP Server is required on the Management Agent host.

    The Management Service uses the Management Agent URL to monitor the availability of the Management Agent, to submit Enterprise Manager jobs, and to perform other management functions.

    The Management Agent URL can be identified by the EMD_URL property in the following configuration file in the Management Agent home directory:

    AGENT_HOME/sysman/config/emd.properties (UNIX)
    AGENT_HOME\sysman\config\emd.properties (Windows)
    

    For example:

    EMD_URL=http://host1.acme.com:1831/emd/main/
    

    In addition, the name of the Management Agent as it appears in the Grid Control Console consists of the Management Agent host name and the port used by the Management Agent URL.
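For reference, a minimal emd.properties excerpt combining the upload URL and the Management Agent URL described above might look like the following; the host name and ports are illustrative, and the actual values are set during installation:

    # AGENT_HOME/sysman/config/emd.properties (excerpt, values illustrative)
    REPOSITORY_URL=http://host1.acme.com:4889/em/upload/
    EMD_URL=http://host1.acme.com:1831/emd/main/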

3.3 Managing Multiple Hosts and Deploying a Remote Management Repository

Installing all the Grid Control components on a single host is an effective way to initially explore the capabilities and features available to you when you centrally manage your Oracle environment.

A logical progression from the single-host environment is to a more distributed approach, where the Management Repository database is on a separate host and does not compete for resources with the Management Service. The benefit in such a configuration is scalability; the workload for the Management Repository and Management Service can now be split. This configuration also provides the flexibility to adjust the resources allocated to each tier, depending on the system load. (Such a configuration is shown in Figure 3-2.) See Section 3.4.2.1, "Monitoring the Load on Your Management Service Installations" for additional information.

Figure 3-2 Grid Control Components Distributed on Multiple Hosts with One Management Service

Description of Figure 3-2 follows
Description of "Figure 3-2 Grid Control Components Distributed on Multiple Hosts with One Management Service"

In this more distributed configuration, data about your managed targets travels along the following paths so it can be gathered, stored, and made available to administrators by way of the Grid Control Console:

  1. Administrators use the Grid Control Console to monitor and administer the targets just as they do in the single-host scenario described in Section 3.2.

  2. Management Agents are installed on each host on the network, including the Management Repository host and the Management Service host. The Management Agents upload their data to the Management Service by way of the Management Service upload URL, which is defined in the emd.properties file in each Management Agent home directory. The upload URL bypasses OracleAS Web Cache and uploads the data directly through the Oracle HTTP Server.

  3. The Management Repository is installed on a separate machine that is dedicated to hosting the Management Repository database. The Management Service uses JDBC connections to load data into the Management Repository database and to retrieve information from the Management Repository so it can be displayed in the Grid Control Console. This remote connection is defined in the emoms.properties configuration file in the Management Service home directory.

  4. The Management Service communicates directly with each remote Management Agent over HTTP by way of the Management Agent URL. The Management Agent URL is defined by the EMD_URL property in the emd.properties file of each Management Agent home directory. As described in Section 3.2, the Management Agent includes a built-in HTTP listener so no Oracle HTTP Server is required on the Management Agent host.

3.4 Using Multiple Management Service Installations

In larger production environments, you may find it necessary to add additional Management Service installations to help reduce the load on the Management Service and improve the efficiency of the data flow.

Note:

When you add additional Management Service installations to your Grid Control configuration, be sure to adjust the parameters of your Management Repository database. For example, you will likely need to increase the number of processes allowed to connect to the database at one time. Although the number of required processes will vary depending on the overall environment and the specific load on the database, as a general guideline, you should increase the number of processes by 40 for each additional Management Service.

For more information, see the description of the PROCESSES initialization parameter in the Oracle Database Reference.
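For example, if the database currently allows 150 processes and you are adding one additional Management Service, a sketch of the change (values illustrative) is shown below. PROCESSES is a static parameter, so the database must be restarted for the new value to take effect:

    SQL> ALTER SYSTEM SET processes=190 SCOPE=SPFILE;
    SQL> SHUTDOWN IMMEDIATE
    SQL> STARTUP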

The following sections provide more information about this configuration:

  • Section 3.4.1, "Understanding the Flow of Management Data When Using Multiple Management Services"

  • Section 3.4.2, "Determining When to Use Multiple Management Service Installations"

3.4.1 Understanding the Flow of Management Data When Using Multiple Management Services

Figure 3-3 shows a typical environment where an additional Management Service has been added to improve the scalability of the Grid Control environment.

Figure 3-3 Grid Control Architecture with Multiple Management Service Installations

Description of Figure 3-3 follows
Description of "Figure 3-3 Grid Control Architecture with Multiple Management Service Installations"

In a multiple Management Service configuration, the management data moves along the following paths:

  1. Administrators can use one of two URLs to access the Grid Control Console. Each URL refers to a different Management Service installation, but displays the same set of targets, all of which are loaded in the common Management Repository. Depending upon the host name and port in the URL, the Grid Control Console obtains data from the Management Service (by way of OracleAS Web Cache and the Oracle HTTP Server) on one of the Management Service hosts.

  2. Each Management Agent uploads its data to a specific Management Service, based on the upload URL in its emd.properties file. That data is uploaded directly to the Management Service by way of Oracle HTTP Server, bypassing OracleAS Web Cache.

    Whenever more than one Management Service is installed, it is a best practice to have the Management Services upload to a shared directory. This allows all Management Service processes to manage files that have been received from any Management Agent, and ensures that the loss of any one Management Service does not disrupt the upload of data from the Management Agents.

    Configure this functionality from the command line of each Management Service process as follows (a worked example appears at the end of this section):

    emctl config oms loader -shared <yes|no> -dir <load directory>

    Important:

    By adding a load balancer, you can avoid the following problems:
    • Should the Management Service fail, any Management Agent connected to it cannot upload data.

    • Because user accounts only know about one Management Service, users lose connectivity should the Management Service go down even if the other Management Service is up.

    See Section 3.5, "High Availability Configurations - Maximum Availability Architecture" for information regarding load balancers.

    Note:

    If deployment procedures are being used in this environment, they should be configured to use shared storage in the same way as the shared Management Service loader. To modify the location for the deployment procedure library:
    1. Click the Deployments tab on the Enterprise Manager Home page.

    2. Click the Provisioning subtab.

    3. On the Provisioning page, click the Administration subtab.

    4. In the Software Library Configuration section, click Add to set the Software Library Directory Location to a shared storage that can be accessed by any host running the Management Service.

  3. Each Management Service communicates by way of JDBC with a common Management Repository, which is installed in a database on a dedicated Management Repository host. Each Management Service uses the same database connection information, defined in the emoms.properties file, to load data from its Management Agents into the Management Repository. The Management Service uses the same connection information to pull data from the Management Repository as it is requested by the Grid Control Console.

  4. Any Management Service in the system can communicate directly with any of the remote Management Agents defined in the common Management Repository. The Management Services communicate with the Management Agents over HTTP by way of the unique Management Agent URL assigned to each Management Agent.

    As described in Section 3.2, the Management Agent URL is defined by the EMD_URL property in the emd.properties file of each Management Agent home directory. Each Management Agent includes a built-in HTTP listener so no Oracle HTTP Server is required on the Management Agent host.
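The following is a worked example of the shared loader configuration command from step 2 above, assuming a hypothetical shared directory mounted at /oms_shared/recv on every Management Service host; stop the Management Services before running it, as described in Section 3.5.3.1:

    emctl config oms loader -shared yes -dir /oms_shared/recv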

3.4.2 Determining When to Use Multiple Management Service Installations

Management Services do not exist only as receivers of upload information from Management Agents; they also retrieve data from the Management Repository. The Management Service renders this data in the form of HTML pages, which are requested by and displayed in the client Web browser. In addition, the Management Services perform background processing tasks, such as notification delivery and the dispatch of Enterprise Manager jobs.

As a result, the assignment of Management Agents to Management Services must be carefully managed. Improper distribution of load from Management Agents to Management Services may result in perceived:

  • Sluggish user interface response

  • Delays in delivering notification messages

  • Backlog in monitoring information being uploaded to the Management Repository

  • Delays in dispatching jobs

The following sections provide some tips for monitoring the load and response time of your Management Service installations:

3.4.2.1 Monitoring the Load on Your Management Service Installations

If your environment is not configured with a server load balancer (SLB), you should always be aware of how many Management Agents are configured for each Management Service, and monitor the load on each Management Service, to keep the workload evenly distributed.

At any time, you can view a list of Management Agents and Management Services using Setup on the Grid Control console.

Use the charts on the Overview page of Management Services and Repository to monitor:

  • Loader backlog (files)

    The Loader is part of the Management Service that pushes metric data into the Management Repository at periodic intervals. If the Loader Backlog chart indicates that the backlog is high and Loader output is low, there is data pending load, which may indicate a system bottleneck or the need for another Management Service. The chart shows the total backlog of files totaled over all Oracle Management Services for the past 24 hours. Click the image to display loader backlog charts for each individual Management Service over the past 24 hours.

  • Notification delivery backlog

    The Notification Delivery Backlog chart displays the number of notifications to be delivered that could not be processed in the time allocated. Notifications are delivered by the Management Services. This number is summed across all Management Services and is sampled every 10 minutes. The graph displays the data for the last 24 hours. It is useful for determining a growing backlog of notifications. When this graph shows constant growth over the past 24 hours, then consider adding another Management Service, reducing the number of notification rules, and verifying that all rules and notification methods are useful and valid.

3.4.2.2 Monitoring the Response Time of the Enterprise Manager Web Application Target

The information on the Management Services and Repository page can help you determine the load being placed on your Management Service installations. More importantly, consider how the performance of your Management Service installations is affecting the performance of the Grid Control Console.

Use the EM Website Web Application target to review the response time of the Grid Control Console pages:

  1. From the Grid Control Console, click the Targets tab and then click the Web Applications subtab.

  2. Click EM Website in the list of Web Application targets.

  3. In the Key Test Summary table, click homepage. The resulting page provides the response time of the Grid Control Console homepage URL.

    See Also:

    The Enterprise Manager online help for more information about using the homepage URL and Application Performance Management (also known as Application Performance Monitoring) to determine the performance of your Web Applications
  4. Click Page Performance to view the response time of some selected Grid Control Console pages.

    Note:

    The Page Performance page provides data generated only by users who access the Grid Control Console by way of the OracleAS Web Cache port (usually, 7777).
  5. Select 7 Days or 31 Days from the View Data menu to determine whether or not there are any trends in the performance of your Grid Control installation.

    Consider adding additional Management Service installations if the response time of all pages is increasing over time or if the response time is unusually high for specific popular pages within the Grid Control Console.

Note:

You can use Application Performance Management and Web Application targets to monitor your own Web applications. For more information, see Chapter 8, "Configuring Services".

3.5 High Availability Configurations - Maximum Availability Architecture

Highly Available systems are critical to the success of virtually every business today. It is equally important that the management infrastructure monitoring these mission-critical systems is highly available. The Enterprise Manager Grid Control architecture is engineered to be scalable and available from the ground up. It is designed to ensure that you concentrate on managing the assets that support your business, while it takes care of meeting your business Service Level Agreements.

When you configure Grid Control for high availability, your aim is to protect each component of the system, as well as the flow of management data in case of performance or availability problems, such as a failure of a host or a Management Service.

Maximum Availability Architecture (MAA) provides a highly available Enterprise Manager implementation by guarding against failure at each component of Enterprise Manager.

The impacts of failure differ for the different Enterprise Manager components, but failure in any component, whether the Management Repository database, a Management Service, or a Management Agent, can result in substantial service disruption. Therefore it is essential that each component be hardened using a highly available architecture.

You can configure Enterprise Manager to run in either active-active or active-passive mode using a single instance database as the Management Repository. The following text summarizes the active-active mode.

Refer to the following sections for more information about common Grid Control configurations that take advantage of high availability hardware and software solutions. These configurations are part of the Maximum Availability Architecture (MAA).

3.5.1 Configuring the Management Repository

Before installing Enterprise Manager, you should prepare the database that will be used to set up the Management Repository. Install the database using the Database Configuration Assistant (DBCA) to make sure that you inherit all Oracle installation best practices.

  • Configure Database

    • For both high availability and scalability, you should configure the Management Repository in the latest certified database version, with the RAC option enabled. Check for the latest database version certified for Enterprise Manager on the Certify tab of the My Oracle Support website.

    • Choose Automatic Storage Management (ASM) as the underlying storage technology.

    • When the database installation is complete:

      Go to the $ORACLE_HOME/rdbms/admin directory of the database home and execute the dbmspool.sql script (see the example following this list).

      This installs the DBMS_SHARED_POOL package, which helps improve the throughput of the Management Repository.

  • Install Enterprise Manager

    While installing Enterprise Manager using Oracle Universal Installer (OUI), you will be presented with two options for configuring the Management Repository:

    • Option 1: Install using a new database (default install)

    • Option 2: Install using an existing database.

    For MAA, you should choose 'Option 2: Install using an existing database'. When prompted for the 'existing database', point to the database configured in the previous step to set up the Management Repository.
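As noted in the database configuration steps above, the DBMS_SHARED_POOL package is installed by running the dbmspool.sql script. For example, connected to the Management Repository database as a user with SYSDBA privileges, you can run it from SQL*Plus (the @? shorthand expands to the database ORACLE_HOME):

    SQL> @?/rdbms/admin/dbmspool.sql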

3.5.1.1 Post Management Service - Install Management Repository Configuration

There are some parameters that should be configured during the Management Repository database installation (as previously mentioned) and some parameters that should be set after the Management Service has been installed. Once the Enterprise Manager console is available, it can be used to configure these best practices in the Management Repository. These best practices fall into the following areas:

  • Configuring Storage

  • Configuring Oracle Database 10g with RAC for High Availability and Fast Recoverability

    • Enable ARCHIVELOG Mode

    • Enable Block Checksums

    • Configure the Size of Redo Log Files and Groups Appropriately

    • Use a Flash Recovery Area

    • Enable Flashback Database

    • Use Fast-Start Fault Recovery to Control Instance Recovery Time

    • Enable Database Block Checking

    • Set DISK_ASYNCH_IO

    The details of these settings are available in the Oracle Database High Availability Best Practices.

    Use the MAA Advisor for additional high availability recommendations that should be applied to the Management Repository. To access the MAA Advisor:

    1. On the Database Target Home page, locate the High Availability section.

    2. Click Details next to the Console item.

    3. In the Availability Summary section of the High Availability Console page, click Details next to the MAA Advisor item.

3.5.2 Configuring the Management Services

Once you configure the Management Repository, the next step is to install and configure the Enterprise Manager Grid Control mid-tier, the Management Services, for greater availability. Before discussing steps that add mid-tier redundancy and scalability, note that the Management Service itself has a built-in restart mechanism based on the Oracle Process Management and Notification Service (OPMN). This service will attempt to restart a Management Service that is down.
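For example, you can check and restart the OPMN-managed Management Service processes from the Management Service home; the following is a sketch, and the output varies by installation:

    $ORACLE_HOME/opmn/bin/opmnctl status     # list OPMN-managed processes and their current states
    $ORACLE_HOME/opmn/bin/opmnctl stopall    # stop all managed processes, including the Management Service
    $ORACLE_HOME/opmn/bin/opmnctl startall   # start them again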

3.5.2.1 Management Service Install Location

If you are managing a large environment with multiple Management Services and Management Repository nodes, then consider installing the Management Services on hardware nodes that are different from Management Repository nodes (Figure 3-4). This allows you to scale out the Management Services in the future.

Figure 3-4 Management Service and Management Repository on Separate Hardware

Surrounding text describes Figure 3-4 .

Also consider the network latency between the Management Service and the Management Repository while determining the Management Service install location. The distance between the Management Service and the Management Repository may be one of the factors that affect network latency and hence Enterprise Manager performance.

If the network latency between the Management Service and Management Repository tiers is high or the hardware available for running Enterprise Manager is limited, then the Management Service can be installed on the same hardware as the Management Repository (Figure 3-5). This allows for Enterprise Manager high availability while keeping costs down.

Figure 3-5 Management Service and Management Repository on Same Hardware

Surrounding text describes Figure 3-5 .

Note:

Starting with Enterprise Manager 10g Release 10.2.0.2, you can install the Management Service onto the same nodes as the RAC Management Repository. Refer to the instructions in the README for details.

3.5.2.2 Configure Management Service to Management Repository Communication

Once all the Management Service processes have been installed, they need to be configured to communicate with each node of the RAC Management Repository in a redundant fashion. To accomplish this, modify the property 'oracle.sysman.eml.mntr.emdRepConnectDescriptor' in the file $ORACLE_HOME/sysman/config/emoms.properties for each installed Management Service. The purpose of this configuration is to make the Management Service aware of all instances in the database cluster that are able to provide access to the Management Repository through the database service 'EMREP'.

Note that Real Application Clusters (RAC) nodes are referred to by their virtual IP (vip) names. The service_name parameter is used instead of the system identifier (SID) in the connect_data section, and failover is turned on. Refer to Oracle Database Net Services Administrator's Guide for details.

oracle.sysman.eml.mntr.emdRepConnectDescriptor=(DESCRIPTION\=
(ADDRESS_LIST\=(FAILOVER\=ON) (ADDRESS\=(PROTOCOL\=TCP)(HOST\=node1-vip.example.com)
(PORT\=1521))(ADDRESS\=(PROTOCOL\=TCP)(HOST\=node2-vip.example.com)
(PORT\=1521)) (CONNECT_DATA\=(SERVICE_NAME\=EMREP)))

After making the previous change, run the following command to make the same change to the monitoring configuration used for the Management Services and Repository target: emctl config emrep -conn_desc "<tns alias>"

3.5.2.3 Configure Management Service to Direct Traffic Through SLB

Finally, modify the Management Service to take advantage of the capabilities of the Server Load Balancer. These modifications will cause all the Management Service nodes to redirect Enterprise Manager console traffic through the server load balancer, thereby presenting a single URL to the Enterprise Manager user.

To prevent the browser from bypassing the load balancer when a URL is redirected, Grid Control will now reconfigure the ssl.conf file as a part of resecuring the Management Service. After the Load Balancer is configured, issue the following command from the Management Service home: emctl secure oms

Modify the 'Port' directive in the Oracle HTTP Server configuration file at $ORACLE_HOME/Apache/Apache/conf/ssl.conf to be '443'. This assumes you are running in the default secured configuration between the Management Service and Management Agent.
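For illustration, the relevant directive in ssl.conf after this change would look like the following excerpt; all other directives remain unchanged:

    # $ORACLE_HOME/Apache/Apache/conf/ssl.conf (excerpt)
    Port 443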

3.5.3 Installing Additional Management Services

Install at least one additional Management Service using the Oracle Universal Installer (OUI) option 'Add Additional Management Service'. While you need two Management Services at the minimum for High Availability, additional Management Service processes can be installed depending on anticipated workload or based on system usage data. See Chapter 12, "Sizing and Maximizing the Performance of Oracle Enterprise Manager" for sizing recommendations.

Now that the first Management Service has been set up for high availability, the configuration can be copied over to additional Management Services easily using new emctl commands. Note the following considerations before installing additional Management Services.

  • The additional Management Service should be hosted in close network proximity to the Management Repository database for network latency reasons.

  • Configure the directory used for the shared filesystem loader to be available on the additional Management Service host using the same path as the first Management Service. Refer to Section 3.5.3.1, "Configuring Shared File Areas for Management Services" for additional information.

  • Additional Management Services should be installed using the same OS user and group as the first Management Service. Proper user equivalence should be set up so that files created by the first Management Service on the shared loader directory can be accessed and modified by the additional Management Service processes.

  • Adjust the parameters of your Management Repository database. For example, you will likely need to increase the number of processes allowed to connect to the database at one time. Although the number of required processes will vary depending on the overall environment and the specific load on the database, as a general guideline, you should increase the number of processes by 40 for each additional Management Service.

  • For the install to succeed, the emkey might need to be copied temporarily to the Management Repository by running emctl config emkey -copy_to_repos on the first Management Service, if it had been removed earlier from the Management Repository as per security best practices.

Install the Management Service software using steps documented in the Oracle Enterprise Manager Grid Control Installation and Basic Configuration Guide. Refer to the Installing Software-Only and Configuring Later - Additional Management Service option. Update the software to the latest patchset to match the first Management Service. Note that the emctl commands to copy over the configuration from the first Management Service do not copy over any software binaries. Once you have the additional Management Service installed, use the following steps to copy over the configuration from the first Management Service.

  1. Export the configuration from the first Management Service using emctl exportconfig oms -dir <location for the export file>

  2. Copy over the exported file to the additional Management Service

  3. Shut down the additional Management Service

  4. Import the exported configuration on the additional Management Service using emctl importconfig oms -file <full path of the export file>

  5. Restart the additional Management Service

  6. Set up EMCLI using emcli setup -url=https://slb.example.com/em -username sysman -password <sysman password> -nocertvalidate

  7. Resecure the Management Agent that is installed with the additional Management Service to upload to SLB using emctl secure agent -emdWalletSrcUrl https://slb.example.com:<upload port>/em

  8. Update the SLB configuration by adding the additional Management Service to the different pools on the SLB. Set up monitors for the new Management Service. Modify the ssl.conf file to set the Port directive to the SLB virtual port used for UI access.
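The following is a minimal command sketch of steps 1 through 5 above, assuming a hypothetical export directory of /tmp/oms_export; the export file name is generated by emctl exportconfig, so it appears here as a placeholder:

    # On the first Management Service
    emctl exportconfig oms -dir /tmp/oms_export

    # On the additional Management Service, after copying the exported file over
    opmnctl stopall
    emctl importconfig oms -file /tmp/oms_export/<exported file>
    opmnctl startall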

3.5.3.1 Configuring Shared File Areas for Management Services

The Management Service for Grid Control 10g Release 2 has a high availability feature called the Shared Filesystem Loader. With the Shared Filesystem Loader, management data files received from Management Agents are stored temporarily in a common shared location called the shared receive directory. All Management Services are configured to use the same storage location for the shared receive directory. The Management Services coordinate internally and distribute amongst themselves the workload of uploading files into the Management Repository. Should a Management Service go down, its workload is taken up by the surviving Management Services. This provides the following benefits:

  1. All Management Services can process Management Agent data, taking better advantage of available resources.

  2. Another Management Service can process Management Agent data in the event of a failure of a Management Service.

    To configure the Management Services to use the Shared Filesystem Loader, complete the following steps:

    1. Stop all Oracle Management Services.

    2. Configure a shared receive directory that is accessible by all Management Services using redundant file system storage.

    3. Execute:

      emctl config oms loader -shared yes -dir <loaderdirectory>

      individually on all Management Service hosts, where <loaderdirectory> is the full path to the shared receive directory created.

    Note:

    Enterprise Manager will fail to start if all the Management Services are not configured to point to the same shared directory. This shared directory should be on redundant storage.

    Note:

    To modify the location for the deployment procedure library using the Enterprise Manager UI:
    1. Click the Deployments tab on the Enterprise Manager Home page.

    2. Click the Provisioning subtab.

    3. On the Provisioning page, click the Administration subtab.

    4. In the Software Library Configuration section, click Add to set the Software Library Directory Location to a shared storage that can be accessed by any host running the Management Service.

  3. Configure Software Library

    Since the Software Library location has to be accessed by all Management Services, considerations similar to those for the shared filesystem loader directory apply here too. Refer to Chapter 16, "Using a Software Library", for details.

3.5.4 Configuring a Load Balancer

This section describes guidelines you can use for configuring a load balancer to balance the upload of data from Management Agents to multiple Management Service installations.

In the following examples, assume that you have installed two Management Service processes on Host A and Host B. For ease of installation, start with two hosts that have no application server processes installed. This ensures that the default ports are used as seen in the following table. The examples use these default values for illustration purposes.

Table 3-1 Management Service Ports

Secure Upload Port
  Default Value: 1159
  Description: Used for secure upload of management data from Management Agents.
  Source: httpd_em.conf and emoms.properties
  Defined By: Install. Can be modified by the emctl secure oms -secure_port <port> command.

Agent Registration Port
  Default Value: 4889
  Description: Used by Management Agents during the registration phase to download Management Agent wallets, for example, during emctl secure agent. In an unlocked Management Service, it can be used for uploading management data to the Management Service.
  Source: httpd_em.conf and emoms.properties
  Defined By: Install

Secure Console Port
  Default Value: 4444
  Description: Used for secure (https) console access.
  Source: ssl.conf
  Defined By: Install

Unsecure Console Port
  Default Value: 7777
  Description: Used for unsecure (http) console access.
  Source: httpd.conf
  Defined By: Install

Webcache Secure Port
  Default Value: 4443
  Description: Used for secure (https) console access.
  Source: webcache.xml
  Defined By: Install

Webcache Unsecure Port
  Default Value: 7779
  Description: Used for unsecure (http) console access.
  Source: webcache.xml
  Defined By: Install


By default, the service name on the Management Service-side certificate uses the name of the Management Service host. Management Agents do not accept this certificate when they communicate with the Management Service through a load balancer. You must run the following command to regenerate the certificate on both Management Services:

emctl secure oms -sysman_pwd <sysman_pwd> -reg_pwd <agent_reg_password> -host slb.acme.com -secure_port 1159 -slb_console_port 443

Specifically, you should use the administration tools that are packaged with your load balancer to configure a virtual pool that consists of the hosts and the services that each host provides. To ensure a highly available Management Service, you should have two or more Management Services defined within the virtual pool. A sample configuration follows.

Sample Configuration

In this sample, both pools and virtual servers are created.

  1. Create Pools

    Pool abc_upload: Used for secure upload of management data from Management Agents to Management Services
      Members: hostA:1159, hostB:1159
      Persistence: None
      Load Balancing: round robin

    Pool abc_genWallet: Used for securing new Management Agents
      Members: hostA:4889, hostB:4889
      Persistence: Active HTTP Cookie, method-> insert, expiration 60 minutes
      Load Balancing: round robin

    Pool abc_uiAccess: Used for secure console access
      Members: hostA:4444, hostB:4444
      Persistence: Simple (also known as Client IP based persistence), timeout-> 3000 seconds (should be greater than the OC4J session timeout of 45 minutes)
      Load Balancing: round robin

  2. Create Virtual Servers

    Virtual Server for secure upload
      Address: slb.acme.com
      Service: 1159
      Pool: abc_upload

    Virtual Server for Management Agent registration
      Address: slb.acme.com
      Service: 4889
      Pool: abc_genWallet

    Virtual Server for UI access
      Address: slb.acme.com
      Service: https, that is, 443
      Pool: abc_uiAccess

Modify the REPOSITORY_URL property in the emd.properties file located in the sysman/config directory of the Management Agent home directory. The host name and port specified must be those of the load balancer virtual server.
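For example, using the sample virtual servers above, the modified REPOSITORY_URL entry might look like the following; the host and port are those of the secure upload virtual server, and /em/upload/ is the standard upload servlet path:

    REPOSITORY_URL=https://slb.acme.com:1159/em/upload/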

See Also:

"Configuring the Management Agent to Use a New Management Service" for more information about modifying the REPOSITORY_URL property for a Management Agent

This configuration allows the distribution of connections from Management Agents equally between Management Services. In the event a Management Service becomes unavailable, the load balancer should be configured to direct connections to the surviving Management Services.

Detecting Unavailable Management Services

To successfully implement this configuration, the load balancer can be configured to monitor the underlying Management Service. On some models, for example, you can configure a monitor on the load balancer. The monitor defines the:

  • HTTP request that is to be sent to a Management Service

  • Expected result in the event of success

  • Frequency of evaluation

For example, the load balancer can be configured to check the state of the Management Service every 5 seconds. On three successive failures, the load balancer can then mark the component as unavailable and no longer route requests to it. The monitor should be configured to send the string GET /em/upload over HTTP and expect to get the response Http XML File receiver. See the following sample monitor configuration.

Sample Monitor Configuration

In this sample, three monitors are configured: mon_upload, mon_genWallet, and mon_uiAccess.

Monitor mon_upload
  Type: https
  Interval: 60
  Timeout: 181
  Send String: GET /em/upload HTTP/1.0\n
  Receive Rule: Http Receiver Servlet active!
  Associate with: hostA:1159, hostB:1159

Monitor mon_genWallet
  Type: http
  Interval: 60
  Timeout: 181
  Send String: GET /em/genwallet HTTP/1.0\n
  Receive Rule: GenWallet Servlet activated
  Associate with: hostA:4889, hostB:4889

Monitor mon_uiAccess
  Type: https
  Interval: 5
  Timeout: 16
  Send String: GET /em/console/home HTTP/1.0\nUser-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)\n
  Receive Rule: /em/console/logon/logon;jsessionid=
  Associate with: hostA:4444, hostB:4444

Note:

The network bandwidth requirements on the Load Balancer need to be reviewed carefully. Monitor the traffic being handled by the load balancer using the administrative tools packaged with your load balancer. Ensure that the load balancer is capable of handling the traffic passing through it. For example, deployments with a large number of targets can easily exhaust a 100 Mbps Ethernet card. A Gigabit Ethernet card would be required in such cases.

See Also:

Your Load Balancer documentation for more information on configuring virtual pools, load balancing policies, and monitoring network traffic

3.5.4.1 Configuring Oracle HTTP Server When Using a Load Balancer for the Grid Control Console

The Management Service is implemented as a J2EE Web application, which is deployed on an instance of Oracle Application Server. Like many Web-based applications, the Management Service often redirects the client browser to a specific set of HTML pages, such as a logon screen and a specific application component or feature.

When the Oracle HTTP Server redirects a URL, it sends the URL, including the Oracle HTTP Server host name, back to the client browser. The browser then uses that URL, which includes the Oracle HTTP Server host name, to reconnect to the Oracle HTTP Server. As a result, the client browser attempts to connect directly to the Management Service host and bypasses the load balancer.

To prevent the browser from bypassing the load balancer when a URL is redirected, Grid Control will now reconfigure the ssl.conf file as a part of resecuring the Management Service. After the Load Balancer is configured, issue the following command from the Management Service home: emctl secure oms

3.5.4.2 Configuring Console URL

Grid Control sends out notifications and reports using e-mail with links pointing back to the Grid Control Console. When an SLB is configured, the e-mails should contain links pointing to the SLB and not to an individual Management Service. Go to the Management Services and Repository page in the Grid Control Console. Click Add Console URL and specify the SLB virtual service used for UI access.

3.5.4.3 Understanding the Flow of Data When Load Balancing the Grid Control Console

Using a load balancer to manage the flow of data from the Management Agents is not the only way in which a load balancer can help you configure a highly available Grid Control environment. You can also use a load balancer to balance the load and to provide a failover solution for the Grid Control Console.

Figure 3-6 shows a typical configuration where a load balancer is used between the Management Agents and multiple Management Services, as well as between the Grid Control Console and multiple Management Services.

Figure 3-6 Load Balancing Between the Grid Control Console and the Management Service

Description of Figure 3-6 follows
Description of "Figure 3-6 Load Balancing Between the Grid Control Console and the Management Service"

In this example, a single load balancer is used for the upload of data from the Management Agents and for the connections between the Grid Control Console and the Management Service.

When you use a load balancer for the Grid Control Console, the management data uses the following paths through the Grid Control architecture:

  1. Administrators use one URL to access the Grid Control Console. This URL directs the browser to the load balancer virtual service. The virtual service redirects the browser to one of the Management Service installations. Depending upon the host name and port selected by the load balancer from the virtual pool of Management Service installations, the Grid Control Console obtains the management data by way of OracleAS Web Cache and the Oracle HTTP Server on one of the Management Service hosts.

  2. Each Management Agent uploads its data to a common load balancer URL (as described in Section 3.5.5.1) and data is written to the shared receive directory.

  3. Each Management Service communicates by way of JDBC with a common Management Repository, just as they do in the multiple Management Service configuration defined in Section 3.4.

  4. Each Management Service communicates directly with each Management Agent by way of HTTP, just as they do in the multiple Management Service configuration defined in Section 3.4.

3.5.5 Configuring the Management Agent

The final piece of Enterprise Manager High Availability is the Management Agent configuration. Before we jump into the Management Agent configuration, it is worthwhile to note that the Management Agent has high availability built into it out of the box. A 'watchdog' process, created automatically on Management Agent startup, monitors each Management Agent process. In the event of a failure of the Management Agent process, the 'watchdog' will try to automatically re-start the Management Agent process.

Communication between the Management Agent and the Management Service tiers in a default Enterprise Manager Grid Control install is a point-to-point set up. Therefore, the default configuration does not protect from the scenario where the Management Service becomes unavailable. In that scenario, a Management Agent will not be able to upload monitoring information to the Management Service (and to the Management Repository), resulting in the targets becoming unmonitored until that Management Agent is manually configured to point to a second Management Service.

To avoid this situation, use a hardware Server Load Balancer (SLB) between the Management Agents and the Management Services. The Load Balancer monitors the health and status of each Management Service and makes sure that the connections made through it are directed to surviving Management Service nodes in the event of any type of failure. As an additional benefit of using an SLB, the load balancer can also be configured to manage user communications to Enterprise Manager. The Load Balancer handles this through the creation of 'pools' of available resources.

  • Configure the Management Agent to Communicate Through SLB

    The load balancer provides a virtual IP address that all Management Agents can use. Once the load balancer is set up, the Management Agents need to be configured to route their traffic to the Management Service through the SLB. This can be achieved through a couple of property file changes on the Management Agents.

    Resecure all Management Agents - Management Agents that were installed prior to the SLB setup upload directly to the Management Service. These Management Agents will not be able to upload after the SLB is set up. Resecure these Management Agents to upload to the SLB by running the following command on each Management Agent. This command is available for Management Agents from Enterprise Manager release 10.2.0.5 onward. Prior to this release, you must manually edit the emd.properties file and use the secure agent command to secure the Management Agent.

    emctl secure agent -emdWalletSrcUrl https://slb.example.com:<upload port>/em

  • Configure the Management Agent to Allow Retrofitting a SLB

    Some installations may not have access to an SLB during their initial install, but may foresee the need to add one later. If that is the case, consider configuring the Virtual IP address that will be used for the SLB as part of the initial installation, and having that IP address point to an existing Management Service. Secure communications between Management Agents and Management Services are based on the host name. If a new host name were introduced later, each Management Agent would have to be re-secured to point to the new Virtual IP maintained by the SLB.

3.5.5.1 Load Balancing Connections Between the Management Agent and the Management Service

Before you implement a plan to protect the flow of management data from the Management Agents to the Management Service, you should be aware of some key concepts.

Specifically, Management Agents do not maintain a persistent connection to the Management Service. When a Management Agent needs to upload collected monitoring data or an urgent target state change, the Management Agent establishes a connection to the Management Service. If the connection is not possible, such as in the case of a network failure or a host failure, the Management Agent retains the data and reattempts to send the information later.

To protect against the situation where a Management Service is unavailable, you can use a load balancer between the Management Agents and the Management Services.

However, if you decide to implement such a configuration, be sure to understand the flow of data when load balancing the upload of management data.

Figure 3-7 shows a typical scenario where a set of Management Agents upload their data to a load balancer, which redirects the data to one of two Management Service installations.

Figure 3-7 Load Balancing Between the Management Agent and the Management Service

Description of Figure 3-7 follows
Description of "Figure 3-7 Load Balancing Between the Management Agent and the Management Service"

In this example, only the upload of Management Agent data is routed through the load balancer. The Grid Control Console still connects directly to a single Management Service by way of the unique Management Service upload URL. This abstraction allows Grid Control to present a consistent URL to both Management Agents and Grid Control consoles, regardless of the loss of any one Management Service component.

When you load balance the upload of Management Agent data to multiple Management Service installations, the data is directed along the following paths:

  1. Each Management Agent uploads its data to a common load balancer URL. This URL is defined in the emd.properties file for each Management Agent. In other words, the Management Agents connect to a virtual service exposed by the load balancer. The load balancer routes the request to any one of a number of available servers that provide the requested service.

  2. Each Management Service, upon receipt of data, stores it temporarily in a local file and acknowledges receipt to the Management Agent. The Management Services then coordinate amongst themselves and one of them loads the data in a background thread in the correct chronological order.

    Also, each Management Service communicates by way of JDBC with a common Management Repository, just as they do in the multiple Management Service configuration defined in Section 3.4.

  3. Each Management Service communicates directly with each Management Agent by way of HTTP, just as they do in the multiple Management Service configuration defined in Section 3.4.

3.5.6 Disaster Recovery

While high availability typically protects against local outages such as application failures or system-level problems, disaster tolerance protects against larger outages such as catastrophic data-center failure due to natural disasters, fire, electrical failure, evacuation, or pervasive sabotage. For Maximum Availability, the loss of a site cannot be the cause for outage of the management tool that handles your enterprise.

Maximum Availability Architecture for Enterprise Manager mandates deploying a remote failover architecture that allows a secondary datacenter to take over the management infrastructure in the event that disaster strikes the primary management infrastructure.

Figure 3-8 Disaster Recovery Architecture

Surrounding text describes Figure 3-8 .

As can be seen in Figure 3-8, setting up disaster recovery for Enterprise Manager essentially consists of installing a standby RAC database, a standby Management Service, and a standby Server Load Balancer, and configuring them to automatically start up when the primary components fail.

The following sections list the best practices to configure the key Enterprise Manager components for disaster recovery:

3.5.6.1 Prerequisites

The following prerequisites must be satisfied.

  • The primary site must be configured as per Grid Control MAA guidelines described in previous sections. This includes Management Services fronted by an SLB and all Management Agents configured to upload to the Management Services by way of the SLB.

  • The standby site must be similar to the primary site in terms of hardware and network resources to ensure there is no loss of performance when failover happens.

  • There must be sufficient network bandwidth between the primary and standby sites to handle peak redo data generation.

  • Configure the shared storage used for the shared filesystem loader and the Software Library to be replicated between the primary and standby sites. In the event of a site outage, the contents of this shared storage must be made available on the standby site using hardware vendor disk level replication technologies.

  • For complete redundancy in a disaster recovery environment, a second load balancer must be installed at the standby site. The secondary SLB must be configured in the same fashion as the primary. Some SLB vendors (such as F5 Networks) offer additional services that can be used to pass control of the Virtual IP presented by the SLB on the primary site to the SLB on the standby site in the event of a site outage. This can be used to facilitate automatic switching of Management Agent traffic from the primary site to the standby site.

3.5.6.2 Setup Standby Database

As described earlier, the starting point of this step is to have the primary site configured as per Grid Control MAA guidelines. The following steps describe the procedure for setting up the standby Management Repository database.

  1. Prepare Standby Management Repository hosts for Data Guard

    Install a Management Agent on each of the standby Management Repository hosts. Configure the Management Agents to upload by way of the SLB on the primary site.

    Install CRS and Database software on the standby Management Repository hosts. The version used must be the same as that on the primary site.

  2. Prepare Primary Management Repository database for Data Guard

    If the primary Management Repository database is not already configured, enable ARCHIVELOG mode, set up a flash recovery area, and enable Flashback Database on the primary Management Repository database (a SQL sketch follows this list).

  3. Create Physical Standby Database

    In Enterprise Manager, the standby Management Repository database must be a physical standby. Use the Enterprise Manager Console to set up a physical standby database in the standby environment prepared in previous steps. Note that the Enterprise Manager Console does not support creating a standby RAC database. If the standby database has to be RAC, configure the standby database using a single instance and then use the Convert to RAC option from the Enterprise Manager Console to convert the single instance standby database to RAC. Also, note that during single instance standby creation, the database files should be created on shared storage to facilitate conversion to RAC later.

  4. Add Static Service to Listener

    To enable Data Guard to restart instances during the course of broker operations, a service with a specific name must be statically registered with the local listener of each instance. The value for the GLOBAL_DBNAME attribute must be set to a concatenation of <db_unique_name>_DGMGRL.<db_domain>. For example, in the LISTENER.ORA file:

    SID_LIST_LISTENER=(SID_LIST=(SID_DESC=(SID_NAME=sid_name)
         (GLOBAL_DBNAME=db_unique_name_DGMGRL.db_domain)
         (ORACLE_HOME=oracle_home)))
    
  5. Enable Flashback Database on the Standby Database

  6. Verify Physical Standby

    Verify the Physical Standby database through the Enterprise Manager Console. Click the Log Switch button on the Data Guard page to switch the log, and verify that it is received by and applied to the standby database.
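The following is a minimal SQL*Plus sketch of the primary database preparation described in step 2, assuming a single-instance primary and illustrative values for the flash recovery area (a RAC primary requires additional steps, such as temporarily setting cluster_database to FALSE before enabling ARCHIVELOG mode):

    SQL> SHUTDOWN IMMEDIATE
    SQL> STARTUP MOUNT
    SQL> ALTER SYSTEM SET db_recovery_file_dest_size=10G SCOPE=BOTH;    -- size is illustrative
    SQL> ALTER SYSTEM SET db_recovery_file_dest='+FLASH' SCOPE=BOTH;    -- disk group name is illustrative
    SQL> ALTER DATABASE ARCHIVELOG;
    SQL> ALTER DATABASE FLASHBACK ON;
    SQL> ALTER DATABASE OPEN;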

3.5.6.3 Setup Standby Management Service

The following considerations should be noted before installing the standby Management Services.

  • It is recommended that this activity be done during a lean period or during a planned maintenance window. When new Management Services are installed on the standby site, they are initially configured to connect to the Management Repository database on the primary site. Some workload will be taken up by the new Management Services. This could result in a temporary loss in performance if the standby site Management Services are located far away from the primary site Management Repository database. However, there would be no data loss, and the performance would recover once the standby Management Services are shut down after configuration.

  • The shared storage used for the shared filesystem loader and software library must be made available on the standby site using the same paths as the primary site.

Install the Management Service software using the steps documented in the Oracle Enterprise Manager Grid Control Installation and Basic Configuration Guide. Refer to the Installing Software-Only and Configuring Later - Additional Management Service option. Specify the primary database connection settings in the response file. Once you have the Management Service installed, use the following steps to copy the configuration from any of the primary Management Services.

  1. Export the configuration from the primary Management Service using:

    emctl exportconfig oms -dir <location for the export file>

  2. Copy over the exported file to the standby Management Service

  3. Shut down the standby Management Service

  4. Import the exported configuration on the standby Management Service using:

    emctl importconfig oms -file <full path of the export file>

  5. Make the standby Management Service point to the standby Management Repository database by updating the following property in the emoms.properties file:

    oracle.sysman.eml.mntr.emdRepConnectDescriptor=<connect descriptor of standby database>
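
    For illustration only, assuming the standby listener runs on host mars.example.com port 1521 and the repository database service is named emrep, the entry might look similar to the following (depending on your environment, the = characters inside the descriptor may need to be escaped as \= in this properties file):

    oracle.sysman.eml.mntr.emdRepConnectDescriptor=(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=mars.example.com)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=emrep)))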

  6. Set up EMCLI on the standby Management Service against the URL of the primary SLB:

    emcli setup -url=https://slb.example.com/em -username sysman -password <sysman password> -nocertvalidate

  7. Resecure the Management Agent that is installed with the standby Management Service so that it uploads to the primary SLB:

    emctl secure agent -emdWalletSrcUrl https://slb.example.com:<upload port>/em

  8. Update the standby SLB configuration by adding the standby Management Service to the appropriate pools on the SLB, and set up monitors for the new Management Service. Modify the ssl.conf file to set the Port directive to the SLB virtual port used for console access.
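
    For example, if the SLB presents the console on virtual port 443 (an assumed value), the corresponding line in ssl.conf would be similar to:

    Port 443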

Repeat the previous steps for setting up an additional standby Management Service.

Note:

To monitor a standby database completely, the user monitoring the database must have SYSDBA privileges. This privilege is required because the standby database is in a mounted-only state. A best practice is to ensure that the users monitoring the primary and standby databases both have SYSDBA privileges.
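
For example, the privilege can be granted to a monitoring database user (the user name dbsnmp is shown only for illustration) with:

    GRANT SYSDBA TO dbsnmp;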

3.5.6.4 Switchover

Switchover is a planned activity in which operations are transferred from the primary site to a standby site. This is usually done to test and validate Disaster Recovery (DR) scenarios and for planned maintenance activities on the primary infrastructure. This section describes the steps to switch over to the standby site. The same procedure applies to a switchover in either direction.

The Enterprise Manager Console cannot be used to perform a switchover of the Management Repository database. Use the Data Guard broker command-line tool DGMGRL instead; a connection sketch follows.
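
A minimal sketch of starting such a DGMGRL session, assuming the broker configuration already exists and using placeholder credentials and connect strings, might look like:

    dgmgrl
    DGMGRL> CONNECT sys/<password>@<primary database connect string>

The switchover and verification commands shown in the steps below are then issued at the DGMGRL> prompt.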

  1. Prepare Standby Database

    Verify that recovery is up to date. Using the Enterprise Manager Console, view the value of the ApplyLag column for the standby database in the Standby Databases section of the Data Guard Overview page.
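
    Optionally, you can cross-check the apply lag from SQL*Plus on the standby database with a query such as:

    select name, value from v$dataguard_stats where name = 'apply lag';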

  2. Shut Down the Primary Enterprise Manager Application Tier

    Shut down all the Management Services on the primary site by running the following command on each Management Service:

    opmnctl stopall

    Shut down the Enterprise Manager jobs running in the Management Repository database:

    alter system set job_queue_processes=0;

  3. Verify Shared Loader Directory / Software Library Availability

    Ensure all files from the primary site are available on the standby site.

  4. Switchover to the Standby Database

    Use DGMGRL to perform a switchover to the standby database. The command can be run on the primary site or the standby site. The switchover command verifies the states of the primary and standby databases, effects the switchover of roles, restarts the old primary database, and sets it up as the new standby database.

    SWITCHOVER TO <standby database name>;

    Verify the post switchover states:

    SHOW CONFIGURATION;
    SHOW DATABASE <primary database name>;
    SHOW DATABASE <standby database name>;
    
  5. Start Up the Enterprise Manager Application Tier

    Start up all the Management Services on the standby site by running the following command on each Management Service:

    opmnctl startall

    Restart the Enterprise Manager jobs in the Management Repository database on the standby site (the new primary database):

    alter system set job_queue_processes=10;

  6. Relocate Management Services and Management Repository target

    The Management Services and Management Repository target is monitored by a Management Agent on one of the Management Services on the primary site. To ensure that the target is monitored after switchover/failover, relocate the target to a Management Agent on the standby site by running the following command on one of the standby site Management Services.

    emctl config emrep -agent <agent name> -conn_desc

  7. Switchover to Standby SLB

    Make the appropriate network changes to fail over your primary SLB to the standby SLB; that is, all requests should now be served by the standby SLB without requiring any changes on the clients (browsers and Management Agents).

This completes the switchover operation. Access and test the application to ensure that the site is fully operational and functionally equivalent to the primary site. Repeat the same procedure to switch over in the other direction.

3.5.6.5 Failover

A standby database can be converted to a primary database when the original primary database fails and there is no possibility of recovering the primary database in a timely manner. This is known as a manual failover. There may or may not be data loss depending upon whether your primary and target standby databases were synchronized at the time of the primary database failure.

This section describes the steps to failover to a standby database, recover the Enterprise Manager application state by resyncing the Management Repository database with all Management Agents, and enabling the original primary database as a standby using flashback database.

The word manual is used here to contrast this type of failover with a fast-start failover described later in Section 3.5.6.6, "Automatic Failover".

  1. Verify Shared Loader Directory and Software Library Availability

    Ensure all files from the primary site are available on the standby site.

  2. Failover to Standby Database

    Shut down the database on the primary site. Use DGMGRL to connect to the standby database and execute the FAILOVER command:

    FAILOVER TO <standby database name>;

    Verify the post failover states:

    SHOW CONFIGURATION;
    SHOW DATABASE <primary database name>;
    SHOW DATABASE <standby database name>;
    

    Note that after the failover completes, the original primary database cannot be used as a standby database of the new primary database unless it is re-enabled.

  3. Resync the New Primary Database with Management Agents

    Skip this step if you are running in the Data Guard Maximum Protection or Maximum Availability protection mode, because there is no data loss on failover. Otherwise, if there is data loss, you need to resynchronize the new primary database with all Management Agents.

    On any one Management Service on the standby site, run the following command:

    emctl resync repos -full -name "<name for recovery action>"

    This command submits a resync job that is executed on each Management Agent when the Management Services on the standby site are brought up.

    Repository resynchronization is a resource-intensive operation, and a well-tuned Management Repository helps it complete as quickly as possible. Refer to Chapter 12, "Sizing and Maximizing the Performance of Oracle Enterprise Manager" for guidelines on the routine housekeeping jobs that keep your site running well. In particular, if you are not routinely coalescing the IOTs and indexes associated with the Advanced Queuing tables as described in My Oracle Support note 271855.1, running that procedure before the resync will help the resync operation complete significantly faster.

  4. Startup the Enterprise Manager Application Tier

    Startup all the Management Services on the standby site by running the following command on each Management Service.

    opmnctl startall

  5. Relocate Management Services and Management Repository target

    The Management Services and Management Repository target is monitored by a Management Agent on one of the Management Services on the primary site. To ensure that the target is monitored after switchover/failover, relocate the target to a Management Agent on the standby site by running the following command on one of the standby site Management Services.

    emctl config emrep -agent <agent name> -conn_desc

  6. Switchover to Standby SLB

    Make the appropriate network changes to fail over your primary SLB to the standby SLB; that is, all requests should now be served by the standby SLB without requiring any changes on the clients (browsers and Management Agents).

  7. Establish Original Primary Database as Standby Database Using Flashback

    Once access to the failed site is restored and if you had flashback database enabled, you can reinstate the original primary database as a physical standby of the new primary database.

    • Shut down all the Management Services on the original primary site:

      opmnctl stopall

    • Restart the original primary database in mount state:

      shutdown immediate;

      startup mount;

    • Reinstate the Original Primary Database

      Use DGMGRL to connect to the old primary database and execute the REINSTATE command:

      REINSTATE DATABASE <old primary database name>;

    • The newly reinstated database will begin serving as a standby database to the new primary database.

    • Verify the post reinstate states:

      SHOW CONFIGURATION;
      SHOW DATABASE <primary database name>;
      SHOW DATABASE <standby database name>;
      
  8. Navigate to the Management Services and Repository Overview page of Grid Control Console. Under Related Links, click Repository Synchronization. This page shows the progress of the resynchronization operation on a per Management Agent basis. Monitor the progress.

    Operations that fail should be resubmitted manually from this page after fixing the reported error. Typically, communication-related errors are caused by Management Agents being down and can be fixed by restarting the Management Agent and resubmitting the operation from this page.

    For Management Agents that cannot be started for some reason (for example, old decommissioned Management Agents), stop the operation manually from this page. Resynchronization is complete when all the jobs have a Completed or Stopped status.

This completes the failover operation. Access and test the application to ensure that the site is fully operational and functionally equivalent to the primary site. Perform the switchover procedure if site operations must be moved back to the original primary site.

3.5.6.6 Automatic Failover

This section details the steps to fully automate failure detection and the failover procedure by using Fast-Start Failover and the Observer process. At a high level, the process works as follows:

  • Fast-Start Failover (FSFO) determines that a failover is necessary and initiates a failover to the standby database automatically

  • When the database failover has completed, the DB_ROLE_CHANGE database event is fired

  • The event fires a trigger that calls a script to configure and start the Enterprise Manager application tier

Perform the following steps:

  1. Develop Enterprise Manager Application Tier Configuration and Startup Script

    Develop a script that will automate the Enterprise Manager application configuration and startup process. See the sample shipped with Grid Control in the OH/sysman/ha directory. A sample script for the standby site is included here and should be customized as needed. Make sure ssh equivalence is set up so that remote shell scripts can be executed without password prompts. Place the script in a location accessible from the standby database host. Place a similar script on the primary site.

    #!/bin/sh
    # Script: /scratch/EMSBY_start.sh
    # Primary Site Hosts
    # Repos: earth, OMS: jupiter1, jupiter2
    # Standby Site Hosts
    # Repos: mars, OMS: saturn1, saturn2
    LOGFILE="/net/mars/em/failover/em_failover.log"
    OMS_ORACLE_HOME="/scratch/OracleHomes/em/oms10g"
    CENTRAL_AGENT="saturn1.example.com:3872"
     
    #log message
    echo "###############################" >> $LOGFILE
    date >> $LOGFILE
    echo $OMS_ORACLE_HOME >> $LOGFILE
    id >>  $LOGFILE 2>&1
     
    #startup all OMS
    #Add additional lines, one each per OMS in a multiple OMS setup
    ssh orausr@saturn1 "$OMS_ORACLE_HOME/opmn/bin/opmnctl startall" >>  $LOGFILE 2>&1
    ssh orausr@saturn2 "$OMS_ORACLE_HOME/opmn/bin/opmnctl startall" >>  $LOGFILE 2>&1
     
    #relocate Management Services and Repository target
    #to be done only once in a multiple OMS setup
    #allow time for OMS to be fully initialized
    ssh orausr@saturn1 "$OMS_ORACLE_HOME/bin/emctl config emrep -agent $CENTRAL_AGENT -conn_desc -sysman_pwd <password>" >> $LOGFILE 2>&1
     
    #always return 0 so that dbms scheduler job completes successfully
    exit 0
    
  2. Automate Execution of Script by Trigger

    Create a DB_ROLE_CHANGE database event trigger that fires after the database role changes from standby to primary. See the sample shipped with Grid Control in the OH/sysman/ha directory.

    --
    --
    -- Sample database role change trigger
    --
    --
    CREATE OR REPLACE TRIGGER FAILOVER_EM
    AFTER DB_ROLE_CHANGE ON DATABASE
    DECLARE
        v_db_unique_name varchar2(30);
        v_db_role varchar2(30);
    BEGIN
        select upper(VALUE) into v_db_unique_name
        from v$parameter where NAME='db_unique_name';
        select database_role into v_db_role
        from v$database;
     
        if v_db_role = 'PRIMARY' then
     
          -- Submit job to Resync agents with repository
          -- Needed if running in maximum performance mode
          -- and there are chances of data-loss on failover
          -- Uncomment block below if required
          -- begin
          --  SYSMAN.setemusercontext('SYSMAN', SYSMAN.MGMT_USER.OP_SET_IDENTIFIER);
          --  SYSMAN.emd_maintenance.full_repository_resync('AUTO-FAILOVER to '||v_db_unique_name);
          --  SYSMAN.setemusercontext('SYSMAN', SYSMAN.MGMT_USER.OP_CLEAR_IDENTIFIER);
          -- end;
     
          -- Start the EM mid-tier
          dbms_scheduler.create_job(
              job_name=>'START_EM',
              job_type=>'executable',
              job_action=> '<location>' || v_db_unique_name|| '_start_oms.sh',
              enabled=>TRUE
          );
        end if;
    EXCEPTION
    WHEN OTHERS
    THEN
        SYSMAN.mgmt_log.log_error('LOGGING', SYSMAN.MGMT_GLOBAL.UNEXPECTED_ERR,
    SYSMAN.MGMT_GLOBAL.UNEXPECTED_ERR_M || 'EM_FAILOVER: ' ||SQLERRM);
    END;
    /
     
    

    Note:

    Based on your deployment, you might require additional steps to synchronize and automate the failover of SLB and shared storage used for loader receive directory and software library. These steps are vendor specific and beyond the scope of this document. One possibility is to invoke these steps from the Enterprise Manager Application Tier startup and configuration script.

  3. Configure Fast-Start Failover and Observer

    Use the Fast-Start Failover configuration wizard in Enterprise Manager Console to enable FSFO and configure the Observer.
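
    If you prefer the command line, the equivalent broker operations can also be performed from DGMGRL; at a high level, and assuming the required broker properties (such as the fast-start failover target) are already set, they are:

    DGMGRL> ENABLE FAST_START FAILOVER;
    DGMGRL> START OBSERVER;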

    This completes the setup of automatic failover.

3.6 Installation Best Practices for Enterprise Manager High Availability

The following sections document best practices for installation and configuration of each Grid Control component.

3.6.1 Configuring the Management Agent to Automatically Start on Boot and Restart on Failure

Out of the box, the Management Agent is started manually. To ensure monitoring of critical resources on the administered host, the Management Agent should be started automatically when the host boots. Use the available operating system mechanisms to do this: on UNIX systems, for example, place a script in /etc/init.d that starts the Management Agent at boot time; on Windows, set the Management Agent service to start automatically. A minimal init script sketch is shown below.
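
The following is a minimal UNIX init script sketch; the Agent home path and the owning operating system user are assumptions and should be adjusted for your installation:

    #!/bin/sh
    # /etc/init.d/gcagent - start and stop the Grid Control Management Agent
    AGENT_HOME=/u01/app/oracle/agent10g   # assumed Management Agent home
    AGENT_OWNER=oracle                    # assumed OS user that owns the Agent

    case "$1" in
      start)
        su - $AGENT_OWNER -c "$AGENT_HOME/bin/emctl start agent"
        ;;
      stop)
        su - $AGENT_OWNER -c "$AGENT_HOME/bin/emctl stop agent"
        ;;
      *)
        echo "Usage: $0 {start|stop}"
        exit 1
        ;;
    esac
    exit 0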

3.6.2 Configuring Restart for the Management Agent

Once the Management Agent is started, the watchdog process monitors the Management Agent and attempts to restart it in the event of a failure. The behavior of the watchdog is controlled by environment variables set before the Management Agent process starts. The environment variables that control this behavior follow. All testing discussed here was done with the default settings.

  • EM_MAX_RETRIES – This is the maximum number of times the watchdog will attempt to restart the Management Agent within the EM_RETRY_WINDOW. The default is to attempt restart of the Management Agent 3 times.

  • EM_RETRY_WINDOW - This is the time interval in seconds that is used together with the EM_MAX_RETRIES environment variable to determine whether the Management Agent is to be restarted. The default is 600 seconds.

The watchdog will not restart the Management Agent if it detects that the Management Agent has required more than EM_MAX_RETRIES restarts within the EM_RETRY_WINDOW time period.
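
For example, to allow up to five restart attempts within a 15-minute window, you might export the variables in the environment that starts the Management Agent (the values shown are illustrative, not recommendations):

    export EM_MAX_RETRIES=5
    export EM_RETRY_WINDOW=900
    $AGENT_HOME/bin/emctl start agent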

3.6.3 Installing the Management Agent Software on Redundant Storage

The Management Agent persists its intermediate state and collected information using local files in the $AGENT_HOME/$HOSTNAME/sysman/emd subtree under the Management Agent home directory.

If these files are lost or corrupted before being uploaded to the Management Repository, monitoring data and any pending alerts that have not yet been uploaded are lost.

At a minimum, configure these subdirectories on striped redundant or mirrored storage. Availability is further enhanced by placing the entire $AGENT_HOME on redundant storage. You can determine the Management Agent home directory by entering the command 'emctl getemhome' at the command line, or from the Management Services and Repository tab and the Agents tab in the Grid Control Console.

3.6.4 Install the Management Service Shared File Areas on Redundant Storage

The Management Service holds intermediate collected data before it is loaded into the Management Repository. The loader receive directory contains these files and is typically empty when the Management Service is able to load data as quickly as it is received. Once the files are received by the Management Service, the Management Agent considers them committed and removes its local copy. If these files are lost before being uploaded to the Management Repository, data loss will occur. At a minimum, configure these directories on striped redundant or mirrored storage. When Management Services are configured for the Shared Filesystem Loader, all services share the same loader receive directory. It is recommended that the shared loader receive directory reside on highly available storage, such as a clustered file system or a NetApp filer.
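
The Shared Filesystem Loader is typically enabled with a command of the following form, run on each Management Service (the directory path shown is an example; use the shared path appropriate to your environment):

    emctl config oms loader -shared yes -dir /u01/app/oracle/OMS/recv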

3.7 Configuration With Grid Control

Grid Control comes preconfigured with a series of default rules to monitor many common targets. These rules can be extended to monitor the Grid Control infrastructure as well as the other targets on your network to meet specific monitoring needs.

3.7.1 Console Warnings, Alerts, and Notifications

The following list is a set of recommendations that extend the default monitoring performed by Enterprise Manager. Use the Notification Rules link on the Preferences page to adjust the default rules provided on the Configuration/Rules page:

  • Ensure the Agent Unreachable rule is set to alert on all Management Agents unreachable and Management Agents clear errors.

  • Ensure the Repository Operations Availability rule is set to notify on any unreachable problems with the Management Service or Management Repository nodes. Also modify this rule to alert on the Targets Not Providing Data condition and any database alerts that are detected against the database serving as the Management Repository.

  • Modify the Agent Upload Problems rule to alert when the Management Service status has reached a warning or clear threshold.

3.7.2 Configure Additional Error Reporting Mechanisms

Enterprise Manager provides error reporting mechanisms through e-mail notifications, PL/SQL packages, and SNMP alerts. Configure these mechanisms based on the infrastructure of the production site. If using e-mail for notifications, configure the notification rule through the Grid Control console to notify administrators using multiple SMTP servers if they are available. This can be done by modifying the default e-mail server setting on the Notification Methods option under Setup.

3.7.3 Component Backup

Backup procedures for the database are well established. Configure backups for the Management Repository using the RMAN interface provided in the Grid Control Console. Refer to the RMAN documentation or the Maximum Availability Architecture documentation for detailed implementation instructions.

In addition to the Management Repository, the Management Service and Management Agent should also be backed up regularly, and backups should be performed after any configuration change. Best practices for backing up these tiers are documented in Section 12.3, "Oracle Enterprise Manager Backup, Recovery, and Disaster Recovery Considerations".

3.7.4 Troubleshooting

In the event of a problem with Grid Control, the starting point for any diagnostic effort is the console itself. The Management System tab provides an overview of all Management Service operations and current alerts. Other pages summarize the health of the Management Service processes and logged errors. These pages are useful for determining the causes of performance problems, because the summary page shows a historical view of the number of files waiting to be loaded into the Management Repository and the amount of work waiting to be completed by Management Agents.

3.7.4.1 Upload Delay for Monitoring Data

When assessing the health and availability of targets through the Grid Control Console, information may be slow to appear in the UI, especially after a Management Service outage. The state of a target in the Grid Control Console may lag behind a state change on the monitored host. Use the Management System page to gauge the backlog of pending files to be processed.

3.7.4.2 Notification Delay of Target State Change

The model used by the Management Agent to assess the health of a monitored target is poll based. A Management Agent posts a notification to the Management Service as soon as it detects a change in state; however, because detection depends on the polling interval, there can be some delay between the actual state change and the Management Agent detecting it.