Sun Java System Application Server Standard and Enterprise Edition 7 2004Q2 Update 2 System Deployment Guide
Chapter 2
Planning Your Environment

Planning your environment is one of the first phases of deployment. In this phase, you should first decide on your performance and availability goals, and then make decisions about the hardware, network, and storage requirements accordingly.
The main objective of this phase is to determine the environment that best meets your business requirements.
This chapter contains the following sections:
Introducing HADB

The High Availability Database (HADB) provides a highly available persistence store for the Application Server for HTTP sessions, stateful session beans, and remote references of EJB look-ups on the RMI/IIOP path.
This section contains the following topics:
Overview
J2EE applications’ need for session persistence was previously described in the section Understanding Session Persistence. The Application Server uses the HADB as a highly available session store. The HADB is included with the Application Server Enterprise Edition, but in deployment can be run on separate hosts. HADB provides a highly available data store for HTTP session and stateful session bean data.
The advantages of this decoupled architecture include:
- Server instances in a highly available cluster are loosely coupled and act as high performance J2EE containers.
- Starting or stopping server instances does not affect other servers or their availability.
- The HADB can run on a different set of less expensive machines (for example, with single or dual processors). Several clusters can share these machines. Depending upon the deployment needs, you can run the HADB on the same machines as Application Server (co-located) or different machines (separate tier). For more information on the two options, see Co-located Topology and Separate Tier Topology in Chapter 3, "Selecting a Topology."
- As state management requirements change, you can add resources to the HADB system without affecting existing clusters or their applications.
For the HADB hardware and network system requirements, see the Sun Java System Application Server Release Notes. For additional system configuration steps required for HADB, see Sun Java System Application Server Administration Guide.
System Requirements
The recommended system requirements for the HADB hosts are the following:
For additional requirements for very high availability, see Mitigating Double Failures.
HADB Architecture
HADB is a distributed system comprising pairs of nodes, which are divided into two data redundancy units (DRUs). Each node consists of the following:
A set of HADB nodes can host one or more session databases. Each session database is associated with a distinct application server cluster. Deleting a cluster also deletes the associated session database.
For HADB hardware requirements, see the Sun Java System Application Server Release Notes.
Nodes and Node Processes
There are two types of HADB nodes:
Each node has a parent process and numerous child processes. The parent process, called the node supervisor (NSUP), is started by the management agent and is responsible for creating the child processes and keeping them running.
The child processes are:
- Transaction server process (TRANS): coordinates transactions on distributed nodes and manages data storage.
- Relational algebra server process (RELALG): coordinates and executes complex relational algebra queries, such as sorts and joins.
- SQL shared memory server process (SQLSHM): maintains the SQL dictionary cache.
- SQL server process (SQLC): receives client queries, compiles them into local HADB instructions, sends the instructions to TRANS, receives the results, and conveys them to the client. Each node has one main SQL server and one subserver for each client connection.
- Node manager server process (NOMAN): used by management agents to execute management commands issued by the hadbm management client.
Data Redundancy Units
An HADB instance contains a pair of DRUs. Each DRU has the same number of active and spare nodes as the other DRU in the pair. Each active node in a DRU has a mirror node in the other DRU. Due to mirroring, each DRU contains a complete copy of the database.
Figure 2-1 shows an example HADB architecture with six nodes: four active nodes and two spare nodes. Nodes 0 and 1 are a mirror node pair, as are nodes 2 and 3. In this example, each host has one node. In general, a host can have more than one node if it has sufficient system resources (see System Requirements).
Figure 2-1 Sample HADB Configuration with Double Interconnects
HADB achieves high availability by replicating data and services. The data replicas on mirror nodes are designated as primary replicas and hot standby replicas. The primary replica performs operations such as inserts, deletes, updates, and reads. The hot standby replica receives log records of the primary replica's operations and redoes them within the transaction lifetime. Read operations are performed only by the primary node and thus are not logged. Each node contains both primary and hot standby replicas and plays both roles.
The database is fragmented and distributed over the active nodes in a DRU. Each active node in a DRU has a mirror node in the other DRU, and the mirror node pair contains replicas of the same data fragments. Because a spare node does not have data, it does not have a mirror node. Due to the mirroring, each DRU contains a complete copy of the database.

When a mirror node takes over the functions of a failed node, it must perform double the work: its own and that of the failed node. If the mirror node does not have sufficient resources, the overload reduces its performance and increases its failure probability. When a node fails, HADB attempts to restart it. If the failed node does not restart (for example, due to hardware failure), the system continues to operate, but with reduced availability.
HADB tolerates failure of a node, an entire DRU, or multiple non-mirror nodes, but not a double failure when both a node and its mirror fail. For information on how to reduce the likelihood of a double failure, see Mitigating Double Failures.
Spare Nodes
When a node fails, its mirror node takes over for it. If there is no spare node, the surviving mirror then runs without a mirror of its own. A spare node automatically replaces the failed node and becomes the new mirror. Having a spare node thus reduces the time the system functions without a mirror node.
A spare node does not normally contain data, but it constantly monitors for failures of active nodes in the DRU. When a node fails and does not recover within a specified timeout period, the spare node copies data from the mirror node and synchronizes with it. The time this takes depends on the amount of data copied and on the system and network capacity. After synchronizing, the spare node automatically takes over for the failed node without manual intervention, relieving the mirror node of its double load and rebalancing load across the mirror pair. This is known as failback or self-healing.
When a failed host is repaired (by replacing the hardware or upgrading the software) and restarted, the node or nodes running on it join the system as spare nodes, since the original spare nodes are now active.
Spare nodes are not mandatory, but they enable a system to maintain its overall level of service even if a machine fails. Spare nodes also make it easy to perform planned maintenance on machines hosting active nodes. Allocate one machine for each DRU to act as a spare machine, so that if one of the machines fails, the HADB system continues without adversely affecting performance and availability.
Note
As a general rule, you should have a spare machine with enough Application Server instances and HADB nodes to replace any machine that becomes unavailable.
Examples of Spare Node Configurations
The following examples illustrate using spare nodes in HADB deployments. There are two fundamental categories of deployment topology: co-located, in which HADB and Application Servers reside on the same hosts, and separate tier, in which they reside on separate hosts. For more information about deployment topologies, see Chapter 3, "Selecting a Topology."
Example: Co-located Spare Node Configuration
Suppose you have a co-located deployment with four Sun Fire™ V480 servers, where each server has one Application Server instance and two HADB data nodes.
In this scenario, you should allocate two more servers as spare machines (one machine per DRU). Each spare machine should run one application server instance and two spare HADB nodes.
Example: Separate-tier Spare Node Configuration
Suppose you have a separate-tier deployment where the HADB tier has two Sun Fire™ 280R servers, each running two HADB data nodes. To maintain this system at full capacity even if one machine becomes unavailable, configure one spare machine for the Application Server instances tier and one spare machine for the HADB tier.
The spare machine for the Application Server instances tier should have as many instances as the other machines in the Application Server instances tier. Similarly, the spare machine for the HADB tier should have as many HADB nodes as the other machines in the HADB tier.
For more information about the co-located and the separate tier deployment topologies, see Chapter 3, "Selecting a Topology."
Mitigating Double Failures
HADB’s built-in data replication enables it to tolerate failure of a single node or DRU. However, by default, HADB will not survive a double failure, when a mirror node pair or both DRUs fail. In such cases, HADB becomes unavailable.
In addition to using spare nodes as described in the previous section, you can minimize the likelihood of a double failure by:
- Providing independent power supplies. For optimum fault tolerance, the servers that support one DRU must have independent power (through uninterruptible power supplies), processing units, and storage. If a power failure occurs in one DRU, the nodes in the other DRU continue servicing requests until the power returns.
- Providing double interconnections. To tolerate single network failures, replicate the lines and switches as shown in Figure 2-1.
These steps are optional, but will increase the overall availability of the HADB instance.
HADB Management System
The HADB management system provides built-in security and facilitates multi-platform management. As illustrated in Figure 2-2, the HADB management architecture contains the following components:
As shown in Figure 2-2, one HADB management agent runs on every machine that runs the HADB service. Each machine typically hosts one or more HADB nodes. An HADB management domain contains many machines, similar to an Application Server domain. At least two machines are required in a domain for the database to be fault tolerant, and in general there must be an even number of machines to form the DRU pairs. Thus, a domain contains many management agents.
Figure 2-2 HADB Management Architecture
As shown in the figure, a domain can contain one or more database instances. One machine can contain one or more nodes belonging to one or more database instances.
Management Client
The HADB management client is a command-line utility, hadbm, for managing the HADB domain and its database instances. HADB services can run continuously, even when the associated Application Server cluster is stopped, but must be shut down carefully if they are to be deleted. For more information on using hadbm, see the Sun Java System Application Server Administration Guide.
You can use the asadmin command line utility to create and delete the HADB instance associated with a highly available cluster. For more information, see the Sun Java System Application Server Administration Guide.
Management Agent
The management agent is a server process (named ma) that can access resources on a host; for example, it can create devices and start database processes. The management agent coordinates and performs management client commands, such as starting or stopping a database instance.
A management client instance connects to a management agent by specifying the address and port number of the agent. Once connected, the management client sends commands to the HADB through the management agent. The agent receives requests and executes them. Thus, a management agent must be running on a host before issuing any hadbm management commands to that host. The management agent can be configured as a system service that starts up automatically.
Ensuring Availability of Management Agents
The management agent process ensures the availability of the HADB node supervisor processes by restarting them if they fail. Thus, for deployment, you must ensure the availability of the ma process to maintain the overall availability of HADB. After restarting, the management agent recovers the domain and database configuration data from other agents in the domain.
Use the host operating system to ensure the availability of the management agent. On Solaris or Linux, init.d ensures the availability of the ma process after a process failure and reboot of the operating system. On Windows, the management agent runs as a Windows service. Thus, the operating system restarts the management agent if the agent fails or the operating system reboots.
Management Domains
An HADB management domain is a set of hosts, each of which has a management agent running on the same port number. The hosts in a domain can contain one or more HADB database instances. A management domain is defined by the common port number the agents use and an identifier (called a domainkey) that is generated when you create the domain or add an agent to it. The domainkey provides a unique identifier for the domain, which is crucial because management agents communicate using multicast. You can set up an HADB management domain to match an Application Server domain.
Having multiple database instances in one domain can be useful in a development environment, since it enables different developer groups to use their own database instance. In some cases, it may also be useful in production environments.
All agents belonging to a domain coordinate their management operations. When you change the database configuration through an hadbm command, all agents will change the configuration accordingly. You cannot stop or restart a node unless the management agent on the node's host is running. However, you can execute hadbm commands that read HADB state or configuration variable values even if some agents are not available.
Use the following management client commands to work with management domains:
- hadbm createdomain: creates a management domain with the specified hosts.
- hadbm extenddomain: adds hosts to an existing management domain.
- hadbm deletedomain: removes a management domain.
- hadbm reducedomain: removes hosts from the management domain.
- hadbm listdomain: lists all hosts defined in the management domain.
For more information on these commands, see the Sun Java Application Server Reference Manual (or the corresponding man pages).
Repository
Management agents store the database configuration in a repository. The repository is highly fault-tolerant, because it is replicated over all the management agents. Keeping the configuration on the server enables you to perform management operations from any computer that has a management client installed.
A majority of the management agents in a domain must be running to perform any changes to the repository. Thus, if there are M agents in a domain, at least M/2 + 1 agents (with M/2 rounded down to the nearest integer) must be running to make a change to the repository.
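As a minimal sketch of this majority rule (plain arithmetic on the agent count M, not an HADB API):

```python
def repository_quorum(m_agents: int) -> int:
    """Smallest number of agents forming a majority of m_agents:
    floor(M/2) + 1, per the repository update rule above."""
    return m_agents // 2 + 1

# With 4 agents, 3 must be running; with 5 agents, 3 suffice.
print(repository_quorum(4), repository_quorum(5))
```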
If you cannot perform some management commands because a majority of the hosts in a domain are unavailable (for example due to hardware failures), use the hadbm disablehost command to remove failed hosts from the domain until you have a majority. For more information on this command, see the Sun Java System Application Server Utility Reference Guide.
Setup and Configuration Roadmap

Follow this procedure to set up and configure your Application Server system for high availability:
- Determine your performance and QoS requirements and goals, as described later in this chapter.
- Size your system, as described in Design Decisions later in this chapter. In particular, determine:
- Determine system topology, as described in Chapter 3, "Selecting a Topology," that is, whether you are going to install HADB on the same host machines as Application Server or on different machines.
- Install Application Server instances.
- Create domains and clusters.
- Install and configure your web server software.
- Install the Load Balancer Plug-in.
- Set up and configure load balancing.
- Set up and configure HADB nodes and DRUs.
- Configure Application Server Web container and EJB container for HA session persistence.
- Deploy applications and configure them for high availability and session failover.
- Configure JMS cluster for failover. For more information, see the Sun Java System Message Queue Administration Guide.
Establishing Performance Goals

As explained in Chapter 1, "Overview of Deployment," one of your main goals is to maximize performance. This essentially translates into maximizing throughput and reducing response time.
Beyond these basic goals, you should establish specific goals by determining the following:
These factors are interrelated. If you know the answer to any three of these four factors, you can calculate the fourth.
Some of the metrics described in this chapter can be calculated using a remote browser emulator (RBE) tool, or web site performance and benchmarking software, that simulates your enterprise’s web application activity. Typically, RBE and benchmarking products generate concurrent HTTP requests and then report back the response time and number of requests per minute. You can then use these figures to calculate server activity.
The results of the calculations described in this chapter are not absolute. Treat them as reference points to work against, as you fine-tune the performance of Sun Java System Application Server.
This section describes the following topics:
Estimating Throughput
Throughput, as measured for application server instances and for HADB, has different implications.
A good measure of throughput for Application Server instances is the number of requests processed per minute. A good measure of throughput for the HADB is the number of requests processed per minute by HADB, together with the session size per request. The session size per request is important because the amount of session data stored varies from request to request.
For more information on session persistence, see Chapter 1, "Overview of Deployment."
Estimating Load on Application Server Instances
Consider the following factors to estimate the load on application server instances:
Calculating Maximum Number of Concurrent Users
A user runs a process (for example through a web browser) that periodically sends requests from a client machine to Application Server. When estimating the number of concurrent users, include all users currently active. A user is considered active as long as the session that user is running is active (for example, the session has neither expired nor terminated).
A user is concurrent for as long as the user is on the system as a running process submitting requests, receiving results of requests from the server, and viewing the results.
Eventually, as the number of concurrent users submitting requests increases, requests processed per minute begins to decline (and the response time begins to increase). The following diagram illustrates this situation.
Figure 2-3 Performance Pattern with Increasing Number of Users
You should identify the point at which adding more concurrent users reduces the number of requests that can be processed per minute. This point indicates when performance starts to degrade.
Calculating Think Time
A user does not submit requests continuously. A user submits a request, the server receives the request, processes it and then returns a result, at which point the user spends some time analyzing the result before submitting a new request. The time spent reviewing the result of a request is called think time.
Determining the typical duration of think time is important. You can use the duration to calculate more accurately the number of requests per minute, as well as the number of concurrent users your system can support. Essentially, when a user is on the system but not submitting a request, a gap opens for another user to submit a request without altering system load. This implies that you can support more concurrent users.
Calculating Average Response Time
Response time refers to the amount of time it takes for results of a request to be returned to the user. The response time is affected by a number of factors including network bandwidth, number of users, number and type of requests submitted, and average think time.
In this section, response time refers to the mean, or average, response time. Each type of request has its own minimal response time. However, when evaluating system performance, you should base your analysis on the average response time of all requests.
The faster the response time, the more requests per minute are being processed. However, as the number of users on your system increases, response time starts to increase as well, even though the number of requests per minute declines, as the following diagram illustrates:
Figure 2-4 Response Time with Increasing Number of Users
A system performance graph similar to Figure 2-4 indicates that after a certain point (point A in this diagram), requests per minute are inversely proportional to response time: the sharper the decline in requests per minute, the steeper the increase in response time (represented by the dotted-line arrow).
In Figure 2-4, point A represents peak load, that is, the point at which requests per minute start to decline. Prior to this point, response time calculations are not necessarily accurate because they do not use peak numbers in the formula. After this point, because of the inversely proportional relationship between requests per minute and response time, you can more accurately calculate response time using the maximum number of users and requests per minute.
To determine response time at peak load, use the following formula:
Response time = (concurrent users / requests per second) - think time in seconds
To obtain an accurate response time result, you must always include think time in the equation.
Example Calculation of Response Time
For example, if the following conditions exist:
Therefore, the response time is 2 seconds.
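The example's input values are not reproduced here, so the following sketch uses assumed numbers (5,000 concurrent users, 1,000 requests per second, 3 seconds of average think time) chosen only because they yield the stated 2-second result:

```python
def response_time(concurrent_users, requests_per_second, think_time_s):
    # Response time = (concurrent users / requests per second) - think time
    return concurrent_users / requests_per_second - think_time_s

# Assumed peak-load figures, for illustration of the formula only:
print(response_time(5000, 1000, 3))  # 2.0 (seconds)
```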
After you have calculated your system’s response time, particularly at peak load, decide what is an acceptable response time for your enterprise. Response time, along with throughput, is one of the main factors critical to Sun Java System Application Server performance. Improving the response time should be one of your goals.
If there is a response time beyond which you do not want to wait, and performance is such that you get response times over that level, then work towards improving your response time or redefine your response time threshold.
Calculating Requests Per Minute
If you know the number of concurrent users at any given time, the response time of their requests and the average user think time at that time, you can determine requests per minute. Typically, you start by knowing how many concurrent users are on your system.
For example, after running web site performance software, you conclude that the average number of concurrent users submitting requests on your online banking web site is 3,000. This number depends on the number of users who have signed up to be members of your online bank, their banking transaction behavior, the times of the day or week they choose to submit requests, and so on.
Knowing this information enables you to use the requests per minute formula described in this section to calculate how many requests per minute your system can handle for this user base. Since requests per minute and response time become inversely proportional at peak load, decide whether fewer requests per minute are acceptable as a trade-off for better response time, or alternatively, whether a slower response time is acceptable as a trade-off for more requests per minute.
Experiment with the requests per minute and response time thresholds that are acceptable as a starting point for fine-tuning system performance. Thereafter, decide which areas of your system you want to adjust.
The formula for obtaining the requests per second is as follows:
requests per second = concurrent users / (response time in seconds + think time in seconds)
Example Calculation of Requests per Second
For example, if the following conditions exist:
Therefore, the number of requests per second is 700 and the number of requests per minute is 42,000.
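Here too the example's inputs are not reproduced, so this sketch assumes values (2,800 concurrent users, 1 second response time, 3 seconds think time) chosen only because they reproduce the stated 700 requests per second:

```python
def requests_per_second(concurrent_users, response_time_s, think_time_s):
    # requests per second = concurrent users / (response time + think time)
    return concurrent_users / (response_time_s + think_time_s)

rps = requests_per_second(2800, 1, 3)  # assumed inputs
print(rps, rps * 60)  # 700.0 requests/second, 42000.0 requests/minute
```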
Estimating Load on HADB
To calculate load on HADB, consider the following factors:
For more information on configuring session persistence, see Sun Java System Application Server Administration Guide.
HTTP Session Persistence Frequency
The number of requests per minute received by the HADB depends on the persistence frequency. Persistence frequency determines how often Application Server saves HTTP session data to the HADB.
The persistence frequency options are:
- web-method (default): the server stores session data with every HTTP response. This option guarantees that stored session information is up to date, but it leads to high traffic to the HADB.
- time-based: the session is stored at the specified time interval. This option reduces the traffic to the HADB, but does not guarantee that the session information will be up to date.
Table 2-1 summarizes the advantages and disadvantages of persistence frequency options.
HTTP Session Size and Scope
The session size per request depends on the amount of session information stored in the session.
Tip
To improve overall performance, reduce the amount of information in the session as much as possible.
You can further fine-tune the session size per request through the persistence scope settings. Choose from the following options for HTTP session persistence scope:
- session: The server serializes and saves the entire session object every time it saves session information to HADB.
- modified-session: The server saves the session only if the session has been modified. It detects modification by intercepting calls to the session's setAttribute() method. This option does not detect direct modifications to inner objects, so in such cases the application must call setAttribute() explicitly.
- modified-attribute: The server saves only those attributes that have been modified (inserted, updated, or deleted) since the last time the session was stored. This has the same drawback as modified-session but can significantly reduce HADB write throughput requirements if properly applied.
The following table summarizes the advantages and disadvantages of the persistence scope options.
SFSB Checkpointing
For SFSB session persistence, the load on HADB depends on the following:
Checkpointing generally occurs after any transaction involving the SFSB is completed (even if the transaction rolls back).
For better performance, specify a small set of methods for checkpointing. The size of the data that is being checkpointed and the frequency of checkpointing determine the additional overhead in response time for a given client interaction.
Design Decisions

Depending on the load on the application server instances, the load on the HADB, and the failover requirements, you should make the following decisions at this stage:
Number of Application Server Instances Required
To determine the number of application server instances needed, evaluate your environment on the basis of the factors explained in Estimating Load on Application Server Instances. Each application server instance can use more than one central processing unit (CPU) and should have at least one CPU allocated to it.
Number of HADB Nodes Required
As a general guideline, you should plan to have one HADB node for each CPU in your system. For example, use two HADB nodes for a machine that has two CPUs.
Alternatively, use the following procedure to determine the required number of HADB nodes:
- Determine the following parameters:
- Determine the size in Gigabytes of the maximum primary data volume, Vdata, using the following formula:
Vdata = nusers × s
- Determine the maximum HADB data transfer rate, Rdt. This reflects the data volume shipped into HADB from the application side. Use the following formula:
Rdt = nusers × s × NTPS
- Determine the number of nodes based on data volume considerations, NNODES, using the following formula:
NNODES = Vdata / 5 GB
Round this value up to an even number, since nodes work in pairs.
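The steps above can be sketched as a small calculation. The parameter names (nusers, the expected number of concurrent users, and s, the session data per user in gigabytes) are assumptions inferred from the formulas, since the parameter list in step 1 is not reproduced here:

```python
import math

def hadb_data_nodes(n_users, session_gb_per_user):
    """NNODES = Vdata / 5 GB, rounded up to an even number
    (nodes work in mirrored pairs)."""
    v_data = n_users * session_gb_per_user   # Vdata = nusers x s
    n_nodes = math.ceil(v_data / 5)          # 5 GB of primary data per node
    return n_nodes + (n_nodes % 2)           # round up to an even count

# Assumed workload: 100,000 users at 0.5 MB of session data each
# -> 50 GB of primary data -> 10 nodes
print(hadb_data_nodes(100_000, 0.0005))  # 10
```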
Number of HADB Hosts
Determine the number of hosts based on data transfer requirements. This calculation assumes all hosts have similar hardware configurations and operating systems, and have the necessary resources to accommodate the nodes they run.
To calculate the number of hosts based on data transfer considerations, follow this procedure:
- Determine the maximum host data transfer rate, Rmax. Determine this value empirically, because it depends on the network and the host hardware. Note that this is different from the maximum HADB data transfer rate, Rdt, determined in the previous section.
- Updating a volume of data V distributed over a number of hosts NHOSTS causes each host to receive approximately 4V/NHOSTS of data. The number of hosts needed to accommodate this data is determined by using the following formula:
NHOSTS = 4 × Rdt / Rmax
Round this value up to the nearest even number to get the same number of hosts for each DRU.
- Add one host to each DRU for spare nodes. If each of the other hosts runs N data nodes, let this host run N spare nodes. This allows the system to tolerate a single machine failure that takes down N data nodes.
Each host needs to run at least one node, so if the number of nodes is less than the number of hosts (NNODES < NHOSTS), adjust NNODES to be equal to NHOSTS. If the number of nodes is greater than the number of hosts, (NNODES > NHOSTS), several nodes can be run on the same host.
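The host calculation above, including the one-spare-host-per-DRU step, can be sketched as follows. The units for Rdt and Rmax just need to match (MB/second is assumed in the example):

```python
import math

def hadb_hosts(r_dt, r_max, spares_per_dru=1):
    """NHOSTS = 4 x Rdt / Rmax, rounded up to an even number so each
    DRU gets the same host count, plus one spare host per DRU."""
    n_hosts = math.ceil(4 * r_dt / r_max)
    n_hosts += n_hosts % 2                 # even split across the two DRUs
    return n_hosts + 2 * spares_per_dru    # one spare machine on each DRU

# Assumed rates: Rdt = 20 MB/s, Rmax = 30 MB/s
# -> ceil(2.67) = 3 -> 4 data hosts -> 6 hosts including spares
print(hadb_hosts(20, 30))  # 6
```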
HADB Storage Capacity
The HADB provides near-linear scaling with the addition of more nodes, until you exceed the network capacity. Each node must be configured with storage devices on a dedicated disk or disks. All nodes must have equal space allocated on the storage devices. Make sure that the storage devices are allocated on local disks.
For example, suppose the expected session data is X MB. The HADB replicates the data on mirror nodes, and therefore needs 2X MB of storage. Further, the HADB uses indexes to enable fast access to data; the two nodes together require an additional 2X MB for indexes (assuming a less than 100% fill rate). This implies that a storage capacity of 4X MB is required. Therefore, the expected storage capacity needed by the HADB is four times the expected data volume.
To account for future expansion without loss of data from HADB, you must provide additional storage capacity for online upgrades, because you might want to refragment the data after adding new nodes. In this case, a similar amount (4X) of additional space on the data devices is required. Thus, the expected storage capacity is eight times the expected data volume.
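The 8X rule above reduces to a one-line estimate (the internal-use space described next is extra, and the units are whatever X is measured in):

```python
def hadb_device_capacity(session_data_x):
    """Device capacity estimate from the text: 2X for mirrored data plus
    2X for indexes (4X total), doubled to 8X to leave room for
    refragmentation after adding nodes. Internal-use space is extra."""
    base = 4 * session_data_x   # mirrored data + indexes
    return 2 * base             # headroom for online upgrades

# Assumed example: 512 MB of session data -> 4096 MB of device space
print(hadb_device_capacity(512))  # 4096
```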
Additionally, HADB uses disk space for internal use as follows:
For more information, see Sun Java System Application Server Administration Guide and Sun Java System Application Server Performance Tuning Guide.
The following table summarizes the HADB storage space requirements for a session data of X MB.
If the HADB runs out of device space, it will not accept client requests to insert or update data. However, it will accept delete operations. If the HADB runs out of device space, it returns error codes 4593 or 4592 and writes corresponding error messages to the history files. For more information on these messages, see Sun Java System Application Server Troubleshooting Guide.
Setting Data Device Size
Use the following command to set the size of the data devices of the HADB:
hadbm set TotalDatadeviceSizePerNode=<size>
The hadbm command restarts all the nodes, one by one, for the change to take effect. For more information on configuring the HADB, see Sun Java System Application Server Administration Guide.
Designing for Peak Load Compared to Steady State Load
In a typical deployment, there is a difference between steady state and peak workloads.
If you design for peak load, you must deploy a system that can sustain the expected maximum load of users and requests without a degradation in response time. This implies that your system can handle extreme cases of expected system load.
If the difference between peak load and steady state load is substantial, designing for peak loads may mean that you are spending on resources that will be idle for a significant amount of time.
If you design for steady state load, then you don’t have to deploy a system with all the resources required to handle the server’s expected peak load. However, a system designed to support steady state load will have slower response time when peak load occurs.
Frequency and Duration of Peak Load
One factor that may affect whether you design for peak load or for steady state is how often your system is expected to handle the peak load. If peak load occurs several times a day, or even several times a week, this may be frequent enough to warrant expanding capacity to handle it. If the system operates at steady state 90 percent of the time, and at peak only 10 percent of the time, then you may prefer to deploy a system designed around steady state load.
This implies that your system’s response time will be slower only 10 percent of the time. Decide if the frequency or duration of time that the system operates at peak justifies the need to add resources to your system (should this be required to handle peak load).
Planning the Network Configuration
When planning how to integrate Sun Java System Application Server into your network for optimal performance, you should estimate the bandwidth requirements and plan your network in such a way that it can meet your performance requirements.
The following topics are covered in this section:
Estimating Bandwidth Requirements
When you decide on the desired size and bandwidth of your network, first determine your network traffic and identify its peak. Check if there is a particular hour, day of the week, or day of the month when overall volume peaks, and then determine the duration of that peak.
Tip
At all times consult network experts at your site about the size and type of network components you are considering.
During peak load times, the number of packets in the network is at its highest level. In general, if you design for peak load, scale your system with the goal of handling 100 percent of peak volume. Bear in mind, however, that any network behaves unpredictably and that despite your scaling efforts, it might not always be able to handle 100 percent of peak volume.
For example, assume that at peak load, five percent of your users occasionally do not have immediate Internet access when accessing applications deployed on Application Server. Of that five percent, determine how many users retry access after the first attempt. Again, not all of those users may get through, and of that unsuccessful portion, another percentage will retry. As a result, the peak appears longer because peak use is spread out over time as users continue to attempt access.
To ensure optimal access during times of peak load, start by verifying that your Internet service provider (ISP) has a backbone network connection that can reach an Internet hub without degradation.
Calculating Bandwidth Required
Based on the calculations you made in Establishing Performance Goals, you should determine the additional bandwidth required for deploying Sun Java System Application Server at your site.
Depending on your method of access (T-1 lines, ISDN, and so on), you can calculate the amount of increased bandwidth you require to handle your estimated load. For example, suppose your site uses T-1 or higher-speed T-3 links for Internet access. Given their bandwidth, you can estimate how many lines you will need on your network, based on the average number of requests generated per second at your site and the maximum peak load. You can calculate these figures using a web site analysis and monitoring tool.
Example Calculation of Bandwidth Required
A single T-1 line can handle 1.544 Mbps. Therefore, a network of four T-1 lines carrying 1.544 Mbps each can handle approximately 6 Mbps of data. Assuming that the average HTML page sent back to a client is 30 kilobytes (KB), this network of four T-1 lines can handle the following traffic per second:
6,176,000 bits/8 bits = 772,000 bytes per second
772,000 bytes per second/30 KB = approximately 25 concurrent client requests for pages per second.
At 25 pages per second, this system can handle 90,000 pages per hour (25 x 60 seconds x 60 minutes), and therefore a maximum of 2,160,000 pages per day, assuming an even load throughout the day. If the maximum peak load is greater than this, you will have to increase the bandwidth accordingly.
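The arithmetic in this example can be reproduced directly (the figures below are the approximations used in the text, such as a 30 KB average page):

```python
# Recomputing the four-T-1-line example above (figures are approximate).
lines = 4
bps_per_t1 = 1_544_000            # one T-1 line carries 1.544 Mbps
page_bytes = 30 * 1024            # average HTML page of 30 KB

bytes_per_second = lines * bps_per_t1 // 8          # 6,176,000 bits -> 772,000 bytes/s
pages_per_second = bytes_per_second // page_bytes   # ~25 concurrent page requests/s
pages_per_day = pages_per_second * 60 * 60 * 24     # assuming an even load all day

print(pages_per_second, pages_per_day)  # 25 2160000
```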
Estimating Peak Load
Having an even load throughout the day is probably not realistic. You need to determine when peak load occurs, how long it lasts, and what percentage of the total load is the peak load.
Example Calculation of Peak Load
If peak load lasts for two hours and takes up 30 percent of the total load of 2,160,000 pages, this implies that 648,000 pages must be carried over the T-1 lines during two hours of the day.
Therefore, to accommodate peak load during those two hours, you should increase the number of T-1 lines according to the following calculations:
648,000 pages/120 minutes = 5,400 pages per minute
5,400 pages per minute/60 seconds = 90 pages per second
If four lines can handle 25 pages per second, then handling 90 pages per second, approximately four times as many, requires approximately four times as many lines, in this case 16 lines. These 16 lines handle the realistic maximum of a 30 percent peak load; the remaining 70 percent of the load is easily carried by the same lines during the rest of the day.
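The peak-load sizing steps above can be checked with the same kind of calculation (values taken from the example in the text):

```python
import math

# Peak-load sizing from the example above (illustrative arithmetic).
pages_per_day = 2_160_000
peak_fraction = 0.30              # 30% of the daily load falls in the peak window
peak_hours = 2

peak_pages = int(pages_per_day * peak_fraction)       # 648,000 pages in two hours
pages_per_minute = peak_pages // (peak_hours * 60)    # 5,400 pages per minute
pages_per_second = pages_per_minute // 60             # 90 pages per second

# Four T-1 lines handle ~25 pages/s, so scale the line count up in
# units of four lines.
lines_needed = math.ceil(pages_per_second / 25) * 4   # 16 lines

print(pages_per_second, lines_needed)  # 90 16
```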
Configuring Subnets
If you use the separate tier topology, where the application server instances and HADB nodes are on separate tiers, you can achieve a performance improvement by keeping HADB nodes on a separate subnet. This is because HADB uses the User Datagram Protocol (UDP). Using a separate subnet reduces the UDP traffic on the machines outside of that subnet.
Choosing Network Cards
For greater bandwidth and optimal network performance, use at least 100 Mbps Ethernet cards or, preferably, 1 Gbps Ethernet cards between servers hosting Sun Java System Application Server and the HADB nodes, as well as among other resources such as HADB databases that are hosted on other machines.
Network Settings for HADB
HADB uses UDP multicast, so you must enable multicast on your system’s routers and host network interface cards. If HADB spans multiple subnetworks, you must also enable multicast on the routers between the subnetworks. For best results, put all the HADB nodes on the same network. Application Server instances may be on a different subnetwork.
Use the following suggestions to make HADB work optimally in the network:
- Use switched routers so that each network interface has a dedicated 100 Mbps or better Ethernet channel.
- If you are running HADB on a multi-CPU machine hosting four or more HADB nodes, use 1 Gbps Ethernet cards. If the average session size is greater than 50 KB, use 1 Gbps Ethernet cards even if there are fewer than four HADB nodes per machine.
- If you suspect network bottlenecks within HADB nodes:
Planning for Availability
Availability must be planned according to the application and customer requirements.
There are two ways to achieve high availability:
Adding Redundancy to the System
One way to achieve high availability is to add redundancy to the system—redundancy of hardware and software. When one unit fails, the redundant unit takes over. This is also referred to as fault tolerance.
In general, to achieve high availability, you should determine and remove every possible point of failure in the system.
This section discusses the following topics:
Identifying Failure Classes
The level of redundancy is determined by the failure classes (types of failure) that the system needs to tolerate. Some examples of failure classes are: system process, machine, power supply, disk, network failures, building fires and catastrophes.
Duplicated system processes tolerate single system process failures. Duplicated machines tolerate single machine failures. Attaching the duplicated mirrored (paired) machines to different power supplies tolerates single power failures. Keeping the mirrored machines in separate buildings tolerates a single-building fire, and keeping them in separate geographical locations tolerates natural catastrophes such as an earthquake at one location.
When planning availability, you should determine the failure classes covered by the system.
Using Redundancy Units to Improve Availability
To improve availability, HADB nodes are always used in Data Redundancy Units (DRUs) as explained in Introducing HADB.
Using Spare Nodes to Improve Fault Tolerance
The use of spare nodes as explained in Spare Nodes improves fault tolerance. Although spare nodes are not mandatory, their use is recommended for maximum availability.
Planning Failover Capacity
Failover capacity planning means deciding how many additional servers and processes you need to add to your Sun Java System Application Server installation so that, in the event of a server or process failure, the system can seamlessly recover data and continue processing. If your system becomes overloaded, a process or server failure might result, causing response time degradation or even total loss of service. Preparing for such an occurrence is critical to a successful deployment.
To maintain capacity, especially at peak loads, we recommend that you add spare machines running Application Server instances to your existing Application Server installation. For example, assume you have a system with two machines, each running one Application Server instance. Together, these machines can handle a peak load of 300 requests per second. If one of these machines becomes unavailable, the system can handle only 150 requests per second, assuming an even load distribution between the machines. Therefore, half the requests during peak load would not be served.
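The reasoning in this example generalizes: with N identical machines and an even load distribution, losing one machine leaves (N - 1)/N of the peak capacity. A minimal sketch of that calculation (the function name is hypothetical):

```python
# With N identical machines and an even load distribution, losing
# `failed` machines leaves (N - failed) / N of the total capacity.
def remaining_capacity(peak_rps, machines, failed=1):
    per_machine = peak_rps / machines
    return per_machine * (machines - failed)

# Two machines at 300 req/s total: one failure halves capacity.
print(remaining_capacity(300, 2))   # 150.0
# Adding a spare third machine keeps the full 300 req/s peak
# capacity even after a single machine failure.
print(remaining_capacity(450, 3))   # 300.0
```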
Using Multiple Clusters to Improve Availability
To improve availability, instead of using a single cluster, you should group the application server instances into multiple clusters. This way, you can perform online upgrades for clusters (one by one) without loss of service.
For more information on setting up multiple clusters and using multiple clusters to perform online upgrades without loss of service, see Sun Java System Application Server Administration Guide.