Netra High Availability Suite 3.0 1/08 Release Notes
The Netra High Availability Suite 3.0 1/08 Release Notes contain important and late-breaking information about the current release of the Netra
High Availability (HA) Suite Foundation Service software. These notes contain known restrictions and workarounds to known bugs. In cases where there are differences between the release notes and the Netra HA Suite 3.0 1/08 documentation set, the information in the release notes takes precedence.
The Netra HA Suite 3.0 1/08 release is a set of Netra HA Suite patches to be applied to the Netra HA Suite 3.0 first customer shipment (FCS) release. See sections on specific types of patches presented later in these notes for a complete list of the patches to be applied.
This document contains the following sections:
The following new functionalities have been introduced since the release of Netra HA Suite 3.0 software:
The Netra HA Suite 3.0 1/08 software is supported for use with the Solaris 10 1/06 OS on SPARC® and x64 platforms, but only on platforms that are already supported for use with the Netra HA Suite 3.0 FCS release. For more information about the platforms supported for use with the Netra HA Suite 3.0 FCS release, see TABLE 4.
To support the Netra HA Suite 3.0 1/08 release on the Solaris 10 1/06 OS release, you must install the Solaris patches documented in Solaris OS Patches.
The Netra HA Suite 3.0 1/08 software is supported for use with the Solaris 10 8/07 OS on SPARC (including chip multithreading [CMT]) and x64 platforms. For more information, refer to TABLE 4. To support the Netra HA Suite 3.0 1/08 release on the Solaris 10 8/07 OS release, you must install the Solaris patches documented in Solaris OS Patches.
The Netra HA Suite 3.0 1/08 software is supported for use with the MontaVista Linux Carrier Grade Edition 4.0 OS (MV CGE 4.0) on only the Netra CT 900 servers equipped with Netra CP3020 blades. This is a 64-bit Linux distribution.
For more information, refer to TABLE 4.
The Netra HA Suite 3.0 1/08 software is supported for use with the WindRiver CGL OS on the Netra CT 900 blade server with Netra CP3020 or CP3220 blades, as well as on the Netra X4200 rack-mounted server. TABLE 1 describes the supported platforms and 64-bit capabilities available with each supported version of the PNE-LE bundle release.
For PNE-LE bundle release 1.4, a patch is delivered for use with Netra HA Suite 3.0 1/08. If you want to run WindRiver CGL PNE-LE bundle release 2.0, contact your service representative. For more information, refer to TABLE 4.
The default detection delay for the heartbeat mechanism has been reduced on the Solaris OS from 900 milliseconds to 150 milliseconds, reducing the global failover time. On the Wind River Linux OS, the default value has been reduced to 300 milliseconds.
To change the default value, add Probe.DetectionDelay=value to the nhfs.conf file and reboot the node. Note that CMM.Probe.DetectionDelay is the deprecated name for Probe.DetectionDelay and can still be used.
Note that decreasing the value of Probe.DetectionDelay below 150 ms on Solaris OS and below 300 ms on Wind River Linux OS might lead to an unexpected loss of heartbeats and, as a result, nodes might unexpectedly leave the cluster. To avoid this situation, do not set Probe.DetectionDelay below the default values.
To restore the former value of 900 milliseconds used in earlier releases, set Probe.DetectionDelay to 900.
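For example, the following nhfs.conf fragment sets the delay explicitly. The value shown is the Solaris default and is given only as a sketch; choose a value appropriate to your platform:

```
# Heartbeat detection delay, in milliseconds.
# Do not set below 150 on Solaris or below 300 on Wind River Linux,
# or nodes might unexpectedly leave the cluster.
Probe.DetectionDelay=150
```

After editing nhfs.conf, reboot the node so that the new value takes effect.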
If you use external addresses managed by External Address Manager (EAM) to access the Reliable Network File System (RNFS) from client nodes that are outside of the cluster, specify that the services should be synchronized when a switchover occurs. To enable this synchronization, set the EAM.SyncWithRNFS property in the nhfs.conf file to True. For information about this property, refer to the Netra High Availability Suite 3.0 1/08 Foundation Services Reference Manual.
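A minimal nhfs.conf fragment enabling this synchronization looks as follows (the property name is as given above; its placement within the file follows your existing configuration):

```
# Synchronize EAM-managed external addresses with RNFS on switchover.
EAM.SyncWithRNFS=True
```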
A new CMM property, CMM.StartUp.Join, allows you to define whether nodes will automatically try to join the cluster at startup. If the property is set to False, the node will not join the cluster at boot time. In this case, the node will join the cluster only upon request of the application through a CMM command. For information about this property, see the nhfs.conf(4) man page for Solaris or the nhfs.conf(5) man page for Linux, or refer to the Netra HA Suite reference manual for your operating system.
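For example, to keep a node out of the cluster at boot until the application requests the join through a CMM command, the following nhfs.conf sketch applies:

```
# Do not join the cluster automatically at startup; the application
# triggers the join later through a CMM command.
CMM.StartUp.Join=False
```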
The Netra HA Suite enables a service tag that can be automatically discovered and identified by the Sun Connection Inventory channel. For details about using Sun Connection’s Inventory channel to track and organize your Sun software and hardware, refer to:
https://sunconnection.sun.com/inventory
Sun’s Logical Domains (LDoms) technology is a server virtualization and partitioning technology that enables the allocation of various system resources, such as memory, CPUs, I/O, and storage into partitions known as logical or virtual domains. Each logical domain can have an independent operating system, resources, and identity within a single computer system. Specialized service and control domains allow these resources to be managed using the Logical Domains Manager software.
For information about the LDoms configurations supported with this release of the Netra HA Suite Foundation Services, refer to LDoms.
The following limitations apply to the configurations supported in this release of the Netra HA Suite software.
Netra HA Suite 3.0 1/08 software is supported for use with LDoms 1.0.1 on Netra CP3060 and CP3260 ATCA blades (CMT) and Netra T2000 and T5220 servers (CMT). LDoms functionality is supported only on the Solaris 10 8/07 OS or newer.
The Netra HA Suite Foundation Services should be installed only in guest domains.
Netra CP3060 ATCA blades support only one physical disk drive, and this disk is owned by the control domain. Master eligible nodes and dataless non-master eligible nodes must use the virtual disk devices that are serviced by the control domain.
The control/service domain will be a single point of failure if Netra HA Suite is used with LDoms. If the control domain fails, all the other domains on the same system will also fail.
Netra HA Suite 3.0 1/08 supports 64-node cluster configurations (a master node, a vice-master node, and 62 dataless nodes). However, configurations using Advanced Telecommunications Computing Architecture (ATCA)-based hardware have been qualified at the hardware level with a maximum of 12 nodes.
This release supports 64-node dataless clusters. Cluster performance (for example, the time required for switchover, failover, and boot) depends on the number of client nodes (master-ineligible nodes) in the cluster. When there are more than 18 client nodes, we suggest that you use server nodes (master-eligible nodes) that are more powerful than your client nodes to get expected performance results.
The following limitations exist when you run the Netra HA Suite software on a cluster where all or some nodes are running under Linux:
Netra HA Suite 3.0 1/08 software provides the following new functions through the SA Forum CLM API.
The initialViewNumber field of the saClmClusterNodeT structure is supported for use with the Netra HA Suite 3.0 1/08 software. For information, refer to the Netra High Availability Suite 3.0 1/08 Foundation Services SA Forum Programming Guide. For information about these functions, go to http://www.saforum.org/
The following values apply to the SA Forum/CLM man pages when they are used with the Netra High Availability Suite Foundation Services:
Automated installation procedures described in the Netra High Availability Suite 3.0 1/08 Foundation Services Installation Guide have been adapted for the support of Solaris 10 8/07 OS on SPARC/CMT and x64 processors, and the support of MontaVista Linux Carrier Grade Edition 4.0 and WindRiver PNE-LE 1.4 Linux distributions.
All corresponding manual installation procedures have been detailed for the Solaris OS only, in the Netra High Availability Suite 3.0 1/08 Foundation Services Manual Installation Guide for the Solaris OS. No manual installation procedure is described for Linux.
TABLE 3 summarizes the hardware supported with the Netra HA Suite 3.0 1/08 software as of publication of this document. For more information about the operating systems supported with the Netra HA Suite for each platform, see TABLE 4.
- Netra CT 900 blade server (ATCA chassis)
- Netra CP2140/CP2160 and CP2500 boards
- Netra CP3010 ATCA SPARC blades
| Note - For information about required patches and firmware versions for Netra CP3010 ATCA SPARC blades, Netra CP3020 or CP 3220 ATCA x64 blades, Netra CP3060 or CP3260 ATCA CMT blades, or Netra CT 900 servers, refer to the appropriate release notes, which can be downloaded from: http://docs.sun.com. |
| Note - For information about iSCSI support with the Netra HA Suite, contact your support representative. |
On Netra CT 900 servers, the Base Fabric Ethernet switches are interconnected, as are the Extended Fabric Ethernet switches. This factory-preset configuration might lead to unexpected behavior on Linux if left unmodified, because the redundant network interfaces used by the Netra HA Suite are then in the same broadcast domain.
It is strongly suggested that you place the redundant network interfaces in different broadcast domains. This can be achieved in a variety of ways: for example, you can disable the interconnect between the switches or configure VLANs on the switches. Placing one interface on the Base Fabric and the other on the Extended Fabric is not recommended because the technologies of the two fabrics differ.
Refer to the Netra CT 900 Server Administration and Reference Manual and the Netra CT 900 Server Switch Software Reference Manual for more information.
The following mixed hardware configurations are supported for use on clusters running this release of the Netra HA Suite software.
| Note - Blades running the Solaris OS are always MEN nodes, and blades running Linux are always NMEN nodes. |
This section lists the software you can use with the Netra HA Suite 3.0 1/08 and specifies the supported versions for different types of hardware.
The following servers and boards are supported for use on clusters that have the following versions of operating system (OS) installed.
For example cluster configurations, see the Netra High Availability Suite 3.0 1/08 Foundation Services Getting Started Guide.
The following volume management software is supported for use with the Netra HA Suite 3.0 1/08 software:
Solaris Volume Manager (SVM) software for the Solaris 9 9/05 OS and Solaris 9 9/05 HW OS, and the Solaris 10 1/06 OS and Solaris 10 8/07 OS (for SPARC and x64). For installation information, see the Solaris Volume Manager Administration Guide.
The following software is embedded with the release of Foundation Services 3.0:
The following versions of data replication software are supported on the specified versions of operating system.
Availability Suite (AVS) software version 3.1 for the Solaris 9 9/05 OS and Solaris 9 9/05 HW OS
| Note - AVS 3.2 is not supported for use with the Foundation Services software. |
Dynamic Management Kit 5.0 software, for the Solaris OS only (NMA is not supported on the Linux OS)
The following development tools are supported for use with this release of the Foundation Services software:
Studio 10 software
For the Netra HA Suite 3.0 1/08 software to be properly installed and operational, you must download a set of patches and apply them to the Netra HA Suite 3.0 FCS. To download the patches, visit the SunSolveSM web site:
TABLE 5 lists the required patches for each supported operating system.
| Note - For Netra HA Suite 3.0 1/08 software, you need to install the level 5 version (-05) of these patches, at a minimum. |
During the first reboot after patch 124481-05 or 124482-05 is applied on a master-eligible node of a Solaris cluster, the following message appears on the console:
This message appears when changes in the NHAS services are taken into account by the Solaris Management Facility, but it can be safely ignored. The service status svc:/system/cgha/rnfs/server:default will subsequently be cleared, and this service will be restarted correctly at the end of the boot.
If you already have a cluster up and running with the Netra HA Suite 3.0 software and you want to upgrade it to Netra HA Suite 3.0 1/08, install the above-mentioned Netra HA Suite patches using the procedure described in the README files delivered with the Netra HA Suite patches.
| Note - If you are running the Solaris 10 1/06 OS, there are additional Solaris OS patches that must be installed before you install the Netra HA Suite patches. For more information, see Solaris OS Patches. |
If you are installing a new cluster, you can use the nhinstall tool to perform an automated full installation of the Netra HA Suite 3.0 1/08 software. To do this, install the Netra HA Suite 3.0 FCS packages and patches on your installation server and follow the procedure described in the README files delivered with the Netra HA Suite patches.
| Note - When installing a new cluster on the Solaris 10 OS, after the Solaris 10 8/07 Operating System and Netra HA Suite 3.0 1/08 software are installed, you must then install the latest recommended Solaris patches for the platform architecture. See the following description, “To Install the Latest Recommended Solaris Patches.” |
1. Install the Solaris 10 8/07 Operating System and Netra HA Suite 3.0 1/08 software.
Use the nhinstall tool to install the Netra HA Suite software.
2. Obtain the latest “Recommended Solaris Patch Cluster” for Solaris 10 and the platform architecture used by your system.
You can download these patches from:
3. Disable the Netra HA Suite software.
4. Run the installation script.
The script is bundled with the patches.
The procedure is not “nhinstall friendly,” as the patch install script might require several reconfiguration reboots.
5. Re-enable the Netra HA Suite software.
After you have finished installing the patches, remember to re-enable the software by removing the not_configured file.
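Steps 3 through 5 can be sketched as shell commands. Both the not_configured flag-file path and the patch-cluster script name are assumptions (they depend on your Foundation Services packaging and on the patch cluster you downloaded), so verify them before use:

```
# Step 3: disable the Netra HA Suite software (path is an assumption).
touch /etc/opt/SUNWcgha/not_configured

# Step 4: run the installation script bundled with the patch cluster;
# it might require several reconfiguration reboots.
cd /var/tmp/10_Recommended
./install_cluster

# Step 5: re-enable the Netra HA Suite software.
rm /etc/opt/SUNWcgha/not_configured
```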
When installing the Foundation Services software, install the latest version of following patches that are available on the SunSolve web site, depending on the version of the Solaris OS that is installed on your system:
At a minimum, install the patch for the init s / init 3 sequence: 127111-09 or later (SPARC) or 127112-09 or later (x64).
- 118833 (SPARC): Before installing patch 118833, install patches 118918-13, 119042-09, and 119578-30, in this order, and reboot the node.
- 118855 (x86): Before installing patch 118855, install patches 119043, 118344, 123840, 122035 (in this order) and reboot the node.
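For instance, the SPARC prerequisite ordering above translates into the following patchadd sequence; the patch directory and the 118833 revision are placeholders:

```
# Prerequisites for 118833 (SPARC), installed in the documented order:
patchadd /var/spool/patch/118918-13
patchadd /var/spool/patch/119042-09
patchadd /var/spool/patch/119578-30
reboot
# After the reboot, install 118833 itself:
patchadd /var/spool/patch/118833-<rev>
```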
These patches must be manually installed if you want to upgrade an existing Netra HA Suite 3.0 cluster to Netra HA Suite 3.0 1/08.
These patches are automatically installed if you use the nhinstall tool. If you manually install the software, download these patches from SunSolve.
| Note - On the Solaris 9 9/05 HW OS, these patches are not required. |
The Netra HA Suite download contains one SNDR patch: 116710-03. This SNDR/AVS point patch replaces the SNDR patches released with the previous version of the software and should be installed only if you are running the Solaris 9 9/05 OS or the Solaris 9 9/05 HW OS (SNDR 3.1). No patch should be installed for AVS 4.0.
This SNDR patch is available on SunSolve at http://sunsolve.sun.com/point.
No software patches for CGTP are required for the Solaris 9 9/05 OS, the Solaris 9 9/05 HW OS, or the Solaris 10 OS.
A CGTP patch must be added to standard Linux kernels if you choose not to use the Linux kernel delivered with the Netra HA Suite 3.0 1/08 patches for Linux distributions. In this case, you must rebuild your Linux kernel using the CGTP source patch delivered with the Netra HA Suite 3.0 1/08 patches. For help rebuilding your kernel with CGTP, contact your authorized service representative.
The following sections describe recommended uses of particular functionalities and features of the Foundation Services.
When rebooting a master-eligible node on a running cluster, do not use the reboot command. Doing so will kill processes in an indeterminate order, effectively ignoring the required sequence for stopping services, which can lead to inconsistencies in data replication.
Instead, reboot a node using the steps provided in the Netra High Availability Suite 3.0 1/08 Foundation Services Cluster Administration Guide, which vary depending on the version of the operating system in use at your site.
When a master-eligible node is reintegrated into the cluster (for example, after maintenance or failure), there is a period when disk partitions are resynchronizing. While a cluster is unsynchronized, the data on the master node disk is not fully backed up. Do not schedule major tasks when the cluster is unsynchronized.
The symptoms of node corruption can include the presence of “maintenance required” messages, nodes remaining at run level 1, the inability to execute basic UNIX® commands (for example, ls, pwd, and cd), and the presence of messages about recovering the repository using archives.
If, when installing clusters, you experience any of these symptoms and determine that a node failure has occurred, manually recover the node(s) by following the procedures in the README file included with the Solaris 10 OS distribution (/lib/svc/share/README). For specific examples, refer to Section 2 of the README file.
Due to issues with the Linux kernel, file locking is not supported for use with the Netra HA Suite 3.0 1/08 Foundation Services software. For more information, refer to Linux Known Issues.
The following subsections list known bugs and their workarounds where available.
TABLE 6 describes the issues most commonly encountered when using the Foundation Services, starting with the most significant.
TABLE 7 describes issues that exist using the Foundation Services with MontaVista Carrier Grade Edition and Wind River Linux.
CGTP fails on Linux occasionally
Configure CGTP’s gateway table on pure Linux clusters (gateway tables are normally needed only on heterogeneous Linux-Solaris clusters). If a Linux cluster is installed using nhinstall, no further action is needed, because the nhinstall tool configures CGTP’s gateway table. If the installation is done manually, without nhinstall, the gateway table must also be populated manually. See Example: Configuring CGTP’s Gateway Table on Linux for an example of configuring a gateway table for a three-node cluster.
On Linux, when multiple threads use the CMM API or the SA Forum CLM API intensively, some calls might return CMM_ETIMEDOUT because of the way Linux schedules the threads. Users can safely retry the operation.
The syslog facility can be configured to log Netra HA Suite messages as described in the Netra High Availability Suite 3.0 1/08 Foundation Services Cluster Administration Guide. On Linux, syslog can be very slow. Therefore, when configuring syslog to capture Netra HA Suite messages at the info or debug level, it is strongly suggested that you omit syncing of the log file after each message by prefixing the file name in /etc/syslog.conf with a minus sign (-), as described in the syslog.conf(5) man page.
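For example, assuming Netra HA Suite messages arrive through the local0 facility (the facility name and log path are illustrative; use the ones configured at your site):

```
# /etc/syslog.conf fragment: the leading "-" on the file name
# disables syncing the file after each logged message.
local0.info    -/var/log/nhas.log
```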
Locks are lost upon switchover or failover on Linux
Due to an issue with the Linux kernel, file locking is not supported for use with the Netra HA Suite 3.0 1/08 Foundation Services. Using file locking on the replicated partitions works until a failover or switchover is triggered; at that point, the locks are lost.
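The kind of advisory locking affected is illustrated by the generic POSIX sketch below. It is not Netra-specific, and the path is illustrative; the point is that the kernel-held lock state, unlike the file contents, is exactly what a switchover or failover cannot preserve:

```python
import fcntl
import os
import tempfile

# Generic POSIX advisory lock, of the kind an application might take on a
# file that lives on a replicated partition. The path is illustrative.
lock_path = os.path.join(tempfile.gettempdir(), "app.lock")

with open(lock_path, "w") as f:
    # Exclusive, non-blocking lock: raises BlockingIOError if already held.
    fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    f.write("lock held\n")
    # If a switchover or failover happened here, the file contents would be
    # replicated, but the kernel lock state would not survive on the new
    # master node.
    fcntl.flock(f, fcntl.LOCK_UN)

print("lock acquired and released")
```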
Confusing message from bonding when executing a switchover
When a switchover is requested, you might see a message on the console stating that a bond interface (bondX) has failed. You can safely ignore this message; it has no real impact on the system.
If there is a cluster with two master-eligible nodes and one non-master-eligible node:
On MEN-1, the following must be done:
On MEN-2, the following must be done:
On NMEN, the following must be done:
This should be added to /etc/network/interfaces to ensure that gateway table entries are automatically added after reboot (or ifdown/ifup), for example, for MEN-1:
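A minimal sketch of the idea for MEN-1 follows, with illustrative addresses and interface names (eth0 as one of the redundant NICs, 10.1.1.x as its subnet). The commands that populate the gateway table vary by CGTP version, so plain host routes are shown purely as placeholders; verify the exact entries against the Linux Known Issues example or with your service representative.

```
# /etc/network/interfaces fragment for MEN-1 (all values illustrative)
auto eth0
iface eth0 inet static
    address 10.1.1.10
    netmask 255.255.255.0
    # Re-create gateway-table entries for the peer nodes after each ifup;
    # substitute the gateway-table commands used at your site.
    post-up route add -host 10.1.1.11 dev eth0
    post-up route add -host 10.1.1.12 dev eth0
```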
SNDR sets using sector 0 fail, and this failure is not detected by nhinstall or nhadm.
CGTP broadcast IREs are not re-created after plumb or unplumb
Using the ifconfig command to plumb or unplumb a CGTP interface is not supported, and doing so can lead to an unexpected cluster outage. Action on a single interface leaves CGTP broadcasts inoperative: broadcasts replicated by CGTP might not be delivered if one of the underlying incoming interfaces is down or, for the same reason, has been unplumbed. CGTP broadcasts cannot survive abrupt unplumbing and replumbing of the underlying network interfaces. The only way for CGTP broadcasts to survive an ifconfig unplumb is to follow the correct sequence of operations when replumbing the interfaces.
“CGTP fails on Linux occasionally”: see Linux Known Issues for information.
The following table lists the guides that make up the current documentation set and briefly describes the type of information they contain. The documentation can be found at:
http://docs.sun.com/app/docs/prod/netra.avail
The following known issue exists in this release of the Netra HA Suite Foundation Services documentation set.
The init. election field is currently not documented in the nhcmmstat man page or Netra High Availability Suite 3.0 1/08 Foundation Services Reference Manual. The following definition applies for this field:
init. election
The election number the cluster had when the node joined the cluster. The election number is incremented each time the cluster membership changes, so a node that joined the cluster earlier has a lower election number than one that joined later.
The Intro(1M) man page for Solaris erroneously lists a man page for an nhpmdadmwrapper(1M) command. This command is not available, and its man page is not included with this distribution.
Copyright © 2008, Sun Microsystems, Inc. All Rights Reserved.