This section describes some typical problems when using SGD servers, and how to fix them.
The following troubleshooting topics are covered:
To help you to diagnose and fix problems when using array resilience, you can do the following:
Show status information for the SGD array
Enable array resilience logging
You use the tarantella status command on an SGD server to show status information for the server.
This section includes some examples of using tarantella status to show status information for an SGD array when the primary server in the array goes down. Section 7.1.6.2.1, “Primary Server Goes Down” includes a detailed description of this array resilience scenario.
The original network configuration used for the examples is a three-node array of SGD servers in the domain example.com, as follows:
Primary server: boston
Secondary servers: newyork, detroit
When the primary server boston goes down, running tarantella status on newyork indicates that there is a connection problem with the SGD array, as follows:
$ tarantella status
Array members (3):
- newyork.example.com (secondary): Accepting standard connections.
- boston.example.com (primary): NOT ACCEPTING CONNECTIONS.
- detroit.example.com (secondary): Accepting standard connections.
...
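A check like this can be scripted. The following is a minimal sketch in Python, assuming that member lines always have the "- host (role): state" shape shown in the output above. The function names parse_members and members_not_accepting are hypothetical helpers, not part of SGD.

```python
import re

# Matches member lines such as:
#   - boston.example.com (primary): NOT ACCEPTING CONNECTIONS.
# The state part is optional because some outputs list members without it.
MEMBER_LINE = re.compile(r"^\s*-\s+(\S+)\s+\((primary|secondary)\)(?::\s*(.*))?$")


def parse_members(status_text):
    """Return (host, role, state) tuples found in `tarantella status` output."""
    members = []
    for line in status_text.splitlines():
        match = MEMBER_LINE.match(line)
        if match:
            host, role, state = match.groups()
            members.append((host, role, (state or "").strip()))
    return members


def members_not_accepting(status_text):
    """Return the hosts whose state reports that they are not accepting connections."""
    return [
        host
        for host, _role, state in parse_members(status_text)
        if state.upper().startswith("NOT ACCEPTING")
    ]


# Sample text copied from the example output in this section.
SAMPLE = """\
Array members (3):
- newyork.example.com (secondary): Accepting standard connections.
- boston.example.com (primary): NOT ACCEPTING CONNECTIONS.
- detroit.example.com (secondary): Accepting standard connections.
"""

if __name__ == "__main__":
    print(members_not_accepting(SAMPLE))
```

In practice, the text passed to members_not_accepting would be the captured output of tarantella status rather than the SAMPLE string.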
If the SGD servers in the array do not agree on the array membership, tarantella status shows the array configuration as seen by every SGD server in the array. For example, running tarantella status on newyork during the failover stage might show the following information:
$ tarantella status
Inconsistent array: the servers report different array membership.
...
boston.example.com reports an error:
- Host is unavailable
newyork.example.com reports 3 members as:
- newyork.example.com
- boston.example.com
- detroit.example.com
detroit.example.com reports 1 member as:
- detroit.example.com
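The per-server views in this output can be compared programmatically. The sketch below is an illustration only. It assumes the "host reports ..." headings and "- item" lines appear exactly as in the example above, and the helper names parse_views, membership_disagrees, and unreachable_hosts are hypothetical.

```python
import re

# Heading lines look like one of:
#   boston.example.com reports an error:
#   newyork.example.com reports 3 members as:
#   detroit.example.com reports 1 member as:
REPORT_HEADER = re.compile(r"^(\S+) reports (an error|\d+ members? as):$")


def parse_views(status_text):
    """Map each reporting host to {'kind': 'error'|'members', 'items': [...]}."""
    views = {}
    current = None
    for raw in status_text.splitlines():
        line = raw.strip()
        header = REPORT_HEADER.match(line)
        if header:
            current = header.group(1)
            kind = "error" if header.group(2) == "an error" else "members"
            views[current] = {"kind": kind, "items": []}
        elif line.startswith("- ") and current is not None:
            views[current]["items"].append(line[2:].strip())
    return views


def membership_disagrees(views):
    """True if at least two servers report different sets of array members."""
    member_sets = {
        frozenset(view["items"]) for view in views.values() if view["kind"] == "members"
    }
    return len(member_sets) > 1


def unreachable_hosts(views):
    """Hosts for which an error was reported instead of a member list."""
    return [host for host, view in views.items() if view["kind"] == "error"]


# Sample text copied from the example output in this section.
SAMPLE = """\
Inconsistent array: the servers report different array membership.
boston.example.com reports an error:
- Host is unavailable
newyork.example.com reports 3 members as:
- newyork.example.com
- boston.example.com
- detroit.example.com
detroit.example.com reports 1 member as:
- detroit.example.com
"""

if __name__ == "__main__":
    views = parse_views(SAMPLE)
    print(membership_disagrees(views), unreachable_hosts(views))
```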
The tarantella status command indicates if the array is in a repaired state. For example, running tarantella status from detroit after the failover stage has completed might show the following information:
$ tarantella status
Array members (2):
- newyork.example.com (primary)
- detroit.example.com (secondary)
...
This node is in a repaired array. Any alterations to array state will prevent recovery of the original array. Use the tarantella status --originalstate command to see the original array state.
You use the --originalstate option to list the members of the array before it was repaired. For example, using the --originalstate option on any server in the array shows the original array members, as follows:
$ tarantella status --originalstate
Original array members (3):
- boston.example.com (primary)
- newyork.example.com (secondary)
- detroit.example.com (secondary)
...
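Comparing the repaired state with the original state shows which server dropped out and which server was promoted. The following sketch assumes you have captured the text of both commands. The function names roles and compare_states are hypothetical.

```python
import re

# Member lines in both outputs start with "- host (role)".
MEMBER = re.compile(r"^\s*-\s+(\S+)\s+\((primary|secondary)\)")


def roles(status_text):
    """Return a {host: role} mapping for the member lines in the output."""
    matches = (MEMBER.match(line) for line in status_text.splitlines())
    return {m.group(1): m.group(2) for m in matches if m}


def compare_states(current_text, original_text):
    """Return (missing_hosts, promoted_hosts) for a repaired array."""
    current, original = roles(current_text), roles(original_text)
    missing = sorted(set(original) - set(current))
    promoted = sorted(
        host
        for host, role in current.items()
        if role == "primary" and original.get(host) != "primary"
    )
    return missing, promoted


# Sample text copied from the example outputs in this section.
CURRENT = """\
Array members (2):
- newyork.example.com (primary)
- detroit.example.com (secondary)
"""

ORIGINAL = """\
Original array members (3):
- boston.example.com (primary)
- newyork.example.com (secondary)
- detroit.example.com (secondary)
"""

if __name__ == "__main__":
    print(compare_states(CURRENT, ORIGINAL))
```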
After the recovery stage, you can use the tarantella status command to verify that the original array formation has been recreated. For example, running tarantella status on newyork might display the following information:
$ tarantella status
Array members (3):
- newyork.example.com (secondary): Accepting standard connections.
- boston.example.com (primary): Accepting standard connections.
- detroit.example.com (secondary): Accepting standard connections.
...
To enable logging for array resilience, add the following log filters in the Log Filters field on the Global Settings → Monitoring tab in the Administration Console:
server/failover/*:failover%%PID%%.log
server/failover/*:failover%%PID%%.jsl
See Section 7.4.3, “Using Log Filters to Troubleshoot Problems With an SGD Server” for more information on configuring and using SGD log filters.
Problems can arise if the clocks on the SGD servers in an array are not in synchronization. If possible, use NTP software or the rdate command to ensure the clocks on all SGD hosts are synchronized.
You run the tarantella status command on the primary SGD server to show any clock synchronization issues for the array. The following example indicates that the clock on the secondary server newyork.example.com is out of synchronization.
$ tarantella status
Array members (3):
- boston.example.com (primary): Accepting standard connections.
- newyork.example.com (secondary): Accepting standard connections.
- detroit.example.com (secondary): Accepting standard connections.
WARNING: The clocks on the array nodes are not synchronized.
The following array members disagree with the primary:
- newyork.example.com
If clocks are out of synchronization, a warning message is also displayed on the Secure Global Desktop Servers tab of the Administration Console.
You use the --byserver option of tarantella status to display the clock setting on each SGD server in the array, as follows:
$ tarantella status --byserver
boston.example.com:
- Array member (primary): Accepting standard connections.
...
- Current time reported: Wed Apr 28 09:36:16 BST 2010
newyork.example.com:
- Array member (secondary): Accepting standard connections.
...
- Current time reported: Wed Apr 28 09:38:02 BST 2010
detroit.example.com:
- Array member (secondary): Accepting standard connections.
...
- Current time reported: Wed Apr 28 09:36:16 BST 2010
WARNING: The clocks on the array nodes are not synchronized.
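To see how far apart the clocks are, the reported times can be compared against the primary. The following is a rough sketch, assuming every server reports its time in the same time zone, so that the zone abbreviation can be ignored, and in the date format shown above. The helper names are hypothetical.

```python
import re
from datetime import datetime

HOST_LINE = re.compile(r"^(\S+):$")
TIME_LINE = re.compile(r"^-\s+Current time reported:\s+(.*)$")


def parse_reported_time(text):
    """Parse 'Wed Apr 28 09:36:16 BST 2010', dropping the time zone field.

    Assumption: all servers report in the same zone, so the zone
    abbreviation does not affect the comparison.
    """
    parts = text.split()
    without_zone = " ".join(parts[:4] + parts[5:])
    return datetime.strptime(without_zone, "%a %b %d %H:%M:%S %Y")


def reported_times(byserver_text):
    """Return {host: datetime} from `tarantella status --byserver` output."""
    times = {}
    host = None
    for raw in byserver_text.splitlines():
        line = raw.strip()
        host_match = HOST_LINE.match(line)
        time_match = TIME_LINE.match(line)
        if host_match:
            host = host_match.group(1)
        elif time_match and host is not None:
            times[host] = parse_reported_time(time_match.group(1))
    return times


def skew_seconds(times, reference_host):
    """Return each host's offset in seconds relative to the reference host."""
    reference = times[reference_host]
    return {host: (t - reference).total_seconds() for host, t in times.items()}


# Sample text copied from the example output in this section.
SAMPLE = """\
boston.example.com:
- Array member (primary): Accepting standard connections.
- Current time reported: Wed Apr 28 09:36:16 BST 2010
newyork.example.com:
- Array member (secondary): Accepting standard connections.
- Current time reported: Wed Apr 28 09:38:02 BST 2010
detroit.example.com:
- Array member (secondary): Accepting standard connections.
- Current time reported: Wed Apr 28 09:36:16 BST 2010
WARNING: The clocks on the array nodes are not synchronized.
"""

if __name__ == "__main__":
    print(skew_seconds(reported_times(SAMPLE), "boston.example.com"))
```

For the sample output, newyork.example.com is 106 seconds ahead of the primary and detroit.example.com agrees with it.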
If you experience problems with the Least CPU Usage and Most Free Memory methods of application load balancing, you can get information from the following places to help you understand what is happening:
SGD server log files
Add the following filters to the Log Filters field on the Global Settings → Monitoring tab in the Administration Console:
server/tier3loadbalancing/*:t3loadbal%%PID%%.log
server/tier3loadbalancing/*:t3loadbal%%PID%%.jsl
This provides detailed information about the decision to run an application and the data being sent by the application server.
See Section 7.4.3, “Using Log Filters to Troubleshoot Problems With an SGD Server” for more information on configuring and using SGD log filters.
SGD Enhancement Module logs
For UNIX or Linux platform application servers, these are in the /opt/tta_tem/var/log/tier3loadprobePID_error.log file.
For Windows application servers, this information is displayed in the Event Viewer.
Load balancing service connection Common Gateway Interface (CGI) program
Go to the https://applicationserver:3579?get&ttalbinfo URL.
You can use this information to troubleshoot the following common problems:
Section 7.7.3.1, “The Load Balancing Service Is Not Working”
Section 7.7.3.2, “SGD Ignores an Application Server Load Balancing Properties File”
Section 7.7.3.3, “One of the Application Servers Is Never Picked”
Section 7.7.3.4, “One of the Application Servers Is Always Picked”
Section 7.7.3.5, “Two Identical Application Servers, But One Runs More Applications Than the Other”
Section 7.7.3.6, “The SGD Server Log File Shows an Update Received for an Unknown ID”
If you think the load balancing service is not working, check the following.
Questions
7.7.3.1.1: Is the SGD Enhancement Module installed and running?
7.7.3.1.2: Is the primary SGD server running?
7.7.3.1.3: Is your firewall blocking the load balancing service?
7.7.3.1.4: What do the log files show?
Questions and Answers
7.7.3.1.1: Is the SGD Enhancement Module installed and running?
On Microsoft Windows application servers, use Control Panel → Administrative Tools → Services to check whether the Tarantella Load Balancing Service is listed and is started.
On UNIX and Linux platform application servers, run the following command as superuser (root) to check that load balancing processes are running:
# /opt/tta_tem/bin/tem status
7.7.3.1.2: Is the primary SGD server running?
The load balancing service on the application server sends load information to the primary SGD server. If the primary is not available, SGD uses Fewest application sessions as the method for load balancing application servers.
7.7.3.1.3: Is your firewall blocking the load balancing service?
For the load balancing service to work, the firewall must allow the following connections:
A TCP connection on port 3579 between the SGD server and the application server.
A UDP connection on port 3579 between the application server and the SGD server.
These connections do not need to be authenticated.
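The TCP half of this requirement can be checked from the SGD server with a short script. The sketch below only tests whether a TCP connection to port 3579 can be opened. It does not verify the UDP traffic, which flows in the opposite direction and cannot be confirmed with a simple connect. The function name tcp_port_open is hypothetical.

```python
import socket


def tcp_port_open(host, port=3579, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    A False result means the connection was refused, timed out, or the
    host name could not be resolved. It does not say which device
    (firewall, application server) blocked the connection.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, calling tcp_port_open("appserver.example.com") from the SGD server, where appserver.example.com is a placeholder for one of your application servers, returns False if a firewall drops the connection.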
7.7.3.1.4: What do the log files show?
Check the log files for further information. See Section 7.7.3, “Troubleshooting Advanced Load Management” for details.
After creating a load balancing properties file for an application server, you must do a warm restart of the primary SGD server. Run the following command as superuser (root):
# tarantella restart sgd --warm
Before you restart the server, ensure that no users are logged in to the SGD server, and that there are no application sessions, including suspended application sessions, running on the SGD server.
If one of the application servers is never picked to run applications, check the following.
Questions
7.7.3.3.1: Is the load balancing service running on the application server?
7.7.3.3.2: Is the application server available to run applications?
7.7.3.3.3: What do the log files show?
Questions and Answers
7.7.3.3.1: Is the load balancing service running on the application server?
See Section 7.7.3.1, “The Load Balancing Service Is Not Working”.
7.7.3.3.2: Is the application server available to run applications?
Check the application server object in the Administration Console. Ensure the Application Start check box is selected on the General tab for the application server object.
Check that the application server is up.
7.7.3.3.3: What do the log files show?
Check the log files for further information. See Section 7.7.3, “Troubleshooting Advanced Load Management” for details.
If one application server is always picked to run applications regardless of its load, check the following.
Questions
7.7.3.4.1: Is more than one application server configured to run the application?
7.7.3.4.2: Are the other application servers available to run applications?
7.7.3.4.3: Is the correct load balancing method selected?
7.7.3.4.4: Are you using server affinity?
7.7.3.4.5: Is the load balancing service running on the application server?
7.7.3.4.6: What do the log files show?
Questions and Answers
7.7.3.4.1: Is more than one application server configured to run the application?
Check the Hosting Application Servers tab for the application object.
7.7.3.4.2: Are the other application servers available to run applications?
Check the application server objects in the Administration Console. Ensure the Application Start check box is selected on the General tab for each application server object.
Check that all the application servers are up.
7.7.3.4.3: Is the correct load balancing method selected?
In the Administration Console, check that either Most Free Memory or Least CPU Usage is selected as the load balancing method on the Performance tab for the application object, or on the Global Settings → Performance tab.
7.7.3.4.4: Are you using server affinity?
Server affinity means that, if possible, SGD starts an application on the same application server as the last application started by the user. Server affinity is on by default. See Section 7.2.5.5, “Server Affinity”.
7.7.3.4.5: Is the load balancing service running on the application server?
See Section 7.7.3.1, “The Load Balancing Service Is Not Working”.
7.7.3.4.6: What do the log files show?
Check the log files for further information. See Section 7.7.3, “Troubleshooting Advanced Load Management” for details.
Check that the server weighting values for the servers are the same. See Section 7.2.7.1, “Application Server's Relative Power”.
The SGD server log file might show an information message containing the following text:
Got an update for unknown id from machine applicationserver
This message can be ignored. It occurs only when the primary SGD server is restarted.
If SGD is using a lot of network bandwidth, set the Bandwidth Limit attribute for a user profile to reduce the maximum allowable bandwidth the user can use.
Reducing the available bandwidth might have implications for application usability.
In the Administration Console, go to the User Profiles tab and select the user profile object you want to configure. Go to the Performance tab and select a value from the Bandwidth Limit list.
Alternatively, use the following command:
$ tarantella object edit --name obj --bandwidth bandwidth
The following are the available bandwidths:
Administration Console | Command Line |
---|---|
2400 bps | 2400 |
4800 bps | 4800 |
9600 bps | 9600 |
14.4 Kbps | 14400 |
19.2 Kbps | 19200 |
28.8 Kbps | 28800 |
33.6 Kbps | 33600 |
38.8 Kbps | 38800 |
57.6 Kbps | 57600 |
64 Kbps | 64000 |
128 Kbps | 128000 |
256 Kbps | 256000 |
512 Kbps | 512000 |
768 Kbps | 768000 |
1 Mbps | 1000000 |
1.5 Mbps | 1500000 |
10 Mbps | 10000000 |
None | 0 |
None is the default. This means there is no limit on bandwidth usage.
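When scripting this change, it can help to translate the Administration Console label into the command-line value. The following sketch copies the values from the table above and builds the argument list for tarantella object edit. The function name bandwidth_command and the object name used in the usage note are hypothetical.

```python
# Administration Console label -> value for the --bandwidth option,
# copied from the table in this section.
BANDWIDTH_VALUES = {
    "2400 bps": 2400,
    "4800 bps": 4800,
    "9600 bps": 9600,
    "14.4 Kbps": 14400,
    "19.2 Kbps": 19200,
    "28.8 Kbps": 28800,
    "33.6 Kbps": 33600,
    "38.8 Kbps": 38800,
    "57.6 Kbps": 57600,
    "64 Kbps": 64000,
    "128 Kbps": 128000,
    "256 Kbps": 256000,
    "512 Kbps": 512000,
    "768 Kbps": 768000,
    "1 Mbps": 1000000,
    "1.5 Mbps": 1500000,
    "10 Mbps": 10000000,
    "None": 0,
}


def bandwidth_command(object_name, label):
    """Build the argument list that sets the bandwidth limit for one object."""
    try:
        value = BANDWIDTH_VALUES[label]
    except KeyError:
        raise ValueError(f"unknown bandwidth label: {label!r}") from None
    return [
        "tarantella", "object", "edit",
        "--name", object_name,
        "--bandwidth", str(value),
    ]
```

For example, bandwidth_command("o=Example/cn=Jane Doe", "1.5 Mbps") produces an argument list ending in --bandwidth 1500000, which can be passed to subprocess.run on an SGD server.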
If users cannot connect to an SGD server when it is in firewall traversal mode, this is usually caused by starting the SGD server before the SGD web server.
In firewall traversal mode, an SGD server listens on port 443 and forwards any web connections to the SGD web server, which is configured to listen on localhost port 443 (127.0.0.1:443).
If an SGD server is started before the SGD web server, the SGD server binds to all the available interfaces, including localhost. As a result, the SGD server forwards any web connections to itself in an infinite loop.
One solution is to always start the SGD web server before the SGD server. If you use the tarantella start command, the SGD server and web server are always started in the correct order.
Another solution is to configure SGD so that it never binds to the localhost interface. To do this, use the following command:
$ tarantella config edit \
--tarantella-config-server-bindaddresses-external "!127.0.0.1"
In some shells, you cannot use double straight quotation marks, "!127.0.0.1", because the shell might substitute the !127 (history expansion). Use single straight quotation marks instead, '!127.0.0.1'.
You can also use this command to specify exactly which interfaces you do want SGD to bind to. You do this by typing a comma-separated list of DNS names or IP addresses.
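If you generate this command from a script, quoting the value programmatically avoids the substitution problem. The sketch below uses Python's shlex.quote, which wraps any value containing shell-special characters such as ! in single quotation marks. The function name bind_command and the addresses in the usage note are hypothetical.

```python
import shlex

OPTION = "--tarantella-config-server-bindaddresses-external"


def bind_command(addresses):
    """Return a shell-safe command line that sets the external bind addresses.

    `addresses` is a list of DNS names or IP addresses. They are joined
    into the comma-separated form described in this section.
    """
    value = ",".join(addresses)
    args = ["tarantella", "config", "edit", OPTION, value]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(bind_command(["!127.0.0.1"]))
```

Passing ["!127.0.0.1"] yields a command line in which the value appears as '!127.0.0.1'. A list such as ["192.0.2.10", "sgd.example.com"] needs no quoting and is emitted as 192.0.2.10,sgd.example.com.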
See Section 1.5.2, “Firewall Traversal” for more details about running SGD in firewall traversal mode.
When a user logs in to an SGD server without logging out of another, normally the user's session is relocated to the new server. This is sometimes called session moving, or session grabbing.
If the clocks on all SGD servers in the array are not synchronized, user sessions might not relocate successfully.
SGD uses the time stamps on user sessions to determine which is newer. The newer user session is considered to be current. If clocks are not synchronized, the time stamps might give misleading information.
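The effect of clock skew on this comparison can be shown with a small model. This is an illustration only, not SGD's implementation: it assumes each server stamps a session with its own local clock and that the session with the latest stamp wins. All names are hypothetical.

```python
from datetime import datetime, timedelta


def stamped(true_time, clock_offset):
    """Time stamp a server records if its clock is off by clock_offset."""
    return true_time + clock_offset


def newer_session(sessions):
    """Return the label of the session with the latest time stamp.

    `sessions` is a list of (label, time_stamp) pairs.
    """
    return max(sessions, key=lambda session: session[1])[0]


# The user logs in to server A first, then to server B 30 seconds later.
login_a = datetime(2010, 4, 28, 9, 36, 0)
login_b = login_a + timedelta(seconds=30)

# Synchronized clocks: the session on B is correctly seen as newer.
in_sync = [
    ("A", stamped(login_a, timedelta(0))),
    ("B", stamped(login_b, timedelta(0))),
]

# Server A's clock runs 106 seconds fast: the older session on A now
# carries the later time stamp and is wrongly treated as current.
skewed = [
    ("A", stamped(login_a, timedelta(seconds=106))),
    ("B", stamped(login_b, timedelta(0))),
]

if __name__ == "__main__":
    print(newer_session(in_sync), newer_session(skewed))
```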
Because time synchronization is important, use Network Time Protocol (NTP) software to synchronize clocks. Alternatively, use the rdate command.
See also Section 7.4.2, “User Sessions and Application Sessions” for more information about user sessions in SGD.