Admin meeting minutes

From: Dies Koper <diesk_at_fast.au.fujitsu.com>
Date: Wed, 10 Mar 2010 12:42:51 +1100

Hi,

Does someone keep minutes at the meetings?

At http://wiki.glassfish.java.net/Wiki.jsp?page=AdminIteam I found a
one-line minute for Kedar's last meeting, but no info about last week's.

I made some notes that I can share if you can put them online.
I didn't keep a record of action items and attendees, and the notes only
cover the parts of the conversations I could understand and/or was
interested in, but they may be better than nothing.

Here are my notes for today's meeting:

The topic was a continuation of last week's cluster support discussion.

The following will be the same between GF V2 and V3 clustering:
- The DAS manages all the data of all nodes.
- It sends out this data to the instances using some synchronization
mechanism.

What will be different in GF V3.1 is:
- There will be no Node Agent at first. The main functionality of the
node agent was to start and stop remote instances. This can be done by
the OS's facilities.

Some of the reasons for the changes are:
- The node agent was basically duplicating the OS's services functionality.
- There were scalability issues in GFv2.x where the DAS was suffering when
many nodes started to synchronise.
- A rewrite of the related code is required anyway because the way the
domain.xml information is handled has changed radically in GF V3.
- There were also issues because the code that was running in the
cluster and non-cluster cases was different.

One thing that needs to be investigated (tried out) is whether, and how
much, optimization is required for the synchronization process between
nodes and DAS. For example, Ericsson used 40 instances on different
machines.
How to do the synchronization is still being considered. Using a
database or an SVN repository is a bad idea because of performance
issues, and SVN is also not Java-only. Mature rsync implementations in
Java do not seem to be available.

Synchronization will happen at two points in time:

1.
When an instance starts. This could even be done just before the
instance starts (from the asadmin command that starts the instance, as
it has all the infrastructure to connect to the DAS anyway), instead of
from the VM process of the starting instance.

2.
When operations are done on the DAS that need to be propagated to the
instances. The DAS could have a globally unique, incrementing number
counting its state changes, so this number can be compared with the
remote instances' numbers to quickly see if they are up to date.
(Currently the plan is to use the timestamp of the DAS's domain.xml for
this).
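
To make the counter idea in 2. concrete, here is a minimal sketch in
Java. It illustrates the incrementing-number alternative, not the
domain.xml timestamp that is the current plan. All class and method
names (DasState, InstanceState, recordChange, isUpToDate, markSynced)
are made up for illustration and are not GlassFish APIs.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SyncVersionSketch {

    /** Hypothetical DAS-side counter, bumped once per state change. */
    static final class DasState {
        private final AtomicLong version = new AtomicLong();

        /** Records one state change and returns the new version number. */
        long recordChange() {
            return version.incrementAndGet();
        }

        long currentVersion() {
            return version.get();
        }
    }

    /** Hypothetical instance-side record of the last version it synced to. */
    static final class InstanceState {
        private long syncedVersion;

        /** Cheap check: a single number comparison, no file transfer. */
        boolean isUpToDate(long dasVersion) {
            return syncedVersion == dasVersion;
        }

        void markSynced(long dasVersion) {
            syncedVersion = dasVersion;
        }
    }

    public static void main(String[] args) {
        DasState das = new DasState();
        InstanceState instance = new InstanceState();

        instance.markSynced(das.currentVersion());
        das.recordChange(); // e.g. an asadmin command changed the configuration

        System.out.println(instance.isUpToDate(das.currentVersion())); // false
        instance.markSynced(das.currentVersion()); // after a (not shown) sync
        System.out.println(instance.isUpToDate(das.currentVersion())); // true
    }
}
```

Unlike a timestamp, such a counter does not depend on clocks or file
system timestamp resolution, but it would have to be persisted by the
DAS to survive a DAS restart.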

In case of an instance restart, 1. could perhaps be skipped. The window
of the restart is short, so is there any need to sync everything again?

Two things that need to be taken care of are:

- What should happen if a user tries to run an asadmin command on the
DAS while the DAS is sync'ing its state with remote nodes? Queue the
command? Does that mean the command won't return until it has
completed? What if the synchronization brings down the DAS? How will the
user know whether the command has been executed or not?

- The DAS will propagate state changes to the remote instance. What if
an instance is starting but not yet ready to accept the DAS's commands?
How does the DAS know?
-> Maybe we can make use of GMS here.
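
As one possible answer to the "queue the command?" question above, here
is a rough sketch in Java, not something that was decided in the
meeting. It assumes a single worker thread on the DAS that runs
synchronizations and commands strictly in submission order. The class
CommandQueueSketch and its methods are hypothetical, and the "sync" and
"command" bodies are placeholders that only write to a log.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class CommandQueueSketch {
    // One worker thread: tasks run one at a time, in submission order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Records what ran, purely so the ordering can be observed.
    final List<String> log = new CopyOnWriteArrayList<>();

    /** Placeholder for synchronizing DAS state to one instance. */
    Future<?> submitSync(String instanceName) {
        return worker.submit(() -> { log.add("sync " + instanceName); });
    }

    /** Placeholder for an admin command; it is queued behind any running sync. */
    Future<?> submitCommand(String command) {
        return worker.submit(() -> { log.add("command " + command); });
    }

    void shutdown() throws InterruptedException {
        worker.shutdown();
        worker.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        CommandQueueSketch das = new CommandQueueSketch();
        das.submitSync("instance1");
        Future<?> result = das.submitCommand("deploy app.war");
        result.get(); // the caller blocks here until the queued command has run
        das.shutdown();
        System.out.println(das.log);
    }
}
```

With this approach the caller decides whether to wait on the returned
Future or to return immediately. It does not answer the question of how
the user learns the outcome if the DAS goes down while the command is
still queued.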

Regards,
Dies