admin@glassfish.java.net

Re: Command Replication in 3.1 - details

From: Bill Shannon <bill.shannon_at_oracle.com>
Date: Thu, 29 Apr 2010 16:06:51 -0700

Vijay Ramachandran wrote on 04/14/2010 10:54 AM:
> I have put together the details of how the command replication feature will
> work in GlassFish 3.1 on this wiki page
> <http://wiki.glassfish.java.net/Wiki.jsp?page=ClusterDynamicReconfig>.
> Your comments / feedback will be deeply appreciated. We can probably use
> a few minutes of the next team meeting to give your feedback.

This looks good. I had a few comments...

In the table "Command replication results and action taken", the action
taken for the first entry, "Failure on one or more instances", includes
"set server-restart". What exactly does this mean and how do
you plan to do this? Are you assuming the server is up and you have
reliable communication with the server?

In general, the failure cases don't seem to distinguish "I sent the
command to the instance and it returned a failure response" from
"I wasn't able to send the command to the instance, e.g., because
it was down" or "I sent the command to the instance but I never got
a response". How do you plan to detect and handle these different
cases?
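
To make the three cases concrete, here is a minimal Java sketch of how a
caller might classify the result of sending one command to one instance.
Every name in it (ReplicationProbe, Outcome, classify) is hypothetical and
is not taken from the wiki page or from the GlassFish code; the mapping
from exception type to case is an assumption for illustration only.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of GlassFish: classifies one replication attempt.
final class ReplicationProbe {

    // SUCCESS plus the three failure cases described in the text above.
    enum Outcome { SUCCESS, FAILURE_RESPONSE, NOT_SENT, NO_RESPONSE }

    // "send" stands in for whatever actually delivers the command to the
    // instance; it returns true if the instance reported success.
    static Outcome classify(Callable<Boolean> send) {
        try {
            // The instance answered: either it applied the command or it
            // returned a failure response.
            return send.call() ? Outcome.SUCCESS : Outcome.FAILURE_RESPONSE;
        } catch (ConnectException e) {
            // Connection refused: the command never reached the instance,
            // e.g., because it was down.
            return Outcome.NOT_SENT;
        } catch (SocketTimeoutException e) {
            // Assumed here to be a read timeout: the command may have been
            // applied, but no response came back. A connect timeout raises
            // the same exception type, so real code would have to separate
            // the connect phase from the read phase.
            return Outcome.NO_RESPONSE;
        } catch (Exception e) {
            // Conservative default: treat any other error as "state unknown".
            return Outcome.NO_RESPONSE;
        }
    }
}
```

The point of keeping NOT_SENT and NO_RESPONSE apart is that only in the
first case is it known that the instance did not apply the command; in the
second its state is unknown.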

Also, how do you plan to handle intermittent network failures?
When you're next able to talk to the server instance, will you be
able to detect that it is out of date? Will you depend on GMS to
detect such cases? What if GMS says the instance is up but you can't
talk to it?