Bongjae Chang,
Thanks for your feedback.
My comments are inline below.
I hope that it help explains why the change was made more.
Bongjae Chang wrote:
Hi.
I have a question about this change.
GroupCommunicationProvider.sendMessage() can be used in DistributedStateCache.addToCache API
Assume that A, B and C are members in group.
When A adds data to DSC, first A's local cache is updated then A tries to send data to B and C.
B receives the data correctly but C doesn't receive the data because the connection between A and C has some problems.
At this case, DSC's data is not synchronized because C doesn't have A's data.
Before the following change, I thought if DistributedStateCache.addToCache() threw an exception, caller should do rollback about the data for data's synchronization among members.
The sending of messages is not transacted so I am uncertain how a
rollback could be done.
Given that it takes 10 seconds to identify that an instance has
failed, it is quite likely in scenarios that multiple instances
fail at same time that IOException occurs in PTP transmission between
an instance and a newly failed instance.
In other words, I understood caller had synchronization role because API threw an exception. But API doesn't throw an exception like the following change, I think it is better that SHOAL should have rollback's logic.
cf) Before codes, only MemberNotInViewException was ignored. But though MemberNotInViewException was ignored, SHOAL could synchronize DSC's data with syncCache(). In other words, when the member that was not in view joins the group later, the master will be aware of the member in view, so the master will call DSC.synCache() method. But IOException is different from MemberNotInViewException because the member is already in view, so DSC.synCache() can't be called. Of course, if the member will be failed, there is no problem. But IOException can be occurred without member failure. ex) in the firewall
There is an outstanding shoal issue to handle IOException better in
future when trying to send a message.
(See
https://shoal.dev.java.net/issues/show_bug.cgi?id=72) However,
it is best when one is implementing
udp broadcast using ptp messaging to all instances in the cluster, that
the implementation logic should closely
simulate the functionality it is trying to implement via an alternative
mechanism. That is why this change was made.
There is no transaction semantics associated with sending of the
message to all instances in the cluster.
The use of point-to-point communication with all instances was to
improve upon the reliability of group-wide (udp) based message.
But that reliability should not compromise attempting to send the
message at least once to ALL instances in the cluster.
The functionality should closely approximate the udp broadcast.
Allowing the first failed send to abort iterating over all instances in
the
cluster was incorrect. If the exception did get thrown, how can the
handler of the exception recover from the aborted broadcast to
all members of the cluster. There is no record of which instances
received the message and which ones did not.
If it truely is interesting to know when not all instances of the
cluster (based on the current local view of instances in the cluster
which is out of
date in the scenario I made this change for since the scenario killed 4
instances at one time.) receive a message, one could collect the names
of the instances that did not receive the message while sending the
message to all the other instances of the cluster.
After completing attempting to send message to all instances of the
cluster, then one could throw a GMSException that it was unable to
send messages to some instances and those instances names are the
following.
This fix was made for a scenario where 4 instances of an 8 instance
cluster where killed at same time.
Thus, there was many cases where a sendMessage failed to one of the
killed instances with an
IOException: unable to create messenger. All instances in the
iteration after the failure never were sent the
message. This was an incorrect approximation of implementing udp
broadcast with ptp to each instance in the cluster.
Thus, the following change was made.
public void sendMessage(final String targetMemberIdentityToken,
final Serializable message,
final boolean synchronous) throws
GMSException, MemberNotInViewException {
try {
if (targetMemberIdentityToken == null) { // indicates send
"message" to ALL instances in the cluster.
if (synchronous) {
/*
Use point-to-point communication with all instances
instead of the group-wide (udp) based message.
Since we don't have reliable multicast yet, this
approach will ensure reliability.
Ideally, when a message is sent to the group via
point-to-point,
the message to each member should be on a separate
thread to get concurrency.
*/
List<SystemAdvertisement> currentMemberAdvs =
clusterManager.getClusterViewManager().
getLocalView().getView();
for (SystemAdvertisement currentMemberAdv :
currentMemberAdvs) {
final ID id = currentMemberAdv.getID();
// deleted some code
try {
// I do agree with your analysis in
another email that the return value of sendMessage should be checked
and it false, that the send should be retried.
clusterManager.send(id, message);
} catch (MemberNotInViewException e) {
if (logger.isLoggable(Level.FINE)) {
logger.fine("MemberNotInViewException :
" + e.toString());
}
} catch (IOException ioe) {
// don't allow an exception sending to one
instance of the cluster to prevent ptp multicast to all other instances
of
// of the cluster. Catch this exception,
record it and continue sending to rest of instances in the cluster.
if (logger.isLoggable(Level.FINE)) {
logger.log(Level.FINE,
"IOException in reliable
synchronous ptp multicast sending to instance " +
currentMemberAdv.getName() +
". Perhaps this instance has
failed but that has not been detected yet. Peer id=" +
id.toString(),
ioe);
}
} catch (Throwable t) {
// don't allow an exception sending to one
instance of the cluster prevent ptp broadcast to all other instances of
// of the cluster. Catch this exception,
record it and continue sending to rest of instances in the cluster.
if (logger.isLoggable(Level.FINE)) {
logger.log(Level.FINE,
"Exception in reliable
synchronous ptp multicast sending to instance " +
currentMemberAdv.getName() +
", peer id=" + id.toString(),
t);
}
}
}
} else {
// send "message" to all members of the group that
are listening.
// NOTE: Failed instances DO NOT receive the
message. There is no record of which instances receive message and
which do not.
clusterManager.send(null, message);//sends to whole
group
}
To address your concern of notifying caller that send to all instances
in cluster failed,
a future revision of this method could throw a GMSException if unable
to send a message to one of the instances of the cluster could be
thrown.
I am still uncertain on how a caller would handle the proposed
exception other than to print it out. Amounting to same as logging
individually when the sends failed.
Thus, there would need to be a use case identified to justify making
the following type of change. (New code highlighted in italics)
public void sendMessage(final String targetMemberIdentityToken,
final Serializable message,
final boolean synchronous) throws
GMSException, MemberNotInViewException {
try {
if (targetMemberIdentityToken == null) { // indicates send
"message" to ALL instances in the cluster.
if (synchronous) {
/*
Use point-to-point communication with all instances
instead of the group-wide (udp) based message.
Since we don't have reliable multicast yet, this
approach will ensure reliability.
Ideally, when a message is sent to the group via
point-to-point,
the message to each member should be on a separate
thread to get concurrency.
*/
List<SystemAdvertisement> currentMemberAdvs =
clusterManager.getClusterViewManager().
getLocalView().getView();
boolean oneOrMoreSendsFailed = false;
Throwable firstException = null;
for (SystemAdvertisement currentMemberAdv :
currentMemberAdvs) {
final ID id = currentMemberAdv.getID();
// deleted some code
try {
clusterManager.send(id, message);
} catch (MemberNotInViewException e) {
if (logger.isLoggable(Level.FINE)) {
logger.fine("MemberNotInViewException :
" + e.toString());
}
} catch (IOException ioe) {
oneOrMoreSendsFailed = true;
if (firstException != null) {
firstException = ioe;
}
// don't allow an exception sending to one
instance of the cluster to prevent ptp multicast to all other instances
of
// of the cluster. Catch this exception,
record it and continue sending to rest of instances in the cluster.
if (logger.isLoggable(Level.FINE)) {
logger.log(Level.FINE,
"IOException in reliable
synchronous ptp multicast sending to instance " +
currentMemberAdv.getName() +
". Perhaps this instance has
failed but that has not been detected yet. Peer id=" +
id.toString(),
ioe);
}
} catch (Throwable t) {
oneOrMoreSendsFailed = true;
if (firstException != null) {
firstException = ioe;
}
// don't allow an exception sending to one
instance of the cluster prevent ptp broadcast to all other instances of
// of the cluster. Catch this exception,
record it and continue sending to rest of instances in the cluster.
if (logger.isLoggable(Level.FINE)) {
logger.log(Level.FINE,
"Exception in reliable
synchronous ptp multicast sending to instance " +
currentMemberAdv.getName() +
", peer id=" + id.toString(),
t);
}
}
}
if (oneOrMoreSendsFailed) {
GMSException gmsException = new
GMSException("one or more sends failed.");
gmsException.noteFailure(firstException);
throw gmsException;
}
} else {
// send "message" to all members of the group that
are listening.
// NOTE: Failed instances DO NOT receive the
message. There is no record of which instances receive message and
which do not.
clusterManager.send(null, message);//sends to whole
group
}
-Joe Fialli, Sun Microsystems
Thanks.
--
Bongjae Chang
----- Original Message -----
From: <jfialli@dev.java.net>
To: <cvs@shoal.dev.java.net>
Sent: Wednesday, August 20, 2008 6:37 AM
Subject: CVS update: /shoal/gms/src/java/com/sun/enterprise/ee/cms/impl/jxta/GroupCommunicationProviderImpl.java
User: jfialli
Date: 2008-08-19 21:37:07+0000
Modified:
shoal/gms/src/java/com/sun/enterprise/ee/cms/impl/jxta/GroupCommunicationProviderImpl.java
Log:
Fix for sendMessage when it is using reliable multicast mode. This mode is selected when sendMessage parameter
target is null (indicating to broadcast the msg parameter to instances in the cluster) and synchronous parameter is
true.
Benefit of this fix is that when one a send to one of the instances in the cluster fails, the code will now log that
failure but it will proceed to send messages to rest of instances in the cluster. Prior to this fix, failing to send to
one instance in the cluster meant that ptp multicast would abruptly stop due to the exception, and potentially
one or more instances in the cluster would not receive the synchronous ptp multicast message.
Reviewed by: Sheetal Vartak
File Changes:
Directory: /shoal/gms/src/java/com/sun/enterprise/ee/cms/impl/jxta/
===================================================================
File [changed]: GroupCommunicationProviderImpl.java
Url: https://shoal.dev.java.net/source/browse/shoal/gms/src/java/com/sun/enterprise/ee/cms/impl/jxta/GroupCommunicationProviderImpl.java?r1=1.26&r2=1.27
Delta lines: +24 -3
--------------------
--- GroupCommunicationProviderImpl.java 2008-08-13 20:36:25+0000 1.26
+++ GroupCommunicationProviderImpl.java 2008-08-19 21:37:05+0000 1.27
@@ -66,7 +66,7 @@
*
* @author Shreedhar Ganapathy
* Date: Jun 26, 2006
- * @version $Revision: 1.26 $
+ * @version $Revision: 1.27 $
*/
public class GroupCommunicationProviderImpl implements
GroupCommunicationProvider,
@@ -212,11 +212,32 @@
task.setMessage(message);
msgSendPool.submit(task);
*/
- logger.log(Level.FINE, "sending message to member: " + currentMemberAdv.getName());
+ logger.log(Level.FINER, "sending message to member: " + currentMemberAdv.getName());
try {
clusterManager.send(id, message);
} catch (MemberNotInViewException e) {
- logger.warning("MemberNotInViewException : " + e.toString());
+ if (logger.isLoggable(Level.FINE)) {
+ logger.fine("MemberNotInViewException : " + e.toString());
+ }
+ } catch (IOException ioe) {
+ // don't allow an exception sending to one instance of the cluster to prevent ptp multicast to all other instances of
+ // of the cluster. Catch this exception, record it and continue sending to rest of instances in the cluster.
+ if (logger.isLoggable(Level.FINE)) {
+ logger.log(Level.FINE,
+ "IOException in reliable synchronous ptp multicast sending to instance " + currentMemberAdv.getName() +
+ ". Perhaps this instance has failed but that has not been detected yet. Peer id=" +
+ id.toString(),
+ ioe);
+ }
+ } catch (Throwable t) {
+ // don't allow an exception sending to one instance of the cluster prevent ptp broadcast to all other instances of
+ // of the cluster. Catch this exception, record it and continue sending to rest of instances in the cluster.
+ if (logger.isLoggable(Level.FINE)) {
+ logger.log(Level.FINE,
+ "Exception in reliable synchronous ptp multicast sending to instance " + currentMemberAdv.getName() +
+ ", peer id=" + id.toString(),
+ t);
+ }
}
}
} else {
---------------------------------------------------------------------
To unsubscribe, e-mail: cvs-unsubscribe@shoal.dev.java.net
For additional commands, e-mail: cvs-help@shoal.dev.java.net