users@shoal.java.net

Re: [Shoal-Users] Re: send message after join

From: Joseph Fialli <Joseph.Fialli_at_Sun.COM>
Date: Mon, 22 Jun 2009 12:46:10 -0400

Jerry Raj wrote:
> Joseph Fialli wrote:
>
>> Bongjae Chang wrote:
>>
>>> Hi,
>>>
>>> Joe wrote:
>>>
>>>
>>>> It is not that join is not complete for the sender, the sender is
>>>> definitely part of the group when join returns.
>>>>
>>>>
>> The answer below assumes that Jerry is using the sendMessage that
>> broadcasts a message to all instances of the
>> gms group.
>> Bongjae's comments reminded me that sending a message to all instances in
>> the cluster is no longer a broadcast in Shoal 1.5 as
>> it was in Shoal 1.0. So to answer Jerry's original question, that is
>> what has changed between Shoal 1.0 and Shoal 1.5 that he
>> is noticing.
>>
>> GroupHandle.sendMessage(String targetComponent, byte[]) goes down a
>> different code path in Shoal 1.5, one that
>> is affected by the behavior Bongjae describes below.
>> Instead of relying solely on a UDP broadcast,
>> the message is sent to each
>> instance one at a time (over TCP), based on the members
>> present in the ClusterViewManager's view. It takes much longer for an
>> instance to get into the cluster view than
>> it does for a GMS client to "join" a multicast group (which is the
>> question I originally answered for
>> Jerry). I had overlooked the new broadcast mechanism added in Shoal
>> 1.5 to increase the reliability of
>> broadcasting.
>>
>> The following is a workaround to get the Shoal 1.0 behavior for sending
>> messages
>> (which is just a multicast broadcast).
>>
>> Use GroupHandle.sendMessage(String destinationName, String
>> targetComponent, byte[] msg).
>> Calling gh.sendMessage((String)null,
>> "SubstituteYourTargetComponentName", msg)
>> will result in a UDP broadcast rather than going down the code path
>> that was introduced in
>> Shoal 1.5 for broadcasting to the cluster.
>>
>> I have verified that this workaround works by altering the
>> MultiThreadedMessageSender test.
>> When calling the first sendMessage() mentioned in this email, the first 928
>> messages do not get sent
>> between the two MultiThreadSendMessage tests. After switching to the
>> workaround, all messages get
>> sent, even with the sleeps in MultiThreadSendMessage commented out.
>>
>> Please confirm that this addresses your issue. We will write up a shoal
>> message sending test to
>> capture this behavior and work on documenting this better.
>>
>
>
> Many thanks for everyone's help. This works perfectly. I simply changed the call
> from
> gh.sendMessage(topic, msg);
> to
> gh.sendMessage((String)null, topic, msg);
>
> Now the message is reliably received by existing peers.
>
> Is this the recommended way, though? Because, if I understand correctly, the
> reason to go from UDP multicast to individual TCP sends is to increase
> reliability? Are we losing some reliability by going back to UDP?
>
> I guess what we really need is either a callback or a method in gms to figure
> out whether the clusterview formation is complete?
>
Jerry,

Glad to at least point out the difference from Shoal 1.0 to 1.5 for your
send message call.

I have worked in the past on the Java Message Service (JMS), and it is always
challenging to coordinate
loosely coupled processes so that one can broadcast a message and be sure all
processes get the first message.

You are correct that the switch from UDP to individual TCP sends was to
increase the reliability of message
delivery. The workaround I provided was not a planned feature; it is just
something I noticed that lets
users get the Shoal 1.0 behavior.

I will put together a test case that shows how we propose users deal
with the issue you are reporting.
However, how does your system know when the clusterview formation is
complete? I am pretty certain you
mentioned in your last message that you don't know how many members are
in the GMS group.
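
For applications that do know a minimum member count, a bounded poll of the
membership view is one alternative to a fixed sleep. The following is only a
sketch: ViewWaiter and awaitMembers are hypothetical names, not part of Shoal,
and it assumes that the supplied view (for example the getAllCurrentMembers()
call quoted further down in this thread) reflects the same view that
sendMessage() iterates over, which has not been verified here.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper: polls a membership view instead of sleeping a fixed time. */
final class ViewWaiter {
    private ViewWaiter() {}

    /**
     * Returns true as soon as the view holds at least minMembers entries,
     * or false if the timeout expires (or the thread is interrupted) first.
     */
    static boolean awaitMembers(Supplier<List<String>> view, int minMembers,
                                long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (view.get().size() >= minMembers) {
                return true;
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return false;
            }
            try {
                // Never sleep past the deadline, and always sleep at least 1 ms.
                Thread.sleep(Math.max(1L, Math.min(pollMillis, remainingMillis)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

A caller might use it as
ViewWaiter.awaitMembers(() -> gh.getAllCurrentMembers(), 2, 10000, 100)
before the first send, and fall back to the UDP workaround or a retry if it
returns false. It does not help when the expected member count is unknown.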

All the solutions I have used for synchronizing broadcast messages in JMS
relied on the sender knowing
how many other processes it was waiting for. The initial message
sent out was not a content message, but
a STARTUP message. When all the other processes had replied to the
STARTUP message, the application knew
that all of the loosely coupled processes had their JMS subscriptions
initialized and were ready to proceed.
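
The STARTUP handshake described above can be sketched roughly as follows.
StartupHandshake and its methods are hypothetical names for illustration, not
a JMS or Shoal API; how the STARTUP message and its replies are actually
transported is left to the application.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: tracks replies to a STARTUP message from a known number of peers. */
final class StartupHandshake {
    private final Set<String> acked = ConcurrentHashMap.newKeySet();
    private final CountDownLatch latch;

    /** expectedPeers is the number of other processes the sender waits for. */
    StartupHandshake(int expectedPeers) {
        this.latch = new CountDownLatch(expectedPeers);
    }

    /** Call from the message callback when a peer replies to STARTUP; duplicates are ignored. */
    void onStartupReply(String peerName) {
        if (acked.add(peerName)) {
            latch.countDown();
        }
    }

    /** Blocks until every expected peer has replied, or the timeout expires. */
    boolean awaitAll(long timeoutMillis) {
        try {
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    /** Number of distinct peers that have replied so far. */
    int repliesSoFar() {
        return acked.size();
    }
}
```

The sender would broadcast STARTUP, call onStartupReply(peer) for each reply
it receives, and send content messages only once awaitAll(...) returns true.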


-Joe

> Thanks
> -Jerry
>
>
>
>> -Joe Fialli
>>
>>
>>> I have a slightly different opinion.
>>>
>>> I think that discovery should also be finished before join is
>>> considered complete. The "join" logic includes a lot of initialization,
>>> including MasterNode's and ClusterViewManager's init.
>>>
>>> When gms.join() is called, MasterNode's init is executed in a
>>> separate thread, and MasterNode's init starts ClusterViewManager.
>>>
>>> ClusterViewManager could be related to GroupHandle#sendMessage().
>>>
>>> What do you think, Joe?
>>>
>>> And Jerry,
>>>
>>>
>>>
>>>>> Do you see a view change with the two members in it ?
>>>>>
>>>>>
>>>> Yes, I do.
>>>>
>>>>
>>> Though you could see a view change with the two members, if you send a
>>> message at that time, I think that the message could be lost. I am not
>>> sure.
>>>
>>> But I think that if you set the Shoal logger to Level.FINE, the
>>> cause of the problem could be found.
>>>
>>> And first, you may want to check whether join is complete or not.
>>>
>>> Sometimes I have used the following trick for this.
>>> GroupManagementService#reportJoinedAndReadyState() returns
>>> after the discovery time.
>>>
>>> So I am curious to know the following test's result.
>>>
>>> <snip>
>>> gms.join();
>>>
>>> gms.reportJoinedAndReadyState();
>>>
>>> // Commented: Thread.sleep(10000);
>>> GroupHandle gh = gms.getGroupHandle();
>>> gh.sendMessage(blah);
>>> </snip>
>>>
>>> Could you please test it again?
>>>
>>> Thanks.
>>>
>>> --
>>> Bongjae Chang
>>>
>>>
>>> ----- Original Message ----- From: "Jerry Raj" <jerryr_at_sun.com>
>>> To: <users_at_shoal.dev.java.net>
>>> Sent: Friday, June 19, 2009 7:46 PM
>>> Subject: Re: [Shoal-Users] Re: send message after join
>>>
>>>
>>>
>>>
>>>> Shreedhar Ganapathy wrote:
>>>>
>>>>
>>>>>>> Do you see a view change with the two members in it ?
>>>>>>>
>>>>>>>
>>>>>> Yes, I do.
>>>>>>
>>>>>>
>>>>>>> If your send() is around the same time as the time the view change
>>>>>>> happens then you will have message loss.
>>>>>>>
>>>>>>>
>>>>>> Ah. This must be it. Since the join and send happen immediately one
>>>>>> after the
>>>>>> other, it's quite likely that the send happens while the existing
>>>>>> peers are
>>>>>> processing the view change.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Also, with Shoal 1.1 we have made a lot of synchronization
>>>>>>> improvements
>>>>>>> for correctness, which may have cost somewhat at the time of
>>>>>>> startup of
>>>>>>> members.
>>>>>>>
>>>>>>> Could you try out the code snippet that Joe provided as a way to
>>>>>>> ensure
>>>>>>> that message sending happens only when requisite memberships in the
>>>>>>> group are in place ?
>>>>>>>
>>>>>>>
>>>>>> I have no pre-existing knowledge of the number of expected peers.
>>>>>> So waiting for
>>>>>> "all" of them to join has no real meaning for me. The idea here is
>>>>>> that peers
>>>>>> can join and leave as they please. The message they send as soon as
>>>>>> they join is
>>>>>> used by existing peers (if any) to gain certain data about the new
>>>>>> peer (its IP
>>>>>> address, the port it's listening on, etc.). So the code from below will
>>>>>> not really work
>>>>>> for me, since I have no idea if any more peers will join or not.
>>>>>>
>>>>>> I can live with having a sleep between join and send for now,
>>>>>> except it seems
>>>>>> rather non-deterministic: are we always sure that 5 secs is enough?
>>>>>> Under load
>>>>>> will this go up? A clear callback or signal that says "It is now
>>>>>> safe to send
>>>>>> messages" will be much better.
>>>>>>
>>>>>>
>>>>> Timing is always a challenge when there are distributed systems
>>>>> involved,
>>>>> and the larger the number of members, the more involved it gets to ensure
>>>>> virtual synchrony, especially on asynchronous systems. We could
>>>>> eventually look at an ack based system for membership lifecycle events'
>>>>> view change messages, but there are costs with that as well when
>>>>> memberships are large.
>>>>>
>>>>> One suggestion that may make it a bit better is that you could use the
>>>>> joined and ready construct for letting the group know that you are
>>>>> ready
>>>>> to receive and send messages; i.e., when a member has connected to the
>>>>> group, you get a join notification signal. After this, when the member is
>>>>> ready to send messages, it can call the reportJoinedAndReadyState() API
>>>>> from the GroupManagementService reference
>>>>> http://fisheye5.cenqua.com/browse/~raw,r=1.16/shoal/gms/src/java/com/sun/enterprise/ee/cms/core/GroupManagementService.java
>>>>>
>>>>>
>>>>> Members who have registered for receiving the joined and ready
>>>>> notification signal and have joined the group would get that
>>>>> notification and then can send out messages to that member.
>>>>>
>>>>> Hope this helps you get a bit closer to deterministic knowledge of when
>>>>> a member (or set of members) is ready to receive messages.
>>>>>
>>>>>
>>>> This is the opposite of my problem. In my case, I have two nodes, and
>>>> the
>>>> following is the order of execution in chronological order:
>>>>
>>>> Node1:
>>>> t0: join()
>>>> t1: send() --> goes nowhere, as expected
>>>>
>>>> Node2:
>>>> t2: join()
>>>> t3: send() --> goes nowhere, not expected
>>>>
>>>> In this case Node1 can register for a joined and ready notification,
>>>> but it will
>>>> not help, since node1 is not going to send anything.
>>>>
>>>> -Jerry
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>> -Jerry
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Thanks
>>>>>>> Shreedhar
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> The same identical code worked fine with Shoal 1.0.
>>>>>>>>
>>>>>>>> I hope this clarifies the use-case.
>>>>>>>> -Jerry
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> However, there is not enough information in your original post to
>>>>>>>>> confirm or deny this.
>>>>>>>>>
>>>>>>>>> You could delay sending a message to the group until a
>>>>>>>>> certain
>>>>>>>>> number of members have joined, or
>>>>>>>>> you could wait for all members to have joined via the
>>>>>>>>> JoinNotification event.
>>>>>>>>>
>>>>>>>>> Pull API:
>>>>>>>>>
>>>>>>>>> List<String> members = gms.getGroupHandle().getAllCurrentMembers();
>>>>>>>>> <wait to send first message until all expected number of members
>>>>>>>>> have
>>>>>>>>> joined>
>>>>>>>>>
>>>>>>>>> Event driven API:
>>>>>>>>>
>>>>>>>>> gms.addActionFactory( new JoinNotificationActionFactoryImpl( new
>>>>>>>>> JoinNotificationCallBack( serverName ) ) );
>>>>>>>>> gms.join();
>>>>>>>>> <wait till all expected instances have joined before sending
>>>>>>>>> message;
>>>>>>>>> use info calculated from JoinNotificationCallback>
>>>>>>>>>
>>>>>>>>> private class JoinNotificationCallBack implements CallBack {
>>>>>>>>>
>>>>>>>>> private String serverName;
>>>>>>>>>
>>>>>>>>> public JoinNotificationCallBack( String serverName ) {
>>>>>>>>> this.serverName = serverName;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> // called for every instance joining the gms group.
>>>>>>>>> public void processNotification( Signal notification ) {
>>>>>>>>> <record instance has joined>;
>>>>>>>>> }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> A non-coding way to check this out is to start all your receiving
>>>>>>>>> clients first.
>>>>>>>>> Wait 10 seconds (like your initial test).
>>>>>>>>> Then start your sending gms client.
>>>>>>>>> There is no need for a sleep between the join and send since all the
>>>>>>>>> other
>>>>>>>>> members
>>>>>>>>> will have already joined. Hope this helps.
>>>>>>>>>
>>>>>>>>> -Joe
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> -Jerry
>>>>>>>>>>
>>>>>>>>>> Jerry Raj wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>> I have code like this:
>>>>>>>>>>> <snip>
>>>>>>>>>>> gms.join();
>>>>>>>>>>> // Commented: Thread.sleep(10000);
>>>>>>>>>>> GroupHandle gh = gms.getGroupHandle();
>>>>>>>>>>>
>>>>>>>>>>> gh.sendMessage(blah);
>>>>>>>>>>>
>>>>>>>>>>> </snip>
>>>>>>>>>>>
>>>>>>>>>>> This used to work fine in Shoal 1.0. The node would join the
>>>>>>>>>>> group
>>>>>>>>>>> and the
>>>>>>>>>>> message would be received by other members in the group. But this
>>>>>>>>>>> does
>>>>>>>>>>> not happen
>>>>>>>>>>> with Shoal 1.1 unless I uncomment the sleep(10000) between
>>>>>>>>>>> join() and
>>>>>>>>>>> send(). I
>>>>>>>>>>> expect this is because the join operation has not completed
>>>>>>>>>>> successfully when
>>>>>>>>>>> send() is called. Is there a way to be notified when join is
>>>>>>>>>>> complete? I tried
>>>>>>>>>>> looking at JoinedAndReadyNotificationActionImpl but that does not
>>>>>>>>>>> seem to work?
>>>>>>>>>>>
>>>>>>>>>>> I'm using Shoal 1.1 from the download link on the front page
>>>>>>>>>>> of the
>>>>>>>>>>> Shoal website.
>>>>>>>>>>>
>>>>>>>>>>> -Jerry
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>> To unsubscribe, e-mail: users-unsubscribe_at_shoal.dev.java.net
>>>>>>>>>> <mailto:users-unsubscribe_at_shoal.dev.java.net>
>>>>>>>>>> For additional commands, e-mail: users-help_at_shoal.dev.java.net
>>>>>>>>>> <mailto:users-help_at_shoal.dev.java.net>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>