users@glassfish.java.net

Re: EJB timer service can not automatically migration

From: Shreedhar Ganapathy <Shreedhar.Ganapathy_at_Sun.COM>
Date: Mon, 23 Feb 2009 18:06:01 -0800

It's worth a double check.
I have added the following FAQ entry and linked it off of the GF FAQ
section for Clustering and High Availability.
http://wiki.glassfish.java.net/Wiki.jsp?page=FaqBasicMulticastTestBetweenMachines
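
For a quick check without building Shoal, the same idea (one machine sends
a handful of multicast datagrams, the other listens for them) can be
sketched with the plain JDK. This is only an illustration, not the Shoal
MulticastSender/MulticastSniffer code: the class name McastCheck, the group
address 229.9.1.1, the port 2299 and the message format are all assumptions
picked for the example.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class McastCheck {
    // Assumed values for illustration only; not the ones Shoal uses.
    static final String GROUP = "229.9.1.1";
    static final int PORT = 2299;
    static final int COUNT = 9;

    // Build the text payload for message number seq.
    static String buildMessage(int seq) {
        return "mcast-test " + seq;
    }

    // Return the sequence number of a test message, or -1 if it is not one.
    static int parseSeq(String msg) {
        String[] parts = msg.trim().split(" ");
        if (parts.length != 2 || !parts[0].equals("mcast-test")) {
            return -1;
        }
        try {
            return Integer.parseInt(parts[1]);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Send COUNT datagrams to the group, one per second.
    static void send() throws Exception {
        InetAddress group = InetAddress.getByName(GROUP);
        try (MulticastSocket socket = new MulticastSocket()) {
            socket.setTimeToLive(4); // allow a few hops inside the site
            for (int i = 1; i <= COUNT; i++) {
                byte[] data = buildMessage(i).getBytes(StandardCharsets.UTF_8);
                socket.send(new DatagramPacket(data, data.length, group, PORT));
                Thread.sleep(1000);
            }
        }
    }

    // Join the group and count test messages until COUNT arrive or a
    // single receive waits longer than timeoutMillis.
    static int sniff(int timeoutMillis) throws Exception {
        InetSocketAddress group =
                new InetSocketAddress(InetAddress.getByName(GROUP), PORT);
        int received = 0;
        try (MulticastSocket socket = new MulticastSocket(PORT)) {
            socket.joinGroup(group, null); // null = default network interface
            socket.setSoTimeout(timeoutMillis);
            byte[] buf = new byte[256];
            while (received < COUNT) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                try {
                    socket.receive(packet);
                } catch (SocketTimeoutException e) {
                    break; // nothing arrived in time
                }
                String msg = new String(packet.getData(), 0,
                        packet.getLength(), StandardCharsets.UTF_8);
                if (parseSeq(msg) > 0) {
                    received++;
                    System.out.println("got: " + msg);
                }
            }
        }
        return received;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 1 && args[0].equals("send")) {
            send();
        } else if (args.length == 1 && args[0].equals("sniff")) {
            System.out.println("received " + sniff(30000) + " of " + COUNT);
        } else {
            System.out.println("usage: java McastCheck send|sniff");
        }
    }
}
```

Start "java McastCheck sniff" on one machine first, then run
"java McastCheck send" on the other. If the sniffer reports fewer messages
than were sent, multicast traffic is probably not passing between the two
machines. On a multi-homed host the default interface may not be the one
you want, so treat a failure as a hint to investigate, not as proof.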


shockwave_115_at_hotmail.com wrote:
> Hi Shreedhar
>
> We also suspected a multicast problem, so we tested the network
> with JGroups, which also uses multicast, and the result was OK.
> We even used a router to connect the two machines, and automatic
> migration still failed.
>
> But I will try your test method to double check the multicast in
> our network.
>
> Thanks
> //Jason
>
> --------------------------------------------------
> From: "Shreedhar Ganapathy" <Shreedhar.Ganapathy_at_Sun.COM>
> Sent: Tuesday, February 24, 2009 9:10 AM
> To: <users_at_glassfish.dev.java.net>
> Subject: Re: EJB timer service can not automatically migration
>
>> Hi Jason
>> Although the GUI status update shows the second node's state change, it
>> does not directly follow that there is no network problem. The GMS
>> view change log you shared is a symptom of the instance 2 machine
>> being on a different network, or on the same network but connected
>> to a different switch that does not pass multicast messages to other
>> switches within that network.
>>
>> Here's a basic multicast test you can do without involving the
>> appserver or Shoal GMS code:
>> Check out the Shoal sources on the two machines and do a build (takes
>> a few seconds) following the instructions at
>> https://shoal.dev.java.net/servlets/ProjectDocumentView?documentID=43252&noNav=true
>>
>>
>> Once you have it built, open one terminal on each machine. On one of
>> the terminals, run the ant target
>>     ant test-mcastsender
>> This runs the MulticastSender, which sends messages to the group.
>>
>> On the other machine, in the terminal, run
>>     ant test-mcastsniffer
>> This runs the MulticastSniffer.
>>
>> You should see 9 messages on the sniffer to confirm multicast works
>> properly.
>>
>> If you don't see these messages then it means multicast traffic is
>> not enabled and you need to talk with your network admin to enable
>> multicast traffic within your subnet.
>>
>>
>> hth
>> Shreedhar
>>
>>
>>
>> shockwave_115_at_hotmail.com wrote:
>>> Hi
>>> The two nodes are in the same subnet. When I kill the instance on
>>> the second node, I can see the state change in the admin GUI, which
>>> I think means the DAS knows about the state change on the second
>>> node. But instance1 on node one does not get the failure event.
>>> Because the DAS and instance1 are on the same node, I don't think
>>> it's a network problem.
>>>
>>> BRs
>>> //Jason
>>>
>>> --------------------------------------------------
>>> From: "vivekanandh sedhumadhavan" <Vivekanandh.Sedhumadhavan_at_Sun.COM>
>>> Sent: Tuesday, February 24, 2009 6:23 AM
>>> To: <users_at_glassfish.dev.java.net>
>>> Subject: Re: EJB timer service can not automatically migration
>>>
>>>> Hi Jason,
>>>> On Feb 22, 2009, at 9:49 PM, shockwave_115_at_hotmail.com wrote:
>>>>
>>>>> Hi
>>>>> It's very strange: when I restart the DAS, I can only find the
>>>>> MemberIds server and instance1 in the log of instance1.
>>>>>
>>>>> [#|2009-02-23T13:42:10.301+0800|INFO|sun-appserver9.1|ShoalLogger|
>>>>> _ThreadID=13;_ThreadName=ViewWindowThread;|GMS View Change
>>>>> Received: Members in view (before change analysis) are :
>>>>> 1: MemberId: server, MemberType: SPECTATOR, Address:
>>>>> urn:jxta:uuid-3E8A9E516D3C4E83910A81CAE3458DE02DE658F932AB436995B78E0CB3E080DA03
>>>>>
>>>>> 2: MemberId: instance1, MemberType: CORE, Address:
>>>>> urn:jxta:uuid-3E8A9E516D3C4E83910A81CAE3458DE07987FC1134E54090AB24B0C9E01AD7DF03
>>>>> |#]
>>>>>
>>>>> The membership information of instance2 is missing.
>>>>> Instance1 and server (DAS) are on the same node, and instance2 is
>>>>> on the other node. Is that the root cause of the automatic
>>>>> migration failure?
>>>> Could be. Can you please check that both nodes are in the same subnet?
>>>>
>>>> thanks
>>>> -vivek
>>>>>
>>>>>
>>>>> What may be the root cause of this problem?
>>>>>
>>>>>
>>>>> BRs
>>>>> //Jason
>>>>>
>>>>> --------------------------------------------------
>>>>> From: <shockwave_115_at_hotmail.com>
>>>>> Sent: Sunday, February 22, 2009 1:05 PM
>>>>> To: <users_at_glassfish.dev.java.net>; <Marina.Vatkina_at_Sun.COM>
>>>>> Subject: Re: EJB timer service can not automatically migration
>>>>>
>>>>>> How do I enable these two features? I didn't see any documentation
>>>>>> for this part.
>>>>>>
>>>>>> When I prototyped on one machine with two instances, I just used
>>>>>> the default configuration, and the timer service migrated from
>>>>>> the failed instance to the other one automatically.
>>>>>>
>>>>>> BRs
>>>>>> //Jason
>>>>>>
>>>>>> --------------------------------------------------
>>>>>> From: "Marina Vatkina" <Marina.Vatkina_at_Sun.COM>
>>>>>> Sent: Saturday, February 21, 2009 9:10 AM
>>>>>> To: <users_at_glassfish.dev.java.net>
>>>>>> Subject: Re: EJB timer service can not automatically migration
>>>>>>
>>>>>>> Jason,
>>>>>>>
>>>>>>> Did you enable timer migration? If yes, you might also need to
>>>>>>> enable delegated transaction recovery in order to automatically
>>>>>>> migrate timers.
>>>>>>>
>>>>>>> thanks,
>>>>>>> -marina
>>>>>>>
>>>>>>> shockwave_115_at_hotmail.com wrote:
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> When I looked into the source code, I found that timer migration
>>>>>>>> is invoked by
>>>>>>>> AdminEventMulticaster.multicastEvent(AdminEvent event). Does
>>>>>>>> that mean the node failure event is monitored by the DAS
>>>>>>>> instead of by the other instances? If so, how can I achieve HA
>>>>>>>> if the DAS crashes?
>>>>>>>>
>>>>>>>> The automatic migration problem is still there, and I don't
>>>>>>>> know how to investigate further. When I stop the instance on
>>>>>>>> node one, nothing shows up in the logs of instance two on node
>>>>>>>> two.
>>>>>>>>
>>>>>>>> As I understand it, instance two should at least exchange
>>>>>>>> heartbeats with instance one, so that when instance one
>>>>>>>> crashes, instance two knows about it. If so, why use
>>>>>>>> multicast?
>>>>>>>>
>>>>>>>> BRs
>>>>>>>> Jason
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>
>>>>>>>>
>>>>>>>> To unsubscribe, e-mail: users-unsubscribe_at_glassfish.dev.java.net
>>>>>>>> For additional commands, e-mail: users-help_at_glassfish.dev.java.net
>>>>>>>>