dev@shoal.java.net

Re: [Shoal-Dev] Shutdown time is delayed for 30sec at "ABSTRACTING_TRANSPORT" branch

From: Bongjae Chang <bongjae.chang_at_gmail.com>
Date: Fri, 15 Jan 2010 13:04:23 +0900

Hi Joe,

I see... you are right.

Joe wrote:
> Declare multicastSocket as volatile and check if multicastSocket == null
> before calling multicastSocket.receive() in the run method above.

Then how about removing "multicastSocket = null" from the stop method, rather than having run() guard against a nulled multicastSocket?

I attached the proposed patch and would like to know your opinion.
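
For reference, here is a minimal sketch of the idea (illustrative only, not the attached patch itself; the fields and the stop() signature are assumed to match the diff quoted below): close the socket before endGate.await(), as in revision 1.1.2.5, but keep the multicastSocket reference non-null so run() cannot hit the NPE you observed.

---
public synchronized void stop() throws IOException {
    if( !running )
        return;
    running = false;
    if( multicastSocket != null ) {
        try {
            multicastSocket.leaveGroup( multicastAddress );
        } catch( IOException e ) {
        }
        // Close before awaiting: the blocked receive() fails immediately
        // with a SocketException, so the run loop exits and counts the
        // endGate down right away instead of after the 30s timeout.
        multicastSocket.close();
        // Intentionally NOT setting multicastSocket = null here.
    }
    boolean finished = false;
    try {
        finished = endGate.await( shutdownTimeout, TimeUnit.MILLISECONDS );
    } catch( InterruptedException e ) {
    }
    if( !finished && multicastThread != null ) {
        multicastThread.interrupt();
    }
}
---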

Thanks.

Regards,
Bongjae Chang


----- Original Message -----
From: "Joseph Fialli" <Joseph.Fialli_at_Sun.COM>
To: <dev_at_shoal.dev.java.net>
Sent: Friday, January 15, 2010 1:55 AM
Subject: Re: [Shoal-Dev] Shutdown time is delayed for 30sec at "ABSTRACTING_TRANSPORT" branch


> Bongjae,
>
> See comments inline below.
>
> -Joe
>
> Bongjae Chang wrote:
>> Hi,
>> I have also run into another problem.
>> When I shut down Shoal built from the "ABSTRACTING_TRANSPORT" branch and
>> using the Grizzly transport, the shutdown is always delayed by 30 seconds.
>> In the thread dump, BlockingIOMulticastSender was waiting for 30 seconds;
>> please see the following. (The code line numbers may differ from the
>> current branch.)
>> ---
>> "ServerControl [cts-76]" daemon prio=10 tid=0x09888400 nid=0x170d
>> waiting on condition [0x88f1f000]
>> java.lang.Thread.State: TIMED_WAITING (parking)
>> at sun.misc.Unsafe.park(Native Method)
>> - parking to wait for <0x94e2b6e0> (a
>> java.util.concurrent.CountDownLatch$Sync)
>> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
>> at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:947)
>> at
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1239)
>> * at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:253)
>> at
>> com.sun.enterprise.mgmt.transport.BlockingIOMulticastSender.stop(BlockingIOMulticastSender.java:180)
>> * - locked <0x94e2b148> (a
>> com.sun.enterprise.mgmt.transport.BlockingIOMulticastSender)
>> at
>> com.sun.enterprise.mgmt.transport.jeus.JeusNetworkManager.stop(JeusNetworkManager.java:292)
>> - locked <0x94e09de0> (a
>> com.sun.enterprise.mgmt.transport.jeus.JeusNetworkManager)
>> at com.sun.enterprise.mgmt.ClusterManager.stop(ClusterManager.java:343)
>> - locked <0x94e036c0> (a com.sun.enterprise.mgmt.ClusterManager)
>> at
>> com.sun.enterprise.ee.cms.impl.base.GroupCommunicationProviderImpl.leave(GroupCommunicationProviderImpl.java:209)
>> at
>> com.sun.enterprise.ee.cms.impl.base.GMSContextImpl.leave(GMSContextImpl.java:159)
>> at
>> com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.leave(GroupManagementServiceImpl.java:389)
>> at
>> com.sun.enterprise.ee.cms.impl.common.GroupManagementServiceImpl.shutdown(GroupManagementServiceImpl.java:286)
>> ...
>> ---
>> The problem may be a regression from the following change.
>> ---
>> File [changed]: BlockingIOMulticastSender.java
>> Url: https://shoal.dev.java.net/source/browse/shoal/gms/src/java/com/sun/enterprise/mgmt/transport/BlockingIOMulticastSender.java?r1=1.1.2.5&r2=1.1.2.6
>> Delta lines: +6 -3
>> -------------------
>> --- BlockingIOMulticastSender.java 2009-07-21 05:50:58+0000 1.1.2.5
>> +++ BlockingIOMulticastSender.java 2009-12-10 17:04:59+0000 1.1.2.6
>> ...
>> @@ -175,17 +176,19 @@
>>              multicastSocket.leaveGroup( multicastAddress );
>>          } catch( IOException e ) {
>>          }
>> -        multicastSocket.close();
>> -        multicastSocket = null;
>>      }
>>      boolean finished = false;
>>      try {
>>          finished = endGate.await( shutdownTimeout, TimeUnit.MILLISECONDS );
>>      } catch( InterruptedException e ) {
>>      }
>> -    if( !finished && multicastThread != null )
>> +    if( !finished && multicastThread != null ) {
>>          multicastThread.interrupt();
>> +    }
>> +    multicastSocket.close();
>> +    multicastSocket = null;
>>      ...
>> }
>>
>> public void run() {
>>     try {
>> ---
>> If the MulticastSocket is closed only after endGate (a CountDownLatch)
>> returns from await(), the latch can almost never be counted down in time:
>> the receiver thread stays blocked in its I/O read, so await() always runs
>> into its timeout.
>> See the following code:
>> In BlockingIOMulticastSender.java
>> ---
>> ...
>> private long shutdownTimeout = *30 * 1000*; // ms
>> ...
>> public void run() {
>> try {
>> while( running ) {
>> ...
>> try {
>> * multicastSocket.receive( packet );*
>> if( !running )
>> return;
>> ...
>> }
>> } catch(...) {
>> ...
>> } finally {
>> multicastThread = null;
>> * endGate.countDown();*
>> }
>> }
>> ---
>> So I think that the MulticastSocket should be closed before
>> endGate.await().
>> What do you think?
> The motivation behind closing the multicastSocket AFTER interrupting the
> multicast thread was that I occasionally observed an uncaught NPE in
> multicastThread when it accessed the nulled multicastSocket (namely at the
> line "multicastSocket.receive(packet)").
>
> I agree with your assessment that my change has the undesirable effect of
> waiting 30 seconds.
> I had not fully understood the endGate wait; I do now. It waits up to 30
> seconds so that the message currently being handled can finish processing,
> and only interrupts the thread if that timeout expires.
>
> So I agree that the 12/10/2009 change to move the close should be reverted.
>
> However, the concurrency issue on multicastSocket still needs to be
> addressed: the thread calling stop() and the multicast receiver thread
> access the variable at the same time.
>
> The following is close to addressing the concurrency issue, but I am not
> completely satisfied with it (which is why I did not do it this way
> originally).
>
> Declare multicastSocket as volatile and check whether multicastSocket == null
> before calling multicastSocket.receive() in the run method above.
> The javadoc for close() states that another thread blocked in a receive on
> the closed socket will get a SocketException, so that part sounds okay.
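>
> In sketch form (illustrative only and untested; the buffer size is a
> placeholder and the other fields are as in BlockingIOMulticastSender):
>
> ---
> private volatile MulticastSocket multicastSocket;
>
> public void run() {
>     byte[] buffer = new byte[8192]; // placeholder size
>     try {
>         while( running ) {
>             MulticastSocket socket = multicastSocket; // read the volatile once
>             if( socket == null )
>                 return; // stop() has already nulled it
>             DatagramPacket packet = new DatagramPacket( buffer, buffer.length );
>             socket.receive( packet ); // close() in stop() unblocks this
>             if( !running )
>                 return;
>             // ... process the received packet ...
>         }
>     } catch( SocketException se ) {
>         // expected when stop() closes the socket during receive()
>     } catch( IOException ie ) {
>         // unexpected I/O failure; would be logged in real code
>     } finally {
>         multicastThread = null;
>         endGate.countDown();
>     }
> }
> ---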
>
> Perhaps you have a better idea than above. Thanks for catching this.
>
> -Joe
>> Thanks.
>>
>> Regards,
>> Bongjae Chang