users@shoal.java.net

Exception thrown when join group without network connection

From: <tim.shiu_at_ssc-ltd.com>
Date: Sat, 3 Dec 2011 05:16:54 +0000 (GMT)

Hi All,

I'm using Shoal for fail-over and would like to ask whether anybody has
any idea about the issue I'm facing.

Assume there are 2 machines, A and B, in the same group, which monitor
each other using failure notifications. When a machine receives a
failure signal, it checks whether it is the only member remaining in
the group. If so, it performs a re-join, i.e. it leaves the group and
joins it again, repeating until the group has more than 1 member. (My
system requires this mechanism to ensure the data in the DSC is
up-to-date.)
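
To make the re-join logic above concrete, here is a minimal sketch of
the loop. GroupHandle is a hypothetical wrapper I made up for
illustration (it is not a Shoal interface), and the attempt limit and
pause are assumptions added so the loop cannot spin forever:

```java
/** Hypothetical wrapper around the real group handle; not part of Shoal. */
interface GroupHandle {
    void join() throws Exception;
    void leave();
    int memberCount();
}

final class RejoinLoop {
    private RejoinLoop() {}

    /**
     * Leaves and re-joins the group until more than one member is visible
     * or maxAttempts re-joins have been tried.
     * Returns true if a peer was found, false otherwise.
     */
    static boolean rejoinUntilPeerFound(GroupHandle group, int maxAttempts, long pauseMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (group.memberCount() > 1) {
                return true; // a peer is visible, nothing more to do
            }
            group.leave();
            try {
                group.join();
            } catch (Exception e) {
                // Join failed (e.g. no network); fall through and retry.
            }
            try {
                Thread.sleep(pauseMillis); // pause between attempts
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return group.memberCount() > 1;
    }
}
```

In my real code, join() and leave() delegate to the Shoal group
management calls; the sketch only shows the control flow.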

Everything is OK when the failure is caused by something other than a
network problem (e.g. killing the process, unplugging the power supply,
etc.).

Unfortunately, when I unplug the LAN cable of machine A, both machines
A and B are notified that the other machine has failed. Machine B can
leave and re-join the group gracefully without any problem. On machine
A, however, an exception is thrown while joining the group.

The following is a fragment of the stack trace of that exception:
Caused by: com.sun.enterprise.ee.cms.core.GMSException: initialization failure
        at com.sun.enterprise.mgmt.ClusterManager.<init>(ClusterManager.java:142)
        at com.sun.enterprise.ee.cms.impl.base.GroupCommunicationProviderImpl.initializeGroupCommunicationProvider(GroupCommunicationProviderImpl.java:164)
        at com.sun.enterprise.ee.cms.impl.base.GMSContextImpl.join(GMSContextImpl.java:175)
        ... 7 more
Caused by: java.net.SocketException: No such device
        at java.net.PlainDatagramSocketImpl.join(Native Method)
        at java.net.PlainDatagramSocketImpl.join(PlainDatagramSocketImpl.java:181)
        at java.net.MulticastSocket.joinGroup(MulticastSocket.java:277)
        at com.sun.enterprise.mgmt.transport.BlockingIOMulticastSender.start(BlockingIOMulticastSender.java:201)
        at com.sun.enterprise.mgmt.transport.grizzly.grizzly2.GrizzlyNetworkManager2.start(GrizzlyNetworkManager2.java:276)
        at com.sun.enterprise.mgmt.ClusterManager.<init>(ClusterManager.java:140)
        ... 9 more

After around a hundred retries, the exception changed to a different
one, and no more sockets could be opened on the server until I
restarted the web server to release the resources.
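
My guess (not verified against the Shoal or Grizzly source) is that
each failed join leaves sockets or selectors open, which would explain
the eventual "Too many open files". At the plain JDK level, the pattern
I would expect is to close the socket whenever joinGroup fails. A
minimal sketch, where SafeMulticastJoin and openAndJoin are
hypothetical names of mine and not part of any library:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.MulticastSocket;

final class SafeMulticastJoin {
    private SafeMulticastJoin() {}

    /**
     * Opens a multicast socket and joins the given group. If joinGroup
     * fails, the socket is closed before the exception propagates, so a
     * failed attempt does not leave a file descriptor behind.
     */
    static MulticastSocket openAndJoin(InetAddress group, int port) throws IOException {
        MulticastSocket socket = new MulticastSocket(port);
        boolean joined = false;
        try {
            socket.joinGroup(group); // the same JDK call that appears in my stack trace
            joined = true;
            return socket;
        } finally {
            if (!joined) {
                socket.close(); // release the descriptor on failure
            }
        }
    }
}
```

This only covers the multicast socket; the selectors opened by the TCP
transport would need equivalent cleanup, and I do not know whether that
can be triggered from outside the library.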

The exception stack trace fragment:
Caused by: com.sun.enterprise.ee.cms.core.GMSException: initialization failure
        at com.sun.enterprise.mgmt.ClusterManager.<init>(ClusterManager.java:142)
        at com.sun.enterprise.ee.cms.impl.base.GroupCommunicationProviderImpl.initializeGroupCommunicationProvider(GroupCommunicationProviderImpl.java:164)
        at com.sun.enterprise.ee.cms.impl.base.GMSContextImpl.join(GMSContextImpl.java:175)
        ... 7 more
Caused by: java.io.IOException: Too many open files
        at sun.nio.ch.IOUtil.initPipe(Native Method)
        at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49)
        at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
        at java.nio.channels.Selector.open(Selector.java:209)
        at org.glassfish.grizzly.nio.SelectorFactory$DefaultSelectorFactory.create(SelectorFactory.java:74)
        at org.glassfish.grizzly.nio.SelectorRunner.create(SelectorRunner.java:101)
        at org.glassfish.grizzly.nio.NIOTransport.startSelectorRunners(NIOTransport.java:105)
        at org.glassfish.grizzly.nio.transport.TCPNIOTransport.start(TCPNIOTransport.java:276)
        at com.sun.enterprise.mgmt.transport.grizzly.grizzly2.GrizzlyNetworkManager2.start(GrizzlyNetworkManager2.java:195)
        at com.sun.enterprise.mgmt.ClusterManager.<init>(ClusterManager.java:140)
        ... 9 more

So, I would like to ask:
1) Is it possible to prevent the exception from being thrown while the
network is not available?
2) Can the resources be released after the "No such device" exception
is thrown by Grizzly?
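
For question 1, one workaround I am considering is to check for a
usable network interface before calling join at all. A rough sketch
using only java.net.NetworkInterface follows. MulticastPrecheck is a
hypothetical helper of mine, and I have not verified that isUp() turns
false on every Linux driver when the cable is unplugged, so please
treat that as an assumption:

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Enumeration;

final class MulticastPrecheck {
    private MulticastPrecheck() {}

    /** True if at least one non-loopback, multicast-capable interface is up. */
    static boolean hasUsableMulticastInterface() {
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return false; // no interfaces reported at all
            }
            while (nics.hasMoreElements()) {
                NetworkInterface nic = nics.nextElement();
                if (isUsable(nic.isUp(), nic.isLoopback(), nic.supportsMulticast())) {
                    return true;
                }
            }
        } catch (SocketException e) {
            // Could not query the interfaces; treat the network as unavailable.
        }
        return false;
    }

    /** The decision rule, kept separate so it can be checked without a network. */
    static boolean isUsable(boolean up, boolean loopback, boolean multicast) {
        return up && !loopback && multicast;
    }
}
```

The idea would be to skip the leave/join cycle while
hasUsableMulticastInterface() returns false, so that machine A does not
keep retrying a join that cannot succeed.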

Thanks a lot.

P.S. I'm using Linux servers with JDK 1.6 Update 29 and Shoal 1.6.13;
the network manager uses grizzly-framework 2.1.7.

Regards,
Tim.Shiu