users@glassfish.java.net

Re: Cluster sometimes stops without any exception

From: Fialli Joe <joe.fialli_at_oracle.com>
Date: Mon, 26 Nov 2012 10:56:09 -0500

On 11/26/12 9:03 AM, forums_at_java.net wrote:
> I have the following problem. Our Glassfish 3.1 server is configured
> with a
> cluster for some heavy processing. From time to time, quite rarely, the
> cluster shuts down without any known reason. I do not find any
> exceptions in
> the log, just the server logging shut down messages. I am attaching 2
> fragments of the log file, one from the DAS and the other from one of the
> cluster instances.
I did not see any ShoalLogger instance shutdown messages in the submitted
log file entries. To track the life cycle of each clustered instance, look
over all log file entries that contain ShoalLogger.
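
As a quick way to do that, here is a minimal sketch (Python; it assumes the standard GlassFish 3.x `[#|...|#]` log record format shown later in this thread, and the field labels are my own, not an official API) that lists every ShoalLogger record with its timestamp, level, and message:

```python
# Sketch: extract ShoalLogger lifecycle entries from GlassFish server.log text.
# Assumes the standard "[#|timestamp|level|product|logger|thread-info|message|#]"
# record format used by GlassFish 3.x.
import re

RECORD = re.compile(
    r"\[#\|(?P<ts>[^|]+)\|(?P<level>[^|]+)\|[^|]*\|ShoalLogger\|[^|]*\|(?P<msg>.*?)\|#\]",
    re.S,
)

def shoal_lifecycle(log_text):
    """Return (timestamp, level, message) for every ShoalLogger record."""
    return [(m.group("ts"), m.group("level"), m.group("msg").strip())
            for m in RECORD.finditer(log_text)]

if __name__ == "__main__":
    sample = ("[#|2012-06-29T11:49:57.263+0300|INFO|glassfish3.1.1|ShoalLogger|"
              "_ThreadID=19;_ThreadName=Thread-2;|GMS1015: Received Group Shutting "
              "down message from member: server of group: POD_Processing_Cl01|#]")
    for ts, level, msg in shoal_lifecycle(sample):
        print(ts, level, msg)
```

Running it over the whole server.log of each instance shows, in order, the GMS join, master-change, and shutdown messages, which is what you need to reconstruct each member's life cycle.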

> All the nodes are on the same machine. One issue that I find in the logs
> is related to the number of members listed when receiving GMS instances.
> We only have 3 members (DAS and two cluster instances) but 10 are listed.
> Would this have anything to do with the cluster being stopped?
Below is a fragment of the GMS view taken from the submitted log.

There is not enough information provided to answer your questions, but I
can offer an interpretation of the one GMS view that was submitted.

Note that the 3 clustered instances below all share the same instance name,
"POD_Processing_Cl01:POD_Processing_Cl01_ins01", but have distinct Shoal GMS
listener network addresses and ports. Since all 3 instances use the same
multicast group address and port, they join the same Shoal GMS cluster,
POD_Processing_Cl01.

1: MemberId: POD_Processing_Cl01_ins01, MemberType: CORE,
Address:
*10.220.10.67:9171*:/228.9.103.177:16084/:POD_Processing_Cl01:POD_Processing_Cl01_ins01


2: MemberId: POD_Processing_Cl01_ins01, MemberType: CORE,
Address:
*10.220.20.110:9181*:/228.9.103.177:16084/:POD_Processing_Cl01:POD_Processing_Cl01_ins01


3: MemberId: POD_Processing_Cl01_ins01, MemberType: CORE,
Address:
*10.220.20.195:9134*:/228.9.103.177:16084/:POD_Processing_Cl01:POD_Processing_Cl01_ins01



The bold IP address and port above, such as *10.220.10.67:9171*, is the
network interface address and port of the Shoal GMS listener for the
clustered instance.
The italicized IP address and port, such as /228.9.103.177:16084/, is the
multicast group address and multicast port used by Shoal GMS on the
clustered instance.
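
In the raw log (without the bold/italic markup) those pieces are colon-separated, so a member Address can be split mechanically. A minimal sketch (the dictionary keys are my own labels, not official Shoal names):

```python
# Sketch: split a Shoal GMS member Address string, as it appears in the raw
# GMS view log, into its components:
#   listenerIP:listenerPort:multicastIP:multicastPort:groupName:memberId
def parse_gms_address(addr):
    listener_ip, listener_port, mcast_ip, mcast_port, group, instance = \
        addr.split(":", 5)
    return {
        "listener": (listener_ip, int(listener_port)),  # Shoal GMS listener endpoint
        "multicast": (mcast_ip, int(mcast_port)),       # multicast group address/port
        "group": group,                                 # cluster (group) name
        "instance": instance,                           # member id
    }

if __name__ == "__main__":
    a = parse_gms_address(
        "10.220.10.67:9171:228.9.103.177:16084:"
        "POD_Processing_Cl01:POD_Processing_Cl01_ins01")
    print(a["listener"], a["multicast"], a["group"], a["instance"])
```

Parsing every member of a view this way makes it easy to see which entries share a listener endpoint and which only share a name.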

I am uncertain whether someone else is running the exact same configuration
on your subnet (they should not be sharing the same manually set multicast
group address and port), or whether you are running virtual machines on one
physical host, which would produce the multiple IP addresses. Either way,
be aware that there are definitely 3 distinct clustered instances in your
network environment with the same clustered instance name, listening on
different ports, and joining the same cluster.

Shoal GMS will allow multiple clustered instances with the exact same name
on the same subnet AS LONG AS the multicast group address and port are
different.
A configuration error could lead to the situation above, but it is not the
only possible cause.
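
To make that concrete, here is a minimal sketch that flags member ids appearing at more than one listener endpoint within a single GMS view; the input shape (member id, listener endpoint) is my own simplification, matching what the view log shows:

```python
# Sketch: given the member entries of one GMS view, flag member ids seen at
# more than one listener endpoint -- a sign that same-named instances from
# elsewhere on the subnet (or from multiple VMs) share your multicast group.
from collections import defaultdict

def duplicate_members(view):
    """view: list of (member_id, listener_endpoint) tuples."""
    seen = defaultdict(set)
    for member_id, endpoint in view:
        seen[member_id].add(endpoint)
    # keep only ids observed at 2+ distinct endpoints
    return {m: sorted(eps) for m, eps in seen.items() if len(eps) > 1}

if __name__ == "__main__":
    view = [
        ("POD_Processing_Cl01_ins01", "10.220.10.67:9171"),
        ("POD_Processing_Cl01_ins01", "10.220.20.110:9181"),
        ("POD_Processing_Cl01_ins01", "10.220.20.195:9134"),
        ("server", "10.220.10.67:9157"),
    ]
    print(duplicate_members(view))
```

In your submitted view, every member id (including the DAS "server") would be flagged, which is why the view lists 10 members for a 3-member cluster.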

-Joe


> Any help on this would be most welcomed. Thanks.
>
> DAS:
>
> 11:46:45,322 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [file:/E:/Glassfish3/glassfish/domains/domain1/generated/jsp/managementtool-1.12.0.20/loader_1608711619/logback.xml]
> 11:46:45,416 |-ERROR in ch.qos.logback.classic.joran.action.InsertFromJNDIAction - [SMTPServer] has null or empty value
> 11:46:45,416 |-INFO in ch.qos.logback.classic.joran.action.InsertFromJNDIAction - Setting context variable [LOG_LOCATION] to [E:\\v1\\logs]
> 11:46:45,416 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.rolling.RollingFileAppender]
> 11:46:45,416 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [FILE]
> 11:46:45,478 |-INFO in c.q.l.core.rolling.TimeBasedRollingPolicy - No compression will be used
> 11:46:45,494 |-INFO in c.q.l.core.rolling.TimeBasedRollingPolicy - Will use the pattern E://v1//logs/managementtool/spm-%d{yyyy-MM-dd}.%i.log for the active file
> 11:46:45,494 |-INFO in ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP@468b2c18 - The date pattern is 'yyyy-MM-dd' from file name pattern 'E://v1//logs/managementtool/spm-%d{yyyy-MM-dd}.%i.log'.
> 11:46:45,494 |-INFO in ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP@468b2c18 - Roll-over at midnight.
> 11:46:45,494 |-INFO in ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP@468b2c18 - Setting initial period to Fri Jun 29 11:46:45 EEST 2012
> 11:46:45,494 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property
> 11:46:45,588 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[FILE] - Active log file name: E://v1//logs/managementtool/spm-2012-06-29.0.log
> 11:46:45,588 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[FILE] - File property is set to [null]
> 11:46:45,588 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
> 11:46:45,603 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [STDOUT]
> 11:46:45,603 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property
> 11:46:45,603 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to INFO
> 11:46:45,603 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDOUT] to Logger[ROOT]
> 11:46:45,603 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [com.mypackage.central] to DEBUG
> 11:46:45,603 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - Setting level of logger [com.mypackage.managementtool] to DEBUG
> 11:46:45,603 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.
> 11:46:45,603 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@80e1ba6 - Registering current configuration as safe fallback
> [#|2012-06-29T11:47:09.479+0300|INFO|glassfish3.1.1|javax.enterprise.system.tools.admin.org.glassfish.deployment.admin|_ThreadID=111;_ThreadName=Thread-2;|managementtool-1.12.0.20 was successfully deployed in 49,564 milliseconds.|#]
> [#|2012-06-29T11:49:57.263+0300|INFO|glassfish3.1.1|ShoalLogger|_ThreadID=19;_ThreadName=Thread-2;|GMS1015: Received Group Shutting down message from member: server of group: POD_Processing_Cl01|#]
> [#|2012-06-29T11:49:57.263+0300|INFO|glassfish3.1.1|ShoalLogger|_ThreadID=18;_ThreadName=Thread-2;|GMS1092: GMS View Change Received for group: POD_Processing_Cl01 : Members in view for MASTER_CHANGE_EVENT(before change analysis) are :
> 1: MemberId: POD_Processing_Cl01_ins01, MemberType: CORE, Address: 10.220.10.67:9171:228.9.103.177:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins01
> 2: MemberId: POD_Processing_Cl01_ins01, MemberType: CORE, Address: 10.220.20.110:9181:228.9.103.177:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins01
> 3: MemberId: POD_Processing_Cl01_ins01, MemberType: CORE, Address: 10.220.20.195:9134:228.9.103.177:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins01
> 4: MemberId: POD_Processing_Cl01_ins02, MemberType: CORE, Address: 10.220.10.67:9106:228.9.103.177:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins02
> 5: MemberId: POD_Processing_Cl01_ins02, MemberType: CORE, Address: 10.220.20.110:9152:228.9.103.177:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins02
> 6: MemberId: POD_Processing_Cl01_ins02, MemberType: CORE, Address: 10.220.20.195:9090:228.9.103.177:16084:POD_Processing_Cl01:POD_Processing_Cl01_ins02
> 7: MemberId: server, MemberType: SPECTATOR, Address: 10.220.10.67:9157:228.9.103.177:16084:POD_Processing_Cl01:server
> 8: MemberId: server, MemberType: SPECTATOR, Address: 10.220.20.110:9154:228.9.103.177:16084:POD_Processing_Cl01:server
> 9: MemberId: server, MemberType: SPECTATOR, Address: 10.220.20.195:9149:228.9.103.177:16084:POD_Processing_Cl01:server
> 10: MemberId: server, MemberType: SPECTATOR, Address: 10.220.20.197:9129:228.9.103.177:16084:POD_Processing_Cl01:server and
>
> Cluster:
> 2012-06-29 11:43:27,396 [com-mypackage-worker-Execution slot for task {runId=4121, environmentId=33}}] DEBUG envId{33} - sesId{} - com.mypackage.worker.taskprocessing.flow.stage.DispatcherNotificationStage[61] - Beginning stage WORKER_DISPATCHER_NOTIFICATION...
> 2012-06-29 11:43:27,396 [com-mypackage-worker-Execution slot for task {runId=4121, environmentId=33}}] INFO envId{33} - sesId{} - com.mypackage.worker.communication.jms.JMSCompletionMessageSender[45] - Sending to Dispatcher JMS completion message for task {runId=4121, environmentId=33}, status=RUN_COMPLETED}...
> 2012-06-29 11:43:27,411 [com-mypackage-worker-Execution slot for task {runId=4121, environmentId=33}}] INFO envId{33} - sesId{} - com.mypackage.worker.communication.jms.JMSCompletionMessageSender[45] - Sending to Dispatcher JMS completion message for task {runId=4121, environmentId=33}, status=RUN_COMPLETED}...DONE
> 2012-06-29 11:43:27,411 [com-mypackage-worker-Execution slot for task {runId=4121, environmentId=33}}] DEBUG envId{33} - sesId{} - com.mypackage.worker.taskprocessing.flow.stage.DispatcherNotificationStage[61] - Ending stage WORKER_DISPATCHER_NOTIFICATION...
> 2012-06-29 11:43:27,411 [com-mypackage-worker-Execution slot for task {runId=4121, environmentId=33}}] INFO envId{33} - sesId{} - com.mypackage.worker.taskprocessing.slotpool.SlotPool[49] - Releasing execution slot for completed task {runId=4121, environmentId=33}}...
> 2012-06-29 11:43:27,411 [com-mypackage-worker-Execution slot for task {runId=4121, environmentId=33}}] INFO envId{33} - sesId{} - com.mypackage.worker.taskprocessing.slotpool.SlotPool[49] - Releasing execution slot for completed task {runId=4121, environmentId=33}}...DONE
> 2012-06-29 11:43:27,411 [com-mypackage-worker-Execution slot for task {runId=4121, environmentId=33}}] DEBUG envId{33} - sesId{} - com.mypackage.worker.taskprocessing.slotpool.SlotPool[61] - Ending stage WORKER_SLOT_POOL...task {runId=4121, environmentId=33}}
> 2012-06-29 11:43:27,411 [com-mypackage-worker-Execution slot for task {runId=4121, environmentId=33}}] DEBUG envId{33} - sesId{} - com.mypackage.worker.taskprocessing.ProcessingSlot[61] - Ending method doProcess()...
> 2012-06-29 11:43:27,411 [com-mypackage-worker-Execution slot for task {runId=4121, environmentId=33}}] DEBUG envId{33} - sesId{} - com.mypackage.worker.taskprocessing.ProcessingSlot[61] - Ending stage WORKER_SLOT...
> 2012-06-29 11:44:00,443 [admin-thread-pool-24850(5)] DEBUG envId{} - sesId{} - com.mypackage.worker.service.WorkerInstancePropertiesService[61] - Beginning method destroy()...
> 2012-06-29 11:44:00,459 [admin-thread-pool-24850(5)] DEBUG envId{} - sesId{} - com.mypackage.worker.service.WorkerInstancePropertiesService[61] - Ending method destroy()...
> 2012-06-29 11:44:00,459 [admin-thread-pool-24850(5)] DEBUG envId{} - sesId{} - com.mypackage.worker.WorkerImpl[61] - Beginning method destroy()...
> 2012-06-29 11:44:00,459 [admin-thread-pool-24850(5)] DEBUG envId{} - sesId{} - com.mypackage.worker.WorkerImpl[61] - Ending method destroy()...
> 2012-06-29 11:44:00,459 [admin-thread-pool-24850(5)] DEBUG envId{} - sesId{} - com.mypackage.worker.taskrepository.TaskPreallocationQueue[61] - Beginning method shutdown()...
> 2012-06-29 11:44:00,459 [admin-thread-pool-24850(5)] DEBUG envId{} - sesId{} - com.mypackage.worker.taskrepository.TaskPreallocationQueue[61] - Ending method shutdown()...
> 2012-06-29 11:44:00,459 [admin-thread-pool-24850(5)] INFO envId{} - sesId{} - org.springframework.context.support.ClassPathXmlApplicationContext[1020] - Closing org.springframework.context.support.ClassPathXmlApplicationContext@aedb61f: startup date [Fri Jun 29 11:31:34 EEST 2012]; root of context hierarchy
> 2012-06-29 11:44:00,459 [admin-thread-pool-24850(5)] DEBUG envId{} - sesId{} - com.mypackage.appframework.persistence.PersistenceServiceImpl[74] - Destroying datasource for environment id : 33
> 2012-06-29 11:44:00,599 [admin-thread-pool-24850(5)] INFO envId{} - sesId{} - com.mypackage.appframework.persistence.PersistenceServiceImpl[76] - Destroyed data source for environment id : 33
> 2012-06-29 11:44:00,599 [admin-thread-pool-24850(5)] INFO envId{} - sesId{} - org.hibernate.impl.SessionFactoryImpl[925] - closing
> 2012-06-29 11:44:00,615 [admin-thread-pool-24850(5)] INFO envId{} - sesId{} - org.springframework.jmx.export.MBeanExporter[429] - Unregistering JMX-exposed beans on shutdown
> 2012-06-29 11:44:00,646 [Duration logger] INFO envId{} - sesId{} - com.mypackage.appframework.util.Consumer[75] - Duration logger thread is shutting down
> 2012-06-29 11:44:00,646 [Alert reporter] INFO envId{} - sesId{} - com.mypackage.appframework.util.Consumer[75] - Alert reporter thread is shutting down
> 2012-06-29 11:44:00,709 [admin-thread-pool-24850(5)] INFO envId{} - sesId{} - com.mypackage.appframework.cache.metadata.MetadataRegistrationService[78] - Successfully unregistered for environment with id 33 and name Env1
> 2012-06-29 11:44:00,724 [admin-thread-pool-24850(5)] INFO envId{} - sesId{} - org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean[441] - Closing JPA EntityManagerFactory for persistence unit 'central'
> 2012-06-29 11:44:00,740 [admin-thread-pool-24850(5)] INFO envId{} - sesId{} - org.hibernate.impl.SessionFactoryImpl[925] - closing
> 2012-06-29 11:44:00,740 [admin-thread-pool-24850(5)] DEBUG envId{} - sesId{} - com.mypackage.central.persistence.CentralDataSource[57] - Destroying central data source.
>
> --
>
> [Message sent by forum member 'sebigavril']
>
> View Post: http://forums.java.net/node/892700
>
>