dev@glassfish.java.net

Re: Timeout when stop the domain with --force=false after deployed the ejb application

From: 吕宋平 <lvsongping_at_gmail.com>
Date: Fri, 3 May 2013 23:57:53 +0800

Hi, Tom:

     I am out of the office, so I can't provide the information about the
EJB right now, but I will try to put together a new sample later so that
you can isolate this problem.

Thanks

Jeremy


2013/5/3 Tom Mueller <Tom.Mueller_at_oracle.com>

> Yes, Jeremy has pointed out the difference. What this really means is
> that when force=false, we are depending on the GlassFish.stop() method to
> cause all non-daemon threads to terminate. What is probably happening is
> that there is some code somewhere that is creating a long running thread,
> perhaps in a thread pool, which is not a daemon thread.
>
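A minimal sketch of the daemon/non-daemon distinction described above; it is not GlassFish code. Threads created by `Executors.defaultThreadFactory()` are non-daemon and are named `pool-N-thread-M`, so an idle worker keeps the JVM alive until its pool is shut down. The class name `DaemonPoolSketch` and the helper `daemonFactory` are made up for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; none of these names come from GlassFish.
class DaemonPoolSketch {

    /** Wraps the default factory so that every worker is a daemon thread. */
    static ThreadFactory daemonFactory() {
        final ThreadFactory defaults = Executors.defaultThreadFactory();
        return new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread t = defaults.newThread(r);
                t.setDaemon(true); // a daemon thread never blocks JVM exit
                return t;
            }
        };
    }

    /** Runs one probe task on the pool and reports the worker's daemon flag. */
    static boolean workerIsDaemon(ExecutorService pool) {
        Callable<Boolean> probe = new Callable<Boolean>() {
            @Override
            public Boolean call() {
                return Thread.currentThread().isDaemon();
            }
        };
        try {
            return pool.submit(probe).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("probe task failed", e);
        } finally {
            pool.shutdownNow(); // without a shutdown, a non-daemon worker lingers
        }
    }

    public static void main(String[] args) {
        System.out.println("default factory: daemon="
                + workerIsDaemon(Executors.newSingleThreadExecutor()));
        System.out.println("daemon factory:  daemon="
                + workerIsDaemon(Executors.newSingleThreadExecutor(daemonFactory())));
    }
}
```

Either marking the workers as daemon threads or shutting the pool down during stop would let the JVM exit without System.exit(); which one is appropriate depends on whether the pool's work must finish before shutdown.
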
> In the jstack.txt that Jeremy sent, the non-daemon thread is this one:
>
> "Thread-30" prio=6 tid=0x34584800 nid=0xb28 in Object.wait() [0x33ddf000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:503)
> at com.sun.corba.ee.impl.javax.rmi.CORBA.KeepAlive.run(Util.java:818)
> - locked <0x12130ba8> (a
> com.sun.corba.ee.impl.javax.rmi.CORBA.KeepAlive)
>
>
> If I just run start-domain on a domain with no applications deployed, and
> then run stop-domain --force=false, I'm seeing this non-daemon thread there:
>
> "pool-9-thread-1" prio=5 tid=0x00007faf480ba000 nid=0xa203 waiting on
> condition [0x00000001405d4000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x000000012dc504c0> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> at
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:722)
>
> In some cases, this thread does exit before stop-domain times out, but in
> some cases it does not.
>
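In a jstack dump, daemon threads carry the word `daemon` after the thread name; neither of the two threads quoted above does. As a rough in-process equivalent, the sketch below lists the live non-daemon threads of the JVM it runs in. It is only an illustration (the class name `NonDaemonThreads` is made up) and would have to run inside the server JVM to be useful there.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of GlassFish.
class NonDaemonThreads {

    /** Names of all live non-daemon threads other than the calling thread. */
    static List<String> names() {
        List<String> result = new ArrayList<String>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon() && t != Thread.currentThread()) {
                result.add(t.getName());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A thread that waits for a very long time, similar in effect to a
        // parked pool worker.
        Thread keeper = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    Thread.sleep(Long.MAX_VALUE);
                } catch (InterruptedException e) {
                    // interrupted: fall through and let the thread end
                }
            }
        }, "sketch-keeper");
        keeper.start();
        System.out.println("non-daemon threads: " + names());
        keeper.interrupt(); // without this, the JVM would not exit after main returns
    }
}
```
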
> I suspect that these are two different problems. The Hello.jar problem is
> probably related to some service that the application is using that is
> causing this CORBA thread to be created. The "pool-9-thread-1" problem is
> something else. I've created the following issue for this second problem:
> https://java.net/jira/browse/GLASSFISH-20463
>
> Jeremy, can you provide more information about what is in Hello.jar so
> that we can isolate this problem? If you like, please file a bug on this
> and include the details about Hello.jar there.
>
> Thanks.
>
> Tom
>
>
> On 5/3/13 8:51 AM, 吕宋平 wrote:
>
> Hi, Hong:
> I found a difference in the logic between --force=true and
> --force=false, as follows:
>
> StopServer.doExecute
>
> protected final void doExecute(ServiceLocator habitat,
>         ServerEnvironment env, Logger logger, boolean force) {
>     try {
>         logger.info(localStrings.getLocalString("stop.domain.init",
>                 "Server shutdown initiated"));
>         // Don't shutdown GlassFishRuntime, as that can bring the OSGi
>         // framework down which is wrong when we are embedded inside an
>         // existing runtime. So, just stop the glassfish instance that
>         // we are supposed to stop. Leave any cleanup to some other code.
>
>         // get the GlassFish object - we have to wait in case startup is
>         // still in progress
>         // This is a temporary work-around until HK2 supports waiting for
>         // the service to show up in the ServiceLocator.
>         GlassFish gfKernel = habitat.getService(GlassFish.class);
>         while (gfKernel == null) {
>             Thread.sleep(1000);
>             gfKernel = habitat.getService(GlassFish.class);
>         }
>         // gfKernel is absolutely positively for-sure not null.
>         gfKernel.stop();
>     }
>     catch (Throwable t) {
>         // ignore
>     }
>
>     if (force)
>         System.exit(0);
>     else
>         deletePidFile(env);
> }
>
>
> When the --force=true option is given, it executes System.exit(0); and
> the server or cluster stops as expected. With --force=false, only
> deletePidFile(env) runs and the JVM is left to exit on its own.
>
>
> Thanks
>
>
> Jeremy
>
>
>
>
> 2013/5/3 Hong Zhang <hong.hz.zhang_at_oracle.com>
>
>> Hi, Jeremy
>> I could see the stop-domain command also hang for me when I used the
>> option force=false. The stop-domain command executed successfully when I
>> did not specify the force option.
>>
>> Tom: what's the difference between the --force option being false versus
>> true? When would a user specify the option value as false? Should they just
>> always stick with the default "true" value?
>>
>> Thanks,
>>
>> - Hong
>>
>>
>> On 5/3/2013 2:41 AM, lvsongping wrote:
>>
>> Hi, Hong, Marina:
>>
>> Cc: Tom, dev:
>>
>>
>>
>> I have found a strange situation: stopping the DAS or an instance with
>> --force=false times out after I have deployed an EJB application.
>> Here are my steps to reproduce:
>>
>>
>>
>> 1). asadmin start-domain
>>
>>
>>
>> 2). asadmin deploy Hello.jar
>>
>> Application deployed with name Hello.
>>
>> Command deploy executed successfully.
>>
>>
>>
>> 3). asadmin stop-domain --force=false
>>
>> Waiting for the domain to stop .......................................
>>
>> Timed out (60 seconds) waiting for the domain to stop.
>>
>> Command stop-domain failed.
>>
>>
>>
>> 4). jstack jvm_pid > jstack.txt (I have attached the jstack file).
>>
>>
>>
>> 5). asadmin start-domain
>>
>> Waiting for domain1 to start .Error starting domain domain1.
>>
>> The server exited prematurely with exit code 1.
>>
>> Before it died, it produced the following output:
>>
>>
>>
>> FATAL ERROR in native method: JDWP No transports initialized,
>> jvmtiError=AGENT_ERROR_TRANSPORT_INIT(197)
>>
>> ERROR: transport error 202: bind failed: Address already in use
>>
>> ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
>>
>> JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports
>> initialized [../../../src/share/back/debugInit.c:750]
>>
>>
>>
>> Command start-domain failed.
>>
>>
>>
>> After that the domain can't be started normally. I think something must
>> be holding a lock on a file because the EJB application was deployed,
>> but I don't know the exact reason for this.
>>
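The JDWP output in step 5 reports a socket bind failure ("Address already in use") rather than a file lock, which would be consistent with the earlier server JVM still running and still holding the debug port. A minimal sketch for checking whether a TCP port can currently be bound follows. The class name `PortProbe` is made up, and the port to check is an assumption: it is whatever `address=` value the domain's debug options use, passed here as a command-line argument.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper for illustration; not part of GlassFish or asadmin.
class PortProbe {

    /** Returns true if a listening socket can be bound to the port right now. */
    static boolean isPortFree(int port) {
        try (ServerSocket probe = new ServerSocket(port)) {
            return true; // bind succeeded; the socket is closed again on exit
        } catch (IOException e) {
            return false; // typically "Address already in use"
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java PortProbe <port>");
            return;
        }
        int port = Integer.parseInt(args[0]);
        System.out.println("port " + port + (isPortFree(port) ? " is free" : " is in use"));
    }
}
```

If the port is reported as in use after a timed-out stop-domain, the old server process is probably still alive and has to be stopped before start-domain can succeed.
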
>>
>>
>> BTW: <1>. The error does not occur if we stop the domain or
>> cluster with the default value of --force.
>>
>> <2>. The error does not occur if we stop the DAS and cluster after
>> deploying only a web application.
>>
>>
>>
>>
>>
>> Thanks
>>
>>
>>
>> -Jeremy
>>
>>
>>
>>
>>
>
>