dev@glassfish.java.net

Re: Server is up, but asadmin restart-domain times out

From: Ming Zhang <ming.zhang_at_oracle.com>
Date: Tue, 29 Jun 2010 14:49:20 -0700

I have filed issue 12420 for the problem related to "asadmin
restart-domain" command.

The current "restart-server-unix" and "restart-server-windows" targets in
QL are not tests, since they don't report a pass/fail status. They were
checked in without my review. In the future, please let me know whenever
anyone checks in targets to the top-level build scripts, since they affect
the whole QL. Meanwhile, I'll try to create a test for restart-domain.
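A status-reporting check can be as simple as running the command and turning its exit status into a pass or a fail. The sketch below is not the actual QL target. It assumes `asadmin` is on the PATH and that `asadmin restart-domain` exits non-zero on failure, as the `Result: 1` line in the QL output quoted further down suggests. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical QL-style check that reports PASS/FAIL from a command's exit status. */
public class RestartDomainCheck {

    /**
     * Runs the command with its output passed through to this console and
     * returns true only if it exits with status 0. A command that cannot be
     * started, or a wait that is interrupted, counts as a failure.
     */
    static boolean runAndReport(List<String> command) {
        int status;
        try {
            Process p = new ProcessBuilder(command).inheritIO().start();
            status = p.waitFor();
        } catch (IOException e) {
            System.out.println("FAIL: could not start " + command + ": " + e.getMessage());
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
        System.out.println((status == 0 ? "PASS: " : "FAIL: ")
                + String.join(" ", command) + " (exit status " + status + ")");
        return status == 0;
    }

    public static void main(String[] args) {
        // Assumes asadmin is on the PATH and domain1 is the domain under test.
        boolean ok = runAndReport(Arrays.asList("asadmin", "restart-domain", "domain1"));
        // Exit non-zero so the calling build step records a failed test.
        System.exit(ok ? 0 : 1);
    }
}
```

Returning a boolean instead of calling `System.exit` directly keeps the check reusable from a larger test harness. Only `main` converts the result into the process exit status that a build script sees.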

Thanks,
Ming

On 6/29/2010 2:06 PM, Jane Young wrote:
> Sahoo,
>
> Ming is looking at fixing the QL test so that it fails when the
> restart-domain command fails.
> If he commits the fix for QL, I will revert the HK2 1.0.26 integration,
> since the QL tests will start failing.
>
> Thanks,
> Jane
>
>
> Amy Roh wrote:
>> I've seen this when running QL, too. The web devtests [1] fail roughly
>> 50% of the time because the server fails to restart. However, when I
>> check, the server is actually running.
>>
>> startDomainUnix:
>> [echo] Starting DAS, ENABLE_REPLICATION=false
>> [exec] Error starting domain: domain1. It didn't start in 600 seconds
>> [exec] Waiting for the server to start
>>
>> [exec] Command start-domain failed.
>>
>> [1] http://hudson.sfbay.sun.com/job/webtier-dev-tests-v3
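Because the server is reportedly running when the start/restart command gives up, one independent way to confirm that is to check whether the DAS admin port accepts connections, which is roughly what loading the admin console does. A minimal sketch, assuming the default GlassFish admin port 4848 on localhost; the class name is hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical probe: is anything accepting TCP connections on the admin port? */
public class AdminPortProbe {

    /** Returns true if host:port accepts a TCP connection within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable, unknown host, or timed out
        }
    }

    public static void main(String[] args) {
        // 4848 is GlassFish's default admin port; pass another one if domain1 differs.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 4848;
        System.out.println("Admin port " + port + " open: " + isListening("localhost", port, 2000));
    }
}
```

An open port only shows that something is listening. It does not prove that the restart completed or which process owns the port.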
>>
>> Sanjeeb Sahoo wrote:
>>> This is interesting. My QL test failed to detect that the server had
>>> restarted successfully. The QL output is given below:
>>>
>>> restart-server-unix:
>>> [echo] restarting server
>>> [exec] Timed out waiting for the server to restart
>>> [exec] Command restart-domain failed.
>>> [exec] Result: 1
>>> [exec] Waiting for the domain to stop .
>>> [exec] Command stop-domain executed successfully.
>>>
>>>
>>> While it was waiting for the server to restart, I ran a jps and
>>> found the following Java processes running:
>>>
>>> ss141213_at_Sahoo:/space/ss141213/WS/gf/v3$ jps
>>> 23093 Jps
>>> 20153 DerbyControl
>>> 20033 Launcher
>>> 22489 admin-cli.jar
>>> 10312 Main
>>> 22538 ASMain
>>>
>>> What surprised me was that the server was actually up. I could load
>>> the admin console and run admin commands. For some reason,
>>> restart-domain failed to detect this. I checked the pid file in
>>> domain1/config/ and it contained the right value. The jstack output
>>> for admin-cli.jar is shown below:
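The manual pid-file check described above can be automated along these lines. This is only a sketch: it assumes the file holds the DAS process id as plain decimal text, the default path `domains/domain1/config/pid` is an assumed location, and `ProcessHandle` requires Java 9 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical check: does the pid file name a live process? */
public class PidFileCheck {

    /** Reads a pid written as decimal text; returns empty if the file is missing or malformed. */
    static Optional<Long> readPid(Path pidFile) {
        try {
            String text = new String(Files.readAllBytes(pidFile), StandardCharsets.UTF_8).trim();
            return Optional.of(Long.parseLong(text));
        } catch (IOException | NumberFormatException e) {
            return Optional.empty();
        }
    }

    /** True if the pid file names a process that is currently alive (Java 9+ ProcessHandle). */
    static boolean serverProcessAlive(Path pidFile) {
        return readPid(pidFile)
                .flatMap(ProcessHandle::of)
                .map(ProcessHandle::isAlive)
                .orElse(false);
    }

    public static void main(String[] args) {
        // Assumed location of the pid file; pass the real path as the first argument.
        Path pidFile = Paths.get(args.length > 0 ? args[0] : "domains/domain1/config/pid");
        System.out.println("Server process alive: " + serverProcessAlive(pidFile));
    }
}
```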
>>>
>>> ss141213_at_Sahoo:/space/ss141213/WS/gf/v3$ jstack 22489
>>> 2010-06-30 01:19:55
>>> Full thread dump Java HotSpot(TM) Server VM (14.2-b01 mixed mode):
>>>
>>> "Attach Listener" daemon prio=10 tid=0x085d1c00 nid=0x5a57 waiting on condition [0x00000000]
>>> java.lang.Thread.State: RUNNABLE
>>>
>>> "Low Memory Detector" daemon prio=10 tid=0x7fd15c00 nid=0x57e7 runnable [0x00000000]
>>> java.lang.Thread.State: RUNNABLE
>>>
>>> "CompilerThread1" daemon prio=10 tid=0x7fd13800 nid=0x57e6 waiting on condition [0x00000000]
>>> java.lang.Thread.State: RUNNABLE
>>>
>>> "CompilerThread0" daemon prio=10 tid=0x7fd12000 nid=0x57e5 waiting on condition [0x00000000]
>>> java.lang.Thread.State: RUNNABLE
>>>
>>> "Signal Dispatcher" daemon prio=10 tid=0x7fd10800 nid=0x57e4 runnable [0x00000000]
>>> java.lang.Thread.State: RUNNABLE
>>>
>>> "Finalizer" daemon prio=10 tid=0x7fd00800 nid=0x57e3 in Object.wait() [0x7fe96000]
>>> java.lang.Thread.State: WAITING (on object monitor)
>>> at java.lang.Object.wait(Native Method)
>>> - waiting on <0x845b4780> (a java.lang.ref.ReferenceQueue$Lock)
>>> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
>>> - locked <0x845b4780> (a java.lang.ref.ReferenceQueue$Lock)
>>> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
>>> at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>>>
>>> "Reference Handler" daemon prio=10 tid=0x08343400 nid=0x57e2 in Object.wait() [0x7fee7000]
>>> java.lang.Thread.State: WAITING (on object monitor)
>>> at java.lang.Object.wait(Native Method)
>>> - waiting on <0x845b4808> (a java.lang.ref.Reference$Lock)
>>> at java.lang.Object.wait(Object.java:485)
>>> at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>>> - locked <0x845b4808> (a java.lang.ref.Reference$Lock)
>>>
>>> "main" prio=10 tid=0x082c3000 nid=0x57de waiting on condition [0xb6aea000]
>>> java.lang.Thread.State: TIMED_WAITING (sleeping)
>>> at java.lang.Thread.sleep(Native Method)
>>> at com.sun.enterprise.admin.cli.LocalServerCommand.waitForRestart(LocalServerCommand.java:307)
>>> at com.sun.enterprise.admin.cli.RestartDomainCommand.doCommand(RestartDomainCommand.java:87)
>>> at com.sun.enterprise.admin.cli.StopDomainCommand.executeCommand(StopDomainCommand.java:130)
>>> at com.sun.enterprise.admin.cli.CLICommand.execute(CLICommand.java:255)
>>> at com.sun.enterprise.admin.cli.AsadminMain.executeCommand(AsadminMain.java:229)
>>> at com.sun.enterprise.admin.cli.AsadminMain.main(AsadminMain.java:167)
>>>
>>> "VM Thread" prio=10 tid=0x0833f400 nid=0x57e1 runnable
>>>
>>> "GC task thread#0 (ParallelGC)" prio=10 tid=0x082ca000 nid=0x57df runnable
>>>
>>> "GC task thread#1 (ParallelGC)" prio=10 tid=0x082cb400 nid=0x57e0 runnable
>>>
>>> "VM Periodic Task Thread" prio=10 tid=0x7fd17c00 nid=0x57e8 waiting on condition
>>>
>>> JNI global references: 1199
>>>
>>> Has anyone noticed such behavior?
>>>
>>> Thanks,
>>> Sahoo
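In the jstack output above, the `main` thread is sleeping inside `LocalServerCommand.waitForRestart`. Together with the "Timed out waiting for the server to restart" message, this suggests the CLI polls until it decides the server has restarted or a deadline passes. A generic poll-with-deadline loop looks like the sketch below. It is hypothetical and is not the GlassFish implementation.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical poll-until-deadline helper, in the spirit of a CLI waiting for a restart. */
public class PollUntil {

    /**
     * Evaluates condition every intervalMillis until it returns true (result: true)
     * or timeoutMillis elapses (result: false). An interrupt also ends the wait with false.
     */
    static boolean pollUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return false; // the "Timed out waiting for the server to restart" outcome
            }
            try {
                // Sleep for one interval, but never past the deadline.
                Thread.sleep(Math.min(intervalMillis, remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // Stand-in readiness check that becomes true after about 300 ms.
        boolean ready = pollUntil(() -> System.nanoTime() - start > 300_000_000L, 5_000, 100);
        System.out.println(ready ? "server reported ready" : "timed out");
    }
}
```

A loop like this cannot distinguish "the server is down" from "the readiness check never succeeds against a running server". Both end in the timeout branch, which is consistent with the symptom reported here.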
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe_at_glassfish.dev.java.net
>>> For additional commands, e-mail: dev-help_at_glassfish.dev.java.net
>>>
>>
>