dev@glassfish.java.net

Re: Server is up, but asadmin restart-domain times out

From: Jane Young <jane.young_at_oracle.com>
Date: Tue, 29 Jun 2010 14:06:49 -0700

Sahoo,

Ming is looking at fixing the QL test so that it fails when the
restart-domain command fails. If he commits the QL fix, I will revert
the HK2 1.0.26 integration, since the QL tests will then start failing.
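As a rough illustration of the kind of check described above, here is a minimal Java sketch. It is not Ming's QL change, because the QL harness is Ant-based and the actual fix is not shown in this thread. It runs a command and throws when the exit code is non-zero, so a failed restart-domain cannot slip through the way the "Result: 1" line did in the QL output quoted below. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical helper: run a command and fail loudly if it exits non-zero. */
public final class FailOnNonZeroExit {

    static void runOrFail(List<String> cmd) throws IOException, InterruptedException {
        // Share this JVM's stdout/stderr so the command's messages stay visible in the log.
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        int exit = p.waitFor();
        if (exit != 0) {
            // Turn a non-zero exit into a hard failure instead of a logged "Result: 1".
            throw new IllegalStateException("Command " + cmd + " failed with exit code " + exit);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java FailOnNonZeroExit <command> [args...]");
            System.exit(2);
        }
        runOrFail(List.of(args));
    }
}
```

Usage would look like `java FailOnNonZeroExit asadmin restart-domain domain1`. Inside an Ant build, the analogous setting is the `failonerror` attribute of the `exec` task; whether the QL fix uses it is not stated here.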

Thanks,
Jane


Amy Roh wrote:
> I've also seen this when running QL. Web devtests [1] fail about 50% of
> the time because the server fails to restart. However, when I check, the
> server is actually running.
>
> startDomainUnix:
> [echo] Starting DAS, ENABLE_REPLICATION=false
> [exec] Error starting domain: domain1. It didn't start in 600 seconds
> [exec] Waiting for the server to start
> ......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
>
> [exec] Command start-domain failed.
>
> [1] http://hudson.sfbay.sun.com/job/webtier-dev-tests-v3
>
> Sanjeeb Sahoo wrote:
>> This is interesting. My QL test failed to detect that the server had
>> restarted successfully. The QL output is shown below:
>>
>> restart-server-unix:
>> [echo] restarting server
>> [exec] Timed out waiting for the server to restart
>> [exec] Command restart-domain failed.
>> [exec] Result: 1
>> [exec] Waiting for the domain to stop .
>> [exec] Command stop-domain executed successfully.
>>
>>
>> While restart-domain was waiting for the server to restart, I ran jps
>> and found the following Java processes running:
>>
>> ss141213_at_Sahoo:/space/ss141213/WS/gf/v3$ jps
>> 23093 Jps
>> 20153 DerbyControl
>> 20033 Launcher
>> 22489 admin-cli.jar
>> 10312 Main
>> 22538 ASMain
>>
>> What surprised me was that the server was actually up: I could load the
>> admin console and run admin commands. For some reason, restart-domain
>> failed to detect this. I checked the pid file in domain1/config/ and it
>> contained the right value. The jstack output for admin-cli.jar is shown
>> below:
>>
>> ss141213_at_Sahoo:/space/ss141213/WS/gf/v3$ jstack 22489
>> 2010-06-30 01:19:55
>> Full thread dump Java HotSpot(TM) Server VM (14.2-b01 mixed mode):
>>
>> "Attach Listener" daemon prio=10 tid=0x085d1c00 nid=0x5a57 waiting on
>> condition [0x00000000]
>> java.lang.Thread.State: RUNNABLE
>>
>> "Low Memory Detector" daemon prio=10 tid=0x7fd15c00 nid=0x57e7
>> runnable [0x00000000]
>> java.lang.Thread.State: RUNNABLE
>>
>> "CompilerThread1" daemon prio=10 tid=0x7fd13800 nid=0x57e6 waiting on
>> condition [0x00000000]
>> java.lang.Thread.State: RUNNABLE
>>
>> "CompilerThread0" daemon prio=10 tid=0x7fd12000 nid=0x57e5 waiting on
>> condition [0x00000000]
>> java.lang.Thread.State: RUNNABLE
>>
>> "Signal Dispatcher" daemon prio=10 tid=0x7fd10800 nid=0x57e4 runnable
>> [0x00000000]
>> java.lang.Thread.State: RUNNABLE
>>
>> "Finalizer" daemon prio=10 tid=0x7fd00800 nid=0x57e3 in Object.wait()
>> [0x7fe96000]
>> java.lang.Thread.State: WAITING (on object monitor)
>> at java.lang.Object.wait(Native Method)
>> - waiting on <0x845b4780> (a java.lang.ref.ReferenceQueue$Lock)
>> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
>> - locked <0x845b4780> (a java.lang.ref.ReferenceQueue$Lock)
>> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
>> at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>>
>> "Reference Handler" daemon prio=10 tid=0x08343400 nid=0x57e2 in
>> Object.wait() [0x7fee7000]
>> java.lang.Thread.State: WAITING (on object monitor)
>> at java.lang.Object.wait(Native Method)
>> - waiting on <0x845b4808> (a java.lang.ref.Reference$Lock)
>> at java.lang.Object.wait(Object.java:485)
>> at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>> - locked <0x845b4808> (a java.lang.ref.Reference$Lock)
>>
>> "main" prio=10 tid=0x082c3000 nid=0x57de waiting on condition
>> [0xb6aea000]
>> java.lang.Thread.State: TIMED_WAITING (sleeping)
>> at java.lang.Thread.sleep(Native Method)
>> at
>> com.sun.enterprise.admin.cli.LocalServerCommand.waitForRestart(LocalServerCommand.java:307)
>>
>> at
>> com.sun.enterprise.admin.cli.RestartDomainCommand.doCommand(RestartDomainCommand.java:87)
>>
>> at
>> com.sun.enterprise.admin.cli.StopDomainCommand.executeCommand(StopDomainCommand.java:130)
>>
>> at
>> com.sun.enterprise.admin.cli.CLICommand.execute(CLICommand.java:255)
>> at
>> com.sun.enterprise.admin.cli.AsadminMain.executeCommand(AsadminMain.java:229)
>>
>> at
>> com.sun.enterprise.admin.cli.AsadminMain.main(AsadminMain.java:167)
>>
>> "VM Thread" prio=10 tid=0x0833f400 nid=0x57e1 runnable
>>
>> "GC task thread#0 (ParallelGC)" prio=10 tid=0x082ca000 nid=0x57df
>> runnable
>>
>> "GC task thread#1 (ParallelGC)" prio=10 tid=0x082cb400 nid=0x57e0
>> runnable
>>
>> "VM Periodic Task Thread" prio=10 tid=0x7fd17c00 nid=0x57e8 waiting
>> on condition
>>
>> JNI global references: 1199
>>
>> Has anyone noticed such behavior?
>>
>> Thanks,
>> Sahoo
>>
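Sahoo's manual check, in which the pid file in domain1/config/ held the right value while restart-domain kept sleeping in waitForRestart, can be scripted as an independent second opinion. The following is a minimal sketch, not GlassFish's LocalServerCommand.waitForRestart. It assumes the pid file contains a single decimal process id, and the class and method names are hypothetical. It polls until the process named in the file is alive or a timeout expires.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.OptionalLong;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: check whether the process named in a pid file is alive. */
public final class PidFileCheck {

    /** Reads a pid from a file assumed to hold one decimal number; empty if unreadable. */
    static OptionalLong readPid(Path pidFile) {
        try {
            return OptionalLong.of(Long.parseLong(Files.readString(pidFile).trim()));
        } catch (IOException | NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    /** True if the pid file names a process that currently exists and is alive. */
    static boolean isProcessAlive(Path pidFile) {
        OptionalLong pid = readPid(pidFile);
        if (pid.isEmpty()) {
            return false;
        }
        return ProcessHandle.of(pid.getAsLong()).map(ProcessHandle::isAlive).orElse(false);
    }

    /** Polls the condition until it holds or the timeout elapses; returns the final result. */
    static boolean waitFor(BooleanSupplier condition, Duration timeout, Duration interval)
            throws InterruptedException {
        Instant deadline = Instant.now().plus(timeout);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (!Instant.now().isBefore(deadline)) {
                return false;
            }
            Thread.sleep(interval.toMillis());
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java PidFileCheck <path-to-pid-file>");
            System.exit(2);
        }
        Path pidFile = Path.of(args[0]);
        // 600 seconds mirrors the start-domain timeout in Amy's log; adjust as needed.
        boolean alive = waitFor(() -> isProcessAlive(pidFile),
                Duration.ofSeconds(600), Duration.ofSeconds(1));
        System.out.println(alive ? "process named in pid file is alive" : "timed out");
    }
}
```

A live pid only shows that some process with that id exists. It does not prove the server is accepting admin requests, and pids can be reused, so this complements rather than replaces the check Sahoo did by loading the admin console.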
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe_at_glassfish.dev.java.net
>> For additional commands, e-mail: dev-help_at_glassfish.dev.java.net
>>
>