dev@glassfish.java.net

Re: deployment devtests hang

From: Bill Shannon <bill.shannon_at_oracle.com>
Date: Fri, 19 Nov 2010 19:42:45 -0800

With some debugging tips from Tim, I was able to track down the root
cause of this problem. Some of you might find this amusing...

I was having trouble getting the tests to run with my changes, so I had
switched back to using an unchanged build of GlassFish. I have a cron
job that does an "svn update" and build every night, so I just used that
build.

Well, it turns out my svn update was failing for some reason, perhaps
because the certificate had expired:

+ svn update
Error validating server certificate for 'https://svn.dev.java.net:443':
  - The certificate is not issued by a trusted authority. Use the
    fingerprint to validate the certificate manually!
Certificate information:
  - Hostname: *.dev.java.net
  - Valid: from Sep 13 09:52:17 2010 GMT until Nov 15 14:38:09 2011 GMT
  - Issuer: Equifax Secure Certificate Authority, Equifax, US
  - Fingerprint: a3:69:1e:c2:66:f7:5b:3f:b5:63:da:5e:9e:09:cf:70:dd:51:b8:7b
(R)eject, accept (t)emporarily or accept (p)ermanently? svn: PROPFIND request fa
iled on '/svn/glassfish-svn/trunk/v3'
svn: PROPFIND of '/svn/glassfish-svn/trunk/v3': Server certificate verification
failed: issuer is not trusted (https://svn.dev.java.net)

Ok, so my build was a few days old. No big deal, right?

The build I was using included this change to NetUtils.getCanonicalHostName
from Byron:

r41933 | bnevins | 2010-10-21 11:32:18 -0700 (Thu, 21 Oct 2010) | 4 lines

Made the obnoxiously long wait for getCanonicalHostname 3 seconds. If your DNS
is munged up - it would take ~~30 seconds.
If it takes more than 3 seconds - then it is not attempted in the future...


That change caused getCanonicalHostName to use an ExecutorService, which
created threads that weren't daemon threads, and thus prevented app clients
and standalone clients from exiting when they were done.

Argh!

Fortunately, Byron came to his senses and removed this code in this commit
a day later:

r41991 | bnevins | 2010-10-22 12:38:14 -0700 (Fri, 22 Oct 2010) | 5 lines

Totally weird. A thread was left running with yesterday's changes.
Changed the technique. If you want to skip reverse-dns lookup then set this env
. variable to true
AS_NO_REVERSE_DNS


Lucky me, my svn update started failing right in that window between those
two changes! (Why? I don't know. The cert problem shouldn't have started
then.)

Argh! Argh! Argh!



Bill Shannon wrote on 11/18/2010 02:22 PM:
> I'm running the deployment devtests on my OpenSolaris x86 machine
> using an unmodified 3.1 build from last night. I've tried several
> build, they all hang.
>
> The tests hang running the statelesshello test. The ant output stops here:
>
> runclient:
> [javac]
> /home/shannon/appserv-tests/devtests/deployment/ejb/statelesshello/build.xml:140: warning:
> 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to
> false for repeatable builds
> [javac] Compiling 1 source file to
> /home/shannon/appserv-tests/devtests/deployment/build/ejb/statelesshello
> [javaWithResult] Test expects successful result
> [javaWithResult] Nov 18, 2010 11:26:34 AM com.sun.logging.LogDomains$1 log
> [javaWithResult] INFO: Cannot find javadb client jar file, derby jdbc driver
> will not be available by default.
> [javaWithResult] Nov 18, 2010 11:26:34 AM com.sun.logging.LogDomains$1 log
> [javaWithResult] INFO: enterprise_used_delegate_name
> [javaWithResult] Looked up home!!
> [javaWithResult] Narrowed home!!
> [javaWithResult] Got the EJB!!
> [javaWithResult] Statelesshello EJB - checked environment properties for user scott
> [javaWithResult] and for number which is = 22.25
> [javaWithResult]
> [javaWithResult] Client's sayStatelesshello() method succeeded
> [javaWithResult]
> [javaWithResult] Client now getting a User Defined Exception
> [javaWithResult] Success! Caught StatelesshelloException
> [javaWithResult] Client's getUserDefinedException() method succeeded
> [javaWithResult]
> [javaWithResult] Client is now trying to remove the session bean
> [javaWithResult]
> [javaWithResult] Session bean was successfully removed by the client.
> [javaWithResult]
>
> The server.log file ends with this:
>
> [#|2010-11-18T11:26:38.758-0800|INFO|glassfish3.1|null|_ThreadID=16;_ThreadName=Thread-1;|Now
> going forward with reading the environment properties
> |#]
>
> [#|2010-11-18T11:26:38.766-0800|INFO|glassfish3.1|null|_ThreadID=16;_ThreadName=Thread-1;|Statelesshello
> EJB - checked environment properties for user scott
> and for number which is = 22.25
> |#]
>
> [#|2010-11-18T11:26:38.778-0800|INFO|glassfish3.1|null|_ThreadID=16;_ThreadName=Thread-1;|Throwing
> a User Defined Exception|#]
>
>
> The client seems to be hung. jstack shows this:
>
> 2010-11-18 14:09:48
> Full thread dump Java HotSpot(TM) Client VM (17.0-b16 mixed mode):
>
> "Attach Listener" daemon prio=3 tid=0x08299000 nid=0x11 waiting on condition
> [0x00000000]
> java.lang.Thread.State: RUNNABLE
>
> "DestroyJavaVM" prio=3 tid=0x0807d000 nid=0x2 waiting on condition [0x00000000]
> java.lang.Thread.State: RUNNABLE
>
> "SelectorThread" daemon prio=3 tid=0x08682000 nid=0xc runnable [0xe6c23000]
> java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.DevPollArrayWrapper.poll0(Native Method)
> at sun.nio.ch.DevPollArrayWrapper.poll(DevPollArrayWrapper.java:170)
> at sun.nio.ch.DevPollSelectorImpl.doSelect(DevPollSelectorImpl.java:68)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> - locked <0xeed68338> (a sun.nio.ch.Util$1)
> - locked <0xeed68348> (a java.util.Collections$UnmodifiableSet)
> - locked <0xeed682f0> (a sun.nio.ch.DevPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> at com.sun.corba.ee.impl.transport.SelectorImpl.run(SelectorImpl.java:307)
>
> "transaction-manager" daemon prio=3 tid=0x09b56800 nid=0xb in Object.wait()
> [0xe6ef9000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0xee1845f8> (a java.util.TaskQueue)
> at java.lang.Object.wait(Object.java:485)
> at java.util.TimerThread.mainLoop(Timer.java:483)
> - locked <0xee1845f8> (a java.util.TaskQueue)
> at java.util.TimerThread.run(Timer.java:462)
>
> "pool-1-thread-1" prio=3 tid=0x09b30000 nid=0xa waiting on condition [0xe6f4e000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0xee184670> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
> at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
> at java.lang.Thread.run(Thread.java:619)
>
> "Low Memory Detector" daemon prio=3 tid=0x08115400 nid=0x8 runnable [0x00000000]
> java.lang.Thread.State: RUNNABLE
>
> "CompilerThread0" daemon prio=3 tid=0x08112000 nid=0x7 waiting on condition
> [0x00000000]
> java.lang.Thread.State: RUNNABLE
>
> "Signal Dispatcher" daemon prio=3 tid=0x08110c00 nid=0x6 runnable [0x00000000]
> java.lang.Thread.State: RUNNABLE
>
> "Finalizer" daemon prio=3 tid=0x080fbc00 nid=0x5 in Object.wait() [0xfc262000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0xed7802b8> (a java.lang.ref.ReferenceQueue$Lock)
> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
> - locked <0xed7802b8> (a java.lang.ref.ReferenceQueue$Lock)
> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
> at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
>
> "Reference Handler" daemon prio=3 tid=0x080fa400 nid=0x4 in Object.wait()
> [0xfe46e000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0xed780340> (a java.lang.ref.Reference$Lock)
> at java.lang.Object.wait(Object.java:485)
> at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
> - locked <0xed780340> (a java.lang.ref.Reference$Lock)
>
> "VM Thread" prio=3 tid=0x080f7c00 nid=0x3 runnable
>
> "VM Periodic Task Thread" prio=3 tid=0x08117c00 nid=0x9 waiting on condition
>
> JNI global references: 1535
>
>
> None of the threads in the client are the "main" thread, nor is it
> obvious that any of the threads are waiting for something from the
> server.
>
> The server is still running and responds to other requests.
>
> These same tests work fine on my MacOS machine.
>
> Any idea what's going on here?