Re: repeated hudson problems with undeploy/shutdown

From: Justin Lee <justin.d.lee_at_oracle.com>
Date: Mon, 14 Jun 2010 11:56:32 -0400

Indeed. I started noticing this bug around the time the ENABLE_REPLICATION
change below went in.

On 6/14/10 11:49 AM, Hong Zhang wrote:
> When I was looking into the NPE (which is related to the
> <system-applications> element), I noticed the create-cluster command
> somehow removes the <system-applications> element from the domain.xml.
>
> I installed a freshly built server, and the <system-applications>
> element was there as expected. Then I set the environment variable and
> started the domain, and created a cluster:
> export ENABLE_REPLICATION=true
> asadmin start-domain
> asadmin create-cluster cluster1
>
> I noticed that after the last command, the <system-applications> element
> was gone from the domain.xml, which caused the NPE in server shutdown.
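
A quick way to confirm this without opening the file by hand might be a
small check run before and after create-cluster. This is only a rough
sketch: the helper name has_system_applications is made up for
illustration, and the default path assumes the usual domain1 layout
relative to the install root, so override DOMAIN_XML if yours differs.

```shell
#!/bin/sh
# Hypothetical helper (not part of GlassFish or the build scripts):
# succeeds if the given file contains a <system-applications> element,
# either as an opening tag or as a self-closing tag.
has_system_applications() {
    grep -q '<system-applications[ />]' "$1"
}

# Assumed location of domain.xml; override DOMAIN_XML for other layouts.
DOMAIN_XML="${DOMAIN_XML:-glassfish/domains/domain1/config/domain.xml}"

# Report only if the file exists, so the script is harmless elsewhere.
if [ -f "$DOMAIN_XML" ]; then
    if has_system_applications "$DOMAIN_XML"; then
        echo "system-applications present in $DOMAIN_XML"
    else
        echo "system-applications MISSING from $DOMAIN_XML"
    fi
fi
```

Running it once right after start-domain and again after
create-cluster cluster1 should show at which step the element goes away.
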
>
> I can add a check in the ApplicationLoaderService to avoid the NPE,
> but someone should take a look at this as the <system-applications>
> element needs to be there in the domain.xml.
>
> Thanks,
>
> - Hong
>
> On 6/14/2010 11:15 AM, Hong Zhang wrote:
>> Hi, Justin
>> I have had trouble starting the server since Friday night. I don't
>> think it's related to the NPE you saw in the server.log as part of
>> the server shutdown. That part of the code has not changed for a
>> while, but I will fix the NPE.
>>
>> Thanks,
>>
>> - Hong
>>
>>
>> On 6/14/2010 11:01 AM, Justin Lee wrote:
>>> I'm seeing repeated failures with this:
>>>
>>> startDomainUnix:
>>> [echo] Starting DAS, ENABLE_REPLICATION=${ENABLE_REPLICATION}
>>> [exec] Waiting for the server to start
>>> ........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
>>>
>>> [exec] Command start-domain failed.
>>> [exec] Error starting domain: domain1. It didn't start in 600 seconds
>>>
>>>
>>> Looking in the logs, I see this exception:
>>>
>>> [#|2010-06-12T16:28:58.477-0400|INFO|glassfish3.1|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin|_ThreadID=92;_ThreadName=Thread-1;|Server
>>> shutdown initiated|#]
>>>
>>> [#|2010-06-12T16:28:58.509-0400|SEVERE|glassfish3.1|null|_ThreadID=11;_ThreadName=Thread-1;|java.lang.NullPointerException|#]
>>>
>>>
>>> [#|2010-06-12T16:28:58.508-0400|SEVERE|glassfish3.1|null|_ThreadID=11;_ThreadName=Thread-1;|
>>> at
>>> com.sun.enterprise.v3.server.ApplicationLoaderService.preDestroy(ApplicationLoaderService.java:404)|#]
>>>
>>>
>>> [#|2010-06-12T16:28:58.508-0400|SEVERE|glassfish3.1|null|_ThreadID=11;_ThreadName=Thread-1;|
>>> at
>>> com.sun.hk2.component.AbstractWombInhabitantImpl.dispose(AbstractWombInhabitantImpl.java:74)|#]
>>>
>>>
>>> [#|2010-06-12T16:28:58.508-0400|SEVERE|glassfish3.1|null|_ThreadID=11;_ThreadName=Thread-1;|
>>> at
>>> com.sun.hk2.component.SingletonInhabitant.release(SingletonInhabitant.java:66)|#]
>>>
>>>
>>> [#|2010-06-12T16:28:58.508-0400|SEVERE|glassfish3.1|null|_ThreadID=11;_ThreadName=Thread-1;|
>>> at
>>> com.sun.hk2.component.LazyInhabitant.release(LazyInhabitant.java:112)|#]
>>>
>>>
>>> [#|2010-06-12T16:28:58.508-0400|SEVERE|glassfish3.1|null|_ThreadID=11;_ThreadName=Thread-1;|
>>> at
>>> com.sun.enterprise.v3.server.AppServerStartup.stop(AppServerStartup.java:393)|#]
>>>
>>>
>>> [#|2010-06-12T16:28:58.509-0400|SEVERE|glassfish3.1|null|_ThreadID=11;_ThreadName=Thread-1;|
>>> at
>>> org.jvnet.hk2.osgiadapter.HK2Main$StartupContextService.updated(HK2Main.java:91)|#]
>>>
>>>
>>> [#|2010-06-12T16:28:58.509-0400|SEVERE|glassfish3.1|null|_ThreadID=11;_ThreadName=Thread-1;|
>>> at
>>> org.apache.felix.cm.impl.ConfigurationManager$DeleteConfiguration.run(ConfigurationManager.java:1582)|#]
>>>
>>>
>>> [#|2010-06-12T16:28:58.510-0400|SEVERE|glassfish3.1|null|_ThreadID=11;_ThreadName=Thread-1;|
>>> at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:88)|#]
>>>
>>> At least 3 different jobs are failing right after the
>>> jsp-caching-instance-level test. Anyone have any ideas about that?
>>> The failing jobs are Grizzly_Integration, webtier-dev-tests-v3, and
>>> webtier-dev-tests-v3-source. Others might be as well, but I know
>>> for sure those are.
>>>