dev@glassfish.java.net

Another weird admin (?) problem

From: Ken <ken.cavanaugh_at_oracle.com>
Date: Mon, 24 Jan 2011 15:29:59 -0800

This is the same setup as in issue 15665.
I have a 5-instance cluster (instances in0-in4, IIOP listener ports
9037-13037 at intervals of 1000).
I shut down the cluster, then restarted in1, in2, and in4.
Then I create testInstance0 (IIOP listener should be 20037) and
testInstance1 (IIOP listener should be 21037).

I'm using create-instance to create a new instance in a running cluster:

        Command: create-instance --node apolloNA --systemproperties instance_name=testInstance1 --cluster c1 --portbase 21000 --checkports=true testInstance1

which results in:

        Using DAS host minas and port 4848 from existing das.properties for node
        apolloNA. To use a different DAS, create a new node using create-node-ssh or
        create-node-config. Create the instance with the new node and correct
        host and port:
        asadmin --host das_host --port das_port create-local-instance --node node_name instance_name.
        Command _create-instance-filesystem executed successfully.
        Port Assignments for server instance testInstance1:
        JMX_SYSTEM_CONNECTOR_PORT=21086
        JMS_PROVIDER_PORT=21076
        HTTP_LISTENER_PORT=21080
        ASADMIN_LISTENER_PORT=21048
        JAVA_DEBUGGER_PORT=21009
        IIOP_SSL_LISTENER_PORT=21038
        IIOP_LISTENER_PORT=21037
        OSGI_SHELL_TELNET_PORT=21066
        HTTP_SSL_LISTENER_PORT=21081
        IIOP_SSL_MUTUALAUTH_PORT=21039
        The instance, testInstance1, was created on host apollo
        WARNING: Instance in0 seems to be offline; command _register-instance-at-instance was not replicated to that instance
        WARNING: Instance in3 seems to be offline; command _register-instance-at-instance was not replicated to that instance
        Command create-instance executed successfully.
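
The port assignments above are consistent with a fixed per-listener offset added to the --portbase value. A minimal sketch of that arithmetic follows; the offsets are read off this output only (port minus 21000), not taken from the GlassFish source, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reproduces the port arithmetic seen in the
// create-instance output. Offsets are inferred from that output only.
final class PortBaseSketch {
    static final Map<String, Integer> OFFSETS = new LinkedHashMap<>();
    static {
        OFFSETS.put("JAVA_DEBUGGER_PORT", 9);
        OFFSETS.put("IIOP_LISTENER_PORT", 37);
        OFFSETS.put("IIOP_SSL_LISTENER_PORT", 38);
        OFFSETS.put("IIOP_SSL_MUTUALAUTH_PORT", 39);
        OFFSETS.put("ASADMIN_LISTENER_PORT", 48);
        OFFSETS.put("OSGI_SHELL_TELNET_PORT", 66);
        OFFSETS.put("JMS_PROVIDER_PORT", 76);
        OFFSETS.put("HTTP_LISTENER_PORT", 80);
        OFFSETS.put("HTTP_SSL_LISTENER_PORT", 81);
        OFFSETS.put("JMX_SYSTEM_CONNECTOR_PORT", 86);
    }

    // Returns property name -> port for the given port base.
    static Map<String, Integer> assign(int portBase) {
        Map<String, Integer> ports = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : OFFSETS.entrySet()) {
            ports.put(e.getKey(), portBase + e.getValue());
        }
        return ports;
    }

    public static void main(String[] args) {
        // testInstance1 was created with --portbase 21000
        System.out.println(assign(21000).get("IIOP_LISTENER_PORT")); // 21037
        // testInstance0 is expected at port base 20000
        System.out.println(assign(20000).get("IIOP_LISTENER_PORT")); // 20037
    }
}
```

So an IIOP listener port of 23700 for either test instance cannot have come from the port base; it can only be the config-level default.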

The domain.xml contents after the create-instance command completes look
fine:

   <servers>
     <server name="server" config-ref="server-config">
       <resource-ref ref="jdbc/__TimerPool"></resource-ref>
       <resource-ref ref="jdbc/__default"></resource-ref>
     </server>

     (instances similar to testInstance0 omitted here)

     <server name="testInstance0" node-ref="apolloNA" config-ref="c1-config">
       <system-property name="instance_name" value="testInstance0"></system-property>
       <system-property name="ASADMIN_LISTENER_PORT" value="20048"></system-property>
       <system-property name="HTTP_LISTENER_PORT" value="20080"></system-property>
       <system-property name="HTTP_SSL_LISTENER_PORT" value="20081"></system-property>
       <system-property name="IIOP_LISTENER_PORT" value="20037"></system-property>
       <system-property name="IIOP_SSL_MUTUALAUTH_PORT" value="20039"></system-property>
       <system-property name="IIOP_SSL_LISTENER_PORT" value="20038"></system-property>
       <system-property name="JMS_PROVIDER_PORT" value="20076"></system-property>
       <system-property name="JMX_SYSTEM_CONNECTOR_PORT" value="20086"></system-property>
       <system-property name="OSGI_SHELL_TELNET_PORT" value="20066"></system-property>
       <system-property name="JAVA_DEBUGGER_PORT" value="20009"></system-property>
       <application-ref ref="TestEJB" virtual-servers="server"></application-ref>
     </server>

which is very similar to all of the other server entries for the
existing in0-in4 instances.
The other configs contain related elements (I only care about
orb-listener-1 for FOLB):

server-config:

       <iiop-service>
         <orb use-thread-pool-ids="thread-pool-1"></orb>
         <iiop-listener port="3700" id="orb-listener-1" address="0.0.0.0" lazy-init="true"></iiop-listener>
         <iiop-listener port="3820" id="SSL" address="0.0.0.0" security-enabled="true">
           <ssl classname="com.sun.enterprise.security.ssl.GlassfishSSLImpl" cert-nickname="s1as"></ssl>
         </iiop-listener>
         <iiop-listener port="3920" id="SSL_MUTUALAUTH" address="0.0.0.0" security-enabled="true">
           <ssl classname="com.sun.enterprise.security.ssl.GlassfishSSLImpl" cert-nickname="s1as" client-auth-enabled="true"></ssl>
         </iiop-listener>
       </iiop-service>

default-config:

       <iiop-service>
         <orb use-thread-pool-ids="thread-pool-1"></orb>
         <iiop-listener port="${IIOP_LISTENER_PORT}" id="orb-listener-1" address="0.0.0.0"></iiop-listener>
         <iiop-listener port="${IIOP_SSL_LISTENER_PORT}" id="SSL" address="0.0.0.0" security-enabled="true">
           <ssl classname="com.sun.enterprise.security.ssl.GlassfishSSLImpl" cert-nickname="s1as"></ssl>
         </iiop-listener>
         <iiop-listener port="${IIOP_SSL_MUTUALAUTH_PORT}" id="SSL_MUTUALAUTH" address="0.0.0.0" security-enabled="true">
           <ssl classname="com.sun.enterprise.security.ssl.GlassfishSSLImpl" cert-nickname="s1as" client-auth-enabled="true"></ssl>
         </iiop-listener>
       </iiop-service>

       <system-property name="IIOP_LISTENER_PORT" value="23700"></system-property>

c1-config:

       <iiop-service>
         <orb use-thread-pool-ids="thread-pool-1"></orb>
         <iiop-listener id="orb-listener-1" port="${IIOP_LISTENER_PORT}" address="0.0.0.0"></iiop-listener>
         <iiop-listener id="SSL" port="${IIOP_SSL_LISTENER_PORT}" address="0.0.0.0" security-enabled="true">
           <ssl classname="com.sun.enterprise.security.ssl.GlassfishSSLImpl" cert-nickname="s1as"></ssl>
         </iiop-listener>
         <iiop-listener id="SSL_MUTUALAUTH" port="${IIOP_SSL_MUTUALAUTH_PORT}" address="0.0.0.0" security-enabled="true">
           <ssl classname="com.sun.enterprise.security.ssl.GlassfishSSLImpl" cert-nickname="s1as" client-auth-enabled="true"></ssl>
         </iiop-listener>
       </iiop-service>


       <system-property name="IIOP_LISTENER_PORT" value="23700"></system-property>
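
For reference, the port attribute in c1-config is a ${...} token that has to be expanded against some set of system properties. A rough sketch of such an expansion is below. It is only an illustration of the idea, not the GlassFish implementation; the class name TokenSketch and the method expand are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of ${NAME} expansion; not GlassFish code.
final class TokenSketch {
    private static final Pattern TOKEN = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${NAME} with props.get(NAME); unknown tokens are left as-is.
    static String expand(String raw, Map<String, String> props) {
        Matcher m = TOKEN.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("IIOP_LISTENER_PORT", "20037"); // value from the testInstance0 server entry
        System.out.println(expand("${IIOP_LISTENER_PORT}", props)); // 20037
    }
}
```

Which value ends up in the map is the interesting part: the server entry for testInstance0 says 20037, while the config-level default says 23700.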

My problem is in PropertyResolver.getPropertyValue.
It checks the server config (I think that's the one that should have the
correct value), and seems to find the IIOP_LISTENER_PORT property in the
props with a null value.
(By the way, we really need the config beans to have a useful generated
toString() method. I can't easily log or debug this code, because I can't
tell WHICH bean I have at hand until I start pulling things out of it.)

The cluster props are empty.

The config props return the 23700 value. This happens for both
testInstance0 and testInstance1.
I can tell from the request distribution that load balancing is working
properly, but all calls to testInstance0 and testInstance1 fail over to
in2, because the port value is wrong.
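
The symptom can be modelled as a layered lookup (server, then cluster, then config) in which a null value at the server level is treated as "not set" and the search falls through to the config default. The sketch below only models what I observe; it assumes that lookup order and that null handling, and it is not the actual PropertyResolver code. All names in it are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the observed behavior; not the real PropertyResolver.
final class LayeredLookupSketch {
    // Returns the first non-null value found, searching the layers in order.
    static String resolve(String name, List<Map<String, String>> layers) {
        for (Map<String, String> layer : layers) {
            String value = layer.get(name);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // Builds server/cluster/config layers and resolves IIOP_LISTENER_PORT.
    static String lookup(String serverValue) {
        Map<String, String> server = new HashMap<>();
        server.put("IIOP_LISTENER_PORT", serverValue); // key present, value may be null
        Map<String, String> cluster = new HashMap<>(); // the cluster props are empty
        Map<String, String> config = new HashMap<>();
        config.put("IIOP_LISTENER_PORT", "23700");     // default from c1-config
        return resolve("IIOP_LISTENER_PORT", Arrays.asList(server, cluster, config));
    }

    // What I see for a freshly created instance: null at the server level.
    static String observed() {
        return lookup(null);
    }

    // What domain.xml says testInstance0 should get.
    static String expected() {
        return lookup("20037");
    }

    public static void main(String[] args) {
        System.out.println(observed()); // 23700
        System.out.println(expected()); // 20037
    }
}
```

If the real lookup behaves like this, the open question is why the server-level value is null on the running instances when domain.xml on the DAS contains 20037.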

This apparently works fine if an existing instance is restarted. The
failure only occurs the first time I try to read the config after the
instance has been created.
I read the config from 3 other instances (in2, in1, and in4).

The code path on my side is in
orb/orb-iiop/src/main/java/org/glassfish/enterprise/iiop/impl/IiopFolbGmsClient.
The GMS event is handled in handleSignal, which calls addMember.
addMember calls getClusterInstanceInfo(String), which proceeds ->
getClusterInstanceInfo(Server, Config, boolean) -> resolvePort ->
PropertyResolver.getPropertyValue.
While there certainly could be an error in my code, I'm wondering whether
there is an admin problem here, because the same code works fine in all
other cases of nodes stopping and starting.

I've also attached the complete domain.xml from my test setup. The DAS
is minas, and all instances run on apollo.

Thanks,

Ken.