dev@glassfish.java.net

Re: Another weird admin (?) problem

From: Tom Mueller <tom.mueller_at_oracle.com>
Date: Tue, 25 Jan 2011 10:27:52 -0600

Ken,

I'm not quite sure what code is executing where in your description
below. However, since handleSignal is for a GMS event, I assume that it
is being called in one instance (say in1) when another instance is
started (say testInstance1). And the problem is that the code running in
in1 is not able to get the port value that testInstance1 is using.

The reason is a bug in the code that replicates system property
information for instances. As long as no --systemproperties option is
given on the create-instance or create-local-instance command,
replication works fine. If a --systemproperties option is given,
however, only the property named in that option is replicated to the
other instances. Restarting an affected instance so that it resyncs
brings over the rest of the properties.
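
To illustrate the effect, here is a minimal sketch of a lookup that
tries server-level properties first, then cluster, then config, which is
the order described in the quoted message below. This is not the actual
PropertyResolver code: the class, method, and map names are made up for
illustration, and it assumes a missing server-level property simply
reads as null. The values come from the quoted domain.xml and
create-instance output.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: NOT the GlassFish PropertyResolver. All names are hypothetical.
class PropertyFallbackSketch {

    // Look the name up at server level, then cluster level, then config level.
    static String resolve(String name,
                          Map<String, String> server,
                          Map<String, String> cluster,
                          Map<String, String> config) {
        String value = server.get(name);
        if (value == null) {
            value = cluster.get(name);
        }
        if (value == null) {
            value = config.get(name);
        }
        return value;
    }

    // Server-level properties of testInstance1 as another instance (say in1) sees them.
    static Map<String, String> serverProps(boolean fullyReplicated) {
        Map<String, String> props = new HashMap<>();
        props.put("instance_name", "testInstance1"); // given via --systemproperties
        if (fullyReplicated) {
            props.put("IIOP_LISTENER_PORT", "21037"); // derived from --portbase 21000
        }
        return props;
    }

    // Default carried by c1-config in the quoted domain.xml.
    static Map<String, String> configProps() {
        Map<String, String> props = new HashMap<>();
        props.put("IIOP_LISTENER_PORT", "23700");
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> cluster = Collections.emptyMap(); // cluster props are empty
        // After a resync: the server-level value wins.
        System.out.println(resolve("IIOP_LISTENER_PORT", serverProps(true), cluster, configProps()));
        // With only instance_name replicated: the lookup falls through to the config default.
        System.out.println(resolve("IIOP_LISTENER_PORT", serverProps(false), cluster, configProps()));
    }
}
```

With the partially replicated properties the lookup returns the config
default of 23700 rather than 21037, which matches the wrong port Ken is
seeing.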

I created issue 15683 for this:
http://java.net/jira/browse/GLASSFISH-15683

As with the other problem with system properties, the workaround is
again to avoid the --systemproperties option on the create-instance
command.

Is 15683 a stopper for 3.1?

Tom


On 1/24/2011 5:29 PM, Ken wrote:
> This is the same setup as in issue 15665.
> I have a 5-instance cluster (instances in0-in4, IIOP listener ports
> 9037-13037 at intervals of 1000).
> I have shut down the cluster, then restarted in1, in2, and in4.
> Then I am creating testInstance0 (IIOP listener should be 20037) and
> testInstance1 (IIOP listener 21037).
>
> I'm using create-instance to create a new instance in a running cluster:
>
> Command: create-instance --node apolloNA --systemproperties
> instance_name=testInstance1 --cluster c1 --portbase 21000
> --checkports=true testInstance1
>
> which results in:
>
> Using DAS host minas and port 4848 from existing das.properties
> for node
> apolloNA. To use a different DAS, create a new node using
> create-node-ssh or
> create-node-config. Create the instance with the new node and correct
> host and port:
> asadmin --host das_host --port das_port create-local-instance
> --node node_name instance_name.
> Command _create-instance-filesystem executed successfully.
> Port Assignments for server instance testInstance1:
> JMX_SYSTEM_CONNECTOR_PORT=21086
> JMS_PROVIDER_PORT=21076
> HTTP_LISTENER_PORT=21080
> ASADMIN_LISTENER_PORT=21048
> JAVA_DEBUGGER_PORT=21009
> IIOP_SSL_LISTENER_PORT=21038
> IIOP_LISTENER_PORT=21037
> OSGI_SHELL_TELNET_PORT=21066
> HTTP_SSL_LISTENER_PORT=21081
> IIOP_SSL_MUTUALAUTH_PORT=21039
> The instance, testInstance1, was created on host apollo
> WARNING: Instance in0 seems to be offline; command
> _register-instance-at-instance was not replicated to that instance
> WARNING: Instance in3 seems to be offline; command
> _register-instance-at-instance was not replicated to that instance
> Command create-instance executed successfully.
>
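The port assignments above follow a simple pattern: each port is the
--portbase value plus a fixed offset. The sketch below reproduces that
arithmetic. The offsets are inferred only from the two listings in this
thread (port bases 20000 and 21000), not taken from the GlassFish
source, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: offsets inferred from the port listings in this thread.
class PortBaseSketch {

    static final Map<String, Integer> OFFSETS = new LinkedHashMap<>();
    static {
        OFFSETS.put("JAVA_DEBUGGER_PORT", 9);
        OFFSETS.put("IIOP_LISTENER_PORT", 37);
        OFFSETS.put("IIOP_SSL_LISTENER_PORT", 38);
        OFFSETS.put("IIOP_SSL_MUTUALAUTH_PORT", 39);
        OFFSETS.put("ASADMIN_LISTENER_PORT", 48);
        OFFSETS.put("OSGI_SHELL_TELNET_PORT", 66);
        OFFSETS.put("JMS_PROVIDER_PORT", 76);
        OFFSETS.put("HTTP_LISTENER_PORT", 80);
        OFFSETS.put("HTTP_SSL_LISTENER_PORT", 81);
        OFFSETS.put("JMX_SYSTEM_CONNECTOR_PORT", 86);
    }

    // Compute every port as portBase + offset.
    static Map<String, Integer> assign(int portBase) {
        Map<String, Integer> ports = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : OFFSETS.entrySet()) {
            ports.put(e.getKey(), portBase + e.getValue());
        }
        return ports;
    }

    public static void main(String[] args) {
        // Port base used for testInstance1 in the create-instance command.
        for (Map.Entry<String, Integer> e : assign(21000).entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

For port base 21000 this gives IIOP_LISTENER_PORT=21037, the value that
the other instances should resolve for testInstance1.
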
> The domain.xml contents after the create-instance command completes
> look fine:
>
> <servers>
>   <server name="server" config-ref="server-config">
>     <resource-ref ref="jdbc/__TimerPool"></resource-ref>
>     <resource-ref ref="jdbc/__default"></resource-ref>
>   </server>
>
> (instances similar to testInstance0 omitted here)
>
> <server name="testInstance0" node-ref="apolloNA" config-ref="c1-config">
>   <system-property name="instance_name" value="testInstance0"></system-property>
>   <system-property name="ASADMIN_LISTENER_PORT" value="20048"></system-property>
>   <system-property name="HTTP_LISTENER_PORT" value="20080"></system-property>
>   <system-property name="HTTP_SSL_LISTENER_PORT" value="20081"></system-property>
>   <system-property name="IIOP_LISTENER_PORT" value="20037"></system-property>
>   <system-property name="IIOP_SSL_MUTUALAUTH_PORT" value="20039"></system-property>
>   <system-property name="IIOP_SSL_LISTENER_PORT" value="20038"></system-property>
>   <system-property name="JMS_PROVIDER_PORT" value="20076"></system-property>
>   <system-property name="JMX_SYSTEM_CONNECTOR_PORT" value="20086"></system-property>
>   <system-property name="OSGI_SHELL_TELNET_PORT" value="20066"></system-property>
>   <system-property name="JAVA_DEBUGGER_PORT" value="20009"></system-property>
>   <application-ref ref="TestEJB" virtual-servers="server"></application-ref>
> </server>
>
> which is very similar to all of the other server entries for the
> existing in0-in4 instances.
> The other configs contain related elements (I only care about
> orb-listener-1 for FOLB):
>
> server-config:
>
> <iiop-service>
>   <orb use-thread-pool-ids="thread-pool-1"></orb>
>   <iiop-listener port="3700" id="orb-listener-1" address="0.0.0.0" lazy-init="true"></iiop-listener>
>   <iiop-listener port="3820" id="SSL" address="0.0.0.0" security-enabled="true">
>     <ssl classname="com.sun.enterprise.security.ssl.GlassfishSSLImpl" cert-nickname="s1as"></ssl>
>   </iiop-listener>
>   <iiop-listener port="3920" id="SSL_MUTUALAUTH" address="0.0.0.0" security-enabled="true">
>     <ssl classname="com.sun.enterprise.security.ssl.GlassfishSSLImpl" cert-nickname="s1as" client-auth-enabled="true"></ssl>
>   </iiop-listener>
> </iiop-service>
>
> default-config:
>
> <iiop-service>
>   <orb use-thread-pool-ids="thread-pool-1"></orb>
>   <iiop-listener port="${IIOP_LISTENER_PORT}" id="orb-listener-1" address="0.0.0.0"></iiop-listener>
>   <iiop-listener port="${IIOP_SSL_LISTENER_PORT}" id="SSL" address="0.0.0.0" security-enabled="true">
>     <ssl classname="com.sun.enterprise.security.ssl.GlassfishSSLImpl" cert-nickname="s1as"></ssl>
>   </iiop-listener>
>   <iiop-listener port="${IIOP_SSL_MUTUALAUTH_PORT}" id="SSL_MUTUALAUTH" address="0.0.0.0" security-enabled="true">
>     <ssl classname="com.sun.enterprise.security.ssl.GlassfishSSLImpl" cert-nickname="s1as" client-auth-enabled="true"></ssl>
>   </iiop-listener>
> </iiop-service>
>
> <system-property name="IIOP_LISTENER_PORT" value="23700"></system-property>
>
> c1-config:
>
> <iiop-service>
>   <orb use-thread-pool-ids="thread-pool-1"></orb>
>   <iiop-listener id="orb-listener-1" port="${IIOP_LISTENER_PORT}" address="0.0.0.0"></iiop-listener>
>   <iiop-listener id="SSL" port="${IIOP_SSL_LISTENER_PORT}" address="0.0.0.0" security-enabled="true">
>     <ssl classname="com.sun.enterprise.security.ssl.GlassfishSSLImpl" cert-nickname="s1as"></ssl>
>   </iiop-listener>
>   <iiop-listener id="SSL_MUTUALAUTH" port="${IIOP_SSL_MUTUALAUTH_PORT}" address="0.0.0.0" security-enabled="true">
>     <ssl classname="com.sun.enterprise.security.ssl.GlassfishSSLImpl" cert-nickname="s1as" client-auth-enabled="true"></ssl>
>   </iiop-listener>
> </iiop-service>
>
> <system-property name="IIOP_LISTENER_PORT" value="23700"></system-property>
>
> My problem is in PropertyResolver.getPropertyValue.
> It checks the server config (I think that's the one that should have
> the correct value), and seems to find the IIOP_LISTENER_PORT property
> in the props with a null value.
> (By the way, we really need the config beans to have a useful
> generated toString() method. I can't easily log or debug this code,
> because I can't tell WHICH bean I have at hand until I start pulling
> things out of it.)
>
> The cluster props are empty.
>
> The config props return the 23700 value. This happens for both
> testInstance0 and testInstance1.
> I can tell from the request distribution that proper load balancing is
> happening, but all calls to testInstance0 and testInstance1 fail over
> to in2, because the port value is wrong.
>
> This apparently works fine if an existing instance is restarted. The
> failure only occurs the first time I try to read the config after the
> instance has been created.
> I read the config from 3 other instances (in2, in1, and in4).
>
> The code path on my side is in
> orb/orb-iiop/src/main/java/org/glassfish/enterprise/iiop/impl/IiopFolbGmsClient.
> The GMS event is handled in handleSignal, which calls addMember.
> addMember calls getClusterInstanceInfo(String),
> which proceeds -> getClusterInstanceInfo(Server, Config, boolean) ->
> resolvePort -> PropertyResolver.getPropertyValue.
> While there certainly could be an error in my code, I'm wondering if
> there is an admin problem here, because the same code works fine in
> all other cases of nodes stopping and starting.
>
> I've also attached the complete domain.xml from my test setup. The
> DAS is minas, and all instances run on apollo.
>
> Thanks,
>
> Ken.
>
>