dev@glassfish.java.net

Re: Another weird admin (?) problem

From: Ken Cavanaugh <ken.cavanaugh_at_oracle.com>
Date: Tue, 25 Jan 2011 08:36:34 -0800

On Jan 25, 2011, at 8:27 AM, Tom Mueller wrote:

> Ken,
>
> I'm not quite sure what code is executing where in your description below. However, since handleSignal is for a GMS event, I assume that it is being called in one instance (say in1) when another instance is started (say testInstance1). And the problem is that the code running in in1 is not able to get the port value that testInstance1 is using.

Yes, that's the setup.

>
> The reason for this is a bug in the code that replicates the system property information for instances. As long as no --systemproperties option is provided on the create-instance or create-local-instance command, the replication works fine. However, if there is a --systemproperties option, then only that system property is replicated to the other instances. A restart of the instance that does a resync will bring over the rest of the properties.
>
> I created issue 15683 for this:
> http://java.net/jira/browse/GLASSFISH-15683
>
> As with the other problem with system properties, the work-around is again to not use the --systemproperties option on the create-instance command.
>
> Is 15683 a stopper for 3.1?

I'm not sure. I can probably rewrite the test framework to specify IIOP_LISTENER_PORT using create-system-properties,
and work around this. Will create-system-property --targetserver testInstance0 IIOP_LISTENER_PORT=20037
(not sure about the syntax) replicate the value of IIOP_LISTENER_PORT for testInstance0 to all other instances in
the cluster?
I'll try this later this morning and let you know.

If I have a workaround, this at least needs to be covered in the release notes, to keep others from falling into the same hole.
By the way, I think this also affects Server.isRunning, which is possibly getting the wrong admin port when it attempts
to verify the instance is up by checking for admin listening on a particular port.

Thanks,

Ken.

>
> Tom
>
>
> On 1/24/2011 5:29 PM, Ken wrote:
>> This is the same setup as in issue 15665.
>> I have a 5-instance cluster (instances in0-in4, IIOP listener ports 9037-13037 at intervals of 1000).
>> I have shut down the cluster, then restarted in1, in2, and in4.
>> Then I am creating testInstance0 (IIOP listener should be 20037) and testInstance1
>> (IIOP listener 21037).
>>
>> I'm using create-instance to create a new instance in a running cluster:
>>
>> Command: create-instance --node apolloNA --systemproperties instance_name=testInstance1 --cluster c1 --portbase 21000 --checkports=true testInstance1
>>
>> which results in:
>>
>> Using DAS host minas and port 4848 from existing das.properties for node
>> apolloNA. To use a different DAS, create a new node using create-node-ssh or
>> create-node-config. Create the instance with the new node and correct
>> host and port:
>> asadmin --host das_host --port das_port create-local-instance --node node_name instance_name.
>> Command _create-instance-filesystem executed successfully.
>> Port Assignments for server instance testInstance1:
>> JMX_SYSTEM_CONNECTOR_PORT=21086
>> JMS_PROVIDER_PORT=21076
>> HTTP_LISTENER_PORT=21080
>> ASADMIN_LISTENER_PORT=21048
>> JAVA_DEBUGGER_PORT=21009
>> IIOP_SSL_LISTENER_PORT=21038
>> IIOP_LISTENER_PORT=21037
>> OSGI_SHELL_TELNET_PORT=21066
>> HTTP_SSL_LISTENER_PORT=21081
>> IIOP_SSL_MUTUALAUTH_PORT=21039
>> The instance, testInstance1, was created on host apollo
>> WARNING: Instance in0 seems to be offline; command _register-instance-at-instance was not replicated to that instance
>> WARNING: Instance in3 seems to be offline; command _register-instance-at-instance was not replicated to that instance
>> Command create-instance executed successfully.
>>
>> The domain.xml contents after the create-instance command completes look fine:
>>
>> <servers>
>> <server name="server" config-ref="server-config">
>> <resource-ref ref="jdbc/__TimerPool"></resource-ref>
>> <resource-ref ref="jdbc/__default"></resource-ref>
>> </server>
>>
>> (instances similar to testInstance0 omitted here)
>>
>> <server name="testInstance0" node-ref="apolloNA" config-ref="c1-config">
>> <system-property name="instance_name" value="testInstance0"></system-property>
>> <system-property name="ASADMIN_LISTENER_PORT" value="20048"></system-property>
>> <system-property name="HTTP_LISTENER_PORT" value="20080"></system-property>
>> <system-property name="HTTP_SSL_LISTENER_PORT" value="20081"></system-property>
>> <system-property name="IIOP_LISTENER_PORT" value="20037"></system-property>
>> <system-property name="IIOP_SSL_MUTUALAUTH_PORT" value="20039"></system-property>
>> <system-property name="IIOP_SSL_LISTENER_PORT" value="20038"></system-property>
>> <system-property name="JMS_PROVIDER_PORT" value="20076"></system-property>
>> <system-property name="JMX_SYSTEM_CONNECTOR_PORT" value="20086"></system-property>
>> <system-property name="OSGI_SHELL_TELNET_PORT" value="20066"></system-property>
>> <system-property name="JAVA_DEBUGGER_PORT" value="20009"></system-property>
>> <application-ref ref="TestEJB" virtual-servers="server"></application-ref>
>> </server>
>>
>> which is very similar to all of the other server entries for the existing in0-in4 instances.
>> The other configs contain related elements (I only care about orb-listener-1 for FOLB):
>>
>> server-config:
>>
>> <iiop-service>
>> <orb use-thread-pool-ids="thread-pool-1"></orb>
>> <iiop-listener port="3700" id="orb-listener-1" address="0.0.0.0" lazy-init="true"></iiop-listener>
>> <iiop-listener port="3820" id="SSL" address="0.0.0.0" security-enabled="true">
>> <ssl classname="com.sun.enterprise.security.ssl.GlassfishSSLImpl" cert-nickname="s1as"></ssl>
>> </iiop-listener>
>> <iiop-listener port="3920" id="SSL_MUTUALAUTH" address="0.0.0.0" security-enabled="true">
>> <ssl classname="com.sun.enterprise.security.ssl.GlassfishSSLImpl" cert-nickname="s1as" client-auth-enabled="true"></ssl>
>> </iiop-listener>
>> </iiop-service>
>>
>> default-config:
>>
>> <iiop-service>
>> <orb use-thread-pool-ids="thread-pool-1"></orb>
>> <iiop-listener port="${IIOP_LISTENER_PORT}" id="orb-listener-1" address="0.0.0.0"></iiop-listener>
>> <iiop-listener port="${IIOP_SSL_LISTENER_PORT}" id="SSL" address="0.0.0.0" security-enabled="true">
>> <ssl classname="com.sun.enterprise.security.ssl.GlassfishSSLImpl" cert-nickname="s1as"></ssl>
>> </iiop-listener>
>> <iiop-listener port="${IIOP_SSL_MUTUALAUTH_PORT}" id="SSL_MUTUALAUTH" address="0.0.0.0" security-enabled="true">
>> <ssl classname="com.sun.enterprise.security.ssl.GlassfishSSLImpl" cert-nickname="s1as" client-auth-enabled="true"></ssl>
>> </iiop-listener>
>> </iiop-service>
>>
>> <system-property name="IIOP_LISTENER_PORT" value="23700"></system-property>
>>
>> c1-config:
>>
>> <iiop-service>
>> <orb use-thread-pool-ids="thread-pool-1"></orb>
>> <iiop-listener id="orb-listener-1" port="${IIOP_LISTENER_PORT}" address="0.0.0.0"></iiop-listener>
>> <iiop-listener id="SSL" port="${IIOP_SSL_LISTENER_PORT}" address="0.0.0.0" security-enabled="true">
>> <ssl classname="com.sun.enterprise.security.ssl.GlassfishSSLImpl" cert-nickname="s1as"></ssl>
>> </iiop-listener>
>> <iiop-listener id="SSL_MUTUALAUTH" port="${IIOP_SSL_MUTUALAUTH_PORT}" address="0.0.0.0" security-enabled="true">
>> <ssl classname="com.sun.enterprise.security.ssl.GlassfishSSLImpl" cert-nickname="s1as" client-auth-enabled="true"></ssl>
>> </iiop-listener>
>> </iiop-service>
>>
>>
>> <system-property name="IIOP_LISTENER_PORT" value="23700"></system-property>
>>
>> My problem is in PropertyResolver.getPropertyValue.
>> It checks the server config (I think that's the one that should have the correct value),
>> and seems to find the IIOP_LISTENER_PORT property in the props with a null value.
>> (By the way, we really need the config beans to have a generated useful toString() method.
>> I can't easily log or debug this code, because I can't tell WHICH bean I have at hand
>> until I start pulling things out of it).
>>
>> The cluster props are empty.
>>
>> The config props returns the 23700 value. This happens for both testInstance0 and testInstance1.
>> I can tell from the request distribution that proper load balancing is happening, but all
>> calls to testInstance0 and testInstance1 fail over to in2, because the port value is wrong.
>>
>> This apparently works fine if an existing instance is re-started. The failure only
>> occurs the first time I try to read the config after the instance has been created.
>> I read the config from 3 other instances (in2, in1, and in4).
>>
>> The code path on my side is in orb/orb-iiop/src/main/java/org/glassfish/enterprise/iiop/impl/IiopFolbGmsClient.
>> The GMS event is handled in handleSignal, which calls addMember. addMember calls getClusterInstanceInfo(String),
>> which proceeds -> getClusterInstanceInfo(Server, Config, boolean) -> resolvePort -> PropertyResolver.getPropertyValue.
>> While there certainly could be an error in my code, I'm wondering if there is an admin problem here,
>> because the same code works fine in all other cases of nodes stopping and starting.
>>
>> I've also attached the complete domain.xml from my test setup. The DAS is minas, and all instances run on apollo.
>>
>> Thanks,
>>
>> Ken.
>>
>>
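
The lookup described in the original report (server properties first, then cluster, then config) can be sketched roughly as follows. This is only an illustration, not the actual PropertyResolver code: the class name, the method names, and the exact precedence are assumptions based on the order in which the report lists the three scopes. The property values are taken from the domain.xml excerpts above. The sketch shows why an instance that received only the --systemproperties value for testInstance0 ends up with the config default 23700 instead of 20037.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; this is NOT the real GlassFish PropertyResolver.
class PropertyLookupSketch {

    // Returns the first non-null value for name, checking the server scope,
    // then the cluster scope, then the config scope (assumed precedence).
    // Returns null if no scope defines the property.
    static String resolve(String name,
                          Map<String, String> serverProps,
                          Map<String, String> clusterProps,
                          Map<String, String> configProps) {
        String value = serverProps.get(name);
        if (value == null) {
            value = clusterProps.get(name);
        }
        if (value == null) {
            value = configProps.get(name);
        }
        return value;
    }

    // Server properties as another instance might see them when only the
    // --systemproperties value was replicated (the case in GLASSFISH-15683).
    static Map<String, String> partialServerProps() {
        Map<String, String> props = new HashMap<>();
        props.put("instance_name", "testInstance0");
        return props;
    }

    // Server properties after a full resync, including the port assignment.
    static Map<String, String> fullServerProps() {
        Map<String, String> props = partialServerProps();
        props.put("IIOP_LISTENER_PORT", "20037");
        return props;
    }

    // Default from c1-config in the domain.xml excerpt.
    static Map<String, String> configProps() {
        Map<String, String> props = new HashMap<>();
        props.put("IIOP_LISTENER_PORT", "23700");
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> clusterProps = new HashMap<>(); // empty, as reported
        // Partial replication: falls through to the config default.
        System.out.println(resolve("IIOP_LISTENER_PORT",
                partialServerProps(), clusterProps, configProps())); // prints 23700
        // Full replication: the server-level value wins.
        System.out.println(resolve("IIOP_LISTENER_PORT",
                fullServerProps(), clusterProps, configProps()));    // prints 20037
    }
}
```

Under these assumptions, the fallback to the config scope is what hides the missing server-level property: the lookup still returns a value, just not the right one, which matches the failover behavior described above.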