users@glassfish.java.net

Communication trouble DAS - Node Agent

From: <glassfish_at_javadesktop.org>
Date: Tue, 24 Nov 2009 17:19:29 PST

Behavior:
A Windows NA binds to a Linux DAS but can be neither monitored nor managed by the DAS. The NA shows up in the DAS admin console, and when the NA is not running it shows a status of stopped. When the NA is started, the DAS admin console shows an empty cell in the status column. When the NA is stopped again, the status returns to stopped on the console.

Investigations:
1) The logs are unhelpful on both DAS and NA. There is one entry on the DAS saying that the NA could not be notified, but it is not repeated on subsequent starts and stops of the NA.

2) Running a TCP monitor on both the DAS and NA shows that the NA and DAS are communicating over the NA JMX port. There is a suspicious packet, though, transmitted from the NA to the DAS. It contains a string something like 'UnicastRef2 10.97.20.23'. Now, the DAS and NA communicate on a different net, '10.97.30.0/24', and the net that contains 10.97.20.23 is accessible only by the NA, not the DAS. Thus, I suspect that this is the trouble. 10.97.20.23 is an interface on the NA, but I never use it in any configuration of the NA and it is not mentioned in any of the .properties files.
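One way to check the suspicion in 2) is to ask a JVM on the NA host which address it treats as "the local host". When the java.rmi.server.hostname property is not set, Java RMI embeds the address of the local host in the remote references it hands out, so a multi-homed machine can end up advertising an interface the peer cannot reach. The sketch below lists every address on the host and prints the one InetAddress.getLocalHost() resolves to. The class name LocalAddressCheck and the helper inPrefix are hypothetical, and whether the NA actually picks its address this way is an assumption, not something the logs confirm.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical diagnostic, not part of GlassFish.
public class LocalAddressCheck {

    /** Returns every address bound to an interface on this host, as "ifname address". */
    static List<String> listAddresses() throws SocketException {
        List<String> result = new ArrayList<String>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return result; // some JVMs return null when no interface is found
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                result.add(nic.getName() + " " + addr.getHostAddress());
            }
        }
        return result;
    }

    /** Simple textual check, e.g. prefix "10.97.30." for the 10.97.30.0/24 net. */
    static boolean inPrefix(String address, String prefix) {
        return address.startsWith(prefix);
    }

    public static void main(String[] args) throws SocketException, UnknownHostException {
        for (String line : listAddresses()) {
            System.out.println(line);
        }
        // The address the JVM resolves for the local host name.
        String local = InetAddress.getLocalHost().getHostAddress();
        System.out.println("getLocalHost -> " + local);
        System.out.println("on DAS net (10.97.30.0/24): " + inPrefix(local, "10.97.30."));
    }
}
```

If this is run on the Windows NA host and getLocalHost resolves to 10.97.20.23, that would be consistent with the packet seen in the TCP monitor; it would not by itself prove that the address is the cause.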

3) By accident, I discovered that the 10.97.20.23 address IS contained in a serialized Java object file in the node agent config directory, but only while the node agent is running. The file, 'admch', is created at NA start and deleted at NA stop. The file looks to me like a plain serialization of an RMI socket factory stub. The address appears to be attached to an object of type UnicastRef2. Hmm. I checked similar files on working Linux NAs, and in those cases the address associated with UnicastRef2 designates the correct interface for NA-DAS communication.
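The check in 3) can be repeated on any node agent without opening the file in an editor. Java serialization stores ASCII strings byte for byte, so a dotted-quad address inside a serialized stub can be found by scanning the raw bytes. The following is a minimal sketch of such a scan; the class name StubAddressScan is hypothetical, the path to 'admch' has to be supplied on the command line, and the file only exists while the NA is running.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists dotted-quad strings found in a binary file.
public class StubAddressScan {

    // Four groups of 1-3 digits separated by dots, not embedded in a longer number.
    private static final Pattern IPV4 =
            Pattern.compile("(?<![0-9.])(?:[0-9]{1,3}\\.){3}[0-9]{1,3}(?![0-9.])");

    static List<String> findIpv4(byte[] data) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data decodes losslessly.
        String text = new String(data, StandardCharsets.ISO_8859_1);
        List<String> hits = new ArrayList<String>();
        Matcher m = IPV4.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StubAddressScan <path-to-admch>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (String ip : findIpv4(data)) {
            System.out.println(ip);
        }
    }
}
```

The pattern does not validate octet ranges, so it may also report version-like strings; for comparing a failing Windows NA against a working Linux NA that is usually good enough.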

Question:
How can I resolve this? The Windows NA seems to be pulling an arbitrary address from those available on the host and handing that to the DAS. I do not know what else to investigate. Searches on the web for UnicastRef2 produce little, and I cannot find anyone else with a similar problem.

BTW, I have no idea if that address is actually the cause of the behavior; I just suspect it strongly. If it is the problem, then I don't believe this is an issue with the Windows/Linux mix itself. I think the selection of that address by the NA is independent of whether or not it is binding to a Linux DAS.
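If the address selection really is independent of the DAS, it can be reproduced outside GlassFish. Standard Java RMI reads the system property java.rmi.server.hostname to decide which host string goes into an exported stub, and falls back to the local host address when the property is absent. The sketch below exports a throwaway object with that property set and returns the stub's string form, which on Sun/Oracle JVMs includes the endpoint. The names RmiHostnameCheck, Ping and PingImpl are hypothetical, the exact toString format is implementation-specific, and whether the node agent honors this property (or how to pass it to the NA's JVM) is an assumption that would need to be tested.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

// Hypothetical stand-alone experiment, not GlassFish code.
public class RmiHostnameCheck {

    /** Trivial remote interface, present only so there is something to export. */
    public interface Ping extends Remote {
        String ping() throws RemoteException;
    }

    static class PingImpl implements Ping {
        public String ping() {
            return "pong";
        }
    }

    /**
     * Sets java.rmi.server.hostname, exports an object on an anonymous port,
     * and returns the stub's string form so the advertised endpoint can be inspected.
     */
    static String describeStub(String advertisedHost) throws RemoteException {
        // Set before exporting so the new stub picks the value up.
        System.setProperty("java.rmi.server.hostname", advertisedHost);
        PingImpl impl = new PingImpl();
        Remote stub = UnicastRemoteObject.exportObject(impl, 0);
        try {
            return stub.toString();
        } finally {
            // Unexport so the JVM is free to exit.
            UnicastRemoteObject.unexportObject(impl, true);
        }
    }

    public static void main(String[] args) throws RemoteException {
        if (args.length != 1) {
            System.err.println("usage: java RmiHostnameCheck <address-to-advertise>");
            return;
        }
        System.out.println(describeStub(args[0]));
    }
}
```

Running it on the NA host with the NA's address on the 10.97.30.0/24 net as the argument shows whether the stub then carries that address instead of 10.97.20.23.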

Many TIA,

  -=greg
[Message sent by forum member 'gjwiley' ]

http://forums.java.net/jive/thread.jspa?messageID=373239