Re: In-memory replication issues

From: <glassfish_at_javadesktop.org>
Date: Fri, 06 Jun 2008 07:02:50 PDT

I'm having a related issue, but not exactly the same.

I have a test cluster that is configured for in-memory replication. The problem does not occur when I use file-based persistence, though file-based persistence on an NFS-mounted directory gives us different issues, which I may post in another thread.

domain: domain1
das: test
instance1: prweb2-test
instance2: prweb3-test

General Environment Info:
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_15-b04)
Java HotSpot(TM) Server VM (build 1.5.0_15-b04, mixed mode)

RedHat AS: Linux test.domain.com 2.6.9-55.0.2.ELsmp #1 SMP Tue Jun 12 17:58:20 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

asadmin version yields
Version = Sun Java System Application Server 9.1_02
Command version executed successfully.
I am running the open source version without hadb.

All 3 of the physical servers are on the same subnet.
In this scenario the simple clusterjsp application works correctly with a simple round-robin load balancer without sticky sessions.

When I try to replicate this configuration to another set of servers/nodeagents the replication is not working even in the clusterjsp application.

This is what I am seeing in the logs when I turn logging up to FINE:

[#|2008-06-06T03:25:21.869-0400|FINE|sun-appserver9.1|org.apache.jasper.servlet.JspServlet|_ThreadID=34;_ThreadName=httpSSLWorkerThread-20080-0;ClassName=org.apache.jasper.servlet.JspServlet;MethodName=service;_RequestID=4889d2d5-a4dd-458f-ac67-bb8f4afb3331;|JspEngine --> [/HaJsp.jsp] ServletPath: [/HaJsp.jsp] PathInfo: [null] RealPath: [/usr/local/glassfish/nodeagents/prweb2/www-hostb2/applications/j2ee-apps/clusterjsp/clusterjsp_war/HaJsp.jsp] RequestURI: [/clusterjsp/HaJsp.jsp] QueryString: [null]|#]

[#|2008-06-06T03:25:21.870-0400|FINE|sun-appserver9.1|org.apache.coyote.tomcat5.InputBuffer|_ThreadID=34;_ThreadName=httpSSLWorkerThread-20080-0;ClassName=org.apache.coyote.tomcat5.InputBuffer;MethodName=realReadBytes;_RequestID=4889d2d5-a4dd-458f-ac67-bb8f4afb3331;|realRead() R( /clusterjsp/HaJsp.jsp)|#]

[#|2008-06-06T03:25:21.871-0400|INFO|sun-appserver9.1|javax.enterprise.system.stream.out|_ThreadID=34;_ThreadName=httpSSLWorkerThread-20080-0;|
Add to session: test = test|#]

[#|2008-06-06T03:25:21.872-0400|FINE|sun-appserver9.1|org.apache.coyote.tomcat5.OutputBuffer|_ThreadID=34;_ThreadName=httpSSLWorkerThread-20080-0;ClassName=org.apache.coyote.tomcat5.OutputBuffer;MethodName=setConverter;_RequestID=4889d2d5-a4dd-458f-ac67-bb8f4afb3331;|Got encoding: ISO-8859-1|#]

[#|2008-06-06T03:25:21.872-0400|FINE|sun-appserver9.1|javax.enterprise.system.container.web|_ThreadID=34;_ThreadName=httpSSLWorkerThread-20080-0;ClassName=com.sun.enterprise.ee.web.sessmgmt.JxtaBackingStoreImpl;MethodName=saveSimple;_RequestID=4889d2d5-a4dd-458f-ac67-bb8f4afb3331;|JxtaBackingStore>>saveSimple():id = cbe68c81c3f45d9ecab547876ef7unable to proceed due to health check|#]

[#|2008-06-06T03:25:21.873-0400|FINE|sun-appserver9.1|org.apache.coyote.tomcat5.OutputBuffer|_ThreadID=34;_ThreadName=httpSSLWorkerThread-20080-0;ClassName=org.apache.coyote.tomcat5.OutputBuffer;MethodName=realWriteBytes;_RequestID=4889d2d5-a4dd-458f-ac67-bb8f4afb3331;|realWrite(b, 0, 1633) org.apache.coyote.Response@1aa8d4|#]

[#|2008-06-06T03:25:21.873-0400|FINE|sun-appserver9.1|org.apache.coyote.tomcat5.OutputBuffer|_ThreadID=34;_ThreadName=httpSSLWorkerThread-20080-0;ClassName=org.apache.coyote.tomcat5.OutputBuffer;MethodName=recycle;_RequestID=4889d2d5-a4dd-458f-ac67-bb8f4afb3331;|recycle()|#]

The key problem on this 2nd domain seems to be this line:

[#|2008-06-06T03:25:21.872-0400|FINE|sun-appserver9.1|javax.enterprise.system.container.web|_ThreadID=34;_ThreadName=httpSSLWorkerThread-20080-0;ClassName=com.sun.enterprise.ee.web.sessmgmt.JxtaBackingStoreImpl;MethodName=saveSimple;_RequestID=4889d2d5-a4dd-458f-ac67-bb8f4afb3331;|JxtaBackingStore>>saveSimple():id = cbe68c81c3f45d9ecab547876ef7unable to proceed due to health check|#]

I don't have health check turned on, so I don't know why it should fail that check.
I have looked at some source from the diffs that I found through google and found this interesting bit of code at: http://72.14.205.104/search?q=cache:asgc77-i9jQJ:fisheye5.cenqua.com/browse/glassfish/appserv-core-ee/http-session-persistence/src/java/com/sun/enterprise/ee/web/sessmgmt/JxtaBackingStoreImpl.java%3Fr%3D1.19+unable+to+proceed+due+to+health+check+glassfish&hl=en&ct=clnk&cd=1&gl=us&client=firefox-a
which led me to the class and method:
ReplicationHealthChecker.isOkToProceed()

This leads me to the following method, along with the interesting comments surrounding the HealthCheckingEnabled flag check:
    /**
     * return boolean reflecting whether it is ok to proceed
     * with replication processing
     */
    public static boolean isOkToProceed() {
        /* FIXME we can put this back later
        if( !isHealthCheckingEnabled() ) {
            return true;
        }
         */
        //flushing time is treated specially
        if(isFlushing()) {
            return true;
        }
        //cluster stopping time is treated specially
        if(isClusterStopping()) {
            return false;
        }
        //in the midst of attempting connection
        if(isAttemptingConnection()) {
            return false;
        }
        boolean condition = isReplicationPartnerOperational()
            && isReplicationCommunicationOperational();
        if(condition) {
            return true;
        }
        synchronized(_monitor) {
            if(!condition) {
                reportError("ReplicationHealthChecker:health failure " +
                            " isReplicationPartnerOperational()=" + isReplicationPartnerOperational() +
                            " isReplicationCommunicationOperational()=" + isReplicationCommunicationOperational());
            }
        }
        return condition;
    }

This again was retrieved from the Google cache at:
http://72.14.205.104/search?q=cache:uzWGXB7H3K0J:fisheye5.cenqua.com/browse/~raw,r%3D1.9.2.13/glassfish/appserv-core-ee/http-session-persistence/src/java/com/sun/enterprise/ee/web/sessmgmt/ReplicationHealthChecker.java+ReplicationHealthChecker&hl=en&ct=clnk&cd=1&gl=us&client=firefox-a
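
To make the order of checks in the quoted isOkToProceed() easier to follow, here is a minimal standalone sketch of the same decision sequence. It is not GlassFish code: the class HealthCheckSketch, the State holder, and the decide() method are hypothetical names invented for illustration, and the flags are plain booleans standing in for the real isFlushing(), isClusterStopping(), and related methods. It only shows that, with the isHealthCheckingEnabled() test commented out, the result can be false even when health checking is disabled.

```java
/**
 * Illustrative model of the check order in the quoted isOkToProceed().
 * All names here (HealthCheckSketch, State, decide) are hypothetical;
 * none of this is taken from the GlassFish source.
 */
public class HealthCheckSketch {

    /** Hypothetical snapshot of the flags the quoted method consults. */
    static final class State {
        boolean healthCheckingEnabled;    // never read below: that check is commented out (FIXME)
        boolean flushing;
        boolean clusterStopping;
        boolean attemptingConnection;
        boolean partnerOperational;
        boolean communicationOperational;
    }

    /** Walks the checks in the quoted order and names the first one that decides. */
    static String decide(State s) {
        if (s.flushing) {
            return "ok: flushing";
        }
        if (s.clusterStopping) {
            return "blocked: cluster stopping";
        }
        if (s.attemptingConnection) {
            return "blocked: attempting connection";
        }
        if (s.partnerOperational && s.communicationOperational) {
            return "ok: partner and communication operational";
        }
        return "blocked: partner=" + s.partnerOperational
                + " communication=" + s.communicationOperational;
    }

    public static void main(String[] args) {
        State s = new State();
        s.healthCheckingEnabled = false;      // health check "turned off"
        s.partnerOperational = false;         // assumed: no replication partner found
        s.communicationOperational = true;
        System.out.println(decide(s));        // prints "blocked: partner=false communication=true"
    }
}
```

In this model, the three ways to get a "blocked" result are a stopping cluster, a connection attempt in progress, or a replication partner or communication channel that is not operational. Which of these applies on the failing cluster cannot be determined from the log line alone.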

Any help identifying what could cause the health check to fail in an in-memory replication scenario would be greatly appreciated.

I am continuing to rebuild clusters trying to make them behave/work like my single working cluster.

Thanks
[Message sent by forum member 'awizardly' (awizardly)]

http://forums.java.net/jive/thread.jspa?messageID=278790