Looked into server-bawweb4.log.
The following set of log entries show that a load request for version 6 arrives before the save for version 6 arrives!!
This is only possible if you had deployed the app with asyncreplication=true (default). Can you confirm that you deployed the app with asynreplication=true?
<log-entries>
[#|2011-10-10T23:42:02.633-0400|FINE|glassfish3.1.1|org.shoal.ha.cache.command|_ThreadID=23;_ThreadName=Thread-2;ClassName=org.shoal.ha.cache.impl.interceptor.CommandHandlerInterceptor;MethodName=onReceive;|/stage: Received LoadRequestCommand:35(11362da14db1da505ec8bfba8654) from bawweb3-inst|#]
[#|2011-10-10T23:42:02.633-0400|FINE|glassfish3.1.1|org.shoal.ha.cache.command.load_request|_ThreadID=23;_ThreadName=Thread-2;ClassName=org.shoal.adapter.store.commands.LoadRequestCommand;MethodName=execute;|bawweb4-instLoadRequestCommand:35 received load_request command for 11362da14db1da505ec8bfba8654from bawweb3-inst|#]
[#|2011-10-10T23:42:02.634-0400|FINE|glassfish3.1.1|org.shoal.ha.cache.command.save|_ThreadID=23;_ThreadName=Thread-2;ClassName=org.shoal.ha.cache.impl.store.SimpleStoreableDataStoreEntryUpdater;MethodName=createLoadResponseCommand;|SimpleStoreableDataStoreEntryUpdater.createLoadResp 5 >= 6; rawV.length = [B_at_1e96744|#]
[#|2011-10-10T23:42:02.748-0400|FINE|glassfish3.1.1|org.shoal.ha.cache.stats|_ThreadID=29;_ThreadName=Thread-2;ClassName=org.shoal.ha.cache.impl.interceptor.ReplicationCommandTransmitterWithMap$BatchedCommandMapDataFrame;MethodName=flushAndTransmit;|flushAndTransmit will flush data because lastTS = 1318304522634; timeStamp = 1318304522634; lastTS = 1318304522634; map.size() = 1; removedKeys.size() = 0|#]
[#|2011-10-10T23:42:02.749-0400|FINE|glassfish3.1.1|org.shoal.ha.cache.stats|_ThreadID=29;_ThreadName=Thread-2;ClassName=org.shoal.ha.cache.impl.interceptor.ReplicationCommandTransmitterWithMap$BatchedCommandMapDataFrame;MethodName=doAddOrRemove;|doAddOrRemove batchThresholdReached.get()=true; inFlightCount = 0; |#]
[#|2011-10-10T23:42:02.749-0400|FINE|glassfish3.1.1|org.shoal.ha.cache.stats|_ThreadID=29;_ThreadName=Thread-2;ClassName=org.shoal.ha.cache.impl.interceptor.ReplicationCommandTransmitterWithMap$BatchedCommandMapDataFrame;MethodName=doAddOrRemove;|Sending batch# 6 to bawweb3-inst; wasActive for (115 millis|#]
[#|2011-10-10T23:42:02.750-0400|FINE|glassfish3.1.1|org.shoal.ha.cache.command.load_response|_ThreadID=35;_ThreadName=Thread-2;ClassName=org.shoal.adapter.store.commands.LoadResponseCommand;MethodName=writeObject;|bawweb4-instLoadResponseCommand:37 sending load_response command for 11362da14db1da505ec8bfba8654 to bawweb3-inst; version = -9223372036854775808; state = NOT_FOUND|#]
…..
……
[#|2011-10-10T23:42:02.933-0400|FINE|glassfish3.1.1|org.shoal.ha.cache.command|_ThreadID=24;_ThreadName=Thread-2;ClassName=org.shoal.ha.cache.impl.interceptor.CommandHandlerInterceptor;MethodName=onReceive;|/stage: Received SaveCommand:33(11362da14db1da505ec8bfba8654) from bawweb3-inst|#]
[#|2011-10-10T23:42:02.934-0400|FINE|glassfish3.1.1|org.shoal.ha.cache.command.save|_ThreadID=24;_ThreadName=Thread-2;ClassName=org.shoal.adapter.store.commands.SaveCommand;MethodName=execute;|/stageSaveCommand:33 received save_command for key = 11362da14db1da505ec8bfba8654 from bawweb3-inst|#]
[#|2011-10-10T23:42:02.934-0400|FINE|glassfish3.1.1|org.shoal.ha.cache.command.save|_ThreadID=24;_ThreadName=Thread-2;ClassName=org.shoal.ha.cache.impl.store.SimpleStoreableDataStoreEntryUpdater;MethodName=executeSave;|SimpleStoreableDataStoreEntryUpdater.executeSave. SAVING ... entry = org.shoal.ha.cache.impl.store.DataStoreEntry_at_2f7d55; entry.version = 5; cmd.version = 6|#]
[#|2011-10-10T23:42:02.934-0400|FINE|glassfish3.1.1|org.shoal.ha.cache.command.save|_ThreadID=24;_ThreadName=Thread-2;ClassName=org.shoal.ha.cache.impl.store.DataStoreEntryUpdater;MethodName=printEntryInfo;|executeSave:SimpleStoreableDataStoreEntryUpdater:Updated key = 11362da14db1da505ec8bfba8654; entry.version = 6 ; entry.lastAccess = 1318304580115; entry.maxIdle = 7200000|#]
</log-entries>
Again, I don't understand why the web container is performing a load when the requests are sticky. Will talk to the web container team and will let you know.
--Mahesh
On Oct 10, 2011, at 2:44 PM, forums_at_java.net wrote:
> Well, the same exact load balancer configuration works perfectly with
> Glassfish availability turned off.
>
> It should be the other way around :) sticky sesisons are there because
> session replication is not :)
>
> I can reproduce the error with small sessions this way, if the sticky
> sessions are turned off, it's very hard to reproduce the timeout and session
> loss.
>
> BTW the session loss happens EVERY time the TimeoutException happens.
>
> I wouldn't put the blame on load balancer, but perhaps the combination of
> it, sticky sessions, JK support in Glassfish and Shoal contribute to tease
>
> this bug to the surface.
>
> I attached th elog files from shoal debugging again.
>
> What I did is just loaded a single page, with no session through sticky
> session load balancer and 'available' web app.
>
> You get a tons of timeout exceptions from shoal. Someone from the shoal
> team should be able to parse these, I hope.
>
> Thanks again for your help Mahesh!
>
> If there is trouble uploading the logs, they are at
> http://hope.nyc.ny.us/~lprimak/files/server_logs.zip
>
>
> --
>
> [Message sent by forum member 'lprimak']
>
> View Post: http://forums.java.net/node/778666
>
>
>
> <server_logs.zip>