users@glassfish.java.net

Re: glassfish 3.1.1, VPS (max sockets 1024 + 1024 local sockets), config suggestions to avoid Too many open files problem

From: Mladen Adamovic <mladen.adamovic_at_gmail.com>
Date: Tue, 24 Jul 2012 11:58:38 +0200

Hi guys, currently with 1 max alive request per connection, the webserver
started to refuse connections, connection queue with max 500 is full.

server.log looks clean and this is summary of the netstat:

root_at_lvps176-28-13-94:~# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
     96 CLOSE_WAIT
    143 ESTABLISHED
      1 FIN_WAIT1
      1 FIN_WAIT2
     10 LISTEN
     29 TIME_WAIT
root_at_lvps176-28-13-94:~# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
     96 CLOSE_WAIT
    138 ESTABLISHED
      6 FIN_WAIT2
     10 LISTEN
     20 TIME_WAIT
root_at_lvps176-28-13-94:~# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
     99 CLOSE_WAIT
    134 ESTABLISHED
      1 FIN_WAIT2
     10 LISTEN
     28 TIME_WAIT
root_at_lvps176-28-13-94:~# uptime
 11:52:46 up 15:27, 1 user, load average: 0.30, 0.06, 0.02
root_at_lvps176-28-13-94:~# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
    111 CLOSE_WAIT
    223 ESTABLISHED
      1 FIN_WAIT2
     10 LISTEN
      2 SYN_RECV
     17 TIME_WAIT


and the number of connections seems to increase:
root_at_lvps176-28-13-94:~# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
    110 CLOSE_WAIT
    251 ESTABLISHED
      2 FIN_WAIT2
     10 LISTEN
      1 SYN_RECV
    404 TIME_WAIT


Connection queue seems to be full although processing time is 3ms:


onitor (34 Statistics)
 File Cache Statistics : http-listener-1
Name
Value
Start Time
Last Sample Time
Details
Description
ContentMissesCount 0 count Jul 23, 2012 9:57:57 PM -- -- Number of misses
on cached file content
OpenCacheEntriesCount 0 count Jul 23, 2012 9:57:57 PM -- -- Number of
current open cache entries
MaxMappedMemorySize 0 byte(s) Jul 24, 2012 9:54:48 AM Jul 24, 2012 9:54:48
AM -- Maximum memory map size used for caching
HitsCount 0 count Jul 23, 2012 9:57:57 PM -- -- Number of cache lookup hits
InfoMissesCount 0 count Jul 23, 2012 9:57:57 PM -- -- Number of misses on
cached file info
ContentHitsCount 0 count Jul 23, 2012 9:57:57 PM -- -- Number of hits on
cached file content
HeapSize 0 byte(s) Jul 24, 2012 9:54:48 AM Jul 24, 2012 9:54:48 AM -- Current
cache size in bytes
MaxHeapSize 0 byte(s) Jul 24, 2012 9:54:48 AM Jul 24, 2012 9:54:48 AM
-- Maximum
heap space used for cache
InfoHitsCount 0 count Jul 23, 2012 9:57:57 PM -- -- Number of hits on
cached file info
MappedMemorySize 0 byte(s) Jul 24, 2012 9:54:48 AM Jul 24, 2012 9:54:48 AM
-- Size of mapped memory used for caching
MissesCount 0 count Jul 23, 2012 9:57:57 PM -- -- Number of cache lookup
misses
 Keep Alive Statistics : http-listener-1
Name
Value
Start Time
Last Sample Time
Details
Description
CountTimeouts 48452 count Jul 23, 2012 9:57:57 PM Jul 24, 2012 9:47:58
AM -- Number
of keep-alive connections that timed out
MaxRequests 1 count Jul 23, 2012 9:57:57 PM Jul 23, 2012 9:57:58 PM -- Maximum
number of requests allowed on a single keep-alive connection
CountHits 217839 count Jul 23, 2012 9:57:57 PM Jul 24, 2012 9:47:59 AM
-- Number
of requests received by connections in keep-alive mode
SecondsTimeouts 5 seconds Jul 23, 2012 9:57:57 PM Jul 23, 2012 9:57:58
PM -- Keep-alive
timeout value in seconds
CountConnections 188 count Jul 23, 2012 9:57:57 PM Jul 24, 2012 9:47:59 AM
-- Number of connections in keep-alive mode
CountFlushes 96184 count Jul 23, 2012 9:57:57 PM Jul 24, 2012 9:47:52
AM -- Number
of keep-alive connections that were closed
CountRefusals 73011 count Jul 23, 2012 9:57:57 PM Jul 24, 2012 9:47:43
AM -- Number
of keep-alive connections that were rejected
 Connection Queue Statistics : http-listener-1
Name
Value
Start Time
Last Sample Time
Details
Description
CountQueued15MinutesAverage 3365 count Jul 24, 2012 9:54:48 AM Jul 24, 2012
9:54:48 AM -- Average number of connections queued in the last 15 minutes
CountQueued5MinutesAverage 203 count Jul 24, 2012 9:54:48 AM Jul 24, 2012
9:54:48 AM -- Average number of connections queued in the last 5 minutes
CountQueued1MinuteAverage 0 count Jul 24, 2012 9:54:48 AM Jul 24, 2012
9:54:48 AM -- Average number of connections queued in the last 1 minute
TicksTotalQueued 0 count Jul 23, 2012 9:57:57 PM Jul 23, 2012 9:57:57
PM -- (Unsupported)
Total number of ticks that connections have spent in the queue
PeakQueued 500 count Jul 23, 2012 9:57:57 PM Jul 24, 2012 9:51:45 AM -- Largest
number of connections that were in the queue simultaneously
CountOpenConnections 202 count Jul 24, 2012 9:54:48 AM Jul 24, 2012 9:54:48
AM -- The number of open/active connections
CountTotalConnections 163494 count Jul 23, 2012 9:57:57 PM Jul 24, 2012
9:54:48 AM -- Total number of connections that have been accepted
CountTotalQueued 248658 count Jul 23, 2012 9:57:57 PM Jul 24, 2012 9:51:45
AM -- Total number of connections that have been queued
CountQueued 500 count Jul 23, 2012 9:57:57 PM Jul 24, 2012 9:51:45 AM -- Number
of connections currently in the queue
CountOverflows 888 count Jul 23, 2012 9:57:57 PM Jul 24, 2012 9:54:48
AM -- Number
of times the queue has been too full to accommodate a connection
MaxQueued 500 count Jul 23, 2012 9:57:57 PM Jul 23, 2012 9:57:58 PM -- Maximum
size of the connection queue
 Thread Pool Statistics : http-listener-1
Name
Value
Start Time
Last Sample Time
Details
Description
CoreThreads 5 count Jul 23, 2012 9:57:57 PM Jul 23, 2012 9:57:58 PM -- Core
number of threads in the thread pool
CurrentThreadsBusy 200 count Jul 23, 2012 9:57:57 PM Jul 24, 2012 9:47:59 AM
-- Provides the number of request processing threads currently in use in
the listener thread pool serving requests
TotalExecutedTasksCount 247958 count Jul 23, 2012 9:57:57 PM Jul 24, 2012
9:47:52 AM -- Provides the total number of tasks, which were executed by
the thread pool
MaxThreads 200 count Jul 23, 2012 9:57:57 PM Jul 23, 2012 9:57:58 PM -- Maximum
number of threads allowed in the thread pool
CurrentThreadCount 200 count Jul 23, 2012 9:57:57 PM Jul 24, 2012 9:47:59 AM
-- Provides the number of request processing threads currently in the
listener thread pool



Monitor (18 Statistics)
 Web Container: Servlet Statistics : LivingCost
Name
Value
Start Time
Last Sample Time
Details
Description
ActiveServletsLoaded 12count Jul 23, 2012 9:58:08 PM Jul 24, 2012
8:50:52 AM High
Water Mark: 12 count
Low Water Mark: 0 count
Number of Servlets loaded
TotalServletsLoaded 12 count Jul 23, 2012 9:58:08 PM Jul 24, 2012 8:50:52 AM
-- Total number of Servlets ever loaded
ServletProcessingTimes 867166 millisecond Jul 23, 2012 9:58:08 PM Jul 24,
2012 9:47:58 AM -- Cumulative Servlet processing times
 Web Container: Session Statistics : LivingCost
Name
Value
Start Time
Last Sample Time
Details
Description
PersistedSessionsTotal 0 count Jul 23, 2012 9:58:08 PM -- -- Total number
of sessions ever persisted
RejectedSessionsTotal 0 count Jul 23, 2012 9:58:08 PM -- -- Total number of
sessions ever rejected
SessionsTotal 51635 count Jul 23, 2012 9:58:08 PM Jul 24, 2012 9:47:58
AM -- Total
number of sessions ever created
ActiveSessions 2074count Jul 23, 2012 9:58:08 PM Jul 24, 2012 9:55:31 AM High
Water Mark: 4468 count
Low Water Mark: 0 count
Number of active sessions
ExpiredSessionsTotal 49561 count Jul 23, 2012 9:58:08 PM Jul 24, 2012
9:55:31 AM -- Total number of sessions ever expired
ActivatedSessionsTotal 0 count Jul 23, 2012 9:58:08 PM -- -- Total number
of sessions ever activated
PassivatedSessionsTotal 0 count Jul 23, 2012 9:58:08 PM -- -- Total number
of sessions ever passivated
 Web Container: Request Statistics : LivingCost
Name
Value
Start Time
Last Sample Time
Details
Description
RequestCount 222664 count Jul 23, 2012 9:58:08 PM Jul 24, 2012 9:56:20
AM -- Cumulative
number of requests processed so far
MaxTime 3896 millisecond Jul 23, 2012 9:58:08 PM Jul 24, 2012 9:56:20
AM -- Longest
response time for a request; not a cumulative value, but the largest
response time from among the response times
ProcessingTime 2 millisecond Jul 23, 2012 9:58:08 PM Jul 24, 2012 9:56:20 AM
-- Average request processing time
ErrorCount 958 count Jul 23, 2012 9:58:08 PM Jul 24, 2012 9:47:50 AM
-- Cumulative
value of the error count, with error count representing the number of cases
where the response code was greater than or equal to 400
 Web Container: JSP Statistics : LivingCost
Name
Value
Start Time
Last Sample Time
Details
Description
JspReloadedCount 166 count Jul 23, 2012 9:58:08 PM Jul 24, 2012 9:01:43 AM
-- Total number of JSP pages that were reloaded
TotalJspCount 166 count Jul 23, 2012 9:58:08 PM Jul 24, 2012 9:01:43
AM -- Total
number of JSP pages ever loaded
JspCount 166count Jul 23, 2012 9:58:08 PM Jul 24, 2012 9:01:43 AM High
Water Mark: 166 count
Low Water Mark: 0 count
Number of active JSP pages
JspErrorCount 0 count Jul 23, 2012 9:58:08 PM -- -- Total number of errors
triggered by JSP page invocations





On Tue, Jul 24, 2012 at 12:08 AM, Mladen Adamovic <mladen.adamovic_at_gmail.com
> wrote:

>
>> When you disable keep-alive connections, does it change anything w.r.t.
>> a number of CLOSE_WAIT connections?
>>
>>
> Well, I cannot always reproduce the problem, the last time it ran for 12
> hours before crashing. Note that this is production server so it doesn't
> make sense for me to run "experiments against the real user base".
>
> Currently the webserver is running with modified configuration, which
> includes keep-alive max requests to 20, connection time out to 15 seconds
> and this is how number of connections did look like recently:
> root_at_lvps176-28-13-94:~# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
> 105 ESTABLISHED
> 1 FIN_WAIT1
> 42 FIN_WAIT2
> 2 LAST_ACK
> 10 LISTEN
> 4 SYN_RECV
> 112 TIME_WAIT
> root_at_lvps176-28-13-94:~# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
> 102 ESTABLISHED
> 1 FIN_WAIT1
> 33 FIN_WAIT2
> 2 LAST_ACK
> 10 LISTEN
> 4 SYN_RECV
> 123 TIME_WAIT
> root_at_lvps176-28-13-94:~# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
> 88 ESTABLISHED
> 1 FIN_WAIT1
> 20 FIN_WAIT2
> 2 LAST_ACK
> 10 LISTEN
> 8 SYN_RECV
> 137 TIME_WAIT
> root_at_lvps176-28-13-94:~# netstat -a | grep "CLOSE_WAIT" | wc -l
> 0
> root_at_lvps176-28-13-94:~# netstat -a | grep "CLOSE_WAIT" | wc -l
> 0
> root_at_lvps176-28-13-94:~# netstat -a | grep "CLOSE_WAIT" | wc -l
> 2
> root_at_lvps176-28-13-94:~# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
> 2 CLOSE_WAIT
> 72 ESTABLISHED
> 2 FIN_WAIT1
> 39 FIN_WAIT2
> 10 LISTEN
> 2 SYN_RECV
> 110 TIME_WAIT
> root_at_lvps176-28-13-94:~# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
> 35 ESTABLISHED
> 7 FIN_WAIT2
> 10 LISTEN
> 31 SYN_RECV
> 88 TIME_WAIT
> root_at_lvps176-28-13-94:~# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
> 1 CLOSE_WAIT
> 70 ESTABLISHED
> 26 FIN_WAIT2
> 1 LAST_ACK
> 10 LISTEN
> 18 SYN_RECV
> 88 TIME_WAIT
> root_at_lvps176-28-13-94:~# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
> 71 ESTABLISHED
> 2 FIN_WAIT1
> 28 FIN_WAIT2
> 10 LISTEN
> 6 SYN_RECV
> 128 TIME_WAIT
> root_at_lvps176-28-13-94:~#
>
>
> OK, now checking with max request per keep alive connection set to 1:
> root_at_lvps176-28-13-94:~# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
> 28 ESTABLISHED
> 1 FIN_WAIT1
> 3 FIN_WAIT2
> 10 LISTEN
> 5 SYN_RECV
> 241 TIME_WAIT
> root_at_lvps176-28-13-94:~# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
> 17 ESTABLISHED
> 2 FIN_WAIT1
> 5 FIN_WAIT2
> 10 LISTEN
> 5 SYN_RECV
> 250 TIME_WAIT
> root_at_lvps176-28-13-94:~# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
> 1 CLOSING
> 30 ESTABLISHED
> 9 FIN_WAIT2
> 10 LISTEN
> 5 SYN_RECV
> 267 TIME_WAIT
> root_at_lvps176-28-13-94:~# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
> 29 ESTABLISHED
> 6 FIN_WAIT1
> 15 FIN_WAIT2
> 10 LISTEN
> 5 SYN_RECV
> 221 TIME_WAIT
> root_at_lvps176-28-13-94:~# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
> 26 ESTABLISHED
> 8 FIN_WAIT1
> 16 FIN_WAIT2
> 10 LISTEN
> 6 SYN_RECV
> 227 TIME_WAIT
> root_at_lvps176-28-13-94:~# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
> 27 ESTABLISHED
> 3 FIN_WAIT1
> 19 FIN_WAIT2
> 10 LISTEN
> 10 SYN_RECV
> 183 TIME_WAIT
> root_at_lvps176-28-13-94:~# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
> 1 CLOSE_WAIT
> 21 ESTABLISHED
> 4 FIN_WAIT1
> 21 FIN_WAIT2
> 10 LISTEN
> 10 SYN_RECV
> 196 TIME_WAIT
> root_at_lvps176-28-13-94:~# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
> 1 CLOSE_WAIT
> 32 ESTABLISHED
> 4 FIN_WAIT1
> 15 FIN_WAIT2
> 10 LISTEN
> 7 SYN_RECV
> 239 TIME_WAIT
>
>
> Looks worser. Around ~200 (oscillating) connections in TIME_WAIT state.
> When not using keep-alive many connections are created and dropped and
> sometimes connection can stay in TIME_WAIT for 240ms. It's easy to go out
> of network sockets, it seems.
>
>
>
>
> Thanks.
>>
>> WBR,
>> Alexey.
>>
>>
>>
>> Hm, I've seen one problem with the Virtuozzo server in the log:
>> quotaugidlimit
>> Number of user/group IDs allowed for the Container internal disk quota.
>> If set to 0, UID/GID quota will not be enabled.
>>
>> This is set to limit 2000, but I don't understand what it has with
>> files, it shall be number of UID/GIDs, and at the moment is has been steady
>> at 43 at VPS.
>> Perhaps these are problem with Virtuozzo setup but it has to be proven
>> somehow.
>>
>>
>>
>> --
>> Mladen Adamovic
>> Numbeo
>> Drziceva 9, 11120 Belgrade-Zvezdara, Serbia
>> (Business Registration Number 62612240)
>> Tel. +381-66-058-595
>> email: mladen.adamovic_at_gmail.com
>> web: http://www.numbeo.com
>>
>>
>>
>
>
> --
> Mladen Adamovic
> Numbeo
> Drziceva 9, 11120 Belgrade-Zvezdara, Serbia
> (Business Registration Number 62612240)
> Tel. +381-66-058-595
> email: mladen.adamovic_at_gmail.com
> web: http://www.numbeo.com
>



-- 
Mladen Adamovic
Numbeo
Drziceva 9, 11120 Belgrade-Zvezdara, Serbia
(Business Registration Number 62612240)
Tel. +381-66-058-595
email: mladen.adamovic_at_gmail.com
web: http://www.numbeo.com