users@glassfish.java.net

Re: File Descriptors leaking

From: Oleksiy Stashok <oleksiy.stashok_at_oracle.com>
Date: Mon, 10 Sep 2012 15:18:59 +0200

Hi,

On 09/10/2012 01:48 PM, Lachezar Dobrev wrote:
> 1. Can I "upgrade" to 3.1.2.2 using the update manager?
> If not, where can I read about the procedure for upgrading a
> production server?
I'm trying to figure that out and will let you know. Glassfish 3.1.2.2 has
fixes for the JK connector [1], so it would really make sense for you, even
though it doesn't have any file-descriptor-related fixes.

> 2. domain.xml attached.
> Please be advised, that sensitive information inside has been
> obfuscated. Only values have been obfuscated. Structure has not been
> altered.
Looks fine, not sure if you really need 200 threads for thread-pool-1,
but it shouldn't cause any problems.

Does your server machine run only the Glassfish server, or is the Apache
front-end also installed on the same machine?
When you see the problem, please run netstat; it will help us understand
which types of connections are consuming the descriptors.
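
Alongside netstat, the descriptor types can be tallied straight from
/proc on Linux. The following is only a minimal sketch, assuming a Linux
host and a JVM running as a user allowed to read /proc/<pid>/fd; the class
and method names (FdSummary, classify, summarize) are hypothetical and not
part of Glassfish or the JDK.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class FdSummary {

    // Maps a /proc/<pid>/fd symlink target (e.g. "pipe:[4659245]")
    // to a coarse descriptor type.
    public static String classify(String target) {
        if (target.startsWith("pipe:")) return "pipe";
        if (target.startsWith("socket:")) return "socket";
        if (target.startsWith("anon_inode:")) return "anon_inode";
        return "file";
    }

    // Counts the open descriptors of a process by type (Linux only).
    public static Map<String, Integer> summarize(String pid) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        Path fdDir = Paths.get("/proc", pid, "fd");
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                String kind;
                try {
                    kind = classify(Files.readSymbolicLink(fd).toString());
                } catch (IOException e) {
                    continue; // descriptor was closed while scanning
                }
                counts.merge(kind, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Pass the Glassfish PID as the first argument; defaults to this JVM.
        String pid = args.length > 0 ? args[0] : "self";
        for (Map.Entry<String, Integer> e : summarize(pid).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Running it with the Glassfish process ID as the only argument prints one
line per descriptor type. A pipe count that keeps growing between runs
would point at a leak rather than at ordinary connection load.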

Thanks.

WBR,
Alexey.

[1] http://java.net/jira/browse/GLASSFISH-18446
>
> 2012/9/10 Oleksiy Stashok <oleksiy.stashok_at_oracle.com>:
>> Hi,
>>
>> can you pls. try GF version 3.1.2.2?
>> Also, if it's possible, pls. attach GF domain.xml.
>>
>> Thanks.
>>
>> WBR,
>> Alexey.
>>
>>
>> On 09/10/2012 10:45 AM, Lachezar Dobrev wrote:
>>> Hello all...
>>> I have received no responses on this problem.
>>>
>>> I am still having this issue once or twice a week.
>>>
>>> After a number of searches in the past weeks I've gained little in
>>> terms of understanding what happens.
>>>
>>> In my search I found a defect report against Oracle's JVM that
>>> might be connected to the issue:
>>>
>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7118373
>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=2223521
>>>
>>> I also came up with a mention in some blog:
>>>
>>> http://blog.fuseyism.com/index.php/category/openjdk/
>>> (sorry, could not come up with a more legit source).
>>>
>>> From the blog post I can see that the mentioned defect is noted in
>>> 'Release 2.3.0 (2012-08-15)', which is a funny two days after my post.
>>> May I have your comments? Does this sound like an OpenJDK defect? Is it
>>> possible that it has been fixed in the meantime? From the looks of it, my
>>> machine still uses OpenJDK 2.1.1 (openjdk-7-jdk
>>> 7~u3-2.1.1~pre1-1ubuntu3).
>>>
>>> Please advise!
>>>
>>> 2012/8/29 Lachezar Dobrev <l.dobrev_at_gmail.com>:
>>>> Hello colleagues,
>>>>
>>>> Recently we switched from Tomcat to Glassfish.
>>>> However, I noticed that at a certain point (unknown as of yet) the
>>>> Glassfish server stops responding. I can't even stop it correctly
>>>> (asadmin stop-domain hangs!).
>>>>
>>>> - Ubuntu Server - 12.04 (precise)
>>>> - Intel Xeon (x64 arch)
>>>> - java version "1.7.0_03"
>>>> - OpenJDK 64-Bit Server VM (build 22.0-b10, mixed mode)
>>>> - Glassfish 3.1.2 (no upgrades pending)
>>>>
>>>> The server serves via a JK Connector with a façade Apache server using
>>>> mod_jk.
>>>>
>>>> The server runs only three applications (and the admin interface).
>>>> All applications use Spring Framework. One uses JPA to a PostgreSQL on
>>>> the local host, one uses an ObjectDB JPA, two use JDBC pool
>>>> connections to a remote Microsoft SQL Server.
>>>>
>>>> The culprit seems to be some kind of File Descriptor leak.
>>>> Initially the server died within a day or two. I had to increase the
>>>> open files limit (s1024/h4096) to (s65536/h65536) thinking this may be
>>>> just because too many files need to be opened. However that just
>>>> postponed the server death to about one week uptime.
>>>>
>>>> I was able to make some checks at the latest crash, since I was
>>>> awake at 3 AM. What I found out was that there was an unbelievable
>>>> number of lost (unclosed) pipes:
>>>>
>>>>> java 30142 glassfish 467r FIFO 0,8 0t0 4659245 pipe
>>>>> java 30142 glassfish 468w FIFO 0,8 0t0 4659245 pipe
>>>>> java 30142 glassfish 469u 0000 0,9 0 6821 anon_inode
>>>>> java 30142 glassfish 487r FIFO 0,8 0t0 4676297 pipe
>>>>> java 30142 glassfish 488w FIFO 0,8 0t0 4676297 pipe
>>>>> java 30142 glassfish 489u 0000 0,9 0 6821 anon_inode
>>>> The logs show a very long quiet period. Just before the failure, the
>>>> log shows a normal log line from the actual server working (one of the
>>>> applications).
>>>> Then the log rolls and starts rolling every second. The failures
>>>> start with (attached error_one.txt)
>>>>
>>>> The only line that has been obfuscated is the one with .... in it.
>>>> The com.planetj... is a filter used to implement gzip compression
>>>> (input and output) since I could not find how to configure that in
>>>> Glassfish.
>>>> The org.springframework... is obviously the Spring Framework.
>>>>
>>>> The log has an enormous number (2835 in 19 seconds) of those
>>>> messages. The messages are logged from within the same thread (same
>>>> _ThreadID and _ThreadName), which leads me to believe all messages are
>>>> a result of the processing of a single request.
>>>> Afterwards the server begins dumping a lot of messages like
>>>> (attached error_two.txt).
>>>> The server is effectively blocked from that time on.
>>>>
>>>> At that point lsof shows 64K open files from Glassfish, the vast
>>>> majority being open pipes (three descriptors each).
>>>>
>>>> I am at a loss here... The server currently needs either a periodic
>>>> restart, or I need to 'kill' it when it blocks.
>>>>
>>>> I've been digging for this error around the Internet, and the
>>>> closest I've seen has been due to not closing (leaking) Selectors.
>>>> Please advise!
>>
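
The Selector leak mentioned in the original report can be illustrated with
a minimal Java sketch. It assumes a Linux JDK 7 runtime, where each open
java.nio.channels.Selector typically holds a wakeup pipe pair plus an epoll
descriptor, which would be consistent with the pipe/pipe/anon_inode triples
in the lsof output. Whether this is what actually happens inside Glassfish
in this case is not confirmed. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;

public class SelectorLeakSketch {

    // Leaky pattern: selectors are opened and never closed, so their
    // descriptors stay allocated for as long as the objects live.
    public static List<Selector> openWithoutClosing(int n) throws IOException {
        List<Selector> selectors = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            selectors.add(Selector.open());
        }
        return selectors;
    }

    // Counts how many of the given selectors still hold resources.
    public static int countOpen(List<Selector> selectors) {
        int open = 0;
        for (Selector s : selectors) {
            if (s.isOpen()) {
                open++;
            }
        }
        return open;
    }

    // Releases the descriptors held by every selector in the list.
    public static void closeAll(List<Selector> selectors) throws IOException {
        for (Selector s : selectors) {
            s.close();
        }
    }

    // Safe pattern: the selector is always closed, even if polling fails.
    public static Selector pollOnceAndClose() throws IOException {
        Selector selector = Selector.open();
        try {
            selector.selectNow(); // non-blocking poll, nothing registered
        } finally {
            selector.close();
        }
        return selector;
    }

    public static void main(String[] args) throws IOException {
        List<Selector> leaked = openWithoutClosing(3);
        System.out.println("open after leaky pattern: " + countOpen(leaked));
        closeAll(leaked);
        System.out.println("open after closeAll: " + countOpen(leaked));
        System.out.println("safe pattern left open: " + pollOnceAndClose().isOpen());
    }
}
```

While the leaky selectors are open, running lsof against the JVM should show
the extra descriptors; after closeAll they should disappear again.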