users@glassfish.java.net

Re: File Descriptors leaking

From: Lachezar Dobrev <l.dobrev_at_gmail.com>
Date: Mon, 10 Sep 2012 14:48:12 +0300

  1. Can I "upgrade" to 3.1.2.2 using the update manager?
     If not, where can I read about the procedure for upgrading a
production server?

  2. domain.xml attached.
     Please be advised that sensitive information inside has been
obfuscated. Only values have been obfuscated; the structure has not
been altered.

2012/9/10 Oleksiy Stashok <oleksiy.stashok_at_oracle.com>:
> Hi,
>
> can you pls. try GF version 3.1.2.2?
> Also, if it's possible, pls. attach GF domain.xml.
>
> Thanks.
>
> WBR,
> Alexey.
>
>
> On 09/10/2012 10:45 AM, Lachezar Dobrev wrote:
>>
>> Hello all...
>> I have received no responses on this problem.
>>
>> I am still having this issue once or twice a week.
>>
>> After a number of searches in the past weeks I've gained little in
>> terms of understanding what happens.
>>
>> In my search I found a defect report against Oracle's JVM that
>> might be connected to the issue:
>>
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7118373
>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=2223521
>>
>> I also came across a mention in a blog post:
>>
>> http://blog.fuseyism.com/index.php/category/openjdk/
>> (sorry, I could not find a more authoritative source).
>>
>> From the blog post I can see that the mentioned defect is noted in
>> 'Release 2.3.0 (2012-08-15)', which is, funnily enough, two days after
>> my post. May I have your comments? Does this sound like an OpenJDK
>> defect? Is it possible that it has been fixed in the meantime? From the
>> looks of it my machine still uses OpenJDK 2.1.1 (openjdk-7-jdk
>> 7~u3-2.1.1~pre1-1ubuntu3).
>>
>> Please advise!
>>
>> 2012/8/29 Lachezar Dobrev <l.dobrev_at_gmail.com>:
>>>
>>> Hello colleagues,
>>>
>>> Recently we switched from Tomcat to Glassfish.
>>> However, I noticed that at a certain point (unknown as of yet) the
>>> Glassfish server stops responding. I can't even stop it correctly
>>> (asadmin stop-domain hangs!).
>>>
>>> - Ubuntu Server - 12.04 (precise)
>>> - Intel Xeon (x64 arch)
>>> - java version "1.7.0_03"
>>> - OpenJDK 64-Bit Server VM (build 22.0-b10, mixed mode)
>>> - Glassfish 3.1.2 (no upgrades pending)
>>>
>>> The server sits behind a façade Apache server, which forwards
>>> requests to it over a JK connector (mod_jk).
>>>
>>> The server runs only three applications (and the admin interface).
>>> All applications use the Spring Framework. One uses JPA against a
>>> PostgreSQL database on the local host, one uses JPA with ObjectDB,
>>> and two use JDBC pool connections to a remote Microsoft SQL Server.
>>>
>>> The culprit seems to be some kind of File Descriptor leak.
>>> Initially the server died within a day or two. I had to increase the
>>> open files limit from (s1024/h4096) to (s65536/h65536), thinking this
>>> might just be because too many files need to be opened. However, that
>>> only postponed the server's death to about one week of uptime.
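>>>
>>> To keep an eye on the descriptor count from inside the JVM rather
>>> than polling lsof, this is the kind of check I have in mind (a
>>> minimal sketch only, assuming the com.sun.management extension that
>>> HotSpot/OpenJDK provides on Linux; it is not code we actually run):
>>>
>>>   import java.lang.management.ManagementFactory;
>>>   import com.sun.management.UnixOperatingSystemMXBean;
>>>
>>>   // Logs the process's open file descriptor count once a minute.
>>>   public class FdWatcher {
>>>       public static void main(String[] args) throws InterruptedException {
>>>           UnixOperatingSystemMXBean os = (UnixOperatingSystemMXBean)
>>>                   ManagementFactory.getOperatingSystemMXBean();
>>>           while (true) {
>>>               System.out.printf("open fds: %d of max %d%n",
>>>                       os.getOpenFileDescriptorCount(),
>>>                       os.getMaxFileDescriptorCount());
>>>               Thread.sleep(60000L);
>>>           }
>>>       }
>>>   }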
>>>
>>> I was able to make some checks during the latest crash, since I was
>>> awake at 3 AM. What I found was an unbelievable number of lost
>>> (unclosed) pipes:
>>>
>>>> java 30142 glassfish 467r FIFO 0,8 0t0 4659245 pipe
>>>> java 30142 glassfish 468w FIFO 0,8 0t0 4659245 pipe
>>>> java 30142 glassfish 469u 0000 0,9 0 6821 anon_inode
>>>> java 30142 glassfish 487r FIFO 0,8 0t0 4676297 pipe
>>>> java 30142 glassfish 488w FIFO 0,8 0t0 4676297 pipe
>>>> java 30142 glassfish 489u 0000 0,9 0 6821 anon_inode
>>>
>>> The logs show a very long quiet period; just before the failure the
>>> log shows a normal log line from the server actually working (one of
>>> the applications).
>>> Then the log rolls, and keeps rolling every second. The failures
>>> start with (attached error_one.txt).
>>>
>>> The only line that has been obfuscated is the one with .... in it.
>>> The com.planetj... entry is a filter used to implement gzip
>>> compression (input and output), since I could not find how to
>>> configure that in Glassfish.
>>> The org.springframework... entries are obviously the Spring Framework.
>>>
>>> The log has an enormous number (2835 in 19 seconds) of those
>>> messages. The messages are logged from within the same thread (same
>>> _ThreadID and _ThreadName), which leads me to believe they are all
>>> the result of processing a single request.
>>> Afterwards the server begins dumping a lot of messages like the ones
>>> in the attached error_two.txt.
>>> The server is effectively blocked from that point on.
>>>
>>> At that point lsof shows 64K open files from Glassfish, the vast
>>> majority being open pipes (three descriptors each).
>>>
>>> I am at a loss here... The server currently needs either a periodic
>>> restart, or I need to 'kill' it when it blocks.
>>>
>>> I've been digging around the Internet for this error, and the
>>> closest match I've found was caused by not closing (leaking) Selectors.
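>>>
>>> That would match the lsof output above: on Linux each java.nio
>>> Selector holds an epoll descriptor (the anon_inode entry) plus a
>>> wakeup pipe (the FIFO read/write pair), so every Selector that is
>>> opened and never closed leaks exactly three descriptors. A minimal
>>> sketch of the pattern I suspect (purely illustrative, not our
>>> application code):
>>>
>>>   import java.io.IOException;
>>>   import java.nio.channels.Selector;
>>>
>>>   // Each Selector.open() that is never closed leaves behind a
>>>   // wakeup pipe (two FIFO descriptors) plus an epoll anon_inode
>>>   // descriptor -- the same triplet visible in the lsof output.
>>>   public class SelectorLeakDemo {
>>>       public static void main(String[] args) throws IOException {
>>>           for (int i = 0; i < 1000; i++) {
>>>               Selector leaked = Selector.open(); // 3 descriptors
>>>               // ... used but never closed -> descriptors pile up
>>>           }
>>>           // The fix is simply to close it, e.g. try-with-resources:
>>>           try (Selector ok = Selector.open()) {
>>>               // work with 'ok'; descriptors are released on close()
>>>           }
>>>       }
>>>   }
>>>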
>>> Please advise!
>
>