I think I figured out a bit more about this.
It seems the only misleading part is the message in the logs. The
interruption has nothing to do with "Idle thread timeout". The "Idle thread
timeout" works as it should: according to the documentation (and it looks
that way in practice), it is used to terminate an idle thread in a thread
pool, shrinking the pool until it reaches its minimum size.
The reason the thread is being interrupted during execution is the request
timeout, the "request-timeout-seconds" property of a protocol, which, in my
environment, was coincidentally also 900 seconds (I think that is the
default for both "Idle thread timeout" and "Request timeout").
Though the message says "Interrupting idle thread", it should say
"Interrupting busy thread due to protocol <insert protocol name here for
clarity> request timeout".
Setting the request timeout to -1 prevented threads that execute longer
than 15 minutes from being cancelled.
It also makes it possible to create a secondary protocol with a request
timeout of -1 and an additional listener (on an extra port) that uses that
protocol, and then use that port for long-running operations (even on the
same thread pool), leaving the other port for requests that should be
guarded against getting stuck (whenever possible).
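For anyone who wants to see the effect in isolation, here is a minimal
sketch in plain Java of the behaviour described above. It is only a
simulation under my own assumptions, not GlassFish or Grizzly code: the
class RequestTimeoutSketch and the method runRequest are made-up names, the
"watchdog" is simply the calling thread, and the timeouts are in
milliseconds so it runs quickly. A negative timeout disables the
interruption, mirroring what setting the request timeout to -1 did for me.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Simulation only: this is NOT GlassFish or Grizzly code.
class RequestTimeoutSketch {

    /**
     * Runs a simulated request that needs workMillis to finish.
     * A negative timeoutMillis disables the watchdog (like a timeout of -1).
     * Returns true if the work completed, false if it was interrupted.
     */
    public static boolean runRequest(long workMillis, long timeoutMillis) {
        AtomicBoolean completed = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(workMillis); // stands in for any interruptible call
                completed.set(true);
            } catch (InterruptedException e) {
                // The busy thread was interrupted, so the request is lost.
            }
        });
        worker.start();
        try {
            if (timeoutMillis >= 0) {
                // join(0) would wait forever, so wait at least 1 ms.
                worker.join(Math.max(1L, timeoutMillis));
                if (worker.isAlive()) {
                    worker.interrupt(); // the thread is busy, not idle
                }
            }
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("caller interrupted", e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("timeout shorter than work: " + runRequest(300, 100));
        System.out.println("timeout disabled (-1):     " + runRequest(300, -1));
        System.out.println("timeout longer than work:  " + runRequest(50, 1000));
    }
}
```

Only the first call reports false: the worker is still busy when the
timeout expires, gets interrupted, and never finishes its work. Code that
swallows or retries after InterruptedException would behave differently,
which is the case Pawel raises below.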
On Tue, May 13, 2014 at 1:01 PM, Pawel Veselov <pawel.veselov_at_gmail.com> wrote:
>
> Hi Noah.
>
> I wonder if there is a GF-specific API method that a thread may call to
> extend the timeout if it knows it's gonna run longer than normal.
> It's more problematic on some of my servlets that constantly emit data as
> they run, as they really get interrupted before the work is done. Also, any
> call that invokes an "interruptible" operation (e.g. Object.wait()), and
> doesn't ignore, or re-try after getting InterruptedException will most
> likely fail the request.
>
> At the same time, I do see validity in having the server trying to cancel
> threads that may legitimately run astray, and would hate to disable the
> timeout altogether...
>
> Thank you,
> Pawel.
>
> On Thu, May 8, 2014 at 8:02 PM, Noah White <emailnbw_at_gmail.com> wrote:
>
>> The threads are considered IDLE and that's why you are running into this.
>> The IDLE timeout (despite its name) has no concern if there is actually
>> activity being done by the thread. It only looks at the amount of time it's
>> been assigned from the pool.
>>
>> IMO it's at best poor verbiage to call it an IDLE timeout value. They
>> should either change the verbiage or implement functionality so that it
>> times out on the lack of actual thread activity.
>>
>> A while back I filed an issue in the GF jira about this - feel free to
>> throw a vote on it.
>>
>> -Noah
>>
>> Sent from my iPhone
>>
>> > On May 8, 2014, at 8:17 PM, Pawel Veselov <pawel.veselov_at_gmail.com>
>> wrote:
>> >
>> > Hi.
>> >
>> > I'm being hit with the flood of "Interrupting idle thread" messages in
>> my GF logs, for threads that aren't idle, but are just taking long (hours)
>> time to execute.
>> >
>> > It's normal for these particular requests to take that long time. The
>> thread pool idle clean-up timeout is 900sec, and I assume the interruption
>> starts after that time. It's reflected in the responses: when trying with
>> curl, for example, it waits for all the time that the thread takes
>> to execute the job, but once the job is completed, the response fails.
>> Because the HTTP request body is partially written by that time (before the
>> job's started), curl just says "curl: (18) transfer closed with outstanding
>> read data remaining". This only happens when response time exceeds 15
>> minutes (900sec). So, that pending interrupt actually cancels the request,
>> once GF classes can get their hands on the thread.
>> >
>> > Shouldn't this thread be considered "in use", and shouldn't be
>> interrupted?
>>
>
> [skipped]
>