users@grizzly.java.net

Re: Upload a large file without oom with Grizzly

From: Ryan Lubke <ryan.lubke_at_oracle.com>
Date: Thu, 29 Aug 2013 09:52:54 -0700

Probably not a good idea. The cloner will be invoked if the write queue
can't write the message immediately and needs to add the message to the
queue. It's an added measure of safety, as the upstream buffer could be
re-used or recycled.
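
Conceptually, the cloner deep-copies the message before it sits in the async
write queue, so a later re-use of the original buffer can't corrupt the
pending write. A rough sketch of the idea (hypothetical code, not the actual
SSLBaseFilter$OnWriteCopyCloner implementation):

    import org.glassfish.grizzly.Buffer;
    import org.glassfish.grizzly.Connection;
    import org.glassfish.grizzly.asyncqueue.MessageCloner;
    import org.glassfish.grizzly.memory.MemoryManager;

    // Copy the outgoing Buffer into freshly allocated memory so the
    // caller may safely recycle the original afterwards.
    final MessageCloner<Buffer> copyCloner = new MessageCloner<Buffer>() {
        @Override
        public Buffer clone(final Connection connection, final Buffer original) {
            final MemoryManager mm = connection.getTransport().getMemoryManager();
            final Buffer copy = mm.allocate(original.remaining());
            copy.put(original.duplicate()).flip();
            return copy;
        }
    };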


Sébastien Lorber wrote:
>
>
> By chance, do you know if I can remove the MessageCloner used in the SSL filter?
> SSLBaseFilter$OnWriteCopyCloner
>
> It seems to allocate a lot of memory.
> I don't really understand why messages have to be cloned, can I remove
> this? How?
>
>
> 2013/8/29 Sébastien Lorber <lorber.sebastien_at_gmail.com
> <mailto:lorber.sebastien_at_gmail.com>>
>
>
> I'm trying to send a 500MB file for my tests with a heap of 400MB.
>
> In our real use cases we would probably have files under 20MB, but
> we want to reduce the memory consumption because we can have many
> parallel uploads on the same server, depending on user activity.
>
> I'll try to check whether using this BodyGenerator reduces the memory
> footprint or whether it's almost like before.
>
>
> 2013/8/28 Ryan Lubke <ryan.lubke_at_oracle.com
> <mailto:ryan.lubke_at_oracle.com>>
>
> At this point in time, as far as the SSL buffer allocation is
> concerned, it's untunable.
>
> That said, feel free to open a feature request.
>
> As to your second question, there is no suggested size. This
> is all very application-specific.
>
> I'm curious, how large of a file are you sending?
>
>
>
> Sébastien Lorber wrote:
>> I have seen a lot of buffers which have a size of 33842, and
>> it seems the limit is near half the capacity.
>>
>> Perhaps there's a way to tune that buffer size so that it
>> consumes less memory?
>> Is there an ideal Buffer size to pass to the feed method?
>>
>>
>> 2013/8/28 Ryan Lubke <ryan.lubke_at_oracle.com
>> <mailto:ryan.lubke_at_oracle.com>>
>>
>> I'll be reviewing the PR today, thanks again!
>>
>> Regarding the OOM: as it stands now, for each new buffer
>> that is passed to the SSLFilter, we allocate a buffer
>> twice the size in order to
>> accommodate the encrypted result. So there's an increase.
>>
>> Depending on the socket configurations of both endpoints,
>> and how fast the remote end is reading data, it could be
>> that the write queue is becoming too large. We do have a
>> way to detect this situation, but I'm pretty sure
>> the Grizzly internals are currently shielded here. I
>> will see what I can do to allow users to leverage this.
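>>
>> As a rough illustration of bounding the queue from the application
>> side (assuming the Grizzly 2.3 AsyncQueueWriter API; the 1 MB cap is
>> a hypothetical value):
>>
>>     import org.glassfish.grizzly.nio.transport.TCPNIOTransport;
>>     import org.glassfish.grizzly.nio.transport.TCPNIOTransportBuilder;
>>
>>     TCPNIOTransport transport = TCPNIOTransportBuilder.newInstance().build();
>>
>>     // Cap the bytes a single connection may have queued for async
>>     // writing; once the cap is hit, writes fail fast instead of
>>     // piling encrypted buffers up on the heap.
>>     transport.getAsyncQueueIO().getWriter()
>>              .setMaxPendingBytesPerConnection(1024 * 1024);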
>>
>>
>>
>>
>> Sébastien Lorber wrote:
>>> Hello,
>>>
>>> I've made my pull request.
>>> https://github.com/AsyncHttpClient/async-http-client/pull/367
>>>
>>> With my usecase it works, the file is uploaded like before.
>>>
>>>
>>>
>>> But I didn't notice a big memory improvement.
>>>
>>> Is it possible that SSL doesn't allow streaming the body,
>>> or something like that?
>>>
>>>
>>>
>>> In memory, I have a lot of:
>>> - HeapByteBuffer
>>> which are held by SSLUtils$3,
>>> which are held by BuffersBuffer,
>>> which are held by WriteResult,
>>> which are held by AsyncWriteQueueRecord.
>>>
>>>
>>> Here is an example of the OOM stack trace:
>>>
>>> java.lang.OutOfMemoryError: Java heap space
>>>     at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>>>     at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
>>>     at org.glassfish.grizzly.ssl.SSLUtils.allocateOutputBuffer(SSLUtils.java:342)
>>>     at org.glassfish.grizzly.ssl.SSLBaseFilter$2.grow(SSLBaseFilter.java:117)
>>>     at org.glassfish.grizzly.ssl.SSLConnectionContext.ensureBufferSize(SSLConnectionContext.java:392)
>>>     at org.glassfish.grizzly.ssl.SSLConnectionContext.wrap(SSLConnectionContext.java:272)
>>>     at org.glassfish.grizzly.ssl.SSLConnectionContext.wrapAll(SSLConnectionContext.java:227)
>>>     at org.glassfish.grizzly.ssl.SSLBaseFilter.wrapAll(SSLBaseFilter.java:404)
>>>     at org.glassfish.grizzly.ssl.SSLBaseFilter.handleWrite(SSLBaseFilter.java:319)
>>>     at org.glassfish.grizzly.ssl.SSLFilter.accurateWrite(SSLFilter.java:255)
>>>     at org.glassfish.grizzly.ssl.SSLFilter.handleWrite(SSLFilter.java:143)
>>>     at com.ning.http.client.providers.grizzly.GrizzlyAsyncHttpProvider$SwitchingSSLFilter.handleWrite(GrizzlyAsyncHttpProvider.java:2503)
>>>     at org.glassfish.grizzly.filterchain.ExecutorResolver$8.execute(ExecutorResolver.java:111)
>>>     at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:288)
>>>     at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:206)
>>>     at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:136)
>>>     at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:114)
>>>     at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:77)
>>>     at org.glassfish.grizzly.filterchain.FilterChainContext.write(FilterChainContext.java:853)
>>>     at org.glassfish.grizzly.filterchain.FilterChainContext.write(FilterChainContext.java:720)
>>>     at com.ning.http.client.providers.grizzly.FeedableBodyGenerator.flushQueue(FeedableBodyGenerator.java:132)
>>>     at com.ning.http.client.providers.grizzly.FeedableBodyGenerator.feed(FeedableBodyGenerator.java:101)
>>>     at com.ning.http.client.providers.grizzly.MultipartBodyGeneratorFeeder$FeedBodyGeneratorOutputStream.write(MultipartBodyGeneratorFeeder.java:222)
>>>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>>>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
>>>     at com.ning.http.multipart.FilePart.sendData(FilePart.java:179)
>>>     at com.ning.http.multipart.Part.send(Part.java:331)
>>>     at com.ning.http.multipart.Part.sendParts(Part.java:397)
>>>     at com.ning.http.client.providers.grizzly.MultipartBodyGeneratorFeeder.feed(MultipartBodyGeneratorFeeder.java:144)
>>>
>>>
>>>
>>>
>>> Any idea?
>>>
>>>
>>>
>>> 2013/8/27 Ryan Lubke <ryan.lubke_at_oracle.com
>>> <mailto:ryan.lubke_at_oracle.com>>
>>>
>>> Excellent! Looking forward to the pull request!
>>>
>>>
>>> Sébastien Lorber wrote:
>>>> Thanks Ryan, it works fine. I'll make a pull
>>>> request on AHC tomorrow with better code, using
>>>> the same Part classes that already exist.
>>>>
>>>> I created an OutputStream that redirects to the
>>>> BodyGenerator feeder (see the sketch below).
>>>>
>>>> The problem I currently have is that the feeder
>>>> fills the queue faster than the async thread
>>>> polls it :)
>>>> I need to expose a limit on that queue size or
>>>> something; I'll work on that, as it will be better
>>>> than a thread sleep to slow down the file-part reading.
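>>>>
>>>> A minimal sketch of such an adapter (the final version may differ;
>>>> this assumes the provider's FeedableBodyGenerator.feed(Buffer,
>>>> boolean) method and Grizzly's default MemoryManager):
>>>>
>>>>     import java.io.IOException;
>>>>     import java.io.OutputStream;
>>>>     import java.util.Arrays;
>>>>
>>>>     import org.glassfish.grizzly.Buffer;
>>>>     import org.glassfish.grizzly.memory.Buffers;
>>>>     import org.glassfish.grizzly.memory.MemoryManager;
>>>>
>>>>     import com.ning.http.client.providers.grizzly.FeedableBodyGenerator;
>>>>
>>>>     // Bridges classic OutputStream-based multipart writing to the
>>>>     // non-blocking FeedableBodyGenerator.
>>>>     public class FeedBodyGeneratorOutputStream extends OutputStream {
>>>>
>>>>         private final FeedableBodyGenerator feeder;
>>>>
>>>>         public FeedBodyGeneratorOutputStream(FeedableBodyGenerator feeder) {
>>>>             this.feeder = feeder;
>>>>         }
>>>>
>>>>         @Override
>>>>         public void write(int b) throws IOException {
>>>>             write(new byte[] {(byte) b}, 0, 1);
>>>>         }
>>>>
>>>>         @Override
>>>>         public void write(byte[] b, int off, int len) throws IOException {
>>>>             // Copy the chunk: the caller (e.g. BufferedOutputStream)
>>>>             // re-uses its array, so the Buffer must own its bytes.
>>>>             byte[] copy = Arrays.copyOfRange(b, off, off + len);
>>>>             Buffer buffer =
>>>>                     Buffers.wrap(MemoryManager.DEFAULT_MEMORY_MANAGER, copy);
>>>>             feeder.feed(buffer, false); // 'false': not the last chunk
>>>>         }
>>>>     }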
>>>>
>>>>
>>>> 2013/8/27 Ryan Lubke <ryan.lubke_at_oracle.com
>>>> <mailto:ryan.lubke_at_oracle.com>>
>>>>
>>>> Yes, something like that. I was going to
>>>> tackle adding something like this today. I'll
>>>> follow up with something you can test out.
>>>>
>>>>
>>>> Sébastien Lorber wrote:
>>>>> Ok thanks!
>>>>>
>>>>> I think I see what I could do, probably
>>>>> something like that:
>>>>>
>>>>>
>>>>>     FeedableBodyGenerator bodyGenerator = new FeedableBodyGenerator();
>>>>>     MultipartBodyGeneratorFeeder bodyGeneratorFeeder =
>>>>>             new MultipartBodyGeneratorFeeder(bodyGenerator);
>>>>>
>>>>>     Request uploadRequest1 = new RequestBuilder("POST")
>>>>>             .setUrl("url")
>>>>>             .setBody(bodyGenerator)
>>>>>             .build();
>>>>>
>>>>>     ListenableFuture<Response> asyncRes = asyncHttpClient
>>>>>             .prepareRequest(uploadRequest1)
>>>>>             .execute(new AsyncCompletionHandlerBase());
>>>>>
>>>>>     bodyGeneratorFeeder.append("param1", "value1");
>>>>>     bodyGeneratorFeeder.append("param2", "value2");
>>>>>     bodyGeneratorFeeder.append("fileToUpload", fileInputStream);
>>>>>     bodyGeneratorFeeder.end();
>>>>>
>>>>>     Response uploadResponse = asyncRes.get();
>>>>>
>>>>>
>>>>> Does it seem ok to you?
>>>>>
>>>>> I guess it could be interesting to contribute
>>>>> that MultipartBodyGeneratorFeeder class to AHC
>>>>> or Grizzly, since other people may want to
>>>>> achieve the same thing.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2013/8/26 Ryan Lubke <ryan.lubke_at_oracle.com
>>>>> <mailto:ryan.lubke_at_oracle.com>>
>>>>>
>>>>>
>>>>>
>>>>> Sébastien Lorber wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I would like to know if it's possible
>>>>> to upload a file with AHC / Grizzly in
>>>>> streaming mode, I mean without loading the
>>>>> whole file's bytes into memory.
>>>>>
>>>>> The default behavior seems to allocate
>>>>> a byte[] which contains the whole file,
>>>>> which means that my server can hit OOM
>>>>> if too many users upload large files
>>>>> at the same time.
>>>>>
>>>>>
>>>>> I've tried with the Heap and ByteBuffer
>>>>> memory managers, with
>>>>> reallocate=true/false, but with no more success.
>>>>>
>>>>> It seems the whole file content is
>>>>> appended to the BufferOutputStream,
>>>>> and then the underlying buffer is written.
>>>>>
>>>>> At least this seems to be the case
>>>>> with AHC integration:
>>>>> https://github.com/AsyncHttpClient/async-http-client/blob/6faf1f316e5546110b0779a5a42fd9d03ba6bc15/providers/grizzly/src/main/java/org/asynchttpclient/providers/grizzly/bodyhandler/PartsBodyHandler.java
>>>>>
>>>>>
>>>>> So, is there a way to patch AHC to
>>>>> stream the file so that I could
>>>>> eventually consume only 20MB of heap
>>>>> while uploading a 500MB file?
>>>>> Or is this simply impossible with Grizzly?
>>>>> I didn't notice anything related to
>>>>> that in the documentation.
>>>>>
>>>>> It's possible with the
>>>>> FeedableBodyGenerator. But if you're tied
>>>>> to using multipart uploads, you'd have to
>>>>> convert the multipart data to Buffers
>>>>> manually and send them using the
>>>>> FeedableBodyGenerator, roughly as in the
>>>>> sketch below.
>>>>> I'll take a closer look to see if this
>>>>> area can be improved.
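>>>>>
>>>>> A minimal sketch of that manual feeding (assuming
>>>>> FeedableBodyGenerator.feed(Buffer, boolean) and Grizzly's default
>>>>> MemoryManager; the 8 KB chunk size is an arbitrary choice):
>>>>>
>>>>>     import java.io.IOException;
>>>>>     import java.io.InputStream;
>>>>>     import java.util.Arrays;
>>>>>
>>>>>     import org.glassfish.grizzly.Buffer;
>>>>>     import org.glassfish.grizzly.memory.Buffers;
>>>>>     import org.glassfish.grizzly.memory.MemoryManager;
>>>>>
>>>>>     import com.ning.http.client.providers.grizzly.FeedableBodyGenerator;
>>>>>
>>>>>     public final class FileFeeder {
>>>>>         // Streams the input in fixed-size chunks so only one chunk
>>>>>         // (plus whatever sits in the write queue) is held in memory.
>>>>>         public static void feed(FeedableBodyGenerator gen, InputStream in)
>>>>>                 throws IOException {
>>>>>             final byte[] chunk = new byte[8 * 1024];
>>>>>             int read;
>>>>>             while ((read = in.read(chunk)) != -1) {
>>>>>                 final byte[] copy = Arrays.copyOf(chunk, read);
>>>>>                 final Buffer buffer = Buffers.wrap(
>>>>>                         MemoryManager.DEFAULT_MEMORY_MANAGER, copy);
>>>>>                 gen.feed(buffer, false); // more chunks follow
>>>>>             }
>>>>>             gen.feed(Buffers.EMPTY_BUFFER, true); // end of body
>>>>>         }
>>>>>     }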
>>>>>
>>>>>
>>>>> Btw, in my case it is a file upload. I
>>>>> receive a file with CXF and have to
>>>>> transmit it to a storage server (like
>>>>> S3). CXF doesn't consume memory
>>>>> because it streams the large file
>>>>> uploads to the file system, and then
>>>>> provides an input stream on that file.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
>