users@grizzly.java.net

Re: Upload a large file without oom with Grizzly

From: Sébastien Lorber <lorber.sebastien_at_gmail.com>
Date: Wed, 28 Aug 2013 18:54:24 +0200

Hello,

I've made my pull request.
https://github.com/AsyncHttpClient/async-http-client/pull/367

With my use case it works; the file is uploaded as before.



But I didn't notice a big memory improvement.

Is it possible that SSL doesn't allow streaming the body, or something
like that?



In memory, I have a lot of:
- HeapByteBuffer,
  which are held by SSLUtils$3,
  which are held by BuffersBuffer,
  which are held by WriteResult,
  which are held by AsyncWriteQueueRecord.
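
If I read that retention chain right, the SSL filter wraps each buffer I feed
and queues the result on the async write queue faster than the socket drains
it, so the queue grows without bound. I wonder if capping Grizzly's pending
write bytes would help; here is a sketch of what I mean (I haven't verified
that this is the right knob, nor how to reach the transport through AHC):

TCPNIOTransport transport = TCPNIOTransportBuilder.newInstance().build();
// Refuse to queue more than 1 MB of unwritten data per connection,
// instead of letting wrapped SSL buffers pile up on the heap.
transport.getAsyncQueueIO().getWriter()
         .setMaxPendingBytesPerConnection(1024 * 1024);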


Here is an example of the OOM stack trace:

java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
    at org.glassfish.grizzly.ssl.SSLUtils.allocateOutputBuffer(SSLUtils.java:342)
    at org.glassfish.grizzly.ssl.SSLBaseFilter$2.grow(SSLBaseFilter.java:117)
    at org.glassfish.grizzly.ssl.SSLConnectionContext.ensureBufferSize(SSLConnectionContext.java:392)
    at org.glassfish.grizzly.ssl.SSLConnectionContext.wrap(SSLConnectionContext.java:272)
    at org.glassfish.grizzly.ssl.SSLConnectionContext.wrapAll(SSLConnectionContext.java:227)
    at org.glassfish.grizzly.ssl.SSLBaseFilter.wrapAll(SSLBaseFilter.java:404)
    at org.glassfish.grizzly.ssl.SSLBaseFilter.handleWrite(SSLBaseFilter.java:319)
    at org.glassfish.grizzly.ssl.SSLFilter.accurateWrite(SSLFilter.java:255)
    at org.glassfish.grizzly.ssl.SSLFilter.handleWrite(SSLFilter.java:143)
    at com.ning.http.client.providers.grizzly.GrizzlyAsyncHttpProvider$SwitchingSSLFilter.handleWrite(GrizzlyAsyncHttpProvider.java:2503)
    at org.glassfish.grizzly.filterchain.ExecutorResolver$8.execute(ExecutorResolver.java:111)
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:288)
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:206)
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:136)
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:114)
    at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:77)
    at org.glassfish.grizzly.filterchain.FilterChainContext.write(FilterChainContext.java:853)
    at org.glassfish.grizzly.filterchain.FilterChainContext.write(FilterChainContext.java:720)
    at com.ning.http.client.providers.grizzly.FeedableBodyGenerator.flushQueue(FeedableBodyGenerator.java:132)
    at com.ning.http.client.providers.grizzly.FeedableBodyGenerator.feed(FeedableBodyGenerator.java:101)
    at com.ning.http.client.providers.grizzly.MultipartBodyGeneratorFeeder$FeedBodyGeneratorOutputStream.write(MultipartBodyGeneratorFeeder.java:222)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    at com.ning.http.multipart.FilePart.sendData(FilePart.java:179)
    at com.ning.http.multipart.Part.send(Part.java:331)
    at com.ning.http.multipart.Part.sendParts(Part.java:397)
    at com.ning.http.client.providers.grizzly.MultipartBodyGeneratorFeeder.feed(MultipartBodyGeneratorFeeder.java:144)




Any idea?



2013/8/27 Ryan Lubke <ryan.lubke_at_oracle.com>

> Excellent! Looking forward to the pull request!
>
>
> Sébastien Lorber wrote:
>
> Thanks Ryan, it works fine. I'll make a pull request on AHC tomorrow with
> better code, using the same Part classes that already exist.
>
> I created an OutputStream that redirects to the BodyGenerator feeder.
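>
> In case it helps, it is roughly like this (a simplified sketch, assuming the
> provider's feed(Buffer, boolean) signature and Grizzly's Buffers /
> MemoryManager helpers):
>
> import java.io.IOException;
> import java.io.OutputStream;
> import java.util.Arrays;
> import org.glassfish.grizzly.Buffer;
> import org.glassfish.grizzly.memory.Buffers;
> import org.glassfish.grizzly.memory.MemoryManager;
> import com.ning.http.client.providers.grizzly.FeedableBodyGenerator;
>
> // Everything written to this stream is handed to the FeedableBodyGenerator.
> class FeederOutputStream extends OutputStream {
>     private final FeedableBodyGenerator feeder;
>
>     FeederOutputStream(FeedableBodyGenerator feeder) {
>         this.feeder = feeder;
>     }
>
>     @Override
>     public void write(int b) throws IOException {
>         write(new byte[] { (byte) b }, 0, 1);
>     }
>
>     @Override
>     public void write(byte[] b, int off, int len) throws IOException {
>         // Copy first: callers like BufferedOutputStream reuse their array,
>         // while the generator may consume the buffer asynchronously.
>         byte[] copy = Arrays.copyOfRange(b, off, off + len);
>         Buffer buffer = Buffers.wrap(MemoryManager.DEFAULT_MEMORY_MANAGER, copy);
>         feeder.feed(buffer, false); // 'false' = not the last chunk
>     }
> }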
>
> The problem I currently have is that the feeder fills the queue faster
> than the async thread polls it :)
> I need to expose a limit on that queue size or something; I'll work on
> that. It will be better than a thread sleep to slow down the file part
> reading.
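>
> Something like this wrapper is what I have in mind for the limit
> (hypothetical, not actual code; releasing the permits would have to be
> wired to the async write-completion callback):
>
> import java.io.IOException;
> import java.util.concurrent.Semaphore;
> import org.glassfish.grizzly.Buffer;
> import com.ning.http.client.providers.grizzly.FeedableBodyGenerator;
>
> // Caps the number of fed-but-unwritten bytes; the producing thread blocks
> // once the cap is reached instead of growing the queue without bound.
> class ThrottledFeeder {
>     private final FeedableBodyGenerator feeder;
>     private final Semaphore inFlight;
>
>     ThrottledFeeder(FeedableBodyGenerator feeder, int maxPendingBytes) {
>         this.feeder = feeder;
>         this.inFlight = new Semaphore(maxPendingBytes);
>     }
>
>     void feed(Buffer buffer, boolean last)
>             throws IOException, InterruptedException {
>         inFlight.acquire(buffer.remaining()); // block while too much is queued
>         feeder.feed(buffer, last);
>     }
>
>     // To be called once Grizzly reports the corresponding bytes as written.
>     void onWritten(int bytesWritten) {
>         inFlight.release(bytesWritten);
>     }
> }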
>
>
> 2013/8/27 Ryan Lubke <ryan.lubke_at_oracle.com>
>
>> Yes, something like that. I was going to tackle adding something like
>> this today. I'll follow up with something you can test out.
>>
>>
>> Sébastien Lorber wrote:
>>
>> Ok thanks!
>>
>> I think I see what I could do, probably something like this:
>>
>>
>> FeedableBodyGenerator bodyGenerator = new FeedableBodyGenerator();
>> MultipartBodyGeneratorFeeder bodyGeneratorFeeder =
>>         new MultipartBodyGeneratorFeeder(bodyGenerator);
>>
>> Request uploadRequest1 = new RequestBuilder("POST")
>>         .setUrl("url")
>>         .setBody(bodyGenerator)
>>         .build();
>>
>> ListenableFuture<Response> asyncRes = asyncHttpClient
>>         .prepareRequest(uploadRequest1)
>>         .execute(new AsyncCompletionHandlerBase());
>>
>> bodyGeneratorFeeder.append("param1", "value1");
>> bodyGeneratorFeeder.append("param2", "value2");
>> bodyGeneratorFeeder.append("fileToUpload", fileInputStream);
>> bodyGeneratorFeeder.end();
>>
>> Response uploadResponse = asyncRes.get();
>>
>>
>> Does it seem ok to you?
>>
>> I guess it could be interesting to provide that
>> MultipartBodyGeneratorFeeder class to AHC or Grizzly, since some other
>> people may want to achieve the same thing.
>>
>>
>>
>>
>>
>> 2013/8/26 Ryan Lubke <ryan.lubke_at_oracle.com>
>>
>>>
>>>
>>> Sébastien Lorber wrote:
>>>
>>>> Hello,
>>>>
>>>> I would like to know if it's possible to upload a file with AHC /
>>>> Grizzly in streaming mode, I mean without loading the whole file's bytes
>>>> into memory.
>>>>
>>>> The default behavior seems to allocate a byte[] which contains the whole
>>>> file, which means my server can run out of memory if too many users
>>>> upload large files at the same time.
>>>>
>>>>
>>>> I've tried both the Heap and ByteBuffer memory managers, with
>>>> reallocate=true/false, but with no more success.
>>>>
>>>> It seems the whole file content is appended to the BufferOutputStream,
>>>> and then the underlying buffer is written.
>>>>
>>>> At least this seems to be the case with AHC integration:
>>>>
>>>> https://github.com/AsyncHttpClient/async-http-client/blob/6faf1f316e5546110b0779a5a42fd9d03ba6bc15/providers/grizzly/src/main/java/org/asynchttpclient/providers/grizzly/bodyhandler/PartsBodyHandler.java
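>>>>
>>>> If I read it correctly, it boils down to the equivalent of this (my
>>>> simplified reading, not the actual code):
>>>>
>>>> ByteArrayOutputStream out = new ByteArrayOutputStream();
>>>> // every part, including the file part, is written to the in-memory stream
>>>> Part.sendParts(out, parts, boundary);
>>>> // then the whole body ends up as one big byte[] on the heap
>>>> byte[] body = out.toByteArray();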
>>>>
>>>>
>>>> So, is there a way to patch AHC to stream the file, so that I could
>>>> consume only around 20 MB of heap while uploading a 500 MB file?
>>>> Or is this simply impossible with Grizzly?
>>>> I didn't notice anything related to this in the documentation.
>>>>
>>> It's possible with the FeedableBodyGenerator. But if you're tied to
>>> using Multipart uploads, you'd have to convert the multipart data to
>>> Buffers manually and send using the FeedableBodyGenerator.
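>>>
>>> Roughly along these lines (untested sketch; 'in' is your multipart content
>>> as an InputStream, and the 8 KB chunk size is arbitrary):
>>>
>>> FeedableBodyGenerator generator = new FeedableBodyGenerator();
>>> byte[] chunk = new byte[8192];
>>> int read;
>>> while ((read = in.read(chunk)) != -1) {
>>>     // copy each chunk into a Grizzly Buffer and feed it through;
>>>     // 'false' = more data follows
>>>     Buffer buffer = Buffers.wrap(MemoryManager.DEFAULT_MEMORY_MANAGER,
>>>                                  Arrays.copyOf(chunk, read));
>>>     generator.feed(buffer, false);
>>> }
>>> // signal the end of the body with an empty, final buffer
>>> generator.feed(Buffers.EMPTY_BUFFER, true);
>>>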
>>> I'll take a closer look to see if this area can be improved.
>>>
>>>
>>>> Btw, in my case it is a file upload. I receive a file with CXF and have
>>>> to transmit it to a storage server (like S3). CXF doesn't consume memory
>>>> because it streams large file uploads to the file system, and then
>>>> provides an input stream on that file.
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>
>