Re: Upload a large file without oom with Grizzly

From: S�bastien Lorber <lorber.sebastien_at_gmail.com>
Date: Fri, 30 Aug 2013 17:23:16 +0200

By the way, do you have any idea when the 1.7.20 will be released (with
these new improvements?)

We would like to know if we wait for a release or if we install our own
temp release on Nexus :)

2013/8/30 S�bastien Lorber <lorber.sebastien_at_gmail.com>

> Thank you, it works fine!
>
>
> I just had to modify a single line after your commit.
>
>
> com.ning.http.client.providers.grizzly.GrizzlyAsyncHttpProvider#initializeTransport
> ->
> clientTransport.getAsyncQueueIO().getWriter().setMaxPendingBytesPerConnection(10000);
>
>
> If I let the initial value (-1) it won't block, canWrite always returns
> true
>
>
> Btw, on AHC I didn't find any way to pass this value as a config
> attribute, neither the size of the write buffer you talked about.
>
> So in the end, is there a way with current AHC code to use this "canWrite
> = false" behavior?
> If not, can you please provide a way to set this behavior on v1.7.20 ?
> thanks
>
>
> PS: does it make sens to use the same number of bytes un the feed(Buffer)
> method and in the setMaxPendingBytesPerConnection(10000); ? do you have
> any tuning recommandation?
>
>
>
> 2013/8/29 Ryan Lubke <ryan.lubke_at_oracle.com>
>
>> Please disregard.
>>
>>
>> Ryan Lubke wrote:
>>
>> S�bastien,
>>
>> Could you also provide a sample of how you're performing your feed?
>>
>> Thanks,
>> -rl
>>
>> Ryan Lubke wrote:
>>
>> S�bastien,
>>
>> I'd recommend looking at Connection.canWrite() [1] and
>> Connection.notifyCanWrite(WriteListener) [1]
>>
>> By default, Grizzly will configure the async write queue length to be
>> four times the write buffer size (which is based off the socket write
>> buffer).
>> When this queue exceeds this value, canWrite() will return false.
>>
>> When this occurs, you can register a WriteListener to be notified when
>> the queue length is below the configured max and then simulate blocking
>> until the onWritePossible() callback has been invoked.
>>
>> ----------------------------------------------------------------
>>
>> final FutureImpl<Boolean> future = Futures.createSafeFuture();
>>
>> // Connection may be obtained by calling
>> FilterChainContext.getConnection().
>> connection.notifyCanWrite(new WriteHandler() {
>>
>> @Override
>> public void onWritePossible() throws Exception {
>> future.result(Boolean.TRUE);
>> }
>>
>> @Override
>> public void onError(Throwable t) {
>> future.failure(Exceptions.makeIOException(t));
>> }
>> });
>>
>> try {
>> final long writeTimeout = 30;
>> future.get(writeTimeout, TimeUnit.MILLISECONDS);
>> } catch (ExecutionException e) {
>> HttpTransactionContext httpCtx =
>> HttpTransactionContext.get(connection);
>> httpCtx.abort(e.getCause());
>> } catch (Exception e) {
>> HttpTransactionContext httpCtx =
>> HttpTransactionContext.get(connection);
>> httpCtx.abort(e);
>> }
>>
>> ---------------------------------------------------------------------
>>
>> [1]
>> http://grizzly.java.net/docs/2.3/apidocs/org/glassfish/grizzly/OutputSink.html
>> .
>>
>> S�bastien Lorber wrote:
>>
>> Ryan, I've did some other tests.
>>
>>
>> It seems that using a blocking queue in the FeedableBodyGenerator is
>> totally useless because the thread consuming it is not blocking and the
>> queue never blocks the feeding, which was my intention in the beginning.
>> Maybe it depends on the IO strategy used?
>> I use AHC default which seems to use SameThreadIOStrategy so I don't
>> think it's related to the IO strategy.
>>
>>
>> So in the end I can upload a 70m file with a heap of 50m, but I have to
>> put a Thread.sleep(30) between each 100k Buffer send to the
>> FeedableBodyGenerator
>>
>> The connection with the server is not good here, but in production it is
>> normally a lot better as far as I know.
>>
>>
>>
>> I've tried things
>> like clientTransport.getAsyncQueueIO().getWriter().setMaxPendingBytesPerConnection(100000);
>> but it doesn't seem to work for me.
>>
>> I'd like the Grizzly internals to block when there are too much pending
>> bytes to send. Is it possible?
>>
>>
>>
>>
>> PS: I've just been able to send a 500mo file with 100mo heap, but it
>> needed a sleep of 100ms between each 100k Buffer sent to the bodygenerator
>>
>>
>>
>>
>> 2013/8/29 S�bastien Lorber <lorber.sebastien_at_gmail.com>
>>
>>>
>>>
>>> By chance do you if I can remove the MessageCloner used in the SSL
>>> filter?
>>> SSLBaseFilter$OnWriteCopyCloner
>>>
>>> It seems to allocate a lot of memory.
>>> I don't really understand why messages have to be cloned, can I remove
>>> this? How?
>>>
>>>
>>> 2013/8/29 S�bastien Lorber <lorber.sebastien_at_gmail.com>
>>>
>>>>
>>>> I'm trying to send a 500m file for my tests with a heap of 400m.
>>>>
>>>> In our real use cases we would probably have files under 20mo but we
>>>> want to reduce the memory consumption because we can have x parallel
>>>> uploads on the same server according to the user activity.
>>>>
>>>> I'll try to check if using this BodyGenerator reduced the memory
>>>> footprint or if it's almost like before.
>>>>
>>>>
>>>> 2013/8/28 Ryan Lubke <ryan.lubke_at_oracle.com>
>>>>
>>>>> At this point in time, as far as the SSL buffer allocation is
>>>>> concerned, it's untunable.
>>>>>
>>>>> That said, feel free to open a feature request.
>>>>>
>>>>> As to your second question, there is no suggested size. This is all
>>>>> very application specific.
>>>>>
>>>>> I'm curious, how large of a file are you sending?
>>>>>
>>>>>
>>>>>
>>>>> S�bastien Lorber wrote:
>>>>>
>>>>> I have seen a lot of buffers which have a size of 33842 and it seems
>>>>> the limit is near half the capacity.
>>>>>
>>>>> Perhaps there's a way to tune that buffer size so that it consumes
>>>>> less memory?
>>>>> Is there an ideal Buffer size to send to the feed method?
>>>>>
>>>>>
>>>>> 2013/8/28 Ryan Lubke <ryan.lubke_at_oracle.com>
>>>>>
>>>>>> I'll be reviewing the PR today, thanks again!
>>>>>>
>>>>>> Regarding the OOM: as it stands now, for each new buffer that is
>>>>>> passed to the SSLFilter, we allocate a buffer twice the size in order to
>>>>>> accommodate the encrypted result. So there's an increase.
>>>>>>
>>>>>> Depending on the socket configurations of both endpoints, and how
>>>>>> fast the remote is reading data, it could
>>>>>> be the write queue is becoming too large. We do have a way to
>>>>>> detect this situation, but I'm pretty sure
>>>>>> the Grizzly internals are currently shielded here. I will see what I
>>>>>> can do to allow users to leverage this.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> S�bastien Lorber wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I've made my pull request.
>>>>>> https://github.com/AsyncHttpClient/async-http-client/pull/367
>>>>>>
>>>>>> With my usecase it works, the file is uploaded like before.
>>>>>>
>>>>>>
>>>>>>
>>>>>> But I didn't notice a big memory improvement.
>>>>>>
>>>>>> Is it possible that SSL doesn't allow to stream the body or something
>>>>>> like that?
>>>>>>
>>>>>>
>>>>>>
>>>>>> In memory, I have a lot of:
>>>>>> - HeapByteBuffer
>>>>>> Which are hold by SSLUtils$3
>>>>>> Which are hold by BufferBuffers
>>>>>> Which are hold by WriteResult
>>>>>> Which are hold by AsyncWriteQueueRecord
>>>>>>
>>>>>>
>>>>>> Here is an exemple of the OOM stacktrace:
>>>>>>
>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>> at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>>>>>> at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
>>>>>> at
>>>>>> org.glassfish.grizzly.ssl.SSLUtils.allocateOutputBuffer(SSLUtils.java:342)
>>>>>> at
>>>>>> org.glassfish.grizzly.ssl.SSLBaseFilter$2.grow(SSLBaseFilter.java:117)
>>>>>> at
>>>>>> org.glassfish.grizzly.ssl.SSLConnectionContext.ensureBufferSize(SSLConnectionContext.java:392)
>>>>>> at
>>>>>> org.glassfish.grizzly.ssl.SSLConnectionContext.wrap(SSLConnectionContext.java:272)
>>>>>> at
>>>>>> org.glassfish.grizzly.ssl.SSLConnectionContext.wrapAll(SSLConnectionContext.java:227)
>>>>>> at
>>>>>> org.glassfish.grizzly.ssl.SSLBaseFilter.wrapAll(SSLBaseFilter.java:404)
>>>>>> at
>>>>>> org.glassfish.grizzly.ssl.SSLBaseFilter.handleWrite(SSLBaseFilter.java:319)
>>>>>> at
>>>>>> org.glassfish.grizzly.ssl.SSLFilter.accurateWrite(SSLFilter.java:255)
>>>>>> at org.glassfish.grizzly.ssl.SSLFilter.handleWrite(SSLFilter.java:143)
>>>>>> at
>>>>>> com.ning.http.client.providers.grizzly.GrizzlyAsyncHttpProvider$SwitchingSSLFilter.handleWrite(GrizzlyAsyncHttpProvider.java:2503)
>>>>>> at
>>>>>> org.glassfish.grizzly.filterchain.ExecutorResolver$8.execute(ExecutorResolver.java:111)
>>>>>> at
>>>>>> org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:288)
>>>>>> at
>>>>>> org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:206)
>>>>>> at
>>>>>> org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:136)
>>>>>> at
>>>>>> org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:114)
>>>>>> at
>>>>>> org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:77)
>>>>>> at
>>>>>> org.glassfish.grizzly.filterchain.FilterChainContext.write(FilterChainContext.java:853)
>>>>>> at
>>>>>> org.glassfish.grizzly.filterchain.FilterChainContext.write(FilterChainContext.java:720)
>>>>>> at
>>>>>> com.ning.http.client.providers.grizzly.FeedableBodyGenerator.flushQueue(FeedableBodyGenerator.java:132)
>>>>>> at
>>>>>> com.ning.http.client.providers.grizzly.FeedableBodyGenerator.feed(FeedableBodyGenerator.java:101)
>>>>>> at
>>>>>> com.ning.http.client.providers.grizzly.MultipartBodyGeneratorFeeder$FeedBodyGeneratorOutputStream.write(MultipartBodyGeneratorFeeder.java:222)
>>>>>> at
>>>>>> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>>>>>> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
>>>>>> at com.ning.http.multipart.FilePart.sendData(FilePart.java:179)
>>>>>> at com.ning.http.multipart.Part.send(Part.java:331)
>>>>>> at com.ning.http.multipart.Part.sendParts(Part.java:397)
>>>>>> at
>>>>>> com.ning.http.client.providers.grizzly.MultipartBodyGeneratorFeeder.feed(MultipartBodyGeneratorFeeder.java:144)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Any idea?
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2013/8/27 Ryan Lubke <ryan.lubke_at_oracle.com>
>>>>>>
>>>>>>> Excellent! Looking forward to the pull request!
>>>>>>>
>>>>>>>
>>>>>>> S�bastien Lorber wrote:
>>>>>>>
>>>>>>> Ryan thanks, it works fine, I'll make a pull request on AHC tomorrow
>>>>>>> with a better code using the same Part classes that already exist.
>>>>>>>
>>>>>>> I created an OutputStream that redirects to the BodyGenerator feeder.
>>>>>>>
>>>>>>> The problem I currently have is that the feeder feeds the queue
>>>>>>> faster than the async thread polling it :)
>>>>>>> I need to expose a limit to that queue size or something, will work
>>>>>>> on that, it will be better than a thread sleep to slow down the filepart
>>>>>>> reading
>>>>>>>
>>>>>>>
>>>>>>> 2013/8/27 Ryan Lubke <ryan.lubke_at_oracle.com>
>>>>>>>
>>>>>>>> Yes, something like that. I was going to tackle adding something
>>>>>>>> like this today. I'll follow up with something you can test out.
>>>>>>>>
>>>>>>>>
>>>>>>>> S�bastien Lorber wrote:
>>>>>>>>
>>>>>>>> Ok thanks!
>>>>>>>>
>>>>>>>> I think I see what I could do, probably something like that:
>>>>>>>>
>>>>>>>>
>>>>>>>> FeedableBodyGenerator bodyGenerator = new
>>>>>>>> FeedableBodyGenerator();
>>>>>>>> MultipartBodyGeneratorFeeder bodyGeneratorFeeder = new
>>>>>>>> MultipartBodyGeneratorFeeder(bodyGenerator);
>>>>>>>> Request uploadRequest1 = new RequestBuilder("POST")
>>>>>>>> .setUrl("url")
>>>>>>>> .setBody(bodyGenerator)
>>>>>>>> .build();
>>>>>>>>
>>>>>>>> ListenableFuture<Response> asyncRes = asyncHttpClient
>>>>>>>> .prepareRequest(uploadRequest1)
>>>>>>>> .execute(new AsyncCompletionHandlerBase());
>>>>>>>>
>>>>>>>>
>>>>>>>> bodyGeneratorFeeder.append("param1","value1");
>>>>>>>> bodyGeneratorFeeder.append("param2","value2");
>>>>>>>> bodyGeneratorFeeder.append("fileToUpload",fileInputStream);
>>>>>>>> bodyGeneratorFeeder.end();
>>>>>>>>
>>>>>>>> Response uploadResponse = asyncRes.get();
>>>>>>>>
>>>>>>>>
>>>>>>>> Does it seem ok to you?
>>>>>>>>
>>>>>>>> I guess it could be interesting to provide that
>>>>>>>> MultipartBodyGeneratorFeeder class to AHC or Grizzly since some other
>>>>>>>> people may want to achieve the same thing
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2013/8/26 Ryan Lubke <ryan.lubke_at_oracle.com>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> S�bastien Lorber wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I would like to know if it's possible to upload a file with AHC /
>>>>>>>>>> Grizzly in streaming, I mean without loading the whole file bytes in memory.
>>>>>>>>>>
>>>>>>>>>> The default behavior seems to allocate a byte[] which contans the
>>>>>>>>>> whole file, so it means that my server can be OOM if too many users upload
>>>>>>>>>> a large file in the same time.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I've tryied with a Heap and ByteBuffer memory managers, with
>>>>>>>>>> reallocate=true/false but no more success.
>>>>>>>>>>
>>>>>>>>>> It seems the whole file content is appended wto the
>>>>>>>>>> BufferOutputStream, and then the underlying buffer is written.
>>>>>>>>>>
>>>>>>>>>> At least this seems to be the case with AHC integration:
>>>>>>>>>>
>>>>>>>>>> https://github.com/AsyncHttpClient/async-http-client/blob/6faf1f316e5546110b0779a5a42fd9d03ba6bc15/providers/grizzly/src/main/java/org/asynchttpclient/providers/grizzly/bodyhandler/PartsBodyHandler.java
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> So, is there a way to patch AHC to stream the file so that I
>>>>>>>>>> could eventually consume only 20mo of heap while uploading a 500mo file?
>>>>>>>>>> Or is this simply impossible with Grizzly?
>>>>>>>>>> I didn't notice anything related to that in the documentation.
>>>>>>>>>>
>>>>>>>>> It's possible with the FeedableBodyGenerator. But if you're tied
>>>>>>>>> to using Multipart uploads, you'd have to convert the multipart data to
>>>>>>>>> Buffers manually and send using the FeedableBodyGenerator.
>>>>>>>>> I'll take a closer look to see if this area can be improved.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Btw in my case it is a file upload. I receive a file with CXF and
>>>>>>>>>> have to transmit it to a storage server (like S3). CXF doesn't consume
>>>>>>>>>> memory bevause it is streaming the large fle uploads to the file system,
>>>>>>>>>> and then provides an input stream on that file.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>