users@grizzly.java.net

Re: DirectByteBufferRecord and HeapMemoryManager?

From: Daniel Feist <dfeist_at_gmail.com>
Date: Mon, 29 Dec 2014 16:44:50 -0300

Yes, the relevant code lines are:

Write (send): org.glassfish.grizzly.nio.transport.TCPNIOUtils#calcWriteBufferSize
(line 212)
Receive: org.glassfish.grizzly.nio.transport.TCPNIOUtils#allocateAndReadBuffer
(line 227)

In the case where the HTTP body is large (and chunking isn't used),
and SocketChannel.getSendBufferSize() returns a large value (e.g. 12MB
in my case), there is no way to limit the amount of direct memory used
per thread. A maximum can be defined for the receive buffer size, but
not for the send buffer.

So in my case I'm limiting the receive buffer to 1MB, but without
modifications to the TCP/IP stack of the host OS, a 12MB send buffer
will be used per thread whenever a packet is 8MB or larger, and the
send buffer will always exceed 1MB once a packet is over ~670KB. There
is no way to also limit the send buffer to 1MB.
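
For reference, here's roughly how I'm capping the read side - a
minimal sketch assuming the Grizzly 2.3 Transport API (treat it as
illustrative and adjust the setter to your version):

    import org.glassfish.grizzly.nio.transport.TCPNIOTransport;
    import org.glassfish.grizzly.nio.transport.TCPNIOTransportBuilder;

    // Cap the buffer Grizzly allocates for reads at 1MB, regardless of
    // what SocketChannel.getReceiveBufferSize() reports for the socket.
    TCPNIOTransport transport = TCPNIOTransportBuilder.newInstance().build();
    transport.setReadBufferSize(1024 * 1024);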

Dan




On Mon, Dec 29, 2014 at 4:29 PM, Oleksiy Stashok
<oleksiy.stashok_at_oracle.com> wrote:
> Hi,
>
> you mean a 12MB buffer will be allocated when you *send* a huge packet, right?
>
> WBR,
> Alexey.
>
>
> On 29.12.14 11:18, Daniel Feist wrote:
>>
>> Hi again,
>>
>> Just a small follow up on this one:
>>
>> In the end I am both i) increasing the amount of direct memory and ii)
>> limiting the receive buffer size to 1MB to avoid 12MB being used.
>>
>> One thing I noticed, though, is that while there is a system property
>> to limit the receive buffer size, if I happen to send payloads of 8MB
>> or more, a 12MB direct buffer will always be allocated per thread, and
>> there is no way to limit this.
>>
>> This isn't an immediate issue for me because the kernel/selector
>> threads do the sending (worker threads perform the receive) and there
>> are therefore fewer of them... but it's something to be aware of.
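>>
>> (For context, raising the direct memory ceiling is just the standard
>> JVM flag, e.g.:
>>
>>     java -XX:MaxDirectMemorySize=2g ...
>>
>> where 2g is an illustrative value, not a recommendation.)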
>>
>> Dan
>>
>> On Tue, Dec 9, 2014 at 6:21 PM, Daniel Feist <dfeist_at_gmail.com> wrote:
>>>>>
>>>>> A related question, if you have a moment: on my test environment the
>>>>> connection object returns a receiveBufferSize of 12MB, so if I test
>>>>> with high concurrency using the WorkerThreadIOStrategy with a thread
>>>>> pool of 200 threads, does that mean that up to 2.3GB of off-heap
>>>>> memory (12MB x 200 threads) may need to be allocated, or am I jumping
>>>>> to conclusions about this relationship?
>>>>
>>>> well, it's possible - if all threads read at the same time you'll
>>>> need ~2GB of memory. That's why it's better to limit the
>>>> receiveBufferSize, either explicitly for each Connection or using the
>>>> system property (I believe you know which one).
>>>> Another possible change is to reduce the number of threads, or to use
>>>> SameThreadIOStrategy if the tasks you run are not blocking.
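>>>>
>>>> A minimal sketch of both options, assuming the Grizzly 2.3 API
>>>> (connection and transport stand in for your real objects):
>>>>
>>>>     // Option 1: cap how much Grizzly reads per Connection (1MB here)
>>>>     connection.setReadBufferSize(1024 * 1024);
>>>>
>>>>     // Option 2: run the filter chain on the selector thread itself -
>>>>     // only safe if your processing never blocks
>>>>     transport.setIOStrategy(SameThreadIOStrategy.getInstance());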
>>>
>>> Yes, this is what I assumed, just confirming my assumptions. :-)
>>>
>>> thanks!
>>>
>>>
>>>>
>>>> Thanks.
>>>>
>>>> WBR,
>>>> Alexey.
>>>>
>>>>
>>>>> I'll try the PooledMemoryManager for sure, thanks for the tip.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Dec 8, 2014 at 7:04 PM, Oleksiy Stashok
>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 08.12.14 10:37, Daniel Feist wrote:
>>>>>>>
>>>>>>> What I'm wondering is why the following exist:
>>>>>>>
>>>>>>> 1) TCPNIOUtils.java, lines 230-246.
>>>>>>>
>>>>>>> (https://github.com/GrizzlyNIO/grizzly-mirror/blob/2.3.x/modules/grizzly/src/main/java/org/glassfish/grizzly/nio/transport/TCPNIOUtils.java#L230)
>>>>>>>
>>>>>>> Because if a non-direct MemoryManager has been chosen, I'm not sure
>>>>>>> why that choice needs to be overridden and a direct buffer used
>>>>>>> anyway as an intermediate step.
>>>>>>
>>>>>> Pls. take a look at the JDK code here [1] (line 195).
>>>>>>
>>>>>> If the passed ByteBuffer is not a direct ByteBuffer, the JDK will do
>>>>>> the same "intermediate step": allocate a direct ByteBuffer, use it
>>>>>> for reading, and copy the data into our heap ByteBuffer.
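>>>>>>
>>>>>> Paraphrased (not verbatim), that read path looks like this when it
>>>>>> is handed a heap buffer:
>>>>>>
>>>>>>     // sun.nio.ch.IOUtil.read, roughly:
>>>>>>     if (dst instanceof DirectBuffer)
>>>>>>         return readIntoNativeBuffer(fd, dst, position, nd);
>>>>>>     // heap buffer: substitute a temporary direct buffer
>>>>>>     ByteBuffer bb = Util.getTemporaryDirectBuffer(dst.remaining());
>>>>>>     try {
>>>>>>         int n = readIntoNativeBuffer(fd, bb, position, nd);
>>>>>>         bb.flip();
>>>>>>         if (n > 0)
>>>>>>             dst.put(bb);   // the extra copy into the heap buffer
>>>>>>         return n;
>>>>>>     } finally {
>>>>>>         Util.offerFirstTemporaryDirectBuffer(bb);  // recycle it
>>>>>>     }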
>>>>>>
>>>>>> We could've used that, but in that case we'd have to guess the read
>>>>>> size and do something like this:
>>>>>>
>>>>>> 1. memoryManager.allocate(large_chunk)
>>>>>> 2. read into the allocated heap ByteBuffer
>>>>>> 2.1. JDK allocates a direct ByteBuffer of size large_chunk
>>>>>> 2.2. read data into the direct ByteBuffer
>>>>>> 2.3. copy the direct ByteBuffer data into our heap ByteBuffer
>>>>>> 3. release the unused part of the ByteBuffer back to the
>>>>>> MemoryManager (if any)
>>>>>>
>>>>>> Instead of that, we use a large enough direct ByteBuffer and read
>>>>>> data directly into it (the JDK doesn't use an intermediate
>>>>>> ByteBuffer in that case). Once we've read into the direct
>>>>>> ByteBuffer, we know exactly how many bytes we need to allocate from
>>>>>> the MemoryManager. So we just reshuffled the sequence of steps above
>>>>>> and get this:
>>>>>>
>>>>>> 1. allocate a direct ByteBuffer of size large_chunk
>>>>>> 2. read into the allocated direct ByteBuffer (in this case the JDK
>>>>>> doesn't do the intermediate allocation step)
>>>>>> 3. memoryManager.allocate(read_bytes_count) // we know how many
>>>>>> bytes we read
>>>>>> 4. copy the direct ByteBuffer into the allocated heap ByteBuffer
>>>>>>
>>>>>> So, by reshuffling the direct ByteBuffer allocation, we're able to
>>>>>> optimize the read path.
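>>>>>>
>>>>>> In plain-NIO terms, a simplified sketch of that reshuffled sequence
>>>>>> (LARGE_CHUNK and channel are illustrative; error/EOF handling
>>>>>> omitted; the real code uses Grizzly's Buffer and a per-thread
>>>>>> cached direct buffer):
>>>>>>
>>>>>>     // 1. large direct ByteBuffer (cached per thread in practice)
>>>>>>     ByteBuffer direct = ByteBuffer.allocateDirect(LARGE_CHUNK);
>>>>>>
>>>>>>     // 2. read straight into it - no JDK-internal intermediate copy
>>>>>>     int read = channel.read(direct);
>>>>>>     direct.flip();
>>>>>>
>>>>>>     // 3. the size is now exact, so allocate precisely that much heap
>>>>>>     ByteBuffer heap = ByteBuffer.allocate(read);
>>>>>>
>>>>>>     // 4. single copy from direct into heap
>>>>>>     heap.put(direct);
>>>>>>     heap.flip();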
>>>>>>
>>>>>> Makes sense?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> WBR,
>>>>>> Alexey.
>>>>>>
>>>>>> [1]
>>>>>> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/sun/nio/ch/IOUtil.java#IOUtil.read%28java.io.FileDescriptor%2Cjava.nio.ByteBuffer%2Clong%2Csun.nio.ch.NativeDispatcher%29
>>>>>>
>>>>>>> 2) DirectByteBufferRecord
>>>>>>>
>>>>>>> (https://github.com/GrizzlyNIO/grizzly-mirror/blob/2.3.x/modules/grizzly/src/main/java/org/glassfish/grizzly/nio/DirectByteBufferRecord.java#L54)
>>>>>>>
>>>>>>> This allocates direct buffers and also caches them per thread, yet
>>>>>>> it's not a MemoryManager implementation - it's something different.
>>>>>>> Is this just old/legacy?
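>>>>>>>
>>>>>>> (For context, the per-thread caching I mean is essentially this
>>>>>>> pattern - a simplified sketch, not the actual class; the 64KB size
>>>>>>> is made up:
>>>>>>>
>>>>>>>     // each thread keeps and reuses one direct buffer
>>>>>>>     static final ThreadLocal<ByteBuffer> CACHE =
>>>>>>>             ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(64 * 1024));
>>>>>>>
>>>>>>> )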
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>> On Mon, Dec 8, 2014 at 6:03 PM, Oleksiy Stashok
>>>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>>>>
>>>>>>>> Hi Daniel,
>>>>>>>>
>>>>>>>>
>>>>>>>> On 08.12.14 09:32, Daniel Feist wrote:
>>>>>>>>
>>>>>>>>> I see there is a system property I can use to limit the maximum
>>>>>>>>> size of these direct buffers and thus avoid the OutOfMemoryErrors,
>>>>>>>>> but I'm wondering why the MemoryManager is explicitly being
>>>>>>>>> bypassed here rather than simply being used? This also means there
>>>>>>>>> are two allocations (and an extra copy) per request rather than
>>>>>>>>> just one. Can anyone shed some light?
>>>>>>>>
>>>>>>>> Well, if you pass a HeapByteBuffer to a SocketChannel, it'll do
>>>>>>>> the same underneath: allocate (or take a pooled) direct ByteBuffer
>>>>>>>> and use it for reading.
>>>>>>>> So we basically do the same in our code, passing a direct
>>>>>>>> ByteBuffer to the SocketChannel, so the SocketChannel itself will
>>>>>>>> not allocate a direct ByteBuffer.
>>>>>>>>
>>>>>>>> This approach gives us one advantage: once we've read into the
>>>>>>>> direct ByteBuffer, we know the exact number of bytes we need to
>>>>>>>> allocate from the MemoryManager (no guessing).
>>>>>>>>
>>>>>>>> Hope it will help.
>>>>>>>>
>>>>>>>> WBR,
>>>>>>>> Alexey.
>>>>>>>>
>>>>>>>> PS: Pls. give PooledMemoryManager a shot - it can work with both
>>>>>>>> direct and heap buffers, and it performed well in our tests.
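>>>>>>>>
>>>>>>>> A minimal sketch of wiring it in, assuming the Grizzly 2.3 builder
>>>>>>>> API (adjust to your setup):
>>>>>>>>
>>>>>>>>     TCPNIOTransport transport = TCPNIOTransportBuilder.newInstance()
>>>>>>>>             .setMemoryManager(new PooledMemoryManager())
>>>>>>>>             .build();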
>>>>>>>>
>>>>>>>>
>