Re: DirectByteBufferRecord and HeapMemoryManager?

From: Oleksiy Stashok <oleksiy.stashok_at_oracle.com>
Date: Mon, 29 Dec 2014 12:31:52 -0800

Can you pls. file the issue, so we don't forget about this?

Thank you.

WBR,
Alexey.

On 29.12.14 11:44, Daniel Feist wrote:
> Yes, the relevant code lines are:
>
> Write (send): org.glassfish.grizzly.nio.transport.TCPNIOUtils#calcWriteBufferSize
> line 212
> Receive:
> org.glassfish.grizzly.nio.transport.TCPNIOUtils#allocateAndReadBuffer
> line 227
>
> In the case where the http body is large (and chunking isn't used),
> and SocketChannel.getSendBufferSize() returns a large size (e.g. 12MB
> in my case) there is no way to limit the amount of direct memory used
> per thread. On the other hand, a maximum can be defined for the
> receive buffer size.
>
> So in my case, I'm limiting the receive buffer to 1MB, but without
> modifications to the TCP/IP stack of the host OS, a send buffer of
> 12MB will be used per thread if the packet is 8MB+. Also, the buffer
> used will always be greater than 1MB if the packet is 670KB+. There is
> no way to limit the send buffer to 1MB as well.
>
> Dan
>
>
>
>
> On Mon, Dec 29, 2014 at 4:29 PM, Oleksiy Stashok
> <oleksiy.stashok_at_oracle.com> wrote:
>> Hi,
>>
>> you mean 12M buffer will be allocated when you *send* a huge packet, right?
>>
>> WBR,
>> Alexey.
>>
>>
>> On 29.12.14 11:18, Daniel Feist wrote:
>>> Hi again,
>>>
>>> Just a small follow up on this one:
>>>
>>> In the end I am both i) increasing the amount of direct memory and ii)
>>> limiting the receive buffer size to 1MB to avoid 12MB being used.
>>>
>>> One thing I noticed though, is while there is a system property to
>>> limit the receive buffer size, if I happen to send payloads of 8MB or
>>> more a direct buffer of 12MB will always be allocated per thread and
>>> there is no way to limit this.
>>>
>>> This isn't an immediate issue for me because the kernel/selector
>>> threads do the sending (worker threads perform the receive) and there
>>> are therefore fewer of them, but it's something to be aware of.
>>>
>>> Dan
>>>
>>> On Tue, Dec 9, 2014 at 6:21 PM, Daniel Feist <dfeist_at_gmail.com> wrote:
>>>>>> A related question, if you have a moment: On my test environment the
>>>>>> connection object returns a receiveBufferSize of 12MB, so if I test
>>>>>> with high concurrency and I'm using the WorkerThreadIOStrategy with a
>>>>>> thread pool of 200 threads, does that mean that up to 2.3GB of off-heap
>>>>>> memory will need to be allocated, or am I jumping to conclusions about
>>>>>> this relationship?
>>>>> well, it's possible if all threads read at the same time - you'll need
>>>>> 2G memory, that's why it's better to limit the receiveBufferSize either
>>>>> explicitly for each Connection or using system property (I believe you
>>>>> know which one).
>>>>> Another possible change is to reduce the number of threads or use
>>>>> SameThreadIOStrategy, if tasks you run are not blocking.
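
The worst-case figure discussed above is plain multiplication: one direct
buffer per worker thread, each as large as the receive buffer size. A minimal
sketch of that arithmetic follows; the class and method names are made up for
illustration and are not part of Grizzly.

```java
public class DirectMemoryEstimate {

    // Worst case: every thread holds one direct buffer of the given size
    // at the same time. multiplyExact throws on overflow instead of wrapping.
    static long worstCaseBytes(int threads, long perThreadBufferBytes) {
        return Math.multiplyExact((long) threads, perThreadBufferBytes);
    }

    public static void main(String[] args) {
        final long mib = 1024L * 1024L;

        // Figures from the thread: 200 worker threads, 12MB per buffer.
        long uncapped = worstCaseBytes(200, 12 * mib);

        // Same pool with the receive buffer capped at 1MB.
        long capped = worstCaseBytes(200, 1 * mib);

        System.out.println("uncapped: " + (uncapped / mib) + " MiB");
        System.out.println("capped:   " + (capped / mib) + " MiB");
    }
}
```

With 200 threads and 12MB buffers this comes to 2400MB, which is the "2.3GB"
mentioned in the question; capping the buffer at 1MB brings the same pool down
to 200MB.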
>>>> Yes, this is what I assumed, just confirming my assumptions. :-)
>>>>
>>>> thanks!
>>>>
>>>>
>>>>> Thanks.
>>>>>
>>>>> WBR,
>>>>> Alexey.
>>>>>
>>>>>
>>>>>> I'll try the PooledMemoryManager for sure, thanks for the tip.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Dec 8, 2014 at 7:04 PM, Oleksiy Stashok
>>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 08.12.14 10:37, Daniel Feist wrote:
>>>>>>>> What I'm wondering is why the following exist:
>>>>>>>>
>>>>>>>> 1) TCPNIOUtils.java Line 230-246.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> (https://github.com/GrizzlyNIO/grizzly-mirror/blob/2.3.x/modules/grizzly/src/main/java/org/glassfish/grizzly/nio/transport/TCPNIOUtils.java#L230)
>>>>>>>>
>>>>>>>> Because if a non-direct memoryManager has been chosen I'm not sure
>>>>>>>> why that choice needs to be overridden and a direct buffer used
>>>>>>>> anyway as an intermediate step.
>>>>>>> Pls. take a look at the JDK code here [1] (line 195)
>>>>>>>
>>>>>>> if the passed ByteBuffer is not direct ByteBuffer - JDK will do the
>>>>>>> same "intermediate step" - allocate direct ByteBuffer, use it for
>>>>>>> reading, copy data to our heap ByteBuffer.
>>>>>>>
>>>>>>> We could've used that, but in that case we have to guess the read
>>>>>>> size and do something like this:
>>>>>>>
>>>>>>> 1. memoryManager.allocate(large_chunk);
>>>>>>> 2. read to the allocated heap ByteBuffer
>>>>>>> 2.1. JDK allocates direct ByteBuffer of size large_chunk
>>>>>>> 2.2. read data to the direct ByteBuffer
>>>>>>> 2.3 copy direct ByteBuffer data to our heap ByteBuffer
>>>>>>> 3. release unused part of ByteBuffer back to MemoryManager (if any)
>>>>>>>
>>>>>>> Instead of that, we use large enough direct ByteBuffer, read data
>>>>>>> directly to this ByteBuffer (JDK doesn't use intermediate ByteBuffer
>>>>>>> in that case).
>>>>>>> After we read to the direct ByteBuffer we know exactly how many bytes
>>>>>>> we need to allocate from the MemoryManager.
>>>>>>> So we just reshuffled the steps sequence above and have this:
>>>>>>>
>>>>>>> 1. allocate direct ByteBuffer of size large_chunk
>>>>>>> 2. read to the allocated direct ByteBuffer (in this case JDK doesn't
>>>>>>> do intermediate allocation step)
>>>>>>> 3. memoryManager.allocate(read_bytes_count) // we know how many bytes
>>>>>>> we read
>>>>>>> 4. copy direct ByteBuffer to allocated heap ByteBuffer
>>>>>>>
>>>>>>> So, by reshuffling the direct ByteBuffer allocation we're able to
>>>>>>> optimize read path.
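
The reshuffled sequence (steps 1 to 4 above) can be sketched with plain
java.nio. This is only an illustration under assumptions, not Grizzly's actual
code: allocateHeap is a hypothetical stand-in for memoryManager.allocate, and
the scratch direct buffer is passed in by the caller, who could reuse it
across reads.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

public class DirectReadSketch {

    // Hypothetical stand-in for memoryManager.allocate(size) with a heap manager.
    static ByteBuffer allocateHeap(int size) {
        return ByteBuffer.allocate(size);
    }

    // Reads once into the scratch direct buffer, then copies the bytes into
    // a heap buffer sized to exactly the number of bytes read.
    static ByteBuffer readExact(ReadableByteChannel channel, ByteBuffer direct)
            throws IOException {
        direct.clear();                       // step 1: reuse the large direct buffer
        int read = channel.read(direct);      // step 2: read into the direct buffer
        if (read <= 0) {
            return allocateHeap(0);           // nothing read (or end of stream)
        }
        direct.flip();
        ByteBuffer heap = allocateHeap(read); // step 3: exact size is now known
        heap.put(direct);                     // step 4: copy direct -> heap
        heap.flip();
        return heap;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "hello".getBytes(StandardCharsets.US_ASCII);
        ReadableByteChannel channel =
                Channels.newChannel(new ByteArrayInputStream(payload));
        ByteBuffer scratch = ByteBuffer.allocateDirect(64 * 1024);

        ByteBuffer result = readExact(channel, scratch);
        System.out.println("heap buffer capacity: " + result.capacity());
    }
}
```

The heap allocation happens only after the read, so nothing has to be trimmed
and released back afterwards, which is the point of the reordering.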
>>>>>>>
>>>>>>> Makes sense?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> WBR,
>>>>>>> Alexey.
>>>>>>>
>>>>>>> [1]
>>>>>>>
>>>>>>>
>>>>>>> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/sun/nio/ch/IOUtil.java#IOUtil.read%28java.io.FileDescriptor%2Cjava.nio.ByteBuffer%2Clong%2Csun.nio.ch.NativeDispatcher%29
>>>>>>>
>>>>>>>> 2) DirectByteBufferRecord
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> (https://github.com/GrizzlyNIO/grizzly-mirror/blob/2.3.x/modules/grizzly/src/main/java/org/glassfish/grizzly/nio/DirectByteBufferRecord.java#L54)
>>>>>>>>
>>>>>>>> This is allocating direct buffers, and also caching them per-thread,
>>>>>>>> yet it's not a MemoryManager implementation, it's something
>>>>>>>> different.
>>>>>>>> Is this just old/legacy?
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>> On Mon, Dec 8, 2014 at 6:03 PM, Oleksiy Stashok
>>>>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>>>>> Hi Daniel,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 08.12.14 09:32, Daniel Feist wrote:
>>>>>>>>>
>>>>>>>>>> I see there is a system property I can use to limit maximum size of
>>>>>>>>>> these direct buffers and thus avoid the OutOfMemoryExceptions, but
>>>>>>>>>> I'm wondering why the MemoryManager is explicitly being bypassed
>>>>>>>>>> here rather than simply being used? This also means there are two
>>>>>>>>>> allocations and reads per request and not just one. Can anyone
>>>>>>>>>> shed some light?
>>>>>>>>> Well, if you pass HeapByteBuffer to a SocketChannel - it'll do the
>>>>>>>>> same underneath - allocate (or take pooled) direct ByteBuffer and
>>>>>>>>> use it for reading.
>>>>>>>>> So we basically do the same in our code and pass a direct ByteBuffer
>>>>>>>>> to a SocketChannel, so SocketChannel itself will not allocate a
>>>>>>>>> direct ByteBuffer.
>>>>>>>>>
>>>>>>>>> This approach gives us one advantage - once we read to the direct
>>>>>>>>> ByteBuffer - we know the exact amount of bytes we need to allocate
>>>>>>>>> from the MemoryManager (no guessing).
>>>>>>>>>
>>>>>>>>> Hope it will help.
>>>>>>>>>
>>>>>>>>> WBR,
>>>>>>>>> Alexey.
>>>>>>>>>
>>>>>>>>> PS: Pls. give a shot to PooledMemoryManager, it can work with direct
>>>>>>>>> and heap buffers and it performed well on our tests.
>>>>>>>>>
>>>>>>>>>