users@grizzly.java.net

Re: DirectByteBufferRecord and HeapMemoryManager?

From: Daniel Feist <dfeist_at_gmail.com>
Date: Mon, 29 Dec 2014 18:26:59 -0300

Done.

https://java.net/jira/browse/GRIZZLY-1730

Feel free to edit if the issue is not as clear as it could be.

Dan

On Mon, Dec 29, 2014 at 5:31 PM, Oleksiy Stashok
<oleksiy.stashok_at_oracle.com> wrote:
> Can you pls. file the issue, so we don't forget about this?
>
> Thank you.
>
> WBR,
> Alexey.
>
>
> On 29.12.14 11:44, Daniel Feist wrote:
>>
>> Yes, the relevant code lines are:
>>
>> Write (send):
>> org.glassfish.grizzly.nio.transport.TCPNIOUtils#calcWriteBufferSize
>> line 212
>> Receive:
>> org.glassfish.grizzly.nio.transport.TCPNIOUtils#allocateAndReadBuffer
>> line 227
>>
>> In the case where the HTTP body is large (and chunking isn't used),
>> and SocketChannel.getSendBufferSize() returns a large size (e.g. 12MB
>> in my case), there is no way to limit the amount of direct memory used
>> per thread. On the other hand, a maximum can be defined for the
>> receive buffer size.
>>
>> So in my case, I'm limiting the receive buffer to 1MB, but without
>> modifications to the TCP/IP stack of the host OS, a send buffer of
>> 12MB will be used per thread if a packet is 8MB+. Also, the buffer used
>> will always be greater than 1MB if a packet is 670KB+. There is no way
>> to limit the send buffer to 1MB as well.
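>>
>> For reference, the receive-side cap can also be set programmatically on
>> the transport, something roughly like this (a minimal sketch, not tested;
>> I'm assuming the setReadBufferSize() setter here):
>>
>>     import java.io.IOException;
>>     import org.glassfish.grizzly.nio.transport.TCPNIOTransport;
>>     import org.glassfish.grizzly.nio.transport.TCPNIOTransportBuilder;
>>
>>     public class ReadBufferCapExample {
>>         public static void main(String[] args) throws IOException {
>>             TCPNIOTransport transport = TCPNIOTransportBuilder.newInstance().build();
>>             // Cap the buffer used for each read to 1MB, regardless of what
>>             // SocketChannel.getReceiveBufferSize() reports (e.g. 12MB here).
>>             transport.setReadBufferSize(1024 * 1024);
>>             // Nothing equivalent caps the send path, which is the point above.
>>             transport.start();
>>             transport.shutdownNow();
>>         }
>>     }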
>>
>> Dan
>>
>>
>>
>>
>> On Mon, Dec 29, 2014 at 4:29 PM, Oleksiy Stashok
>> <oleksiy.stashok_at_oracle.com> wrote:
>>>
>>> Hi,
>>>
>>> you mean a 12MB buffer will be allocated when you *send* a huge packet,
>>> right?
>>>
>>> WBR,
>>> Alexey.
>>>
>>>
>>> On 29.12.14 11:18, Daniel Feist wrote:
>>>>
>>>> Hi again,
>>>>
>>>> Just a small follow up on this one:
>>>>
>>>> In the end I am both i) increasing the amount of direct memory and ii)
>>>> limiting the receive buffer size to 1MB to avoid 12MB being used.
>>>>
>>>> One thing I noticed, though, is that while there is a system property to
>>>> limit the receive buffer size, if I happen to send payloads of 8MB or
>>>> more, a direct buffer of 12MB will always be allocated per thread and
>>>> there is no way to limit this.
>>>>
>>>> This isn't an immediate issue for me because the kernel/selector
>>>> threads do the sending (worker threads perform the receive) and there
>>>> are therefore fewer of them... but it's something to be aware of.
>>>>
>>>> Dan
>>>>
>>>> On Tue, Dec 9, 2014 at 6:21 PM, Daniel Feist <dfeist_at_gmail.com> wrote:
>>>>>>>
>>>>>>> A related question, if you have a moment: on my test environment the
>>>>>>> connection object returns a receiveBufferSize of 12MB, so if I test
>>>>>>> with high concurrency and I'm using the WorkerThreadIOStrategy with a
>>>>>>> thread pool of 200 threads, does that mean that up to 2.3GB of off-heap
>>>>>>> memory will need to be allocated, or am I jumping to conclusions about
>>>>>>> this relationship?
>>>>>>
>>>>>> Well, it's possible: if all threads read at the same time you'll need
>>>>>> ~2GB of memory. That's why it's better to limit the receiveBufferSize,
>>>>>> either explicitly for each Connection or using the system property (I
>>>>>> believe you know which one).
>>>>>> Another possible change is to reduce the number of threads, or to use
>>>>>> SameThreadIOStrategy if the tasks you run are not blocking.
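>>>>>>
>>>>>> Roughly, something like this (just a sketch; I'm assuming the usual
>>>>>> builder and strategy names here):
>>>>>>
>>>>>>     import java.io.IOException;
>>>>>>     import org.glassfish.grizzly.nio.transport.TCPNIOTransport;
>>>>>>     import org.glassfish.grizzly.nio.transport.TCPNIOTransportBuilder;
>>>>>>     import org.glassfish.grizzly.strategies.SameThreadIOStrategy;
>>>>>>
>>>>>>     public class IOStrategyExample {
>>>>>>         public static void main(String[] args) throws IOException {
>>>>>>             // 200 worker threads x 12MB receive buffer ~= 2.3GB of direct
>>>>>>             // memory in the worst case, so cap the buffer and/or drop the
>>>>>>             // worker pool for non-blocking tasks.
>>>>>>             TCPNIOTransport transport = TCPNIOTransportBuilder.newInstance()
>>>>>>                     .setIOStrategy(SameThreadIOStrategy.getInstance())
>>>>>>                     .build();
>>>>>>             transport.setReadBufferSize(1024 * 1024); // 1MB per read
>>>>>>             transport.start();
>>>>>>             transport.shutdownNow();
>>>>>>         }
>>>>>>     }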
>>>>>
>>>>> Yes, this is what I assumed, just confirming my assumptions. :-)
>>>>>
>>>>> thanks!
>>>>>
>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> WBR,
>>>>>> Alexey.
>>>>>>
>>>>>>
>>>>>>> I'll try the PooledMemoryManager for sure, thanks for the tip.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Dec 8, 2014 at 7:04 PM, Oleksiy Stashok
>>>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 08.12.14 10:37, Daniel Feist wrote:
>>>>>>>>>
>>>>>>>>> What I'm wondering is why the following exist:
>>>>>>>>>
>>>>>>>>> 1) TCPNIOUtils.java Line 230-246.
>>>>>>>>>
>>>>>>>>> (https://github.com/GrizzlyNIO/grizzly-mirror/blob/2.3.x/modules/grizzly/src/main/java/org/glassfish/grizzly/nio/transport/TCPNIOUtils.java#L230)
>>>>>>>>>
>>>>>>>>> Because if a non-direct memoryManager has been chosen, I'm not sure
>>>>>>>>> why that choice needs to be overridden and a direct buffer used
>>>>>>>>> anyway as an intermediate step.
>>>>>>>>
>>>>>>>> Pls. take a look at the JDK code here [1] (line 195)
>>>>>>>>
>>>>>>>> If the passed ByteBuffer is not a direct ByteBuffer, the JDK will do
>>>>>>>> the same "intermediate step": allocate a direct ByteBuffer, use it
>>>>>>>> for reading, and copy the data to our heap ByteBuffer.
>>>>>>>>
>>>>>>>> We could've used that, but in that case we would have to guess the
>>>>>>>> read size and do something like this:
>>>>>>>>
>>>>>>>> 1. memoryManager.allocate(large_chunk);
>>>>>>>> 2. read to the allocated heap ByteBuffer
>>>>>>>> 2.1. the JDK allocates a direct ByteBuffer of size large_chunk
>>>>>>>> 2.2. read data to the direct ByteBuffer
>>>>>>>> 2.3. copy the direct ByteBuffer data to our heap ByteBuffer
>>>>>>>> 3. release the unused part of the ByteBuffer back to the
>>>>>>>>    MemoryManager (if any)
>>>>>>>>
>>>>>>>> Instead of that, we use a large enough direct ByteBuffer and read data
>>>>>>>> directly into it (the JDK doesn't use an intermediate ByteBuffer in
>>>>>>>> that case). After we read into the direct ByteBuffer we know exactly
>>>>>>>> how many bytes we need to allocate from the MemoryManager. So we just
>>>>>>>> reshuffled the step sequence above and have this:
>>>>>>>>
>>>>>>>> 1. allocate a direct ByteBuffer of size large_chunk
>>>>>>>> 2. read to the allocated direct ByteBuffer (in this case the JDK
>>>>>>>>    doesn't do the intermediate allocation step)
>>>>>>>> 3. memoryManager.allocate(read_bytes_count) // we know how many
>>>>>>>>    bytes we read
>>>>>>>> 4. copy the direct ByteBuffer to the allocated heap ByteBuffer
>>>>>>>>
>>>>>>>> So, by reshuffling the direct ByteBuffer allocation we're able to
>>>>>>>> optimize the read path.
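>>>>>>>>
>>>>>>>> In plain NIO terms the reshuffled sequence looks roughly like this
>>>>>>>> (just an illustration - the real code reuses the cached
>>>>>>>> DirectByteBufferRecord buffer and allocates the destination from the
>>>>>>>> MemoryManager rather than with ByteBuffer.allocate()):
>>>>>>>>
>>>>>>>>     import java.io.IOException;
>>>>>>>>     import java.nio.ByteBuffer;
>>>>>>>>     import java.nio.channels.SocketChannel;
>>>>>>>>
>>>>>>>>     final class DirectReadSketch {
>>>>>>>>         static ByteBuffer read(SocketChannel channel, int largeChunk)
>>>>>>>>                 throws IOException {
>>>>>>>>             // 1. allocate a direct ByteBuffer of size large_chunk
>>>>>>>>             ByteBuffer direct = ByteBuffer.allocateDirect(largeChunk);
>>>>>>>>             // 2. read straight into it - no intermediate JDK copy here
>>>>>>>>             int readBytes = channel.read(direct);
>>>>>>>>             direct.flip();
>>>>>>>>             // 3. the exact size is now known, so allocate only that much
>>>>>>>>             ByteBuffer heap = ByteBuffer.allocate(Math.max(readBytes, 0));
>>>>>>>>             // 4. copy the read bytes into the buffer handed to the app
>>>>>>>>             heap.put(direct);
>>>>>>>>             heap.flip();
>>>>>>>>             return heap;
>>>>>>>>         }
>>>>>>>>     }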
>>>>>>>>
>>>>>>>> Makes sense?
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> WBR,
>>>>>>>> Alexey.
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/sun/nio/ch/IOUtil.java#IOUtil.read%28java.io.FileDescriptor%2Cjava.nio.ByteBuffer%2Clong%2Csun.nio.ch.NativeDispatcher%29
>>>>>>>>
>>>>>>>>> 2) DirectByteBufferRecord
>>>>>>>>>
>>>>>>>>> (https://github.com/GrizzlyNIO/grizzly-mirror/blob/2.3.x/modules/grizzly/src/main/java/org/glassfish/grizzly/nio/DirectByteBufferRecord.java#L54)
>>>>>>>>>
>>>>>>>>> This is allocating direct buffers, and also caching them per-thread,
>>>>>>>>> yet it's not a MemoryManager implementation; it's something
>>>>>>>>> different. Is this just old/legacy?
>>>>>>>>>
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>> On Mon, Dec 8, 2014 at 6:03 PM, Oleksiy Stashok
>>>>>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Daniel,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 08.12.14 09:32, Daniel Feist wrote:
>>>>>>>>>>
>>>>>>>>>>> I see there is a system property I can use to limit the maximum
>>>>>>>>>>> size of these direct buffers and thus avoid the OutOfMemoryErrors,
>>>>>>>>>>> but I'm wondering why the MemoryManager is explicitly being
>>>>>>>>>>> bypassed here rather than simply being used. This also means there
>>>>>>>>>>> are two allocations and reads per request and not just one. Can
>>>>>>>>>>> anyone shed some light?
>>>>>>>>>>
>>>>>>>>>> Well, if you pass a heap ByteBuffer to a SocketChannel, it'll do
>>>>>>>>>> the same underneath: allocate (or take a pooled) direct ByteBuffer
>>>>>>>>>> and use it for reading.
>>>>>>>>>> So we basically do the same in our code and pass a direct
>>>>>>>>>> ByteBuffer to the SocketChannel, so the SocketChannel itself will
>>>>>>>>>> not allocate a direct ByteBuffer.
>>>>>>>>>>
>>>>>>>>>> This approach gives us one advantage: once we read into the direct
>>>>>>>>>> ByteBuffer, we know the exact number of bytes we need to allocate
>>>>>>>>>> from the MemoryManager (no guessing).
>>>>>>>>>>
>>>>>>>>>> Hope it will help.
>>>>>>>>>>
>>>>>>>>>> WBR,
>>>>>>>>>> Alexey.
>>>>>>>>>>
>>>>>>>>>> PS: Pls. give PooledMemoryManager a shot; it can work with direct
>>>>>>>>>> and heap buffers and it performed well in our tests.
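>>>>>>>>>>
>>>>>>>>>> Wiring it in is just a builder call, roughly (a sketch; I'm assuming
>>>>>>>>>> the no-arg constructor and the setMemoryManager() setter here):
>>>>>>>>>>
>>>>>>>>>>     import java.io.IOException;
>>>>>>>>>>     import org.glassfish.grizzly.memory.PooledMemoryManager;
>>>>>>>>>>     import org.glassfish.grizzly.nio.transport.TCPNIOTransport;
>>>>>>>>>>     import org.glassfish.grizzly.nio.transport.TCPNIOTransportBuilder;
>>>>>>>>>>
>>>>>>>>>>     public class PooledMemoryManagerExample {
>>>>>>>>>>         public static void main(String[] args) throws IOException {
>>>>>>>>>>             // Plug the pooled manager in when building the transport.
>>>>>>>>>>             TCPNIOTransport transport = TCPNIOTransportBuilder.newInstance()
>>>>>>>>>>                     .setMemoryManager(new PooledMemoryManager())
>>>>>>>>>>                     .build();
>>>>>>>>>>             transport.start();
>>>>>>>>>>             transport.shutdownNow();
>>>>>>>>>>         }
>>>>>>>>>>     }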
>>>>>>>>>>
>>>>>>>>>>
>