users@grizzly.java.net

Re: DirectByteBufferRecord and HeapMemoryManager?

From: Oleksiy Stashok <oleksiy.stashok_at_oracle.com>
Date: Tue, 30 Dec 2014 18:22:03 -0800

Fixed.

Thank you.


On 29.12.14 13:26, Daniel Feist wrote:
> Done.
>
> https://java.net/jira/browse/GRIZZLY-1730
>
> Feel free to edit if the issue is not as clear as it could be.
>
> Dan
>
> On Mon, Dec 29, 2014 at 5:31 PM, Oleksiy Stashok
> <oleksiy.stashok_at_oracle.com> wrote:
>> Can you pls. file the issue, so we don't forget about this?
>>
>> Thank you.
>>
>> WBR,
>> Alexey.
>>
>>
>> On 29.12.14 11:44, Daniel Feist wrote:
>>> Yes, the relevant code lines are:
>>>
>>> Write (send):
>>> org.glassfish.grizzly.nio.transport.TCPNIOUtils#calcWriteBufferSize
>>> line 212
>>> Receive:
>>> org.glassfish.grizzly.nio.transport.TCPNIOUtils#allocateAndReadBuffer
>>> line 227
>>>
>>> In the case where the HTTP body is large (and chunking isn't used),
>>> and SocketChannel.getSendBufferSize() returns a large size (e.g. 12MB
>>> in my case), there is no way to limit the amount of direct memory used
>>> per thread. For the receive buffer size, on the other hand, a maximum
>>> can be defined.
>>>
>>> So in my case I'm limiting the receive buffer to 1MB, but without
>>> modifying the TCP/IP stack of the host OS, a 12MB send buffer will be
>>> used per thread if the packet is 8MB+, and the buffer used will always
>>> be greater than 1MB if the packet is 670KB+. There is no equivalent
>>> way to limit the send buffer to 1MB.
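>>>
>>> For reference, the receive-side limit I'm applying looks roughly like
>>> this (a sketch from memory - double-check the setter name against your
>>> Grizzly version):
>>>
>>>     import org.glassfish.grizzly.nio.transport.TCPNIOTransport;
>>>     import org.glassfish.grizzly.nio.transport.TCPNIOTransportBuilder;
>>>
>>>     TCPNIOTransport transport = TCPNIOTransportBuilder.newInstance().build();
>>>     // Cap the read buffer at 1MB instead of inheriting the (possibly
>>>     // much larger) SO_RCVBUF value reported by the socket.
>>>     transport.setReadBufferSize(1024 * 1024);
>>>     transport.start();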
>>>
>>> Dan
>>>
>>>
>>>
>>>
>>> On Mon, Dec 29, 2014 at 4:29 PM, Oleksiy Stashok
>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>> Hi,
>>>>
>>>> you mean a 12MB buffer will be allocated when you *send* a huge packet,
>>>> right?
>>>>
>>>> WBR,
>>>> Alexey.
>>>>
>>>>
>>>> On 29.12.14 11:18, Daniel Feist wrote:
>>>>> Hi again,
>>>>>
>>>>> Just a small follow up on this one:
>>>>>
>>>>> In the end I am both i) increasing the amount of direct memory and ii)
>>>>> limiting the receive buffer size to 1MB to avoid 12MB being used.
>>>>>
>>>>> One thing I noticed, though, is that while there is a system property
>>>>> to limit the receive buffer size, if I happen to send payloads of 8MB
>>>>> or more, a 12MB direct buffer will always be allocated per thread and
>>>>> there is no way to limit this.
>>>>>
>>>>> This isn't an immediate issue for me because the kernel/selector
>>>>> threads do the sending (worker threads perform the receive), and there
>>>>> are therefore fewer of them... but it's something to be aware of.
>>>>>
>>>>> Dan
>>>>>
>>>>> On Tue, Dec 9, 2014 at 6:21 PM, Daniel Feist <dfeist_at_gmail.com> wrote:
>>>>>>>> A related question, if you have a moment: on my test environment the
>>>>>>>> connection object returns a receiveBufferSize of 12MB, so if I test
>>>>>>>> with high concurrency and I'm using the WorkerThreadIOStrategy with a
>>>>>>>> thread pool of 200 threads, does that mean that up to 2.3GB of
>>>>>>>> off-heap memory will need to be allocated, or am I jumping to
>>>>>>>> conclusions about this relationship?
>>>>>>> Well, it's possible - if all threads read at the same time, you'll
>>>>>>> need ~2GB of memory. That's why it's better to limit the
>>>>>>> receiveBufferSize, either explicitly for each Connection or using the
>>>>>>> system property (I believe you know which one).
>>>>>>> Another possible change is to reduce the number of threads, or to use
>>>>>>> SameThreadIOStrategy if the tasks you run are non-blocking.
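>>>>>>> For example, something like this (a rough sketch - I'm quoting the
>>>>>>> setter names from memory, so please verify them against your version):
>>>>>>>
>>>>>>>     // cap what a single read may allocate, per Connection...
>>>>>>>     connection.setReadBufferSize(1024 * 1024);
>>>>>>>
>>>>>>>     // ...or skip the worker pool entirely when the filter chain
>>>>>>>     // processing is non-blocking
>>>>>>>     transport.setIOStrategy(
>>>>>>>         org.glassfish.grizzly.strategies.SameThreadIOStrategy.getInstance());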
>>>>>> Yes, this is what I assumed, just confirming my assumptions. :-)
>>>>>>
>>>>>> thanks!
>>>>>>
>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> WBR,
>>>>>>> Alexey.
>>>>>>>
>>>>>>>
>>>>>>>> I'll try the PooledMemoryManager for sure, thanks for the tip.
>>>>>>>>
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Dec 8, 2014 at 7:04 PM, Oleksiy Stashok
>>>>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 08.12.14 10:37, Daniel Feist wrote:
>>>>>>>>>> What I'm wondering is why the following exist:
>>>>>>>>>>
>>>>>>>>>> 1) TCPNIOUtils.java Line 230-246.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> (https://github.com/GrizzlyNIO/grizzly-mirror/blob/2.3.x/modules/grizzly/src/main/java/org/glassfish/grizzly/nio/transport/TCPNIOUtils.java#L230)
>>>>>>>>>>
>>>>>>>>>> Because if a non-direct MemoryManager has been chosen, I'm not sure
>>>>>>>>>> why that choice needs to be overridden and a direct buffer used
>>>>>>>>>> anyway as an intermediate step.
>>>>>>>>> Pls. take a look at the JDK code here [1] (line 195)
>>>>>>>>>
>>>>>>>>> If the passed ByteBuffer is not a direct ByteBuffer, the JDK will do
>>>>>>>>> the same "intermediate step": allocate a direct ByteBuffer, use it
>>>>>>>>> for reading, and copy the data to our heap ByteBuffer.
>>>>>>>>>
>>>>>>>>> We could've used that, but in that case we would have to guess the
>>>>>>>>> read size and do something like this:
>>>>>>>>>
>>>>>>>>> 1. memoryManager.allocate(large_chunk);
>>>>>>>>> 2. read into the allocated heap ByteBuffer
>>>>>>>>>    2.1. the JDK allocates a direct ByteBuffer of size large_chunk
>>>>>>>>>    2.2. read data into the direct ByteBuffer
>>>>>>>>>    2.3. copy the direct ByteBuffer data into our heap ByteBuffer
>>>>>>>>> 3. release the unused part of the ByteBuffer back to the MemoryManager (if any)
>>>>>>>>>
>>>>>>>>> Instead of that, we use a large enough direct ByteBuffer and read
>>>>>>>>> data directly into it (the JDK doesn't use an intermediate ByteBuffer
>>>>>>>>> in that case). After we read into the direct ByteBuffer, we know
>>>>>>>>> exactly how many bytes we need to allocate from the MemoryManager.
>>>>>>>>> So we just reshuffled the sequence of steps above and have this:
>>>>>>>>>
>>>>>>>>> 1. allocate a direct ByteBuffer of size large_chunk
>>>>>>>>> 2. read into the allocated direct ByteBuffer (in this case the JDK
>>>>>>>>>    doesn't do the intermediate allocation step)
>>>>>>>>> 3. memoryManager.allocate(read_bytes_count) // we know how many bytes we read
>>>>>>>>> 4. copy the direct ByteBuffer to the allocated heap ByteBuffer
>>>>>>>>>
>>>>>>>>> So, by reshuffling the direct ByteBuffer allocation, we're able to
>>>>>>>>> optimize the read path.
>>>>>>>>>
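>>>>>>>>> In plain JDK terms the pattern is roughly the following (an
>>>>>>>>> illustrative sketch only, not the actual TCPNIOUtils code; it assumes
>>>>>>>>> an already-connected SocketChannel "channel", a successful read, and
>>>>>>>>> uses ByteBuffer.allocate() as a stand-in for memoryManager.allocate()):
>>>>>>>>>
>>>>>>>>>     ByteBuffer direct = ByteBuffer.allocateDirect(LARGE_CHUNK); // cached per thread
>>>>>>>>>     int read = channel.read(direct);   // buffer is direct, so no hidden JDK-side copy
>>>>>>>>>     direct.flip();
>>>>>>>>>     ByteBuffer heap = ByteBuffer.allocate(read); // exact size - no guessing
>>>>>>>>>     heap.put(direct);                            // the single direct -> heap copy
>>>>>>>>>     heap.flip();
>>>>>>>>>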
>>>>>>>>> Makes sense?
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> WBR,
>>>>>>>>> Alexey.
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/sun/nio/ch/IOUtil.java#IOUtil.read%28java.io.FileDescriptor%2Cjava.nio.ByteBuffer%2Clong%2Csun.nio.ch.NativeDispatcher%29
>>>>>>>>>
>>>>>>>>>> 2) DirectByteBufferRecord
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> (https://github.com/GrizzlyNIO/grizzly-mirror/blob/2.3.x/modules/grizzly/src/main/java/org/glassfish/grizzly/nio/DirectByteBufferRecord.java#L54)
>>>>>>>>>>
>>>>>>>>>> This is allocating direct buffers, and also caching them
>>>>>>>>>> per-thread, yet it's not a MemoryManager implementation; it's
>>>>>>>>>> something different. Is this just old/legacy?
>>>>>>>>>>
>>>>>>>>>> Dan
>>>>>>>>>>
>>>>>>>>>> On Mon, Dec 8, 2014 at 6:03 PM, Oleksiy Stashok
>>>>>>>>>> <oleksiy.stashok_at_oracle.com> wrote:
>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 08.12.14 09:32, Daniel Feist wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I see there is a system property I can use to limit the maximum
>>>>>>>>>>>> size of these direct buffers and thus avoid the OutOfMemoryErrors,
>>>>>>>>>>>> but I'm wondering why the MemoryManager is explicitly being
>>>>>>>>>>>> bypassed here rather than simply being used. This also means there
>>>>>>>>>>>> are two allocations and reads per request, not just one. Can
>>>>>>>>>>>> anyone shed some light?
>>>>>>>>>>> Well, if you pass a HeapByteBuffer to a SocketChannel, it'll do the
>>>>>>>>>>> same thing underneath: allocate (or take a pooled) direct
>>>>>>>>>>> ByteBuffer and use it for reading.
>>>>>>>>>>> So we basically do the same in our code and pass a direct
>>>>>>>>>>> ByteBuffer to the SocketChannel, so the SocketChannel itself will
>>>>>>>>>>> not allocate a direct ByteBuffer.
>>>>>>>>>>>
>>>>>>>>>>> This approach gives us one advantage: once we read into the direct
>>>>>>>>>>> ByteBuffer, we know the exact number of bytes we need to allocate
>>>>>>>>>>> from the MemoryManager (no guessing).
>>>>>>>>>>>
>>>>>>>>>>> Hope this helps.
>>>>>>>>>>>
>>>>>>>>>>> WBR,
>>>>>>>>>>> Alexey.
>>>>>>>>>>>
>>>>>>>>>>> PS: Please give PooledMemoryManager a shot; it can work with both
>>>>>>>>>>> direct and heap buffers and it performed well in our tests.
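>>>>>>>>>>>
>>>>>>>>>>> E.g. something like this (a sketch - constructor and setter names
>>>>>>>>>>> from memory, so check them against the 2.3.x javadocs):
>>>>>>>>>>>
>>>>>>>>>>>     import org.glassfish.grizzly.memory.PooledMemoryManager;
>>>>>>>>>>>     import org.glassfish.grizzly.nio.transport.TCPNIOTransport;
>>>>>>>>>>>     import org.glassfish.grizzly.nio.transport.TCPNIOTransportBuilder;
>>>>>>>>>>>
>>>>>>>>>>>     TCPNIOTransportBuilder builder = TCPNIOTransportBuilder.newInstance();
>>>>>>>>>>>     builder.setMemoryManager(new PooledMemoryManager());
>>>>>>>>>>>     TCPNIOTransport transport = builder.build();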
>>>>>>>>>>>
>>>>>>>>>>>