By the way, one thing that is unclear to me is how the pipeline
threads interact with the read selector threads. So, for example, if I do
SelectorThread.setMaxThreads(100) and
SelectorThread.setSelectorReadThreadsCount(10),
1) would that mean that all 10 selector threads dump data into that
one pipeline (with 100 threads)? Is that configurable (I guess so)?
2) For a multi-CPU machine (say, a 4-CPU one), do I really need
multiple selector read threads if all I want to read is HTTP requests
(simple GET requests with some headers)?
Assume the system is heavily loaded (especially on the data-IO side:
the server writes a lot, from a few kilobytes to many megabytes of HTTP
response bytes, to each client) and has to support ~1000 clients concurrently.
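For concreteness, here is roughly the setup being asked about, as a sketch only: setMaxThreads and setSelectorReadThreadsCount are the calls named above, while setPort and setKeepAliveTimeoutInSeconds are assumed from the config dump and keep-alive advice quoted later in this thread.

```java
import com.sun.grizzly.http.SelectorThread;

// Sketch only -- whether the 10 read selectors all feed a single
// 100-thread pipeline is exactly the open question above.
SelectorThread st = new SelectorThread();
st.setPort(9999);                   // port from the config dump below
st.setMaxThreads(100);              // worker (pipeline) threads
st.setSelectorReadThreadsCount(10); // NIO read-selector threads
st.setKeepAliveTimeoutInSeconds(0); // per the keep-alive advice in this thread
```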
Also, a related question: since the data that the server writes to each
client can vary from kilobytes to megabytes, do you think we should
look at a different strategy for handling clients, depending on how
long it would take to serve a client (assuming we have a rough idea of
the serving time)? I am coming at this from the angle that if, say, we
have 100 threads (in the pool) serving 100 clients, and for some reason
we take a long time to process those client requests (or write the
responses), then although we will accept requests from other clients,
we cannot service them, and those unfortunate clients could potentially
be starved for a long time.
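To make the starvation scenario concrete, here is a toy model using a plain java.util.concurrent fixed-size pool (illustrative only, not Grizzly's actual pipeline): once all workers are tied up with slow clients, newly accepted work just sits in the queue until a worker frees up.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class StarvationDemo {
    /**
     * Submits 'tasks' slow jobs to a fixed pool of 'poolSize' threads and
     * returns how many are still queued (i.e. starved) shortly afterwards.
     */
    static int queuedAfterSubmit(int poolSize, int tasks, final long slowMillis)
            throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
        for (int i = 0; i < tasks; i++) {
            pool.execute(new Runnable() {
                public void run() {
                    try {
                        Thread.sleep(slowMillis); // a slow client holds a worker
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }
        Thread.sleep(100); // let the pool pick up 'poolSize' tasks
        int queued = pool.getQueue().size(); // everyone else is waiting
        pool.shutdownNow();
        return queued;
    }
}
```

With 100 pool threads and slow transfers, the arithmetic is the same: the 101st client waits until one of the 100 in-flight responses completes.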
-----Original Message-----
From: Jeanfrancois.Arcand_at_Sun.COM [mailto:Jeanfrancois.Arcand_at_Sun.COM]
Sent: Tuesday, June 12, 2007 11:49 PM
To: users_at_grizzly.dev.java.net
Cc: 'Owen O'Malley'; 'Sameer Paranjpye'; 'Tahir Hashmi'
Subject: Re: Grizzly in Hadoop
Hi Devaraj,
Devaraj Das wrote:
> Hi Jeanfrancois,
> Thanks for filing the bug! I would request you to fix this ASAP since
> it's critical for us.
Sure. We should have something this week if all goes well. Now, if you
can't wait, you can always fall back to Grizzly 1.0. Not ideal, but a
possible workaround :-)
Thanks!
-- Jeanfrancois
> Regards,
> Devaraj.
>
> -----Original Message-----
> From: Jeanfrancois.Arcand_at_Sun.COM [mailto:Jeanfrancois.Arcand_at_Sun.COM]
> Sent: Tuesday, June 12, 2007 10:14 PM
> To: users_at_grizzly.dev.java.net
> Cc: 'Owen O'Malley'; 'Sameer Paranjpye'
> Subject: Re: Grizzly in Hadoop
>
> Hi,
>
> Jeanfrancois Arcand wrote:
>> Hi Devaraj,
>>
>> Devaraj Das wrote:
>>> Hi Jeanfrancois,
>>> I will get back to you in detail a little later. But for now, I
>>> just want to update you that setting the keep-alive timeout to 0
>>> seemed to have improved performance. Thanks for that tip. Now, do
>>> you think that if we start using the persistent-connections feature
>>> (transferring a batch of files at a time), it would help us significantly?
>> Yes, it would, as you will save the network overhead of opening a
>> connection.
>>> Also, will using multiple selectorReadThreads help the performance
>>> (since we have multiple CPUs on our machines)?
>> Right now it is not supported with the http module (I just haven't
>> ported it yet). That's why you aren't seeing any performance
>> difference.
>
> I've filed:
>
> https://grizzly.dev.java.net/issues/show_bug.cgi?id=4
>
> to track the issue.
>
> -- Jeanfrancois
>
>> Talking about your benchmark: it would be interesting to understand
>> how you configured Jetty. Your connections seem to be data-intensive
>> as opposed to connection-intensive. NIO is good at handling large
>> numbers of connections without spending idle threads, but if the
>> threads are not idle anyway (as in this case), the only place to make
>> this faster is to reduce the number of bcopy/memcpy calls. I'm just
>> thinking out loud here :-)
>>
>>
>>> I will send the source code of the adapter shortly...
>>> Thanks,
>> Thanks!
>>
>> -- Jeanfrancois
>>
>>> Devaraj.
>>>
>>> -----Original Message-----
>>> From: Jeanfrancois.Arcand_at_Sun.COM
>>> [mailto:Jeanfrancois.Arcand_at_Sun.COM]
>>> Sent: Tuesday, June 12, 2007 6:25 PM
>>> To: users_at_grizzly.dev.java.net; Scott Oaks
>>> Subject: Re: Grizzly in Hadoop
>>>
>>> Hi,
>>>
>>> Devaraj Das wrote:
>>>> Hi,
>>>>
>>>> We are considering using Grizzly (1.5) in Hadoop (an open source
>>>> framework that has the MapReduce and Distributed File System
>>>> implementations). The main reason for using it is to optimize a
>>>> framework phase called "shuffle". In this phase we move lots of
>>>> data across the network.
>>> Cool :-) I know Hadoop, as our Web 2.0 stack is also considering
>>> using it :-)
>>>
>>>> We are currently using HTTP for moving the data (actually files)
>>>> and we use Jetty5. Now we are thinking of moving to Grizzly (to
>>>> have NIO and all its niceness). But initial experiments with our
>>>> benchmark showed that with Grizzly the performance of the shuffle
>>>> phase is nearly the same as with Jetty5. This is not what we
>>>> initially expected, and hence we would like feedback on where we
>>>> might be going wrong.
>>>> Hadoop is designed to run on large clusters of hundreds of nodes
>>>> (currently it can run stably/reliably on a 1K-node cluster). From
>>>> the Grizzly point of view, what needs to be known is that each node
>>>> has an HTTP server. Both Jetty5 and Grizzly provide the ability to
>>>> have multiple handlers to service the incoming requests.
>>>>
>>>> There are 2 clients on each node, and each client has a
>>>> configurable number of fetcher threads. The fetcher code is written
>>>> using the java.net.URLConnection API.
>>>> Every node has both the server and the clients. The threads
>>>> basically hit the HTTP server asking for specific files. They all
>>>> do this at once (with some randomness in the order of hosts, maybe).
>>>>
>>>> The benchmark that I tested with is a sort of ~5TB of data with a
>>>> cluster of 500 nodes. On the hardware side, I used a cluster that
>>>> has 4 dual-core processors in each machine. The machines are
>>>> partitioned into racks with gigabit ethernet within a rack and
>>>> 100Mbps across racks. There are roughly 78000 independent files,
>>>> each of size ~60KB, spread across these 500 nodes that the client
>>>> pulls (and again we have two such clients per node).
>>>> So you can imagine you have massive all-to-all communication
>>>> happening. The configuration for the server and client is as follows:
>>>> Grizzly configuration for port 9999
>>>> maxThreads: 100
>>>> minThreads: 1
>>>> ByteBuffer size: 8192
>>>> useDirectByteBuffer: false
>>>> useByteBufferView: false
>>>> maxHttpHeaderSize: 8192
>>>> maxKeepAliveRequests: 256
>>>> keepAliveTimeoutInSeconds: 10
>>>> Static File Cache enabled: true
>>>> Stream Algorithm :
>>>> com.sun.grizzly.http.algorithms.NoParsingAlgorithm
>>>> Pipeline : com.sun.grizzly.http.LinkedListPipeline
>>>> Round Robin Selector Algorithm enabled: false
>>>> Round Robin Selector pool size: 0
>>>> recycleTasks: true
>>>> Asynchronous Request Processing enabled: false
>>>>
>>>> I also tried some configs with multiple selectorReadThreads, but it
>>>> didn't make much difference.
>>> keepAliveTimeoutInSeconds: 10 seems a little low to me... what is
>>> the reason for such a low number? Closing idle connections faster?
>>>
>>>
>>>> The client has 30 fetcher threads, and the way it is designed,
>>>> only one fetch from any given host happens at any point in time.
>>>> So if a server host, h1, has 'n' files that we should pull, we do
>>>> that one at a time (as opposed to multiple threads hitting that
>>>> server to fetch multiple files in parallel).
>>>>
>>>> Also, we don't use the HTTP/1.1 features of persistent connections
>>>> or pipelining. We fetch exactly one file and close the connection
>>>> to the server.
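For concreteness, the fetch-one-file-and-close pattern described above might look like the following with java.net.URLConnection (a sketch; the class and method names here are illustrative, not the actual Hadoop fetcher code):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

class OneShotFetcher {
    /** Fetches a single URL, then tears the connection down
     *  (no HTTP/1.1 persistent connection, no pipelining). */
    static byte[] fetchOnce(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(url).openConnection();
        // Opt out of keep-alive: ask the server to close after this response.
        conn.setRequestProperty("Connection", "close");
        InputStream in = conn.getInputStream();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } finally {
            in.close();
            conn.disconnect(); // release the underlying socket
        }
        return out.toByteArray();
    }
}
```

Because every fetch pays a fresh TCP handshake this way, it is exactly where switching to persistent connections (batching several files per connection) would save round trips.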
>>> That answers my previous question :-) I would recommend setting
>>> keepAliveTimeoutInSeconds=0 then, as I'm sure the performance will
>>> improve (no call to the keep-alive subsystem).
>>>
>>>> With the above setup, the performance I see is no different from
>>>> what I see with Jetty5.
>>> Could it be that the benchmark itself is not able to generate more load?
>>>
>>>
>>>> I see a lot of read timeouts on the client side (and we have a
>>>> backoff, for the server host, in the client implementation whenever
>>>> we fail to fetch a file). I also saw some exceptions of this form
>>>> on the server:
>>> You seem to be hitting the epoll problem on Linux. I know there is
>>> a way to avoid using epoll (a property). I will ping the NIO team
>>> and let you know.
>>>
>>> Also, what exactly is your Adapter implementation doing? Can you
>>> share the code? If I can have access to your setup, I would like to
>>> see whether using Grizzly 1.0.15 makes a difference (just to make
>>> sure we don't have a bug in 1.5... as far as I can tell, 1.5 is as
>>> fast as 1.0 on my benchmark).
>>>
>>> Thanks,
>>>
>>> --Jeanfrancois
>>>
>>>
>>>> Jun 11, 2007 5:04:51 PM com.sun.grizzly.Controller doSelect
>>>> SEVERE: doSelect exception
>>>> java.io.IOException: Operation not permitted
>>>>         at sun.nio.ch.EPollArrayWrapper.epollCtl(Native Method)
>>>>         at sun.nio.ch.EPollArrayWrapper.updateRegistrations(EPollArrayWrapper.java:202)
>>>>         at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:183)
>>>>         at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
>>>>         at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>>>         at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>>>         at com.sun.grizzly.TCPSelectorHandler.select(TCPSelectorHandler.java:277)
>>>>         at com.sun.grizzly.Controller.doSelect(Controller.java:218)
>>>>         at com.sun.grizzly.Controller.start(Controller.java:451)
>>>>         at com.sun.grizzly.http.SelectorThread.startListener(SelectorThread.java:1158)
>>>>         at com.sun.grizzly.http.SelectorThread.startEndpoint(SelectorThread.java:1121)
>>>>         at com.sun.grizzly.http.SelectorThread.run(SelectorThread.java:1099)
>>>> Are we missing something in the configuration, or is it something else?
>>>>
>>>> Thanks for the help.
>>>>
>>>> Regards,
>>>> Devaraj.
>>>>
>>>> -------------------------------------------------------------------
>>>> -- To unsubscribe, e-mail: users-unsubscribe_at_grizzly.dev.java.net
>>>> For additional commands, e-mail: users-help_at_grizzly.dev.java.net
>>>>
>>> --------------------------------------------------------------------
>>> - To unsubscribe, e-mail: users-unsubscribe_at_grizzly.dev.java.net
>>> For additional commands, e-mail: users-help_at_grizzly.dev.java.net
>>>
>>>
>>> --------------------------------------------------------------------
>>> - To unsubscribe, e-mail: users-unsubscribe_at_grizzly.dev.java.net
>>> For additional commands, e-mail: users-help_at_grizzly.dev.java.net
>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe_at_grizzly.dev.java.net
>> For additional commands, e-mail: users-help_at_grizzly.dev.java.net
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe_at_grizzly.dev.java.net
> For additional commands, e-mail: users-help_at_grizzly.dev.java.net
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe_at_grizzly.dev.java.net
> For additional commands, e-mail: users-help_at_grizzly.dev.java.net
>
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe_at_grizzly.dev.java.net
For additional commands, e-mail: users-help_at_grizzly.dev.java.net