Hi,
Devaraj Das wrote:
> Hi,
>
> We are considering using Grizzly (1.5) in Hadoop (an open source framework
> that has the MapReduce and Distributed File System implementations). The
> main reason for using it is to optimize a framework phase called "shuffle".
> In this phase we move lots of data across the network.
Cool :-) I know Hadoop, as your Web 2.0 stack is also considering using
it :-)
>
> We are currently using HTTP for moving the data (actually files) and we use
> Jetty5. Now we are thinking of moving to Grizzly (to have NIO and all its
> niceness). But initial experiments with our benchmark showed that with
> Grizzly the performance of the shuffle phase is nearly the same as we have
> with Jetty5. This is not what we initially expected and hence would like to
> get feedback on where we might be going wrong.
>
> Hadoop is designed to run on large clusters of 100s of nodes (currently it
> can run stable/reliably in a 1K node cluster). From the Grizzly point of
> view, what needs to be known is that each node has a HTTP server. Both
> Jetty5 and Grizzly provide the ability to have multiple handlers to service
> the incoming requests.
>
> There are 2 clients on each node, and each client has a configurable number
> of fetcher threads. The fetcher code is written using the
> java.net.URLConnection API.
> Every node has both the server and the clients. The threads basically hit
> the HTTP server asking for specific files. They all do this at once (with
> some randomness in the order of hosts, maybe).
>
> The benchmark that I tested with is a sort for ~5TB of data with a cluster
> of 500 nodes. On the hardware side, I used a cluster that has 4 dualcore
> processors in each machine. The machines are partitioned into racks with a
> gigabit ethernet within the rack and 100Mbps across the racks. There are
> roughly 78000 independent files spread across these 500 nodes each of size
> ~60KB that the client pulls (and again we have two such clients per node).
> So you can imagine you have a massive all-to-all communication happening. The
> configuration for the server and client is as follows:
> Grizzly configuration for port 9999
> maxThreads: 100
> minThreads: 1
> ByteBuffer size: 8192
> useDirectByteBuffer: false
> useByteBufferView: false
> maxHttpHeaderSize: 8192
> maxKeepAliveRequests: 256
> keepAliveTimeoutInSeconds: 10
> Static File Cache enabled: true
> Stream Algorithm :
> com.sun.grizzly.http.algorithms.NoParsingAlgorithm
> Pipeline : com.sun.grizzly.http.LinkedListPipeline
> Round Robin Selector Algorithm enabled: false
> Round Robin Selector pool size: 0
> recycleTasks: true
> Asynchronous Request Processing enabled: false
> I also tried some configs with multiple selectorReadThreads, but it didn't
> make much difference.
keepAliveTimeoutInSeconds: 10 seems a little low to me... what is the
reason for such a low number? To close idle connections faster?
>
> The client has 30 fetcher threads and the way it is designed is that only
> one fetch from any given host would happen at any point of time. So if a
> server host, h1, has 'n' files that we should pull, we do that one at a time
> (as opposed to multiple threads hitting that server to fetch multiple files
> in parallel).
>
> Also, we don't use the features of HTTP1.1 persistent connections or
> pipelining. We fetch exactly one file and close the connection to the
> server.
That answers my previous question :-) I would recommend setting
keepAliveTimeoutInSeconds=0 then, as I'm sure the performance will
improve (no calls to the keep-alive subsystem).
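For clarity, here is a minimal sketch of the fetch-one-file-and-close pattern you describe, using the JDK's built-in HttpServer as a stand-in for the Grizzly endpoint (the path, payload, and class name are hypothetical):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

public class FetchAndClose {

    /** Fetch a single file over HTTP and close the connection (no keep-alive). */
    static String fetchOnce() throws Exception {
        // Stand-in server for the Grizzly endpoint; payload is hypothetical.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/file", exchange -> {
            byte[] body = "map-output-data".getBytes("US-ASCII");
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            URL url = new URL("http://127.0.0.1:" + port + "/file");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestProperty("Connection", "close"); // opt out of persistence
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (InputStream in = conn.getInputStream()) {
                byte[] buf = new byte[8192];
                for (int n; (n = in.read(buf)) != -1; ) {
                    out.write(buf, 0, n);
                }
            }
            conn.disconnect(); // drop the connection instead of pooling it
            return out.toString("US-ASCII");
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchOnce());
    }
}
```

With this access pattern the server's keep-alive subsystem never pays off, which is why a timeout of 0 should be the cheaper setting.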
>
> With the above setup, the performance I see is not different from what I see
> with Jetty5.
Could it be the benchmark itself that is not able to generate more load?
> I see a lot of Read timeouts on the client side (and we have a
> backoff (for the server host) on the client implementation whenever we fail
> to fetch a file). I also saw some exceptions of the form on the server:
You seem to be hitting the epoll problem on Linux. I know there is a way
to avoid using epoll (a property). I will ping the NIO team and let you know.
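From memory, the property is the JDK's selector-provider override; treat the exact provider class as an assumption until I confirm it with the NIO team:

```shell
# Force the JVM to use the poll(2)-based selector instead of epoll.
# Assumption: sun.nio.ch.PollSelectorProvider is present in your JDK build.
# The jar name below is a hypothetical launch command for your server.
java -Djava.nio.channels.spi.SelectorProvider=sun.nio.ch.PollSelectorProvider \
     -jar hadoop-shuffle-server.jar
```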
Also, what exactly is your Adapter implementation doing? Can you share
the code? If I can have access to your setup, I would like to see if
using Grizzly 1.0.15 makes a difference (just to make sure we don't have
a bug in 1.5... as far as I can tell, 1.5 is as fast as 1.0 on my
benchmark).
Thanks,
--Jeanfrancois
>
> Jun 11, 2007 5:04:51 PM com.sun.grizzly.Controller doSelect
> SEVERE: doSelect exception
> java.io.IOException: Operation not permitted
> at sun.nio.ch.EPollArrayWrapper.epollCtl(Native Method)
> at
> sun.nio.ch.EPollArrayWrapper.updateRegistrations(EPollArrayWrapper.java:202)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:183)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> at
> com.sun.grizzly.TCPSelectorHandler.select(TCPSelectorHandler.java:277)
> at com.sun.grizzly.Controller.doSelect(Controller.java:218)
> at com.sun.grizzly.Controller.start(Controller.java:451)
> at
> com.sun.grizzly.http.SelectorThread.startListener(SelectorThread.java:1158)
> at
> com.sun.grizzly.http.SelectorThread.startEndpoint(SelectorThread.java:1121)
> at com.sun.grizzly.http.SelectorThread.run(SelectorThread.java:1099)
>
> Are we missing something in the configuration, or is something else going on?
>
> Thanks for the help.
>
> Regards,
> Devaraj.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe_at_grizzly.dev.java.net
> For additional commands, e-mail: users-help_at_grizzly.dev.java.net
>