Thursday, August 28, 2008

web 2.0 image clusters

at http://livemocha.com we use a cluster of lighttpd servers to handle the requests for image, audio/video files. the reason to use lighttpd instead of apache is to do with the superior capability of lighttpd to handle large number of concurrent requests with low latency.

the apache server will use one process to serve a request. while the request is being served, that process will not accept any other requests. any other request needs to be served by a different apache process.

for a low volume server, this is not a problem. there are enough apache processes lieing around to serve the concurrent requests.

it does become a problem with high volume. a typical web page may have 20-30 images, a css file, some javascript files. the browser will first make a request to apache (assuming apache is the web server) for the main web page. then as the browser parses the page, it will issue additional requests to the image files etc found on the page.

if these requests go back to apache, they will each have to be served by one apache process one at a time. the KeepAlive option at apache **may** provide some relief by keeping the TCP connection open at apache so that the browser can keep using that for the images. but this doesn't change the fact that one apache process will always block other requests while it is serving the image.

the image request is mostly I/O bound - meaning most of the time to fulfill the request will be spent waiting for the image file to be fetched from a hard drive, perhaps in some network share. in the case of livemocha, most images were served off a NFS server. So it is quite wasteful to have a process sitting around doing nothing but waiting for I/O to complete.

Having a process waiting around is bad because it takes memory. There is only a finite amount of it in the server, and when there are more processes than that can fit memory the O/S will have to swap a processes into disk to make room for another process. When the O/S context switches to the swapped process, it has to be brought back into memory again, possibly resulting in another process being swapped, so on and so forth. this is a vicious cycle that can and will bring the server to its knees - this phenomena is known as the "slashdot effect".

lighttpd can serve multiple requests from one process. it uses asynchronous I/O available in linux 2.6 kernels (ex: epoll) to multiplex multiple requests in one process. this uses very low memory and can perform well under the "slashdot effect".

nginx is another server that performs very well using similar technologies. we decided on lighttpd mainly because some other key sites were known to use it. from benchmarks, nginx seemed to be superior.

No comments: