Hello,
I've been playing around with writing an nginx module and trying to configure it to run under high load (testing with the curl-loader tool). My benchmark is the ngx_http_static_module; that is, I want my module to handle at least as much load as the static module can without errors. For now I'm also keeping the default number of worker processes (1) and worker connections (1024), but more on that later.
Currently, using curl-loader, I can send requests to (and receive responses from) the ngx_http_static_module at a rate of 5000-7000 requests per second over a period of several minutes, all with 200 OK responses.
With my module, I usually manage to get up to about 1000 requests per second with all 200 OK responses.
Past that threshold, I start to see "Err Connection Time Out" errors cropping up in the curl-loader log. Usually they come in long blocks (maybe 20 or so in a row), and then it goes back to 200 OKs. The error rate is still low (maybe 200 timeouts out of 100,000 connections), but I'm wondering why it isn't zero, as it is with the ngx_http_static_module.
The only real difference I can see between my module and the static module is the time it takes to generate the response. (I've set the test up so that both return the same amount of data, ~5K, but my module does do additional memory allocations for processing.)
I used gettimeofday() to measure, with microsecond resolution, how long it takes to generate a response.
With the static module, I see about 20-50 microseconds on average to generate a response. My module, which has to do more processing, takes 60-260 microseconds on average. The times tend to start on the low side, get larger, and then drop back to the low side, though the pattern isn't exact. Note that in both cases I occasionally see randomly high times (like 15000 microseconds); however, these don't correspond to the number of timeouts I see in curl-loader (indeed, I see them even with the static module, which doesn't time out).
So I tried simply adding a delay with usleep() to the static module, and sure enough, timeout errors started cropping up with the static module too. So it seems the number of timeouts is (roughly) proportional to the time it takes to generate the response.
But I'm still not clear on why nginx is sending timeouts at all. That is, if it takes longer to generate the response, shouldn't it just take longer to send the response? Is there a configurable value somewhere that's causing nginx to send a timeout? What effect do the number of worker processes and worker connections have? I have curl-loader set to have no limit on completion time (which I believe is the default), so I don't think that's what's causing the timeouts, but I'm not sure (there is nothing in nginx's error.log when I get a timeout).
I can indeed increase the number of worker processes/connections to get better throughput with my module, but it takes more dramatic increases than I would expect. E.g. 40 processes and about 4000 connections let me run 1400 connections/second on my module without errors, and that also brought the processing time down to about 60-140 microseconds. But it seems there should be a better way to achieve this throughput without using that many resources.
Any advice would be helpful. One specific thing I'm wondering is whether I'm being too liberal with ngx_palloc/ngx_pcalloc, and whether that might be slowing things down; i.e., would explicitly freeing the memory when I'm done with it help? But any other ideas would be great too.
Thanks, and have a good day!