We have a setup that looks like this:
nginx->haproxy->app servers
We are terminating SSL with nginx, which sits in front of everything. During our peak load times we see roughly a 2x performance hit: requests that normally take 400 ms are taking 800 ms, and the slowdown affects clients everywhere, not just one network.
The problem is that I have absolutely no sign of any slowdown in my logs and graphs. New Relic shows all the app servers responding normally with no change in speed. Nginx and haproxy show nothing in their logs about requests slowing down, yet end users are seeing it. Despite nginx reporting `$request_time` of 17 ms for a particular request I tracked through the entire stack, the same URL took 1.5 seconds to curl during peak load last week.
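One way to see where that extra second goes is to break a single request into phases from outside the stack; curl can report per-phase timings with `-w`. A minimal sketch (the URL is a placeholder for the real endpoint):

```shell
# Break one request into phases: DNS, TCP connect, TLS handshake, time to
# first byte, and total. If "connect" or "tls" balloons under load, the delay
# is in front of the app (network, or nginx's accept queue / TLS work); if
# only "ttfb" grows, the wait is further upstream.
# https://example.com/some/endpoint is a placeholder URL.
curl -o /dev/null -s \
  -w 'dns=%{time_namelookup}s connect=%{time_connect}s tls=%{time_appconnect}s ttfb=%{time_starttransfer}s total=%{time_total}s\n' \
  https://example.com/some/endpoint
```

Running this in a loop from an outside host during peak load, and comparing against the same loop from inside the datacenter, would also help separate option 1 from option 2 below.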
So, that leaves me with two options:
1) Network issues - according to the router graphs I have plenty of headroom: only 400 Mbps of the 1 Gbps port is in use, and there are no errors in ifconfig or on the switches and routers. However, SoftLayer manages this gear, so I can't verify it personally.
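Even with the switches and routers out of reach, the host-side NIC counters are checkable: if the network side were dropping or corrupting frames, it would often show up there too. A quick sketch (Linux, reading `/sys`; counters should be sampled during peak and compared, a static non-zero value proves little):

```shell
# Dump error/drop counters for every interface from /sys. rx_errors,
# tx_errors, or *_dropped values that grow during peak load point at the
# network side rather than nginx.
for iface in /sys/class/net/*; do
  name=$(basename "$iface")
  printf '%s rx_errors=%s tx_errors=%s rx_dropped=%s tx_dropped=%s\n' \
    "$name" \
    "$(cat "$iface/statistics/rx_errors")" \
    "$(cat "$iface/statistics/tx_errors")" \
    "$(cat "$iface/statistics/rx_dropped")" \
    "$(cat "$iface/statistics/tx_dropped")"
done
```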
2) nginx is holding up the request and either not logging it or I'm not logging the right thing. Is it possible that requests are being queued because the workers are busier, and so aren't being acted on as quickly? If that is what's happening, what can I log in nginx other than `$request_time`, since that shows no slowdown at all? And if requests really are taking longer than `$request_time` indicates, how do I go about tweaking the config to speed things up?
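On option 2: the variable nginx actually provides is `$request_time` (there is no `$response_time`), and it is measured from the first bytes of the request read from the client to the last bytes sent, so it includes time spent waiting on haproxy. Crucially, it does *not* include time a connection spends in the kernel's accept queue before a worker picks it up, nor the SSL handshake, which is exactly where a 17 ms log entry and a 1.5 s curl can coexist. A sketch of a log format that splits the timing, plus the usual accept-queue knobs (values are illustrative, not tuned recommendations):

```nginx
# Log end-to-end time and upstream (haproxy) time side by side per request;
# a gap between the two points at nginx or the client-facing side.
log_format timing '$remote_addr "$request" $status '
                  'req_time=$request_time upstream=$upstream_response_time';
access_log /var/log/nginx/timing.log timing;

# If connections pile up before a worker accepts them, raise the listen
# backlog (and net.core.somaxconn in sysctl to match), and make sure
# worker_connections isn't the ceiling:
#   worker_processes auto;                   # main context
#   events { worker_connections 8192; }
server {
    listen 443 ssl backlog=4096;
    ...
}
```

If `req_time` and `upstream` both stay low while external curls stay slow, the wait is happening before nginx starts its clock, which strengthens the accept-queue/handshake theory over an upstream one.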