Welcome! Log In Create A New Profile

Advanced

Strange $upstream_response_time latency spikes with reverse proxy

All files from this thread

File Name File Size   Posted by Date  
107.3 KB open | download jay 03/18/2013 Read message
102 KB open | download jay 03/18/2013 Read message
jay
March 14, 2013 10:08PM
Hi list!

I'm trying to debug an interesting problem where we randomly get a "high
latency" response time from our upstream servers. It appears to occur in
about 1.5% of all requests. Here's a description of the tests I've been
running to isolate the problem within nginx:

I'm using an endpoint on the upstream servers that operates extremely
quickly; a request which only responds back with the server's current local
UNIX timestamp. From the nginx server, I start ApacheBench with 5,000
concurrent connections directly to the upstream server (bypassing nginx).
Here is what a typical run of this test looks like:


Document Path: /time/0
Document Length: 19 bytes

Concurrency Level: 5000
Time taken for tests: 0.756 seconds
Complete requests: 5000
Failed requests: 0
Write errors: 0
Total transferred: 1110000 bytes
HTML transferred: 95000 bytes
Requests per second: 6617.33 [#/sec] (mean)
Time per request: 755.592 [ms] (mean)
Time per request: 0.151 [ms] (mean, across all concurrent requests)
Transfer rate: 1434.62 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 4 63 53.7 35 167
Processing: 22 44 19.1 38 249
Waiting: 17 35 18.8 30 243
Total: 55 107 64.4 73 401

Percentage of the requests served within a certain time (ms)
50% 73
66% 77
75% 84
80% 202
90% 222
95% 231
98% 250
99% 251
100% 401 (longest request)



And here's the same test with the longest response times I've seen:


Document Path: /time/0
Document Length: 19 bytes

Concurrency Level: 5000
Time taken for tests: 0.807 seconds
Complete requests: 5000
Failed requests: 0
Write errors: 0
Total transferred: 1110000 bytes
HTML transferred: 95000 bytes
Requests per second: 6197.08 [#/sec] (mean)
Time per request: 806.831 [ms] (mean)
Time per request: 0.161 [ms] (mean, across all concurrent requests)
Transfer rate: 1343.51 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 3 45 51.8 17 144
Processing: 10 29 24.4 23 623
Waiting: 9 25 24.4 18 623
Total: 26 75 67.4 39 626

Percentage of the requests served within a certain time (ms)
50% 39
66% 42
75% 45
80% 173
90% 190
95% 199
98% 217
99% 218
100% 626 (longest request)



Not bad. Now, keep in mind, this is a SINGLE upstream server handling these
requests over the network. Once I change my test to point ab at the local
nginx, the strange latency issue rears its ugly head. I have 4 upstream
servers in my config.

Here's what the same test through nginx looks like:


Concurrency Level: 5000
Time taken for tests: 1.602 seconds
Complete requests: 5000
Failed requests: 0
Write errors: 0
Total transferred: 1170000 bytes
HTML transferred: 95000 bytes
Requests per second: 3121.08 [#/sec] (mean)
Time per request: 1602.012 [ms] (mean)
Time per request: 0.320 [ms] (mean, across all concurrent requests)
Transfer rate: 713.21 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 109 172 39.4 167 246
Processing: 106 505 143.3 530 1248
Waiting: 103 504 143.5 530 1248
Total: 344 677 108.6 696 1358

Percentage of the requests served within a certain time (ms)
50% 696
66% 723
75% 741
80% 752
90% 768
95% 779
98% 786
99% 788
100% 1358 (longest request)



Ack! It's like nginx decides to drop an extra second on some requests for
no reason.

I've also recorded these test runs with nginx's access log. Here's the log
format, first:


log_format main '$remote_addr - - ' ## User's IP Address
'[$time_local] ' ## DateTime
'"$request" ' ## User's Request
URL
'$status ' ## HTTP Code
'$body_bytes_sent ' ## Bytes BODY ONLY
'"$http_referer" ' ## User's Referer
'"$http_user_agent" ' ## User's Agent
'$request_time ' ## NginX Response
'$upstream_response_time ' ## Upstream Response
'$bytes_sent ' ## Bytes Sent (GZIP)
'$request_length'; ## String Length



The access log has 10,000 lines total (i.e. two of these tests with 5,000
concurrent connections), and when I sort by upstream_response_time, I get a
log with the first 140 lines having about 1s on the upstream_response_time,
and the remaining 9,860 lines show 700ms and less. Here's a snippet showing
the strangeness, starting with line numbers:


1: 127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0"
200 19 "-" "ApacheBench/2.3" 1.027 1.026 234 83
2: 127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0"
200 19 "-" "ApacheBench/2.3" 1.027 1.026 234 83
3: 127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0"
200 19 "-" "ApacheBench/2.3" 1.026 1.025 234 83
....
138: 127.0.0.1 - - [14/Mar/2013:17:57:18 -0700] "GET /time/0 HTTP/1.0"
200 19 "-" "ApacheBench/2.3" 1.000 0.999 234 81
139: 127.0.0.1 - - [14/Mar/2013:17:57:18 -0700] "GET /time/0 HTTP/1.0"
200 19 "-" "ApacheBench/2.3" 0.999 0.999 234 81
140: 127.0.0.1 - - [14/Mar/2013:17:57:18 -0700] "GET /time/0 HTTP/1.0"
200 19 "-" "ApacheBench/2.3" 0.999 0.999 234 81
141: 127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0"
200 19 "-" "ApacheBench/2.3" 0.708 0.568 234 83
142: 127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0"
200 19 "-" "ApacheBench/2.3" 0.708 0.568 234 83
143: 127.0.0.1 - - [14/Mar/2013:17:37:21 -0700] "GET /time/0 HTTP/1.0"
200 19 "-" "ApacheBench/2.3" 0.708 0.568 234 83
....
9998: 127.0.0.1 - - [14/Mar/2013:17:57:16 -0700] "GET /time/0 HTTP/1.0"
200 19 "-" "ApacheBench/2.3" 0.142 0.005 234 81
9999: 127.0.0.1 - - [14/Mar/2013:17:57:16 -0700] "GET /time/0 HTTP/1.0"
200 19 "-" "ApacheBench/2.3" 0.142 0.005 234 81
10000: 127.0.0.1 - - [14/Mar/2013:17:57:16 -0700] "GET /time/0 HTTP/1.0"
200 19 "-" "ApacheBench/2.3" 0.122 0.002 234 81



The upstream_response_time difference between line 140 and 141 is nearly
500ms! The total request_time also displays an interesting gap of almost
300ms. What's going on here?

The kernels have been tuned on all servers for a high number of open files,
and tcp buffers:

$ ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 119715
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 119715
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited



$ cat /proc/sys/net/core/*mem_*
184217728
184217728
184217728
184217728
184217728



Also for reference, here is part of my nginx.conf which may be useful for
diagnosis:


worker_processes 7;
worker_rlimit_nofile 500000;

events {
use epoll;
worker_connections 500000;
multi_accept on;
}

http {
log_format main ...

access_log ...

##
-----------------------------------------------------------------------
## TCP Tuning
##
-----------------------------------------------------------------------
sendfile off;
tcp_nopush off;
tcp_nodelay on;

##
-----------------------------------------------------------------------
## Max Data Size
##
-----------------------------------------------------------------------
client_max_body_size 1k;
client_body_buffer_size 1k;
client_header_buffer_size 32k;
large_client_header_buffers 200 32k;

##
-----------------------------------------------------------------------
## GZIP
##
-----------------------------------------------------------------------
gzip on;
gzip_min_length 1000;
gzip_disable msie6;
gzip_proxied any;
gzip_buffers 100 64k;
gzip_types text/javascript;

##
-----------------------------------------------------------------------
## Proxy Load Distribution
##
-----------------------------------------------------------------------
proxy_redirect off;
proxy_connect_timeout 5;
proxy_send_timeout 5;
proxy_read_timeout 8;
proxy_next_upstream error timeout invalid_header http_500
http_502 http_503 http_504;
proxy_buffering off;

##
-----------------------------------------------------------------------
## Hide 'Server: nginx' Server Header
##
-----------------------------------------------------------------------
server_tokens off;
proxy_pass_header Server;

upstream ...

server ...
}
_______________________________________________
nginx mailing list
nginx@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx
Subject Author Posted

Strange $upstream_response_time latency spikes with reverse proxy

jay March 14, 2013 10:08PM

Re: Strange $upstream_response_time latency spikes with reverse proxy

Maxim Dounin March 15, 2013 04:22AM

Re: Strange $upstream_response_time latency spikes with reverse proxy

jay March 16, 2013 04:38AM

Re: Strange $upstream_response_time latency spikes with reverse proxy

nginxorg March 16, 2013 11:06AM

Re: Strange $upstream_response_time latency spikes with reverse proxy

jay March 17, 2013 05:20AM

Re: Strange $upstream_response_time latency spikes with reverse proxy Attachments

jay March 18, 2013 06:10PM

Re: Strange $upstream_response_time latency spikes with reverse proxy

Andrei Belov March 19, 2013 05:50AM

Re: Strange $upstream_response_time latency spikes with reverse proxy

jay March 19, 2013 06:44PM

Re: Strange $upstream_response_time latency spikes with reverse proxy

jay March 22, 2013 10:14PM

Re: Strange $upstream_response_time latency spikes with reverse proxy

Maxim Dounin March 16, 2013 07:40PM

Re: Strange $upstream_response_time latency spikes with reverse proxy

jay March 17, 2013 05:24AM

Re: Strange $upstream_response_time latency spikes with reverse proxy

Maxim Dounin March 17, 2013 07:44AM

Re: Strange $upstream_response_time latency spikes with reverse proxy

jay March 18, 2013 05:20PM

Re: Strange $upstream_response_time latency spikes with reverse proxy

Maxim Dounin March 19, 2013 10:20AM

Re: Strange $upstream_response_time latency spikes with reverse proxy

jay March 19, 2013 06:18PM



Sorry, only registered users may post in this forum.

Click here to login

Online Users

Guests: 59
Record Number of Users: 2 on January 19, 2018
Record Number of Guests: 156 on January 19, 2018
Powered by nginx      Powered by FreeBSD      PHP Powered      Powered by MariaDB      ipv6 ready