Welcome! Log In Create A New Profile

Nginx upstream delays

Forum List Message List New Topic Print View

Wiebe Cazemier

March 09, 2015 11:30AM

Hello,

I have a question about sporadic long upstream response times I'm seeing on (two of) our Nginx servers. It's kind of hard to show and quantify, but I'll do my best.

One is a Django Gunicorn server. We included the upstream response time in the Nginx access log and wrote a script to analyze them. What we see, is that on the login page of a website (a page that does almost nothing) 95%-99% of 'GET /customer/login/' requests are processed within about 50 ms. The other few percent can take several seconds. Sometimes even 5s. Our Munin graphs show no correlation in disk latency, cpu time, memory use, etc.

I also added an access log to Gunicorn, so that I can see how long Gunicorn takes to process requests that Nginx thinks take long. Gunicorn has 8 workers. It can be seen that there is actually no delay in Gunicorn. For instance, Nginx sees this (the long upstream response time is marked red, 3.042s):

11.22.33.44 - - [06/Mar/2015:10:27:46 +0100] "GET /customer/login/ HTTP/1.1" 200 8310 0.061 0.121 "-" "Echoping/6.0.2"
11.22.33.44 - - [06/Mar/2015:10:27:48 +0100] "GET /customer/login/ HTTP/1.1" 200 8310 0.035 0.092 "-" "Echoping/6.0.2"
11.22.33.44 - - [ 06/Mar/2015:10:27:52 +0100] "GET /customer/login/ HTTP/1.1" 200 8310 3.042 3.098 "-" "Echoping/6.0.2"
11.22.33.44 - - [06/Mar/2015:10:27:53 +0100] "GET /customer/login/ HTTP/1.1" 200 8310 0.051 0.108 "-" "Echoping/6.0.2"
11.22.33.44 - - [06/Mar/2015:10:27:54 +0100] "GET /customer/login/ HTTP/1.1" 200 8310 0.038 0.096 "-" "Echoping/6.0.2"
x.x.x.x - - [06/Mar/2015:10:27:58 +0100] "POST /customer/login/?next=/customer/home/ HTTP/1.1" 302 5 0.123 0.123

But then the corresponding Gunicorn logs shows normal response times (the figure after 'None', in µs) (Corresponding line marked blue):

<blockquote>

11.22.33.44 - - [06/Mar/2015:10:27:41] "GET /customer/login/ HTTP/1.0" 200 None 41686 "-" "Echoping/6.0.2"
11.22.33.44 - - [06/Mar/2015:10:27:42] "GET /customer/login/ HTTP/1.0" 200 None 27629 "-" "Echoping/6.0.2"
11.22.33.44 - - [06/Mar/2015:10:27:43] "GET /customer/login/ HTTP/1.0" 200 None 28143 "-" "Echoping/6.0.2"
11.22.33.44 - - [06/Mar/2015:10:27:44] "GET /customer/login/ HTTP/1.0" 200 None 41846 "-" "Echoping/6.0.2"
11.22.33.44 - - [06/Mar/2015:10:27:45] "GET /customer/login/ HTTP/1.0" 200 None 30192 "-" "Echoping/6.0.2"
11.22.33.44 - - [06/Mar/2015:10:27:46] "GET /customer/login/ HTTP/1.0" 200 None 59382 "-" "Echoping/6.0.2"
11.22.33.44 - - [06/Mar/2015:10:27:48] "GET /customer/login/ HTTP/1.0" 200 None 33308 "-" "Echoping/6.0.2"
11.22.33.44 - - [ 06/Mar/2015:10:27:52 ] "GET /customer/login/ HTTP/1.0" 200 None 39849 "-" "Echoping/6.0.2"
11.22.33.44 - - [06/Mar/2015:10:27:53] "GET /customer/login/ HTTP/1.0" 200 None 48321 "-" "Echoping/6.0.2"
11.22.33.44 - - [06/Mar/2015:10:27:54] "GET /customer/login/ HTTP/1.0" 200 None 36484 "-" "Echoping/6.0.2"
x.x.x.x - - [06/Mar/2015:10:27:58] "POST /customer/login/?next=/customer/home/ HTTP/1.0" 302 None 122295
y.y.y.y - - [06/Mar/2015:10:28:02] "GET /customer/login/?next=/customer/home/ HTTP/1.0" 200 None 97824
y.y.y.y - - [06/Mar/2015:10:28:03] "GET /customer/login/?next=/customer/home/ HTTP/1.0" 200 None 78162
11.22.33.44 - - [06/Mar/2015:10:28:26] "GET /customer/login/ HTTP/1.0" 200 None 38350 "-" "Echoping/6.0.2"
11.22.33.44 - - [06/Mar/2015:10:28:27] "GET /customer/login/ HTTP/1.0" 200 None 31076 "-" "Echoping/6.0.2"
11.22.33.44 - - [06/Mar/2015:10:28:28] "GET /customer/login/ HTTP/1.0" 200 None 28536 "-" "Echoping/6.0.2"
11.22.33.44 - - [06/Mar/2015:10:28:30] "GET /customer/login/ HTTP/1.0" 200 None 30981 "-" "Echoping/6.0.2"
11.22.33.44 - - [06/Mar/2015:10:28:31] "GET /customer/login/ HTTP/1.0" 200 None 29920 "-" "Echoping/6.0.2"

</blockquote>

As I said, there are currently 8 workers. I already increased them from 4. The log above shows that there are enough seconds between each request that 8 workers should be able to handle it. I also created a MySQL slow log, which doesn't show the delays. MySQL is always fast.

Another server we have is Nginx with PHP-FPM (with 150 PHP children in the pool), no database access. On one particular recent log of a few hundred thousand entries, 99% of requests is done in 129ms. But one response even took 3170ms. Its PHP proxy settings are:

<blockquote>

location ~ \.php$ {
fastcgi_split_path_info ^(.+\.php)(/.+)$;
# NOTE: You should have "cgi.fix_pathinfo = 0;" in php.ini

# With php5-fpm:
fastcgi_pass unix:/var/run/php5-fpm.sock;
fastcgi_index index.php;
include fastcgi_params;
}

</blockquote>

It seems something in the communication between Nginx and the service behind it slows down sometimes, but I can't figure out what it might be. Any idea what it might be or how to diagnose it better?

Regards,

Wiebe Cazemier

_______________________________________________
nginx mailing list
nginx@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx

Reply Quote

RSS

Subject	Author	Posted
Nginx upstream delays	Wiebe Cazemier	March 09, 2015 11:30AM
Re: Nginx upstream delays	B.R.	March 09, 2015 12:08PM
Re: Nginx upstream delays	wandenberg	March 09, 2015 08:38PM
Re: Nginx upstream delays	Wiebe Cazemier	March 10, 2015 05:26AM
Re: Nginx upstream delays	B.R.	March 10, 2015 08:36AM
Re: Nginx upstream delays	Wiebe Cazemier	March 10, 2015 10:56AM
Re: Nginx upstream delays	B.R.	March 10, 2015 01:02PM
Re: Nginx upstream delays	Maxim Dounin	March 10, 2015 10:40AM

Sorry, only registered users may post in this forum.

Click here to login

Online Users

Guests: 285

Record Number of Users: 8 on April 13, 2023

Record Number of Guests: 421 on December 02, 2018