Re: [PATCH 21 of 31] Fix cpu hog with all upstream servers marked "down"

Oded Arbel
August 15, 2011 11:00AM
Regarding the patch mentioned above (also quoted below), I wanted to provide some feedback:

In our setup, we have several reverse proxy servers running Nginx that forward requests to upstream servers. Our configuration looks like this:
upstream trc {
server prod2-f1:10213 max_fails=500 fail_timeout=30s;
server prod2-f2:10213 max_fails=500 fail_timeout=30s;
...
server 127.0.0.1:10213 backup;
ip_hash;
}

We've noticed that every once in a while (about 5-10 times a week) one of the servers gets into a state where an Nginx worker starts eating 100% CPU and timing out on requests. I applied the aforementioned patch to our Nginx installation (release 1.0.0 with the Nginx_Upstream_Hash patch) and deployed it to our production servers. Within a few hours, Nginx workers on all the servers were eating 100% CPU.

Attaching gdb to one of the problematic workers, I got this backtrace:
#0 0x000000000044a650 in ngx_http_upstream_get_round_robin_peer ()
#1 0x00000000004253dc in ngx_event_connect_peer ()
#2 0x0000000000448618 in ngx_http_upstream_connect ()
#3 0x0000000000448e10 in ngx_http_upstream_process_header ()
#4 0x00000000004471fb in ngx_http_upstream_handler ()
#5 0x00000000004247fa in ngx_event_expire_timers ()
#6 0x00000000004246ed in ngx_process_events_and_timers ()
#7 0x000000000042a048 in ngx_worker_process_cycle ()
#8 0x00000000004287e0 in ngx_spawn_process ()
#9 0x000000000042963c in ngx_start_worker_processes ()
#10 0x000000000042a5d5 in ngx_master_process_cycle ()
#11 0x0000000000410adf in main ()

I then tried stepping through the running worker with the gdb command "next", which printed:
Single stepping until exit from function ngx_http_upstream_get_round_robin_peer

It never returned; eventually I got fed up and interrupted it.

Finally, I reverted the patch and restarted the service, and we continue to see this behavior. So my conclusion is that this patch does not solve my specific problem.

--
Oded <oded@geek.co.il>


diff --git a/src/http/ngx_http_upstream_round_robin.c b/src/http/ngx_http_upstream_round_robin.c
--- a/src/http/ngx_http_upstream_round_robin.c
+++ b/src/http/ngx_http_upstream_round_robin.c
@@ -583,7 +583,7 @@ failed:
static ngx_uint_t
ngx_http_upstream_get_peer(ngx_http_upstream_rr_peers_t *peers)
{
- ngx_uint_t i, n;
+ ngx_uint_t i, n, reset = 0;
ngx_http_upstream_rr_peer_t *peer;

peer = &peers->peer[0];
@@ -622,6 +622,10 @@ ngx_http_upstream_get_peer(ngx_http_upst
return n;
}

+ if (reset++) {
+ return 0;
+ }
+
for (i = 0; i < peers->number; i++) {
peer[i].current_weight = peer[i].weight;
}

_______________________________________________
nginx-devel mailing list
nginx-devel@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx-devel