Re: [PATCH 21 of 31] Fix cpu hog with all upstream servers marked "down"

Oded Arbel
August 15, 2011 11:10AM
----- Original Message -----
> Regarding the above mentioned patch (also quoted below), I wanted to
> provide feedback on this:
>
> On my system, we have several reverse proxy servers running Nginx and
> forwarding requests to upstream. Our configuration looks like this:
> upstream trc {
>     server prod2-f1:10213 max_fails=500 fail_timeout=30s;
>     server prod2-f2:10213 max_fails=500 fail_timeout=30s;
>     ...
>     server 127.0.0.1:10213 backup;
>     ip_hash;
> }
>
> We've noticed that every once in a while (about 5-10 times a week)
> one of the servers gets into a state where an Nginx worker starts
> eating 100% CPU and timing out on requests. I've applied the
> aforementioned patch to our Nginx installation (release 1.0.0 with
> the Nginx_Upstream_Hash patch) and deployed to our production
> servers. After a few hours, we started having the Nginx workers on
> all the servers eat 100% CPU.
>
> Connecting with gdb to one of the problematic workers, I got this
> backtrace:
> #0 0x000000000044a650 in ngx_http_upstream_get_round_robin_peer ()
> #1 0x00000000004253dc in ngx_event_connect_peer ()
> #2 0x0000000000448618 in ngx_http_upstream_connect ()
> #3 0x0000000000448e10 in ngx_http_upstream_process_header ()
> #4 0x00000000004471fb in ngx_http_upstream_handler ()
> #5 0x00000000004247fa in ngx_event_expire_timers ()
> #6 0x00000000004246ed in ngx_process_events_and_timers ()
> #7 0x000000000042a048 in ngx_worker_process_cycle ()
> #8 0x00000000004287e0 in ngx_spawn_process ()
> #9 0x000000000042963c in ngx_start_worker_processes ()
> #10 0x000000000042a5d5 in ngx_master_process_cycle ()
> #11 0x0000000000410adf in main ()
>
> I then tried tracing through the running worker using the GDB command
> "next", which said:
> Single stepping until exit from function
> ngx_http_upstream_get_round_robin_peer
>
> And never returned until I got fed up and broke it.
>
> I finally reverted the patch and restarted the service, and continue
> to get this behavior. So my conclusion is that for my specific
> problem, this patch does not solve it.
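
A worker stuck inside ngx_http_upstream_get_round_robin_peer is consistent with a selection loop that never finds an eligible peer. The following is a toy model of that failure mode, not nginx's actual code: the Peer class, the pick_peer function, the max_rounds guard, and the idea of treating a "down" peer as weight 0 are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    weight: int              # assumption: a "down" peer is modelled as weight 0
    current_weight: int = 0  # credit left in the current round


def pick_peer(peers, max_rounds=None):
    """Return the next peer with credit, or None once max_rounds refills fail.

    With max_rounds=None the loop has no exit when every weight is 0,
    which is the kind of spin that would pin a worker at 100% CPU.
    """
    rounds = 0
    while True:
        for peer in peers:
            if peer.current_weight > 0:
                peer.current_weight -= 1
                return peer
        # Nobody had credit left: give up if the guard is exhausted,
        # otherwise refill from the configured weights and try again.
        if max_rounds is not None and rounds >= max_rounds:
            return None
        for peer in peers:
            peer.current_weight = peer.weight
        rounds += 1


all_down = [Peer("f1", 0), Peer("f2", 0)]
print(pick_peer(all_down, max_rounds=1))  # None
```

Calling pick_peer(all_down) without the guard would never return in this model, much like the "next" command above never returning.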

Additionally:

1) I believe my problem is related to having 25% of the upstream servers configured in the "down" state (due to some unrelated work on those servers). I've just removed the "down" servers and restarted, and I will see whether that prevents the problem from happening.
2) The trigger for the problem is continuous load on the servers over an extended period of time; with minimal load or with occasional spikes, the servers perform fine. The reason is likely that under more than moderate load, the upstream application servers have a relatively high request failure rate (something like 2-3%), which causes them to keep going in and out of the "down" state automatically, so the list of "up" servers is always in flux.
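
As a rough sanity check on point 2, the sketch below estimates the per-server request rate at which a 2-3% failure rate would reach max_fails=500 within fail_timeout=30s (the values from the configuration quoted above). It assumes a simplified reading of those parameters (a server is marked unavailable once max_fails failures accumulate inside one fail_timeout window) and that failures are spread evenly over time; the function names are hypothetical.

```python
def failures_in_window(requests_per_second, failure_rate, fail_timeout_s):
    """Expected failed requests to one upstream server in one fail_timeout window."""
    return requests_per_second * failure_rate * fail_timeout_s


def rate_to_trip(max_fails, failure_rate, fail_timeout_s):
    """Per-server request rate at which expected failures reach max_fails."""
    return max_fails / (failure_rate * fail_timeout_s)


MAX_FAILS = 500       # from the quoted upstream block
FAIL_TIMEOUT_S = 30   # from the quoted upstream block

for failure_rate in (0.02, 0.03):
    rps = rate_to_trip(MAX_FAILS, failure_rate, FAIL_TIMEOUT_S)
    print(f"failure rate {failure_rate:.0%}: about {rps:.0f} requests/s per server")
```

Under these assumptions the threshold sits at roughly 550 to 830 requests per second per upstream server, which fits the observation that only sustained load triggers the problem.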

--
Oded <oded@geek.co.il>

_______________________________________________
nginx-devel mailing list
nginx-devel@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx-devel

