Welcome! Log In Create A New Profile

Advanced

Re: [PATCH 21 of 31] Fix cpu hog with all upstream servers marked "down"

Maxim Dounin
August 15, 2011 12:00PM
Hello!

On Mon, Aug 15, 2011 at 02:59:36PM -0000, Oded Arbel wrote:

> Regarding the above mentioned patch (also quoted below), I
> wanted to provide feedback on this:
>
> On my system, we have several reverse proxy servers running
> Nginx and forwarding requests to upstream. Our configuration
> looks like this:
> upstream trc {
> server prod2-f1:10213 max_fails=500 fail_timeout=30s;
> server prod2-f2:10213 max_fails=500 fail_timeout=30s;
> ...
> server 127.0.0.1:10213 backup;
> ip_hash;

Ip hash balancer doesn't support "backup" servers (and it will
complain loudly if you place "ip_hash" before servers). Could you
please check if you still see the problem after removing backup
server?

> }
>
> We've noticed that every once in a while (about 5-10 times a
> week) one of the servers gets into a state where an Nginx worker
> starts eating 100% CPU and timing out on requests. I've applied
> the aforementioned patch to our Nginx installation (release
> 1.0.0 with the Nginx_Upstream_Hash patch) and deployed to our

You mean the one from Evan Miller's upstream hash module, as
available at http://wiki.nginx.org/HttpUpstreamRequestHashModule?

> production servers. After a few hours, we started having the
> Nginx workers on all the servers eat 100% CPU.
>
> Connecting with gdb to one of the problematic worker I got this
> backtrace:
> #0 0x000000000044a650 in ngx_http_upstream_get_round_robin_peer ()
> #1 0x00000000004253dc in ngx_event_connect_peer ()
> #2 0x0000000000448618 in ngx_http_upstream_connect ()
> #3 0x0000000000448e10 in ngx_http_upstream_process_header ()
> #4 0x00000000004471fb in ngx_http_upstream_handler ()
> #5 0x00000000004247fa in ngx_event_expire_timers ()
> #6 0x00000000004246ed in ngx_process_events_and_timers ()
> #7 0x000000000042a048 in ngx_worker_process_cycle ()
> #8 0x00000000004287e0 in ngx_spawn_process ()
> #9 0x000000000042963c in ngx_start_worker_processes ()
> #10 0x000000000042a5d5 in ngx_master_process_cycle ()
> #11 0x0000000000410adf in main ()
>
> I then tried tracing through the running worker using the GDB
> command "next", which said:
> Single stepping until exit from function
> ngx_http_upstream_get_round_robin_peer
>
> And never returned until I got fed up and broke it.
>
> I finally reverted the patch and restarted the service, and
> continue to get this behavior. So my conclusion is that for my
> specific problem, this patch does not solve it.

Your problem is different from one the patch is intended to solve.
The patch solves one (and only one) problem where all servers are
marked "down" in config, clearly not the case you have.

Maxim Dounin

_______________________________________________
nginx-devel mailing list
nginx-devel@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx-devel
Subject Author Views Posted

[PATCH 00 of 31] generic patch queue for 1.0.4

Maxim Dounin 3869 June 27, 2011 01:10PM

[PATCH 01 of 31] Cache: fix another "stalled cache updating" alert

Maxim Dounin 1022 June 27, 2011 01:10PM

Re: [PATCH 01 of 31] Cache: fix another "stalled cache updating" alert

Kirill A. Korinskiy 996 June 27, 2011 02:06PM

Re: [PATCH 01 of 31] Cache: fix another "stalled cache updating" alert

Kirill A. Korinskiy 1012 June 27, 2011 03:00PM

[PATCH 02 of 31] Fastcgi: fix fastcgi_param with "HTTP_"

Maxim Dounin 1037 June 27, 2011 01:10PM

[PATCH 03 of 31] Bugfix: https wasn't working on systems with 32-bit off_t

Maxim Dounin 1178 June 27, 2011 01:10PM

[PATCH 04 of 31] Upstream: fix request finalization if client timed out

Maxim Dounin 949 June 27, 2011 01:10PM

[PATCH 05 of 31] Upstream: properly allocate memory for tried flags

Maxim Dounin 1046 June 27, 2011 01:10PM

[PATCH 06 of 31] Complain on invalid log levels

Maxim Dounin 1275 June 27, 2011 01:10PM

[PATCH 07 of 31] Fix incorrect 201 replies from dav module

Maxim Dounin 1046 June 27, 2011 01:10PM

[PATCH 08 of 31] Fix double content when return is used in error_page redirection

Maxim Dounin 1119 June 27, 2011 01:10PM

[PATCH 09 of 31] Drop incorrect special case for return 204

Maxim Dounin 1080 June 27, 2011 01:10PM

[PATCH 10 of 31] Clear old Location header (if any) while adding new one

Maxim Dounin 1051 June 27, 2011 01:10PM

[PATCH 11 of 31] Better handle various per-server ssl options with SNI

Maxim Dounin 1198 June 27, 2011 01:10PM

[PATCH 12 of 31] Better handle late upstream creation

Maxim Dounin 965 June 27, 2011 01:12PM

[PATCH 13 of 31] Gzip filter: handle empty flush buffers

Maxim Dounin 1121 June 27, 2011 01:12PM

[PATCH 14 of 31] Fix connection drops with AIO

Maxim Dounin 905 June 27, 2011 01:12PM

[PATCH 15 of 31] Fix socket leak with "aio sendfile" and "limit_rate" directives

Maxim Dounin 1123 June 27, 2011 01:12PM

[PATCH 16 of 31] Correctly handle Content-Encoding set from perl

Maxim Dounin 901 June 27, 2011 01:12PM

[PATCH 17 of 31] Gzip static: "always" parameter in "gzip_static" directive

Maxim Dounin 1088 June 27, 2011 01:12PM

Re: [PATCH 17 of 31] Gzip static: "always" parameter in "gzip_static" directive

Zhu Qun-Ying 959 June 27, 2011 02:02PM

Re: [PATCH 17 of 31] Gzip static: "always" parameter in "gzip_static" directive

Maxim Dounin 1064 June 28, 2011 06:34AM

[PATCH 18 of 31] Memcached: memcached_gzip_flag directive

Maxim Dounin 1015 June 27, 2011 01:12PM

[PATCH 19 of 31] Mail: handle smtp multiline replies

Maxim Dounin 1002 June 27, 2011 01:12PM

[PATCH 20 of 31] Additional headers for proxy_ignore_headers/fastcgi_ignore_headers

Maxim Dounin 1121 June 27, 2011 01:12PM

[PATCH 21 of 31] Fix cpu hog with all upstream servers marked "down"

Maxim Dounin 951 June 27, 2011 01:12PM

Re: [PATCH 21 of 31] Fix cpu hog with all upstream servers marked "down"

Oded Arbel 984 August 15, 2011 11:00AM

Re: [PATCH 21 of 31] Fix cpu hog with all upstream servers marked "down"

Maxim Dounin 983 August 15, 2011 12:00PM

Re: [PATCH 21 of 31] Fix cpu hog with all upstream servers marked "down"

lanshun zhou 929 August 15, 2011 01:52PM

Re: [PATCH 21 of 31] Fix cpu hog with all upstream servers marked "down"

Oded Arbel 1012 August 15, 2011 11:10AM

Re: [PATCH 21 of 31] Fix cpu hog with all upstream servers marked "down"

Oded Arbel 1262 August 15, 2011 02:46PM

[PATCH 22 of 31] Cache: correctly set conf_file while adding paths

Maxim Dounin 1168 June 27, 2011 01:12PM

[PATCH 23 of 31] Upstream: fix proxy_store leaving temporary files for subrequests

Maxim Dounin 1265 June 27, 2011 01:12PM

[PATCH 24 of 31] Cache: fix sending of empty responses

Maxim Dounin 1038 June 27, 2011 01:14PM

[PATCH 25 of 31] Cache: fix sending of stale responses

Maxim Dounin 1163 June 27, 2011 01:14PM

[PATCH 26 of 31] Variables: honor no_cacheable for not_found variables

Maxim Dounin 1133 June 27, 2011 01:14PM

[PATCH 27 of 31] Core: protect from subrequest loops

Maxim Dounin 1054 June 27, 2011 01:14PM

[PATCH 28 of 31] Core: resolve various cycles with named locations and post_action

Maxim Dounin 1093 June 27, 2011 01:14PM

[PATCH 29 of 31] Autoindex: escape '?' in file names

Maxim Dounin 982 June 27, 2011 01:14PM

[PATCH 30 of 31] Autoindex: escape html in file names

Maxim Dounin 869 June 27, 2011 01:14PM

[PATCH 31 of 31] Unbreak build with embedded perl and --with-openssl

Maxim Dounin 905 June 27, 2011 01:14PM

Re: [PATCH 00 of 31] generic patch queue for 1.0.4

António P. P. Almeida 944 June 27, 2011 10:10PM

Re: [PATCH 00 of 31] generic patch queue for 1.0.4

Maxim Dounin 1127 June 28, 2011 10:40AM

Re: [PATCH 00 of 31] generic patch queue for 1.0.4

fanboy 976 June 28, 2011 01:48AM

Re: [PATCH 00 of 31] generic patch queue for 1.0.4

Maxim Dounin 1195 June 28, 2011 11:00AM



Sorry, you do not have permission to post/reply in this forum.

Online Users

Guests: 250
Record Number of Users: 8 on April 13, 2023
Record Number of Guests: 421 on December 02, 2018
Powered by nginx      Powered by FreeBSD      PHP Powered      Powered by MariaDB      ipv6 ready