Maxim Dounin
August 15, 2011 12:00PM
Hello!

On Mon, Aug 15, 2011 at 02:59:36PM -0000, Oded Arbel wrote:

> Regarding the above mentioned patch (also quoted below), I
> wanted to provide feedback on this:
>
> On my system, we have several reverse proxy servers running
> Nginx and forwarding requests to upstream. Our configuration
> looks like this:
> upstream trc {
> server prod2-f1:10213 max_fails=500 fail_timeout=30s;
> server prod2-f2:10213 max_fails=500 fail_timeout=30s;
> ...
> server 127.0.0.1:10213 backup;
> ip_hash;

Ip hash balancer doesn't support "backup" servers (and it will
complain loudly if you place "ip_hash" before servers). Could you
please check if you still see the problem after removing backup
server?

> }
>
> We've noticed that every once in a while (about 5-10 times a
> week) one of the servers gets into a state where an Nginx worker
> starts eating 100% CPU and timing out on requests. I've applied
> the aforementioned patch to our Nginx installation (release
> 1.0.0 with the Nginx_Upstream_Hash patch) and deployed to our

You mean the one from Evan Miller's upstream hash module, as
available at http://wiki.nginx.org/HttpUpstreamRequestHashModule?

> production servers. After a few hours, we started having the
> Nginx workers on all the servers eat 100% CPU.
>
> Connecting with gdb to one of the problematic worker I got this
> backtrace:
> #0 0x000000000044a650 in ngx_http_upstream_get_round_robin_peer ()
> #1 0x00000000004253dc in ngx_event_connect_peer ()
> #2 0x0000000000448618 in ngx_http_upstream_connect ()
> #3 0x0000000000448e10 in ngx_http_upstream_process_header ()
> #4 0x00000000004471fb in ngx_http_upstream_handler ()
> #5 0x00000000004247fa in ngx_event_expire_timers ()
> #6 0x00000000004246ed in ngx_process_events_and_timers ()
> #7 0x000000000042a048 in ngx_worker_process_cycle ()
> #8 0x00000000004287e0 in ngx_spawn_process ()
> #9 0x000000000042963c in ngx_start_worker_processes ()
> #10 0x000000000042a5d5 in ngx_master_process_cycle ()
> #11 0x0000000000410adf in main ()
>
> I then tried tracing through the running worker using the GDB
> command "next", which said:
> Single stepping until exit from function
> ngx_http_upstream_get_round_robin_peer
>
> And never returned until I got fed up and broke it.
>
> I finally reverted the patch and restarted the service, and
> continue to get this behavior. So my conclusion is that for my
> specific problem, this patch does not solve it.

Your problem is different from one the patch is intended to solve.
The patch solves one (and only one) problem where all servers are
marked "down" in config, clearly not the case you have.

Maxim Dounin

_______________________________________________
nginx-devel mailing list
nginx-devel@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx-devel
Subject Author Views Posted

[PATCH 00 of 31] generic patch queue for 1.0.4

Maxim Dounin 3875 June 27, 2011 01:10PM

[PATCH 01 of 31] Cache: fix another "stalled cache updating" alert

Maxim Dounin 1027 June 27, 2011 01:10PM

Re: [PATCH 01 of 31] Cache: fix another "stalled cache updating" alert

Kirill A. Korinskiy 999 June 27, 2011 02:06PM

Re: [PATCH 01 of 31] Cache: fix another "stalled cache updating" alert

Kirill A. Korinskiy 1018 June 27, 2011 03:00PM

[PATCH 02 of 31] Fastcgi: fix fastcgi_param with "HTTP_"

Maxim Dounin 1042 June 27, 2011 01:10PM

[PATCH 03 of 31] Bugfix: https wasn't working on systems with 32-bit off_t

Maxim Dounin 1182 June 27, 2011 01:10PM

[PATCH 04 of 31] Upstream: fix request finalization if client timed out

Maxim Dounin 955 June 27, 2011 01:10PM

[PATCH 05 of 31] Upstream: properly allocate memory for tried flags

Maxim Dounin 1051 June 27, 2011 01:10PM

[PATCH 06 of 31] Complain on invalid log levels

Maxim Dounin 1281 June 27, 2011 01:10PM

[PATCH 07 of 31] Fix incorrect 201 replies from dav module

Maxim Dounin 1052 June 27, 2011 01:10PM

[PATCH 08 of 31] Fix double content when return is used in error_page redirection

Maxim Dounin 1122 June 27, 2011 01:10PM

[PATCH 09 of 31] Drop incorrect special case for return 204

Maxim Dounin 1084 June 27, 2011 01:10PM

[PATCH 10 of 31] Clear old Location header (if any) while adding new one

Maxim Dounin 1058 June 27, 2011 01:10PM

[PATCH 11 of 31] Better handle various per-server ssl options with SNI

Maxim Dounin 1201 June 27, 2011 01:10PM

[PATCH 12 of 31] Better handle late upstream creation

Maxim Dounin 971 June 27, 2011 01:12PM

[PATCH 13 of 31] Gzip filter: handle empty flush buffers

Maxim Dounin 1126 June 27, 2011 01:12PM

[PATCH 14 of 31] Fix connection drops with AIO

Maxim Dounin 908 June 27, 2011 01:12PM

[PATCH 15 of 31] Fix socket leak with "aio sendfile" and "limit_rate" directives

Maxim Dounin 1128 June 27, 2011 01:12PM

[PATCH 16 of 31] Correctly handle Content-Encoding set from perl

Maxim Dounin 905 June 27, 2011 01:12PM

[PATCH 17 of 31] Gzip static: "always" parameter in "gzip_static" directive

Maxim Dounin 1093 June 27, 2011 01:12PM

Re: [PATCH 17 of 31] Gzip static: "always" parameter in "gzip_static" directive

Zhu Qun-Ying 963 June 27, 2011 02:02PM

Re: [PATCH 17 of 31] Gzip static: "always" parameter in "gzip_static" directive

Maxim Dounin 1068 June 28, 2011 06:34AM

[PATCH 18 of 31] Memcached: memcached_gzip_flag directive

Maxim Dounin 1020 June 27, 2011 01:12PM

[PATCH 19 of 31] Mail: handle smtp multiline replies

Maxim Dounin 1005 June 27, 2011 01:12PM

[PATCH 20 of 31] Additional headers for proxy_ignore_headers/fastcgi_ignore_headers

Maxim Dounin 1125 June 27, 2011 01:12PM

[PATCH 21 of 31] Fix cpu hog with all upstream servers marked "down"

Maxim Dounin 954 June 27, 2011 01:12PM

Re: [PATCH 21 of 31] Fix cpu hog with all upstream servers marked "down"

Oded Arbel 990 August 15, 2011 11:00AM

Re: [PATCH 21 of 31] Fix cpu hog with all upstream servers marked "down"

Maxim Dounin 988 August 15, 2011 12:00PM

Re: [PATCH 21 of 31] Fix cpu hog with all upstream servers marked "down"

lanshun zhou 932 August 15, 2011 01:52PM

Re: [PATCH 21 of 31] Fix cpu hog with all upstream servers marked "down"

Oded Arbel 1018 August 15, 2011 11:10AM

Re: [PATCH 21 of 31] Fix cpu hog with all upstream servers marked "down"

Oded Arbel 1267 August 15, 2011 02:46PM

[PATCH 22 of 31] Cache: correctly set conf_file while adding paths

Maxim Dounin 1171 June 27, 2011 01:12PM

[PATCH 23 of 31] Upstream: fix proxy_store leaving temporary files for subrequests

Maxim Dounin 1270 June 27, 2011 01:12PM

[PATCH 24 of 31] Cache: fix sending of empty responses

Maxim Dounin 1044 June 27, 2011 01:14PM

[PATCH 25 of 31] Cache: fix sending of stale responses

Maxim Dounin 1167 June 27, 2011 01:14PM

[PATCH 26 of 31] Variables: honor no_cacheable for not_found variables

Maxim Dounin 1139 June 27, 2011 01:14PM

[PATCH 27 of 31] Core: protect from subrequest loops

Maxim Dounin 1059 June 27, 2011 01:14PM

[PATCH 28 of 31] Core: resolve various cycles with named locations and post_action

Maxim Dounin 1098 June 27, 2011 01:14PM

[PATCH 29 of 31] Autoindex: escape '?' in file names

Maxim Dounin 986 June 27, 2011 01:14PM

[PATCH 30 of 31] Autoindex: escape html in file names

Maxim Dounin 873 June 27, 2011 01:14PM

[PATCH 31 of 31] Unbreak build with embedded perl and --with-openssl

Maxim Dounin 909 June 27, 2011 01:14PM

Re: [PATCH 00 of 31] generic patch queue for 1.0.4

António P. P. Almeida 949 June 27, 2011 10:10PM

Re: [PATCH 00 of 31] generic patch queue for 1.0.4

Maxim Dounin 1132 June 28, 2011 10:40AM

Re: [PATCH 00 of 31] generic patch queue for 1.0.4

fanboy 980 June 28, 2011 01:48AM

Re: [PATCH 00 of 31] generic patch queue for 1.0.4

Maxim Dounin 1202 June 28, 2011 11:00AM



Sorry, you do not have permission to post/reply in this forum.

Online Users

Guests: 165
Record Number of Users: 8 on April 13, 2023
Record Number of Guests: 421 on December 02, 2018
Powered by nginx      Powered by FreeBSD      PHP Powered      Powered by MariaDB      ipv6 ready