January 23, 2012 06:00PM
gtuhl Wrote:
-------------------------------------------------------
> Initially we were seeing a ton of "connect()
> failed (110: Connection timed out)", 1 every
> couple seconds. I added these to sysctl.conf and
> that seemed to solve the problem:
>
> net.ipv4.tcp_syncookies = 1
> net.ipv4.tcp_fin_timeout = 20
> net.ipv4.tcp_max_syn_backlog = 20480
> net.core.netdev_max_backlog = 4096
> net.ipv4.tcp_max_tw_buckets = 400000
> net.core.somaxconn = 4096
>
> Now things generally run fine, but every once in
> a while we get a huge burst of "upstream
> prematurely closed connection while reading
> response header from upstream" followed by a "no
> live upstreams". Again, no apparent load on the
> machines involved. These bursts only last a
> minute or so. We also still get an occasional
> "connect() failed (110: Connection timed out)" but
> they are far less frequent, perhaps 1 or 2 per
> hour.
>

On looking at this again recently, we made two adjustments that eliminated the connection issues completely:

net.nf_conntrack_max = 262144
net.ipv4.ip_local_port_range = 1024 65000
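For a sense of scale, here is a small sketch of how many local ports a given ip_local_port_range value provides. It assumes both ends of the range are inclusive, which matches how the Linux kernel counts them. The helper name is made up for this post.

```python
def port_range_size(spec: str) -> int:
    """Number of local ports in an ip_local_port_range value like "1024 65000".

    Both ends are treated as inclusive (assumption matching Linux behaviour).
    """
    low, high = (int(part) for part in spec.split())
    if not 0 < low <= high <= 65535:
        raise ValueError(f"invalid port range: {spec!r}")
    return high - low + 1


if __name__ == "__main__":
    print(port_range_size("1024 65000"))  # 63977
```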

After making those two changes, things became quite stable. However, we still have a very large number of connections in TIME_WAIT, both on the nginx machine and on the upstream Apache machines.
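To count TIME_WAIT sockets without extra tools, you can read /proc/net/tcp (and /proc/net/tcp6) directly. In those files the kernel reports TCP state as a hex code in the fourth column, where 06 means TIME_WAIT. This is a minimal sketch that assumes a Linux host. The function name is hypothetical.

```python
from pathlib import Path

TCP_TIME_WAIT = 0x06  # state code the kernel writes in the "st" column


def count_time_wait(proc_text: str) -> int:
    """Count TIME_WAIT entries in the text of /proc/net/tcp or /proc/net/tcp6."""
    count = 0
    for line in proc_text.splitlines()[1:]:  # skip the column header line
        fields = line.split()
        # fields: sl, local_address, rem_address, st, ...
        if len(fields) > 3 and int(fields[3], 16) == TCP_TIME_WAIT:
            count += 1
    return count


if __name__ == "__main__":
    total = 0
    for name in ("/proc/net/tcp", "/proc/net/tcp6"):
        path = Path(name)
        if path.exists():
            total += count_time_wait(path.read_text())
    print(f"TIME_WAIT sockets: {total}")
```

On a busy box reading these files can take a moment, since every socket is listed.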

The nginx machine is accepting roughly 1000 requests/s and has 40,000 connections in TIME_WAIT.
The Apache machines are each accepting roughly 250 requests/s and have 15,000 connections in TIME_WAIT.
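A rough way to sanity-check these numbers is Little's law. The steady-state TIME_WAIT count is about the rate of closed connections times how long each one sits in TIME_WAIT. The sketch makes two assumptions. First, every request uses a fresh TCP connection that this host closes first. Second, TIME_WAIT lasts Linux's fixed 60 s (TCP_TIMEWAIT_LEN).

Note that net.ipv4.tcp_fin_timeout from the earlier list governs FIN_WAIT_2 and does not shorten TIME_WAIT. Under these assumptions, 250 requests/s gives 15,000, which is the figure seen on each Apache box. The helper is hypothetical.

```python
# Assumption: Linux holds sockets in TIME_WAIT for a fixed 60 seconds.
TIME_WAIT_SECONDS = 60


def expected_time_wait(connections_per_second: float,
                       time_wait_seconds: float = TIME_WAIT_SECONDS) -> float:
    """Little's law: sockets in TIME_WAIT ~= close rate * time spent in TIME_WAIT."""
    return connections_per_second * time_wait_seconds


if __name__ == "__main__":
    print(expected_time_wait(250))  # 15000
```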

We tried setting net.ipv4.tcp_tw_reuse to 1 and restarting networking. That did not cause any trouble, but it also didn't reduce the TIME_WAIT count. I have read that net.ipv4.tcp_tw_recycle is dangerous, but we may try it if others have had good experiences with it.
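Whatever method is used to apply sysctl.conf, one way to confirm a value is actually live is to read it back from /proc/sys. There, each dotted sysctl name maps to a file path. Keep in mind that tcp_tw_reuse lets new outgoing connections reuse a TIME_WAIT port when the kernel considers it safe. It is not documented to shorten TIME_WAIT, so an unchanged count alone doesn't show the setting is inactive. This is a minimal sketch with hypothetical helper names. The simple dot-to-slash mapping also breaks for names that contain dotted interface names such as eth0.100.

```python
from pathlib import Path


def sysctl_path(name: str) -> Path:
    """Map a dotted sysctl name to its file under /proc/sys (naive mapping)."""
    return Path("/proc/sys") / name.replace(".", "/")


def read_sysctl(name: str) -> str:
    """Return the current kernel value of a sysctl as a string."""
    return sysctl_path(name).read_text().strip()


if __name__ == "__main__":
    for key in ("net.ipv4.tcp_tw_reuse",
                "net.ipv4.ip_local_port_range",
                "net.nf_conntrack_max"):  # only present when conntrack is loaded
        try:
            print(key, "=", read_sysctl(key))
        except OSError as exc:
            print(key, "unavailable:", exc)
```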

Is there a way to have these cleaned up more quickly? My concern is that, even with the expanded ip_local_port_range, 40k is cutting it rather close. Before we widened ip_local_port_range, the whole system fell over right as the TIME_WAIT count approached 32k. Is it normal for nginx to produce this many TIME_WAIT connections? If we're only doing 1k requests/s and nearly exhausting the available port range, what do sites with heavier traffic do?
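Inverting the same arithmetic gives a rough ceiling on how many new outbound connections per second one port range can sustain. This sketch assumes every outbound connection leaves its local port in TIME_WAIT on this host for 60 s. That depends on which side closes first. It also assumes all connections go to a single destination address and port, since different destinations can share local ports. Treat the result as an estimate under those assumptions, not a hard limit. The function name is hypothetical.

```python
# Assumption: Linux's fixed 60-second TIME_WAIT, one destination IP:port.
TIME_WAIT_SECONDS = 60.0


def max_new_connections_per_second(ports: int,
                                   time_wait_seconds: float = TIME_WAIT_SECONDS) -> float:
    """Connections/s at which every local port would be tied up in TIME_WAIT."""
    return ports / time_wait_seconds


if __name__ == "__main__":
    ports = 65000 - 1024 + 1  # ip_local_port_range = 1024 65000, inclusive
    print(round(max_new_connections_per_second(ports)))  # 1066
```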
Subject                                        Author               Posted
Nginx as Load Balancer Connection Issues       gtuhl                January 06, 2012 04:49PM
Re: Nginx as Load Balancer Connection Issues   gtuhl                January 23, 2012 06:00PM
Re: Nginx as Load Balancer Connection Issues   ggrensteiner         January 24, 2012 12:59PM
Re: Nginx as Load Balancer Connection Issues   Andrey Korolyov      January 24, 2012 01:14PM
Re: Nginx as Load Balancer Connection Issues   gtuhl                January 24, 2012 01:23PM
Re: Nginx as Load Balancer Connection Issues   ggrensteiner         January 25, 2012 02:36PM
Re: Nginx as Load Balancer Connection Issues   gtuhl                March 20, 2012 05:33PM
Re: Nginx as Load Balancer Connection Issues   ressaid              January 25, 2012 06:24PM
Re: Nginx as Load Balancer Connection Issues   David Yu             March 20, 2012 05:48PM
Re: Nginx as Load Balancer Connection Issues   Alexandr Gomoliako   March 20, 2012 05:44PM
Re: Nginx as Load Balancer Connection Issues   gtuhl                March 21, 2012 08:56AM
Re: Nginx as Load Balancer Connection Issues   gtuhl                March 28, 2012 10:27AM
Re: Nginx as Load Balancer Connection Issues   gtuhl                April 30, 2012 09:26PM
Re: Nginx as Load Balancer Connection Issues   Andrey Belov         May 01, 2012 01:50AM