
Re: Question about failure and fail-over


Maxim Dounin

July 18, 2013 10:30AM

Hello!

On Thu, Jul 18, 2013 at 07:10:27AM -0400, Branden Visser wrote:

> Hi all, I have a general question about server failure and failover
> within an upstream group to ensure I understand it correctly.
>
> Let's say I have the configuration:
>
> proxy_next_upstream timeout;
> proxy_connect_timeout 5;
> ...
> upstream backend {
>     server 127.0.0.1 max_fails=3 fail_timeout=10s;
>     server 127.0.0.2 max_fails=3 fail_timeout=10s;
>     server 127.0.0.3 max_fails=3 fail_timeout=10s;
> }
>
> And then the server 127.0.0.1 starts "hanging" indefinitely on
> connection attempts.
>
> a) Once 3 connection attempts timeout after 5 seconds on 127.0.0.1, it
> will be marked down. However, during that 5 second timeout, it is
> possible that 30, or N connections / requests may be in process of
> timing out as well, so you may end up with 30 internal connection
> failures as a result of 127.0.0.1's issue. Although they all are
> retried on the next available upstream, 30 end-users noticed a 5
> second hang in their request as a result of waiting for the timeout to
> occur.

Yep. Use the least_conn balancer to mitigate this kind of backend
problem; see http://nginx.org/r/least_conn.
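A minimal config sketch of the upstream from the question with least_conn enabled. The upstream name `backend` and the `location /` fragment are hypothetical, added only so the fragment hangs together; the timeouts and server parameters are the ones from the question.

```nginx
# "backend" is a hypothetical upstream name
upstream backend {
    least_conn;  # pick the server with the fewest active connections (weights taken into account)
    server 127.0.0.1 max_fails=3 fail_timeout=10s;
    server 127.0.0.2 max_fails=3 fail_timeout=10s;
    server 127.0.0.3 max_fails=3 fail_timeout=10s;
}

# hypothetical location, placed inside a server { } block
location / {
    proxy_pass http://backend;
    proxy_next_upstream timeout;   # retry on the next server after a timeout
    proxy_connect_timeout 5;       # seconds
}
```

The reasoning behind the mitigation: while 127.0.0.1 hangs, every connection attempt to it stays active until proxy_connect_timeout expires, so its active-connection count grows and least_conn tends to send new requests to the other two servers. Requests already routed to the hung server still have to wait for the timeout; least_conn limits how many do, it does not eliminate them.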

Additionally, it's usually a good idea to make sure your backends
return RST on listen queue overflow. On most Linux systems the
default seems to be to just drop SYN packets on listen queue
overflow, which will result in an unbounded number of connections
waiting for a timeout. Enabling
/proc/sys/net/ipv4/tcp_abort_on_overflow might be a good idea; see
here for details:

http://man7.org/linux/man-pages/man7/tcp.7.html

> b) After 10 seconds, if the server is still hanging, a) basically
> repeats in the same manner.

No. As of 1.1.6, only a single request will be routed to the
server after fail_timeout expires. The server will be considered up
only if it is able to respond to this request.

> Is this correct? If I add "keepalive 64;" into the upstream block,
> does the above scenario change? If a server is marked down as a result
> of no new connections being able to connect, are all persistent
> connections destroyed as well?

The balancer doesn't know anything about cached connections. If a
server is marked down, no attempts to use cached connections to
that server will be made, and eventually all connections to the
server will be replaced with connections to other servers, as per
the LRU algorithm.
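For reference, a minimal sketch of the keepalive setup the question asks about. The upstream name `backend` and the `location /` fragment are hypothetical; the proxy_http_version and Connection header lines are what the nginx documentation asks for when using upstream keepalive with HTTP proxying.

```nginx
# "backend" is a hypothetical upstream name
upstream backend {
    server 127.0.0.1 max_fails=3 fail_timeout=10s;
    server 127.0.0.2 max_fails=3 fail_timeout=10s;
    server 127.0.0.3 max_fails=3 fail_timeout=10s;
    # keep up to 64 idle connections per worker process;
    # beyond that, the least recently used ones are closed
    keepalive 64;
}

# hypothetical location, placed inside a server { } block
location / {
    proxy_pass http://backend;
    proxy_http_version 1.1;          # upstream keepalive needs HTTP/1.1
    proxy_set_header Connection "";  # clear the Connection header sent to the upstream
}
```

If a non-default balancer such as least_conn is used together with keepalive, the documentation says the balancer directive must come before keepalive in the upstream block.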

--
Maxim Dounin
http://nginx.org/en/donation.html

_______________________________________________
nginx mailing list
nginx@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx



Subject	Author	Posted
Question about failure and fail-over	Branden Visser	July 18, 2013 07:12AM
Re: Question about failure and fail-over	Maxim Dounin	July 18, 2013 10:30AM
