Re: [BUG?]fail_timeout/max_fails: code doesn't do what doc says

Maxim Dounin
May 22, 2013 12:58PM
Hello!

On Tue, May 21, 2013 at 05:51:40PM +0400, Dmitry Popov wrote:

> On Tue, 21 May 2013 17:23:08 +0400
> Maxim Dounin <mdounin@mdounin.ru> wrote:
>
> >
> > This is expected behaviour. The documentation is a bit simplified
> > here: fail_timeout is used like a session time limit, and the
> > peer->fails counter is reset once there are no failures within
> > fail_timeout.
> >
> > While this might be non-ideal for some use cases, it's certainly
> > not a bug.
> >
>
> Well, it really hurts. Upstreams that fail ~1% of requests are not a rare
> case, and we can't use max_fails+fail_timeout for them (round-robin gets
> thrashed for them and ip_hash becomes completely useless). Moreover, it is
> very hard to debug because the wiki describes different behaviour.

Well, in the normal world, an upstream that constantly fails ~1% of
requests is not healthy and should not be used. I understand that
your use case is a bit special, though.

> > Such an algorithm forgets everything about previous failures once
> > per fail_timeout, and won't detect bursts of failures split across
> > two fail_timeout intervals.
> >
> > Consider:
> >
> > - max_fails = 3, fail_timeout = 10s
> > - failure at 0s
> > - failure at 9s
> > - at 10s peer->fails counter is reset
> > - failure at 11s
> > - failure at 12s
> >
> > Although 3 failures (at 9s, 11s and 12s) are only 3 seconds apart,
> > this is not detected due to the granularity introduced by the algorithm.
>
> Yes, I know about this case; sorry, I forgot to mention it. However, I think
> in real life it will extend the detection period to 2-3 fail_timeouts (in
> theory up to max_fails fail_timeouts, yes, but that is highly improbable). A
> correct implementation would need a per-second array (with fail_timeout
> elements), which is overkill in my opinion.

Sure, per-second array isn't a solution.

> By the way, a leaky bucket approach (like limit_req, but with fails per
> second) might work well here. What do you think?

Yes, a leaky/token bucket should work. That's actually what I have in
mind whenever I think about changing the above algorithm to something
strictly bound to the fail_timeout period.

--
Maxim Dounin
http://nginx.org/en/donation.html

_______________________________________________
nginx-devel mailing list
nginx-devel@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx-devel