We're using nginx as a load balancer and are seeing some strange behaviour when one of our backend servers takes a long time to respond to a request.
We have a configuration like this:

    upstream handlehttp {
        ip_hash;
        server XXX max_fails=3 fail_timeout=30s;
        server YYY max_fails=3 fail_timeout=30s;
    }

    server {
        location / {
            try_files $uri @backend;
        }

        location @backend {
            proxy_pass http://handlehttp;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_next_upstream error timeout invalid_header http_500 http_502 http_503;
            proxy_read_timeout 300;
        }
    }

What we thought we had configured was:
If one backend server fails more than 3 times within 30 seconds, it is considered disabled and all requests are sent to the other backend server; after 30 seconds, the original server starts receiving requests again.
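
To make that expectation concrete, here is the upstream block again with comments describing how we assumed the two parameters work. The comments reflect our own reading, not confirmed nginx behaviour:

    upstream handlehttp {
        ip_hash;
        # Our assumption: a server is only taken out of rotation after
        # max_fails failed attempts within the fail_timeout window (30s),
        # and it is tried again once fail_timeout (30s) has passed.
        server XXX max_fails=3 fail_timeout=30s;
        server YYY max_fails=3 fail_timeout=30s;
    }

Under that reading, a single slow request hitting proxy_read_timeout should count as at most one failure, not disable the backend outright.
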
What we're actually seeing is that if a request takes 300+ seconds, the backend is immediately marked as disabled and all further requests are sent to the other backend...
Are we missing something or is this the correct behaviour for nginx?