5xx errors and "98: Address already in use" under heavy load

Posted by jarstewa 
June 27, 2019 07:02PM
When running load tests against an Nginx server, I've seen a failure mode in which Nginx returns 5xx errors and the error log fills with messages like:

> [crit] 140#0: *1572276 bind(0.0.0.0) failed (98: Address already in use) while connecting to upstream

My theories for what might be happening here were:
1) File handles exhausted
2) Ephemeral ports or sockets exhausted
3) Nginx crashed, came back up and tried to re-bind to the same port

For 1), I think we should see `24: Too many open files` according to https://blog.serverdensity.com/troubleshoot-nginx/
For 2), I think we should see `99: Cannot assign requested address` according to https://www.nginx.com/blog/overcoming-ephemeral-port-exhaustion-nginx-plus/
If 3) happened, I think we would have seen health check failures from our external load balancer sitting in front of nginx (which we did not).
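
To check which of the three theories the log actually supports, one option is to tally the errno codes that appear in the error log. Below is a minimal sketch in Python, assuming the log lines contain a `(NN: message)` fragment like the excerpt above. The function name `tally_errnos` and the sample line list are hypothetical, not part of any Nginx tooling.

```python
import re
from collections import Counter

# Matches the "(98: Address already in use)" fragment of an error log line.
# "bind(0.0.0.0)" does not match because no ": " follows the digits.
ERRNO_RE = re.compile(r"\((\d+): ([^)]+)\)")


def tally_errnos(lines):
    """Count how often each (errno, message) pair appears in the given lines."""
    counts = Counter()
    for line in lines:
        match = ERRNO_RE.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Sample input copied from the log excerpt in this post.
    sample = [
        "[crit] 140#0: *1572276 bind(0.0.0.0) failed "
        "(98: Address already in use) while connecting to upstream",
    ]
    for (code, message), n in tally_errnos(sample).most_common():
        print(code, message, n)
```

If only code 98 shows up and codes 24 and 99 never do, that would be consistent with the reasoning above that 1) and 2), at least in the forms described in those articles, are not what the log is reporting.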

Note that our health check is implemented in Nginx as:
> location = /health {
>     return 204;
> }

So I’m guessing the health check did not fail because it never opens a connection to the upstream server; Nginx answers it directly. I think this makes 3) seem less likely, since if the Nginx process had crashed, the health checks would have failed as well.

Does anyone have any insight into what's happening here, or how to diagnose further?
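
One thing that might help narrow down 2) is watching socket states on the Nginx host while the load test runs. Below is a rough sketch, assuming a Linux host where `/proc/net/tcp` is readable (IPv4 only; `/proc/net/tcp6` has the same state column). The helper name `count_tcp_states` is made up for illustration.

```python
from collections import Counter

# TCP state codes used in the "st" column of /proc/net/tcp on Linux.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    # The first line is the column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) > 3:
            code = fields[3].upper()
            # Fall back to the raw code if it is not in the table.
            counts[TCP_STATES.get(code, code)] += 1
    return counts
```

On the host this could be called as `count_tcp_states(open("/proc/net/tcp").read())`. A `TIME_WAIT` count that approaches the size of the range in `/proc/sys/net/ipv4/ip_local_port_range` might point toward port pressure, though that is an assumption to verify rather than a confirmed diagnosis.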

Thanks!



Edited 1 time(s). Last edit at 06/27/2019 07:03PM by jarstewa.