

Random TCP Timeouts

September 20, 2022 07:03AM
Hello

I am running about sixty nginx web servers on a Proxmox cluster (which uses KVM). The VMs run the most recent version of Debian 11 with nginx, several versions of PHP-FPM and MariaDB. I follow an infrastructure-as-code approach: apart from a few exceptions in our infrastructure such as Jitsi, all servers are provisioned by Ansible and are therefore nearly identical. I host standard TYPO3 installations as well as more complex applications, usually developed in Laravel. Some time ago I started configuring active checks in Checkmk (using check_http, a Nagios plugin) for our infrastructure and for those customer projects that have gone live, i.e. are accessible from the outside.

Since then I have been getting timeout errors about once or twice a day, spread seemingly at random across the twenty servers I monitor with active checks. At first I dismissed them as false positives, but last Friday it happened during a Jitsi conference and a colleague reported it to me. I checked the nginx logs and found the following entries for the exact period in which Checkmk could not reach the server. That period is shorter than one minute (the interval between checks; the next check always succeeds) and probably lasts only a few seconds. What is the cause of this problem, and how can I fix it? Do you have any suggestions on how I could reproduce the problem and analyze it further?

Checkmk error message:

Summary: connect to address 195.34.XXX.XXX and port 443: Connection refused
Details: HTTP CRITICAL - Unable to open TCP socket

Checkmk recovery message:

Summary: HTTP OK: HTTP/1.1 200 OK - 59404 bytes in 0.008 second response time
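Because check_http only runs once a minute, the exact length of an outage is unknown. One way to narrow it down is to probe the port much more often from the monitoring host and log only state changes with timestamps; this also separates a refused connection (typically nothing listening on the port at that moment) from a genuine timeout. A minimal sketch, assuming a plain TCP connect is enough for this purpose (unlike check_http it does no TLS handshake or HTTP request); the one-second interval and the script name in the usage comment are arbitrary, hypothetical choices:

```python
import socket
import sys
import time
from datetime import datetime


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Try a single TCP connect and classify the result.

    Returns "ok", "refused" (the peer answered with a reset),
    "timeout" (no answer within `timeout` seconds) or "error: ...".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


def watch(host: str, port: int, interval: float = 1.0) -> None:
    """Probe forever and print a timestamped line only when the state changes."""
    last = None
    while True:
        state = probe(host, port)
        if state != last:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {host}:{port} {state}", flush=True)
            last = state
        time.sleep(interval)


# Hypothetical usage: python3 tcp_probe.py www.example.org 443
if __name__ == "__main__" and len(sys.argv) == 3:
    watch(sys.argv[1], int(sys.argv[2]))
```

Running one instance per monitored host and comparing the "refused" windows with the nginx error log would show how long port 443 is actually unavailable.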

Nginx error log:

2022/09/16 11:18:42 [alert] 3212994#3212994: *2590 open socket #18 left in connection 5
2022/09/16 11:18:42 [alert] 3212994#3212994: *2494 open socket #15 left in connection 8
2022/09/16 11:18:42 [alert] 3212994#3212994: *2533 open socket #16 left in connection 9
2022/09/16 11:18:42 [alert] 3212994#3212994: *2534 open socket #17 left in connection 10
2022/09/16 11:18:42 [alert] 3212994#3212994: *2591 open socket #20 left in connection 11
2022/09/16 11:18:42 [alert] 3212994#3212994: *2573 open socket #24 left in connection 12
2022/09/16 11:18:42 [alert] 3212994#3212994: *2532 open socket #10 left in connection 13
2022/09/16 11:18:42 [alert] 3212994#3212994: *3230 open socket #28 left in connection 14
2022/09/16 11:18:42 [alert] 3212994#3212994: *2467 open socket #19 left in connection 15
2022/09/16 11:18:42 [alert] 3212994#3212994: *2535 open socket #21 left in connection 16
2022/09/16 11:18:42 [alert] 3212994#3212994: *3233 open socket #27 left in connection 17
2022/09/16 11:18:42 [alert] 3212994#3212994: *2771 open socket #30 left in connection 22
2022/09/16 11:18:42 [alert] 3212994#3212994: *2770 open socket #29 left in connection 23
2022/09/16 11:18:42 [alert] 3212994#3212994: *3234 open socket #22 left in connection 24
2022/09/16 11:18:42 [alert] 3212994#3212994: *3229 open socket #11 left in connection 26
2022/09/16 11:18:42 [alert] 3212994#3212994: *3231 open socket #32 left in connection 28
2022/09/16 11:18:42 [alert] 3212994#3212994: aborting
2022/09/16 11:20:19 [error] 3295994#3295994: *153 upstream timed out (110: Connection timed out) while reading response>
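All of the `[alert]` lines above come from a single worker process (PID 3212994) within the same second and end with `aborting`, which suggests the worker was shutting down while connections were still open; the next entry, at 11:20:19, is already written by a different PID (3295994). To check whether every Checkmk failure coincides with such a burst, a small script could list them across the error logs of all servers. A minimal sketch, assuming nginx's default error-log line format as shown above; the script name and log path in the usage comment are hypothetical:

```python
import re
import sys

# Default nginx error-log prefix: "YYYY/MM/DD HH:MM:SS [level] PID#TID: message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>\w+)\] (?P<pid>\d+)#\d+: (?P<msg>.*)$"
)
OPEN_SOCKET_RE = re.compile(r"open socket #\d+ left in connection \d+")


def find_aborts(lines):
    """Group 'open socket ... left in connection' alerts by (timestamp, pid).

    Returns a list of (timestamp, pid, open_sockets, aborted) tuples in the
    order in which each burst first appears in the log.
    """
    bursts = {}  # (ts, pid) -> [open-socket count, saw "aborting"]
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m or m["level"] != "alert":
            continue
        key = (m["ts"], int(m["pid"]))
        if OPEN_SOCKET_RE.search(m["msg"]):
            bursts.setdefault(key, [0, False])[0] += 1
        elif m["msg"] == "aborting":
            bursts.setdefault(key, [0, False])[1] = True
    return [(ts, pid, n, aborted) for (ts, pid), (n, aborted) in bursts.items()]


# Hypothetical usage: python3 find_aborts.py /var/log/nginx/error.log
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for ts, pid, n, aborted in find_aborts(fh):
            print(f"{ts} pid={pid} open_sockets={n} aborting={aborted}")
```

If the printed timestamps match the Checkmk outages, the next step would be to look for whatever reloads or restarts nginx at those times.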

Yours sincerely

Stefan Malte Schumacher