Nginx fails on high load on debian 10 vs no problems on debian 9

February 02, 2020 10:25AM
My first post here, as we never had any problems with nginx before.
We use five nginx servers as load balancers for our Spring Boot application.

We had been running them for years on Debian 9 with the default nginx package, 1.10.3.
Now we have switched three of our load balancers to Debian 10 with nginx 1.14.2.

At first everything ran smoothly. Then, under high load, we encountered some problems. It starts with:

2020/02/01 17:10:55 [crit] 5901#5901: *3325390 SSL_write() failed while sending to client, client: ...
2020/02/01 17:10:55 [crit] 5901#5901: *3306981 SSL_write() failed while sending to client, client: ...

In between we get lots of

2020/02/01 17:11:04 [error] 5902#5902: *3318748 upstream timed out (110: Connection timed out) while connecting to upstream, ...
2020/02/01 17:11:04 [crit] 5902#5902: *3305656 SSL_write() failed while sending response to client, client: ...
2020/02/01 17:11:30 [error] 5911#5911: unexpected response for ocsp.int-x3.letsencrypt.org

It ends with
2020/02/01 17:11:33 [error] 5952#5952: unexpected response for ocsp.int-x3.letsencrypt.org

The problem persists for only 30-120 seconds under high load and disappears afterwards.

In the kernel log we sometimes see:
Feb 1 17:11:04 kt104 kernel: [1033003.285044] TCP: request_sock_TCP: Possible SYN flooding on port 443. Sending cookies. Check SNMP counters.

But on other occasions we don't see any kernel log messages at all.
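
To see whether SYN cookies were actually in play even when nothing was logged, the TcpExt counters in /proc/net/netstat can be watched during an incident (a sketch; the counter names are the ones the Linux kernel exposes there):

```shell
# Print the SYN-cookie and listen-queue counters from /proc/net/netstat.
# The TcpExt section has one header line and one value line; pair them up.
awk '/^TcpExt:/ {
    if (!have_header) { split($0, keys); have_header = 1 }
    else for (i = 2; i <= NF; i++)
        if (keys[i] ~ /Syncookies|ListenOverflows|ListenDrops/)
            print keys[i], $i
}' /proc/net/netstat
```

A rising SyncookiesSent or ListenOverflows during the 30-120 second window would confirm the accept queue on :443 is overflowing rather than nginx itself misbehaving.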

On both the Debian 9 and Debian 10 servers we applied identical TCP tuning:

# Kernel tuning settings
# https://www.nginx.com/blog/tuning-nginx/
net.core.rmem_max=26214400
net.core.wmem_max=26214400
net.ipv4.tcp_rmem=4096 524288 26214400
net.ipv4.tcp_wmem=4096 524288 26214400
net.core.somaxconn=1000
net.core.netdev_max_backlog=5000
net.ipv4.tcp_max_syn_backlog=10000
net.ipv4.ip_local_port_range=16000 61000
net.ipv4.tcp_max_tw_buckets=2000000
net.ipv4.tcp_fin_timeout=30
net.core.optmem_max=20480
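
The same values are readable back from /proc/sys, which is how one can double-check that the settings actually survived the upgrade to Debian 10 (a sketch; the key list is just the ones most relevant to the SYN backlog):

```shell
# Read the effective values straight from /proc/sys; this shows what the
# kernel is really using right now, regardless of what sysctl.conf says.
for k in net/core/somaxconn \
         net/ipv4/tcp_max_syn_backlog \
         net/core/netdev_max_backlog; do
    printf '%s = %s\n' "$(echo "$k" | tr / .)" "$(cat "/proc/sys/$k")"
done
```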

The nginx config is exactly the same on both, so I'll just show some important parts:

user www-data;
worker_processes auto;
worker_rlimit_nofile 50000;
pid /run/nginx.pid;

events {
    worker_connections 5000;
    multi_accept on;
    use epoll;
}

http {
    root /var/www/loadbalancer;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    types_hash_max_size 2048;
    server_tokens off;
    client_max_body_size 5m;

    client_header_timeout 20s; # default 60s
    client_body_timeout 20s;   # default 60s
    send_timeout 20s;          # default 60s

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
    ssl_session_timeout 1d;
    ssl_session_cache shared:SSL:100m;
    ssl_buffer_size 4k;
    ssl_dhparam /etc/nginx/dhparam.pem;
    ssl_prefer_server_ciphers on;
    ssl_ciphers 'ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES256-SHA:ECDHE-ECDSA-DES-CBC3-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:DES-CBC3-SHA:!DSS';

    ssl_session_tickets on;
    ssl_session_ticket_key /etc/nginx/ssl_session_ticket.key;
    ssl_session_ticket_key /etc/nginx/ssl_session_ticket_old.key;

    ssl_stapling on;
    ssl_stapling_verify on;
    ssl_trusted_certificate /etc/ssl/rapidssl/intermediate-root.pem;

    resolver 8.8.8.8;

    log_format custom '$host $server_port $request_time $upstream_response_time $remote_addr '
                      '"$http2" "$ssl_session_reused" $upstream_addr $time_iso8601 '
                      '"$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"';

    access_log /var/log/nginx/access.log custom;
    error_log /var/log/nginx/error.log;

    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_cache_path /var/cache/nginx/ levels=1:2 keys_zone=imagecache:10m inactive=7d use_temp_path=off;
    proxy_connect_timeout 10s;
    proxy_read_timeout 20s;
    proxy_send_timeout 20s;
    proxy_next_upstream off;

    map $http_user_agent $outdated {
        default 0;
        "~MSIE [1-6]\." 1;
        "~Mozilla.*Firefox/[1-9]\." 1;
        "~Opera.*Version/[0-9]\." 1;
        "~Chrome/[0-9]\." 1;
    }

    include sites/*.conf;
}
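
One detail worth noting here: on Linux, nginx's listen backlog defaults to 511 and is additionally capped by net.core.somaxconn, so tcp_max_syn_backlog=10000 on its own does not enlarge the accept queue. If that turns out to matter, raising it would look roughly like this (a sketch, not our current config; the server block is illustrative):

```
# in sites/*.conf -- the backlog value must also fit under net.core.somaxconn
server {
    listen 443 ssl backlog=4096;
    ...
}
```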


The upstream timeouts suggest some problem with our Java machines. But at the same time the Debian 9 nginx load balancer runs fine and has no problems connecting to any of the upstream servers.
And the problems with Let's Encrypt OCSP and SSL_write() suggest to me some problem with nginx, or TCP, or whatever.
I really don't know how to debug this situation. But we can reliably reproduce it most of the times we encounter high load on the Debian 10 servers, and we never saw it on Debian 9.
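
For anyone trying to reproduce this, one quick check during the 30-120 second window is the accept-queue depth on the TLS port (a sketch; :443 assumed):

```shell
# On a LISTEN socket, ss reports Recv-Q as the number of connections
# waiting to be accept()ed and Send-Q as the configured backlog limit.
ss -ltn 'sport = :443'
```

If Recv-Q is pinned at Send-Q during the spike, the workers are not calling accept() fast enough and the kernel starts sending SYN cookies, which would tie the two symptoms together.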

Then I installed the stable version, nginx 1.16, on Debian 10 to see if this was a bug in nginx that had already been fixed:

nginx version: nginx/1.16.1
built by gcc 8.3.0 (Debian 8.3.0-6)
built with OpenSSL 1.1.1c 28 May 2019 (running with OpenSSL 1.1.1d 10 Sep 2019)
TLS SNI support enabled
configure arguments: ...

But it didn't help.

Can somebody give me some hints on how to start debugging this further?

regards
Janning