Nginx - 56-day-old reverse proxy suddenly unable to connect upstream.

February 21, 2020 04:19PM
I have nginx configured as a reverse proxy to Amazon's AWS IoT MQTT service. This worked well for almost 2 months, until 20 of the 32 instances suddenly stopped being able to connect upstream. We first saw sporadic upstream SSL connection errors, then sporadic upstream connection-refused errors, and finally mostly upstream connection timeouts. Nothing short of restarting or reloading nginx fixes it. Debug logging was not enabled, and enabling it replaces the worker processes, which effectively ends the issue. Over the next 3 days, the remaining nodes started exhibiting the same problem. Rather than restarting nginx on those nodes, I isolated them for study and stood up new nodes to replace them.

But after studying these nodes, I cannot find any indication of why this is happening. Now that they have been removed from client traffic, I can test them with curl. If I hit one of them 5 times, I get a repro by the 5th call: a connection timeout to the upstream, which results in a timeout for me.
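The repro loop described above could be scripted along these lines. This is a minimal sketch in Python rather than the actual curl commands: the function name `probe`, the default of 5 attempts, and the 10 second client timeout are assumptions chosen for illustration, and the URL has to be replaced with the address of an isolated node.

```python
import socket
import urllib.error
import urllib.request


def probe(url, attempts=5, timeout=10.0, opener=urllib.request.urlopen):
    """Request `url` `attempts` times and label each outcome.

    Returns a list of strings: an HTTP status code such as "200" or "504",
    "timeout", or "error: <reason>". `opener` is injectable so the loop
    can be exercised without touching the network.
    """
    results = []
    for _ in range(attempts):
        try:
            with opener(url, timeout=timeout) as resp:
                results.append(str(resp.status))
        except urllib.error.HTTPError as exc:
            # nginx answered, but with an error status (e.g. 502 or 504).
            results.append(str(exc.code))
        except (socket.timeout, TimeoutError):
            # The client-side timeout expired while waiting for a response.
            results.append("timeout")
        except urllib.error.URLError as exc:
            # urllib wraps connect-phase failures in URLError.
            if isinstance(exc.reason, (socket.timeout, TimeoutError)):
                results.append("timeout")
            else:
                results.append(f"error: {exc.reason}")
    return results
```

Calling `probe("https://CENSORED.something.com/")` against an affected node returns one label per attempt, which makes it easy to see on which call the timeout first appears.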

==========================================================
Here is the version information for nginx, as it comes from Ubuntu 18.04:
nginx version: nginx/1.14.0 (Ubuntu)
built with OpenSSL 1.1.1 11 Sep 2018
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -fdebug-prefix-map=/build/nginx-GkiujU/nginx-1.14.0=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -Wdate-time -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -fPIC' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-debug --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --with-http_addition_module --with-http_geoip_module=dynamic --with-http_gunzip_module --with-http_gzip_static_module --with-http_image_filter_module=dynamic --with-http_sub_module --with-http_xslt_module=dynamic --with-stream=dynamic --with-stream_ssl_module --with-mail=dynamic --with-mail_ssl_module

==========================================================
nginx.conf:
user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;
worker_rlimit_nofile 30500;

events {
    worker_connections 10000;
    # multi_accept on;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    #IPV6 also disabled via kernel boot option and sysctl, too.
    #Couldn't get nginx to stop AAAA lookups without doing that.
    resolver 8.8.8.8 8.8.4.4 valid=3s ipv6=off;
    resolver_timeout 10;
    # enable reverse proxy
    proxy_redirect off;
    proxy_set_header Host CENSORED.amazonaws.com;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwared-For $proxy_add_x_forwarded_for;

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE
    ssl_prefer_server_ciphers on;

    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log error;

    gzip on;

    # Nginx-lua-prometheus
    # Prometheus metric library for Nginx
    lua_shared_dict prometheus_metrics 10M;
    lua_package_path "/etc/nginx/nginx-lua-prometheus/?.lua";
    init_by_lua '
        prometheus = require("prometheus").init("prometheus_metrics")
        metric_requests = prometheus:counter(
            "nginx_http_requests_total", "Number of HTTP requests", {"host", "status"})
        metric_latency = prometheus:histogram(
            "nginx_http_request_duration_seconds", "HTTP request latency", {"host"})
        metric_connections = prometheus:gauge(
            "nginx_http_connections", "Number of HTTP connections", {"state"})
    ';
    log_by_lua '
        metric_requests:inc(1, {ngx.var.server_name, ngx.var.status})
        metric_latency:observe(tonumber(ngx.var.request_time), {ngx.var.server_name})
    ';

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

==========================================================
iot-proxy config file:
# Define group of backend / upstream servers:
upstream iot-backend
{
    server CENSORED.amazonaws.com:443;
}

server
{
    #listen 443 default ssl;
    listen 443 ssl;
    server_name CENSORED.something.com;

    ssl_session_cache shared:SSL:1m;
    ssl_session_timeout 86400;
    ssl_certificate /etc/nginx/ssl/CENSORED.crt;
    ssl_certificate_key /etc/nginx/ssl/CENSORED.key;
    ssl_verify_client off;
    ssl_protocols SSLv3 TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers RC4:HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;

    location /
    {
        proxy_pass https://iot-backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host "CENSORED.amazonaws.com:443";
        proxy_read_timeout 86400;
        proxy_ssl_session_reuse off;
    }
}
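
Because the `upstream` block above names the backend by hostname, one check that could be run on an isolated node is whether the addresses nginx is still connecting to match what that hostname resolves to now. This is only a hypothesis to rule out, not a confirmed cause: open-source nginx of this vintage resolves hostnames in `upstream` `server` lines when the configuration is loaded, while the `resolver` directive applies to names resolved at run time. A minimal sketch in Python follows. The names `resolve_ipv4` and `stale_addresses` are hypothetical, the lookup uses the system resolver rather than the 8.8.8.8 / 8.8.4.4 servers from nginx.conf, and the list of in-use addresses is assumed to have been collected separately (for example from the error log).

```python
import socket


def resolve_ipv4(hostname, port=443):
    """Return the set of IPv4 addresses the system resolver returns right now."""
    infos = socket.getaddrinfo(hostname, port, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return {info[4][0] for info in infos}


def stale_addresses(in_use, current):
    """Return, sorted, the addresses in `in_use` that are absent from `current`."""
    return sorted(set(in_use) - set(current))
```

For example, `stale_addresses(addresses_from_error_log, resolve_ipv4("CENSORED.amazonaws.com"))` lists upstream addresses that nginx is still trying but that DNS no longer hands out. An empty list would argue against stale DNS as the explanation.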

==========================================================
nginx-lua-prometheus config file:
server {
    listen 9145;
    allow 0.0.0.0/0;
    allow 127.0.0.1/32;
    deny all;
    location /metrics {
        content_by_lua '
            metric_connections:set(ngx.var.connections_reading, {"reading"})
            metric_connections:set(ngx.var.connections_waiting, {"waiting"})
            metric_connections:set(ngx.var.connections_writing, {"writing"})
            prometheus:collect()
        ';
    }
}

Posted by bdarbro, February 21, 2020 04:19PM