Hi All,
We recently started seeing an odd issue on one specific server: Nginx began using dramatically more CPU to serve each request. I haven't been able to figure out what specifically in Nginx is causing this increase, so I'm looking for some help debugging it. Let's go over some details.
I have some screenshots from my monitoring software which show the effects of the issue really clearly. Here are the CPU and Nginx requests graphs.
https://drive.google.com/file/d/1-3kBKfUU7ouc9lY2iJSdAuen5yA3a8zO/view
You can see the issue started on the 7th and the CPU was maxed out for about 48 hours. This server serves a news site with almost no user interaction. It's a high-traffic site that people view, and that's it. Our cache settings are aggressive as well, so it's almost entirely serving cached/static files. That means this CPU usage is not caused by PHP, MariaDB, or anything else. It's entirely used by the Nginx worker processes themselves.
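For anyone who wants to check the same thing on their own box, here is a minimal sketch of how the CPU time of the Nginx processes can be read straight from /proc on Linux. It is not what my monitoring software does internally (I don't know that); the function names are made up for illustration, and it counts the master process along with the workers.

```python
import os


def cpu_ticks_from_stat(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ")", so split on the LAST ")" rather than on whitespace.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(after_comm[11]) + int(after_comm[12])


def nginx_cpu_seconds(proc_root: str = "/proc") -> float:
    """Sum user + system CPU seconds over all processes whose comm is 'nginx'."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    total_ticks = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                line = handle.read()
        except OSError:
            continue  # the process exited while we were scanning
        comm = line[line.find("(") + 1 : line.rfind(")")]
        if comm == "nginx":
            total_ticks += cpu_ticks_from_stat(line)
    return total_ticks / ticks_per_second
```

Calling nginx_cpu_seconds() twice, some seconds apart, and subtracting gives the CPU time Nginx consumed in that window, which can then be compared with the total CPU time of the machine.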
Our first thought was that there was an increase in traffic or some kind of attack. But notice that the number of requests handled by Nginx actually goes down by about a third during that 48-hour period. The network graphs show exactly the same thing. So this is definitely not an increase in traffic in any way.
https://drive.google.com/file/d/1-7NyqYA53123h3Q_nA7cKawWmsv_WQ11/view
We also didn't detect any noticeable difference in traffic patterns, types of traffic, types of users, etc. That brings us to the 9th, when the CPU usage goes back to normal. We ended up getting really creative and found a few ways to reduce the traffic hitting this origin server while still serving all the traffic to the real users. Essentially, we artificially reduced the traffic to this origin server.
But notice the drop in traffic levels at the same point. The effect of all of this is that Nginx is using about 5x more CPU to serve each request than it did before. That's exactly what I'm trying to figure out: what is causing Nginx to use so much CPU to serve each request now?
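To make "CPU per request" concrete, this is roughly the arithmetic behind that figure, written as a small sketch. It assumes the request counter comes from the stub_status page (our build has that module) and that Nginx CPU seconds are sampled separately at the same two moments. The function names are made up for illustration.

```python
import re


def total_requests(stub_status_text: str) -> int:
    """Extract the cumulative 'requests' counter from stub_status output."""
    # The counters line holds exactly three integers:
    # accepts, handled, requests.
    match = re.search(
        r"^\s*(\d+)\s+(\d+)\s+(\d+)\s*$", stub_status_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no counters line found in stub_status output")
    return int(match.group(3))


def cpu_ms_per_request(cpu_s_start: float, cpu_s_end: float,
                       req_start: int, req_end: int) -> float:
    """Average CPU milliseconds spent per request between two samples."""
    requests = req_end - req_start
    if requests <= 0:
        raise ValueError("request counter did not advance between samples")
    return 1000.0 * (cpu_s_end - cpu_s_start) / requests
```

Taking one pair of samples from a normal period and one from the affected period gives two per-request figures, and their ratio is the kind of "5x" number I mean above.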
We have a custom build of Nginx and there were no changes to it at the time. No OS updates either.
Since then we've made some changes to try to solve it. We've updated Nginx (we do that regularly anyway), disabled Gzip and Brotli, and made various configuration changes as well. Nothing has made any impact at all.
So here I am asking for some help!
Is there any way that anyone can think of to figure out what is causing the CPU usage of Nginx to be so high?
Does anyone have a feeling about which modules could be causing this problem? I can disable any of them except for NJS because it's core to our app.
And if you think the problem may be in NJS, is there a good way to find out what specifically in NJS is causing the problem?
Here are some details of our custom build if that helps.
--------------------------------------------------------------------------------
nginx version: nginx/1.27.0
built by gcc 11.4.1 20231218 (Red Hat 11.4.1-3) (GCC)
built with OpenSSL 3.2.2 4 Jun 2024
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -march=x86-64-v3 -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -ftree-vectorize -Wformat -Werror=format-security -fPIC -Wdate-time -D_FORTIFY_SOURCE=2 -DTCP_FASTOPEN=23' --with-ld-opt='-Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro -Wl,-z,now -fPIC' --with-openssl-opt='enable-tls1_3 enable-ec_nistp_64_gcc_128 no-nextprotoneg no-weak-ssl-ciphers no-ssl3 enable-ktls' --prefix=/usr/share/nginx --conf-path=/opt/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --user=nobody --group=nobody --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-openssl=/root/openssl-3.2.2 --with-zlib=/root/zlib-ng --with-pcre=/root/pcre2-10.44 --with-http_realip_module --with-http_auth_request_module --with-http_gzip_static_module --with-http_v2_module --with-http_v3_module --with-http_sub_module --with-libatomic=/root/libatomic --with-file-aio --with-http_xslt_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-threads --without-http_autoindex_module --without-http_ssi_module --without-http_scgi_module --without-http_uwsgi_module --add-module=/root/ngx_brotli --add-module=/root/ngx_http_geoip2_module-3.4 --add-module=/root/njs-0.8.5/nginx
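Since that configure line is hard to scan, here is a small sketch that splits the output of nginx -V into enabled, disabled, and third-party modules, which may help when deciding what to switch off one at a time. It assumes the output has been captured as text (nginx -V writes to stderr, so something like `nginx -V 2>&1`); the function name is made up for illustration.

```python
import shlex


def list_modules(nginx_v_output: str) -> dict:
    """Group the module-related flags found in `nginx -V` output."""
    # Everything after this marker is the original ./configure command line.
    _, _, args = nginx_v_output.partition("configure arguments:")
    groups = {"enabled": [], "disabled": [], "third_party": []}
    # shlex keeps quoted values such as --with-cc-opt='...' as one token.
    for token in shlex.split(args):
        flag, _, value = token.partition("=")
        if flag in ("--add-module", "--add-dynamic-module"):
            groups["third_party"].append(value)
        elif flag.startswith("--without-") and flag.endswith("_module"):
            groups["disabled"].append(flag[len("--without-"):])
        elif flag.startswith("--with-") and flag.endswith("_module"):
            groups["enabled"].append(flag[len("--with-"):])
    return groups
```

For the build above, the third_party list would contain the ngx_brotli, ngx_http_geoip2_module, and njs paths, which are the non-core pieces I'd suspect first.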