Hi,
this has been bugging me for some time now. I have nginx 1.16.0 configured as a proxy cache as follows:
proxy_cache_path /dev/shm/nginx_cache levels=1:2 keys_zone=proxy:1024m max_size=1024m inactive=60m;
proxy_temp_path /dev/shm/nginx_proxy_tmp;
proxy_cache_use_stale updating;
proxy_cache_lock on;
proxy_cache_lock_timeout 30s;
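For completeness, the cache is enabled per location roughly like this (the upstream name and URI are simplified placeholders, not my real config):

```nginx
location /api/ {
    proxy_pass http://backend;               # placeholder upstream
    proxy_cache proxy;                        # keys_zone from proxy_cache_path above
    proxy_cache_valid 200 60m;
    proxy_cache_key $scheme$host$request_uri;
}
```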
Most of the time everything is fine and works as expected. One peculiarity of the deployment is that predictable request spikes (end clients updating their daily data) hit a few locations. Response size varies between 1M and 1.5M non-gzipped. Log snippet from such a spike:
[2020-05-03T00:00:44] "GET /api/34/guide?date=2020-05-03 HTTP/1.0" 200 445984 cache: HIT request time: 50.211 sec
[2020-05-03T00:00:44] "GET /api/34/guide?date=2020-05-03 HTTP/1.0" 200 780472 cache: HIT request time: 52.891 sec
[2020-05-03T00:00:44] "GET /api/34/guide?date=2020-05-03 HTTP/1.0" 200 85432 cache: HIT request time: 33.284 sec
[2020-05-03T00:00:44] "GET /api/34/guide?date=2020-05-03 HTTP/1.0" 200 57920 cache: HIT request time: 34.957 sec
[2020-05-03T00:00:44] "GET /api/34/guide?date=2020-05-03 HTTP/1.0" 200 401096 cache: HIT request time: 49.991 sec
[2020-05-03T00:00:44] "GET /api/34/guide?date=2020-05-03 HTTP/1.0" 200 244712 cache: HIT request time: 48.412 sec
[2020-05-03T00:00:44] "GET /api/34/guide?date=2020-05-03 HTTP/1.0" 200 101360 cache: HIT request time: 34.955 sec
[2020-05-03T00:00:44] "GET /api/34/guide?date=2020-05-03 HTTP/1.0" 200 102808 cache: HIT request time: 34.753 sec
...
[2020-03-24T00:02:16] "GET /api/34/guide?date=2020-05-03 HTTP/1.0" 200 1526025 cache: HIT request time: 48.671 sec
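(For reference, the access log lines above come from a log_format roughly like the following; the exact variables in my real config may differ slightly:)

```nginx
log_format cachelog '[$time_iso8601] "$request" $status $body_bytes_sent '
                    'cache: $upstream_cache_status request time: $request_time sec';
```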
Monitoring du on the cache location shows at most 1.1G, e.g.:
1.1G /dev/shm/nginx_cache
0 /dev/shm/nginx_proxy_tmp
After about 2 minutes the response 'stabilizes' at the correct size (in this example 1526025 bytes). The problem is amplified because clients validate the response and progressively retry if it is corrupted.
There are no unusual lines in the error log or in the Linux (CentOS) system messages, and there is no cache status 'UPDATING', just HITs (which I guess rules out an upstream server issue). Is it possible that we have a problem reading cached entries from /dev/shm during peak times?
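One thing I plan to check during the next spike is whether a cache entry on disk is shorter than the Content-Length stored in its header, which would suggest we are serving a partially written entry. A rough sketch of the check, shown here against a mock entry for illustration (the real files live under /dev/shm/nginx_cache):

```shell
# A cache entry starts with an nginx-internal header followed by the raw
# upstream response headers; compare the declared Content-Length with the
# file size on disk. Mock entry used here to demonstrate the idea:
f=$(mktemp)
printf 'HTTP/1.1 200 OK\r\nContent-Length: 1526025\r\n\r\n' > "$f"

# extract the declared length (grep -a treats the binary entry as text)
cl=$(grep -ao 'Content-Length: [0-9]*' "$f" | head -1 | tr -dc '0-9')
# actual size on disk (GNU stat; BSD would need `stat -f %z`)
sz=$(stat -c %s "$f")
echo "declared=$cl on_disk=$sz"

rm -f "$f"
```

On a real entry, on_disk noticeably smaller than declared (plus the entry header) during the spike would point at partially written entries being served.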
I would kindly ask for hints on where to start looking and debugging.
Big thanks in advance