Hello everybody,
I recently had a "dirty" cache problem with my nginx configuration on AWS, which load-balances across various containers (in ECS).
I have several nginx instances (balanced via an ELB) that handle requests to the containers (nginx points to the various ALBs and therefore to their target groups).
Here is the timeline of the problem:
- we had some problems with the containers, and the ELBs replied with 502
- the 502 was cached (because of the "proxy_cache_valid any 10s;" directive)
- when we got the containers back up and running, however, nginx kept returning 502s
- the problem is that our container sends "Cache-Control: no-cache, no-store, no-transform"; since the Cache-Control of the upstream response takes higher priority, the fresh 200 was never stored and the cached 502 was never replaced
- because we have "proxy_cache_lock on;" and "proxy_cache_lock_timeout 1s;", all concurrent requests for this resource were held back while the cached entry was being updated
- because we also have "proxy_cache_use_stale error timeout updating http_500;", the stale cached 502 was returned for every concurrent request that waited longer than one second.
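Note that "proxy_cache_valid any 10s;" applies to every status code, so it behaves roughly like the sketch below, which is why the ELB's 502 entered the cache in the first place:
"""
# "any" matches all response codes, errors included:
proxy_cache_valid 200 301 302 10s;  # good responses are cached for 10s
proxy_cache_valid 502 504 10s;      # ...but so are gateway errors
"""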
The X-Cache response header carries the value of "$upstream_cache_status".
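(For completeness, that header is emitted with something along these lines elsewhere in our config:)
"""
add_header X-Cache "$upstream_cache_status" always;
"""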
nginx proxy_* configuration:
"""
proxy_http_version 1.1;
proxy_redirect off;
proxy_set_header Host "$target_host";
proxy_set_header X-Real-IP "$client_ip";
proxy_set_header X-Forwarded-For "$http_x_forwarded_for";
proxy_set_header Accept-Encoding "";
client_max_body_size 1024m;
client_body_buffer_size 128k;
proxy_send_timeout 90;
proxy_read_timeout 330;
proxy_buffers 32 4k;
proxy_buffering on;
proxy_buffer_size 16k;
proxy_busy_buffers_size 64k;
proxy_request_buffering on;
proxy_cache_path /mnt/cache/main levels=2 keys_zone=main:16m inactive=24h max_size=1G;
proxy_ignore_client_abort off;
proxy_max_temp_file_size 1024M;
proxy_ignore_headers X-Accel-Expires;
proxy_next_upstream error timeout non_idempotent;
proxy_connect_timeout 5s;
proxy_cache_valid any 10s;
proxy_cache_use_stale error timeout updating http_500;
proxy_cache_lock on;
proxy_cache_lock_timeout 1s;
proxy_cache_key "$host$request_uri";
proxy_cache main;
proxy_pass http://$elb_endpoint;
"""
response from the container:
"""
HTTP/1.1 200 OK
Date: Tue, 08 Jun 2021 08:46:04 GMT
Content-Type: application/json
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding,Accept
Cache-Control: no-cache, no-store, no-transform
"""
response from the container through nginx (502):
"""
HTTP/1.1 502 Bad Gateway
Content-Type: text/html
Date: Tue, 08 Jun 2021 08:46:11 GMT
X-Cache: UPDATING
X-Robots-Tag: noindex
Content-Length: 122
Connection: keep-alive
"""
response from the container through nginx (200):
"""
HTTP/1.1 200 OK
Cache-Control: no-cache, no-store, no-transform
Content-Type: application/json
Date: Tue, 08 Jun 2021 08:46:15 GMT
Vary: Accept-Encoding
Vary: Accept-Encoding,Accept
X-Cache: EXPIRED
X-Robots-Tag: noindex
transfer-encoding: chunked
Connection: keep-alive
"""
response from the ELB (502):
"""
HTTP/1.1 502 Bad Gateway
Server: awselb/2.0
Date: Mon, 07 Jun 2021 22:04:03 GMT
Content-Type: text/html
Content-Length: 122
Connection: close
"""
nginx version: 1.13.6
Is there a way to cache everything (even 5xx errors) for up to 10 seconds while avoiding this problem?
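One direction I am considering (untested sketch) is to let nginx's own proxy_cache_valid rules win over the upstream Cache-Control, so that a fresh 200 can overwrite the cached 502, and to keep errors for a shorter time:
"""
# ignore upstream caching headers so proxy_cache_valid applies
proxy_ignore_headers Cache-Control Expires X-Accel-Expires;
# cache good responses for the full window, errors only briefly
proxy_cache_valid 200 301 302 10s;
proxy_cache_valid any 1s;
"""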