{{{
proxy_cache_path /nginx/cache/three levels=1:1:2 keys_zone=three:1000m;
proxy_cache_valid 200 304 302 60m;
}}}
Nginx version: 1.0.1
nginx: configure arguments: --with-pcre --group=admin --user=admin --prefix=/var/www/nginx --with-http_ssl_module --with-http_perl_module --with-http_stub_status_module --with-perl_modules_path=./src/http/modules/perl/ --add-module=../ngx_cache_purge
The cache entries timeout is set as 60 minutes, on production environment(for website hosting) service became unavailable after running several hours. After dive into backtraces and source code, it's a deadlock issue on reload nginx(reload occurs every several minutes), here is the bt information.
{{{
root 28779 0.0 0.1 1124788 6464 ? Ss 11:15 0:00 nginx: master process /var/www/nginx//sbin/nginx
nobody 28781 48.8 0.2 1126424 8268 ? R 11:15 77:41 nginx: worker process is shutting down
nobody 28784 47.2 0.0 1122576 3460 ? R 11:15 75:05 nginx: cache manager process
nobody 28785 47.6 0.0 1122724 3496 ? R 11:15 75:45 nginx: cache loader process
nobody 30156 50.8 0.2 1128800 9648 ? R 11:43 66:27 nginx: worker process
nobody 30157 54.1 0.2 1128800 9648 ? R 11:43 70:46 nginx: worker process
nobody 30158 54.9 0.2 1128800 9572 ? R 11:43 71:52 nginx: worker process
nobody 30159 55.3 0.2 1128800 9648 ? R 11:43 72:22 nginx: worker process
nobody 30160 8.5 26.3 1124788 1069964 ? R 11:43 11:12 nginx: cache manager process
}}}
From the ps list Nginx is under reloading, when all worker process and a cache manager process is spinlock to get the cache, and another cache manager acquired the lock and fall into a while loop.
{{{
#0 ngx_http_file_cache_expire (cache=0x1ea7dc00) at src/http/ngx_http_file_cache.c:1103
1103 if (fcn->deleting) {
(gdb) bt
#0 ngx_http_file_cache_expire (cache=0x1ea7dc00) at src/http/ngx_http_file_cache.c:1103
#1 0x0000000000474e8b in ngx_http_file_cache_manager (data=0x1ea7dc00) at src/http/ngx_http_file_cache.c:1193
#2 0x0000000000439e9d in ngx_cache_manager_process_handler (ev=0x7fff5f5f3120) at src/os/unix/ngx_process_cycle.c:1346
#3 0x00000000004303f0 in ngx_event_expire_timers () at src/event/ngx_event_timer.c:149
#4 0x000000000042e537 in ngx_process_events_and_timers (cycle=0x1e9b5a70) at src/event/ngx_event.c:261
#5 0x0000000000439d6e in ngx_cache_manager_process_cycle (cycle=0x1e9b5a70, data=0x6c4440) at src/os/unix/ngx_process_cycle.c:1328
#6 0x00000000004361bd in ngx_spawn_process (cycle=0x1e9b5a70, proc=0x439bdc <ngx_cache_manager_process_cycle>, data=0x6c4440, name=0x4abc88 "cache manager process",
respawn=-4) at src/os/unix/ngx_process.c:196
#7 0x00000000004382ca in ngx_start_cache_manager_processes (cycle=0x1e9b5a70, respawn=1) at src/os/unix/ngx_process_cycle.c:398
#8 0x0000000000437dc9 in ngx_master_process_cycle (cycle=0x1e9b5a70) at src/os/unix/ngx_process_cycle.c:251
#9 0x000000000040e36c in main (argc=1, argv=0x7fff5f5f36d8) at src/core/nginx.c:405
(gdb) p fcn->deleting
$1 = 1
(gdb) p wait
$2 = -67
}}}
Source code
{{{
1075 for ( ;; ) {
1076
1077 if (ngx_queue_empty(&cache->sh->queue)) {
1078 wait = 10;
1079 break;
1080 }
1081
1082 q = ngx_queue_last(&cache->sh->queue);
1083
1084 fcn = ngx_queue_data(q, ngx_http_file_cache_node_t, queue);
1085
1086 wait = fcn->expire - now;
1087
1088 if (wait > 0) {
1089 wait = wait > 10 ? 10 : wait;
1090 break;
1091 }
.....
}}}
The for loop never exit if "wait" is a negative number and unfortunately it becomes -67, I hope decrease the timeout to 10m would work around it, and it failed, either.