Maxim Dounin Wrote:
-------------------------------------------------------
> Hello!
Hello and thanks for the reply!
> > I assume the mentioned error is due to relatively frequent nginx
> > restarts and is benign. There's nothing else in the error log (except
> > for occasional upstream timeouts). I'm aware this likely isn't enough
> > info to debug the issue, but do you at least have some ideas on what
> > might be causing it and where to look? My wild guess is that the cache
> > manager waits for some lock to be released, but it never gets
> > released, so it just waits indefinitely.
>
> The logged error means that nginx is going to remove an inactive
> cache entry, but the entry is locked by some requests. Unless the
> inactive time is very low (not your case), it indicates a problem
> somewhere else.
>
> Such locked entries can't be removed from the cache. Additionally,
> once there are enough such locked entries, nginx won't be able to
> purge the cache based on max_size. That is, if you see such
> messages, expect nginx to have problems removing entries from the
> cache.
>
> The most trivial reason for such messages is abnormally killed nginx
> processes. That is, if some processes die due to bugs, or are killed
> by an unwary administrator or an incorrect script, the problem will
> appear sooner or later.
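To make the mechanism concrete, here is a toy model in Python. It is not nginx's code: the `Entry` and `ToyCache` names and the single `count` field are simplifying assumptions that loosely stand in for the per-entry usage counter nginx keeps in the shared cache zone. An entry is only removed when it is both inactive and unused; if a process dies after taking an entry but before releasing it, the count never drops back to zero and the entry stays stuck.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    key: str
    last_access: float
    count: int = 0  # how many requests currently hold this entry


class ToyCache:
    """Toy model of inactive-entry expiry; NOT nginx's implementation."""

    def __init__(self, inactive):
        self.inactive = inactive  # seconds an entry may go unused
        self.entries = {}

    def acquire(self, key, now):
        # A request starts using the entry: bump its usage count.
        e = self.entries.setdefault(key, Entry(key, now))
        e.count += 1
        e.last_access = now
        return e

    def release(self, e):
        # Normal request completion drops the count again.
        e.count -= 1

    def expire(self, now):
        """Remove inactive, unused entries; report inactive but locked ones."""
        removed, stuck = [], []
        for key, e in list(self.entries.items()):
            if now - e.last_access < self.inactive:
                continue  # still active, keep it
            if e.count == 0:
                del self.entries[key]
                removed.append(key)
            else:
                stuck.append(key)  # locked: can't be removed, only reported
        return removed, stuck


cache = ToyCache(inactive=10)
a = cache.acquire("a", now=0)
cache.release(a)              # request finished normally
cache.acquire("b", now=0)     # process "killed" before release()
print(cache.expire(now=100))  # (['a'], ['b'])
print(cache.expire(now=200))  # ([], ['b'])  'b' never goes away
```

In this model, no amount of waiting frees `b`; only resetting the shared state (in nginx terms, a full restart that recreates the cache zone) would.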
I see. I get 1000-2000 of these errors in the log per day, definitely more than a couple of months ago. I remember the server crashed in the past, but not recently.
> To further debug things, try the following:
>
> - restart nginx and record the PIDs of all nginx processes;
>
> - once the problem starts to appear again, check whether the same
> processes are still running;
>
> - if some processes differ from the ones recorded, dig further to
> find out why.
>
> Some trivial things, like looking into the logs for "worker process
> exited ..." messages and checking whether the problem persists
> without 3rd-party modules compiled in (see "nginx -V"), may also help.
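The PID check suggested above can be scripted. This is a minimal sketch, assuming Linux (it reads `/proc/<pid>/cmdline`) and that nginx processes show up with titles starting with `nginx:` (e.g. `nginx: worker process`), which is how nginx renames its processes on Linux. The function names and the state-file path you pass in are hypothetical.

```python
import json
import os


def nginx_pids(proc_root="/proc"):
    """Return {pid: title} for processes whose title starts with 'nginx:'."""
    pids = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited meanwhile, or not readable
        title = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if title.startswith("nginx:"):
            pids[int(entry)] = title
    return pids


def diff_pids(recorded, current):
    """Return (gone, new): PIDs that disappeared and PIDs that appeared."""
    gone = sorted(set(recorded) - set(current))
    new = sorted(set(current) - set(recorded))
    return gone, new


def record(path):
    # Run right after restarting nginx.
    with open(path, "w") as f:
        json.dump(nginx_pids(), f)


def check(path):
    # Run once the errors reappear; JSON keys come back as strings.
    with open(path) as f:
        recorded = {int(pid): title for pid, title in json.load(f).items()}
    gone, new = diff_pids(recorded, nginx_pids())
    for pid in gone:
        print(f"gone: {pid} ({recorded[pid]})")
    for pid in new:
        print(f"new:  {pid}")
    return gone, new
```

For example, call `record("/tmp/nginx-pids.json")` after the restart and `check("/tmp/nginx-pids.json")` later (the path is arbitrary). Note that a configuration reload also replaces worker processes legitimately, so only PIDs that vanished without a reload in between are suspicious.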
Thanks, I'll dig deeper. I do have 3rd-party modules, and there are occasional messages such as "worker process exited on signal 11". They are rare; I'll try to figure out what causes them, but it'll take time. However, now that this is already happening, is it possible to somehow unlock all entries and start clean without removing all cached content? Alternatively, can I delete the locked files manually as a workaround?
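To see how often processes die and with which signal, the error log can be tallied. This sketch assumes abnormal-exit lines look like `... [alert] 1234#0: worker process 5678 exited on signal 11`, possibly followed by ` (core dumped)`; the regex and the function name are assumptions, so check them against real lines from your log.

```python
import re
from collections import Counter

# Assumed shape of abnormal-exit lines in the nginx error log, e.g.
#   2014/01/01 12:00:00 [alert] 1234#0: worker process 5678 exited on signal 11
EXIT_RE = re.compile(
    r"(worker process|cache manager process|cache loader process) "
    r"(\d+) exited on signal (\d+)"
)


def count_abnormal_exits(lines):
    """Count abnormal exits per (process type, signal number)."""
    counts = Counter()
    for line in lines:
        m = EXIT_RE.search(line)
        if m:
            counts[(m.group(1), int(m.group(3)))] += 1
    return counts
```

Usage, with an assumed log path: `with open("/var/log/nginx/error.log") as f: print(count_abnormal_exits(f))`. Signal 11 (SIGSEGV) points to a crash inside the process, such as a module bug, while signal 9 (SIGKILL) usually means something outside nginx killed it.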
Regards,
Vedran