From the patch author:
Hi Maxim,
I apologize for my late reply: I just had now time to sort this out.
The short answer to your remarks is:
the first patch is partially correct, just incomplete, and could be easily completed with the addition of a call to ngx_disable_accept_events(...) addressing the issue of 2 workers accepting connections at the same time.
The second one is windows only and as correct and atomical as the unix only ngx_unlock_mutexes(), addressing the very same issue that one is for (which btw showed up during my tests) but being just one line long.
The long answer follows and, well, is quite long.
So before everything else, and for the sake of clarity:
accept mutex patching has been performed on codebase derived by nginx 1.5.6 after properly re-enabling it (the accept_mutex) removing the following disable lines in ngx_event.c
#if (NGX_WIN32)
/*
* disable accept mutex on win32 as it may cause deadlock if
* grabbed by a process which can't accept connections
*/
ngx_use_accept_mutex = 0;
#endif
Thus submitted patch lines are not useless and are part of a larger patch included in Catepillar 1.5.7.1.
The latter (Caterpillar) fully works distributing correctly the workload to any number of configured process workers (not just one of the configured number of them).
Now getting to specific proposed patches:
A) about the added line ngx_accept_mutex_held=0 in ngx_event.c
> 259 if (ngx_accept_mutex_held) {
> --+ ngx_accept_mutex_held=0;
> 260 ngx_shmtx_unlock(&ngx_accept_mutex);
> 261 }
that is not wrong, it's just incomplete.
In fact, first of all, it pairs with the similar one in
src/event/ngx_event_accept.c, line 402
which was patched in Caterpillar and that you too found later in your in 1.5.7.
The problem with this one (line 402) was that when ngx_accept_mutex_held's process *local* variable and the accept_mutex *global* to all nginx processes (being it allocated in shared memory) got out of synch.
This resulted (as it has been showed in tests) in more than a worker process locally 'thinking' to have the ownership of the accept_mutex at the same time.
The latter interfers in the call of their respective ngx_disable_accept_events(...) in turn leading, because of nasty time synching, to no worker being able to enabling them anymore and amounting to all workers (so the whole server) unable to handle any further connection.
The line above, between 259 and 260, tried to fix this too, but, there, a call to ngx_disable_accept_events(...) is missing.
In fact to be completely correct and not resulting in partially incorrect behaviour as you correctly pointed out, such a call must be added too.
This can be done moving that 'if' completely patched code
if (ngx_accept_mutex_held) {
ngx_disable_accept_events(cycle);
ngx_accept_mutex_held=0;
ngx_shmtx_unlock(&ngx_accept_mutex);
}
to ngx_event_accept.c (being ngx_disable_accept_events(...) internal linkage) in a dedicated function, for example
void ngx_release_accept_mutex(ngx_cycle_t *cycle){ /* the if above */ }
which then gets called at 259 in ngx_event.c instead of the 'if' itself.
BTW to make the patch complete the 'if', in ngx_trylock_accept_mutex, where ngx_disable_accept_events() is called should be removed since substituted by that ngx_release_accept_mutex() call.
All of this is to avoid a partial incorrect behaviour that the 'if', as it is now,
> 259 if (ngx_accept_mutex_held) {
> 260 ngx_shmtx_unlock(&ngx_accept_mutex);
> 261 }
causes at the end of an iteration when the accept mutex was held:
the mutex is released in the 'if' above but the ngx_disable_accept_events(...) won't be called until the next iteration with ngx_trylock_accept_mutex(...).
Thus in the meanwhile another worker could succeed to acquire the accept mutex, via ngx_trylock_accept_mutex(...), and start to accept connections as well ( ngx_enable_accept_events() is called in ngx_trylock_accept_mutex when mutex is acquired successfully).
This means that 2 workers at the same time can accept connections and this goes against the reason of an accept mutex itself reasonably not being not that the intended behaviour.
B) About the second proposed patch:
the 3 lines
if(*ngx_accept_mutex.lock==ngx_processes[n].pid) {
*ngx_accept_mutex.lock=0;
}
make a patch to prevent server's accept_mutex deadlock when its owning worker process dies unexpectedly (program crash while processing a request/answer or task manager's abrupt termination).
This is a scenario that occurred several times while testing and, when showing up, lead to server's workers unable to accept any further connection.
Given that, unlike in the unix version with its ngx_unlock_mutexes(), this scenario is not addressed in the windows version and, as said, the above lines are meant to fix it.
They are correct and thread-safe (the operation is atomic) *in the mentioned specific scenario in which they are meant to be executed* because:
1) they are invoked by master only and just when it detects that a worker process has terminated: worker process handle gets signaled on its termination and a WaitForMultipleObjects (in os/win32/ngx_process_cycle.c) waiting for them wakes up
2) the if condition makes sure 2 things are true at the same time:
a) the spinlock contains the pid of the dead worker and
b) that implies *ngx_accept_mutex.lock != 0 (since no pid could be zero)
3) considering that the accept_mutex is only acquired (during code flow) via
ngx_uint_t
ngx_shmtx_trylock(ngx_shmtx_t *mtx)
{
return (*mtx->lock == 0 && ngx_atomic_cmp_set(mtx->lock, 0, ngx_pid));
}
where
#define ngx_atomic_cmp_set(lock, old, set) \
(InterlockedCompareExchange((void **) lock, (void *) set, (void *) old) \
== (void *) old)
#endif
then, in the above scenario, *mtx->lock == pid (of the terminated worker process) and pid !=0 imply that the ngx_atomic_cmp_set(...) (so its InterlockedCompareExchange(...) ) can't ever be called for any other process different than the (already) dead one (i.e. code can't get to the atomic InterlockedXXX operation at all, so no atomic operation is ever executed...).
4) furthermore accept_mutex is released (when appropriate) only via calling ngx_shmtx_unlock() by the owning worker process and in such scenario that process is obviously dead before having had such a chance (or the 'if( )' fix's condition wouldn't trigger)
5) no other worker can release a mutex it didn't previously acquire (and couldn't even if, mistakenly, tried to; just enough to look at ngx_shmtx_unlock() implementation to see why).
This last point shows also that, always in that scenario, ngx_shmtx_unlock() can't ever be called by master to release the accept mutex (not being its owner).
Hopefully at this point it should be clear enough that releasing the accept_mutex directly with
*ngx_accept_mutex.lock=0;
is as much safe as calling ngx_shmtx_force_unlock(), the choice of which one I just deem cosmetic (they're functionally equivalent): my choice then went for the more performant, straight variable assignment.
NOt wanting to make this post longer than it is I conclude just mentioning that, as of now, the accept mutex is the only mutex living in shared memory so unlocking other possibly shared mutexes is as well unrequired.
I realize that for more easily readable code and for possible future extensions (more mutexes in shared memory) it would be better to port that function but as of now it doesn't really make a difference from the correctness of program execution point of view.
The patch was still unpolished for full submission, sorry for that.
HTH and regards,
Vittorio F Digilio