Fix for ngx_unlock() and thus race conditions in AIO mechanism

Mindaugas Rasiukevicius
April 18, 2016 05:42PM
Hi,

Some background: enabling AIO threads can result in "stuck" connections,
permanently waiting for the handler callback (on the system these connections
appear in the CLOSE_WAIT state). This happens due to a race condition in
the AIO thread pool code. Since the window is tiny, it may take hundreds
of thousands of requests on a many-core machine to trigger a single race,
but it is present in the real world: on some systems we accumulate over
a hundred of them per day.

Fix: add a compiler barrier to ngx_unlock(). Since it is a macro expanding
to a plain store, it is subject to compiler reordering optimisations.
Consider the following fragment of the ngx_thread_pool_cycle() function:

ngx_spinlock(&ngx_thread_pool_done_lock, 1, 2048);
*ngx_thread_pool_done.last = task;
ngx_thread_pool_done.last = &task->next;
ngx_unlock(&ngx_thread_pool_done_lock);
(void) ngx_notify(ngx_thread_pool_handler);

On nginx 1.9.14 RHEL 6 build (nginx-1.9.14-1.el6.ngx.x86_64.rpm), you can
see the following assembly code:

438a05: bf 78 f6 70 00 mov $0x70f678,%edi
...
438a1c: e8 df a6 fe ff callq 423100 <ngx_spinlock>
438a21: 48 8b 05 60 6c 2d 00 mov 0x2d6c60(%rip),%rax
438a28: 48 c7 05 45 6c 2d 00 movq $0x0,0x2d6c45(%rip) # <--
438a2f: 00 00 00 00
438a33: bf b0 8a 43 00 mov $0x438ab0,%edi
438a38: 48 89 2d 49 6c 2d 00 mov %rbp,0x2d6c49(%rip)
438a3f: 48 89 28 mov %rbp,(%rax)
438a42: ff 15 08 72 2d 00 callq *0x2d7208(%rip)

You can see that at 0x438a28 the store "*lock = 0" is performed before the
task is inserted into the queue. This can race with ngx_thread_pool_handler(),
where the ngx_thread_pool_done list is reset before the "task" is inserted
(that is, tasks simply get lost).

On x86/amd64, the unlock operation needs only a *compiler* barrier, not
a *memory* barrier. I used ngx_memory_barrier() in the patch, but on
x86/amd64 it could be optimised to __asm __volatile("":::"memory").

The patch is attached. An even better fix would be to use pthread_spin_lock()
or just pthread_mutex_lock(). Why roll your own?

--
Mindaugas
diff -r 640288d0e1bc src/os/unix/ngx_atomic.h
--- a/src/os/unix/ngx_atomic.h Tue Apr 12 19:01:56 2016 +0300
+++ b/src/os/unix/ngx_atomic.h Mon Apr 18 18:28:52 2016 +0100
@@ -307,7 +307,14 @@
void ngx_spinlock(ngx_atomic_t *lock, ngx_atomic_int_t value, ngx_uint_t spin);

 #define ngx_trylock(lock)  (*(lock) == 0 && ngx_atomic_cmp_set(lock, 0, 1))
-#define ngx_unlock(lock)    *(lock) = 0
+
+
+static ngx_inline void
+ngx_unlock(ngx_atomic_t *lock)
+{
+    ngx_memory_barrier();
+    *lock = 0;
+}


#endif /* _NGX_ATOMIC_H_INCLUDED_ */
_______________________________________________
nginx-devel mailing list
nginx-devel@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx-devel