I've run some tests and I'm fairly sure the reason 5MB got stuck on the nginx side is that the upstream socket uses the default socket buffers, which on my system ends up as a 5MB receive buffer.
I added logs to check that value: even after I configured sndbuf and rcvbuf in the listen directive, the upstream socket still got a 5MB buffer (the listen directive only affects the downstream socket). After I set these buffers on the upstream socket as well (by fiddling with the nginx code) I got immediate results: the 5MB delay disappeared right away.
Similar to 'X-Accel-Buffering', I added X-Accel-Up-RCVBUF and X-Accel-Down-SNDBUF headers, and they seem to work as expected.
I'm testing this scenario: downstream has limited bandwidth and upstream (node) can generate data much faster. My goal is to ensure that the overall read speed from upstream is limited by the downstream, so that nginx doesn't try to read faster than the downstream can consume. Basically I'm fine with nginx buffering some constant amount of data (e.g. not more than one second's worth of data at the downstream speed).
Even after fixing it, nginx still doesn't perform as well as the simple single-threaded vanilla test proxy that I wrote for comparison.
That vanilla proxy delivers perfect results for a simple reason: I set the upstream and downstream socket buffers to some low value (e.g. 128KB) and then use a blocking recv and a blocking send in the same thread. That way, whatever it reads from upstream it sends downstream right away in the same loop, and the slow send naturally throttles the reads.
Any reason why this wouldn't work with nginx? I don't see why async sockets couldn't behave the same way as the blocking read/send loop in the vanilla proxy.