Re: [PATCH] SO_REUSEPORT support for listen sockets (round 3)

Sepherosa Ziehau
September 02, 2013 10:34PM
On Mon, Sep 2, 2013 at 10:49 PM, Maxim Dounin <mdounin@mdounin.ru> wrote:

> Hello!
>
> (Sorry again for late reply. See below for comments.)
>
>
Thank you for the reply.


> On Fri, Aug 02, 2013 at 01:16:53PM +0800, Sepherosa Ziehau wrote:
>
> > Here is another round of SO_REUSEPORT support. The plot is changed a
> > little bit to allow smooth configuration reloading and binary upgrading.
> > Here is what happens when so_reuseport is enabled (this does not affect
> > the single process model):
> > - Master creates the listen sockets w/ SO_REUSEPORT, but does not
> > configure them
> > - The first worker process will inherit the listen sockets created by
> > master and configure them
> > - After master has forked the first worker process, all listen sockets
> > are closed
> > - The rest of the workers will create their own listen sockets w/
> > SO_REUSEPORT
> > - During binary upgrade, listen sockets are no longer passed through
> > environment variables, since the new master will create its own listen
> > sockets. Well, the old master actually does not have any listen
> > sockets opened :).
> >
> > The idea behind this plot is that at any given time, there is always
> > one listen socket left, which could inherit the syncaches and pending
> > sockets on the to-be-closed listen sockets. The inheritance itself is
> > handled by the kernel; I implemented this inheritance for DragonFlyBSD
> > recently (
> > http://gitweb.dragonflybsd.org/dragonfly.git/commit/02ad2f0b874fb0a45eb69750219f79f5e8982272
> > ).
> > I am not tracking Linux's code, but I think Linux side will
> > eventually get (or already got) the proper fix.
> >
> > The patch itself:
> > http://leaf.dragonflybsd.org/~sephe/ngx_soreuseport3.diff
> >
> > Configuration reloading and binary upgrading are no longer disrupted,
> > as they were w/ the first 2 patches.
> >
> > Binary upgrade reverting method 1 ("Send the HUP signal to the old
> > master process. ...") is no longer disrupted, as it was w/ the first 2
> > patches. There still could be some glitch (but not as bad as w/
> > the first 2 patches) if binary upgrade reverting method 2 ("Send the
> > TERM signal to the new master process. ...") is used. I think we
> > probably just need to mention that in the documentation.
>
> While this looks better than what we had with the previous patches
> (mostly due to inheritance handled by the kernel), it still looks very
> fragile to me. In particular, I really dislike the trick of
> making the first worker process special.
>
>
Well, the idea is to keep at least one listen socket open. Maybe I could
find another way in the kernel to make it less tricky. However, that may
add an extra syscall or socket option.


> It probably should either be left in the state "nothing is
> guaranteed" (with some understanding of what will happen in
> various common situations like reconfiguration, upgrade, switching
> so_reuseport on/off) or some way should be found to make things
> less tricky.
>

To be frank, at least interfering with reconfiguration is probably not
wanted. And I don't want "nothing is guaranteed" (which is probably what
the first 2 patches give).


>
> Additional question to consider is what happens with security
> checks? Linux seems to require the process user id to match on
> SO_REUSEPORT sockets, and I would expect this to fail if there are
>

BSD's SO_REUSEPORT does not check the uid. However, as far as I understand
the code, when an nginx worker creates its SO_REUSEPORT listen socket, the
uid has not been changed yet.


> sockets opened both in master and in worker processes; and
> privileged port checks might cause problems as well.
>

See the above comment.


>
> (We've also discussed this here in the office several times, and it
> seems that the general consensus is that SO_REUSEPORT for TCP balancing
> isn't really a good interface. It would be much easier for everyone
> if normal workflow with inherited listen socket descriptors just
> worked. Especially given the fact that in nginx case it's mostly
> about benchmarking, since in real life load distribution between
> worker processes is good enough.)


In DragonFly, SO_REUSEPORT is more than load balancing: it makes the
network processing of accepted sockets completely CPU localized (from user
land to kernel land, on both the RX and TX paths). This level of CPU
localization could not be achieved by the old listen socket inheritance
usage model (even if I could divide the listen socket's completion queue
per CPU based on the RX hash, the level of CPU localization achieved by
SO_REUSEPORT still could not be achieved easily). In addition to the CPU
localization, it also avoids nginx's accept mutex contention (I have not
measured the contention rate, but no contention should be better,
imho).
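
For readers who have not used the option, here is a minimal sketch of the socket-level behaviour this thread relies on. It is not taken from the patch; it is written in Python for brevity, and the helper name make_reuseport_listener is made up for illustration. It assumes a platform that defines SO_REUSEPORT (for example Linux 3.9+ or a BSD) and that both sockets are created under the same user id. Two listeners bind to the same address and port, one is closed, and a new connection is then served by the remaining one. It only shows new connections; it does not show the kernel-side inheritance of pending connections.

```python
import socket


def make_reuseport_listener(host, port, backlog=128):
    """Hypothetical helper: TCP listen socket with SO_REUSEPORT set before bind()."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # The option must be set on every socket that shares the port,
        # and it must be set before bind().
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind((host, port))
        s.listen(backlog)
    except OSError:
        s.close()
        raise
    return s


# The first listener lets the kernel pick a free port; the second one
# binds to exactly the same address and port.
first = make_reuseport_listener("127.0.0.1", 0)
port = first.getsockname()[1]
second = make_reuseport_listener("127.0.0.1", port)
same_port = second.getsockname()[1] == port

# Close one listener. The port stays open because one listen socket is left.
first.close()
second.settimeout(5.0)
client = socket.create_connection(("127.0.0.1", port), timeout=5.0)
conn, peer = second.accept()
served_by_remaining = peer == client.getsockname()

conn.close()
client.close()
second.close()
```

In a multi-process server each worker would hold its own listener of this kind, so no accept mutex is needed to decide which worker takes a connection.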

Best Regards,
sephe
_______________________________________________
nginx-devel mailing list
nginx-devel@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx-devel