Re: upstream timeouts I can not explain

Maxim Dounin
January 11, 2017 01:42PM
Hello!

On Wed, Jan 11, 2017 at 08:54:31PM +0300, Руслан Закиров wrote:

> On Wed, Jan 11, 2017 at 6:44 PM, Maxim Dounin <mdounin@mdounin.ru> wrote:
>
> > Hello!
> >
> > On Wed, Jan 11, 2017 at 06:07:01PM +0300, Руслан Закиров wrote:
> >
> > > On Mon, Jan 9, 2017 at 10:59 PM, Maxim Dounin <mdounin@mdounin.ru>
> > wrote:
> > >
> > > > Typical kern.sched.quantum is about 100ms, so several
> > > > CPU-intensive tasks can delay processing of the events enough to
> > > > trigger a timeout if a context switch happens at a bad time.
> > > >
> > >
> > > Here what I see in truss' output:
> > >
> > > 38.820523207 0.000006568 kevent(28,{ },0,{
> > > 198,EVFILT_WRITE,EV_CLEAR,0x0,0x8218,0x821405071 },512,{
> > > 6.215000000 }) = 1 (0x1)
> > > 39.783094188 0.000022875 kevent(28,{ },0,{
> > > 52,EVFILT_READ,0x0,0x0,0x30b,0x81f800068
> > > 204,EVFILT_WRITE,EV_CLEAR,0x0,0x8218,0x821401588
> > > 51,EVFILT_READ,0x0,0x0,0xec,0x81f800000
> > > 68,EVFILT_READ,EV_CLEAR,0x0,0x8bf,0x81f816580
> > > 7,EVFILT_READ,EV_CLEAR,0x0,0x27f,0x81f813869
> > > 57,EVFILT_READ,EV_CLEAR,0x0,0x767,0x81f817bd8
> > > 203,EVFILT_READ,EV_CLEAR|EV_EOF,0x0,0x248,0x81f8030c1
> > > 181,EVFILT_READ,EV_CLEAR|EV_EOF,0x0,0x9b77,0x81f80ea68
> > > 178,EVFILT_READ,EV_CLEAR,0x0,0x39d,0x81f8010a9
> > > 198,EVFILT_READ,EV_CLEAR,0x0,0x3d3,0x81f805071
> > > 204,EVFILT_READ,EV_CLEAR,0x0,0x9da,0x81f801588
> > > 190,EVFILT_READ,EV_CLEAR,0x0,0x4ff,0x81f80fc48
> > > 154,EVFILT_READ,EV_CLEAR,0x0,0x88e,0x81f8130b1
> > > 151,EVFILT_READ,EV_CLEAR|EV_EOF,0x0,0xc1db,0x81f814290
> > > 157,EVFILT_READ,EV_CLEAR|EV_EOF,0x0,0xe841,0x81f80c029
> > > 195,EVFILT_READ,EV_CLEAR,0x0,0x952,0x81f8090a1
> > > 194,EVFILT_READ,EV_CLEAR,0x0,0x929,0x81f809ac8
> > > 201,EVFILT_READ,EV_CLEAR,0x0,0x4ef,0x81f80c980
> > > 174,EVFILT_READ,EV_CLEAR,0x0,0x51e,0x81f816518
> > > 77,EVFILT_READ,EV_CLEAR|EV_EOF,0x0,0x1168,0x81f811c61 },512,{
> > > 5.253000000 }) = 20 (0x14)
> > >
> > >
> > > A 1-second delay between two syscalls. Then nginx goes nuts
> > > processing everything it missed during that second. I can't tell
> > > from this output how much time was spent in the syscalls
> > > themselves. Can anyone?
> >
> > Using ktrace / kdump might be a better option; it shows both
> > syscall enter and syscall return with exact timestamps.
> >
>
> Tried. Found at least one issue with sendfile blocking:
>
> 66193 nginx 1484152162.182601 CALL
> openat(AT_FDCWD,0x80639aac4,0x4<O_NONBLOCK>,<unused>0)
> 66193 nginx 1484152162.182607 NAMI ...
> 66193 nginx 1484152162.182643 RET openat 40/0x28
> ... gz lookup and stats ...
> 66193 nginx 1484152162.182683 CALL
> setsockopt(0x211,0x6,0x4,0x7fffffffd96c,0x4)
> 66193 nginx 1484152162.182687 RET setsockopt 0
> 66193 nginx 1484152162.182689 CALL
> sendfile(0x28,0x211,0,0x806f,0x7fffffffe1d8,0x7fffffffe240,<invalid=0>)
> 66193 nginx 1484152163.541770 RET sendfile 0
>
> sendfile() blocks for 1.3 seconds. However, this machine is:
>
> $ sysctl hw.model hw.machine hw.ncpu
> hw.model: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
> hw.machine: amd64
> hw.ncpu: 16
>
> So sfbufs don't apply here, but the documentation doesn't say which
> resource does apply... or I couldn't find the right doc.

On amd64 sendfile() uses mbufs / mbuf clusters. Try looking into
"vmstat -z" to see if there are enough mbuf clusters in the various
zones. Note well that sendfile() normally blocks on disk, and the
delay might simply mean that your disk subsystem is (occasionally)
overloaded. Try gstat to see some details.
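For example, something along these lines (exact zone names vary
between FreeBSD versions; a non-zero FAIL column for the mbuf zones
suggests exhaustion, and gstat's %busy column shows disk saturation):

$ vmstat -z | grep -i mbuf
$ gstat -a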

Either way, blocking for 1.3s in a syscall perfectly explains why
300ms timeouts are sometimes triggered unexpectedly. If a timer is
set after such a blocking call and just before kevent() (that is,
when the connection is highly unlikely to be established before
kevent() returns), then on the next event loop iteration nginx will
think that 1.3+ seconds have passed since the timer was set, and
will trigger a timeout (after processing all the events reported by
kevent()).
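
To illustrate the mechanism, here is a simplified sketch (not nginx
source; the names are made up): the time is cached and only refreshed
once per event loop pass, so a timer armed from a stale cached value
looks expired as soon as the cache is refreshed.

#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>
#include <time.h>

/* Wall clock in milliseconds. */
static uint64_t now_ms(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t) tv.tv_sec * 1000 + tv.tv_usec / 1000;
}

int main(void)
{
    /* Iteration N: kevent() returned, cached time refreshed. */
    uint64_t cached_ms = now_ms();

    /* Event handlers run; suppose sendfile() blocks for ~1.3s here
     * while the cached clock stays stale. */
    struct timespec ts = { .tv_sec = 1, .tv_nsec = 300000000 };
    nanosleep(&ts, NULL);

    /* A 300ms connect timer armed just before the next kevent(),
     * computed from the stale cached time. */
    uint64_t deadline_ms = cached_ms + 300;

    /* Iteration N+1: kevent() returns, the cached time is
     * refreshed... */
    cached_ms = now_ms();

    /* ...and the timer already looks ~1000ms overdue, so the 300ms
     * timeout fires even though the connection never had 300ms of
     * real time to complete. */
    if (cached_ms >= deadline_ms)
        printf("timer looks %llu ms overdue\n",
               (unsigned long long) (cached_ms - deadline_ms));

    return 0;
}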

--
Maxim Dounin
http://nginx.org/
_______________________________________________
nginx mailing list
nginx@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx