Re: upstream timeouts I can not explain

Maxim Dounin
January 11, 2017 01:42PM
Hello!

On Wed, Jan 11, 2017 at 08:54:31PM +0300, Руслан Закиров wrote:

> On Wed, Jan 11, 2017 at 6:44 PM, Maxim Dounin <mdounin@mdounin.ru> wrote:
>
> > Hello!
> >
> > On Wed, Jan 11, 2017 at 06:07:01PM +0300, Руслан Закиров wrote:
> >
> > > On Mon, Jan 9, 2017 at 10:59 PM, Maxim Dounin <mdounin@mdounin.ru>
> > wrote:
> > >
> > > > Typical kern.sched.quantum is about 100ms, so several
> > > > CPU-intensive tasks can delay processing of the events enough to
> > > > trigger a timeout if a context switch happens at a bad time.
> > > >
> > >
> > > Here what I see in truss' output:
> > >
> > > 38.820523207 0.000006568 kevent(28,{ },0,{
> > > 198,EVFILT_WRITE,EV_CLEAR,0x0,0x8218,0x821405071 },512,{
> > > 6.215000000 }) = 1 (0x1)
> > > 39.783094188 0.000022875 kevent(28,{ },0,{
> > > 52,EVFILT_READ,0x0,0x0,0x30b,0x81f800068
> > > 204,EVFILT_WRITE,EV_CLEAR,0x0,0x8218,0x821401588
> > > 51,EVFILT_READ,0x0,0x0,0xec,0x81f800000
> > > 68,EVFILT_READ,EV_CLEAR,0x0,0x8bf,0x81f816580
> > > 7,EVFILT_READ,EV_CLEAR,0x0,0x27f,0x81f813869
> > > 57,EVFILT_READ,EV_CLEAR,0x0,0x767,0x81f817bd8
> > > 203,EVFILT_READ,EV_CLEAR|EV_EOF,0x0,0x248,0x81f8030c1
> > > 181,EVFILT_READ,EV_CLEAR|EV_EOF,0x0,0x9b77,0x81f80ea68
> > > 178,EVFILT_READ,EV_CLEAR,0x0,0x39d,0x81f8010a9
> > > 198,EVFILT_READ,EV_CLEAR,0x0,0x3d3,0x81f805071
> > > 204,EVFILT_READ,EV_CLEAR,0x0,0x9da,0x81f801588
> > > 190,EVFILT_READ,EV_CLEAR,0x0,0x4ff,0x81f80fc48
> > > 154,EVFILT_READ,EV_CLEAR,0x0,0x88e,0x81f8130b1
> > > 151,EVFILT_READ,EV_CLEAR|EV_EOF,0x0,0xc1db,0x81f814290
> > > 157,EVFILT_READ,EV_CLEAR|EV_EOF,0x0,0xe841,0x81f80c029
> > > 195,EVFILT_READ,EV_CLEAR,0x0,0x952,0x81f8090a1
> > > 194,EVFILT_READ,EV_CLEAR,0x0,0x929,0x81f809ac8
> > > 201,EVFILT_READ,EV_CLEAR,0x0,0x4ef,0x81f80c980
> > > 174,EVFILT_READ,EV_CLEAR,0x0,0x51e,0x81f816518
> > > 77,EVFILT_READ,EV_CLEAR|EV_EOF,0x0,0x1168,0x81f811c61 },512,{
> > > 5.253000000 }) = 20 (0x14)
> > >
> > >
> > > A 1-second delay between two syscalls. Then nginx goes nuts
> > > processing everything it missed during that second. I can't tell
> > > from this output how much time was spent in the syscalls
> > > themselves. Can anyone?
> >
> > Using ktrace / kdump might be a better option; it shows both
> > syscall enter and syscall return with exact timestamps.
> >
>
> Tried. Found at least one issue with sendfile blocking:
>
> 66193 nginx 1484152162.182601 CALL
> openat(AT_FDCWD,0x80639aac4,0x4<O_NONBLOCK>,<unused>0)
> 66193 nginx 1484152162.182607 NAMI ...
> 66193 nginx 1484152162.182643 RET openat 40/0x28
> ... gz lookup and stats ...
> 66193 nginx 1484152162.182683 CALL
> setsockopt(0x211,0x6,0x4,0x7fffffffd96c,0x4)
> 66193 nginx 1484152162.182687 RET setsockopt 0
> 66193 nginx 1484152162.182689 CALL
> sendfile(0x28,0x211,0,0x806f,0x7fffffffe1d8,0x7fffffffe240,<invalid=0>)
> 66193 nginx 1484152163.541770 RET sendfile 0
>
> sendfile() blocks for 1.3 seconds. However, this machine is:
>
> $ sysctl hw.model hw.machine hw.ncpu
> hw.model: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
> hw.machine: amd64
> hw.ncpu: 16
>
> So sfbufs don't apply here, but the documentation doesn't say which
> resource does apply... or I couldn't find the right doc.

On amd64 sendfile() uses mbufs / mbuf clusters. Try looking into
"vmstat -z" to see if there are enough mbuf clusters in the various
zones. Note well that sendfile() normally blocks on disk, and the
delay might simply mean that your disk subsystem is (occasionally)
overloaded. Try gstat to see some details.
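For example, something along these lines (exact zone names vary
between FreeBSD versions; a non-zero FAIL column for the mbuf zones
suggests exhaustion, and gstat's %busy column shows disk saturation):

$ vmstat -z | grep -i mbuf
$ gstat -a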

Either way, blocking for 1.3s in a syscall perfectly explains why
300ms timeouts are sometimes triggered unexpectedly. If a timer is
set after such a blocking call and just before kevent() (that is,
when the connection is highly unlikely to be established before
kevent() returns), then on the next event loop iteration nginx will
think that 1.3+ seconds have passed since the timer was set, and
will trigger a timeout (after processing all the events reported by
kevent()).
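
To illustrate the mechanism, here is a simplified sketch (not nginx
source; the names are made up): the time is cached and only refreshed
once per event loop pass, so a timer armed from a stale cached value
looks expired as soon as the cache is refreshed.

#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>
#include <time.h>

/* Wall clock in milliseconds. */
static uint64_t now_ms(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t) tv.tv_sec * 1000 + tv.tv_usec / 1000;
}

int main(void)
{
    /* Iteration N: kevent() returned, cached time refreshed. */
    uint64_t cached_ms = now_ms();

    /* Event handlers run; suppose sendfile() blocks for ~1.3s here
     * while the cached clock stays stale. */
    struct timespec ts = { .tv_sec = 1, .tv_nsec = 300000000 };
    nanosleep(&ts, NULL);

    /* A 300ms connect timer armed just before the next kevent(),
     * computed from the stale cached time. */
    uint64_t deadline_ms = cached_ms + 300;

    /* Iteration N+1: kevent() returns, the cached time is
     * refreshed... */
    cached_ms = now_ms();

    /* ...and the timer already looks ~1000ms overdue, so the 300ms
     * timeout fires even though the connection never had 300ms of
     * real time to complete. */
    if (cached_ms >= deadline_ms)
        printf("timer looks %llu ms overdue\n",
               (unsigned long long) (cached_ms - deadline_ms));

    return 0;
}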

--
Maxim Dounin
http://nginx.org/
_______________________________________________
nginx mailing list
nginx@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx