Update:
It seems I've succeeded in narrowing down the circumstances of this strange
behavior. The chain of events is as follows:
1. a php worker receives a request from nginx and takes some time to work
on it. During this time, nginx reaches the fastcgi timeout and tries to
close the connection.
2. the php worker doesn't receive or ignores the RST packet from nginx and
continues
3. when done, it starts writing to the socket, fills the buffer and the
socket becomes CLOSE_WAIT
4. at this point, the worker is stuck in
# cat /proc/16983/stack
[<ffffffff8140b756>] sk_stream_wait_memory+0x186/0x270
[<ffffffff8144f585>] tcp_sendmsg+0x705/0xa30
[<ffffffff81400ef1>] sock_aio_write+0x151/0x160
[<ffffffff8116d05a>] do_sync_write+0xfa/0x140
[<ffffffff8116d424>] vfs_write+0x184/0x1a0
[<ffffffff8116dd91>] sys_write+0x51/0x90
[<ffffffff81013172>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
# strace -f -p 16983
Process 16983 attached - interrupt to quit
write(3, "ef=\"http://www.\" >"..., 51048
tcp 9 14608 php:9019 nginx:36970 CLOSE_WAIT 16983/php-fpm
and the kernel is backlogging the new connections.
5. now, if I kill the process, the situation does not recover: it recurs
immediately, because the freshly spawned process picks up a backlogged
connection that nginx has already abandoned, and again the socket ends up
in CLOSE_WAIT.
6. The only way to make it work again is to restart the master, so that the
backlogged connections are dropped.
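To illustrate steps 3 and 4, here is a minimal Python sketch of a write that
stalls once the socket buffers fill because the peer never reads. It runs over
loopback and does not reproduce the lost RST. The function name, the payload
size and the timeout are assumptions made for the demo, and the timeout is only
there so that the example terminates instead of hanging like the worker does.

```python
import socket


def write_blocks_when_peer_stalls(payload_size=32 * 1024 * 1024, timeout=0.5):
    """Return True if sending to a peer that never reads stalls past `timeout`."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)

    # Plays the role of nginx: connects, then never reads the response.
    peer = socket.create_connection(listener.getsockname())

    # Plays the role of the php worker: accepts and writes a large response.
    worker, _ = listener.accept()
    worker.settimeout(timeout)  # bound the otherwise indefinite blocking write
    try:
        worker.sendall(b"x" * payload_size)
        return False  # everything fit into the kernel buffers
    except socket.timeout:
        return True  # buffers filled up and the write stalled
    finally:
        worker.close()
        peer.close()
        listener.close()


if __name__ == "__main__":
    print("write stalled:", write_blocks_when_peer_stalls())
```

Without the timeout, the sendall() call would simply sit there, which is what
the strace output in step 4 looks like.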
There are two problems, and I don't know whom to blame for either. First, I
don't know why backlogged connections remain in the established state and
are not dropped. Second, I don't know why the RST packets from nginx never
reach php.
The latter seems to be a network problem. I think I can mitigate it for now
by setting up a point-to-point tunnel between the hosts, so that no switch
or firewall interferes with the traffic. Most probably I've hit some
"intelligent" configuration in Amazon's EC2 network. It's been 3 days since
I started tunneling the traffic and the problem hasn't reappeared. I'll give
it a week and then try to ask Amazon for clarification. I don't expect much
to come out of that conversation.
As for the first problem, I am not sure what's wrong. My gut feeling is that
abandoned connections should not stay queued in the backlog like this. Any
comments, anyone?
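A possible explanation, sketched in Python under the assumption of standard
TCP listen-backlog behavior (this is not php-fpm code, and the function name
is hypothetical): the kernel completes the handshake before accept() is
called, so a connection that the client has already closed still sits in the
queue and is handed to the next accept().

```python
import socket


def backlog_survives_client_close():
    """Accept a connection whose client closed it before accept() was called."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(5)

    # The client connects and gives up before the server calls accept().
    client = socket.create_connection(listener.getsockname())
    client.close()

    # The handshake was completed by the kernel, so accept() still succeeds.
    conn, _ = listener.accept()
    conn.settimeout(2.0)
    data = conn.recv(1024)  # empty bytes: the peer's FIN has already arrived
    conn.close()
    listener.close()
    return data


if __name__ == "__main__":
    print(repr(backlog_survives_client_close()))
```

If this is what happens here, each freshly spawned worker would be handed a
connection that nginx gave up on long ago, which would match step 5.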
Thanks!
On Tuesday, May 21, 2013 12:33:36 PM UTC+2, Michael Tabolsky wrote:
>
> Hi List,
>
> I really hope someone can help debug this problem, since I am running out
> of options here ...
>
> I have a setup of two nodes, one running nginx, the other php-fpm, with
> about 30 hosts/pools. All was good for a few months, but since some upgrade
> (I just can't track down which one) I've started to run into a problem with
> random pools at random intervals: children get stuck like this:
> [pid 7939] write(3, "122\"><a href=\"http://www.b"..., 456 <unfinished ...>
> [pid 7672] write(3, "122\"><a href=\"http://www.b"..., 456^C <unfinished ...>
>
> The fd is the connection to nginx (tcp, naturally), which nginx has already
> dropped because of the timeout. There are no errors or warnings in the
> debug log. As soon as the pool hits the children limit, the master starts
> to refuse connections from nginx. Just before the stuck write of the
> response starts, the php processes don't do anything suspicious; they just
> mmap files as usual, without any errors. If I kill these children, the
> master spawns new ones as it should, and they get stuck immediately in the
> same way. This doesn't affect other pools, whether running under different
> or the same UIDs; they keep working. The only way to "recover" the "broken"
> pool is to restart the master.
>
> PHP (5.3.23) is running on CentOS 6.4 x86_64 with memcache for
> sessions and no accelerators.
>
> I also cannot correlate the problem to any external factor, like high
> loads or network outages.
>
> Any guess please?
>
> Thanks a lot in advance!
>