Igor Sysoev wrote:
> On Sun, Aug 30, 2009 at 10:55:57PM -0400, Jim Ohlstein wrote:
>
>
>> Igor Sysoev wrote:
>>
>>> On Sun, Aug 30, 2009 at 11:52:51AM -0400, Jim Ohlstein wrote:
>>>
>>>
>>>
>>>>>> 2009/08/30 10:29:00 [alert] 2042#0: open socket #1023 left in
>>>>>> connection 1015
>>>>>> 2009/08/30 10:29:00 [alert] 2042#0: aborting
>>>>>>
>>>>>> Other servers seem to be running fine including ones with busy sites.
>>>>>> For the moment I have reverted that VPS to 0.8.10.
>>>>>>
>>>>>>
>>>>>>
>>>>> Could you do the following:
>>>>>
>>>>> 1) enable coredumps
>>>>> 2) set in nginx.conf:
>>>>> debug_points abort;
>>>>> 3) reconfigure nginx; if there are open connections, nginx then creates
>>>>> a coredump on exit
>>>>>
>>>>>
>>>>>
>>>> Do you want nginx reconfigured "--with-debug" or is there another option
>>>> you need?
>>>>
>>>>
>>> No. The coredump is enough; it just needs to have debug info (the gcc -g
>>> option).
>>>
>>>
>>>
>>>>> 4) look in log for alerts: open socket #... left in connection NN
>>>>> 5) run "gdb /path/to/nginx /path/to/core", then
>>>>>
>>>>> p ((ngx_connection_t *) ngx_cycle->connections[NN]->data)->uri
>>>>> p ((ngx_connection_t *) ngx_cycle->connections[NN]->data)->main->count
>>>>>
>>>>> where NN is NN from log message.
>>>>>
>>>>>
>> Unfortunately I don't think it gave too much information.
>>
>> I watched connections gradually rise. I have ulimit -n set to 1024, two
>> workers, 1024 connections/worker. As connections neared 2048 the site
>> became unresponsive and load went up dramatically.
>>
>> I began to see the same errors in the log. Nginx did not abort on its
>> own so I killed it after a few minutes. I then saw the same entries in
>> the error log like:
>>
>> 2009/08/30 22:22:40 [alert] 6118#0: open socket #980 left in connection 993
>>
>
> nginx aborts only when you send -HUP and it finds leaked connections.
>
>
>> I ran gdb on the core but this was the output from three connections:
>>
>> [root@mars proc]# gdb /vz/private/101/fs/root/usr/local/sbin/nginx ./kcore
>> GNU gdb Red Hat Linux (6.5-37.el5_2.2rh)
>> Copyright (C) 2006 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain
>> conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB. Type "show warranty" for details.
>> This GDB was configured as "x86_64-redhat-linux-gnu"...Using host
>> libthread_db library "/lib64/libthread_db.so.1".
>>
>> warning: core file may not match specified executable file.
>> Core was generated by `ro root=LABEL=/ console=tty0
>> console=ttyS1,19200n8 debug'.
>> #0 0x0000000000000000 in ?? ()
>> (gdb) p ((ngx_connection_t *) ngx_cycle->connections[1014]->data)->uri
>> Cannot access memory at address 0x130
>> (gdb) p ((ngx_connection_t *) ngx_cycle->connections[1014]->data)->uri
>> Cannot access memory at address 0x130
>> (gdb) p ((ngx_connection_t *) ngx_cycle->connections[1010]->data)->uri
>> Cannot access memory at address 0x130
>> (gdb) p ((ngx_connection_t *)
>> ngx_cycle->connections[1014]->data)->main->count
>> Cannot access memory at address 0x130
>> (gdb) p ((ngx_connection_t *)
>> ngx_cycle->connections[1010]->data)->main->count
>> Cannot access memory at address 0x130
>> (gdb) p ((ngx_connection_t *) ngx_cycle->connections[993]->data)->uri
>> Cannot access memory at address 0x130
>> (gdb) p ((ngx_connection_t *)
>> ngx_cycle->connections[993]->data)->main->count
>> Cannot access memory at address 0x130
>> (gdb) quit
>> [root@mars proc]#
>>
>> During this time there were hundreds of connections in "CLOSE_WAIT"
>> state. They gradually increased to just over 1000 when it crashed.
>>
>
> Sorry, I made a mistake. The commands should be:
>
> p ((ngx_http_request_t *) ngx_cycle->connections[1014].data)->uri
> p ((ngx_http_request_t *) ngx_cycle->connections[1014].data)->main->count
>
>
>
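For anyone repeating steps 4 and 5 against a long error log, here is a minimal Python sketch that pulls the connection numbers out of the "open socket ... left in connection NN" alerts and prints the corrected gdb expressions quoted above. The function names are hypothetical helpers, not part of nginx, and the regular expression assumes the alert format shown in this thread.

```python
import re

# Assumed alert format, taken from the log lines quoted in this thread:
#   2009/08/30 22:22:40 [alert] 6118#0: open socket #980 left in connection 993
ALERT_RE = re.compile(r"open socket #(\d+) left in connection (\d+)")


def leaked_connections(log_lines):
    """Return the sorted, unique connection indexes named in the alerts."""
    found = set()
    for line in log_lines:
        match = ALERT_RE.search(line)
        if match:
            found.add(int(match.group(2)))  # group 2 is the NN after "connection"
    return sorted(found)


def gdb_commands(indexes):
    """Build the two corrected gdb print expressions for each connection index."""
    commands = []
    for nn in indexes:
        base = f"((ngx_http_request_t *) ngx_cycle->connections[{nn}].data)"
        commands.append(f"p {base}->uri")
        commands.append(f"p {base}->main->count")
    return commands


# Example input: the alert line from this thread plus an unrelated line.
sample_log = [
    "2009/08/30 22:22:40 [alert] 6118#0: open socket #980 left in connection 993",
    "2009/08/30 22:22:41 [notice] 6118#0: exiting",
]

for command in gdb_commands(leaked_connections(sample_log)):
    print(command)
```

The printed lines can be pasted at the (gdb) prompt after loading the nginx binary and the matching core file.
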
It looks as though you got the data that you needed overnight in my time
zone. That server does use a try_files directive:
location /forums/ {
    try_files $uri $uri/ /forums/vbseo.php;
    ...
}
Previously we used a rewrite:
#if (!-e $request_filename) {
#    rewrite ^/forums/(.*)$ /forums/vbseo.php last;
#}
which ironically would probably not have caused this difficulty.
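
For readers unfamiliar with try_files, here is a rough Python model of the lookup order that the directive above asks for: the request URI as a file, then as a directory, then the fallback script. It is only a sketch of the documented behaviour, not how nginx implements it; the function name and the sets of existing files and directories are made up for illustration.

```python
def resolve_try_files(uri, files, dirs, fallback):
    """Rough model of: try_files $uri $uri/ <fallback>;

    files and dirs are hypothetical sets of paths that exist on disk.
    """
    if uri in files:
        return uri                  # $uri matched an existing file
    directory = uri.rstrip("/")
    if directory in dirs:
        return directory + "/"      # $uri/ matched an existing directory
    return fallback                 # nothing matched: hand off to the last parameter


# Hypothetical document root contents.
existing_files = {"/forums/showthread.php"}
existing_dirs = {"/forums"}

for request_uri in ("/forums/showthread.php", "/forums/", "/forums/some-thread.html"):
    print(request_uri, "->",
          resolve_try_files(request_uri, existing_files, existing_dirs,
                            "/forums/vbseo.php"))
```

In this model, only URIs that match neither a file nor a directory reach /forums/vbseo.php, which is the same set of requests the commented-out rewrite was meant to catch.
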
I'll try 0.8.12 and report any difficulties, unless you want me to
generate another coredump with 0.8.11.
Jim