Hi,
Not only loopback resolver address, but any addresses without UDP:53
port listened can trigger this segfault. I've confirmed the problem on
my Linux box. Here's the backtrace of nginx-0.8.49 when segfault happens:
#0 0x08059acd in ngx_log_error_core (level=4, log=0x810c16c, err=111,
fmt=0x80e92cd "recv() failed") at src/core/ngx_log.c:93
#1 0x08062442 in ngx_connection_error (c=0x812c538, err=111,
text=0x80e92cd "recv() failed") at src/core/ngx_connection.c:1015
#2 0x0806ef06 in ngx_udp_unix_recv (c=0x812c538, buf=0xbfffdcfc
"m\220", size=4096) at src/os/unix/ngx_udp_recv.c:99
#3 0x080689b1 in ngx_resolver_read_response (rev=0x81424f8) at
src/core/ngx_resolver.c:952
#4 0x08073035 in ngx_epoll_process_events (cycle=0x810d528, timer=5000,
flags=<value optimized out>) at src/event/modules/ngx_epoll_module.c:642
#5 0x0806baa2 in ngx_process_events_and_timers (cycle=0x810d528) at
src/event/ngx_event.c:245
#6 0x08070edc in ngx_single_process_cycle (cycle=0x810d528) at
src/os/unix/ngx_process_cycle.c:306
#7 0x08059576 in main (argc=1, argv=0xbfffef74) at src/core/nginx.c:398
The bug exists in all versions later than nginx-0.8.36 but not in
nginx-0.7.x versions.
This problem is due to the incorrectly copy of cycle->new_log when
ngx_resolver_create() initializing udp_connection->log. Because
cycle->new_log only gets initialized in ngx_init_cycle() after all
configurations have been processed, the ngx_resolver_create() will be
called BEFORE cycle->new_log has anything meaningful. And because it
COPIED cycle->new_log, udp_connection->log will always be invalid even
cycle->new_log does get initialized properly later. When resolver is
used to resolve names and failed to connect the specified nameserver,
nginx tries to log a timeout error using the invalid log structure
stored in udp_connection, and then boom...
IMHO, by changing the definition of ngx_udp_connection_t to make log
field a pointer instead of a ngx_log_t structure (also with a bunch of
related reference modifications, of course), this bug will gone without
pains.
The attached patch should fix the bug.
Maxim Dounin wrote:
> Hello!
>
> On Sun, Mar 21, 2010 at 04:21:38AM +0200, Marcus Clyne wrote:
>
>
>> Hi,
>>
>> With a standard (i.e. no options) installation of 0.8.34 on my Linux
>> machine I get a segfault if the resolver named in the conf file is a
>> loopback address/IP, but the resolver does not exist. Other
>> non-existing resolvers don't cause a problem (they just hang, and
>> probably will time out), only loopback ones.
>>
>> The debug log is :
>>
>>
>
> [...]
>
>
>> 2010/03/21 04:07:30 [debug] 4287#0: *3 http cleanup add: 0000000001A44E60
>> 2010/03/21 04:07:30 [debug] 4287#0: *3 http finalize request: -4, "/?" a:1, c:2
>>
>
> Could you please provide full debug log (i.e. switched on at global level, not
> http/server/location)? E.g. between the above lines should be some valuable
> resolving information which isn't logged in request context but in global one
> instead.
>
> It should look like this:
>
> ...
> 2010/03/25 04:53:49 [debug] 86639#0: *1 http cleanup add: 0811F8F4
> 2010/03/25 04:53:49 [debug] 86639#0: malloc: 08126580:68
> 2010/03/25 04:53:49 [debug] 86639#0: resolve: "google.com"
> 2010/03/25 04:53:49 [debug] 86639#0: malloc: 0810D580:60
> 2010/03/25 04:53:49 [debug] 86639#0: malloc: 08133460:10
> 2010/03/25 04:53:49 [debug] 86639#0: malloc: 0810AD00:28
> 2010/03/25 04:53:49 [debug] 86639#0: resolve: "google.com" 12007
> 2010/03/25 04:53:49 [debug] 86639#0: UDP socket 11
> 2010/03/25 04:53:49 [debug] 86639#0: connect to 127.0.0.1:53, fd:11 #2
> 2010/03/25 04:53:49 [debug] 86639#0: kevent set event: 11: ft:-1 fl:0025
> 2010/03/25 04:53:49 [debug] 86639#0: send: fd:11 28 of 28
> 2010/03/25 04:53:49 [debug] 86639#0: malloc: 0810D640:60
> 2010/03/25 04:53:49 [debug] 86639#0: event timer add: -1: 30000:2466707655
> 2010/03/25 04:53:49 [debug] 86639#0: event timer add: -1: 5000:2466682655
> 2010/03/25 04:53:49 [debug] 86639#0: *1 http finalize request: -4, "/?" a:1, c:2
> ...
>
>
>> 2010/03/21 04:07:30 [debug] 4287#0: *3 http request count:2 blk:0
>> 2010/03/21 04:07:30 [debug] 4287#0: *3 http run request: "/?"
>> 2010/03/21 04:07:30 [debug] 4287#0: *3 http upstream check client,
>> write event:1, "/"
>> 2010/03/21 04:07:30 [debug] 4287#0: *3 http upstream recv(): -1 (11:
>> Resource temporarily unavailable)
>> 2010/03/21 04:07:30 [alert] 4275#0: worker process 4287 exited on signal 11
>>
>>
>> I've not got time to look into and write a patch for it right now,
>> but will do if no-one gets around to it at some point.
>>
>
> Config and backtrace should be helpfull too. Unfortunately I'm not able to
> reproduce the problem.
>
> Maxim Dounin
>
> _______________________________________________
> nginx mailing list
> nginx@nginx.org
> http://nginx.org/mailman/listinfo/nginx
>
>
_______________________________________________
nginx mailing list
nginx@nginx.org
http://nginx.org/mailman/listinfo/nginx