nginx reload fails with [emerg] host not found in upstream

December 06, 2012 09:37PM
hello --

nginx will not reload on some of our proxy servers, but does on others. all are running the same version: nginx/1.0.15. the reload fails with error:

[emerg] 26903#0: host not found in upstream "webappNNx:8080" in /etc/nginx/upstream.conf:N

the issue appears to be related to nginx's ability to resolve a hostname. our proxy servers use BIND servers that we run ourselves, and the BIND servers are returning answers just fine afaict. when i reproduce this problem on a proxy server and sniff the network, i can confirm that the proxy asks the nameserver for an A record and gets the answer back successfully.
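
one way to double-check the resolver side from the proxy itself is to push every hostname from upstream.conf through the system resolver. the sketch below is only an illustration, not anything nginx ships: upstream_hosts and check_hosts are hypothetical helper names, and it assumes nginx resolves these names through the ordinary system resolver when it loads the config. a fresh python process may also not see what the long-running nginx master sees, so a clean result here narrows the problem rather than settles it.

```python
import re
import socket

# Matches "server <host>[:<port>]" lines; "server {" and "server_name" do not match.
SERVER_RE = re.compile(r"^\s*server\s+([^\s:;{]+)(?::(\d+))?", re.MULTILINE)


def upstream_hosts(conf_text):
    """Return (host, port) pairs for the server lines of an upstream config."""
    # Port 80 is used when a server line names no port.
    return [(m.group(1), int(m.group(2) or 80)) for m in SERVER_RE.finditer(conf_text)]


def check_hosts(pairs, resolve=socket.getaddrinfo):
    """Resolve each host; map it to a sorted list of IPs, or None on failure."""
    results = {}
    for host, port in pairs:
        try:
            infos = resolve(host, port, socket.AF_INET, socket.SOCK_STREAM)
            # Each entry is (family, type, proto, canonname, (ip, port)).
            results[host] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            results[host] = None
    return results
```

on a proxy this could be called as check_hosts(upstream_hosts(open("/etc/nginx/upstream.conf").read())); a host that maps to None would correspond to the "host not found" case.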


there is a workaround i found, but i would really rather not resort to it: putting the backend (aka upstream :<) app nodes into /etc/hosts. i have also heard suggestions to put the backend nodes' IPs into the proxy pool file (upstream.conf), but again, i'd rather not, because IPs are not human readable, especially when firefighting. i'm hoping there is a better solution out there than these workarounds.
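
if the IPs-in-upstream.conf route ever has to be taken, the readability concern could be softened by generating the block and keeping each hostname as a trailing comment. this is a minimal sketch, assuming the hostname-to-IP mapping is maintained somewhere else; render_upstream is a hypothetical helper, and the names and addresses in the usage note are the ones from the /etc/hosts workaround in this post.

```python
def render_upstream(name, servers, extra=()):
    """Build an upstream block that uses IPs but keeps hostnames as comments.

    servers: iterable of (hostname, ip, port, params); params may be "".
    extra:   other directives to copy into the block verbatim.
    """
    lines = [f"upstream {name} {{"]
    for hostname, ip, port, params in servers:
        directive = f"server {ip}:{port}"
        if params:
            directive += f" {params}"
        # The trailing comment keeps the block readable when firefighting.
        lines.append(f"    {directive};  # {hostname}")
    lines.extend(f"    {line}" for line in extra)
    lines.append("}")
    return "\n".join(lines) + "\n"
```

for example, render_upstream("tomcats_http", [("webapp02c", "10.51.23.17", 8080, "max_fails=2")]) yields a server line for 10.51.23.17:8080 followed by "# webapp02c". the generated file still has to be regenerated whenever an address changes, which is the price of taking name resolution out of the reload.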

we are using a third-party module: https://github.com/yaoweibin/nginx_upstream_check_module. no, i have not tried to reproduce this problem without the module; i don't know how i would, since we need the functionality that it provides. and yes, i will follow up with the module author.


any help? thank you very much in advance. all the gory details follow.

kallen

straces available upon request :>


a proxy server where the problem does occur:
============================================

i'd like to note that the nginx parent on this server has been running for about 6 months.

i try to reload, but the reload will not complete due to the error:

[emerg] 26903#0: host not found in upstream "webapp04a:8080" in /etc/nginx/upstream.conf:3


12/07 01:28[root@proxy2-prod-ue1 ~]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

12/07 01:28[root@proxy2-prod-ue1 ~]# ps wwwwaxuf | grep ngin[x]
root 20569 0.0 0.2 25652 5364 ? Ss Jun20 0:03 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx 3401 0.4 0.8 37056 15960 ? S Dec05 8:39 \_ nginx: worker process
nginx 3402 0.4 1.1 40916 19836 ? S Dec05 8:36 \_ nginx: worker process

12/07 01:29[root@proxy2-prod-ue1 ~]# cat /etc/nginx/upstream.conf
## Tomcat via HTTP
upstream tomcats_http {
    server webapp02c:8080 max_fails=2;
    server webapp06c:8080 max_fails=2;
    server roapp02c:8080 backup;
    check interval=3000 rise=3 fall=3 timeout=1000 type=http default_down=false;
    check_http_send "GET /healthcheck/version HTTP/1.0\r\n\r\n";
}

12/07 01:29[root@proxy2-prod-ue1 ~]# tcpdump -nvv -i eth0 -s0 -X port 53 and host 10.24.27.66

12/07 01:30[root@proxy2-prod-ue1 ~]# strace -f -s 2048 -ttt -T -p 20569 -o nginx-parent-strace
Process 20569 attached - interrupt to quit


12/07 01:27[root@proxy2-prod-ue1 ~]# /etc/init.d/nginx reload; tail -f /var/log/nginx/error.log
Reloading nginx: [ OK ]
2012/12/07 00:05:29 [debug] 12290#0: bind() 0.0.0.0:80 #6
2012/12/07 00:05:29 [debug] 12290#0: bind() 0.0.0.0:443 #7
2012/12/07 00:05:29 [debug] 12290#0: counter: B7F38080, 1
2012/12/07 01:28:37 [debug] 22928#0: bind() 0.0.0.0:80 #6
2012/12/07 01:28:37 [debug] 22928#0: bind() 0.0.0.0:443 #7
2012/12/07 01:28:37 [debug] 22928#0: counter: B7F8F080, 1
2012/12/07 01:31:44 [debug] 23383#0: bind() 0.0.0.0:80 #6
2012/12/07 01:31:44 [debug] 23383#0: bind() 0.0.0.0:443 #7
2012/12/07 01:31:44 [debug] 23383#0: counter: B7F56080, 1
2012/12/07 01:31:44 [emerg] 20569#0: host not found in upstream "webapp02c:8080" in /etc/nginx/upstream.conf:3


as soon as that reload fires, i do see nameservice traffic on the wire, so it is NOT a matter of DNS service being unavailable. i note that it asks for the A record twice; i don't know why.

01:31:44.426376 IP (tos 0x0, ttl 64, id 30918, offset 0, flags [DF], proto: UDP (17), length: 72) 10.45.33.82.60723 > 10.24.27.66.domain: [bad udp cksum 799c!] 18875+ A? webapp02c.prod.romeovoid.com. (44)
0x0000: 4500 0048 78c6 4000 4011 934e 0af5 2b52 E..Hx.@.@..N..+R
0x0010: 0af4 ed55 ed33 0035 0034 2ed6 49bb 0100 ...U.3.5.4..I...
0x0020: 0001 0000 0000 0000 0977 6562 6170 7030 .........webapp0
0x0030: 3263 0470 726f 6407 7361 6173 7572 6503 2c.prod.romeovoid.
0x0040: 636f 6d00 0001 0001 com.....
01:31:44.427301 IP (tos 0x0, ttl 63, id 42228, offset 0, flags [none], proto: UDP (17), length: 156) 10.24.27.66.domain > 10.45.33.82.60723: [udp sum ok] 18875* q: A? webapp02c.prod.romeovoid.com. 1/2/2 webapp02c.prod.romeovoid.com. A 10.51.23.17 ns: prod.romeovoid.com. NS ns1.prod.romeovoid.com., prod.romeovoid.com. NS ns2.prod.romeovoid.com. ar: ns1.prod.romeovoid.com. A 10.192.83.14, ns2.prod.romeovoid.com. A 10.24.27.66 (128)
0x0000: 4500 009c a4f4 0000 3f11 a7cc 0af4 ed55 E.......?......U
0x0010: 0af5 2b52 0035 ed33 0088 e8c5 49bb 8580 ..+R.5.3....I...
0x0020: 0001 0001 0002 0002 0977 6562 6170 7030 .........webapp0
0x0030: 3263 0470 726f 6407 7361 6173 7572 6503 2c.prod.romeovoid.
0x0040: 636f 6d00 0001 0001 c00c 0001 0001 0000 com.............
0x0050: 003c 0004 0a73 2aab c016 0002 0001 0001 .<...s*.........
0x0060: 5180 0006 036e 7331 c016 c016 0002 0001 Q....ns1........
0x0070: 0001 5180 0006 036e 7332 c016 c048 0001 ..Q....ns2...H..
0x0080: 0001 0000 003c 0004 0ac0 530e c05a 0001 .....<....S..Z..
0x0090: 0001 0000 003c 0004 0af4 ed55 .....<.....U
01:31:44.427420 IP (tos 0x0, ttl 64, id 30918, offset 0, flags [DF], proto: UDP (17), length: 72) 10.45.33.82.60723 > 10.24.27.66.domain: [bad udp cksum 8c21!] 50344+ A? webapp02c.prod.romeovoid.com. (44)
0x0000: 4500 0048 78c6 4000 4011 934e 0af5 2b52 E..Hx.@.@..N..+R
0x0010: 0af4 ed55 ed33 0035 0034 2ed6 c4a8 0100 ...U.3.5.4......
0x0020: 0001 0000 0000 0000 0977 6562 6170 7030 .........webapp0
0x0030: 3263 0470 726f 6407 7361 6173 7572 6503 2c.prod.romeovoid.
0x0040: 636f 6d00 0001 0001 com.....
01:31:44.428050 IP (tos 0x0, ttl 63, id 42229, offset 0, flags [none], proto: UDP (17), length: 156) 10.24.27.66.domain > 10.45.33.82.60723: [udp sum ok] 50344* q: A? webapp02c.prod.romeovoid.com. 1/2/2 webapp02c.prod.romeovoid.com. A 10.51.23.17 ns: prod.romeovoid.com. NS ns2.prod.romeovoid.com., prod.romeovoid.com. NS ns1.prod.romeovoid.com. ar: ns1.prod.romeovoid.com. A 10.192.83.14, ns2.prod.romeovoid.com. A 10.24.27.66 (128)
0x0000: 4500 009c a4f5 0000 3f11 a7cb 0af4 ed55 E.......?......U
0x0010: 0af5 2b52 0035 ed33 0088 6dd8 c4a8 8580 ..+R.5.3..m.....
0x0020: 0001 0001 0002 0002 0977 6562 6170 7030 .........webapp0
0x0030: 3263 0470 726f 6407 7361 6173 7572 6503 2c.prod.romeovoid.
0x0040: 636f 6d00 0001 0001 c00c 0001 0001 0000 com.............
0x0050: 003c 0004 0a73 2aab c016 0002 0001 0001 .<...s*.........
0x0060: 5180 0006 036e 7332 c016 c016 0002 0001 Q....ns2........
0x0070: 0001 5180 0006 036e 7331 c016 c05a 0001 ..Q....ns1...Z..
0x0080: 0001 0000 003c 0004 0ac0 530e c048 0001 .....<....S..H..
0x0090: 0001 0000 003c 0004 0af4 ed55 .....<.....U
01:31:44.428142 IP (tos 0x0, ttl 64, id 30918, offset 0, flags [DF], proto: UDP (17), length: 72) 10.45.33.82.60723 > 10.24.27.66.domain: [bad udp cksum 1632!] 45086+ A? webapp06c.prod.romeovoid.com. (44)
0x0000: 4500 0048 78c6 4000 4011 934e 0af5 2b52 E..Hx.@.@..N..+R
0x0010: 0af4 ed55 ed33 0035 0034 2ed6 b01e 0100 ...U.3.5.4......
0x0020: 0001 0000 0000 0000 0977 6562 6170 7030 .........webapp0
0x0030: 3663 0470 726f 6407 7361 6173 7572 6503 6c.prod.romeovoid.
0x0040: 636f 6d00 0001 0001 com.....
01:31:44.428791 IP (tos 0x0, ttl 63, id 42230, offset 0, flags [none], proto: UDP (17), length: 156) 10.24.27.66.domain > 10.45.33.82.60723: [udp sum ok] 45086* q: A? webapp06c.prod.romeovoid.com. 1/2/2 webapp06c.prod.romeovoid.com. A 10.195.76.80 ns: prod.romeovoid.com. NS ns1.prod.romeovoid.com., prod.romeovoid.com. NS ns2.prod.romeovoid.com. ar: ns1.prod.romeovoid.com. A 10.192.83.14, ns2.prod.romeovoid.com. A 10.24.27.66 (128)
[snip]
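
the capture above can also be checked mechanically: pair each A query with its reply by DNS transaction id and list anything left unanswered. this is a rough sketch that assumes tcpdump -nvv output shaped like the lines in this post; the regexes and the match_dns name are made up for illustration and will not cover other record types or other tcpdump versions.

```python
import re

# Query line, e.g. "... cksum 799c!] 18875+ A? name. (44)"
QUERY_RE = re.compile(r"\] (\d+)\+ A\? (\S+) \(\d+\)")
# Reply line, e.g. "... sum ok] 18875* q: A? name. 1/2/2 name. A 10.51.23.17 ns: ..."
REPLY_RE = re.compile(r"\] (\d+)\*? q: A\? (\S+) \d+/\d+/\d+ \S+ A (\d+\.\d+\.\d+\.\d+)")


def match_dns(lines):
    """Return (answered, unanswered) for the A queries found in tcpdump text.

    answered is a list of (name, ip) pairs; unanswered is a sorted list of names.
    """
    pending = {}   # transaction id -> queried name
    answered = []
    for line in lines:
        query = QUERY_RE.search(line)
        if query:
            pending[query.group(1)] = query.group(2)
            continue
        reply = REPLY_RE.search(line)
        if reply and reply.group(1) in pending:
            answered.append((pending.pop(reply.group(1)), reply.group(3)))
    return answered, sorted(pending.values())
```

hex dump lines match neither pattern and are skipped. an empty unanswered list would back up the observation that the nameserver answered every query it was sent.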


the workaround: put all backend nodes (from upstream.conf) into /etc/hosts :<

12/07 01:34[root@proxy2-prod-ue1 ~]# tail -3 /etc/hosts
10.51.23.17 webapp02c.prod.romeovoid.com webapp02c
10.195.76.80 webapp06c.prod.romeovoid.com webapp06c
10.96.23.87 roapp02c.prod.romeovoid.com roapp02c

and now, it will reload just fine:

12/07 01:34[root@proxy2-prod-ue1 ~]# /etc/init.d/nginx reload; tail -f /var/log/nginx/error.log
Reloading nginx: [ OK ]
2012/12/07 01:35:39 [debug] 24076#0: bind() 0.0.0.0:80 #6
2012/12/07 01:35:39 [debug] 24076#0: bind() 0.0.0.0:443 #7
2012/12/07 01:35:39 [debug] 24076#0: counter: B7FCD080, 1
2012/12/07 01:35:39 [debug] 20569#0: http upstream check, find oshm_zone:092C6390, opeers_shm: B7451000
2012/12/07 01:35:39 [debug] 20569#0: http upstream check: inherit opeer:10.51.23.17:8080
2012/12/07 01:35:39 [debug] 20569#0: http upstream check: inherit opeer:10.195.76.80:8080
2012/12/07 01:35:39 [debug] 20569#0: http upstream check: inherit opeer:10.96.23.87:8080
2012/12/07 01:35:39 [notice] 20569#0: using the "epoll" event method
2012/12/07 01:35:39 [notice] 20569#0: start worker processes
2012/12/07 01:35:39 [debug] 20569#0: channel 3:5
2012/12/07 01:35:39 [notice] 20569#0: start worker process 24078
2012/12/07 01:35:39 [debug] 20569#0: pass channel s:2 pid:24078 fd:3 to s:0 pid:3401 fd:9
2012/12/07 01:35:39 [debug] 20569#0: pass channel s:2 pid:24078 fd:3 to s:1 pid:3402 fd:11
2012/12/07 01:35:39 [debug] 20569#0: channel 14:15
2012/12/07 01:35:39 [notice] 20569#0: start worker process 24079
2012/12/07 01:35:39 [debug] 20569#0: pass channel s:3 pid:24079 fd:14 to s:0 pid:3401 fd:9
2012/12/07 01:35:39 [debug] 20569#0: pass channel s:3 pid:24079 fd:14 to s:1 pid:3402 fd:11
2012/12/07 01:35:39 [debug] 20569#0: pass channel s:3 pid:24079 fd:14 to s:2 pid:24078 fd:3
2012/12/07 01:35:39 [debug] 20569#0: child: 0 3401 e:0 t:0 d:0 r:1 j:0
2012/12/07 01:35:39 [debug] 20569#0: child: 1 3402 e:0 t:0 d:0 r:1 j:0
2012/12/07 01:35:39 [debug] 20569#0: child: 2 24078 e:0 t:0 d:0 r:1 j:1
2012/12/07 01:35:39 [debug] 20569#0: child: 3 24079 e:0 t:0 d:0 r:1 j:1
2012/12/07 01:35:39 [debug] 20569#0: sigsuspend
2012/12/07 01:35:39 [debug] 24078#0: malloc: 09340600:6144
2012/12/07 01:35:39 [debug] 24079#0: malloc: 09340600:6144
2012/12/07 01:35:39 [debug] 24078#0: malloc: 0931D3E0:102400




a proxy server where the problem does NOT occur:
================================================

i'd like to note that the nginx parent on this server has been running for only about 1 month.


12/07 01:04[root@proxy5-prod-ue1 ~]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

12/07 01:40[root@proxy5-prod-ue1 ~]# cat /etc/nginx/upstream.conf
## Tomcat via HTTP
upstream tomcats_http {
    server webapp09e:8080 max_fails=2;
    server webapp10e:8080 max_fails=2;
    server roapp05e:8080 backup;
    check interval=3000 rise=3 fall=3 timeout=1000 type=http default_down=false;
    check_http_send "GET /healthcheck/version HTTP/1.0\r\n\r\n";
}
12/07 01:40[root@proxy5-prod-ue1 ~]# grep webapp /etc/hosts
12/07 01:41[root@proxy5-prod-ue1 ~]# # nothing as expected

12/07 01:42[root@proxy5-prod-ue1 ~]# ps wwwwaxuf | grep ngin[x]
root 4817 0.0 0.3 106184 5528 ? Ss Nov07 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx 8396 0.6 0.8 116692 15488 ? S 00:36 0:25 \_ nginx: worker process
nginx 8397 0.6 0.8 116296 15096 ? S 00:36 0:25 \_ nginx: worker process



12/07 01:42[root@userproxy5-prod-ue1 ~]# /etc/init.d/nginx reload; tail -f /var/log/nginx/error.log
Reloading nginx: [ OK ]
2012/12/07 01:42:44 [debug] 8396#0: posted event 0000000000000000
2012/12/07 01:42:44 [debug] 8396#0: worker cycle
2012/12/07 01:42:44 [debug] 8396#0: accept mutex locked
2012/12/07 01:42:44 [debug] 8396#0: epoll timer: 399
2012/12/07 01:42:44 [notice] 4817#0: signal 1 (SIGHUP) received, reconfiguring
2012/12/07 01:42:44 [debug] 4817#0: wake up, sigio 0
2012/12/07 01:42:44 [notice] 4817#0: reconfiguring
2012/12/07 01:42:44 [debug] 4817#0: posix_memalign: 00000000007F1BA0:16384 @16
2012/12/07 01:42:44 [debug] 4817#0: posix_memalign: 000000000081FB60:16384 @16
2012/12/07 01:42:44 [debug] 4817#0: malloc: 00000000008C1980:4096
2012/12/07 01:42:44 [debug] 4817#0: read: 6, 00000000008C1980, 4096, 0
2012/12/07 01:42:44 [debug] 4817#0: malloc: 00000000006E0A80:6912
2012/12/07 01:42:44 [debug] 4817#0: malloc: 00000000007E59C0:4280
2012/12/07 01:42:44 [debug] 4817#0: malloc: 00000000007A0610:4280
2012/12/07 01:42:44 [debug] 4817#0: malloc: 0000000000731E00:4280
2012/12/07 01:42:44 [debug] 4817#0: malloc: 0000000000774AD0:4280
2012/12/07 01:42:44 [debug] 4817#0: malloc: 0000000000873750:4280
2012/12/07 01:42:44 [debug] 4817#0: malloc: 0000000000781760:4280
2012/12/07 01:42:44 [debug] 4817#0: posix_memalign: 00000000008D1170:16384 @16
2012/12/07 01:42:44 [debug] 4817#0: malloc: 00000000007EEA40:4096
2012/12/07 01:42:44 [debug] 4817#0: include /etc/nginx/mime.types
2012/12/07 01:42:44 [debug] 4817#0: include /etc/nginx/mime.types
2012/12/07 01:42:44 [debug] 4817#0: malloc: 000000000080F300:4096
2012/12/07 01:42:44 [debug] 4817#0: read: 8, 000000000080F300, 3463, 0
2012/12/07 01:42:44 [debug] 4817#0: malloc: 00000000006DCA90:4096
2012/12/07 01:42:44 [debug] 4817#0: posix_memalign: 00000000007642B0:16384 @16
2012/12/07 01:42:44 [debug] 4817#0: posix_memalign: 00000000008B5F40:16384 @16
2012/12/07 01:42:44 [debug] 4817#0: posix_memalign: 000000000075B000:16384 @16
2012/12/07 01:42:44 [debug] 4817#0: posix_memalign: 000000000087E390:16384 @16
2012/12/07 01:42:44 [debug] 4817#0: include upstream.conf
2012/12/07 01:42:44 [debug] 4817#0: include /etc/nginx/upstream.conf



our config
=====================
upstream.conf:

## Tomcat via HTTP
upstream tomcats_http {
    server webapp02c:8080 max_fails=2;
    server webapp06c:8080 max_fails=2;
    server roapp02c:8080 backup;
    check interval=3000 rise=3 fall=3 timeout=1000 type=http default_down=false;
    check_http_send "GET /healthcheck/version HTTP/1.0\r\n\r\n";
}

nginx.conf:

user nginx;
worker_processes 2;
syslog local2 nginx;
error_log syslog:warn|/var/log/nginx/error.log;
pid /var/run/nginx.pid;
worker_rlimit_core 500M;
working_directory /var/coredumps/;
events {
    worker_connections 1024;
}
http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    proxy_buffers 8 16k;
    proxy_buffer_size 32k;
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log syslog:warn|/var/log/nginx/access.log main;
    sendfile on;
    keepalive_timeout 65;
    gzip on;
    server {
        listen 80;
        server_name _;
        # put X-Purpose: preview into the trash. thank you Safari
        if ($http_x_purpose ~* "preview") {
            return 444;
            break;
        }
        # http://wiki.nginx.org/HttpStubStatusModule
        location /nginx-status {
            stub_status on;
            access_log off;
            allow 10.0.0.0/8;
            allow 127.0.0.1;
            deny all;
        }
        location /upstream-status {
            check_status;
            access_log off;
            allow 10.0.0.0/8;
            allow 127.0.0.1;
            deny all;
        }
        error_page 404 /404.html;
        location = /404.html {
            root /usr/share/nginx/error;
        }
        error_page 403 /403.html;
        location = /403.html {
            root /usr/share/nginx/error;
        }
        error_page 500 502 504 /500.html;
        location = /500.html {
            root /usr/share/nginx/error;
        }
        error_page 503 /503.html;
        location = /503.html {
            root /usr/share/nginx/error;
        }
        set $global_ssl_redirect 'yes';
        if ($request_filename ~ "nginx-status") {
            set $global_ssl_redirect 'no';
        }
        if ($request_filename ~ "upstream-status") {
            set $global_ssl_redirect 'no';
        }
        if ($global_ssl_redirect ~* '^yes$') {
            rewrite ^ https://$host$request_uri? permanent;
            break;
        }
    }
    ## Keep upstream defs in a separate file for easier pool membership control
    include upstream.conf;
    server {
        listen 443;
        server_name _;
        # put X-Purpose: preview into the trash. thank you Safari
        if ($http_x_purpose ~* "preview") {
            return 444;
            break;
        }
        ssl on;
        ssl_certificate certs/wildcard_void_com.crt;
        ssl_certificate_key certs/wildcard_void_com.key;
        ssl_protocols SSLv3 TLSv1;
        ssl_ciphers HIGH:!ADH:!MD5;
        ssl_session_cache shared:SSL:10m;
        ssl_session_timeout 10m;
        set_real_ip_from 10.0.0.0/8;
        real_ip_header X-Forwarded-For;
        add_header Cache-Control public;
        ## Tomcat via HTTP
        location / {
            proxy_pass http://tomcats_http;
            proxy_connect_timeout 10s;
            proxy_next_upstream error invalid_header http_503 http_502 http_504;
            proxy_set_header Host $host;
            proxy_set_header X-Server-Port $server_port;
            proxy_set_header X-Server-Protocol https;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Strict-Transport-Security max-age=315360000;
            proxy_set_header X-Secure true;
            proxy_set_header Transfer-Encoding ""; # OPS-475 remove if/when we update/punt Tomcat
            if ($request_uri ~* "\.(ico|css|js|gif|jpe?g|png)") {
                expires 365d;
                break;
            }
        }
        error_page 404 /404.html;
        location = /404.html {
            root /usr/share/nginx/error;
        }
        error_page 403 /403.html;
        location = /403.html {
            root /usr/share/nginx/error;
        }
        error_page 500 502 504 /500.html;
        location = /500.html {
            root /usr/share/nginx/error;
        }
        error_page 503 /503.html;
        location = /503.html {
            root /usr/share/nginx/error;
        }
    }
}

Subject | Author | Posted
nginx reload fails with [emerg] host not found in upstream | groknaut | December 06, 2012 09:37PM
Re: nginx reload fails with [emerg] host not found in upstream | 姚伟斌 | December 06, 2012 11:26PM
Re: nginx reload fails with [emerg] host not found in upstream | groknaut | December 07, 2012 12:33AM
Re: nginx reload fails with [emerg] host not found in upstream | groknaut | December 07, 2012 01:22AM
Re: nginx reload fails with [emerg] host not found in upstream | Ruslan Ermilov | December 07, 2012 02:24AM
Re: nginx reload fails with [emerg] host not found in upstream | groknaut | December 07, 2012 05:11PM


