Hi,
I'm facing a strange problem using Nginx with reverse proxy to a ruby Thin Server.
When someone calls an unrouteable path in our Rails App, Nginx launches a 502 - Bad Gateway correctly.
When Nginx receives another resquest (just after the 502), that is valid and routeable, it launches directly again a 502 - Bad Gateway without passing the request to the Thin Server (we can check this in the nginx logs and the app logs).
When I call the same valid URL again, the Nginx passes the call to the Thin Server and it returns to normal operation.
It seems to cache something in Nginx. When this happens and only in the third call it gets back to normal track.
I've researched the Internet for this issue but nothing conclusive...
As a workaround, I've changed the configuration disabiling the next_upstream.
This resolved my problem, but I've lost the next_upstream feature that calls the others servers when one of them fail...
Here is our nginx configuration:
---------------------------------------------------------------------------
upstream reg-prov {
server 127.0.0.1:5010;
server 127.0.0.1:5011;
server 127.0.0.1:5012;
server 127.0.0.1:5013;
}
server {
listen 443;
ssl on;
ssl_certificate /etc/nginx/conf.d/server.crt;
ssl_certificate_key /etc/nginx/conf.d/server.key;
server_name reg-prov.registro.systemintegration.locaweb.com.br;
access_log /var/log/nginx/reg-prov.access.log;
root /var/www/reg-prov/public;
location / {
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $http_host;
proxy_set_header X-Forwarded-Proto https;
proxy_redirect off;
proxy_next_upstream off;
if (-f $request_filename/index.html) {
rewrite (.*) $1/index.html break;
}
if (-f $request_filename.html) {
rewrite (.*) $1.html break;
}
if (!-f $request_filename) {
proxy_pass http://reg-prov;
break;
}
}
}
---------------------------------------------------------------------------
Nginx version: 0.7.67-3+squeeze2
Here is the behavior I described (log files).
With the next upstream DISABLED (the workaround):
---------------------------------------------------------------------------
********************
* Nginx Access Log *
********************
>>> First Call (invalid)
187.45.255.164 - - [28/Mar/2012:10:14:11 -0300] "GET /domain_availability/disponivel.mx?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas HTTP/1.1" 502 173 "-" "Ruby"
>>> Second Call (valid)
187.45.255.164 - - [28/Mar/2012:10:14:21 -0300] "GET /domain_availability/disponivel.com?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas HTTP/1.1" 200 125 "-" "Ruby"
***********
* App Log *
***********
>>> First Call (invalid)
[2012-03-28 10:14:11] INFO -- : Started GET "/domain_availability/disponivel.mx?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas" for 187.45.255.164 at 2012-03-28 10:14:11 -0300
>>> Second Call (valid)
[2012-03-28 10:14:21] INFO -- : Started GET "/domain_availability/disponivel.com?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas" for 187.45.255.164 at 2012-03-28 10:14:21 -0300
[2012-03-28 10:14:21] INFO -- : Processing by DomainAvailabilityController#verisign as XML
[2012-03-28 10:14:21] INFO -- : Parameters: {"ticket"=>"ST-28035-WPQ4MirFJIKxwagfIIUe-cas", "id"=>"disponivel.com"}
[2012-03-28 10:14:21] INFO -- : Guessed service url: "nmst.api.systemintegration.locaweb.com.br"
[2012-03-28 10:14:21] INFO -- : Validate CAS URI = https://sso.systemintegration.locaweb.com.br/cas-sys/serviceValidate?service=nmst.api.systemintegration.locaweb.com.br&ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas
[2012-03-28 10:14:21] INFO -- : Ticket "ST-28035-WPQ4MirFJIKxwagfIIUe-cas" for service "nmst.api.systemintegration.locaweb.com.br" belonging to user "registro" is VALID.
[2012-03-28 10:14:21] INFO -- : [127.0.0.1 DomainAvailabilityController#verisign disponivel.com registro] User-Agent: Ruby
[2012-03-28 10:14:21] INFO -- : [127.0.0.1 DomainAvailabilityController#verisign disponivel.com registro] Accept: */*; q=0.5, application/xml
[2012-03-28 10:14:21] INFO -- : [127.0.0.1 DomainAvailabilityController#verisign disponivel.com registro] Referer:
[2012-03-28 10:14:21] INFO -- : Rendered domain_availability/verisign.verisign_domain_availability.builder (5.3ms)
[2012-03-28 10:14:21] INFO -- : [127.0.0.1 DomainAvailabilityController#verisign disponivel.com registro] Response:
<?xml version="1.0" encoding="UTF-8"?>
<domain_availability>
<available>true</available>
</domain_availability>
[2012-03-28 10:14:21] INFO -- : Completed 200 OK in 224ms (Views: 15.4ms | ActiveRecord: 0.0ms) - Path: /domain_availability/disponivel.com?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas
---------------------------------------------------------------------------
Now with the next upstream ENABLED:
---------------------------------------------------------------------------
********************
* Nginx Access Log *
********************
>>> First Call (invalid)
187.45.255.164 - - [28/Mar/2012:10:21:51 -0300] "GET /domain_availability/disponivel.mx?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas HTTP/1.1" 502 173 "-" "Ruby"
>>> Second Call (valid)
187.45.255.164 - - [28/Mar/2012:10:21:55 -0300] "GET /domain_availability/disponivel.com?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas HTTP/1.1" 502 173 "-" "Ruby"
>> Nginx does not call the app, it returns 502 directly. Here it seems to cache something...
>>> Third Call (valid)
187.45.255.164 - - [28/Mar/2012:10:22:00 -0300] "GET /domain_availability/disponivel.com?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas HTTP/1.1" 200 125 "-" "Ruby"
>> Success...
***********
* App Log *
***********
>>> First Call (invalid)
[2012-03-28 10:21:51] INFO -- : Started GET "/domain_availability/disponivel.mx?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas" for 187.45.255.164 at 2012-03-28 10:21:51 -0300
[2012-03-28 10:21:51] INFO -- : Started GET "/domain_availability/disponivel.mx?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas" for 187.45.255.164 at 2012-03-28 10:21:51 -0300
[2012-03-28 10:21:51] INFO -- : Started GET "/domain_availability/disponivel.mx?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas" for 187.45.255.164 at 2012-03-28 10:21:51 -0300
[2012-03-28 10:21:51] INFO -- : Started GET "/domain_availability/disponivel.mx?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas" for 187.45.255.164 at 2012-03-28 10:21:51 -0300
>> Here we have the 4 calls because the next upstream is enabled, only when the last server is called the error is returned by nginx, so he keep trying with all servers.
>>> Second Call (valid)
>> Nothing happens in the app log. It gets nothing from nginx...
>>> Third Call (valid)
[2012-03-28 10:22:00] INFO -- : Started GET "/domain_availability/disponivel.com?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas" for 187.45.255.164 at 2012-03-28 10:22:00 -0300
[2012-03-28 10:22:00] INFO -- : Processing by DomainAvailabilityController#verisign as XML
[2012-03-28 10:22:00] INFO -- : Parameters: {"ticket"=>"ST-28035-WPQ4MirFJIKxwagfIIUe-cas", "id"=>"disponivel.com"}
[2012-03-28 10:22:00] INFO -- : Guessed service url: "nmst.api.systemintegration.locaweb.com.br"
[2012-03-28 10:22:00] INFO -- : Validate CAS URI = https://sso.systemintegration.locaweb.com.br/cas-sys/serviceValidate?service=nmst.api.systemintegration.locaweb.com.br&ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas
[2012-03-28 10:22:00] INFO -- : Ticket "ST-28035-WPQ4MirFJIKxwagfIIUe-cas" for service "nmst.api.systemintegration.locaweb.com.br" belonging to user "registro" is VALID.
[2012-03-28 10:22:00] INFO -- : [127.0.0.1 DomainAvailabilityController#verisign disponivel.com registro] User-Agent: Ruby
[2012-03-28 10:22:00] INFO -- : [127.0.0.1 DomainAvailabilityController#verisign disponivel.com registro] Accept: */*; q=0.5, application/xml
[2012-03-28 10:22:00] INFO -- : [127.0.0.1 DomainAvailabilityController#verisign disponivel.com registro] Referer:
[2012-03-28 10:22:00] INFO -- : Rendered domain_availability/verisign.verisign_domain_availability.builder (6.1ms)
[2012-03-28 10:22:00] INFO -- : [127.0.0.1 DomainAvailabilityController#verisign disponivel.com registro] Response:
<?xml version="1.0" encoding="UTF-8"?>
<domain_availability>
<available>true</available>
</domain_availability>
[2012-03-28 10:22:00] INFO -- : Completed 200 OK in 233ms (Views: 16.9ms | ActiveRecord: 0.0ms) - Path: /domain_availability/disponivel.com?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas
---------------------------------------------------------------------------
Thanks in advance!
Daniel Silvestre