RE: limit_rate based on User-Agent; how to exempt /robots.txt ?

Cameron Kerr
August 07, 2018 06:28PM
Hi Maxim, that's very helpful...

> -----Original Message-----
> From: nginx [mailto:nginx-bounces@nginx.org] On Behalf Of Maxim Dounin
> On Tue, Aug 07, 2018 at 02:45:02AM +0000, Cameron Kerr wrote:


> > Option 3: (does not work)

> This approach is expected to work fine (assuming you've used limit_req
> somewhere), and I've just tested the exact configuration snippet provided
> to be sure. If it doesn't work for you, the problem is likely elsewhere.

Thank you for the confirmation; I've retried it and, testing with ab, it seems to work, so I'm not sure what I was doing wrong previously.

I like the pattern of chaining maps; it's nicely functional in my way of thinking. (The exemption works because limit_req does not account requests whose key evaluates to an empty string.)

For the sake of others, my configuration looks like the following:

http {

    # First map: classify the client by User-Agent. Matching agents all
    # share the single key "robot"; everything else gets an empty key,
    # and requests with an empty key are not accounted by limit_req.
    map $http_user_agent $user_agent_rate_key {
        default                             "";
        "~*(bot[/-]|crawler|robot|spider)"  "robot";
        "~ScienceBrowser/Nutch"             "robot";
        "~Arachni/"                         "robot";
    }

    # Second map in the chain: exempt /robots.txt by forcing the key
    # back to an empty string for that URI, whatever the User-Agent.
    map $uri $rate_for_spider_exempting {
        default         $user_agent_rate_key;
        "/robots.txt"   '';
    }

    # All robots together share one bucket of 100 requests per minute.
    limit_req_zone $rate_for_spider_exempting zone=per_spider_class:1m rate=100r/m;

    limit_req_status 429;
    server_tokens off;

    server {
        limit_req zone=per_spider_class;

        location / {
            proxy_pass http://routing_layer_http/;
        }
    }
}
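
For a quick manual spot-check (same elided host as in the ab runs below; curl's -w prints just the HTTP status), a one-liner like this should show the first request pass with 200 and the immediate follow-ups rejected with 429, since no burst is configured:

$ for i in 1 2 3; do curl -so /dev/null -w '%{http_code}\n' -A 'spider' https://.../hostname; done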


And my testing:

// spider with non-exempted (i.e. rate-limited for spiders) URI

$ ab -H 'User-Agent: spider' -n100 https://.../hostname | grep -e '^Complete requests:' -e '^Failed requests:'
Complete requests: 100
Failed requests: 98

// spider with exempted (i.e. not rate-limited for spiders) URI

$ ab -H 'User-Agent: spider' -n100 https://.../robots.txt | grep -e '^Complete requests:' -e '^Failed requests:'
Complete requests: 100
Failed requests: 0

// non-spider with exempted (i.e. not rate-limited for spiders) URI

$ ab -n100 https://.../robots.txt | grep -e '^Complete requests:' -e '^Failed requests:'
Complete requests: 100
Failed requests: 0

// non-spider with non-exempted (i.e. rate-limited for spiders) URI

$ ab -n100 https://.../hostname | grep -e '^Complete requests:' -e '^Failed requests:'
Complete requests: 100
Failed requests: 0
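
As an aside on reading the ab output: the 429s show up in ab's "Non-2xx responses" line, while "Failed requests" above is most likely ab counting length mismatches against the first (successful) response. To count the rejected requests directly:

$ ab -H 'User-Agent: spider' -n100 https://.../hostname | grep -e '^Non-2xx responses:'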


Thanks again for your feedback

Cheers,
Cameron
