Anonymous User
July 29, 2012 12:22PM
I'm attempting to tame (minimize or eliminate) Yandex bot access.

I'd like to understand the application/precedence of the rules I apply.

To my site config I've added:

map $http_user_agent $bad_bot {
default 0;
~(Yandex|YandexBot) 1;
}

# nginx exposes the Referer request header as $http_referer (single "r")
map $http_referer $bad_referrer {
default 0;
~*(yandex) 1;
}

valid_referers mydomain.com *.mydomain.com localhost 127.0.0.1 [::1];

location / {
if ($bad_bot) {return 403;}
if ($bad_referrer) {return 403;}
if ($invalid_referer) {return 444;}
...
}

and

cat /robots.txt
User-agent: *
Disallow: /

cat /robot_ssl.txt
User-agent: *
Disallow: /

In my logs I see repeating '444' rejections:

100.43.83.148 - - [28/Jul/2012:06:02:14 -0500] GET /robots.txt HTTP/1.1 "444" 0 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" "-"
100.43.83.148 - - [28/Jul/2012:06:06:23 -0500] GET /robots.txt HTTP/1.1 "444" 0 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" "-"


With my rules above, I'd expect that to be a '403' rejection, as
specified for the "$bad_bot" check.

Why am I seeing the '444' instead of the '403'?
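
One way to narrow this down might be to log the three flags next to the status code, so each request shows which variables were set when it was rejected. This is only a minimal sketch: the format name "botdebug" and the log path are placeholders rather than part of the config above, and it assumes the log_format line can go in the http context. Note that $invalid_referer is an empty string for a valid referer and "1" otherwise.

```nginx
# Sketch only: "botdebug" and the log path are placeholder names.
# log_format must be declared in the http {} context.
log_format botdebug '$remote_addr [$time_local] "$request" $status '
                    'bad_bot=$bad_bot bad_referrer=$bad_referrer '
                    'invalid_referer=$invalid_referer "$http_user_agent"';

# In the server {} block that holds the rules above:
access_log /var/log/nginx/botdebug.log botdebug;
```

Comparing the logged flags with the returned status for the YandexBot requests should show whether the "$bad_bot" check is being evaluated for them at all.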

bot-taming: which rules should apply, and in which order?
