Re: rewrite question

June 11, 2018 09:42AM
I see another poster has written this and deleted it afterwards:

`This is almost certainly not Google as they obey robots.txt. The & to &amp;
conversion is another sign of a poor quality crawler. Check the RDNS and
you will find it's probably some IP faking Google UA, I suggest blocking at
network level.`
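The RDNS check suggested in that quote is normally done as a forward-confirmed reverse DNS lookup: resolve the IP to a hostname, check the hostname's domain, then resolve that hostname forward and make sure it maps back to the same IP. The Python sketch below illustrates the idea. The accepted suffixes (`googlebot.com`, `google.com`) are an assumption based on Google's crawler-verification guide, so check the current version of that guide. The function names are hypothetical, and `socket.gethostbyname_ex` only returns IPv4 addresses.

```python
import socket
from typing import Callable, List, Tuple

# Hostname suffixes accepted as Google in this sketch (an assumption based on
# Google's crawler-verification guide; check the current version of it).
GOOGLE_SUFFIXES: Tuple[str, ...] = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """PTR lookup: return the primary hostname for an IP address."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> List[str]:
    """A lookup: return the IPv4 addresses a hostname resolves to."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_google(
    ip: str,
    rdns: Callable[[str], str] = reverse_lookup,
    fdns: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS: the PTR name must end in a Google
    suffix AND resolve back to the same IP, so a faked User-Agent or a
    self-assigned PTR record alone does not pass."""
    try:
        host = rdns(ip).rstrip(".").lower()
    except OSError:  # socket.herror / socket.gaierror: no usable PTR record
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        return ip in fdns(host)
    except OSError:  # the PTR name does not resolve forward
        return False
```

Calling `is_verified_google("72.14.199.18")` performs live DNS lookups for one of the logged addresses. The `rdns`/`fdns` parameters exist only so the logic can be exercised without network access.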

My actual reply:


1 - It is Google
2 - They do not always use a user-friendly user agent. That is a fact.
3 - When they don't, they also don't follow robots.txt.

So my problem remains.

I don't want to block those IP ranges at the iptables level because it's Google. So a rewrite or a redirect (I'm not sure which at the moment) is badly needed; which one depends on the URL.
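Since the choice depends on the URL, one approach is to compute the clean form of a mangled request and answer with a 301 only when it differs from what was requested. The Python sketch below shows only that mapping logic; the function names are hypothetical, and in nginx the result would still have to be expressed as `rewrite`/`return 301` rules. It is a heuristic that targets the two patterns visible in the logged requests quoted in this post: `%3D` in place of `=`, and an HTML-escaped `&amp` separator. Decoding the whole query string before splitting is a simplification: a value that legitimately contains `%26` would be split as if it were a separator.

```python
import re
from urllib.parse import parse_qsl, unquote, urlencode, urlsplit


def canonical_target(request_uri: str) -> str:
    """Undo the mangling seen in the logs: a percent-encoded '=' (%3D) and
    HTML-escaped '&amp' separators in the query string."""
    parts = urlsplit(request_uri)
    query = unquote(parts.query)            # "page%3Dabout" -> "page=about"
    query = re.sub(r"&amp;?", "&", query)   # "&amp;" or a bare "&amp" -> "&"
    # parse_qsl skips the empty field left behind by a trailing "&"
    pairs = parse_qsl(query, keep_blank_values=True)
    clean_query = urlencode(pairs)
    return parts.path + ("?" + clean_query if clean_query else "")


def needs_redirect(request_uri: str) -> bool:
    """True when the request should get a 301 to its canonical form."""
    return canonical_target(request_uri) != request_uri
```

Applied to the logged URI `/page.php?page%3Dabout_himeji_forklifts&amp`, this yields `/page.php?page=about_himeji_forklifts`, while an already-clean URI is left alone and gets no redirect.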

Here are the IP ranges, which are definitely Google's, as referenced in https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/issues/175
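Rather than banning both ranges outright, one option is to flag only requests that combine those ranges with the single User-Agent reported in the original message, so other traffic from the same ranges is not caught. The Python sketch below expresses that two-condition rule. The names are hypothetical, and a server-side rule would need to apply the same two conditions.

```python
import ipaddress

# The two ranges and the User-Agent reported in the quoted message.
REPORTED_RANGES = [
    ipaddress.ip_network("66.249.64.0/19"),   # 66.249.64.0 - 66.249.95.255
    ipaddress.ip_network("72.14.199.0/24"),   # 72.14.199.0 - 72.14.199.255
]
REPORTED_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36"
)


def in_reported_ranges(ip: str) -> bool:
    """True if ip falls inside one of the reported networks.
    IPv6 addresses never match these IPv4 networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in REPORTED_RANGES)


def matches_report(ip: str, user_agent: str) -> bool:
    """Flag only the reported combination (range AND exact User-Agent), so
    requests from the same ranges with any other User-Agent are not flagged."""
    return in_reported_ranges(ip) and user_agent == REPORTED_UA
```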

And here is a copy of my original message.

"Hi,

I'm still faithful to your script. It does great things to my websites. Thanks for that.

Not a bug strictly speaking, just an observation you might find useful.

Over the last 1-2 months, I got a lot of strange, impossible requests, all with the same User-Agent, no referrer and HTTP/1.1. All came from Google. They do not respect robots.txt and probe everywhere they're not supposed to. I thought you should be made aware of it.

I know you whitelist Google IPs, but given what other users have found on inspection, you might want to revisit those.

User-agent:
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36"

Ranges:
66.249.64.0/19
72.14.199.0/24

Example requests:
72.14.199.18 - - [27/May/2018:14:12:01 -0700] "GET /page.php?page%3Dabout_himeji_forklifts&amp HTTP/1.1" 301 178 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36"
72.14.199.4 - - [27/May/2018:14:12:24 -0700] "GET /page.php?page%3Dabout_himeji_forklifts&amp HTTP/1.1" 302 165 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36"

In the meantime, I circumvented your whitelist by issuing manual range bans. After 6 weeks there have been no more of those strange requests, and bandwidth has dropped significantly, since those 2 ranges were requesting several hundred megabytes each day!

Thanks again."
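To put a number on the bandwidth figures before and after the range bans, response bytes can be summed per range from the access log. The sketch below assumes the standard `combined` log format seen in the example lines above. The regex and function name are illustrative, and the byte field in that format counts response bodies only, so the totals somewhat understate real traffic.

```python
import ipaddress
import re
from collections import defaultdict
from typing import Dict, Iterable

# remote_addr, skip ident/user/[time]/"request", then capture status and body bytes.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) (\d+|-)')

RANGES = [ipaddress.ip_network(n) for n in ("66.249.64.0/19", "72.14.199.0/24")]


def bytes_per_range(lines: Iterable[str]) -> Dict[str, int]:
    """Sum the body-bytes field per reported range; other lines are ignored."""
    totals: Dict[str, int] = defaultdict(int)
    for line in lines:
        m = LOG_RE.match(line)
        if m is None or m.group(3) == "-":
            continue  # unparseable line or no body size logged
        try:
            addr = ipaddress.ip_address(m.group(1))
        except ValueError:
            continue  # first field is not an IP address
        for net in RANGES:
            if addr in net:
                totals[str(net)] += int(m.group(3))
                break
    return dict(totals)
```

Passing an open access-log file works directly, since file objects iterate line by line; the log's location depends on your setup. For the two example lines above, the result is `{"72.14.199.0/24": 343}`.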

Subject | Author | Posted
rewrite question | shiz | June 07, 2018 07:57PM
Re: rewrite question | shiz | June 11, 2018 09:42AM
Re: rewrite question | shiz | June 11, 2018 09:50AM


