April 22, 2009 08:17PM
Guys,

I need some help. In the past few months, the site that I administer (it's a large medical non-profit/charity) has been attacked by content-scraping bots. Basically, these content thieves scrape our sites and then repost the information on their own domains, interspersed with malware and ads. They quite often rank fairly high on Google because of it, and when a user gets infected, they blame us. I've been asking Google to delist these sites, but that takes days or weeks.

These scrapers obviously don't care about robots.txt; they just indiscriminately scrape the content and ignore all the rules. I've been blocking them manually, but by the time I'm aware of the problem, it's already too late. They do real damage to our database performance, and many users complain that the site is too slow at times. When we correlate the data, we see that the slowdowns occur while these thieves are scraping the site.

What's the best way to limit the number of requests an IP can make within a given time window (say, 15 minutes)? Is there a way to block them at the webserver (nginx) layer and move it away from the application layer, since app-layer blocking incurs too much of a performance hit? I'm looking for something that would simply count the number of requests over a particular time period and add the IP to iptables if it ever crosses the limit.
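For the nginx-layer part, something along these lines might work: nginx's limit_req module throttles each client IP inside the server itself, with no application code involved. This is only a sketch; the zone name "scrapers" and the rate/burst numbers are made up and would need tuning to real traffic:

```nginx
http {
    # Track request state per client IP in a 10 MB shared zone,
    # allowing a sustained rate of 1 request per second.
    limit_req_zone $binary_remote_addr zone=scrapers:10m rate=1r/s;

    server {
        location / {
            # Permit short bursts of up to 20 queued requests;
            # anything beyond that is rejected with a 503.
            limit_req zone=scrapers burst=20;
        }
    }
}
```

This doesn't by itself push offenders into iptables; for that, a log-watching tool such as fail2ban can tail the nginx error log for limit_req rejections and add a firewall rule when an IP trips the limit repeatedly.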

Any advice is much appreciated!!

Thank you,

Dave
Subject — Author — Posted
Help: How to deal with content scrapers? — davidr — April 22, 2009 08:17PM
Re: Help: How to deal with content scrapers? — Kon Wilms — April 22, 2009 08:44PM
Re: Help: How to deal with content scrapers? — Jonathan Vanasco — April 22, 2009 09:41PM
