Cloudflare Adds "Block AI Scrapers and Crawlers" to Security Options

That's huge. Do AI bots not honor robots.txt entries?
 
Some don't! Several months ago an org I volunteer for contacted me to say their MediaWiki site had gone down. After investigating, I found what looked like a DoS attack coming from Amazon servers: something was "indexing" the site extremely fast and in an incorrect way. If robots.txt had been obeyed, the site would have been crawled correctly, and a bot trying to access 10-12 pages a second was not something we expected. It also didn't trigger Cloudflare's or the VPS host's DDoS protection. I blocked the Amazon servers for the time being and restarted the host server, and it stopped crashing under the traffic.

Imagine a bot completely ignoring this and the sitemap, and trying to open URLs with about six or more ? query parameters on a MediaWiki site:
Code:
Disallow: /index.php?diff=
Disallow: /index.php?oldid=
Disallow: /index.php?title=Help
Disallow: /index.php?title=Image
Disallow: /index.php?title=MediaWiki
Disallow: /index.php?title=Special:
Disallow: /index.php?title=Template
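
For anyone wondering what "honoring" those entries looks like from the crawler's side, here is a minimal sketch using Python's standard `urllib.robotparser`. The `User-agent: *` line, the `wiki.example.org` host, and the `ExampleBot` name are assumptions for illustration (the excerpt above only shows the Disallow lines), and the helper names are made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Rules quoted in the post. The "User-agent: *" line is an assumption:
# the excerpt shows only Disallow lines, and the parser ignores those
# unless they follow a User-agent line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php?diff=
Disallow: /index.php?oldid=
Disallow: /index.php?title=Help
Disallow: /index.php?title=Image
Disallow: /index.php?title=MediaWiki
Disallow: /index.php?title=Special:
Disallow: /index.php?title=Template
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str,
               user_agent: str = "ExampleBot") -> bool:
    """Return True if the rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    # Hypothetical URLs on a placeholder host.
    for url in (
        "https://wiki.example.org/index.php?title=Main_Page",
        "https://wiki.example.org/index.php?diff=123&oldid=100",
        "https://wiki.example.org/index.php?title=Special:RecentChanges",
    ):
        verdict = "allowed" if is_allowed(parser, url) else "disallowed"
        print(f"{verdict}: {url}")
```

A well-behaved crawler would run a check like this before every request and skip anything disallowed. Nothing enforces it, though, which is why a bot that ignores the file can still hammer the diff and Special: pages.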

I was not the only one to notice such strange behavior.

More recently I was made aware of this company and how it behaved while using Amazon servers, and finally understood what had happened.
 
[Attached image: 1720067153369.png]

Honestly surprising to me, I didn't expect it to be that high.

But yeah, we'd block AI bots from basically everything we run. Unlike search crawlers, they aren't useful to our interests at all, and the load of serving content to bots that just suck everything up is less than desirable.
 