Cloudflare Adds "Block AI Scrapers and Crawlers" to Security Options

Galaxy Littlepaws · Jul 3, 2024

Declare your AIndependence: block AI bots, scrapers and crawlers with a single click

To help preserve a safe Internet for content creators, we’ve just launched a brand new “easy button” to block all AI bots. It’s available for all customers, including those on our free tier.

blog.cloudflare.com

brb3 · Jul 3, 2024

That's huge. Do AI bots not honor robots.txt entries?

Galaxy Littlepaws · Jul 3, 2024

Some don't! Several months ago an org I volunteer for contacted me to say their MediaWiki site had been shut down. After investigating, I found what looked like a DoS attack coming from Amazon servers that was "indexing" the site extremely fast and incorrectly. If the bot had obeyed robots.txt it would have crawled correctly, and 10-12 page requests a second was not expected traffic. It also didn't trigger Cloudflare's or the VPS host's DDoS protection. I blocked the Amazon servers for the time being and restarted the host server, and it stopped crashing under the excess traffic.

Imagine a bot completely ignoring this and the sitemap, and trying to open pages with about six or more query-string parameters on a MediaWiki site:

Code:

Disallow: /index.php?diff=
Disallow: /index.php?oldid=
Disallow: /index.php?title=Help
Disallow: /index.php?title=Image
Disallow: /index.php?title=MediaWiki
Disallow: /index.php?title=Special:
Disallow: /index.php?title=Template
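
For anyone who wants to sanity-check which URLs those rules are meant to cover, here is a minimal Python sketch. It assumes plain prefix matching on path plus query string, which is a simplification of how crawlers interpret robots.txt: it ignores User-agent groups, Allow lines and wildcards. The function name is_disallowed and the wiki.example.org host are made up for illustration.

```python
from urllib.parse import urlsplit

# Disallow prefixes copied from the robots.txt excerpt above.
DISALLOW_PREFIXES = [
    "/index.php?diff=",
    "/index.php?oldid=",
    "/index.php?title=Help",
    "/index.php?title=Image",
    "/index.php?title=MediaWiki",
    "/index.php?title=Special:",
    "/index.php?title=Template",
]


def is_disallowed(url, prefixes=DISALLOW_PREFIXES):
    """Return True if the URL's path plus query starts with any Disallow prefix.

    Simplified on purpose: no User-agent groups, Allow rules or wildcards.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return any(target.startswith(prefix) for prefix in prefixes)


if __name__ == "__main__":
    # Hypothetical URLs on a made-up host, just to exercise the check.
    for url in (
        "https://wiki.example.org/index.php?oldid=123",
        "https://wiki.example.org/index.php?title=Special:RecentChanges",
        "https://wiki.example.org/index.php?title=Main_Page",
    ):
        print(url, "->", "blocked" if is_disallowed(url) else "allowed")
```

A well-behaved crawler would run a check along these lines before every request; the bot described here apparently did not.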

I was not the only one to notice such strange behavior.

More recently I was made aware of this company and how it behaved while using Amazon servers, and finally understood what had happened.

Amazon probing AI startup Perplexity for ‘scraping’ websites without permission: report

Scrutiny of Perplexity’s practices has intensified after Forbes accused the company of “directly ripping off” articles written by its reporters.

nypost.com

StrangeWill · Jul 4, 2024

Honestly surprising to me, didn't expect it that high.

But yeah, we'd block AI bots from basically everything we run. It isn't like search crawlers, which are actually useful to our interests, and the load of serving content to bots that just suck everything up is less than desirable.
