This comic follows on from the previous comic, which will almost certainly provide context.
You might not wanna be famous, but when you’re level 10, every organization within a mile is watching what you’re doing.
This won’t fix it but it might help.
Make sure you have a robots.txt file that sets a crawl delay of 30 seconds for all user agents and disallows most of the WordPress directories, such as wp-admin, the media directory, etc.
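For anyone who wants to sanity-check a file like that before uploading it, here is a rough sketch using Python's standard urllib.robotparser. The robots.txt contents are my guess at what is being described: the paths are the usual WordPress defaults (with wp-content/uploads standing in for "the media directory") and may not match a given install. Note too that Crawl-delay is a non-standard directive, so some crawlers ignore it even when they honor the rest of the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths are common WordPress defaults,
# not taken from any particular site.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 30
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/uploads/
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text the way a well-behaved crawler would."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# "ExampleBot" is a made-up user agent used only for this check.
print(rules.crawl_delay("ExampleBot"))
print(rules.can_fetch("ExampleBot", "/wp-admin/options.php"))
print(rules.can_fetch("ExampleBot", "/comic/some-page/"))
```

This only tells you what a polite crawler would conclude from the file; it does nothing to a crawler that never reads it.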
I would also strongly recommend that you use a caching system if you are not using one. It’s a lot more efficient to serve the same image a hundred times to different bots from RAM than to load it off your drive every time.
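To illustrate the idea (not how any particular WordPress caching plugin actually works), here is a toy sketch in Python. The read_from_disk function and the image path are made up, and the "disk" is simulated with a counter so you can see how many real reads happen.

```python
from functools import lru_cache

disk_reads = 0  # how many times we actually went to the (simulated) disk


def read_from_disk(path: str) -> bytes:
    """Stand-in for a slow disk read; a real server would open the file."""
    global disk_reads
    disk_reads += 1
    return f"image bytes for {path}".encode()


@lru_cache(maxsize=256)
def serve_image(path: str) -> bytes:
    """Serve from the in-memory cache, touching the disk only on a miss."""
    return read_from_disk(path)


# A hundred bots asking for the same (hypothetical) image.
for _ in range(100):
    serve_image("/comics/page-001.png")

print(disk_reads)
```

After the first request, every repeat is answered from memory, which is where the I/O savings come from.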
Just my personal opinion from working in a web hosting environment.
That’ll probably help if it’s I/O issues.
Most of these AI scrapers don’t respect robots.txt, so I’m not sure that really helps much, but… we have tried doing all of these things.
Someone on Lemmy suggested creating a dummy endpoint that normal people won’t be able to navigate to, and disallowing it in robots.txt.
Then when somebody crawls it, you know they are ignoring robots.txt, and you IP-ban them.
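A bare-bones sketch of that trap idea in Python, just to show the logic. The trap path, the class name and the IP address are all invented for illustration; a real setup would more likely do this in the web server or firewall than in application code.

```python
# Hypothetical trap path: never linked anywhere visible, and listed under
# Disallow in robots.txt so that polite crawlers stay away from it.
TRAP_PATH = "/do-not-crawl/"


class HoneypotBanList:
    """Ban any IP that requests the trap path, then refuse it everywhere."""

    def __init__(self, trap_path: str = TRAP_PATH) -> None:
        self.trap_path = trap_path
        self.banned = set()

    def handle(self, ip: str, path: str) -> int:
        """Return the HTTP status code this request should get."""
        if ip in self.banned:
            return 403  # already caught ignoring robots.txt
        if path.startswith(self.trap_path):
            self.banned.add(ip)  # caught just now
            return 403
        return 200


bans = HoneypotBanList()
print(bans.handle("203.0.113.7", "/comic/1"))            # normal request
print(bans.handle("203.0.113.7", "/do-not-crawl/page"))  # hits the trap
print(bans.handle("203.0.113.7", "/comic/2"))            # now banned
```

The ban is keyed on the IP address alone, so it only catches a scraper that comes back from the same address.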
That’s pretty clever.
I think that these AI scrapers might be smart enough that this doesn’t really work though - at least if I were designing them I’d have them all come from dynamic IPs and not have any of them bother hitting the same target more than once. These things are very dedicated to acquiring content without consent, and if they’re capable of causing problems for (say) Reddit, I’m not sure my little website is going to have much luck deterring them.
Honestly a better strategy might be to just glaze everything I draw.
Yeah, you might need some combination of fail2ban for rude AI and Cloudflare caching or something.
Whoever their host is, they already appear to have some type of load balancing, based on the four IPs. But I would also agree that a free Cloudflare account does wonders for most WordPress users. That’s probably mostly because it filters out a shitload of bots and known bad actors. Just make sure you set up your origin certificates if you use a Cloudflare account.