Along with the new host, I finally got around to installing some web stats software .. and by install I mean I clicked the “Install Webalizer” button on the host’s management page .. sometimes it’s nice to do stuff the easy way 😉
Anyway, I’m looking at these stats .. and one single host has accounted for nearly 70% of the traffic (>200 megs of it) and 40% of the hits. Holy crap! About every 4 seconds (!?) I get a hit from rss.allresearch.com with a user agent of "Mozilla/4.0 (WebClipping.com)".
And if just grabbing all of spoon.cx in less than a minute wasn’t bad enough, they’re fetching each page multiple times .. in the same day .. not the front page of the blog or anything, but the archives, pages which haven’t changed in years, literally! Some of these pages they fetched 100 to 200 times in the last 2 days!
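For anyone who wants to check numbers like these without Webalizer, here is a rough sketch of tallying hits and bytes per host straight from an access log. It assumes the log is in the common/combined log format (I haven't confirmed what this host actually writes), and the `SAMPLE` lines, the `192.0.2.10` address, and the names `summarize` and `share` are all made up for illustration.

```python
import re
from collections import Counter

# Assumed layout (combined log format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def summarize(lines):
    """Return (hits per host, bytes per host, hits per (host, path))."""
    hits, traffic, fetches = Counter(), Counter(), Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip anything that is not in the assumed format
        host = m.group("host")
        parts = m.group("request").split()
        path = parts[1] if len(parts) >= 2 else "-"
        size = m.group("bytes")
        hits[host] += 1
        traffic[host] += 0 if size == "-" else int(size)
        fetches[(host, path)] += 1
    return hits, traffic, fetches


def share(counter, key):
    """Fraction of the counter's total that belongs to key."""
    total = sum(counter.values())
    return counter[key] / total if total else 0.0


# Made-up sample lines, only here to show the shape of the input.
SAMPLE = [
    '38.144.36.19 - - [01/Jan/2005:00:00:00 +0000] "GET /archives/a.html HTTP/1.0" 200 3000 "-" "Mozilla/4.0 (WebClipping.com)"',
    '38.144.36.19 - - [01/Jan/2005:00:00:04 +0000] "GET /archives/a.html HTTP/1.0" 200 3000 "-" "Mozilla/4.0 (WebClipping.com)"',
    '38.144.36.19 - - [01/Jan/2005:00:00:08 +0000] "GET /archives/b.html HTTP/1.0" 200 1000 "-" "Mozilla/4.0 (WebClipping.com)"',
    '192.0.2.10 - - [01/Jan/2005:00:00:09 +0000] "GET / HTTP/1.1" 200 3000 "-" "SomeBrowser/1.0"',
]

if __name__ == "__main__":
    hits, traffic, fetches = summarize(SAMPLE)
    crawler = "38.144.36.19"
    print(f"hit share:     {share(hits, crawler):.0%}")
    print(f"traffic share: {share(traffic, crawler):.0%}")
    # Pages this host fetched more than once
    for (host, path), n in fetches.most_common():
        if host == crawler and n > 1:
            print(f"{path} fetched {n} times")
```

Point `summarize` at an open log file instead of `SAMPLE` and the repeat offenders show up in `fetches.most_common()`.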
So this new host also has the ability to block IPs .. that address again was rss.allresearch.com, or 38.144.36.19. For now, this is faster than telling them their crawler is broken, assuming they don’t already know (which I bet they do) and that they care (bet they don’t) .. At least I get a chance to try out that new nofollow tag 😉
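I don't know how the host's block button works under the hood, but the idea is simple enough to sketch: keep a list of blocked addresses and check each client against it. This is just an illustration using Python's standard `ipaddress` module; `BLOCKED` and `is_blocked` are names I made up, not anything the host exposes.

```python
import ipaddress

# Hypothetical block list; /32 means exactly one IPv4 address.
BLOCKED = [ipaddress.ip_network("38.144.36.19/32")]


def is_blocked(client_ip, blocked=BLOCKED):
    """Return True if client_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in blocked)
```

Using networks instead of bare addresses means a whole range could be blocked later by adding one entry with a wider prefix.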