Saturday, April 11, 2009
Blocking the *.amazonaws.com domain with ZB Block, and why.
This domain has been a continual source of content theft and hacking attempts.
Now first, I must admit that I have seen a couple of good services using a *.amazonaws.com domain name, but all of the host names are cryptic, and you just can't be sure you aren't dealing with a spoofed user agent string. Now on to some finds!
Tynted
Host:
ec2-67-202-60-246.compute-1.amazonaws.com
User Agent: Java/1.6.0_02
Here's the most egregious of the lot, tynt.com. This site claims straight out that it's copying the content of your site. Who da #&*%! gave them that right, especially when I claim copyright? Also, they will cause duplicate content to appear on the web, and in the eyes of Google, this messes up your page rank, badly! But that's not the worst thing...
EVEN WORSE, tynt.com / tynted.net act as a no-registration-required proxy server! This allows previously blocked hackers to come right back in and start pushing, pulling, tweaking, and investigating your site. This bad behaviour was the genesis of me blocking them. This by itself is bad, but wait, there's MORE...
REDIFF
Host:
ec2-72-44-45-196.compute-1.amazonaws.com
User Agent: rdfbot/1.0
(Indian Language Web Search Engine; Rediff.com; rdfbotsupport AT
rediffmailpro DOT com)
No habla Hindi, señor! This is actually a content scraper, and their site seemed to be in English.
SimilarPages
Host:
ec2-174-129-187-47.compute-1.amazonaws.com
User Agent:
SimilarPages/Nutch-1.0-dev (SimilarPages Nutch Crawler;
http://www.similarpages.com; info@similarpages.com)
If this isn't saying "Hi, I'm an SEO scraper!" I don't know what it's saying. Buhbyenow. Usually Nutch is used by scrapers.
Conductor
Host:
ec2-72-44-52-94.compute-1.amazonaws.com
User Agent: Caliperbot/1.0
(+http://www.conductor.com/caliperbot)
They say (here): "Perfect ads are only possible when the publisher retains 100% editorial control over content and advertising. It's possible with Conductor. If interested, first review our publisher requirements and then submit your site for review."
I say: "I never submitted my site for review, so why are you here? I use, and am happy with, AdSense."
They say (here): "So if you can compete with those other articles, other competitors, those other affiliates and aggregators that are in front of you - you can discover millions of dollars of revenue every year - without even taking into consideration brand value or the synergy that results when you appear on the first page in both paid and natural search."
I say: "So you're really keyword-spamming SEO scum. Get lost. My site is highly ranked for content, not stolen words."
***
I am sure there will be more as time goes on. The next version of ZB Block's signatures should have bypasses for the valid bots (currently under test), but for now, the AmazonAWS cloud is banned.
Zap.
UPDATE: The bypasses are in. Amazon AWS can be blocked from your site with impunity, without harming any valid search engines.
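For readers who want to see the idea in code, here is a minimal sketch of a host-name block with a bypass list. This is not ZB Block's actual signature format or code; the function names and the `bypass_agents` parameter are made up for illustration, and it assumes you already have the visitor's reverse-DNS host name in hand. Keep in mind that a user agent can be spoofed, so a bypass keyed on it alone is only as trustworthy as the string itself.

```python
def is_amazonaws_host(hostname):
    """True if the reverse-DNS host name is inside the amazonaws.com domain."""
    host = hostname.strip().lower().rstrip(".")
    # Anchor the match at the end of the name, so a host such as
    # "amazonaws.com.example.tld" is not mistaken for Amazon.
    return host == "amazonaws.com" or host.endswith(".amazonaws.com")


def should_block(hostname, user_agent, bypass_agents=()):
    """Block any amazonaws.com visitor unless its user agent is bypassed.

    bypass_agents is a hypothetical allow-list of user agent substrings.
    """
    if not is_amazonaws_host(hostname):
        return False
    agent = user_agent.lower()
    return not any(token.lower() in agent for token in bypass_agents)


if __name__ == "__main__":
    print(should_block("ec2-72-44-52-94.compute-1.amazonaws.com", "Caliperbot/1.0"))
```

Calling `should_block` with one of the hosts listed above and an empty bypass list reports that the visit would be blocked; adding a matching entry to `bypass_agents` lets that one bot through.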
Edited on: Tuesday, June 02, 2009 3:07 PM Mountain Daylight Time
Categories: Content Thieves, Odd Bot, Scrape Bot
Monday, March 23, 2009
Possible new kind of attack on your website, and revenue stream... Defamation by HTTP Referer!
I am not going to pretend I know ALL the inner workings of Google AdSense, but if bots are hitting your page and dropping fake icky referrers like...
http://www.cigarclub.tld
http://cigarettes.cheap-24h.tld
http://www.pillthrills.tld
and
http://slimy-tentacle-hentai.pornshop.tld
They must be trying to convince something that actually sees the referrer of the visit that you are linked from their crap pages... something perhaps like - - - Google AdSense?
That is my best guess, and now ZB Block has rules designed to block connections that contain reputation-damaging HTTP_REFERERs. They are included in Signature Update #24. Sure am glad I wrote it from the beginning to be extensible, as this only needed a signature update.
Now I just wonder how much damage they've done to my reputation already.
UPDATE: It's amazing how often the word sex pops up in referers. One of my most important ones, NOAA/NWS, has the word in their site exit page URL. It's buried in the string "nwsexit". OOPS ON ME.
Removed that one detection...
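The "nwsexit" trap is easy to reproduce. The sketch below is not ZB Block's signature code; the word list, the function names, and the example.gov URL are all made up for illustration. It compares a plain substring test against one that only matches whole tokens of the referer's host and path.

```python
import re
from urllib.parse import urlsplit

# Hypothetical word list, for illustration only.
BAD_WORDS = {"sex", "porn", "hentai"}


def naive_match(referer):
    """Substring test: fires on any URL that merely contains a bad word."""
    lowered = referer.lower()
    return any(word in lowered for word in BAD_WORDS)


def token_match(referer):
    """Whole-token test: split host and path on non-alphanumerics first."""
    parts = urlsplit(referer.lower())
    tokens = re.split(r"[^a-z0-9]+", parts.netloc + " " + parts.path)
    return any(token in BAD_WORDS for token in tokens)


if __name__ == "__main__":
    innocent = "http://www.example.gov/nwsexit.php?url=x"  # made-up URL
    print(naive_match(innocent), token_match(innocent))
```

The substring test flags the innocent "nwsexit" URL, while the token test lets it through and still catches "hentai" in the fake referrer listed above. The trade-off is that whole-token matching misses run-together names such as "pornshop", so it narrows the false positives without being a complete answer.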
UPDATE 2: I think I am going to mothball these detections till I can find a narrower way to detect these problem fake linkers. Let us just pray that Google is smart enough to ignore crap referrers.
Edited on: Tuesday, March 24, 2009 10:08 AM Mountain Daylight Time
Categories: Odd Bot, Security Musings, Spam Bot
