Saturday, April 11, 2009

Blocking the *.amazonaws.com domain with ZB Block, and why.

This domain has been a continual source of content theft and hacking attempts.

Now first, I must admit that I have seen a couple of good services using a *.amazonaws.com domain name, but all of the host names are cryptic, and you just can't be sure you aren't dealing with a spoofed user agent string. Now onto some finds!

Tynted
Host: ec2-67-202-60-246.compute-1.amazonaws.com
User Agent: Java/1.6.0_02

Here's the most egregious of the lot, tynt.com. This site claims straight out that it's copying the content of your site. Who da #&*%! gave them that right, especially when I claim copyright? Also, they will cause duplicate content to appear on the web, and in the eyes of Google, this messes up your page rank, badly! But that's not the worst thing...

EVEN WORSE, tynt.com / tynted.net act as a no-registration-required proxy server! This allows previously blocked hackers to come right back in and start pushing, pulling, tweaking, and investigating your site. This bad behaviour was the genesis of me blocking them. This by itself is bad, but wait, there's MORE...

REDIFF
Host: ec2-72-44-45-196.compute-1.amazonaws.com
User Agent: rdfbot/1.0 (Indian Language Web Search Engine; Rediff.com; rdfbotsupport AT rediffmailpro DOT com)

No habla hindi, señor! This is actually a content scraper, and their site seemed to be in English.

SimilarPages
Host: ec2-174-129-187-47.compute-1.amazonaws.com
User Agent: SimilarPages/Nutch-1.0-dev (SimilarPages Nutch Crawler; http://www.similarpages.com; info@similarpages.com)

If this isn't saying "Hi, I'm an SEO scraper!" I don't know what it's saying. Buhbyenow. Usually Nutch is used by scrapers.

Conductor
Host: ec2-72-44-52-94.compute-1.amazonaws.com
User Agent: Caliperbot/1.0 (+http://www.conductor.com/caliperbot)

They say (here): "Perfect ads are only possible when the publisher retains 100% editorial control over content and advertising. It's possible with Conductor. If interested, first review our publisher requirements and then submit your site for review."

I say: "I never submitted my site for review, so why are you here? I use, and am happy with, AdSense."

They say (here): "So if you can compete with those other articles, other competitors, those other affiliates and aggregators that are in front of you - you can discover millions of dollars of revenue every year - without even taking into consideration brand value or the synergy that results when you appear on the first page in both paid and natural search."

I say: "So you're really keyword spamming SEO scum. Get lost. My site is high ranked for content, not stolen words."

***

I am sure there will be more as time goes on. The next version of ZB Block's signatures should have bypasses for the valid bots (currently under test), but for now, the AmazonAWS cloud is banned.

Zap.

UPDATE: The bypasses are in. Amazon AWS can be blocked from your site with impunity, without harming any valid search engines.
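
For anyone curious what "block the domain, but let the valid bots through" looks like in practice, here is a rough sketch of the idea in Python. It is not ZB Block's actual signature code. The function name is_blocked and the bypass list are made up for illustration, and the post does not name the bots that get a bypass, so the single entry below is a placeholder. It assumes you already have the visitor's host name from a reverse DNS lookup of their IP address.

```python
# Hypothetical sketch only: this is NOT how ZB Block's signatures are written.
BLOCKED_SUFFIX = ".amazonaws.com"

# Placeholder bypass list (an assumption). The real list of valid bots
# is not given in the post.
BYPASS_AGENT_FRAGMENTS = ("examplegoodbot",)


def is_blocked(hostname: str, user_agent: str) -> bool:
    """Return True when a visitor from *.amazonaws.com should be refused."""
    # Normalise case and drop a trailing dot from a fully qualified name.
    host = hostname.strip().lower().rstrip(".")
    # The leading dot in the suffix keeps look-alikes such as
    # "notamazonaws.com" from matching.
    if not host.endswith(BLOCKED_SUFFIX):
        return False
    # Let through any agent on the bypass list; block everything else.
    agent = user_agent.lower()
    return not any(fragment in agent for fragment in BYPASS_AGENT_FRAGMENTS)


if __name__ == "__main__":
    print(is_blocked("ec2-67-202-60-246.compute-1.amazonaws.com", "Java/1.6.0_02"))
    print(is_blocked("crawl.example.com", "Java/1.6.0_02"))
```

Keep in mind that a bypass keyed on the user agent alone is only as trustworthy as the agent string, which can be spoofed, so treat this as an illustration of the matching logic and nothing more.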

Posted by Zaphod at 2:58 PM Mountain Daylight Time
Edited on: Tuesday, June 02, 2009 3:07 PM Mountain Daylight Time
Categories: Content Thieves, Odd Bot, Scrape Bot