
One specific and ubiquitous example of webspam has been driving me nuts this week: An enterprising spammer has won huge on Amazon Web Services (AWS)-related keywords.

For example: https://encrypted.google.com/search?hl=en&q=aws+s3+emr+p...

Result #4 at the moment is "AWS Developer Forums: Interactions between S3, EMR and HDFS ..." on http://www.hackzq8search.appspot.com/developer...com/...

What's sublime about this example is that:

1. hackzq8search is a clone of AWS's websites (amazonwebservices.com, aws.typepad.com, etc.)

2. hackzq8search is hosted on appspot.com, Google's App Engine domain

3. hackzq8search is over quota, so the site doesn't show any content anyway.

Yet this site was the top search result, beating out the site it was cloning, time and time again on my AWS/EMR-related searches this week.

The one mitigating aspect is that hackzq8search's URL naming scheme is easily decodable -- the hackzq8search URL includes the full URL of the cloned page, so I can write a Greasemonkey script to extract the proper original URL.
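A rough sketch of the URL-extraction part of such a script, in TypeScript. It assumes (this is a guess, since the example URL above is truncated) that a clone URL looks like `http://<clone-host>/<original-host>/<original-path>`. The function name `extractOriginalUrl`, the constant `CLONE_HOST`, and the `example.com` URLs are made up for illustration.

```typescript
// Hypothetical constant: the clone's host as it appears in the search results.
const CLONE_HOST = "hackzq8search.appspot.com";

// Returns the presumed original URL, or null if the input does not match
// the assumed scheme http://<clone-host>/<original-host>/<original-path>.
function extractOriginalUrl(cloneUrl: string): string | null {
  // Split into scheme, host, and everything after the first slash.
  const match = /^(https?):\/\/([^\/?#]+)\/(.+)$/.exec(cloneUrl);
  if (match === null) {
    return null;
  }
  const [, scheme, host, rest] = match;

  // Only rewrite URLs served from the clone host (with or without a subdomain).
  const h = host.toLowerCase();
  const isClone =
    h === CLONE_HOST || h.slice(-(CLONE_HOST.length + 1)) === "." + CLONE_HOST;
  if (!isClone) {
    return null;
  }

  // The first path segment must look like a host name, e.g. "example.com".
  const firstSegment = rest.split(/[\/?#]/, 1)[0];
  if (!/^[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/.test(firstSegment)) {
    return null;
  }

  // Assumption: the original page uses the same scheme as the clone URL.
  return `${scheme}://${rest}`;
}
```

In a userscript, one could presumably loop over the links on the results page, call this on each `href`, and swap in the returned URL whenever it is not null. If the clone's real URL layout differs from the assumption above, the regular expression would need adjusting.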

I found a glimmer of optimism in that the site's SEO success has been slowly fading this week: I complained about https://encrypted.google.com/search?q=aws+s3+security+sox+pc... on Thursday, but on Friday the hackzq8search result was gone from the first page of search results.

It's still not hard to slam some AWS-related keywords into Google and get these bogus results, though.



Someone else already reported this. There's been some weird stuff going on with AWS-type pages; see http://news.ycombinator.com/item?id=2103401 for example. I don't know the exact cause, but I know the indexing team is aware of this issue and working on it.



