Hacker News

A less aggressive approach I've encountered is to insert links to other pages of your website with full URLs (http://www.example.com/page.html instead of just /page.html). Usually, a scraper will copy the links verbatim too. This should then make it obvious that the content's been scraped.

This could become a nightmare to maintain if you don't automate it. It'd be trivial to automate on a CMS. I know WordPress has loads of plugins for exactly this. I don't think I've come across anything that can do this for static websites, though, which make up the bulk of the websites I maintain.
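For a static site, a post-processing step in the build pipeline could do the rewriting. Here's a minimal sketch: `BASE` is a hypothetical domain standing in for your own, and the regex only handles the simple `href="/..."` / `src="/..."` case (it deliberately skips protocol-relative `//cdn...` URLs).

```python
import re

# Hypothetical domain; substitute your own site's origin.
BASE = "http://www.example.com"

def absolutize_links(html: str) -> str:
    """Rewrite root-relative href/src attributes to full URLs, so a
    copied page still points back at the original site."""
    # Match href="/path" or src="/path", but not protocol-relative "//host".
    return re.sub(
        r'(href|src)="(/(?!/)[^"]*)"',
        lambda m: f'{m.group(1)}="{BASE}{m.group(2)}"',
        html,
    )

print(absolutize_links('<a href="/page.html">Page</a>'))
# -> <a href="http://www.example.com/page.html">Page</a>
```

You'd run this over each generated HTML file as the last step of the build, so the source files you edit can keep using clean relative links.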



wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL


If your data matters at all to the scraper, this won't present much of an obstacle. Fixing up internal links is really easy. Your average script kiddie could probably figure out how to work around this in a matter of hours. Like so many "tricks", it sounds like more trouble for you than for them.
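To make the point concrete, undoing the trick is one regex in the other direction. A minimal sketch of what a scraper might run over a mirrored copy (`ORIGIN` is a hypothetical placeholder for the scraped site's domain):

```python
import re

# Hypothetical origin of the site being scraped.
ORIGIN = "http://www.example.com"

def relativize_links(html: str) -> str:
    """Strip the original host from href/src attributes so the
    mirrored copy links to itself instead of back to the source."""
    return re.sub(
        r'(href|src)="' + re.escape(ORIGIN) + r'(/[^"]*)"',
        r'\1="\2"',
        html,
    )

print(relativize_links('<a href="http://www.example.com/page.html">Page</a>'))
# -> <a href="/page.html">Page</a>
```

This is essentially what wget's `--convert-links` already does automatically after a mirror run, which is why absolute links are at best a speed bump.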



