
This looks incorrect. http://web.archive.org/web/20110713000446id_/http://www.goog... is only 28K according to ls -lh.

Perhaps OP forgot to put id_ after the capture timestamp in the archive URLs? The id_ flag makes the Wayback Machine return the page exactly as it was captured, without its injected toolbar and rewritten links.
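For anyone rebuilding the URL list, here is a minimal sketch of adding the flag programmatically. It assumes capture URLs have the shape http://web.archive.org/web/<timestamp>[flag]/<original URL>, where an optional two-letter flag such as im_ or id_ sits directly after the timestamp. The helper name to_raw_capture and the example.com URL are hypothetical, for illustration only.

```python
import re

# Assumed capture URL shape (illustrative, not an official grammar):
#   http(s)://web.archive.org/web/<1-14 digit timestamp>[xx_]/<original URL>
_WAYBACK_RE = re.compile(
    r"^(https?://web\.archive\.org/web/)(\d{1,14})([a-z]{2}_)?/(.+)$"
)


def to_raw_capture(url: str) -> str:
    """Hypothetical helper: return the id_ (unmodified) variant of a capture URL."""
    m = _WAYBACK_RE.match(url)
    if m is None:
        raise ValueError(f"not a Wayback capture URL: {url!r}")
    prefix, timestamp, _old_flag, original = m.groups()
    # Drop any existing flag and put id_ directly after the timestamp.
    return f"{prefix}{timestamp}id_/{original}"


if __name__ == "__main__":
    print(to_raw_capture("http://web.archive.org/web/20110713000446/http://example.com/"))
```

The function is idempotent, so running it over a list that already contains some id_ URLs is harmless.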



I definitely used id_. I saved the source that I used; I'll recheck things.


Should be fixed now. The last point is still ~100k. In Chromium, google.com's HTML saves as 104k when I'm logged in and 94k when I'm logged out, as measured by ls -lh.


Looks good. I was using cURL, so that explains the ~100k vs. ~28k discrepancy.


You're correct: some of them slipped past and contain archive.org code. It's a quick fix.

Thank you.
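One rough way to catch captures that slipped past is to scan the saved HTML for traces of the archive's rewriting. This is a heuristic sketch: the two markers below (links rewritten to /web/<14-digit timestamp>/ and the toolbar comment) are assumptions about what a non-id_ capture typically contains, not an official or exhaustive list, and the names looks_rewritten and find_rewritten are hypothetical.

```python
import re
from pathlib import Path

# Heuristic markers (assumed, not exhaustive) of Wayback Machine additions.
_REWRITTEN_LINK = re.compile(r"/web/\d{14}")       # absolute or relative rewritten links
_TOOLBAR_COMMENT = "BEGIN WAYBACK TOOLBAR INSERT"  # comment around the injected toolbar


def looks_rewritten(html: str) -> bool:
    """Return True if the HTML appears to contain archive.org additions."""
    return _TOOLBAR_COMMENT in html or _REWRITTEN_LINK.search(html) is not None


def find_rewritten(directory: str) -> list[str]:
    """List saved .html files under `directory` that look rewritten."""
    hits = []
    for path in sorted(Path(directory).glob("*.html")):
        # errors="replace" keeps odd encodings in old pages from aborting the scan.
        if looks_rewritten(path.read_text(errors="replace")):
            hits.append(path.name)
    return hits
```

Any file it flags can then be re-fetched through its id_ URL.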



