I really wish that the Internet Archive would provide bulk access to the Wayback...

nekopa · on Aug 17, 2015

Is that even possible? I don't know the latest size of the IA, but it must be ridiculously huge by now (1 billion pages a week added); the bandwidth cost would be massive.

Maybe they could offer a mail-us-a-multi-petabyte-hdd service... Returned a few weeks later full of data :)

Titanous · on Aug 17, 2015

It's totally possible; they already have the infrastructure in place and 14PB of data available for download. Unfortunately, the Wayback Machine data is not currently exposed publicly.

nekopa · on Aug 17, 2015

Why do you think that is? It seems like they are really open with most of their stuff, so why haven't they exposed the Wayback Machine with an API?

Then again, wouldn't it be pretty trivial to scrape?

(I say this as I'm working on a hellish scraping project, and the Wayback Machine seems like it would be a walk in the park to scrape)

justin66 · on Aug 17, 2015

> I really wish that the Internet Archive would provide bulk access to the Wayback Machine dataset.

Have you asked them? Did they refuse outright?

Titanous · on Aug 17, 2015

I have not personally asked, but I think the Archive Team has. I don't know the reasons behind the policy.