
How on earth do they back up all that to tape?

Must be a massive amount of tape.



If you look at http://www.oracle.com/us/products/servers-storage/storage/ta... it talks about the reduced cost of ownership for a 20 PB tape library, so there are tape products available for large volumes of data. The SL8500 supports 500 PB of raw storage.


If everybody filled their mailbox, 500 PB would back up 66.1 million Gmail accounts. Since they don't, it could plausibly cover a mailbox for everybody on Earth at 73.8 MB each, which sounds like a pretty good guess at the average.
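The arithmetic behind those two figures can be checked with a quick Python sketch. It assumes decimal units (1 PB = 10^15 bytes); the roughly 7.56 GB full-mailbox quota and the roughly 6.78 billion mailbox count are simply what the quoted numbers imply, not independently sourced figures.

```python
# Back-of-envelope check of the per-mailbox figures quoted above.
# Assumption: decimal (SI) units, i.e. 1 PB = 10**15 bytes, 1 GB = 10**9, 1 MB = 10**6.
PB = 10**15
GB = 10**9
MB = 10**6

library_bytes = 500 * PB  # SL8500 raw capacity quoted in the thread


def per_account(total_bytes: float, accounts: float) -> float:
    """Bytes available per account if total_bytes is split evenly."""
    return total_bytes / accounts


def accounts_covered(total_bytes: float, per_account_bytes: float) -> float:
    """How many accounts of a given size fit in total_bytes."""
    return total_bytes / per_account_bytes


full_quota = per_account(library_bytes, 66.1e6)
print(f"quota implied by 66.1M full accounts: {full_quota / GB:.2f} GB")  # 7.56 GB

people = accounts_covered(library_bytes, 73.8 * MB)
print(f"mailboxes at 73.8 MB each: {people / 1e9:.2f} billion")  # 6.78 billion
```

About 6.78 billion mailboxes is in the same ballpark as the world's population, which is where the "everybody on Earth" framing comes from.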

Looking at the SL8500, I think I'd intentionally lose data just so I could play with it to restore it.


>> Looking at the SL8500, I think I'd intentionally lose data just so I could play with it to restore it.

And this is why high-paid sysadmins are seriously nasty hardasses when they're clocked in. :)


Would 74 MB really cut it? I'm probably "above average", but mine uses 3 GB.


There are probably a lot of people who don't "get" Gmail and delete all of their messages. There are probably also a lot of people who sign up for an account and never use it, and they still count toward the numbers. Keep in mind that ~80 MB is what you get when you split 500 PB across everyone on the planet (nearly 7 billion people).

I'm betting the average is a lot lower than we'd think.


74 MB seems small to me too (I'm thinking of how often people have sent me PowerPoint files of cute cats, Word documents containing a to-do list, and TIFF scans of something they drew).

It would be nice if the grandparent had a source for that number to share.


Consider de-duplication.
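To illustrate why de-duplication matters here, a minimal sketch of a content-addressed store in Python: data is cut into fixed-size chunks, each chunk is keyed by its SHA-256 digest, and identical chunks are kept only once. `DedupStore` is a hypothetical name, fixed-size chunking is a simplifying assumption (real systems often use variable-size, content-defined chunking), and nothing here reflects how Google actually stores mail.

```python
import hashlib
import random


class DedupStore:
    """Toy content-addressed store: identical chunks are stored once."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}  # SHA-256 hex digest -> chunk bytes

    def put(self, data: bytes) -> list[str]:
        """Split data into fixed-size chunks; return the digests needed to rebuild it."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store the chunk only if unseen
            recipe.append(digest)
        return recipe

    def get(self, recipe: list[str]) -> bytes:
        """Reassemble the original bytes from a list of chunk digests."""
        return b"".join(self.chunks[d] for d in recipe)

    def stored_bytes(self) -> int:
        """Physical bytes actually held after de-duplication."""
        return sum(len(c) for c in self.chunks.values())


# Usage: the same 1 MiB attachment lands in 100 different mailboxes.
store = DedupStore()
attachment = random.Random(0).randbytes(1 << 20)  # deterministic pseudo-random 1 MiB
recipes = [store.put(attachment) for _ in range(100)]
print(f"logical: {100 * len(attachment)} bytes, stored: {store.stored_bytes()} bytes")
```

A hundred logical copies of the attachment cost roughly the space of one, which is the kind of saving that could pull the average per-mailbox footprint well below what users see in their quota meters.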


The hard part is backing up that kind of volume of data frequently enough that you always have a recent backup :)
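A rough way to see why frequency is the hard part is to estimate how long one full pass over the data takes with every tape drive streaming in parallel. The drive count (640) and per-drive rate (250 MB/s) below are hypothetical, illustrative inputs, not published specifications, and the estimate ignores compression, de-duplication, and drive contention.

```python
# Rough backup-window estimate: time for one full copy if all drives stream in parallel.
# Hypothetical inputs: the drive count and per-drive throughput are illustrative only.
SECONDS_PER_DAY = 86_400


def full_backup_days(total_bytes: float, drives: int, bytes_per_sec_per_drive: float) -> float:
    """Days needed to write total_bytes across `drives` drives at a constant rate."""
    seconds = total_bytes / (drives * bytes_per_sec_per_drive)
    return seconds / SECONDS_PER_DAY


days = full_backup_days(500e15, drives=640, bytes_per_sec_per_drive=250e6)
print(f"{days:.1f} days")  # 36.2 days
```

Under those assumptions a single full copy of 500 PB takes over a month, so keeping the backup recent would have to rely on something other than repeated full copies.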


To quote from The Ringworld Engineers: "The Kzinti Patriarchy is not normally terrified by sheer magnitude." I can safely say that the same statement applies to Google.


Total conjecture, but three approaches come to mind: de-duplication, compression, and hierarchical storage management. Keep the less-frequently accessed parts of the data "nearline", and snapshot the older, lower-demand data slightly less often. Continuous data protection appliances were probably designed and tuned for exactly this kind of problem, and one could assume backup was built into the original architecture, so that an appropriate share of the infrastructure was already provisioned with backup space in mind.
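The tiering idea can be sketched as a small policy in Python: data that hasn't been touched for a while is classed as nearline, and nearline data gets snapshotted on a longer interval than hot data. The thresholds and the names `DataSet`, `tier`, and `snapshot_due` are all hypothetical; this is a toy illustration of the idea, not how any continuous data protection appliance or Google system actually works.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical thresholds: illustrative numbers, not from any real system.
NEARLINE_AFTER = timedelta(days=90)
ONLINE_SNAPSHOT_EVERY = timedelta(hours=1)
NEARLINE_SNAPSHOT_EVERY = timedelta(days=1)


@dataclass
class DataSet:
    name: str
    last_access: datetime
    last_snapshot: datetime


def tier(ds: DataSet, now: datetime) -> str:
    """Rarely accessed data moves to the cheaper 'nearline' tier."""
    return "nearline" if now - ds.last_access >= NEARLINE_AFTER else "online"


def snapshot_due(ds: DataSet, now: datetime) -> bool:
    """Colder data is snapshotted less often than hot data."""
    interval = NEARLINE_SNAPSHOT_EVERY if tier(ds, now) == "nearline" else ONLINE_SNAPSHOT_EVERY
    return now - ds.last_snapshot >= interval


# Usage: both data sets were last snapshotted two hours ago.
now = datetime(2011, 3, 1, 12, 0)
hot = DataSet("inbox-a", last_access=now - timedelta(minutes=5), last_snapshot=now - timedelta(hours=2))
cold = DataSet("archive-b", last_access=now - timedelta(days=400), last_snapshot=now - timedelta(hours=2))
for ds in (hot, cold):
    print(ds.name, tier(ds, now), snapshot_due(ds, now))
# inbox-a online True
# archive-b nearline False
```

The hot mailbox is due for another snapshot while the cold archive is not, so backup bandwidth goes where the data is actually changing.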



