
I don't have any special knowledge of GFS, but one purpose of distributed file systems of that sort is generally to avoid the need for a separate backup layer: the storage system itself replicates data many times, across multiple data centers, continually verifies the replicas' integrity, and re-replicates as needed as part of its routine operation.

The GFS paper describes replication in a bit more detail:

> Users can specify different replication levels for different parts of the file namespace. The default is three. The master clones existing replicas as needed to keep each chunk fully replicated as chunkservers go offline or detect corrupted replicas through checksum verification
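The master's behavior described in that quote can be sketched as a simple repair loop. This is an illustrative toy, not GFS's actual code; the names (`re_replicate`, `clone`, the dict-of-sets layout) are my own, and the only detail taken from the paper is the target of three replicas per chunk.

```python
# Hypothetical sketch of the master's re-replication check: when a chunk's
# count of live, valid replicas falls below its target (default 3), pick a
# healthy replica as the source and clone it onto another live server.
DEFAULT_REPLICATION = 3

def re_replicate(chunks, live_servers, clone):
    """chunks: dict chunk_id -> set of servers holding a valid replica.
    live_servers: set of servers currently online.
    clone: callback(chunk_id, src, dst) that copies a replica."""
    for chunk_id, replicas in chunks.items():
        valid = replicas & live_servers          # drop offline servers' replicas
        while len(valid) < DEFAULT_REPLICATION:
            src = next(iter(valid))              # any healthy replica as source
            dst = next(s for s in live_servers if s not in valid)
            clone(chunk_id, src, dst)
            valid.add(dst)
        chunks[chunk_id] = valid
```

A replica that fails checksum verification would be handled the same way: remove it from the chunk's set and let the loop restore the count.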

Some other reasons you wouldn't back up a distributed file system (not saying there are no reasons ever [1]):

(1) It's difficult to add another layer of backup without unpredictably impacting performance at backup time. It's more predictable to do much of the replication synchronously within the write request (while optionally letting some replicas catch up out-of-band).

(2) Files differ in importance - some warrant a greater degree of redundancy than others. The file system can understand this and take advantage of it; a separate backup system layered on top of the file system probably can't.

(3) A standard backup/restore process often implies downtime during recovery. One goal of distributed systems is to avoid downtime by handling faults transparently; they continuously repair themselves. See: recovery-oriented computing.

(4) A backup-and-restore process that's in any way intrusive on the operation of the system won't be easy to test on an ongoing basis, the way failure recovery is tested constantly within the distributed file system. (In a big server fleet, drives fail all the time, giving you no end of opportunities to exercise your recovery process.)
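The write path hinted at in reason (1) might look something like the sketch below: acknowledge the write once a minimum number of replicas have stored it synchronously, and queue the rest for out-of-band catch-up. This is a generic pattern, not GFS's actual protocol; `replicated_write`, `sync_count`, and the queue are all hypothetical names.

```python
# Illustrative sketch: replicate synchronously to the first few replicas
# inside the request, and defer the remainder to a background repair task.
from collections import deque

def replicated_write(replicas, data, sync_count, catchup_queue):
    """Store `data` on the first `sync_count` replicas before returning;
    enqueue the rest so a background task can bring them up to date."""
    for server in replicas[:sync_count]:
        server.store(data)                    # synchronous, inside the request
    for server in replicas[sync_count:]:
        catchup_queue.append((server, data))  # applied later, out-of-band
    return True                               # client sees success now
```

The latency a client observes is bounded by the `sync_count` synchronous stores, regardless of how many total replicas exist, which is what makes performance predictable compared to a backup job sweeping through on its own schedule.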

[1] One reason might be a defense against "unknown unknown" faults in the file system itself that cause it to irrecoverably lose track of data.


