Hacker News

Maybe my understanding of ZFS is wrong, but what about storage requirements? As far as I understand ZFS, you take snapshots of all versions of all files and that can quickly lead to a lot of space usage.

But maybe there is also an automatic cleanup tool that deletes old snapshots after some time?



You take snapshots when you want to take snapshots, and a snapshot allows you to view the data as it was at the time the snapshot was taken. Not entirely unlike a git commit.

The snapshot refers to the storage blocks/records on disk, not files as such, an important distinction since ZFS can expose block storage (zvol) as well as a "regular" file system.
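Here is a toy Python model of the two points above. It assumes a dataset is just a table mapping logical block numbers to immutable physical blocks. Real ZFS uses a tree of block pointers, and the `Dataset` class and its methods are hypothetical names for illustration. Taking a snapshot copies the pointers, not the data, so reading through the snapshot still shows the old contents after the live data has been overwritten:

```python
# Toy copy-on-write dataset. Illustrative only, not ZFS's on-disk format.
# Assumption: a dataset is a table of logical block number -> physical block id,
# and physical blocks are never modified in place.


class Dataset:
    def __init__(self):
        self.blocks = {}     # physical block id -> bytes (the "disk")
        self.live = {}       # logical block number -> physical block id
        self.snapshots = {}  # snapshot name -> frozen copy of the pointer table
        self._next_id = 0

    def write(self, lbn, data):
        # Copy-on-write: always allocate a new physical block, then repoint.
        self.blocks[self._next_id] = data
        self.live[lbn] = self._next_id
        self._next_id += 1

    def snapshot(self, name):
        # Only the pointers are copied; no data blocks are duplicated.
        self.snapshots[name] = dict(self.live)

    def read(self, lbn, snap=None):
        table = self.live if snap is None else self.snapshots[snap]
        return self.blocks[table[lbn]]


ds = Dataset()
ds.write(0, b"hello")
ds.snapshot("monday")
ds.write(0, b"world")        # overwrite after the snapshot
print(ds.read(0))            # b'world' (live data)
print(ds.read(0, "monday"))  # b'hello' (as it was at snapshot time)
```

This toy never frees anything, so it says nothing about space usage. It only shows the point-in-time view.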

Since ZFS is copy-on-write, the only storage a snapshot costs you is the old versions of the blocks that have changed since the snapshot was taken (plus a little overhead). Thus, for data that does not change much, a snapshot is almost "free".
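Here is a sketch of that space accounting. It makes some simplifying assumptions: blocks are immutable and identified by an id, and a snapshot is just the set of ids it points at. `snapshot_cost` is a hypothetical helper, not a ZFS API. The snapshot's extra cost is the set of blocks it still holds that the live data no longer uses:

```python
# Toy space accounting for a copy-on-write dataset with one snapshot.
# Assumption: blocks are immutable ids; a snapshot is a frozen set of ids.


def snapshot_cost(snapshot_blocks, live_blocks):
    """Blocks kept allocated only because the snapshot still references them."""
    return len(set(snapshot_blocks) - set(live_blocks))


# 1000 blocks are in use when the snapshot is taken.
snap = set(range(1000))
# Afterwards, 10 of them are rewritten; copy-on-write gives them new ids.
live = (snap - set(range(10))) | set(range(1000, 1010))

print(snapshot_cost(snap, live))  # 10: only the changed blocks cost extra
```

On a real system, `zfs list -o space` reports a comparable per-dataset figure in its USEDSNAP column.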

Conceptually, the blocks are reference counted. When a snapshot is deleted, the reference count of every block it references is decreased, and any block whose count drops to zero is considered free, so its space is reclaimed. That is the case for a block that has changed since the snapshot was taken and is not referenced by any other snapshot. (Internally ZFS tracks this with block birth times and per-snapshot dead lists rather than literal per-block counters, but the effect is the same.)
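Below is a minimal Python model of the reference-counting description above. It is a simplification, not ZFS's actual internals, and `Pool` and its methods are hypothetical. The live filesystem and each snapshot hold one reference per block they point at. Destroying a snapshot releases its references, and only blocks that nobody else references are freed:

```python
from collections import Counter

# Simplified reference-counting model of snapshot deletion (hypothetical names).
# The live filesystem and every snapshot each hold one reference per block.


class Pool:
    def __init__(self, live_blocks):
        self.live = set(live_blocks)
        self.snapshots = {}
        self.refcount = Counter({b: 1 for b in self.live})

    def snapshot(self, name):
        self.snapshots[name] = set(self.live)
        for b in self.live:
            self.refcount[b] += 1

    def overwrite(self, old, new):
        # Copy-on-write: the live filesystem drops its reference to `old`
        # and points at a freshly allocated `new` block instead.
        self.live.remove(old)
        self.live.add(new)
        self.refcount[new] += 1
        self._release(old)

    def destroy_snapshot(self, name):
        for b in self.snapshots.pop(name):
            self._release(b)

    def _release(self, b):
        self.refcount[b] -= 1
        if self.refcount[b] == 0:
            del self.refcount[b]  # no references left: space is reclaimed

    def allocated(self):
        return set(self.refcount)


pool = Pool({"A", "B"})
pool.snapshot("s1")
pool.overwrite("B", "B2")
print(sorted(pool.allocated()))  # ['A', 'B', 'B2']
pool.destroy_snapshot("s1")
print(sorted(pool.allocated()))  # ['A', 'B2']
```

If a second snapshot also referenced block B, destroying `s1` alone would leave B allocated until that other snapshot was destroyed too.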

ZFS itself has no automatic deletion of old snapshots AFAIK, but there are tools built around ZFS that allow for periodic snapshotting and cleanup.
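The cleanup side of such tools boils down to a retention policy. Here is a sketch of one possible policy, assuming snapshots are identified by their creation time. The function name and the rule itself are made up for illustration and not taken from any particular tool. The rule keeps the newest 24 snapshots, plus the newest snapshot of each of the last 7 days:

```python
from datetime import datetime, timedelta

# Hypothetical retention policy: keep the newest `keep_last` snapshots, plus
# the newest snapshot of each of the `keep_daily` most recent calendar days.


def snapshots_to_delete(snap_times, keep_last=24, keep_daily=7):
    newest_first = sorted(snap_times, reverse=True)
    keep = set(newest_first[:keep_last])
    seen_days = set()
    for t in newest_first:
        day = t.date()
        if day not in seen_days and len(seen_days) < keep_daily:
            seen_days.add(day)
            keep.add(t)  # newest snapshot of this calendar day
    return sorted(t for t in snap_times if t not in keep)


# Ten days of hourly snapshots.
start = datetime(2024, 1, 1)
snaps = [start + timedelta(hours=h) for h in range(240)]
doomed = snapshots_to_delete(snaps)
print(len(snaps) - len(doomed))  # 30 kept: 24 hourly + 6 more daily
```

A scheduler would then take a new snapshot on each run and call `zfs destroy` on each snapshot the policy returns.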


Side-topic: the way you described how blocks and snapshots work is _exactly_ how git works, with references and blobs: as long as a blob can be reached by a reference (branch or tag), it will not be garbage-collected.
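Here is a minimal mark-and-sweep sketch of that reachability rule. The object names and graph are invented. Real git is also more conservative than this: reflog entries keep objects alive too, and unreachable loose objects are only pruned after a grace period.

```python
# Mark-and-sweep over a toy object graph: anything reachable from a ref
# survives, everything else is garbage. Object names are made up.


def reachable(refs, edges):
    """Return every object reachable from the given refs."""
    seen, stack = set(), list(refs.values())
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(edges.get(obj, ()))
    return seen


edges = {
    "commit2": ["tree2", "commit1"],  # commit -> its tree and parent
    "commit1": ["tree1"],
    "tree2": ["blobA", "blobC"],
    "tree1": ["blobA", "blobB"],
    "commit_dropped": ["tree_x"],     # e.g. left over from an amended commit
    "tree_x": ["blobD"],
}
refs = {"refs/heads/main": "commit2"}

all_objects = set(edges) | {o for kids in edges.values() for o in kids}
garbage = all_objects - reachable(refs, edges)
print(sorted(garbage))  # ['blobD', 'commit_dropped', 'tree_x']
```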

Turns out a DAG, as is used in both ZFS and git, is a good data structure in many use cases.
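One reason the structure works well for both is sharing. When each node's identity covers its contents, including its children's checksums, a change only creates new nodes along the path from the changed leaf up to the root. Every untouched subtree is shared between versions. Here is a simplified illustration. The hashing scheme is made up and matches neither git's nor ZFS's on-disk format:

```python
import hashlib

# Hash-linked tree (Merkle DAG): a node's id is a hash of its contents, and a
# directory's contents include its children's ids. Simplified format.


def node_id(node):
    if isinstance(node, bytes):  # file contents
        return hashlib.sha256(b"blob:" + node).hexdigest()
    entries = "".join(f"{name}:{node_id(child)};" for name, child in sorted(node.items()))
    return hashlib.sha256(b"tree:" + entries.encode()).hexdigest()


v1 = {"docs": {"a.txt": b"alpha"}, "src": {"main.c": b"int main(void){}"}}
v2 = {"docs": {"a.txt": b"alpha"}, "src": {"main.c": b"int main(void){return 0;}"}}

print(node_id(v1["docs"]) == node_id(v2["docs"]))  # True: unchanged subtree is shared
print(node_id(v1["src"]) == node_id(v2["src"]))    # False: changed path gets new ids
print(node_id(v1) == node_id(v2))                  # False: so does the root
```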


Snapshots are something you control - they don't happen by themselves.

Furthermore, you only "pay" their storage cost for data which actually changes. On a 2TB volume, last snapshotted 1 week ago, if only 5GB have changed since then, that's the only storage overhead (ignoring for simplicity the folder structure itself).
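Here is the arithmetic for that example, assuming decimal units (1 TB = 1000 GB):

```latex
\frac{5\ \text{GB}}{2\ \text{TB}} = \frac{5\ \text{GB}}{2000\ \text{GB}} = 0.0025 = 0.25\%
```

So in that scenario the snapshot costs about a quarter of a percent of the volume.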

Also, if a single data block has changed 20 times since the last snapshot, you still only "pay" storage costs for its current version plus the one held by that snapshot.
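Here is a toy trace of that claim. It uses a hypothetical model in which a block is freed as soon as neither the live data nor a snapshot points at it. Each overwrite goes to a new block, and the intermediate versions are released because nothing else references them:

```python
# Toy copy-on-write trace: how many physical blocks survive for one logical
# block that is overwritten `overwrites` times after a single snapshot.
# Assumption: a block is freed once nothing (live data or snapshot) points at it.


def blocks_retained(overwrites):
    allocated = {0}  # block 0: the version captured by the snapshot
    snapshot_ptr = 0
    live_ptr = 0
    for new_id in range(1, overwrites + 1):
        old = live_ptr
        allocated.add(new_id)  # copy-on-write: write to a new block
        live_ptr = new_id
        if old != snapshot_ptr:  # intermediate version nobody references
            allocated.discard(old)
    return len(allocated)


print(blocks_retained(20))  # 2: the snapshot's copy + the current version
```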


It's unfair you've been downvoted. This is an excellent question and the responses were likewise excellent.

We need to encourage more of this :)


sanoid is a popular tool among homelab users for scheduling snapshot creation and deletion.



