
I know that ZFS is cool and LVM isn't, but I literally just finished repairing my LVM-based home NAS, and it left me with a good feeling about LVM. Overall, a stack of md, LVM and XFS is a lot more complicated than ZFS, but each piece is more understandable in isolation.


ZFS is a filesystem which does waaaay more than LVM does as a container. It's not just that it's "cool". Rapid filesystem snapshots, checksums, dedupe, do some research and you'll see why it's recommended.


LVM does snapshots & checksums. Dedupe and compression require huge amounts of memory and cause problems with databases. I've used ZFS before, and I switched back.


Since when does LVM do block-level checksums and recovery?

Dedup uses a ton of memory, and has a lot of "please don't do this unless you really know what you're doing" flags, but the compression is basically free.


Since forever. A parity stripe is a checksum. To use it for data integrity, perform a regular scrub, just like recommended practice for ZFS.


I have a ~8TB NAS on an Atom C2758 system. I've loaded less than 4TB so far and FreeNAS is really getting on my nerves. Am I "protected" from a single drive failure with LVM? Can I grow the storage size by swapping out one drive at a time? If so, I might just go to Debian and be done with it, although FreeBSD 10's bhyve hypervisor looks really good.


No, LVM on its own doesn't provide any protection; you need to layer mdadm below LVM for that.

Even with mdadm, you don't get anywhere near the protection ZFS provides. Because ZFS checksums data pervasively, it handles the bit-rot and corruption of a dying disk much better than the traditional RAID mdadm provides. For example, if your disks are mirrored or in a parity RAID and a disk returns bad data without reporting a read error, mdadm will pass that data straight back to the OS. Since the data isn't checksummed, there is no way for it to know it should have read from the mirrored disk or the parity drives instead.
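A minimal sketch of that layering, with hypothetical device names (four disks in RAID-6, LVM and a filesystem on top):

```shell
# Build the array first; mdadm provides the redundancy layer.
mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sd[b-e]

# Put LVM on top of the array for flexible volume management.
pvcreate /dev/md0
vgcreate nas /dev/md0
lvcreate -L 4T -n data nas

# Filesystem of your choice on the logical volume.
mkfs.xfs /dev/nas/data
```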


- as of RHEL 6.3, LVM supports raid4/5/6 without mdadm. It has supported raid1 (mirroring) and raid0 (striping) for much longer.

- any LVM or mdadm mode with parity contains a functional checksum. To use it for data integrity, do a regular scrub. You should be doing a regular scrub with ZFS anyways, so ZFS's checksum on read doesn't add much except for slowing things down.
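For the md case, a scrub is kicked off through sysfs (md0 and vg/lv are placeholder names):

```shell
# Ask the md layer to read and verify every stripe.
echo check > /sys/block/md0/md/sync_action

# Progress shows up in the usual place.
cat /proc/mdstat

# After the scrub, a non-zero count here means inconsistencies were found.
cat /sys/block/md0/md/mismatch_cnt

# Roughly equivalent scrub for an LVM RAID logical volume:
lvchange --syncaction check vg/lv
```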


> any LVM or mdadm mode with parity contains a functional checksum. To use it for data integrity, do a regular scrub.

That doesn't work. Scrubbing the RAID can detect errors, but when one is found, the block layer has no idea which copy is the correct one. I haven't verified this for LVM, but at least for mdraid, Linux explicitly makes no attempt to recover a 'correct' block even in cases where there is more than one copy. It just picks a winner arbitrarily and overwrites the other versions. You still want to scrub for the error detection.


I've seen it work. It knows which block has the error because the disk reported the error. It then rewrites the sector with the correct data, the disk moves the sector, you see "read error detected, corrected" or some such in your kernel logs.


Thanks. I'm hanging out for FreeNAS 10, which seems like it will solve a lot of things.


What's the problem?


The FreeBSD 9 base makes it really hard to mount ext4 drives (no FUSE until 10), the security patching regime relies on upgrading the whole rootfs (see the SSH client vuln today) and I'm in unfamiliar territory on a BSD having used Linux for the last decade.


I am running ZFS on Linux and it is working great for me. ZFS on Linux is considered production ready[0]. A lot of the showstoppers, like being able to boot from ZFS, have been worked out.

[0] - https://clusterhq.com/2014/09/11/state-zfs-on-linux/
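For anyone curious what getting started looks like, a minimal mirrored pool on hypothetical disks:

```shell
# Two-disk mirror; ZFS handles both the RAID and the filesystem layer.
zpool create tank mirror /dev/sdb /dev/sdc

# lz4 compression is cheap enough to leave on by default.
zfs set compression=lz4 tank

# Periodic scrubs verify every block against its checksum.
zpool scrub tank
```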


Hear, hear for ZFS on Linux as well. My home server is built in one of these http://www.u-nas.com/product/nsc800.html with a C2550D4I motherboard and 8 4TB drives in RAIDZ2. Runs great, and I've got the drives configured for spin-down when not in use to conserve power.

It's the successor to the 20-disk system I setup while I was still at my parents house, though that's a lot noisier (but fortunately lives in the basement service room) - downside is it's all based on 1.5 and 2TB disks, upside is RAIDZ3 is really nice to have.


In my experience dedup in ZFS performs horribly on hard drives.


I thought the problem was that it required an epic amount of system RAM to be efficient? It seemed a bit counterproductive to me to try to save some cheap disk space by buying hundreds of dollars of RAM.
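Back-of-the-envelope, using the commonly cited figure of roughly 320 bytes of dedup-table entry per block and ZFS's default 128 KiB recordsize (both are rules of thumb, not exact numbers):

```shell
# ~100 GiB of data at the default 128 KiB recordsize:
data_bytes=$((100 * 1024 * 1024 * 1024))
blocks=$((data_bytes / (128 * 1024)))   # ~819,200 blocks

# At ~320 bytes per dedup-table entry:
ddt_bytes=$((blocks * 320))
echo "$((ddt_bytes / 1024 / 1024)) MiB for the dedup table"
```

That comes out around 250 MiB in the best case; small files, snapshots, and a smaller recordsize push the block count (and the table) up fast, which is where the multi-gigabyte RAM recommendations come from.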


I gave it much more than the already-high recommended amount of RAM. At one point I tested something like 6GB of RAM for 100GB of data.

As far as I know dedup scatters small data chunks by hash across the disk. Absolutely awful performance when your seek times are non-zero. I was looking at speeds in the single digit megabytes per second. Compared to saturating things just fine with dedup off.

I gave it a chunk of SSD for L2ARC and that didn't help either, and it never wrote more than a few hundred megabytes of data to it.

Currently I'm doing out-of-band deduplication on btrfs and it works great. Dedup uses the same copy-on-write mechanism as snapshots do, and causes zero problems.
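One common tool for that kind of out-of-band dedup on btrfs is duperemove (assuming that's the approach meant here; the path is an example):

```shell
# Scan a directory tree, hash extents, and submit duplicate ranges to the
# kernel's dedupe ioctl; -d actually performs the dedupe, -r recurses,
# and the hashfile caches results between runs.
duperemove -dr --hashfile=/var/tmp/dupehash /mnt/data
```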


FWIW, DragonflyBSD HAMMER1's dedup runs well on small machines (2GB) without hiccups.



