
My first ever large (> 4TB) ZFS pool is still stuck with dedup. It's a backup server, gets about 2x with deduplication.

At the time, it was the difference between slow and impossible: I couldn't afford another 2x of disks.

These days, the whole pool would fit on a portable SSD I could carry in my pocket.

Careful, file-based dedup on top of ZFS might be more effective.

Small changes to single, large files see a real advantage with block-based deduplication. You see this in collections of disk images for virtual machines.
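A toy sketch of why that is: split two versions of an image into fixed-size blocks and hash each, roughly the way ZFS keys its dedup table on per-record checksums. A one-byte change invalidates only the block containing it, while a file-level tool would see the two versions as entirely different. Block size and names here are illustrative, not ZFS internals.

```python
import hashlib
import random

def block_hashes(data, block_size=4096):
    """Hash fixed-size blocks, roughly like ZFS records."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]

# Two versions of a 256 KiB "disk image", differing by a single byte.
rng = random.Random(0)
image_a = bytes(rng.randrange(256) for _ in range(256 * 1024))
image_b = bytearray(image_a)
image_b[100 * 1024] ^= 0xFF  # one-byte change in the middle

hashes_a = block_hashes(image_a)
hashes_b = block_hashes(bytes(image_b))
shared = sum(1 for x, y in zip(hashes_a, hashes_b) if x == y)
# 63 of the 64 blocks still deduplicate; at the file level, overlap is zero.
```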

You might see that in database applications, depending on log structure. I don't know, I don't have that experience.

For most of us, file-based deduplication might work out better, and is almost certainly easier to understand. You can come up with a mental model of what you're working with: successive collections of files.

Even though files are just another abstraction over blocks, it's an abstraction that leaks less without block-level deduplication running underneath it.
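The core of a file-based approach is simple enough to sketch: hash whole files and group the duplicates. This is a minimal illustration, not any particular tool; real utilities such as rdfind or jdupes also compare sizes first and then hardlink or reflink the matches.

```python
import hashlib
import os

def find_duplicates(root):
    """Group files under `root` by content hash; return groups with more than one file."""
    by_hash = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MiB chunks so large files don't load into memory at once.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            by_hash.setdefault(h.hexdigest(), []).append(path)
    return {digest: paths for digest, paths in by_hash.items() if len(paths) > 1}
```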

I haven't used a combination of encryption and deduplication. That was Really Hard for ZFS to implement, and I'm not sure how meaningful such a combination is in practice.


