OpenZFS Merged to FreeBSD (freebsd.org)
364 points by swills on Aug 25, 2020 | 180 comments


It is amazing that two disparate OS communities can take a shared codebase and integrate it deeply into their structure.

There are many places where it would be almost impossible to achieve this kind of sharing. Witness the 'compatibility library' work that exists to avoid ABI black holes when running one platform's code on another platform.

BTW, I have been able to carry a FreeBSD ZFS LUN on a NetApp and mount it on Debian ZFS, and vice versa, since I started using ZFS. What we got here is better code alignment; but, with qualifications, ZFS has always been a common platform (with fringe differences).

I never ran illumos or Solaris, so I cannot comment on how well it would have worked there.


Brian Behlendorf and Matt Macy are both fantastic engineers. Other people contributed, but this is ultimately the kind of thing that must be led by a small number of (in this case two) extremely senior engineers working together for a common outcome: with the raw technical ability to get everything off the ground and prove it works, and the refined leadership skills to stick around until it's actually done!


Matt (Kip) Macy also showed "refined" leadership when he

> told workers in September 2006 to cut the beams that supported (his tenant's) apartment's floor... They also shut off Morrow's electricity, cut his phone line and had workers saw a hole in his living room floor from below, prosecutors said.

So yes, he has really refined leadership qualities!

https://www.sfgate.com/bayarea/article/S-F-landlords-charged...


illumos concern trolling; this has nothing to do with engineering leadership and is just lashing out with a bad-faith personal attack because you are friends with Joyent people. Being unable to acknowledge that he served his sentence and is reformed indicates you are just hung up on a failed project that is as useful as spilled milk.


Somebody who continues to blame the victims for their troubles is NOT reformed.


Like Bryan Cantrill?


Looks like the CoC mob has failed to run off a capable developer... for all we know that tenant might have one day contributed code - this is a gross violation of "applicable spaces" clause!


Whatever does this have to do with codes of conduct? I mean, maybe software projects should have CoCs that say "do not do things likely to kill your tenants", but normally they're concerned with conduct within a project, and in any case "the CoC mob" seems to indicate you're somehow arguing against having and enforcing CoCs. Is there some background here that makes your comment make more sense than I'm currently able to get out of it?


> Whatever does this have to do with codes of conduct?

Whatever does a rental property issue have to do with OpenZFS? What about political contributions? Are you genuinely unaware of all the crazy witch hunts, totally unrelated to the code base, that have occurred as a result of this CoC push? Have you ever heard of ESR? http://esr.ibiblio.org/?p=8609


> Whatever does a rental property have to do with OpenZFS?

Not much (though I think it's fair to say that when someone is praised for their "refined leadership" it's talking about more than just the quality of their code). In any case, "that other guy also said something irrelevant" isn't really to the point.

> What about political contributions?

What about them?

> Are you genuinely unaware of all the crazy witch hunts [...] this CoC push?

My confusion is because so far as I can see "this CoC push" has nothing to do with anything here, it's just something you want to talk about. The OP isn't about CoCs. The thread you replied to isn't about CoCs. The comment you replied to isn't about CoCs. Kip Macy's mistreatment of his tenants isn't about CoCs, or caused by CoCs, or prohibited by CoCs. The OpenZFS merge isn't about CoCs. (Well, quite possibly the OpenZFS project and/or the FreeBSD project have them, but so what?)

> Have you ever heard of ESR?

Yes. I don't look to him for political guidance, though.

(Two other things, besides codes of conduct, that have nothing to do with anything in this discussion aside from your apparent wish to talk about them: political contributions, and Eric Raymond.)


To be fair, I was originally replying to the compliments about how amazing Kip Macy is, not to OpenZFS. If the parent comment had been about OpenZFS, I wouldn't have replied. I thought posting about Macy's multiple felony counts was a solid response about how amazing he is at leading folks. Also, gtfo with your anti-CoC outrage; he didn't get caught using the wrong pronoun in a code comment.

> Kip Macy is accused of three felony conspiracy charges, three burglary charges, two stalking charges, two grand theft charges and one felony count of shutting off service


That is why Kip goes by his middle name now. He also moved from California to Washington.

This was big news in SF Bay Area when it happened. It was FreeBSD's Hans Reiser moment.

Another gem, Kip kicked one tenant in the chest while he was sitting on the couch and threatened him with a gun.

I am curious if he ever reimbursed his parents after they drained their retirement savings and offered their house as collateral to pay for his $500k bail, which was forfeited after Kip and his wife Nicole decided to skedaddle to Europe instead of appearing in court. Kip continued to work on FreeBSD while on the lam in Italy with full knowledge of the core team, with many members actually defending his behavior. Kip and Nicole also never showed for the civil trial and lost by default.

The original plea deal offered by the DA, before they absconded, was 6 months in county jail with charges reduced to misdemeanors. Instead, both were sentenced for 4 felonies and got 4.5 years (reduced for good behavior), with Kip ending up in San Quentin -- a maximum security prison housing California's death row.

I remember ABC News did an interview with Kip while he was incarcerated and he blamed his tenants for all his troubles. At least he can no longer (legally) possess a gun since he is a convicted felon.


This is indeed Hans Reiser-level drama. After reading about their shenanigans I can't believe anyone would choose to work with such a person.

https://abcnews.go.com/US/exclusive-landlord-hell-defends-te...


So that is what Kip was doing in Italy while being on the lam from the law. He is really dedicated. I wonder if San Quentin allowed him to code while he was incarcerated.


Well, you're citing NetApp, ZFS, FreeBSD and Debian. Those are all top notch.

In all honesty I can't understand why anyone would use anything other than ZFS nowadays for important data.


> In all honesty I can't understand why anyone would use anything other than ZFS nowadays for important data.

At a personal level, I opted out of it based on simplicity and familiarity. If I have a data drive that I need to move and mount in another box, I do not want to mess with the complexities of supporting ZFS to get at that data, or risk that the only OS I have available can't read it.

Again, this is a non-commercial environment, but I consider my data important as well. I've decided to go the route of keeping multiple backups of my data spread across multiple drives. I've already experienced bitrot several times over the years, but for me, this approach is more practical than relying on ZFS.


I don't mean to criticize your choices, but I'll note here that if you're willing to avoid enabling the latest and greatest ZFS features, bringing up a machine that can mount your disks can be as simple as booting from Ubuntu install media and having internet available so you can run "apt install zfsutils-linux".

(And somebody's probably going to correct my ignorance and point out that beyond some release 'x', the ZFS stuff is already built-in.)
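In concrete terms, recovery is roughly this (a sketch; "tank" stands in for whatever you named your pool):

  # from an Ubuntu live session
  sudo apt update && sudo apt install zfsutils-linux

  # scan attached disks for importable pools
  sudo zpool import

  # import read-only if you only need to copy data off
  sudo zpool import -o readonly=on tank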


Is it that easy now? I actually was in this situation about 3 years ago and found I could not simply mount the ZFS drive - and I cannot recall exactly why. I think it had something to do with the pool…configuration(?), and I went down a rabbit hole trying to sort it out. Fortunately in this case the tradeoff in time was more valuable than the data (it was a system drive on a small SSD) so I had the option to ditch my efforts with little to lose. But, I did spend a handful of hours on it with no traction.


I was able to transfer a ZFS pool from a Linux server to a FreeBSD server with different hardware simply by moving the physical disks to the new server.
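For anyone curious, the whole move was roughly this (a sketch; "tank" stands in for my actual pool name):

  # on the old Linux box
  zpool export tank

  # move the disks over, then on the FreeBSD box
  zpool import tank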

For me ZFS has been rock solid, even with power cuts while running VMs (Linux and Windows) and a scrub.

I even managed to steal away enough RAM from a server that ZFS produced a stack trace in dmesg; no data corruption, even on running VMs.


Yes it is :)


I've used FreeBSD install media to recover too. You can get a shell and there's no need to download other stuff.

I haven't done this to recover ZFS as written by Linux or Solaris/illumos, not sure how well that works but wouldn't be surprised if it does.


@znpy

Why aren't people just using ZFS (or BTRFS)?

Well...

Maybe because XFS is well proven in certain critical environments, especially ones which might involve frequent manipulations of large quantities of small files (think mailserver or database). Indeed, some developers will tell you in the docs that they only support XFS or ext4 and that you're on your own if you choose something else.

Maybe because ZFS and BTRFS still have their bugs, small and big (hello RAID5 in BTRFS ;-) ) and can be heavily dependent on particular implementations.

Maybe because in a virtualised environment, you lose many of the benefits of ZFS or BTRFS (they like to see the raw disk, not some abstract notation).

Maybe because the combination of XFS/ext4 plus LVM is more than good enough for most people as that route still provides snapshots etc.

Maybe because BTRFS needs babysitting[1] (don't know about ZFS, I suspect it might).

Maybe because not everyone feels the need to "keep up with the Joneses".

[1] https://www.youtube.com/watch?v=8YUC-r1aXAc


As for your reliability concerns: my experience with ZFS has been that it is amazingly reliable.

If I had to pick one place to put precious data that I needed to access for 10-20 years, ZFS would be it. Of course, I wouldn't put it in one place, so if I picked places to put that data, ZFS would definitely be one of them.

I've been using ZFS for over a decade, both for my personal storage systems and for work (largely backup servers), and despite heavy use and abuse, I've never once had data loss while using it. I've had some white-knuckle times, but in the end ZFS was able to recover all the data.

I say "abuse" because I was able to write a stress test program based on my usage that uncovered several ZFS+FUSE issues early in that port.

Just a data point.


Likewise. Been using ZFS for over 10 years. Been through intermittent controller faults, backplane faults, cabling faults on a variety of ZFS, ext, and NTFS systems.

The one time I lost data from a ZFS system (which was a faulty controller silently writing garbage to multiple disks in the array), ZFS told me exactly which few files were uncorrectable, and I was able to restore them from backup.

Not one single second of doubt as to which storage system I want my data on.


That you're equating ZFS and BTRFS, and tarring ZFS with the issues of BTRFS, clearly shows how much you don't know about it. ZFS was production-stable more than 10 years ago.


This is the main point most people just refuse to accept out of blind religion: ZFS was production-stable 10 years ago and is production-stable TODAY.

With btrfs you're still waiting for stuff to be properly implemented.


> With btrfs you're still waiting for stuff to be properly implemented.

Red Hat gave up on waiting, and is removing btrfs as a supported option in RHEL. Mostly, I assume, because the btrfs team has yet to ship something anyone in enterprise would be comfortable betting their customers' data on.


Red Hat is not a company that "just waits" for core components; it hires maintainers and contributes strongly where needed. That's what happened with the XFS adoption.


Red Hat is actively working on Stratis too (which I'm excited for): https://stratis-storage.github.io/


I think the biggest issues with using ZFS as a Linux user come down to:

A) ZFS-on-Linux was always behind BSD ZFS until recently.

B) The license issue, so you have to compile it as a separate module using DKMS, which means having a compiler, the Linux headers, etc. Nowhere near as low-friction as just choosing ext4 or btrfs.


> ZFS-on-Linux was always behind BSD ZFS until recently.

Sure, but the sort of people who need ZFS are the same sort of people who are doing their storage on a SAN, and so the sort of people for whom choice of OS (for the SAN itself) can be contingent on choice of storage layer, rather than the other way around.

This is half the reason ZFS-on-Linux didn't have much momentum: most ZFS deployments were happy to just run BSD on their SAN, then serve SAN volumes out over the network (or into a hypervisor cluster) for Linux systems to consume.


> ZFS-on-Linux was always behind BSD ZFS until recently.

FreeBSD was never really working on ZFS the way ZOL was. The whole reason for the switch is that the main contributor to the upstream FreeBSD used stopped working on ZFS.

ZOL has had more features for a while now as well.


I think there is a reason the part you quoted was written in the past tense. The situation has changed. But these things have momentum and people might have adopted FreeBSD years ago because of the former status quo, where FreeBSD had it integrated in the default install and Linux distros did not.


I think it's a disservice to even mention btrfs and zfs in the same breath. ZFS is a battle tested, enterprise ready, production grade filesystem and lvm.

BTRFS is a science experiment by comparison. They tried to copy ZFS features because Oracle won't release it under the GPL, and did a fairly poor job of it. The fact that we're this many years later and their RAID5 code still has total-data-loss bugs pretty much sums up btrfs.


Anecdotal, but I had a power outage at work last year and my relatively freshly installed 18.04 system, for which I chose btrfs as the root filesystem, was completely corrupted.

I managed to extract most of my data to another disk on another system, but it was pretty shocking. I haven't seen unfixable filesystem corruption like that since ext2 in 2001. I'd completely forgotten it was a thing, but there I was in 2019 trying to use a second linux box to salvage my data.

Of course, whenever I mention this the btrfs zealots always come out of the woodwork saying "you must not know what you're doing if you broke btrfs" or "you obviously didn't really try if you couldn't get it fixed" or a half-dozen other excuses, but in the end I've just completely given up on btrfs. It's just not worth the hassle.


Do people really create zpools within a VM for more than playing around? The best practice is to point a VM at a zvol the way you might use LVM on bare metal to back storage for VMs.
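In other words, something like this on the host (a sketch; names are made up):

  # create a 20G block device for the guest,
  # exposed as /dev/zvol/tank/vm0-disk0
  zfs create -V 20G tank/vm0-disk0

The guest then formats it with whatever filesystem it likes, while the host still gets ZFS snapshots, checksums, and send/recv of the zvol.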


Right. The storage pool for VMs should be ZFS (assuming you're going to use ZFS), while the VMs should use something else, unless you're doing something specific that needs ZFS inside the VM. ZFS is engineered to almost turn a system into a dedicated file server, so use it for its strengths appropriately.


Big thumbs up for xfs!

It's just... snapshots and replication (zfs send) are so damn easy in ZFS...
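For the record, a full replication round trip is about this much typing (a sketch; hostnames and dataset names are made up):

  zfs snapshot tank/data@monday
  zfs send tank/data@monday | ssh backupbox zfs recv backup/data

  # next time, send only the blocks that changed
  zfs snapshot tank/data@tuesday
  zfs send -i @monday tank/data@tuesday | ssh backupbox zfs recv backup/data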


I would avoid lumping BTRFS in with ZFS. The two are very different.

> Maybe because XFS is well proven in certain critical environments, especially ones which might involve frequent manipulations of large quantities of small files (think mailserver or database). Indeed, some developers will tell you in the docs that they only support XFS or ext4 and that you're on your own if you choose something else.

XFS is great, except it doesn't do any of the things that ZFS does well. For example: snapshots (to the fantastic extent that ZFS does), multi-volume support, shared storage pooling, etc. The only thing XFS offers that ZFS doesn't is support for reflink copies on Linux (i.e. "cp -R --reflink=always src dst" for near-instantaneous copies of large directories with no extra disk usage).

> Maybe because ZFS and BTRFS still have their bugs, small and big (hello RAID5 in BTRFS ;-) ) and can be heavily dependent on particular implementations.

ZFS has been, in my experience, amazingly bug-free over the last however long I've been using it (5 years in production, more personally). ZFS has working and reliable RAID support; only BTRFS is lacking it.

BTRFS, on a stock Ubuntu install with default settings, corrupted after a power outage, and could only recover most of my data, but not all. I haven't had filesystem-related data loss since ext2 in 2001.

Also, for everyone except people running Solaris, there is (now) effectively one implementation, and it works fine.

> Maybe because in a virtualised environment, you lose many of the benefits of ZFS or BTRFS (they like to see the raw disk, not some abstract notation).

You still get point-in-time snapshots on a copy-on-write filesystem, which can support different block sizes per subvolume, on-the-fly compression, and incremental or full exports, all from a shared, extendable storage pool. Those alone are huge enough to justify it for me.

ZFS wants to see the underlying disks so that it can make sane judgements about block allocation, alignment, etc. On a VM, that shouldn't matter, because your host filesystem/SAN/etc. should be doing that anyway.

> Maybe because the combination of XFS/ext4 plus LVM is more than good enough for most people as that route still provides snapshots etc.

LVM snapshots are awful from a performance standpoint.

In ZFS, when you create a snapshot, any new data is written to new blocks (read the old block, make the change, write it to the new block), so your snapshot points to the old blocks on disk.

From what I can tell[1], LVM snapshots mean that when you write data to a block, LVM reads that block, writes it somewhere else, and then updates the original block in-place. This makes every write a synchronous write, because you have to write the old data to its new block, sync to make sure it actually gets stored, and then do your new write.

BTRFS snapshots cannot be recursive (i.e. you cannot snapshot /data/ and /data/gitlab and /data/svn and /data/backups atomically). They argue that this is a feature, but I would consider it a massive bug. LVM snapshots by their nature are not recursive because LVM has no concept that I can find of nesting.
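For contrast, in ZFS the atomic recursive case is a single flag (a sketch, assuming nested datasets under tank/data):

  # snapshots tank/data and every dataset beneath it, atomically
  zfs snapshot -r tank/data@pre-upgrade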

> Maybe because BTRFS needs babysitting[1] (don't know about ZFS, I suspect it might).

ZFS, as far as I can tell, does not need babysitting. I used it for storing backups at my last job; we had a QNAP NAS exporting over iSCSI, ZFS was using the iSCSI storage, and we wrote data to it. Nothing ever went wrong, and I literally never checked on it unless the power went out or something of the sort. It was three years before I ever got around to setting up monitoring for it, because literally nothing ever went wrong so I completely forgot that there was anything to monitor. Eventually I did set it up, but it never went off. Literally set it and forget it.

In summary: ZFS is fantastic, and basically never needs to be taken care of. It does its thing and it works great, and that's it. BTRFS has corrupted itself arbitrarily after a power outage (which I thought we'd solved with ReiserFS back in 2001), LVM snapshots are slow and awful, and the tooling for LVM and BTRFS is a usability disaster while also losing a lot of features that ZFS has (like on-the-fly compression, block deduplication, etc.).


ZFS is great, but CoW can make performance bad in some situations, at least in theory. So there are some situations where it's better to use XFS (without using LVM snapshots all the time).


If I remember correctly, LVM snapshots were deemed to lower the performance of a system if you had lots of them - is that still a thing?


Thin LVM snapshots solve that.


Did you include BTRFS here just to mention all the weak points of BTRFS and act like they're also true for ZFS? Because that's not true at all.

> Maybe because in a virtualised environment, you lose many of the benefits of ZFS or BTRFS (they like to see the raw disk, not some abstract notation).

ZVOLs exist... You can snapshot, clone, and send/recv one just like a dataset. You can fine-tune it like any other dataset. ZVOLs work just fine for VMs.

> Maybe because ZFS and BTRFS still have their bugs, small and big (hello RAID5 in BTRFS ;-) ) and can be heavily dependent on particular implementations.

Everything has bugs. Again, you put BTRFS here just to drive your point, as if whatever is true for BTRFS must be true for ZFS?

> Maybe because BTRFS needs babysitting[1] (don't know about ZFS, I suspect it might).

Same as above.

> Maybe because the combination of XFS/ext4 plus LVM is more than good enough for most people as that route still provides snapshots etc.

Very different kind of snapshots...

I've been using ZFS on my desktop and laptops for over a decade now. You gotta make a really kick-ass FS for me to switch off ZFS, even for desktop.


> In all honesty I can't understand why anyone would use anything other than ZFS nowadays for important data.

In datacenters many probably do; however, most home/SOHO NAS machines are severely limited wrt memory and CPU power, which could be a limit. My self-assembled NAS uses an Atom board to keep power requirements low since it stays always on, and I can't complain about its performance; however, that CPU doesn't support more than 4GB RAM, which is considered the minimum to properly use ZFS (NAS4free). Many cheap commercial NAS boxes used in small offices are even more limited. I wonder if there are any technical reasons preventing ZFS from operating in (or being adapted to) low-memory environments.


> 4GB RAM which is considered the minimum to properly use ZFS

This is mostly a myth with origins in the high-memory requirements of ZFS deduplication (which few should use, anyway). Of course, more memory allows for more caching, but that’s true of any filesystem on a “modern” OS.
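If memory pressure is the concern, the ARC can also be capped explicitly (a sketch for ZFS on Linux; the 1 GiB value is just an example):

  # persistent: /etc/modprobe.d/zfs.conf
  options zfs zfs_arc_max=1073741824

  # or at runtime
  echo 1073741824 > /sys/module/zfs/parameters/zfs_arc_max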


Yeah, I only have 4GB of RAM on my little HP MicroServer that I use as a NAS with ZFS. The only issue I seem to have is that ZFS encryption seems to use more CPU than LUKS encryption, but it's still not too bad (the AMD Turion(tm) II Neo in it doesn't have AES instructions).

I might upgrade it to something with a better processor because I do want to run de-duplication on my 14TB drive at some point.


I have used ZFS with 1 GB of RAM. It worked fine, it just wasn’t able to cache stuff to memory.


I wonder if it would be useful to have an Optane for the memory-hungry portions of ZFS, particularly the dedup tables, especially to be able to run it on very small, efficient machines.

My work backup server has "only" 32GB of RAM which is nowhere near enough to do dedup, even though there's a ton of stuff that could be deduplicated. Looks like it needs at least 64GB RAM to dedup, and that machine is maxed out.

But at home I'm in a similar situation to the parent. I'd like to set up a NAS, but I'd use it mostly for backups and long-term storage, and I don't want a big machine sitting idle all the time. A tiny Atom with Optane for cache and DDT might be ideal, in my thinking.

But for most of the storage use Backblaze is probably the right answer.


Would the Optane in this case be used as an L2ARC to handle part or all of the dedupe table? I'm trying to brush up on my ZFS terminology at the moment...


Currently, AFAIK, the dedup table (DDT) has to reside entirely in RAM. The L2ARC is a disk-based cache of filesystem blocks (a pretty clever design, worth reading up on), but a very different thing from the DDT.

You could definitely put the L2ARC on Optane, but Optane is probably better suited to the ZIL (ZFS Intent Log, basically the journal so writes can be quickly acked) if you aren't using literal NVRAM.
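Both are one-liners to attach (a sketch; device names are made up):

  # dedicated log device (SLOG) for the ZIL
  zpool add tank log /dev/nvme0n1

  # L2ARC cache device
  zpool add tank cache /dev/nvme1n1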

Breaking the DDT out so it could exist on an Optane isn't currently available (again, AFAIK, but I recall hearing about some changes in that department; I don't recall specifics and may be misremembering).

I've basically never had a machine big enough to comfortably use dedup except on trivial loads. My backup boxes, where I'd really like to use it, have all fallen over when enabling dedup because of the RAM requirements.


ZFS 0.8 supports a feature called "allocation classes" that allows the DDT to live on a "special" storage device (aimed at NVMe use) rather than in RAM. It also supports putting metadata and small blocks on the special device.

https://forums.servethehome.com/index.php?threads/zfs-alloca...
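From the linked thread, the setup is roughly this (a sketch; device names are made up, and the special vdev should be as redundant as the pool):

  # mirrored special vdev for metadata (and the DDT,
  # absent a dedicated dedup-class vdev)
  zpool add tank special mirror /dev/nvme0n1 /dev/nvme1n1

  # optionally steer small file blocks there too
  zfs set special_small_blocks=32K tank/data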


I've been running ZFS on a 2GB machine with little I/O (it mostly receives snapshots) and have had no issues so far.

Generally speaking, I will go with lower-spec machines until I run into some actual problem. I haven't so far.


Google still uses ext4 for its servers, while Facebook uses mostly btrfs nowadays.


Google has its GoogleFS on top of ext4; in theory you could even go 'without' an underlying FS, a bit like BlueStore from Ceph.


This does not sound right to me.

If you're talking about GFS, that's been gone for a decade, and was not a filesystem as far as the kernel was concerned.


That's why I brought up the Ceph example, but it counts for databases etc. too; actually, the early databases all had their own fs/data structures on the platter.


Maybe my understanding of ZFS is wrong, but what about storage requirements? As far as I understand ZFS, you take snapshots of all versions of all files and that can quickly lead to a lot of space usage.

But maybe there is also an automatic cleanup tool that deletes old snapshots after some time?


You take snapshots when you want to take snapshots, and a snapshot allows you to view the data as it was at the time the snapshot was taken. Not entirely unlike a git commit.

The snapshot refers to the storage blocks/records on disk, not files as such, an important distinction since ZFS can expose block storage (zvol) as well as a "regular" file system.

Since ZFS is copy-on-write, the only storage you pay for with a snapshot are the blocks that have changed since the snapshot was taken (plus a little overhead). Thus for data that does not change much, a snapshot is almost "free".

The blocks are reference counted. Once a snapshot is deleted, ZFS decreases the reference count of the blocks referenced by the snapshot. Any block with a refcount of zero is considered free and thus that space is reclaimed. This happens when the block has changed since the snapshot was taken and there were no other snapshots referencing that block.

ZFS itself has no automatic deletion of old snapshots AFAIK, but there are tools built around ZFS that allow for periodic snapshotting and cleanup.
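The day-to-day lifecycle is short (a sketch; dataset names are made up):

  zfs snapshot tank/data@before-cleanup   # point-in-time view, nearly free
  zfs list -t snapshot                    # snapshots and the space they hold
  zfs destroy tank/data@before-cleanup    # refcounts drop, orphaned blocks freed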


Side-topic: the way you described how blocks and snapshots work is _exactly_ how git works, with references and blobs: as long as a blob can be reached by a reference (branch or tag), it will not be garbage-collected.

Turns out a DAG, as is used in both ZFS and git, is a good data structure in many use cases.


Snapshots are something you control - they don't happen by themselves.

Furthermore, you only "pay" their storage cost for data which actually changes. On a 2TB volume, last snapshotted 1 week ago, if only 5GB have changed since then, that's the only storage overhead (ignoring for simplicity the folder structure itself).

Also, if a single data block changes 20 times since its last snapshot, you still only "pay" storage costs for its current version + the one sitting in the last snapshot.
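You can see exactly what you're "paying" per dataset (a sketch; the columns are real zfs output, the values are made up to match the example above):

  $ zfs list -o space tank/data
  NAME       AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
  tank/data  1.80T   105G     5.00G    100G             0B         0B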


It's unfair you've been downvoted. This is an excellent question and the responses were likewise excellent.

We need to encourage more of this :)


sanoid is a popular tool for homelab users to schedule snapshot creation/deletion.
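A minimal sanoid policy looks about like this (a sketch from memory; check the sanoid docs for the exact keys):

  # /etc/sanoid/sanoid.conf
  [tank/data]
      use_template = production

  [template_production]
      hourly = 36
      daily = 30
      monthly = 3
      autosnap = yes
      autoprune = yes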


Hats off to anyone who involved there. Amazing achievement indeed.

Edit: change heads up to hats off


You probably mean “hats off to anyone involved” — generally meaning to give praise. “Heads up” would give the sentence more a meaning of “warning to anyone involved”.

Apologies if I’m being presumptuous or if this is just a late night typo. My wife speaks English as a second language so I know how confusing some of these idioms can be.


Thank you for the correction. I wasn't really sure about it, but I typed it anyway, hoping for a correction from someone.


I had to look up the "heads up" phrase a few years ago and IIRC it is a warning from baseball.


I would like to see NetBSD follow suit, but I don't know if they have the manpower to do so.


FreeBSD was starting to lack features compared to ZFS on Linux (ZOL); for example, I'm not able to read encrypted datasets on FreeBSD, but I'm able to do so on macOS and Linux. More interoperability is very welcome.


From what I have heard, the amount of work needed in porting native encryption to the original FreeBSD port of ZFS was actually what led to this project.


You can; just compile sysutils/openzfs.
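That is (a sketch; package and port names as I remember them from the ports tree):

  # from packages
  pkg install openzfs openzfs-kmod

  # or from ports
  cd /usr/ports/sysutils/openzfs && make install clean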


Two?? Three. More!


FreeBSD used to need a kernel module called opensolaris.ko in order to load ZFS (not sure if that’s still the case). So whilst ZFS has been ported to many OSes, it still relies / relied on a paravirtualized Solaris kernel. Note: it’s not a critique; I love ZFS and use it to secure all my data.


"paravirtualized Solaris kernel" isn't really accurate - it's a collection of kernel interface shims. The whole thing is quite small, about 4KLOC on FreeBSD.


This is only peripherally related to this great news, and I apologize if this makes me a wet blanket to ask, but I figure someone here will know the answer:

I went back to read some ZFS roadmap documentation from the Before (Oracle) Times this winter, and was reminded why my outlook for ZFS was initially rosier.

I was super excited about ZFS because if and when it got the ability to add new hard drives to an existing volume, we would have something I've always wanted: a straightforward way to grow a filesystem without having to tear it down and rebuild it from scratch.

That was still listed in future enhancements, shortly before Sun started to fall apart. Does anyone know where we are on that? Did that functionality get painted into a corner by other design concerns?

It's in that category of things that I might not use that much, but family and customers could use the hell out of and it would simplify my life just knowing it exists. I feel we are all lesser for that never making it off of the drawing board.


I think it's happening soon! Been watching this for a while too... https://github.com/openzfs/zfs/pull/8853


This PR has been sitting for a long time :(


This capability is why I've always regarded ZFS as great for people who buy drives by the dozen, but a fairly poor choice for consumers and hobbyists. The ability to rebalance a btrfs array and switch to and from any of the RAID modes makes it vastly easier to cobble together useful redundancy from miscellaneous drives, or to recover from a drive failure without having to immediately go out and buy a suitably large replacement drive. If btrfs RAID5/6 hadn't taken so long to get to the current barely-usable-with-caveats state, I doubt ZFS could have caught on among home Linux users.


We need a consumer-grade NAS where you can push a button, pop out the crappiest drive, shove in a new drive, and after an hour it goes 'ding' and you are ready to go to the family reunion and take a crapload of pictures. Those are, of course, never going to be an all-drives-at-once affair like you illustrate.

I wish Synology or Drobo had invested in solving this problem for everyone instead of doing their own proprietary solutions. Drobo has a whole filesystem; Synology has a logical layer on top of RAID and/or btrfs to simulate this for some configurations of disks. In both cases, slapping a single higher-capacity drive into the array gets you no increase in disk capacity. So every nth upgrade you are looking at 2 sequential rebuilds.


“after an hour it goes 'ding'”

Most modern hard disks are too large/too slow for that. A write speed of 200MB/s translates to 12GB/minute, or 720GB/hour.

So, a 2TB hard disk (‘small’ for today) would already take close to three hours to fill, and that assumes you hit 200MB/s continuously.

For a 12TB disk, “after a _day_ it goes 'ding'” would be closer to the truth.

I also think that, for most consumers, storage in the cloud is a better option than a “consumer-grade NAS”.


With my Synology it was faster to back up the array, break the volume, recreate it, and copy the data back than it was to swap the 2nd of 3 disks I was trying to upgrade. The first ding was 6 hours, the second was on track for 48 hours, and I had another one to go. It shouldn't be like this.

I'm not convinced, in the specific case of a consumer system, that every track on every disk needs to be touched during a build/rebuild. It amplifies the read and write traffic at exactly the point where any traffic is a ticking time bomb waiting to take your data away (which is exactly what happened the first time I tried to upgrade the Synology - a drive started reporting bad sectors during the verification after rebuild, but before marking the array healthy).

In a production server you might be more uncomfortable with a drive array that experiences an amortized cost spread over every read or write operation, especially one that upticks every time your total storage exceeds the previous high water mark. But even there, I think I could make a reasonable if not airtight case for that as well.

Last conversation, I believe someone told me that the cost of resilvering a ZFS array is proportional to the contents and not the capacity. I don't know if that would be true in the case of the patch being discussed.

The best any of us can expect is for the costs to be similar to the kind of service disruption you get when adding or removing servers from a cluster that is using consistent hashing with m-way redundancy. There are always m machines involved in every write, and you are going to have to move 1/n to 2/n of the data every time n changes, and that may be as much as m/n of the reads and writes. But 2/5ths is a lot smaller than 3 reads, 1 write, 3 reads per block for 100% of the blocks in the volume. And I haven't seen a system that is anywhere near that minimum cost.


Where I live, "most consumers" are stuck with Comcast, which features a 2MB/s upload speed (at very best -- currently I'm getting <1MB/s) and a 1TB monthly transfer limit.

Cloud storage is a non-option.


I still disagree. Running with only cloud storage wouldn’t work, but a local SSD and cloud storage, why not?

For “most consumers”, I would think the data they create is mostly created on their phones (photos, videos) anyway, and I don’t see them syncing that to a local NAS instead of using the more convenient sync to the cloud.


I guess I don't know how much phone videos take up, but I know that when I was using my GoPro I was able to fill up several hundred GB in a few weeks. And that was with an ancient 720p one.

For photos, I suppose it would be good enough. Actually, perversely enough my phone does have unlimited data via its LTE connection. If only phones had the option to "download over cellular network only" instead of the reverse!


It looks like OpenZFS still has xattr support which is the one Illumos ZFS feature I always wanted on my FreeBSD systems. Hype! :D


Doesn't the 'old' FreeBSD ZFS have xattr too? I can see it in the options (but not tested):

zfs get all:

  zroot/var/tmp  xattr  off  temporary


It's listed in the manpage too, but as:

  xattr=off | on
  The xattr property is currently not supported on FreeBSD.


This is very awesome. It makes replication (via zfs send) between FreeNAS and Linux better. Native encryption should also make zfs cloud replication much simpler.
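Native encryption also enables raw sends, so the remote end never sees your keys (a sketch; names are made up):

  # -w streams the blocks still encrypted; the cloud box
  # stores them but cannot read them
  zfs send -w tank/secure@backup | ssh cloudbox zfs recv vault/secure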


How come Linux has licensing issues with ZFS, but the BSDs don't?


Because the CDDL (by design, as inherited from the MPL 1.x) is not compatible with the GPL, since both try to enforce their own copyleft without allowing relicensing (the MPL 2 avoids this thanks to an annex clause that allows you to treat everything as covered by the GPL instead). You can't cover the ZFS modules with the GPL in the final agglomerated result (i.e. kernel + module), so it's technically not allowed by the kernel license (see the difference between the LGPL and the GPL for more info). In practice no one has ever questioned this in court, so who knows what the final result would be.

FreeBSD uses a super weak permissive BSD license instead, so there is no conflict because it leaves whatever code you're trying to add to it alone, without trying to enforce any additional terms on it.


There is a potential GPL violation. Nothing in ZFS violates the BSD license.

The GPL violation is a matter of opinion. For the opinion to be settled, someone owning Linux copyright would have to sue Canonical and find out.

Canonical did a legal review, and they have some big-name open-source lawyers like Eben Moglen and Mishi Choudhary on their side: https://www.softwarefreedom.org/resources/2016/linux-kernel-... They see it as a non-issue. So much so that Canonical provides legal cover for all paying customers.


The ZFS license (CDDL) is (arguably) incompatible with GPL. This is not a problem for FreeBSD since it’s not using GPL.


The issues are caused by GPL, while BSD is, well, BSD licensed.


Since Linux came first (and was GPL licensed first), it's worth remembering that ZFS was intentionally made incompatible with the GPL via the "CDDL"


>ZFS was intentionally made incompatible with the GPL via the "CDDL"

That was said in one lousy YouTube comment without any explanation. Ubuntu has no problems with it... and I bet they have lawyers. I think it was in this vid:

https://www.youtube.com/watch?v=-zRN7XLCRhc

It's also crazy: nearly no distro has a problem shipping ultra-proprietary software (firmware), but has a big discussion about CDDL; no one wants ZFS in the kernel tree. Imagine Linus has control over it and has no clue what ZFS even is:

https://linux.slashdot.org/story/20/01/19/0059251/what-linus...


> It's also crazy: nearly no distro has a problem shipping ultra-proprietary software (firmware), but has a big discussion about CDDL;

It's the difference between "mere aggregation" and "derived work". The kernel+ZFS can be considered a derived work from both, so it has to obey both licenses (GPL2 and CDDL) at the same time; on the other hand, the firmware is completely independent from the kernel, so the licenses (GPL2 and the proprietary license) apply separately to each component.

> Ubuntu has no problems with it... and I bet they have lawyers

On the other hand, Fedora has problems with it as a kernel module (but they do ship it as a ZFS FUSE module, see https://fedoraproject.org/wiki/ZFS), and they do have lawyers (Red Hat's lawyers review all the licenses for Fedora, and they consider the CDDL to be incompatible with the GPL: https://fedoraproject.org/wiki/Licensing:Main).


>The kernel+ZFS can be considered a derived work from both

What?? No, why? Neither of those projects is derived from the other; ZFS is a module (on all platforms, I think).

>and they consider the CDDL to be incompatible with the GPL

https://zonena.me/2019/01/the-cddl-is-not-incompatible-with-...

That's the important part:

>You may distribute the Executable form of the Covered Software under the terms of this License or under the terms of a license of Your choice, which may contain terms different from this License, provided that You are in compliance with the terms of this License and that the license for the Executable form does not attempt to limit or alter the recipients rights in the Source Code form from the rights set forth in this License.

>To reiterate, executable forms of CDDL source code can be under ANY license you want.

That means your compiled ZFS module package can even be under GPL2 or 3 or beerware ;)


> Neither of those projects is derived from the other; ZFS is a module (on all platforms, I think)

Once compiled, the ZFS module is a derived work from both the Linux kernel and ZFS.

> > You may distribute the Executable form of the Covered Software under the terms of this License or under the terms of a license of Your choice [...] does not attempt to limit or alter the recipients rights in the Source Code form from the rights set forth in this License.

> That means your compiled ZFS module package can even be under GPL2 or 3 or beerware

I trust the Red Hat lawyers on this matter more than someone who is not a lawyer (from that page: "While it's true that I am not a lawyer [...]"). And from what I understand (I'm also not a lawyer), that clause in the CDDL applies only to the compiled result; but the GPL requires that, if the compiled result is distributed as GPL, the corresponding source code also be available under the GPL. So you cannot distribute your compiled ZFS module package under GPL2 unless the corresponding source code can also be distributed under GPL2.


I have always been skeptical of the FSF dogma that dynamic linking of a module or shared library into a GPL program's address space propagates the GPL into that module. The choice of inter-module communication technique (dynamic linking, sockets, etc.) shouldn't have any bearing on licensing. Either components are independent or they're not. There's nothing magic that happens when you take an API that's intra-process and run it over a socket instead. If FUSE filesystems don't carry the GPL tag, neither should in-kernel modules logically decoupled from the kernel core.


You are right that the choice of inter-module communication technique does not have a bearing on the licensing unless the license explicitly says so.

To be fair to the FSF, however, they have to my knowledge never actually said anything else. What they do say is that they will enforce the license at that distinction, a conclusion RMS arrived at many years ago when Apple tried to make proprietary add-ons to GCC. The reasoning, as provided by the consulted lawyer Eben Moglen at the time, was that a judge would likely see it as creating a single derivative work out of the linked parts, and as such you need a copyright license. This story has been mentioned in a few talks, because so far no one has challenged the FSF in court and proven it wrong, a fight that at least Eben Moglen has been willing (and possibly wanting) all these years to take up if given the opportunity.

As a bright line in the sand the concept has served the FSF well, but it is not a supreme court decision that defines what a derivative work is for software. It could be that the line should be drawn much narrower, but personally my bet is the opposite. Courts are rarely on the side that wants to abolish copyright control or limit its scope.


To make the compiled ZFS module a derived work of Linux and ZFS, it would need to contain (i.e. include) parts of the Linux source code, which is easily avoidable with a shim layer.


>Once compiled, the ZFS module is a derived work from both the Linux kernel and ZFS.

Really? So if I compile something with gcc, is it a derived work of gcc, and is gcc a derived work of my binary? Or when I make a Firefox extension, is it a derived work of Firefox? Compile the Nvidia module... a derived work of Linux?

>I trust the Red Hat lawyers on this matter more than someone who is not a lawyer

But he can read :) But hey, whatever. I'm really not interested in whether Linux gets something good like DTrace, ZFS or Crossbow; they like the NIH syndrome and I am OK with that too.


> So if i compile something with gcc it is a derived work from gcc

Yes. This is why the "libgcc exception" (https://www.gnu.org/licenses/gcc-exception-3.1.en.html) exists. The issue is that parts of runtime libraries which come with gcc are statically linked into the compiled executable; without that exception, the result would have to be under the GPL.

The same issue happens with Linux kernel modules. Many functions they need are only defined as macros or inline functions in the headers (since ZFS is a filesystem, just take a look at how many inline functions there are in include/linux/fs.h). And even outside of these, many functions are so closely coupled to the internal data structures of the kernel (and the data structures themselves are also exposed to modules) that any module using them could be considered a derived work of the kernel.


Are data structures copyrightable? The headers are not, and macros can be easily avoided.

And in any case, there's an easy way to prove something does not constitute a derived work: just show that it can work outside of the codebase it's allegedly a derived work of. The reason NVidia is safe is that they provide the same driver for FreeBSD.


Criteria for whether a work is copyrightable are a bit more nuanced, and the word “header” doesn’t figure in them. A work needs to be original and have an element of creativity in it. Code in header files can - and very often does - meet that requirement. Some kinds of code (whether in headers or elsewhere) can be unoriginal, but that really has nothing to do with “headers”.

The parent actually explained it pretty well.


Can you use a module compiled for Linux anywhere but a Linux host?

The ZFS source code probably isn't a derived work, but the compiled module might be.


How about the closed-source NVidia drivers, or ESX? They're also derived works, yet there are no licensing problems.


In Nvidia’s case, they have an open-source shim layer in between the kernel and the closed-source bulk of the driver. In theory, the shim layer is a derived work of the kernel, but the closed-source part is not. I’d say that’s a bit of a dubious argument, but who knows? It’s never gone to court. Search for “GPL condom”.

As for ESX, that has gone to court:

https://www.zdnet.com/article/linux-developer-abandons-vmwar...


>"GPL condom”

Ahh, it's OK for the driver but not for net code?

https://www.theregister.com/2020/08/04/gpl_condom_nvidia_lin...

See, that's exactly the Linux problem...

By the way... Linus is not OK with a ZFS GPL shim layer:

https://news.ycombinator.com/item?id=22005901


It would probably be equally ok for both cases; the difference is that Linux developers are not the court, and can use whatever excuse they like.


>the difference is that Linux developers are not the court

The difference is that they cannot afford to leave every Nvidia GPU owner out.


They couldn’t even if they wanted to; it’s up to NVidia. Or the court, but we already know NVidia would win the case.


So, ZFS would be in the exact same situation as the NVidia driver - shim layer would be derived work, and the CDDL code wouldn't. It is dubious, sure, but then the whole idea of GPL virality is.


His rhetoric is like:

Fedora has problems with it :)


It is however indisputable that Sun (and Oracle later) chose to write their own license for some of their open-source efforts, when very established alternatives were already available. It's hard not to draw the conclusion that they made things awkward on purpose (i.e. to keep it away from Linux, the dominant OS in Solaris' market).


There's a very interesting talk on the matter by Bryan Cantrill, where he explains that the GPL was unsuitable for their purposes for a number of valid reasons, including the fact that the GPL's virality would have affected independent hardware vendors too, IIRC. And some more cases.

I hope I didn't write BS; in any case, here is the talk if you want to know more details: https://youtu.be/Zpnncakrelk (leaping the chasm from proprietary to open: a survivor's guide)

As usual Bryan Cantrill is just brilliant at explaining stuff.

Edit: just to be explicit: it was NOT about keeping stuff out of Linux.


It's sad that even a brilliant person like Bryan can get opensource so horribly wrong at a fundamental level. Right at the beginning, he says F/OSS established itself because it leapt ahead technologically, not because "all that freedom / user rights stuff"; he totally fails to see how that "user rights stuff" is what allowed the jump to happen - because even developers are users when they first encounter a piece of software. Without the rights of users to read source, tweak it to their liking, and republish it, there is no right for developers to enable the technological leap. I hope Bryan simply chose a poor turn of phrase.

He makes another poor turn when he says that the CDDL kept stuff out of Linux, but "not because of the CDDL but because of perceptions of the GPL" (i.e. IHVs didn't want to deal with the GPL). The license and the choice of using it are two different things. Sun could have written a license that was not GPL (to satisfy IHVs) but was compatible with GPL. They did not. Danese Cooper, who was part of that effort, says this was on purpose, and stated it on record ( http://caesar.ftp.acc.umu.se/pub/debian-meetings/2006/debcon... from 27:30). I guess Bryan was one of those engineers Danese mentions as "having some biases" back then. It was certainly obvious, at the time, what a careful avoidance of a license compatible with the biggest Solaris competitor would have resulted in, from a practical and commercial standpoint.


>Danese Cooper, who was part of that effort, says this was on purpose, and stated it on record ( http://caesar.ftp.acc.umu.se/pub/debian-meetings/2006/debcon.... from 27:30).

In the same talk, Simon Phipps disagrees with her assessment, and he later said he was furious at her spiteful comment. Now, almost fifteen years later, nobody else has corroborated her claim, despite there being no reason to protect Sun anymore.


> there being no reason to protect Sun anymore

There is still a reason to protect themselves, of course, from a historical judgement that wouldn't appear particularly kind.


I haven't seen people give any unkind judgment towards Danese Cooper for helping write the CDDL and stating it was done to block tools from Linux. And so many people already blame them for the incompatibility that it's hard to think a confirmation would make it worse for them.


> I haven't seen people give any unkind judgment towards Danese Cooper for helping write the CDDL

We haven't read the same flames, then...


I've seen a ton of flak thrown at Sun for the license, but Cooper's name only seems to come up as a source that it was intentional. I've never gotten the impression people blame her.

Of course I haven't read every discussion about it, but I have read quite a few. I think someone would have to both confirm that the incompatibility was intentional and argue it was the correct choice to get much hate about it.


I wouldn't know; Mr. Cantrill knows his stuff.

It seems to me that Mr. Cantrill is more aware of the business side, while you're more keen on the philosophical side.

Yeah, twiddling the source and everything, but if you didn't choose it in the first place, that wouldn't happen.


>Sun could have written a license that was not GPL (to satisfy IHVs) but was compatible with GPL

Why? Why even care about the GPL? It's 100% compatible with MIT/BSD/ISC and that's enough. Even Linus said that he would probably choose the ISC today; he mainly chose the GPL because GCC had it... and now they have GPLv3 (an absolutely shit decision, if you ask me).


(ISC is a troublesome license, but that's beside the point)

Licenses are not adopted in a vacuum. The license is what made GCC what it was when Linus had to pick a license, and arguably what made Linux what it is today. A "BSD Linux" would have likely ended up the same way BSD systems have: a largely academic project routinely cannibalized by behemoth closed-source companies.

The question, yesterday and today, is really "why not care about the GPL?". The rights it grants are what made the F/OSS ecosystem what it is today. If you want to grant more rights to your users and developers, by all means use a less restrictive but compatible license like modified BSD, or MPL 2. It's easier to do that without fear of withering, now that opensource is the de-facto default choice in a lot of fields. It definitely was not as easy back in the '90s or early '00s, which is when the GPL effectively fought the war for the whole field.

If you choose to stick with pre-MPL2 derivatives, though, you are clearly not interested in granting users and developers the same basic rights as the GPL.


Look, the problem is that you CAN make Linux as proprietary as you want if you DON'T redistribute it, like Google does, so Google cannibalizes Linux too.

It's just a different vision, if you want real freedom and want to work with a community go with BSD/MIT.

>ended up the same way BSD systems have: a largely academic project routinely cannibalized by behemoth closed-source companies.

That's BS; just look at how Netflix and Dell EMC work with FreeBSD. And take Juniper, for example: they learned a lesson from not working with the community and fell far behind because of that.

https://www.freebsd.org/releases/12.0R/relnotes.html


I am not going to argue beliefs, MIT/BSD and GPL are compatible. My point on cannibalization was in the historical context. Today the situation is different (thanks largely to what the GPL did in the past). And of course Linux is GPL2, not 3 - which would address some of the issues you raise.

All of these licenses would have avoided the problems we see with ZFS' adoption of CDDL or other MPL1 derivatives.


The historical context is that Linux had proper community support at the time when everyone was looking for Unix on x86 to migrate off RISC workstations, and the BSDs at the time were either a bunch of jerks or had unclear status due to the AT&T vs. UCB lawsuit. You could try to argue that licensing played a role in this, but experience with other projects suggests otherwise.


You know what enabled that "proper community"? A license that provided a sort of moral compass, fostering collaboration and goodwill towards shared improvement, safe in the knowledge that efforts wouldn't be abused. Something that other licenses still fail to provide, to this day.


Except that communities work equally well with other licenses. GPL doesn’t enable anything; there’s just no correlation there.


> Except that communities work equally well with other licenses.

Today. Not in the '90s cut-throat "pirates of SV" environment.


Not true, as evidenced by Postgres, or Apache, or X11, to give just a few examples of GPL-free communities from the nineties.


Even a cursory look at the open source ecosystem shows that GPL doesn't work like that. In fact, you can easily find examples that show the exact opposite: look what happened to MySQL, while Postgres continues to thrive. Look at GPL-licensed X11 implementations, or GPL web servers, or DNS servers (which are pretty much nowhere).


You cannot argue on one side that Linux's success was due to commercial happenstance, as you do in another comment, and on the other that PostgreSQL's success vs MySQL is entirely due to the license. In fact, before the Sun acquisition and the mismanagement that followed, MySQL was the open-source RDBMS, the M in LAMP. When open source started to take off and the likes of Microsoft fired back hard with FUD and lawsuits, 3 out of 4 main tools in that stack were GPL and the other was GPL-compatible.

Obviously each project and field has a story; licensing and timing are both part of it.

I would also argue that we are eventually going to pay for the ecosystem's refusal to adopt GPL3, which is effectively "closing" away the software at another layer.


I’m not saying the success of Postgres is entirely due to license; what I’m saying is the license is not a significant factor.

MySQL was always the more popular database, true. But it was also the one technically inferior, a PHP of databases.

As for GPL3 - on the contrary, the problem is the adoption of GPL3, which forced many parties to switch to non-GPL alternatives. But it also gave a boost to open-source alternatives, putting an end to the unhealthy monopoly of the GNU toolchain.


> the problem is the adoption of GPL3, which forced many parties to switch to non-GPL alternatives.

Those are the same sort of businesses that, back in the '90s, would take BSD code and ship it without giving anything back. They were using GPL code only because they could respect the letter of the license while ignoring the spirit, thanks to coincidental technological changes (the move from PCs to server-based architectures). This problem keeps coming up and promotes a parasitic business model.

We'd be better served in the long run, as an ecosystem, by trying to find business models that work with GPL3.


Companies which don’t contribute - just sell code written by others - don’t mind GPL. GPL can be the problem for the rest, as it forces them to immediately donate all their changes to their competitors, who can then offer the same thing cheaper, since they don’t have to amortize the development costs.


I'm not sure those are good examples. MySQL lives on with a different name (MariaDB). X11 is currently being replaced by GPL implementations of Wayland, in the form of Kwin and Mutter.

As far as I know there were never any serious attempts at making GPLed alternatives to Apache httpd, nginx and BIND.


To make a license GPL-compatible, its conditions need to be a subset of the GPL's. Which means that, to make ZFS GPL-compatible, Sun would have needed to ditch patent protection, among other things. That would create real problems for everyone, while not really helping ZFS on Linux - Linux devs would simply find another excuse; case in point: AdvFS.


I don't think AdvFS ever reached the level of community buy-in that ZFS has. A lot of people in Linux land really like ZFS; IMHO both the demand and the supply needed to sustain an in-kernel ZFS are already there.

> ditch patent protection [...] would create real problems for everyone

What problem, exactly?


The problem is the risk of being sued into oblivion by Oracle. Which currently doesn't exist thanks to CDDL patent protection clauses.


Are you arguing that users of MIT/BSD licensed software the world over are "at risk" of being sued by the copyright holder for patent infringement...? Big if true.


MIT/BSD is the same as GPL in this respect; it's CDDL that's different, by providing patent protection.


Yes, so you're saying MIT/BSD licenses should be considered at risk of lawsuits for patent infringement (presumably from the same copyright holder that released the code in the first place)? If that were a realistic threat, most F/OSS projects should shut down tomorrow.

The patent-protection clauses of the CDDL are at odds with the spirit and practice of open licenses anyway, since they are voided as soon as the relevant code is modified.

So basically, the CDDL ships clauses that are pointless and likely to be disregarded in practice and in court. Losing them would make no difference whatsoever.


Again, BSD/MIT is the same as GPL in that aspect. It’s not a problem for the open source projects themselves, as there’s no point in suing them. It can be a problem for high profile downstream users.

The CDDL clauses are not voided when the code is modified. Where did you get that idea? What you are saying is plain fantasy.


> made things awkward on purpose

Maybe. Some could say that GPL2 is really awkward too, but you have to know that Microsoft and Apple have their own open-source licenses as well... big business is awkward.


> It's also crazy, nearly no distro has a problem shipping ultra-proprietary software (firmware), but has a big discussion about CDDL, no one wants ZFS in the Kernel tree

Jesus Christ, you are absolutely right. I hadn't seen the issue in that light.


> imagine Linus has control over it and has no clue what ZFS even is

I keep getting downvoted; maybe I am too annoying for HN. But I'll risk it in the hope this is helpful, in the social and psychological space.

First of all, I have been in awe of ZFS since... ever. But filesystems were not my curiosity, operating systems were, so I installed OpenSolaris and Illumos in VMs a few times, just to get used to how these things install. No work, just play. That's me. I was sad Apple did not adopt ZFS, though they did develop a read-only something - a kernel extension, I think, I don't recall. And there is some support from generous volunteers for OpenZFS on OS X (idk about macOS). Why not ZFS? Because it is different and complicated, and it takes dedication to learn. Had Apple adopted it as the native FS, I would have been forced to learn it; that is why it made me sad. I need that kind of motivation. But many are this way to some extent. Everyone is industrious and everyone is lazy. When you can learn easily, you learn. When you must learn, you learn. When you get older, you just want to keep doing things the old way, the way you know how, and you can do that for 12+ hours a day and learn nothing new.

Linus is just a man. I'm glad the penguinistas have faded into the background, and I am happy for Linux, but I fail to see that it was any better than NetBSD back when the fanatics were vocal. Linux would have benefited from not changing a lot of userland stuff that was changed from *NIX for no rational reason, probably just out of ignorance and arrogance. Just MHO. But now Linux is important. It just is what it is.

So Linus is insanely industrious, and also lazy, just like most. How often does he learn something new, and how often does he just work like a mad dog without learning anything new? Just a man, perhaps a very smart man, but no more. And he is imperfect. I strongly suspect, from the public evidence of the tell-tale symptoms, that he has an identifiable personality disorder, one whose symptoms include an unwillingness or inability to recognize the problem. But IIRC he did admit to things, and supposedly took time off to work on himself. This can't be cured by the patient alone, though oddly enough it can be cured - one of the few mental illnesses that can actually be cured rather than just maintained - but only after being evaluated and treated by a professional, and the treatment is easy: just sitting with the pro and talking for 45 minutes twice a month, for up to two years. At least that is my understanding of it: it can be cured in two years or less depending on its severity, though it rarely ever is. This is not criticism, it is compassion. Presidents get this. Captains of industry get this. Winning NFL quarterbacks get this. It is no surprise the Lord of the Linux kernel can get this. And I wish him the best.

We're all allowed to be incorrect and imperfect. But those that stand out get better, because they try. As far as laziness is concerned - because I am lazy too, and I believe most are - I can't fault anyone for being lazy and not learning new stuff, especially the stuff needed to understand a topic well enough to say something about it that makes sense.

However, those that do change themselves for the better, those that are not lazy and do learn new things, should be applauded and given gratitude, for the possibility that their example will be followed... hopefully by me, too.


I just love your comment, and the fact you're downvoted just proves the point about "How often does he learn something new".


I mean no harm. But I should have just let it hang. No one cares, and this was a technical discussion, not a touchy feely one, and no one wanted the explanation. I embrace the absurd, but I should do it quietly when the adults are talking.


They do, just to a lesser extent than Linux, and policies vary within the family (see also: https://www.mail-archive.com/misc@openbsd.org/msg105077.html)

Not going to get into particulars because it's a big topic.


Awesome work by all parties involved in this :D


ZFS still doesn't do direct IO or automatic defragmentation. It represents an early realization of a good idea, one since eclipsed by other tree-based filesystems. You wouldn't fly in a Wright Flyer, so why would you run ZFS?


> You wouldn't fly in a Wright Flyer, so why would you run ZFS?

Because it has literally zero competitors that can even approach its feature set? Btrfs is a trash fire, bcachefs is years out, nilfs2 doesn't actually verify its checksums, ceph/gluster run largely in userspace (so you can't run them as your root fs). What fs would you like to propose that lets us do snapshots, data integrity, and compression?
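For the curious, here's roughly what that feature set looks like day to day. A minimal sketch, assuming a hypothetical pool/dataset named tank/data:

    # transparent compression, set per dataset
    zfs set compression=lz4 tank/data

    # atomic snapshots: cheap to take, easy to roll back
    zfs snapshot tank/data@before-upgrade
    zfs rollback tank/data@before-upgrade

    # end-to-end integrity: every block is checksummed on write
    # and verified on read (fletcher4 by default, sha256 optional)
    zfs get checksum tank/data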


In what way is btrfs a trash fire? Works fine for me.


AFAIK, actual O_DIRECT support is now in progress, but performance is not expected to be great, since CoW makes the usual O_DIRECT optimizations impossible. Anyway, the lack of O_DIRECT is not important for some use cases.

https://github.com/openzfs/zfs/pull/10018
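As a side note, one quick way to probe O_DIRECT behavior from the shell is GNU dd's oflag=direct, which opens the output file with O_DIRECT. A sketch with a made-up path; depending on the ZFS release, the write may fail with EINVAL or be quietly treated as buffered I/O:

    # attempt an O_DIRECT write to a file on the pool
    dd if=/dev/zero of=/tank/testfile bs=1M count=10 oflag=direct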

As for defrag, the ZFS developers have avoided implementing the "block pointer rewrite" feature needed for it, due to its complexity. I hope a BPR project gets started and improves things.


Quite the natural evolution, since TrueNAS has traditionally been a BSD box and they've invested a lot of time in ZFS. Happy that this will carry over to FreeBSD.

Now, if only Windows would also have some sort of support ;)



What really worries me here is compatibility with older pools. I hope they have thought about that, as the FreeBSD implementation has been around since 2007.


ZFS has feature flags [1]. If you're careful not to do 'zpool upgrade' (as this commit message warns) unless you understand the consequences, you can switch between ZFS implementations.

[1] http://build.zfsonlinux.org/zfs-features.html

You can explicitly create a pool with certain features disabled.
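A minimal sketch of that, with made-up pool/device names: "-d" creates the pool with every feature flag disabled, and individual flags can then be enabled at creation time:

    # all feature flags off (maximally portable pool)
    zpool create -d tank mirror da0 da1

    # or selectively enable just the features every system supports
    zpool create -d -o feature@lz4_compress=enabled tank mirror da0 da1

    # inspect the resulting flags
    zpool get all tank | grep feature@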

I'm going to try FreeNAS Core tonight.


It's my understanding that ZFS is backwards compatible forever; old pools are on old versions or have fewer feature flags, but they can still be used without issue.


Yes, in theory they are, but currently syncing pools between Linux and FreeBSD ZFS is not possible, so I would imagine there are some differences. I have around 12 TB of raidz disk space on FreeBSD and I hope I will be able to upgrade eventually.


About 2 years ago, I shared a zpool between Linux, FreeBSD, and SmartOS/Illumos. It generally just worked, no matter where the zpool was created from. People warned me about not enabling specific features, so I simply used defaults to see what would happen. Again, 2 years ago, it just worked.
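For anyone wanting to try the same, the usual dance is a clean export on one OS followed by an import on the other (pool name made up):

    # on the first OS: unmount datasets and mark the pool exported
    zpool export tank

    # on the other OS: scan attached disks and import the pool
    zpool import tank

    # if the pool wasn't cleanly exported (e.g. after a crash), force it
    zpool import -f tank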


Eh? I currently share a zpool between linux and freebsd, and it works fine without issue.


Thank you for the info, I wasn't aware of that; the last time I tried it, it didn't work (1-2 years back).


ZFS is backwards compatible, not forwards compatible.

If you create the pool on pre-OpenZFS FreeBSD, you can use it on Linux, but not vice versa.

I just did this with FreeNAS (based on FreeBSD 11) and Raspbian.


If you leave "zpool create" to its defaults (enable every feature).

There are two levels to consider, though, if you create a pool with explicit consideration for a system with an older version of ZFS: "-o version" and "-d"/"-o feature@...", both of which can be used to fine-tune exactly what the pool can and cannot do.

I've made a list to help out doing it between ZFS-on-Linux releases. It shouldn't be hard to look up FreeBSD feature lists and do the same: https://chungy.keybase.pub/zfs-features-table.html
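For completeness, the legacy "-o version" route looks like this (a sketch with made-up names; v28 is the last pre-feature-flags pool version shared by the open implementations):

    # pin the pool to version 28 so any v28-capable ZFS can import it
    zpool create -o version=28 tank mirror da0 da1
    zpool get version tank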


But if you create the pool with OpenZFS on FreeBSD, it's the same version and works:

sysutils/openzfs/


Yes, they care about that, via pool and filesystem versioning; Michael W. Lucas talks about it in this vid:

https://www.youtube.com/watch?v=gmaHZBwDKho


Remember when Apple was going to adopt ZFS as Mac OS's file system?

Then... nothing further was heard about it.


ZFS does not work well on sparse volumes; it loves new blocks. That's why you don't see it much in a VM or on storage that is sparse (Nimble, NetApp, etc.).

Also it's pretty much impossible to grow a zfs pool and there are no consistency check tools or repair tools. So when you actually do get corruption, you lose it all.

Might have been a breakthrough around 2000, but it's time for something new.


> Also it's pretty much impossible to grow a zfs pool and there are no consistency check tools or repair tools.

This is flat out wrong. First, you can trivially grow a zpool by adding another vdev to it. If you can't do that, you can grow a vdev - and hence the zpool using it - by swapping its disks for larger ones, one at a time. Once all the physical disks in the vdev are the new, larger size, the new space becomes available automatically.
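A rough sketch of both routes, with made-up pool/device names:

    # route 1: grow the pool by adding another vdev
    zpool add tank mirror da4 da5

    # route 2: grow an existing vdev by replacing its disks with
    # larger ones, one at a time, letting each resilver complete
    zpool replace tank da0 da6
    zpool status tank            # wait for resilvering to finish

    # with autoexpand on, the extra capacity appears once every
    # disk in the vdev has been swapped
    zpool set autoexpand=on tank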

Also, there's absolutely a consistency check and repair tool built in. ZFS computes and stores checksums for all blocks, even when you use a single disk! It also verifies the checksums when reading blocks. In addition, you can trigger a verification of all the blocks by running a scrub command, which simply issues a read for every block.

> So when you actually do get corruption, you lose it all.

Hardly. Not only does ZFS have the above, it also stores multiple copies of important metadata blocks (three copies by default), and it even takes effort to spread those blocks out on the physical disks to reduce the chance of them getting corrupted as much as possible.

I'm not saying ZFS is the perfect filesystem. But it's definitely one of the better ones if you care about your data.


> Also it's pretty much impossible to grow a zfs pool and there are no consistency check tools or repair tools. So when you actually do get corruption, you lose it all.

`zpool scrub` is the consistency check and repair tool. It's happy to detect and repair everything from bit flips to realtime `dd if=/dev/random of=/dev/ada0` of a mirrored/raid-z disk.

Growing a single vdev can be impractical, especially for raid-z. Adding new vdevs to a pool is straightforward.
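The basic workflow, for anyone who hasn't used it (pool name made up):

    # read and verify every allocated block in the background
    zpool scrub tank

    # check progress; -v also lists any files with unrecoverable errors
    zpool status -v tank

    # after fixing the underlying problem, reset the error counters
    zpool clear tank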


> Might have been a breakthrough around 2000 but it's time for something new.

Agreed, but the other stable filesystems (XFS, ext4) are 1990s tech; ZFS is the only stable 2000s filesystem available. Possibly Bcachefs is the future.


The fact that there are still 3 main BSDs being developed in parallel seems odd to me. I imagine that they could accomplish much more if all that effort was focused on a single OS.


The same could be said to the nth degree for Linux. In fact I think you could make the argument that the differences between the 3 most popular BSDs are way more significant than the differences between the 3 most popular Linuces


Aren't the kernels amongst the BSDs different enough that effort is split? With GNU/Linux, the variation is much more in the packaging than the kernels, no? Couldn't I download a vanilla kernel, compile it, and boot any Linux distro with it?


Well, yes/no. There are different ways to configure the kernel, and some distros/userlands require specific kernel features which may or may not be default. Your general point is still true, though.
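A common way to sidestep the feature problem is to reuse the running distro's config when building a vanilla kernel. A sketch, assuming a distro that installs its config under /boot (as Debian and Fedora do):

    # start from the distro's known-good configuration
    cp /boot/config-$(uname -r) .config
    make olddefconfig            # accept defaults for any new options

    # build and install; the userland keeps the features it expects
    make -j"$(nproc)"
    sudo make modules_install install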

They are different enough that some of the work is split; however, there's significant code sharing in things like device drivers and the like.

Ultimately each being very different is usually seen as a positive. While the <n> Linux distros are spending significant hours optimizing essentially the same userland for minor differences in use cases, the different BSDs have very different goals and results. OpenBSD has an emphasis on making a lean and consistent base system with comprehensive documentation, NetBSD has excellent tooling for cross-compiling, and a comparatively more modular kernel that lends itself to experiments like rump-kernels. I'm less familiar with FreeBSD but it's tended to emphasize features like ZFS and its own robust container solution.

In some cases these goals are contradictory: OpenBSD would be (I imagine) very reluctant to merge NetBSD's in-kernel Lua module support, or npf, NetBSD's multithreaded "high performance" take on the firewall pf, in favor of its own much simpler and more secure pf. OTOH, NetBSD has a stronger emphasis on backwards compatibility, which somewhat limits its ability to make broad changes to the userland as OpenBSD has; then again, one is much more likely to be able to run old NetBSD binaries on new kernels. :)

Each has its pros/cons, and attracts its own users and developers. The distinguishing thing about the *BSDs is that the base userland and kernel are deeply intertwined, giving each project more freedom to tie userland tools to special kernel functionality. There are pros and cons to this too, but it means that, for example, one can use "ifconfig" to connect to WPA2 WiFi networks on OpenBSD with no other tools required, as the ifconfig userland is always released alongside a matching kernel. The various net-util packages shipped for Linux have suffered from having to operate as "external" tools interfacing with a range of kernel versions. (Of course, now the situation is better with ip/iw, but it sure took long enough.)
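For the unfamiliar, joining a WPA2 network on OpenBSD really is that terse. A sketch with a made-up interface name and credentials:

    # set the SSID and WPA passphrase in one ifconfig invocation
    ifconfig iwm0 nwid "homenet" wpakey "correct horse battery staple"

    # then request an address
    dhclient iwm0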


Yes, but part of that is inherent in what BSD considers a kernel as well.


They have rather different goals.


The fact that 7.6 billion people live in parallel seems odd to me. I imagine humanity could achieve a lot more if every aspect of human identity was put into one individual.



