Besides the licensing issue, I wonder whether optimizing ZFS for low latency, low RAM, and low power on the iPhone would have been an uphill battle or relatively easy. My experience running ZFS years ago was poor latency and large RAM use on my NAS, but that hardware and drive configuration was optimized for low $ per GB stored and used parity RAID.
While its deduplication feature clearly demands more memory, my understanding is that the ZFS ARC is treated by the kernel as a driver with a massive, persistent memory allocation that cannot be swapped out ("wired" pages). Unlike the regular file system cache, ARC's eviction is not directly managed by the kernel. Instead, ZFS itself is responsible for deciding when and how to shrink the ARC.
This can lead to problems under sudden memory pressure. Because the ARC does not immediately release memory when the system needs it, userland pages might get swapped out instead. This behavior is more noticeable on personal computers, where memory usage patterns are highly dynamic (applications are constantly being started, used, and closed). On servers, where workloads are more static and predictable, the impact is usually less severe.
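A crude toy model of that timing problem (nothing here is real kernel or OpenZFS code; every name is invented for illustration):

```python
# Toy model of why deferred ARC shrinking can push userland pages to swap:
# the page cache gives memory back synchronously, while the ARC-like cache
# only lowers its own target and evicts later, so the shortfall is paid
# for by swapping.  Not real kernel or OpenZFS code.

class PageCache:
    """Regular page cache: the kernel can drop clean pages immediately."""
    def __init__(self, pages):
        self.pages = pages

    def reclaim(self, wanted):
        freed = min(self.pages, wanted)
        self.pages -= freed
        return freed                     # available right now

class ArcLike:
    """ARC-style cache: a shrink request only adjusts the driver's own
    target; actual eviction happens later, on the driver's schedule."""
    def __init__(self, pages):
        self.pages = pages
        self.target = pages

    def reclaim(self, wanted):
        self.target = max(0, self.target - wanted)
        return 0                         # nothing freed synchronously

    def background_evict(self):
        freed = max(0, self.pages - self.target)
        self.pages -= freed
        return freed                     # too late for the original request

def allocate(wanted, reclaimers, swap_out):
    """Simplified allocation path: ask each cache, then swap the rest."""
    need = wanted
    for reclaim in reclaimers:
        need -= reclaim(need)
        if need <= 0:
            return
    swap_out(need)                       # userland pays the difference

if __name__ == "__main__":
    page_cache, arc = PageCache(1000), ArcLike(1000)
    swapped = []
    allocate(1500, [page_cache.reclaim, arc.reclaim], swapped.append)
    print("pages swapped out:", swapped)                  # [500]
    print("ARC evicts later: ", arc.background_evict())   # 500
```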
I do wonder if this is also the case on Solaris or illumos, where there is no intermediate SPL between ZFS and the kernel. If so, I don't think that a hypothetical native integration of ZFS on macOS (or even Linux) would adopt the ARC in its current form.
The ZFS driver will release memory if the kernel requests it. The only integration-level issue is that the free command does not show the ARC as buffer/cache, so it misrepresents reality; as far as I know, this is also an issue with caches used by various other filesystems (e.g. extent caches). It is only obvious in the case of ZFS because the ARC can be so large. That is a feature, not a bug, since unused memory is wasted memory.
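You can correct for the reporting yourself on Linux by folding the ARC size back into the numbers; a quick sketch, assuming OpenZFS's /proc/spl/kstat/zfs/arcstats interface (the accounting below is deliberately simplified):

```python
#!/usr/bin/env python3
"""Rough 'free' correction: treat the ZFS ARC as reclaimable cache.

Assumes Linux with OpenZFS, which exposes ARC counters in
/proc/spl/kstat/zfs/arcstats; the adjustment is a simplification, since
not all of the ARC is evictable.
"""

def arc_size_bytes():
    with open("/proc/spl/kstat/zfs/arcstats") as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] == "size":
                return int(fields[2])       # columns: name, type, data
    return 0

def meminfo_kib(key):
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(key + ":"):
                return int(line.split()[1])
    return 0

if __name__ == "__main__":
    arc = arc_size_bytes()
    avail = meminfo_kib("MemAvailable") * 1024
    print(f"ARC size:            {arc / 2**30:6.2f} GiB")
    print(f"MemAvailable:        {avail / 2**30:6.2f} GiB")
    # Most of the ARC can be evicted on demand, so a more honest
    # 'available' figure is roughly the sum of the two.
    print(f"Available incl. ARC: {(avail + arc) / 2**30:6.2f} GiB")
```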
You were downvoted, but I have also run into situations where it didn't, causing a cascade of processes hitting out-of-memory errors. In both instances I was pushing the server beyond what was reasonable.
I assume that the VM2 project achieved something similar to the ABD changes that were done in OpenZFS. ABD replaced the use of slab buffers for the ARC with lists of pages. The issue with slab buffers was that absurd amounts of work could be done to free memory, and a single long-lived slab object would prevent any of it from mattering: long-lived slab objects caused excessive reclaim, slowed down freeing enough memory to satisfy system needs, and in some cases prevented enough memory from being freed at all. Switching to linked lists of pages fixed that, since memory freed from the ARC upon request immediately becomes free rather than being deferred until every object in the slab has been freed.
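A back-of-the-envelope model of why that matters (a toy, not the actual ABD or SPL code):

```python
# Toy accounting model of the reclaim problem ABD addressed (not the real
# OpenZFS code).  A slab allocator can only return a slab to the system
# once *every* object in it is free, so one long-lived buffer pins an
# entire slab even after all of its neighbours have been "freed".

OBJS_PER_SLAB = 8   # arbitrary value, just for the illustration

def slab_backed_freed(evicted, pinned_slabs):
    """Buffers' worth of memory actually returned to the system when the
    ARC evicts `evicted` buffers, but `pinned_slabs` slabs each still hold
    one long-lived buffer (worst case: the survivors are spread out)."""
    stranded = pinned_slabs * (OBJS_PER_SLAB - 1)   # freed but not returned
    return max(0, evicted - stranded)

def page_backed_freed(evicted):
    """With ABD-style lists of pages, every evicted page is returned to
    the system immediately."""
    return evicted

if __name__ == "__main__":
    # The ARC evicts 1024 buffers, but 100 long-lived buffers happen to be
    # scattered across 100 different slabs.
    print("slab buffers :", slab_backed_freed(1024, pinned_slabs=100))  # 324
    print("page lists   :", page_backed_freed(1024))                    # 1024
```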
This seems like an early application of the Tim Cook doctrine: Why would Apple want to surrender control of this key bit of technology for their platforms?
The rollout of APFS a decade later validated this concern. There’s just no way that flawless transition happens so rapidly without a filesystem fit to order for Apple’s needs from Day 0.
(Edit: My comment is simply about the logistics and work involved in a very well executed filesystem migration. Not about whether ZFS is good for embedded or memory constrained devices.)
What you describe hits my ear as more NIH syndrome than technical reality.
Apple’s transition to APFS was managed like you’d manage any kind of mass-scale filesystem migration. I can’t imagine they’d have done anything differently if they’d adopted ZFS.
Which isn’t to say they wouldn’t have modified ZFS.
But with proper driver support and testing it wouldn’t have made much difference whether they wrote their own filesystem or adopted an existing one. They have done a fantastic job of compartmentalizing and rationalizing their OS and user data partitions and structures. It’s not as though each iPhone model’s production run has different filesystem needs they’d have to sort out.
There was an interesting talk given at WWDC a few years ago on this. The rollout of APFS came after they’d already tested the filesystem conversion on randomized groups of devices, and then eventually on every single device that upgraded to one of the point releases prior to iOS 10.3. The way they did this was basically to run the conversion in memory as a logic test against real data. At the end they’d have the superblock for the new APFS volume, and on a successful exit they simply discarded it instead of writing it to persistent storage. If it errored, it would send a trace back to Apple.
Massive amounts of testing, plus consistency in OS and user-data partitioning and directory structures, are a huge part of why that migration worked so flawlessly.
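In outline, the dry-run trick described above looks something like this (a sketch of the idea only; all names are hypothetical and the real converter is obviously far more involved):

```python
# Sketch of a "dry run" filesystem conversion of the kind described above.
# The point is only the shape of the logic: build the new filesystem's
# metadata in memory against real on-disk data, then discard it instead
# of committing it, and phone home if anything went wrong.

def dry_run_convert(device, build_new_metadata, report_trace):
    """Run the HFS+ -> APFS conversion purely as a logic test.

    build_new_metadata(device) is assumed to walk the existing volume and
    return the would-be superblock plus supporting metadata, raising on
    any inconsistency it cannot handle.
    """
    try:
        superblock, metadata = build_new_metadata(device)
    except Exception as exc:          # any conversion failure
        report_trace(device, exc)     # send the trace back for analysis
        return False
    del superblock, metadata          # success: never written to disk
    return True

if __name__ == "__main__":
    ok = dry_run_convert(
        "/dev/disk0s2",
        build_new_metadata=lambda dev: ({"magic": "NXSB"}, []),       # stub
        report_trace=lambda dev, exc: print("would report:", dev, exc),
    )
    print("conversion would have succeeded:", ok)
```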
To be clear, BTRFS also supports in-place upgrade. It's not a uniquely Apple feature; any copy-on-write filesystem with flexibility as to where data is located can be made to fit inside the free blocks of another filesystem (sketched below). Once you can do that, you can do test runs[0] of the filesystem upgrade before committing to wiping the superblock.
I don't know for certain whether they could have done it with ZFS, but I imagine it would at least have been doable with some Apple extensions that would only have to exist at test/upgrade time.
[0] Part of why the APFS upgrade was so flawless was that Apple had done a test upgrade in a prior iOS update. They'd run the updater, log any errors, and then revert the upgrade and ship the error log back to Apple for analysis.
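The free-block idea in outline (purely conceptual; the names are made up, and real converters such as btrfs-convert or Apple's HFS+ to APFS tool are far more involved):

```python
# Why an in-place conversion can be made safe and revertible, in outline.

def convert_in_place(old_fs, build_new_layout, commit_superblock):
    """Build the new filesystem entirely inside the old one's free blocks.

    old_fs.free_blocks()   -> blocks the old filesystem is not using
    old_fs.file_extents()  -> where each file's data already lives on disk
    build_new_layout(...)  -> lays out the new metadata, allocating ONLY
                              from the free set and pointing at file data
                              where it already sits (nothing is copied)
    """
    free = set(old_fs.free_blocks())
    layout = build_new_layout(old_fs.file_extents(), allocate_from=free)
    assert set(layout.blocks_used) <= free, "new metadata must fit in free space"
    # Up to this point nothing the old filesystem depends on has been
    # touched, so aborting (or crashing) leaves it fully intact.
    commit_superblock(layout.superblock)   # the one step that flips over
```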
I don't see why ZFS wouldn't have gone over equally flawlessly. None of the features that make ZFS special were in HFS(+), so conversion wouldn't be too hard. The only challenge would be maintaining the legacy compression algorithms, but ZFS is configurable enough that Apple could've added their custom compression to it quite easily.
There are probably good reasons for Apple to reinvent ZFS as APFS a decade later, but none of them technical.
I also wouldn't call the rollout of APFS flawless, per se. It's still a terrible fit for (external) hard drives, and their own products don't auto-convert to APFS in some cases. There was also plenty of breakage when case-sensitivity flipped on people and software, but as far as I can tell Apple just never bothered to address that.
Using ZFS isn't surrendering control. Same as using parts of FreeBSD. Apple retains control because they don't have an obligation (or track record) of following the upstream.
For ZFS, there have been a lot of improvements over the years, but if they had forked it, adapted it, and then left it alone, their fork would have continued to work without outside control. They could pull in things from upstream if and when they wanted, some parts more easily than others.
If it were an issue it would hardly be an insurmountable one. I just can't imagine a scenario where Apple engineers go “Yep, we've eked out all of the performance we possibly can from this phone, the only thing left to do is change out the filesystem.”
Does it matter if it’s insurmountable? At some point, the benefits of a new FS outweigh the drawbacks. This happens earlier than you might think, because of weird factors like “this lets us retain top filesystem experts on staff”.
It’s worth remembering that the filesystem they were looking to replace was HFS+. It was introduced in the 90s as a modernization of HFS, itself introduced in the 80s.
Now, old does not necessarily mean bad, but in this case…
If I recall correctly, ZFS error recovery was still “restore from backup” at the time, and iCloud acceptance was more limited. (ZFS basically gave up if an error was encountered after the checksum showed that the data was read correctly from storage media.) That's fine for deployments where the individual system does not matter (or you have dedicated staff to recover systems if necessary), but phones aren't like that. At least not from the user perspective.
ZFS has ditto blocks, which allow it to self-heal in the case of corrupt metadata as long as a good copy remains (and there would be at least 2 copies by default). ZFS only ever needs you to restore from backup if the damage is so severe that there is no making sense of things.
Minor things like the indirect blocks being missing for a regular file only affect that file. Major things like all 3 copies of the MOS (the equivalent of a superblock) being gone for all uberblock entries would require recovery from backup.
If all copies of any other filesystem’s superblock were gone too, that filesystem would be equally irrecoverable and would require restoring from backup.
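The read path is roughly like this (pseudocode, not the actual OpenZFS zio pipeline):

```python
# Rough sketch of the ditto-block read path.  The block pointer stores the
# expected checksum and up to three DVAs (copies); metadata gets at least
# two copies by default.

import hashlib

def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()   # stand-in for fletcher4/sha256

def read_with_ditto(block_pointer, read_dva, rewrite_dva):
    """read_dva(dva) reads one copy from disk; rewrite_dva(dva, data)
    repairs a bad copy from a known-good one (self-healing)."""
    failed = []
    for dva in block_pointer.dvas:
        data = read_dva(dva)
        if checksum(data) == block_pointer.checksum:
            for bad in failed:               # heal any copies that failed
                rewrite_dva(bad, data)
            return data
        failed.append(dva)
    # Every copy failed verification: the block is truly gone, and that is
    # the point at which "restore from backup" becomes the answer.
    raise IOError("all ditto copies failed checksum verification")
```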
As far as I understand it, ditto blocks were only used if corruption was detected via a checksum mismatch. If the checksum was correct but the metadata turned out to be unusable later (say because it was corrupted in memory and the checksum was computed after the corruption happened), that was treated as a fatal error.