Ha, this is awesome, thanks for checking that out. One point of note though: I'm pretty sure it would have been faster to build the kernel + OpenZFS on Debian/RISC-V in QEMU. QEMU on decent hardware runs very fast, much faster than the D1.
I don't know about low power (or inexpensive, for that matter), but the HiFive Unmatched has lots of IO. 8 lanes of PCIe 3.0 can be expanded into quite a few SATA 3 ports.
For ARM, rk3568 is available now and has 2 lanes of PCIe 3.0 and up to 3 SATA-3 ports. As opposed to the rpi4 which has a single PCIe 2.1 lane.
The NanoPI M4 burned through two SD cards and destroyed its own eMMC module due to some weird behavior while I was using it as a NAS.
No idea what caused it; I suspect excessive writes wearing out the flash storage, or the bootloader getting corrupted and no longer recognizing the boot media. The "wear" I mentioned could be something else of course, but apart from normal OS writes, everything went to the SATA drives.
It sounded like a solid piece of hardware with the 4-SATA-port HAT and a good CPU, but in the end it turned out to be unreliable: at some point the OS hung, wouldn't boot from the OS storage, and the bootloader wasn't seeing the partitions in a UART debug session.
Huh. The only thing I can think of off the top of my head that would cause it to chew up 2 SD cards is excessive amounts of logging. Nothing else really seems to make much sense.
Yeah I would love not being dependent on x86 and its horrid UEFI mess. Then again... I believe technically ARM (and even RISC-V) board manufacturers can also make something similar. Let's hope that doesn't become a thing though.
Why not? I've been working from the view that UEFI is one of the few things worth keeping from x86_64 - I do not want to deal with every board having its own special boot process that prevents one disk from being able to boot different machines. And UEFI is already here and works, so why reinvent the wheel?
The application profiles (like RVA22) have a bunch of requirements on the hardware that must be present, the boot process, and the interface the firmware offers to the OS (e.g. OpenSBI).
Ok but does any of that work in practice? Can I write a random distro ISO to an SD card and expect it to work on any RISC-V board that has the same pointer size? Cause that’s how it works in PC land thanks to ACPI tables.
I just checked the Ubuntu RISC-V download page and it has 2 different ISOs for 2 boards from the same vendor! And apparently those are the only 2 boards with ISOs available 0_o
Now there’s a hot take! In PC land the same bootloader, kernel, config files, etc. will work on pretty much any computer. Hot-swappable components are the norm here.
Meanwhile in ARM/RISC-V embedded land, every chip and every board is its own special snowflake. With no ACPI/UEFI, someone’s gotta hardcode the config of every device on every board, including the ones inside the SoC. Naturally the communities around these boards are even more fragmented than Linux distros already are.
All file systems have metadata which is good to keep in memory. Having built several 50+TB NAS boxes recently, I can say it isn’t just ZFS either. And the penalty for not having enough RAM isn’t always some sort of linear performance hit. It can be kernel panics, exponential decay in performance, etc.
That’s a good point - I’ve seen free memory drop every time I’ve built the larger file systems (and not from just the cache), but I never tried to quantify it. And I don’t see any good stats or notes on it.
Seems like no one is building these larger systems on boxes small enough for it to matter, or at least Google isn’t finding it.
Another way of saying this is that, for the storage systems actual (non-theoretical) people encounter, RAM usage doesn't meaningfully scale with storage size, because the minimum RAM available on any system one encounters is sufficient to service the amount of storage it is possible to use on said system.
The main case for deduplication that I know of is hosting many virtual machines with the same OS. If that OS is Windows (which explains why VMs and not containers), there are dozens of GB of data duplicated per VM. It is not 'nobody', even if it is not common and not always worth it.
> There has not ever been a reason for memory to be correlated with storage capacity nor any reason to believe that such a correlation ought to exist.
However specific implementations can indeed have memory requirements that scale in relation to storage capacity. For example, if the implementation keeps the bitmap of free space in memory, then more storage = larger bitmap = more memory required.
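To put rough numbers on that, here's a back-of-the-envelope sketch, assuming a simple 1-bit-per-block free-space bitmap and 4 KiB blocks (not any particular filesystem's actual layout):

  # bits needed = storage_bytes / block_size; divide by 8 for bytes
  storage_tb=2
  block_size=4096
  echo "$(( storage_tb * 2**40 / block_size / 8 / 2**20 )) MiB of bitmap"   # -> 64 MiB

A plain bitmap scales gently (tens of MiB per TB); per-block structures like a dedup table, at hundreds of bytes per entry, scale far more aggressively.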
There have been several attempts in ZFS to reduce memory overhead. I'm pretty sure that if you took a decade-old version of ZFS you'd struggle to run it on a system with 512MB RAM.
My first ZFS box (running OpenSolaris 2008.11) ran on 512MB, sometimes including a Gnome 2 desktop. It wasn't fast - I didn't need it to be - but it absolutely did work.
Interesting. I started with ZFS around 2009, and I recall several people struggling to use it on a system with less than 2GB, even after messing with tunables. That was on FreeBSD though, so maybe implementation specific.
I've been using ZFS on Solaris 10 on a 4-socket Pentium Pro @ 200 MHz box with 256 MB of RAM since July 2006. It wasn't a speed demon, but it ran okay for years as our central storage server until we upgraded to faster hardware.
Your stories got me inspired, so I downloaded the FreeBSD 8.0 image, released 2009, and fired it up in a VM with 256MB memory. I then created a 2TB dynamically-allocated disk in the VM and used it to create a single-vdev pool.
ZFS does complain that the minimum recommended memory is 512MB and that I can expect unstable behavior. However, basic file I/O seems to work; I copied some multi-GB files around and such without issues.
So seems the bare minimum was lower than I recalled, at least on a plain system.
A proper test would include heavily fragmenting the pool, and preferably with more vdevs. But it's something.
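For anyone wanting to reproduce something similar, here's a rough sketch of the kind of setup described above, using a sparse file as the vdev instead of a separate virtual disk (pool name and file path are just placeholders):

  truncate -s 2T /var/tmp/zdisk.img            # sparse 2TB backing file
  zpool create -f testpool /var/tmp/zdisk.img
  zpool list testpool
  dd if=/dev/urandom of=/testpool/bigfile bs=1M count=4096   # push a few GB through it
  zpool destroy testpool && rm /var/tmp/zdisk.img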
What's supposed to happen with ZFS is that if the rest of the system needs more memory, ZFS should back off and release some of the memory that it's using. It can do this because most of its memory use is just caching, and while there no doubt is a limit to how far you can take this, I never heard anyone question the effectiveness of this on the OpenSolaris/Illumos implementation.
I don't follow FreeBSD closely, but if memory serves there was a concern, particularly in the early days of the ZFS port, that their implementation couldn't be counted on to release RAM fast enough, if the system suddenly came under significant memory pressure. Hence the advice was to always run with more-than-sufficient RAM to minimize the likelihood of getting into low memory situations. I think this is a significant part of why FreeNAS considers 8GB to be the minimum supported configuration.
So it seems to me that this isn't really about ZFS's RAM requirement, rather it's about hedging against the volatility of the RAM requirements of other software on the same box, in case ZFS can't back off fast enough.
These days I run ZFS on Linux, and I remember about 4 years back spinning up some bulk data processing job that was configured to use 14GB of RAM, on a 16GB box, and watching ZFS's ARC RAM use drop in a single second from 5.5GB to 0.5GB. So I'm satisfied that for my purposes, on ZoL in recent times, this isn't an issue I need to worry about.
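If anyone wants to watch that behaviour themselves on ZFS-on-Linux, here's a quick sketch (paths are the standard ZoL kstat/module-parameter locations; the 4 GiB cap is just an example value):

  # current ARC size and target/max, in bytes
  awk '$1 == "size" || $1 == "c" || $1 == "c_max" { print $1, $3 }' /proc/spl/kstat/zfs/arcstats
  # optionally cap the ARC so it never grows past 4 GiB in the first place (as root)
  echo $((4 * 1024**3)) > /sys/module/zfs/parameters/zfs_arc_max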
At present 512MB of RAM is notable for how ridiculously tiny it is, while 2TB is still an acceptable amount of storage. Without resorting to decades-obsolete software, can you pin down exactly how much storage it would take to render that tiny amount of RAM unusable, and then explain how much storage it would take to render a machine with 4GB of RAM likewise unusable, so that we may demonstrate memory usage scaling with storage?
My point was merely that your blanket statement doesn't really hold water, since any actual memory requirements by a filesystem would be implementation specific.
I will agree that ZFS should handle large pools once you clear the ~fixed minimum memory requirement.
We've been using an old CORAID box running 24 drives with 50TB usable (100TB actual) on 16GB of RAM using FreeNAS 9.x for years without noticeable problems :) I've tried to upgrade to 32GB a couple times but for whatever reason the board won't allow more than 16GB even though according to Intel docs the RAM should be compatible. We have up to 12 PCs connected at gigabit and never any noticeable lag, even while resilvering, though I'm sure it would be faster if there was more RAM available.
> On Debian unstable (at time of writing, March 2022: bookworm/sid), OpenZFS is not yet available as debian package (zfsutils-linux). Hence you have to compile and build OpenZFS yourself.
> Unfortunately OpenZFS seems not to support (yet) cross-compiling for the RISC-V platform, hence you have to build the kernel on the RISC-V board.
Both of these seem like low hanging fruit, yes? If the upstream code supports it I'm surprised Debian doesn't already have packages, and cross-compiling isn't that special.
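For reference, the native build isn't too bad once the kernel headers are in place; roughly something like this (a sketch of the standard OpenZFS autotools flow, not the exact steps from the article, and the package list is approximate):

  apt install build-essential autoconf automake libtool gawk dkms uuid-dev \
      libblkid-dev libssl-dev zlib1g-dev libudev-dev libaio-dev libattr1-dev \
      libelf-dev python3
  git clone https://github.com/openzfs/zfs && cd zfs
  sh autogen.sh
  ./configure
  make -s -j$(nproc)
  make install && ldconfig && depmod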
I hope this, coupled with more tech on RISC-V hardware, can bring it to the level of the Raspberry Pi, with all the community, hardware devices, accessories and all that.
I hope it won't be a decade, but remember the (original) Raspberry Pi launched on a very mature part, with a _very_ mature (ancient) ISA.
Outside the discount pricing, Intel has promised to tape out SiFive's P650. Rivos, Tenstorrent, and others are also working on fast cores, but it'll be at least 2-3 years before they hit the market, if at all.
So far SiFive's dual-issue in-order core (~40 on Geekbench 5.4.1), like on the now-cancelled BeagleV, is the fastest chip you can buy as a lay person. The D1 (~32 on Geekbench 5.4.1) is cheaper but less powerful.
It will only happen if a company decides to try to be the next Raspberry Pi chip (or as a by-product, like the OG Broadcom chip from the Pi 1).
The actual chip designs, I think, are already there in terms of getting a high-performance RISC-V chip built, but currently the market and tech stack are still getting there, so we are still at the scaled-down-test phase in the high-end market.
FWIW though I don't see much reason to care about the ISA of the CPU beyond it being RISC, chances are it'll still be full of all kinds of closed source crap like the Pi.
>chances are it'll still be full of all kinds of closed source crap like the Pi.
RISC-V's application profiles (like RVA22) do have requirements regarding some hardware that must be present (serial port with a specific interface), boot process and firmware interfaces.
These are there to prevent an ARM-like messy situation.
But can it do dedupe on such a box? I think the recommendation is still "1GB of RAM for each TB of storage" if you're using dedupe...
I still have some boards with ~512mb RAM lying around (an UltraSPARC for example) that I'd love to re-purpose to a cheap NAS, just for the heck of doing it on a non-x86 platform....
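That rule of thumb is really about the dedup table (DDT). A rough sketch of the usual ballpark estimate (the ~320 bytes/entry figure is the commonly quoted in-core DDT entry size, not an exact number, and "tank" is just a placeholder pool name), plus how to check an existing pool:

  # ballpark: unique blocks ~= pool_bytes / recordsize, ~320 bytes of RAM per DDT entry
  pool_tb=1
  recordsize=$((128 * 1024))        # default 128K records; smaller records = much more RAM
  echo "$(( pool_tb * 2**40 / recordsize * 320 / 2**20 )) MiB of DDT per TB"   # ~2560 MiB
  # on a live pool, zdb can report the actual DDT size and histogram
  zdb -DD tank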
I’ve tried dedup out, and even with a large powerful box with a LOT of duplicate files (multi-TB repositories of media files which get duplicated several times due to coarse snapshotting from other less fancy systems), I get near zero deduplication. I think it was literally low single digits percents.
ZFS dedup is block based, and the actual block size varies depending on data feed rate for most workloads (ZFS queues up async writes and merges them), so in practice, once a file gets some non-zero block offset somewhere (which happens all the time), even identical files don't dedup.
While regular dedup is only a win for highly specific workload, the file-based deduplication[1][2] which is in the works seems like it can have some potential.
They discussed it, along with some options for a background-scanning dedup service (trying to find potential files to dedup), in the February leadership meeting[3].
ZFS dedup has been wonderful for me: dedupratio = 7.05x (144 GB stored on a 25 GB volume, and still 1.3 GB left free).
I use it for backups of versions of the same folders and files, slowly evolving over a long period of time (> 15 years); that gives a lot of duplication, of course. (I could also use compression on top of it.)
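(For anyone curious where that number comes from, it's the pool's dedupratio property; something like this, with your own pool name swapped in:)

  zpool list -o name,size,alloc,dedupratio backuppool
  zpool get dedupratio backuppool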
Backups are the 99% case for duplicate files, but aren't snapshots a better replacement in every way? Snapshots are already deduplicated as soon as you take them, plus they're instant. Maybe if your backups are coming from a non-zfs system, but you could probably convert normal backups into snapshots without too much trouble.
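For anyone who hasn't used them, the basic workflow is roughly this (a sketch; the dataset names, snapshot labels, and backup host are made up):

  zfs snapshot tank/data@2022-03-20                     # instant, space shared with live data
  zfs send -i tank/data@2022-03-13 tank/data@2022-03-20 \
      | ssh backuphost zfs recv -u backup/data          # incremental: only changed blocks travel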
Why is dedup even present when the primary use case (backups) is better served in every way by snapshots?
Dedup (if it worked like it might have!) could solve the backup use case without needing to dictate your workflow.
In theory, it could also really help with virtualized disk workloads where there may be a lot of duplicated data from the base OS, but you can't use a snapshot (easily) because Windows won't run from a ZFS filesystem. You could maybe do snapshotting on ZFS volumes (zvols), but that's not as flexible as a dedupe that worked as imagined.
Personally, I think online dedupe ends up being too expensive in memory and computation and ends up missing things because of divergent block sizes or offsets as another poster mentioned. ZFS doesn't support an offline dedupe, but I think btrfs does. That might be more interesting. It's still expensive to find duplicates, but it's possible, and it'd be neat to be able to rewrite the metadata to refer to a single copy and free some space, maybe.
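On btrfs the offline/out-of-band route does exist via the dedupe ioctl, typically driven by a userspace tool. A sketch with duperemove (assuming it's installed; the path is a placeholder and the flags are from memory, so double-check the man page):

  # hash files, find duplicate extents, and issue dedupe ioctls (-d); without -d it only reports
  duperemove -r -d -h /mnt/btrfs/data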
Wow, that’s worse than I realized! Honestly, this makes me wonder whether the feature should even exist in ZFS. Given the enormous hardware requirements and minimal savings... well, I’d be curious to hear if anyone has ever found a real use case.
Does anyone actually use dedup? I think even the OpenZFS documentation says compression is more useful in practice. If at all, dedup should be an offline feature, to be run as scheduled by the operator.
My setup tries to get the absolute highest bandwidth and uses NVMe sticks in a stripe (I get my redundancy elsewhere), no compression, no dedup and yet can only hit ~ 3.5 GB/s reads (TrueNAS Core, EPYC 7443P, Samsung 980PRO, 256 GiB). I hope TrueNAS SCALE will perform better.
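For comparison's sake, this is the kind of fio run I'd use to measure that (a sketch; adjust the directory, size, and job count for your pool, and note that the ARC will happily cache a data set this small on a 256 GiB box):

  fio --name=seqread --directory=/mnt/tank --rw=read --bs=1M \
      --size=32G --numjobs=4 --ioengine=posixaio --group_reporting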
My first ever large (> 4TB) ZFS pool is still stuck with dedup. It's a backup server, gets about 2x with deduplication.
At the time, it was the difference between slow and impossible: I couldn't afford another 2x of disks.
These days, the pool could fit on a portable SSD that would fit in my pocket.
Careful, file-based dedup on top of ZFS might be more effective.
Small changes to single, large files see some advantage with block-based deduplication. You see this in collections of disk images for virtual machines.
You might see that in database applications, depending on log structure. I don't know, I don't have that experience.
For most of us, file-based deduplication might work out better, and is almost certainly easier to understand. You can come up with a mental model of what you're working with, dealing with successive collections of files.
Even though files are just another abstraction over blocks, it's an abstraction that leaks less without the deduplication.
I haven't used a combination of encryption and deduplication. That was Really Hard for ZFS to implement, and I'm not sure how meaningful such a combination is in practice.
> no compression, no dedup and yet can only hit ~ 3.5 GB/s reads (TrueNAS Core, EPYC 7443P, Samsung 980PRO, 256 GiB)
Hmmm, that 3.5GB/s sounds low. From rough memory of doing initial storage benchmarking of our "new" Hetzner dedicated boxes a few months ago (AX51-NVMe, https://www.hetzner.com/dedicated-rootserver/ax51-nvme), they were giving about 10GB/s with mirrored NVMe drives.
It would be nice if ZFS was able to combine dedup and compression - basically be able to notice that a block/file/datastream was similar/identical to another one, and do compression along with a pointer ...
Though there is no clean way to disable either. Compression can be removed from files by rewriting them, but removing deduplication requires copying over all data to a fresh pool.
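The compression half at least can be done per dataset; something like this (a sketch, the dataset name is made up, and rewriting in place like this briefly doubles the space used per file):

  zfs set compression=off tank/data      # only affects newly written blocks
  # existing blocks stay compressed until rewritten, e.g.:
  find /tank/data -type f -exec sh -c 'cp -p "$1" "$1.tmp" && mv "$1.tmp" "$1"' _ {} \;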
Zfs send/recv sends the blocks as written to the original filesystem (which is why it can be so fast: it doesn't have to 'understand' what is happening or defragment things to read, like reading a file does), but that also means undoing or applying dedup won't work correctly, unless it's screwing with things you probably don't want it to.
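(The relevant send flags, if you do want to control what travels in the stream, are roughly these; exact availability depends on your OpenZFS version, and the pool/dataset names are placeholders:)

  zfs send -c tank/fs@snap | zfs recv otherpool/fs     # keep on-disk compression in the stream
  zfs send -w tank/fs@snap | zfs recv otherpool/fs     # raw send (required for encrypted datasets)
  # older releases also had -D for deduplicated streams, since deprecated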
One issue I had is that due to what I eventually tracked down as power issues, I had some corrupted data written to disk under my ZFS pool (at the media write layer), and I had dedup on.
So dedup, unfortunately, actually made it REALLY suck to fix, because I couldn't even copy a new version of the file to the same pool! It kept deduplicating the new copy against the old bad data, so I then couldn't read the copy either. :s
It even did this after I deleted everything, because the pruning couldn't remove the bad underlying entries, since the media was failing.
So delete files, scrub, put new files on resulted in them having the exact same failure.
When I nuked the pool and recreated it, it was all fine though.
Zfs send/recv actually does send data at a logical level, unless instructed otherwise. There are options to send deduplicated streams, streams maintaining compression, and raw streams but none of those are the default. Also, see my reply to a sibling comment.
So at a pool level you might not be able to turn it off once it's turned on, but you can also turn off deduplication per file system, including in properties you set when receiving a stream. I wasn't confident this would work, but a test proved it can. (chicken_test/dedup_source had deduplication enabled and 16 copies of the same 100MiB file)
  chicken:~# zpool list
  NAME          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP   HEALTH  ALTROOT
  chicken_test   15G   144M  14.9G        -         -    0%   0%  16.00x  ONLINE  -
  chicken:~# zfs send chicken_test/dedup_source@send | zfs recv -o dedup=off chicken_test/nodedup_dest
  chicken:~# zpool list
  NAME          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP   HEALTH  ALTROOT
  chicken_test   15G  2.29G  12.7G        -         -    0%  15%  16.00x  ONLINE  -
  chicken:~# zfs get dedup chicken_test/nodedup_dest
  NAME                       PROPERTY  VALUE  SOURCE
  chicken_test/nodedup_dest  dedup     off    local
  chicken:~# zfs destroy -r chicken_test/dedup_source
  chicken:~# zpool list
  NAME          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP   HEALTH  ALTROOT
  chicken_test   15G  2.29G  12.7G        -         -    0%  15%  1.00x   ONLINE  -
> Honestly, this makes me wonder whether the feature should even exist in ZFS.
My understanding is that the OpenZFS project devs feels that the answer is an emphatic no, it should not exist, but they're committed to backwards compatibility so they won't drop it. (Take with a grain of salt; that's an old memory and I can't seem to find a source in 30s of searching.)
Without restricting pretty heavily how you can interact with files, or causing severe bottlenecks, it's probably the best that can be done, since the FS API doesn't provide any real guarantees about what data WILL be written later, how much of it, etc. So it has to figure things out as it goes, with minimal performance impact.
Yeah, I think the author may be mixing up recommendations for dedup vs non-dedup. The solution is always to not enable dedup, it's a niche feature that's not worthwhile outside of very specific scenarios.
It isn't required (in the strict sense) to keep the dedup table in memory; the problem is that performance is dire when it isn't. It would be pretty similar to virtual memory thrashing when the table is not fully in memory.
ADD: Geekbench 5.4.1 on RISC-V
- under QEMU/Ryzen 9 3900XT: 82
- under QEMU/M1: 76
- Native D1: 32 (https://browser.geekbench.com/v5/cpu/13259016)
The M1 result is skewed because for some reason AES emulation is much faster on Ryzen. The rest of the integer stuff is faster on the M1, up to 30% faster.