We shipped a shader cache in the latest release of OBS and quickly had reports come in that the cached data was invalid. After investigating, we found the cache files were the correct size on disk but the contents were all zeros. On a journaled file system this seems like it should be impossible, so the current guess is that some users have SSDs that ignore flushes and experience data corruption on crash / power loss.
I think this is typical behaviour with ext4 on Linux, if the application doesn't do fsync/fdatasync to flush the data to disk.
Depending on mount options, ext4fs does metadata journaling, ensuring the FS itself is not borked, but not data journaling, which would safeguard the file contents in the event of an unclean shutdown with pending writes in the caches.
The same phenomenon is at play when people complain that their log files contain NUL bytes after a crash. The file system metadata has been updated for the size of the file to fit the appended write, but the data itself was not written out yet.
The current default is data=ordered, which should prevent this problem if the hardware doesn't lie. The data doesn't go in the journal, but it has to be written before the journal is committed.
There was a point where ext3 defaulted to data=writeback, which can definitely give you files full of null bytes.
And data=journal exists but is overkill for this situation.
For more context, that's a comment from one of the main ext4 authors, Ted Ts'o. His subsequent comment spells out the case in more detail, but sadly I didn't spot a spelled-out NUL byte origin story when skimming.
The original report [0] shows the corruption due to NUL bytes at the end of the file (see the hexdump).
This comment [1] from Ted Ts'o details the exact chain of events leading to it.
I do not know how ZFS will overcome hardware lying. If it's going to fetch data that is in the drive's cache, how will it overcome the persistence problem?
It will at the very least notice that the read data does not match the stored checksum and not return the garbage data to the application. In redundant (raidz) setups it will then read the data from another disk, and update the faulty disk. In a non-redundant setup (or if enough disks are corrupted) it will signal an IO error.
An error is preferred to silently returning garbage data!
The "zeroed-out file" problem is not about firmware lying though, it is about applications using fsync() wrongly or not at all. Look up the O_PONIES controversy.
Sure, due to their COW nature zfs and btrfs provide better behavior despite broken applications. But you can't solve persistence in the face of lying firmware.
Even though ZFS has some enhancements to not corrupt itself on such drives, if you run, for example, a database on top, all guarantees around commit go out the window.
"Renaming a file should always happen-after pending writes to that file" is not a big pony. I think it's a reasonable request even in the absence of fsync.
Well, for one rename() is not always meant to be durable. It can also be used for IPC, for example some mail servers use it to move mails between queues. Flushing before every rename is unexpected in that situation.
Fun fact: rename() is atomic with respect to running applications per POSIX, that the on-disk rename is also atomic is only incidental.
I'm not suggesting flushing for rename. If a file write and a rename happen shortly before power loss, and neither goes through, that's fine.
With this rule, three outcomes are acceptable: both occur, or neither occur, or just the file write happens. The unacceptable outcome is that just the rename happens.
("file write" here could mean a single write, or an open-write-close sequence, it doesn't particularly matter and I don't want to dig through old discussions in too much detail)
As an aside, can you still get the bad checksum file contents with zfs? Eg if it's a big database with its own checksums you might want to run a db level recovery on it.
Actual file data ends up in the same transaction group (txg) as metadata if both are changed within the same txg commit (whether flushed explicitly, because the recordsize/buffer limit was reached, or because the txg commit timeout - 5 seconds by default - expired). So if there is a write barrier violation caused by hardware lies, followed by an untimely loss of power, the checksums for the txg updates won't match and they get rolled back to the last valid one when importing the pool - which doesn't end up zeroing out extents of a file (like on xfs) or ending up with a zero file size (like on ext3/ext4).
At least on linux you can use io_uring to make fsync asynchronous. And you can initiate some preparatory flushing with sync_file_range and only do the final commit with fsync to cut down the latency.
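A rough sketch of the sync_file_range idea, assuming Linux (the io_uring variant would additionally submit the final fsync asynchronously; that part is omitted here, and the chunk layout is a made-up append-only log):

```c
/* Sketch: kick off writeback early so the final flush has little left to do.
 * Assumes Linux; offsets/sizes come from a hypothetical append-only log. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Call after each chunk is written: starts writeback without waiting for it. */
int stage_chunk(int fd, const void *buf, size_t len, off_t off)
{
    if (pwrite(fd, buf, len, off) != (ssize_t)len) return -1;
    return sync_file_range(fd, off, len, SYNC_FILE_RANGE_WRITE);
}

/* Call at the durability point: the actual commit, now mostly already in flight. */
int commit(int fd)
{
    return fdatasync(fd);
}
```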
My only n=1 observation is that NUL values in logs occur on NVMe, SSD and spinning rust, all ext4 with defaults. I do have the idea it occurs more on NVMe drives though. But maybe my system's settings are just borked.
I don't think that's how it works: Flushing metadata before data would be a security concern (consider e.g. the metadata change of increasing a file's length due to an append before the data change itself), so file systems usually only ever do the opposite, which is safe.
Getting back zeroes after a metadata sync (which must follow a data sync) would accordingly be an indication of something weird having happened at the disk level: We'd expect to either see no data at all, or correct data, but not zeroes or any other file's or previously written stale data.
The file isn't stored contiguously on disk, so that would depend on the implementation of the filesystem. Perhaps the size of the file can be changed, without extents necessarily being allocated to cover the new size?
I seem to vaguely recall an issue like that, for ext4 in particular. Of course it's possible in general for any filesystem that supports holes, but I don't think we can necessarily assume that the data is always written, and all the pointers to it also written, before the file-size gets updated.
I think there could semi-reasonably be a case for the zero bytes appearing if the fs knows there should be something written there and the block has been allocated, but not yet written. Then it's not compromising confidentiality to zero the allocated block when recovering the journal as the disk is mounted. But the zero-byte origin doesn't seem to be spelled out anywhere, so this is just off-the-cuff reasoning.
The file's size could have been set by the application before copying data to it. This will result in a file which reads all zeroes.
Or if it were a hardware ordering fault, remember that SSD TRIM is typically used by modern filesystems to reclaim unused space. TRIMmed blocks read as zero.
> The file's size could have been set by the application before copying data to it. This will result in a file which reads all zeroes.
Hm, is that a common approach? I thought applications mostly use fallocate(2) for that if it's for performance reasons, which does not change the nominal file size.
Actually allocating zeroes sounds like it could be quite inefficient and confusing, but then again, fallocate is not portable POSIX.
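For illustration, the two pre-sizing approaches being contrasted look roughly like this (a sketch, assuming Linux; the 1 MiB size is arbitrary and return values are ignored for brevity):

```c
/* Sketch of the two pre-sizing approaches discussed above (Linux fallocate();
 * the 1 MiB size is arbitrary). */
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

void presize_by_truncate(int fd)
{
    /* Extends the nominal file size; the new range is a hole that reads as
     * zeroes until real data lands there - which is exactly what a crash
     * mid-copy would leave behind. */
    (void)ftruncate(fd, 1 << 20);
}

void presize_by_fallocate(int fd)
{
    /* Reserves space without changing the reported size, so readers never
     * see a zero-filled tail; the size only grows as data is written. */
    (void)fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 1 << 20);
}
```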
I had this exact experience with my workstation SSD (NTFS) after a short power loss while NPM was running. After I turned the computer back on, several files (package.json, package-lock.json and many others inside node_modules) had the correct size on disk but were filled with zeros.
I think the last time I had corrupted files after a power loss was in a FAT32 disk on Win98, but you'd usually get garbage data, not all zeros.
> but you'd usually get garbage data, not all zeros.
You are less likely to get garbage with an SSD in combination with a modern filesystem because of TRIM. Even if the SSD has not (yet) wiped the data, it knows that a block that is marked as unused can be returned as a block of 0s without needing to check what is currently stored for that block.
Traditional drives had no such facility to have blocks marked as unused from their PoV, so they always performed the read and returned what they found which was most likely junk (old data from deleted files that would make sense in another context) though could also be a block of zeros (because that block hadn't been used since the drive had a full format or someone zeroed free-space).
They may be pointing to unallocated space which on a SSD running TRIM would return all zeros. NTFS is an extremely resilient yet boring filesystem, I cannot remember the last time I had to run chkdsk even after an improper shutdown.
As somebody who worked as a PC technician for a while until very recently, I've run chkdsk and had to repair errors on NTFS filesystems very, very, very often. It's almost an everyday thing. Anecdotal evidence is less than useful here.
So anecdotal evidence is not useful, as proven by your anecdotal evidence? :)
FWIW I've found NTFS and ext3/4 to be of similar reliability over the years, in general use and in the face of improper shutdown. Metadata journaling does a lot to preserve the filesystem in such circumstances. Most of the few significant problems I've had have been due to hardware issues, which few filesystems on their own will help you with.
It is worth noting that when you run tools like chkdsk or fsck, some of the issues reported and fixed are not data damaging, or structurally dangerous, or at least not immediately so. For instance free areas marked in such a way that makes them look used to the allocation algorithms.
There's a difference between your personal experience and my experience in a professional role handling multiple customer devices every day.
However, I'm also not making a statement about NTFS's reliability vs ext3/ext4. In the years that I worked in that position, I maybe dealt with Linux systems 3 times.
Reminder of the original statement:
> NTFS is an extremely resilient yet boring filesystem, I cannot remember the last time I had to run chkdsk even after an improper shutdown.
I never said anything about catastrophic failure of an NTFS filesystem. I have experienced that, but it was comparatively rare (still happened though). I have, however, had to run chkdsk fairly often to correct errors. Sometimes user data was affected to some degree, but it was often a matter of system stability and getting Windows back to running without issues.
I still find that NTFS is reasonably resilient and have no qualms with it. I just want to push back against the idea that nothing ever goes wrong with NTFS, which was implied by your statement that you can't remember the last time you used chkdsk.
> not making a statement about NTFS's reliability vs ext3/ext4
I should have been a bit clearer there: I only mentioned those specific filesystems because those are the ones I have a lot of experience with, rather than intending to bang the drum for them (or one in favour of the other). I expect other much-tested journaled filesystems could be substituted into the same sentences.
Journaling filesystems (including NTFS, and ext3/ext4 using default mount options) typically only track file structure metadata in the journal, so that is working as intended (WAI): the filesystem structure was not corrupted, but all bets are off when it comes to the contents of the files.
I lost Audacity projects due to BSODs on a Surface Book several times in ~2019: the *_data/**.au files were intact, each containing just a few seconds of audio; but the .aup XML file that maps them and contains whatever else makes up the project was all zeroed. My memory’s fuzzy, but I think it was something like exit sometimes triggering the BSOD, and save-on-exit corrupting consistently if it BSODed, and so the workaround was to remember to save first, and then if it BSODs you’re OK.
You mean on complete system crash, right? Your application crashing shouldn't lead to files being full of zeroes as long as you've already written everything out.
So is it the mods who are editing titles of articles to be more sensational and clickbait-y? Because this happens all the damn time, flagging does nothing, etc.
In many cases yes. I don't know if it's an automated or manual process (it seems to happen less when the US is asleep, so I suspect it's manual?), but generally if you submit a link with a title edited to be less sensational and clickbait-y it gets changed to the linked article title.
Yes, and the OP did abide by HN guidelines. But GGP implied that the title was misleading, even though it was the original title! So, what to do if the original title is misleading?
Admittedly this is a twitter thread, so no "actual title" exists.
Personally, I _do_ editorialize by posting a “summary” of what a post discusses, if there’s no real title. I try not to make it pass as my own opinion; sometimes it happened that mods reverted it to the original, less relevant title.
The trick is to write the summary from the point of view of the original author, preferably by lifting a quote from the text.
https://hackernewstitles.netlify.app/ has a recent example, "Kevin Scott (MSFT CTO) inviting all Open AI employees to join MSFT", which sounds like the title of a third-party article describing said event, but actually points directly to the Microsoft CTO's tweet with the offer https://twitter.com/kevin_scott/status/1726971608706031670 so it got edited to a quote, "If needed, you have a role at Microsoft that matches your compensation", which is an attempt to summarize the content but doesn't provide additional context, so it might be considered more clickbaity https://news.ycombinator.com/item?id=38364315
2/12 is not good, especially if the drives that failed were using off-the-shelf Phison controllers, which is basically the entire market besides SanDisk/Samsung/Intel.
They are even among the manufacturers of actual flash memory chips.
> Or do they mainly sell to manufacturers rather than direct to the likes of me?
This. I think they mostly sell OEM SSDs under their name; if you buy a laptop or pre-built system from a major manufacturer, chances are not that low that you'll find an SK Hynix SSD in there.
I have a Sabrent M2 in my own PC, bought it because it was the cheapest option. Incidentally I suspect it's the cause of system-wide slowdown in the past few months, even opening the file explorer takes over ten seconds sometimes.
To me the real thing missing is whether those drive advertise power loss protection or not. The next question is whether they are to be used in a laptop where power loss protection is less relevant given the local battery.
That should be irrelevant, because flush is flush right? If your SSD does not write the data after a flush it's violating basic hard drive functionality.
GP has a point in that a laptop is much less likely to experience unplanned power loss because of the built in battery. However that does not help desktops which also use NVME SSDs.
The other problem is that this doesn't only cause problems resulting from power loss. At least some file systems guarantee consistency of data on the drive by flushing at critical points. ZFS on Linux does this. If the flush doesn't happen as promised, subsequent writes could result in corrupted files should something else like a crash interrupt operation.
There is a flood of fake SSDs currently, mostly big brands. I recently purchased a counterfeit 1TB drive. It passes all the tests, performance is OK, it works... except it gets episodes where ioping would be anything between 0.7 ms and 15 seconds, that is, under zero load. And these are quality fakes from a physical appearance perspective. The only way I could tell mine was fake is that the official Kingston firmware update tool would not recognize this drive.
Probably chinese sellers on all those sites. I've noticed a common thread with people who complain about counterfeits is that they're literally buying alphabet soup brand fakes from chinese FBA sellers instead of buying products directly sold by amazon or from more traditional retail channels.
There’s definitely a problem with my grandma or some less-technically educated person buying “alphabet soup” fakes, BUT Amazon does commingle inventory. This means that lots of people can end up with fakes sold by 3rd parties when buying from a reputable brand.
There were even stories of those crazy coupon people reselling on Amazon, and some cases of returned retail products ending up as “new” on Amazon. Which gets problematic with certain things like consumables (the WSJ did an article on toothpastes iirc).
> alphabet soup brand fakes from chinese FBA sellers instead of buying products directly sold by amazon
Does this actually make a difference? I remember the issue was that Amazon would bin devices together regardless if they're from some random third-party or direct sale, so you could have fakes mixed in with genuine and it was basically a lucky dip.
Is this not still the case?
Ultimately, I'd be wary of buying things like this from Amazon and, as you suggest, go to a more traditional retail channel instead.
Amazon only comingles inventory among FBA sellers, and gives each an option to opt-out from it if they want to. They never comingle 'sold by amazon.com' items. In cases where I've bought amazon.com items and not from FBA sellers I never received bad products, and I've easily bought dozens of SD cards, flash drives, SSDs, etc.
> I've noticed a common thread with people who complain about counterfeits is that they're literally buying alphabet soup brand fakes from chinese FBA sellers instead of buying products directly sold by amazon
AMEN to that !
And the most annoying thing for those of us who know to avoid FBA is that Amazon has removed the "sold by Amazon" search filter tick-box.
So whilst in the past you could tick a box and be presented with a list of products which are direct-sold rather than FBA, you cannot do that anymore.
According to some Reddit posts, you can still do it if you hack the URL and add an "emi=$obscure_value" GET-param. But I'm guessing sooner or later Amazon will kill this work-around too.
Sold by amazon means "taken out of a box containing fakes and maybe real products". If that's your gamble, may as well buy the fake directly at lower cost.
It doesn't strike me as being a big claim, I recently bought some RAM for a NUC a few weeks ago on Amazon only to determine that it was likely counterfeit. It came in an official box with all packaging intact.
That's interesting. I have a Samsung 990 Pro bought on Amazon and have the random lags. I've only noticed it in the terminal, so I figured something else may be the culprit. Never went to 15 seconds, but it can be around 1s.
The Samsung Magician app on Windows reports it as "genuine" and it was able to apply two firmware updates. The only thing it complains about is that I should be using PCIE 4 instead of 3, but I can't do anything about that.
I have been able to fix these random lags by doing multiple full disk reads. The first one will take very long, because it will trigger these lags. Subsequent ones will be much better.
The leading theory I have read is that maintenance/refreshing on the SSD is not done preventatively/correctly by the firmware, and you need to trigger it by accessing the data.
If you dig at the vendor data stored on the drive firmware, fakes are easy to spot. Model numbers, vendor ID, and serial numbers will be zero’d out or not conforming to manufacturer spec.
I purchased a bunch of fake kingston SD cards in China that worked well enough for the price, but crapped out within a year of mild use. I didn’t lose data. It was as if one day they worked. Then one day they were fried.
Under long-term heavy duty, I've routinely seen cheap modern platter drives outperform cheap brand-name NVMe.
There's some cost cutting somewhere. The NVMEs can't seem to sustain throughput.
It's been pretty disappointing to move I/O bound workloads over and not see notable improvements. The magnitude of data I'm talking about is 500-~3000GB
I've only got two NVME machines for what I'm doing so I'll gladly accept that it's coincidentally flaky bus hardware on two machines, but I haven't been impressed except for the first few seconds.
I know everyone says otherwise, which is why I brought it up. Someone tell me why I'm crazy.
For write loads this is expected, even for good drives, at some level. They tend to have some faster storage which takes your writes and the controller later pushes the changes to the main body of the drive. If you write in bulk the main, slower, portion can't keep up so the faster cache fills and your write has to wait and will perform as per the slowest part of the drive. Furthermore: good drives tend to have an amount of even faster DRAM cache too, so you'll see two drop-offs in performance during bulk write operations. For mainly read based loads any proper SSD¹ will outperform a traditional drive, but if your use case involves a lot of writing³ you need to make more careful choices⁵ to get good performance.
I can't say I've ever seen a recent SSD (that isn't otherwise faulty) get slow enough to say it is outperformed by a traditional drive, even just counting the fastest end of the disk, but I've certainly seen them drop to around the same speed during a bulk write.
[2] get SLC-only⁴ drives, not QLC-with-SLC-cache or just-QLC, and so forth
[3] bulk data processing tasks such as video editing are where you'll feel this significantly, unless your number-crunching is also bottlenecked at the CPU/GPU
[4] SLC-only is going to be very expensive for large drives, even high-grade enterprise drives tend to be MLC-with SLC-cache. SLC>MLC>TLC>QLC…
[5] this can be quite difficult in the “consumer” market because you'll sometimes find a later revision of the same drive having a completely different memory and/or controller arrangement despite the headline model name/number not changing at all – this is one reason why early reviews can be very misleading
I think cheaper QLC drives use a part of their storage space as SLC, which is fast to write. But once you've written more than fits in the SLC cache, write throughput quickly tanks as the drive has to push the data on to the slower QLC parts.
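If you want to see where that knee sits on a particular drive, a rough C sketch like this (Linux, O_DIRECT, a scratch path you can afford to thrash; the path and sizes are assumptions) prints per-GiB throughput, and the drop-off when the fast cache fills is usually obvious:

```c
/* Rough sketch: sequential O_DIRECT writes, reporting throughput per GiB so
 * the SLC-cache knee is visible. Path and sizes are placeholders. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define CHUNK (64LL << 20)   /* 64 MiB per write */
#define GIB   (1LL << 30)
#define TOTAL (64 * GIB)     /* enough to blow past typical caches */

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    /* O_DIRECT bypasses the page cache so we measure the drive, not RAM. */
    int fd = open("/mnt/scratch/bigfile", O_CREAT | O_WRONLY | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    if (posix_memalign(&buf, 4096, CHUNK)) { perror("posix_memalign"); return 1; }
    memset(buf, 0xA5, CHUNK);

    double t_prev = now();
    for (long long done = 0; done < TOTAL; ) {
        if (write(fd, buf, CHUNK) != CHUNK) { perror("write"); return 1; }
        done += CHUNK;
        if (done % GIB == 0) {
            double t = now();
            printf("%3lld GiB  %.0f MB/s\n", done / GIB, (GIB / 1e6) / (t - t_prev));
            t_prev = t;
        }
    }
    close(fd);
    free(buf);
    return 0;
}
```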
I used to use an HP EX920 for my system drive and it was abysmally slow at syncs. I'd open Signal and the computer would grind to a halt while it loaded messages from group chats. After much debugging, I found out Signal was saving each message to sqlite in a transaction causing lots of syncing.
I found some bash script that looped and wrote small blocks synchronously and the HP EX920 was like 20 syncs/sec and my WD RE4 spinner was around 150. Other SSDs were much faster (it was a few years ago so can't remember the exact numbers)
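For anyone wanting to reproduce that kind of measurement, here's a tiny C sketch of the same idea (not the original script; the path and duration are arbitrary, and the file should live on the drive under test, not on tmpfs):

```c
/* Tiny sketch: count how many small write+fsync pairs per second a drive
 * sustains. Put the test file on the drive under test (not tmpfs). */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/target/synctest", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    char block[4096];
    memset(block, 'x', sizeof block);

    int syncs = 0;
    time_t start = time(NULL);
    while (time(NULL) - start < 10) {                 /* run for ~10 seconds */
        if (write(fd, block, sizeof block) != (ssize_t)sizeof block) break;
        if (fsync(fd) != 0) break;                    /* force it all the way down */
        syncs++;
    }
    printf("%.1f syncs/sec\n", syncs / 10.0);
    close(fd);
    return 0;
}
```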
1) Nobody says otherwise about cheap anything NVMe. They're pretty terrible once they've exhausted the write cache. This is well-known and addressed in every decent review by reputable sites.
2) Sustaining throughput seems the least of our problems when some unknown number of NVMe SSDs might be literally losing flushed data.
I don't know about the 950 pro specifically, but when I bought my 980 pro I looked into this, and it seemed that this drive does have a drop in write speed after a while (can't remember how long) but "low speed" wasn't that low. Again, don't remember specifics, but it was above 1 GB/s. Other drives fared much worse, with a drop coming in sooner and going lower.
Depending on your needs this can be an issue.
For me, using this drive for random "office" work, I figured I'd never feel it in practice. This drive is supposed to support PCIE 4, but my laptop only does 3. This also "helps", since it won't fill whatever cache it uses as quickly. In practice, it was able to write 100 GB at the top speed. Didn't bother to test more. The only time I've ever written that much data at a time was restoring a backup when I bought the drive. Since my backup was on 2.5" spinning rust, it wasn't an issue.
I think it's important to be able to have a rough idea of how you'd be using this. If this drop in performance means I could use a cheaper drive, I'm all for it.
But some drives are truly awful. My work laptop came with a cheap Samsung drive that would quickly drop to around 200 MB/s. At first, I thought I had somehow badly configured Linux or something (I'm running zfs with native encryption).
Then I went and checked my desktop running quite worn-out SATA ssds from ~2012 (840 evo) and those drives would wipe the floor with the NVMe in write performance. They wouldn't go below 400 something MB/s until almost full. Same kernel version and zfs config.
It would seem that this is quite common behavior in cheap drives. But I guess that if all you do is browse the web, write mails in outlook and type the occasional word document, you're still ahead of spinning rust for the durability (it won't break if you drop your laptop) and for the latency.
In my use case I'm talking about arrays of them, think 8. RAID can parallelize platter really nicely in that configuration. But on modern ram/cpu (epyc 9654s), you'll still see the disk dragging you down. NVME drags me down more.
Maybe the key is a bunch of small ones. Like 20 512GB modules... That may be brilliant. It's way cheap
Or just get some Kioxia CD8 or CD8R or similar from Samsung/ Solidigm/ Micron depending on what you need. It will be much faster than spinning rust in all situations I am quite sure.
Decent SSDs have a huge latency advantage compared to even the best HDDs just out of principle. Enterprise SSDs focused on read-write workloads can sustain decent performance even under continuous 100% load. For example: https://apac.kioxia.com/en-apac/business/ssd/data-center-ssd...
In a 900 second timeframe, the 980 Pro drops down to a bit above 1GB/s, while the worst is somewhere in the tens of MB/s. So there are significant differences in SSD performance profiles which you can only find out about from reviews like this.
This was the kind of review I was talking about. But you should also look at the specific capacity you're after. IIRC for the 980 non pro the 1 TB is better than the smaller ones.
That sounds like you might be having quite specific needs. Some reviews have graphs showing how the drives behave in various situations, so that could help you.
However, what I've found perusing those reviews is that there's a huge price gap between this class of drives (9x0 pro, wd 8xx and the like) and "enterprise" drives which seem to have more stable performance.
Writes are completed to the host when they land on the SSD controller, not when written to Flash. The SSD controller has to accumulate enough data to fill its write unit to Flash (the absolute minimum would be a Flash page, typically 16kB). If it waited for the write to Flash to send a completion, the latency would be unbearable. If it wrote every write to Flash as quickly as possible, it could waste much of the drive's capacity padding Flash pages. If a host tried to flush after every write to force the latter behavior, it would end up with the same problem. Non-consumer drives solve the problem with back-up capacitance. Consumer drives do not have this. Also, if the author repeated this test 10 or 100 times on each drive, I suspect that he would uncover a failure rate for each consumer drive. It's a game of chance.
The whole point of explicit flush is to tell the drive that you want the write at the expense of performance. Either the drive should not accept the flush command or it should fulfill it, not lie.
(BTW this points out the crappy use of the word “performance” in computing to mean nothing but “speed”. The machine should “perform” what the user requests — if you hired someone to do a task and they didn’t do it, we’d say they failed to perform. That’s what’s going on here.)
The more dire problem is the case where the drive runs out of physical capacity before logical capacity. If the host flushes data that is smaller than the physical write unit of the SSD, capacity is lost to padding (if the SSD honors every Flush). A "reasonable" amount of Flush would not make too much of a difference, but a pathological case like flush-after-every-4k would cause the SSD to run out of space prematurely. There should be a better interface to handle all this, but the IO stack would need to be modified to solve what amounts to a cost issue at the SSD level. It's a race to the bottom selling 1TB consumer SSDs for less than $100.
I still don't think this is the problem, the drive can just slow down accepting writes until it has reclaimed enough space.
The bigger problem is manufacturers chasing performance. Generally you get the feeling they just hit their firmware with a hammer until it barely doesn't break NTFS.
See also the drama around btrfs' "unreliability", which is all traced back to drives with broken firmware. I fully expect bcachefs will get exactly the same problems.
Yeah, but then you have a write amplification problem. Padding is write amplification from the start, and then GC is invoked many more times than it otherwise would be. There is a fundamental problem with (truly) flushing an IO that is smaller than the media write unit. It will cause problems if "abused." The SSD either needs to take on the cost of mitigation (e.g. caps) or it needs some way to provide hints to the host that don't exist today.
I'm curious what the real cost of power protection is for the manufacturer. Adding caps or a bit of backup power seems like it _should_ be a fairly cheap compromise to maintain performance without lying about persistence
M.2/2280 makes it hard. Can't use cheaper aluminum can capacitors due to their size/height. The low profile (tantalum?) capacitors are expensive and take up a lot of PCB area, forcing a two sided PCB design on 2280 (the 110mm version would be better here). M.2 only provides 5V. On other form factors you get 12V and can get more charge stored for the same capacitance (q=CV) without needing a DC-DC converter.
For a $20 device this is 5-25% of the cost of the product, assuming $40 retail cost. It would be 1-5% for a $100 product, of course.
You would pay for such a device, but 99% of people wouldn't. Now you have a product which costs 5-25% more than all your competitors', and in a world where price dictates sales you have no opportunity to sell your product.
NB: to be sure on the prices, I've checked Amazon. There are SSDs that are cheaper than many mounting trays, enclosures (without drives) and tray kits.
I'm not so sure price is the only competing factor for SSDs. Devices generally also differentiate on performance as well. Even on consumer-oriented Slickdeals forum (where people are posting deals) there's pretty extensive breakdowns comparing different SSDs (controllers, flash, cache design). This has been floating around for a while https://www.reddit.com/r/NewMaxx/comments/dhvrdm/ssd_guides_...
Whether consumers would actually pay more for a device with power protection is unclear.
This is the whole point of a FLUSH though. You expect latency penalties and worse performance (and extra pages) if you flush, but that's the expected behaviour: not for it to (apparently) completely disregard the command while pretending like it's done it.
> Non-consumer drives solve the problem with back-up capacitance.
I’m pretty sure they used to be on consumer drives too. Then they got removed and all the review sites gave the manufacturer a free pass even though they’re selling products that are inadequate.
Disks have one job, save data. If they can’t do that reliably they’re defective IMO.
Yes, GC should be smart enough to free up space from padding. But then there's a write amplification penalty and meeting endurance specifications is impossible. A padded write already carries a write amplification >1, then GC needs to be invoked much more frequently on top of that to drive it even higher. With pathological Flush usage, you have to pick your poison. Run out of space, run out of SSD life.
This is the only correct answer, if you don't like the boss of a social media company, you should just cut all ties with everyone on there. Then lock yourself into your room, never go outside again, and complain about it on HN.
Fix everything that makes Mastodon a terrible user experience, get me and all the people I want to reach an invite to Bluesky or persuade people to start using Threads, and I'll happily switch.
You seem to be claiming that everyone still on Twitter is ideologically compromised. There's a ton of people just ignoring the politics and I still need a channel to reach them.
Yet again monetary selfishness wins over principle.
Maybe you should stay on Twitter.
Edit: Based on your website, who exactly are you trying to reach? Prospective clients? On Twitter?
You say "users of your app", but it seems like that isn't a huge volume of people, and were you to move to Bluesky, they would almost certainly follow... Unless it's not worth it for them?
The attempt is not persuasion, but alienation. This is the fate of people who behave as he does, and I believe it will contribute (to the extent any one person can) to deterring others from trying to normalize continued Twitter usage in light of its antisemitic shift in tone.
Alienation from what? People who uncritically parrot fabrications and propaganda on HN?
THAT is what shouldn't be "normalized".
But since you brought it up, of all the indicators of societal descent over the last 7-10 years, the shift away from critical analysis and open dialogue to soviet-level information control and behavior/thought policing is most troubling.
I use Twitter daily because it's the only platform where I have a chance to hear the whole truth on a given subject, usually assembled from multiple sources. I follow a lot of actual leftists as well as the conservatives and liberals-in-exile that make up the modern "right".
I don't know that I follow anyone with whom I agree on every topic, and that's as it should be IMO. Even when I strongly disagree, I'm grateful for the freedom to hear and evaluate these voices.
And while I'm reluctant to engage with your "argument", I'll say that I've seen a handful of antisemitic posts over 10+ years on Twitter, none recently. I have seen a fair amount of anti-Israel and anti-Zionist content in the last six weeks, consistently from those on the left and consistent with what I've seen in demonstrations around the world.
The continued use of Twitter ought to be met with disdain by anyone with a modicum of integrity. It is each person’s duty to make that clear to those who seem to lack the ability to self regulate, morally.
You don’t get to support a hateful platform consequence free.
IIRC, Twitter turned off all anonymous access, unless you come from a search engine, in which case you get a limited number of requests. So zedeus came up with an idea to build a massive pool of search-engine'd API tokens and use those to keep nitter up. The mirrors would have to copy that idea, and few (0?) have atm.
The current state is to use guest accounts [1]. If you use that branch instead of master and get a collection of accounts, you can run your own instance. For my personal use instance this is working fine with the initial set of accounts I put in. No idea how long this will work and/or when it is going to get merged, it is a moving target.
I think the token workaround you mentioned is the old way that no longer works, but I am not sure.
The models that never lost data: Samsung 970 EVO Pro 2TB and WD Red SN700 1TB
Correction: “Plus” not “Pro”. Exact model and date codes:
Samsung 970 Evo Plus: MZ-V7S2T0, 2021.10
WD Red: WDS100T1R0C-68BDK0, 04Sept2021
Update 2: models that lost writes:
SK Hynix Gold P31 2TB SHGP31-2000GM-2, FW 31060C20
Sabrent Rocket 512 (Phison PH-SBT-RKT-303 controller, no version or date codes listed)
Thanks, the 970 Evo Plus is a solid unit. I have one in my old PC, along with a 980 Pro. Went with 990 Pros in my new one; they didn't go 30 minutes before I had the firmware updated. :-) Very good performers.
(I always keep an eye on things like firmware updates).
Does advertising a product as adhering to some standard, but secretly knowing that it doesn't 100%, count as e.g. fraud? I.e., is there any established case law on the matter?
I'm thinking of this example, but also more generally USB devices, Bluetooth devices, etc.
Anyway, I'm not in the US, which is probably what you're asking about, but yes, the vast majority of the developed world has that. It's called "false advertising", and it exists at least in the EU, Australia and the UK. You can't put a label on your product or advert that is false or misleading.
So if the box says this is a WiFi 6E router, but it's actually only WiFi 5 because it's using the wrong components to save on costs, you can report them to the relevant authority and they'll be fined (and depending on the case and scenario you get compensation). The process is harder, bordering on impossible, if you bought from a random no-name vendor on AliExpress, but as long as the vendor or platform or store exists in a country with sensible regulation, you can report it.
That’s not really what the commenter was asking. That’d be false advertising in the US too.
I think the question is less “if they skip on parts and lie” and more along the lines of incompleteness. Like “it's an HTTP server, but they saved on effort and implement PUT as POST, which works fine for most use cases”.
That said, I'd guess this would be a pretty hard case to win. The law typically requires intent for false advertising, so if they didn't know they didn't follow the spec they might be fine. And it depends on the claims and what the consumer can expect. Like, if you deliberately don't explain the exact spec your SSD complies with, and you make no explicit promises of compatibility, it's a harder win. I bet few SSD manufacturers will say “Serial ATA v3.5 (May 2023), tested and compatible with OpenXFS commit XYZ on Debian Linux running kernel version 4.3.2”. But if they say “super fast SSD with a physical SATA cable socket”, then what really was false if it doesn't support the full spec?
I was under the impression that a lot of off-brand USB devices didn't use the USB logo specifically to get around certification requirements. Basically, they just aren't advertising adherence to a standard. No idea about NVMe or BT.
Actually, I think that one has technical merit - the SD card spec includes non-storage IO - https://en.wikipedia.org/wiki/SD_card#SDIO_cards - that isn't supported by slots that only care about storage. At least, that's my understanding of the situation.
Hardware vendors are known to swap to cheaper lower performance hardware after reviews are out which in my eyes is fraud, whether or not the law agrees is a different story.
Samsung was caught doing this with their 970 Pro (Plus? Their naming is awful) - they swapped out the controller in a good portion of devices, which resulted in significantly lower read and write performance.
Not a lawyer, but I doubt it – otherwise you might have a case against Intel and AMD regarding Spectre and Meltdown?
It might be a different story if the spec was intentionally violated, though (rather than incidentally, i.e. due to an idea that should have been transparent/indistinguishable externally but didn't work out).
"Oops we didn't mean to do that" isn't a defense from liability for product not doing what you told the purchaser it would.
It's their responsibility to develop the product correctly, do QA, and, if a defect is found, advise customers or stop selling the defective goods.
The greatest scam the computer industry pulled was convincing people that computers are magical, unpredictable devices that are too complex for the industry to be held responsible for things not working as claimed.
To be fair, I’m not even sure if the majority of consumers would prefer a “power outage safe” SSD that is however significantly slower over the alternative.
I do agree that there should be transparency, though: Label it “turbo mode”, add a big warning sticker (and ideally a way to opt out of it via software or hardware), but don’t just pretend to be able to have the cake and eat it too.
Merchantability and implied fitness? You absolutely could try suing them in small claims court for damages.
For extra fun: if the box carries a trademark from a standards group, you could try adding them into the suit; use of their trademarked logo could be argued to be implied fitness, if there are standards the drive is supposed to meet to use it.
At the very least they might get tired of the expense of sending someone to defend the claim, and it would cease to be profitable to engage in this scammery.
I would probably use stronger words than that: data persistence is a big deal, so the missing part of the spec is a fundamental flaw. What's a disk whose persistence is random? You can probably legally assail the substance of the product.
I think it's more or less the same thing: the recall is the way to legally prove you didn't intend to disseminate the flawed product, whereas leaving it on the market after learning of the problem shows intent to keep it there. I would be surprised if discovery at those companies would not surface an email from engineers discussing this problem.
> Wondering if anything changed since the original tests...
You're wondering if firmware writers lie to layers higher up in the stack? I think it's a 100% certainty that there's drive firmware that lies.
There's a reason why many vendors have compatibility lists, approved firmware versions, and even their "own" (rebranded from an OEM) drives that you have to buy if you want official support (and it's not entirely a money grab: a QA testing infrastructure does cost money).
Meanwhile I'm over here jamming Micron 7450 pros into my work laptop for better sync write performance.
I have very little trust in consumer flash these days after seeing the firmware shortcuts and stealth hardware replacements manufacturers resort to to cut costs.
Have a solid vendor for these that isn't insanely priced (for home use)? The last couple I tried to buy (on eBay), they sent 7300s and tried to buy me off with a small refund.
I wonder how much perf is on the table in various scenarios when we can give up needing to flush. If you know the drive has some resilience, say 0.5s during which it can safely write back, maybe you can give up flushes (in some cases). How much faster is the app then?
It'd be neat to see some low-cost improvements here. Obviously in most cases, just get an enterprise drive with supercaps or batteries onboard. But an ATX power rail that has extra resilience from the supply, or an add-in/pass-through 6-pin SATA power supercap... that could be useful too.
If the write-cache is reordering requests (and it does, that's the whole point), you can't guarantee that $milliseconds will be enough unless you stop all requests, wait $milliseconds, write your commit record, wait $milliseconds, then resume requests. This is essentially re-implementing write-barriers in an ad-hoc, buggy way which requires stalling requests even longer.
Flush+FUA requires the data to be stored to non-volatile media. Capacitor-backed RAM dumping to flash is non-volatile. When a drive knows it has enough capacitor-time to finish flushing all preceding writes from the cache, it can immediately say the flush was completed. This can all be handled on the device without the software having to make guesses at how long something has to be written before it's durable.
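As an aside, applications on Linux can ask for this on a per-write basis; whether the kernel satisfies it with an FUA write or a full cache flush is up to the block layer and the device, not the caller. A sketch, assuming Linux >= 4.7; the function name is made up:

```c
/* Sketch: per-write durability. RWF_DSYNC makes this one write behave as if
 * the file had been opened with O_DSYNC; on capable devices the kernel may
 * use an FUA write instead of a full cache flush. Assumes Linux >= 4.7. */
#define _GNU_SOURCE
#include <sys/uio.h>
#include <unistd.h>

ssize_t write_commit_record(int fd, const void *rec, size_t len, off_t off)
{
    struct iovec iov = { .iov_base = (void *)rec, .iov_len = len };
    return pwritev2(fd, &iov, 1, off, RWF_DSYNC);
}
```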
Performance gains wouldn’t be that large as enterprise SSDs already have internal capacitors to flush pending writes to NAND.
During typical usage the flash controller is constantly journaling LBA to physical addresses in the background, so that the entire logical to physical table isn’t lost when the drive loses power. With a larger capacitor you could potentially remove this background process and instead flush the entire logical to physical table when the drive registers power loss. But as this area makes up ~2% of the total NAND, that’s at absolute best a 2% performance benefit we are potentially missing out on.
> (and the lower level equivalents in SATA, NVMe etc.)?
This is not a technical problem that needs yet another SATA/SAS/etc command to be standardized. It's a 'social' problem that there's no real incentives for firmware writers to tell the truth 100% of the time.
The best you can hope for is if you buy a fancy-pants enterprise storage solution with compatibility lists and approved firmware versions.
The DRAM cache does not hold user data. It holds the flash translation layer that links LBAs to NAND pages. Higher performance drives use 1GB of DRAM per 1TB of NAND. In cheap DRAM-less drives, if the I/O to be serviced is not cached in the 1MB or so of SRAM, it has to do a double lookup: once to retrieve the FTL table from NAND and a second lookup to actually service the I/O.
It'd be nice if there were a database of known bad/known good hardware to reference. I know there's been some spreadsheets and special purpose like the USB-C cables Benson Leung tested.
Especially for consumer hardware on Linux--there's a lot of stuff that "works" but is not necessarily stable long term or that required a lot of hacking on the kernel side to work around issues
SK Hynix is the #3 flash manufacturer (acquired Intel's NAND biz), and their RAM is quite decent. I wouldn't think twice about buying one if it weren't for this report.
Pretty much every brand uses Phison controllers in at least some of their products; even WD/Samsung/Intel, who design controllers in-house, use them for their cheapest offerings, because Phison is all about making the cheapest product possible.
Hasn't this always been the case? At least that's what we learned in a course where we wrote our own device drivers for Minix: even the controllers on spinning metal fib about flush.
If stuck with consumer drives, you can add cheap PLP via riser cards that have a supercapacitor. Here's a post on the TrueNAS forums that tested one out.[1]
Cheap drives don't include large dram caches, lack fast SLC areas, and leave off super-capacitors that allow chips to drain buffers during a power-failure.
The bottom answer especially states that "blockdev --flushbufs may still be required if there is a large write cache and you're disconnecting the device immediately after"
The hdparm utility has a parameter for syncing and flushing the device's own buffers. Seems like all three should be done for a complete flush at all levels.
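Roughly, the three levels map to these calls on Linux (a sketch; /dev/sdX is a placeholder and the BLKFLSBUF ioctl needs root):

```c
/* Sketch of the three flush levels mentioned above, in C on Linux.
 * /dev/sdX is a placeholder; BLKFLSBUF needs CAP_SYS_ADMIN. */
#include <fcntl.h>
#include <linux/fs.h>      /* BLKFLSBUF */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/sdX", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    sync();                                  /* 1: write out dirty pages from the page cache */
    if (ioctl(fd, BLKFLSBUF, 0) < 0)         /* 2: flush and invalidate this device's buffers */
        perror("BLKFLSBUF");
    if (fsync(fd) < 0)                       /* 3: ask the device to flush its own write cache */
        perror("fsync");

    close(fd);
    return 0;
}
```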
I'd be curious how well it actually correlates. It would be hard to make the most performant system that's always consistent with flushed data but there are probably a lot of firmwares out there with untested performance ideas, etc.
We're talking about scenarios where the drives report (untruthfully) that the data has been committed to non-volatile storage _before_ the power is removed
Not scenarios where the data is in the cache when the power drops and the drive is expected to write out the cache with internal power alone.
The second scenario is what's standard in enterprise class drives used with RAID controllers: the controller reports 'committed' to the OS and _then_ commits the data. It can do this because the path to nonvolatile storage downstream of the controller has redundant power (the RAID controller has a battery backup and the disks have their own battery backups, as well as control logic that guarantees data in the cache makes it to nonvolatile storage if the power drops out).
The first scenario is for consumer disks, they don't report 'committed' until the data is actually committed. That makes them a whole assload slower. (Unless they lie.)
Looking up the definitions of write-back vs write-thru caching will help you out here
The Flush command shall commit data and metadata associated with the specified namespace(s) to non-volatile media. The flush applies to all commands completed prior to the submission of the Flush command.
No. One is a lie. The other is an attempt at fault tolerance. The latter will not claim something that isn't true, but it will try to do what you want, as a user, as a nice gesture.