APFS doesn't do proper checksumming on data, and won't mirror or RAID, so it doe...

hyperknot · on April 17, 2017

Isn't it possible that Apple developed this filesystem primarily to be used with their SSD based devices which are using their custom controllers, which simply takes care of this low level protection? SSD controllers are said to be very complicated piece of proprietary technologies, and I believe they are using sophisticated algorithms for integrity protection.

vetinari · on April 17, 2017

Apple users use external (USB, Thunderbolt) drives too, where they would not get this special hypothetical protection.

justinjlynn · on April 17, 2017

Clearly telegraphing Apple's intent to release new "Apple approved (and licensed)" storage "accessories" for their systems.

cmurf · on April 17, 2017

That'd just be going back to their historical roots. You used to only be able to buy Apple RAM and Apple drives. All parts were Apple parts, there wasn't anything else.

And look, we're already back to, you can only get RAM from Apple, because so many of their products come with it soldered on the logic board with no upgrade path.

tedd4u · on April 17, 2017

Maybe, but isn't the recent trend at Apple to discontinue desktop-only accessories (like monitors)?

https://9to5mac.com/2016/11/21/apple-reportedly-stops-develo...

https://www.theverge.com/2016/6/23/12020510/apple-thunderbol...

Analemma_ · on April 17, 2017

At the recent Mac Pro announcement, they also said there would be a new Apple-branded monitor, so they're not out of that game completely.

zeveb · on April 17, 2017

It's courage!

toyg · on April 17, 2017

Lock-in all the things!

justinjlynn · on April 17, 2017

Indeed, sure would be a shame for anything to happen to your data because you chose not to use drives equipped with iStoreSafe(tm) technology by Apple(tm).

kevin_thibedeau · on April 17, 2017

Brings back the days when Mac users were paying a 3x markup for standard SCSI drives.

michaelmrose · on April 17, 2017

This is bad logic. The benefits derived from additional data integrity assurances would be additive.

A special controller isn't a valid substitute for other measures apple hasn't historically been any good at designing filesystems and they aren't any good now.

asveikau · on April 17, 2017

Don't buy the hype or fall into the magical thinking. Disks fail. Apple tried to handwave it away.

coldtea · on April 17, 2017

And yet, neither Windows, nor Linux, nor macOS does anything special about it for consumer/90% of professional use, and the sky hasn't fallen.

michaelmrose · on April 17, 2017

Earlier cars were terrible death traps and we were told crashes at highway speeds were unsurvivable.

People died in mangled messes but the sky didn't fall then either.

The status quo is rarely a sufficient argument because the human race is pretty much terrible at everything improving things only slowly, incrementally accruing useful strategies and procedures.

We built bridges well via centuries of practice, software is less mature.

coldtea · on April 17, 2017

>The status quo is rarely a sufficient argument

And I didn't say it is. My answer was to the parent who singled-out Apple as somehow special in neglecting this.

michaelmrose · on April 17, 2017

They are writing a new filesystem today and neglecting this they are special and not in a good way.

asveikau · on April 17, 2017

Nitpick: I can think of one thing NTFS does proactively about this problem. Just not anything good. Googled and found: https://blogs.technet.microsoft.com/doxley/2008/10/29/self-h...

In my time at Microsoft I did see a number of workarounds for bad disks in Windows source, in ntfs.sys and elsewhere.

However I agree with your overall assessment, it's not as if anyone is running zfs or similar as a default.

toyg · on April 17, 2017

To be honest, I have a long list of friends, especially in creative professions where people struggle with setting up reliable backups. There is a reason Dropbox and other cloud drives were so successful. Any improvement on the data integrity front is very welcome.

dilap · on April 17, 2017

Well, that's disappointing to hear. I've run into multiple MP3s bit-rotting on my MB air over several years. Perhaps soley the fault of HFS+, but trusting the underlying hardware just doesn't seem like a good idea.

beagle3 · on April 17, 2017

Damaged files (that can still be read from disk without error) are more likely than not a result of bad memory. It is incredibly unlikely to have bits mutated on disk without triggering a CRC error -- approx 1 in 4 billion errors would go undetected in a CRC-32; and it has to be spread over more than 32-bits-wide. The fact that you had multiple damaged files basically guarantees that it was a faulty memory (or bus, or controller -- but not actual magnetic media corruption) issue.

ZFS can't really help you with that as the data was likely damaged in transit rather than on disk; though with its own 256 bit hash, it is likely to detect those faulty system components earlier than later.

Freaky · on April 17, 2017

All my memory is ECC (other than my barely-used laptop). Been there, bought the t-shirt, decided non-parity can go jump in a lake more than a decade ago.

http://i.imgur.com/uz2inSy.jpg

I encountered this going through a copy of my photos stored on a pair of WD Greens using NTFS 3-4 years ago. The original copy on a ZFS machine was fine. I found a few others, and promptly stopped using those drives.

Two years ago I had repeated bursts of ZFS checksum errors from a pair of SanDisk SSDs. Evidently TRIM didn't quite work perfectly 100% of the time, and caused data corruption - luckily ZFS was always able to repair it, and it being detected meant I could do something about it early - I updated firmware and the issue went away. Last year it came back after an OS update, and I just turned TRIM off completely (I guess it was sensitive to TRIM patterns and those changed).

Last year I also had a Toshiba HDD forget how to IO properly, and got a constant stream of ZFS checksum errors from it until I yanked it from the hot-swap bay and reinserted it. It resilvered and scrubbed fine.

These aren't the only times I've seen checksum errors and silent corruption, they're just the most recent. ZFS lost a file once, and was very noisy about it - the status message for the lost metadata stayed until I recreated the pool. NTFS, UFS2, ext2, all were completely silent on the fact that they were showing me data that was clearly wrong.

I don't trust disks, or IO controllers, and I don't trust filesystems that do. Neither should you.

_0w8t · on April 17, 2017

I also disable TRIM on all my devices. Rather I set aside 7% of disk space that I never touch. It eliminates the need for TRIM without performance degradation.

pstadler · on April 17, 2017

To mitigate the likeliness of damage during transit, it's recommened to use ECC memory with ZFS. I had FreeNAS running on a system with a faulty memory module. ZFS correctly discovered damage in random files I just transferred over. Luckily, it wasn't too late to replace the bad module and repeat the transfer.

Lessons learned:

- Run MemTest86 on hardware before using it as storage

- Use an FS that does checksums

Unfortunately, the latter doesn't seem to apply to APFS.

voidz · on April 17, 2017

Can you always just buy ECC memory (I have DDR3, I guess it's more affordable now since it's a little bit older), or does something else (cpu, motherboard) need to support it as well?

pstadler · on April 17, 2017

You need a compatible motherboard and CPU.

tw04 · on April 17, 2017

Which is why Intel kinda sucks. It would cost them basically nothing to have ECC enabled on all of their hardware, but they insist on using it as a differentiator between server and desktop parts.

AMD (at least in the past), includes it on all parts so that it's up to the consumer to choose.

rincebrain · on April 17, 2017

While I agree that it'd be nice if Intel didn't use ECC as a server/desktop delimiter, AMD's stance (at least for AM3 and AM4 parts) appears to have been "the CPUs support it, but we haven't run any validation tests on it; if the motherboard manufacturers want to validate it and turn it on, great".[1][2] Which is not quite the same as it being enabled on all parts.

[1] - http://www.hardwarecanucks.com/forum/hardware-canucks-review...

[2] - https://www.reddit.com/r/Amd/comments/5x4hxu/we_are_amd_crea...

tw04 · on April 18, 2017

That absolutely does mean it's enabled on all parts. They also don't validate the chips against FreeBSD - does that mean you can't run FreeBSD on their chips? Or do you think it would be ridiculous to expect them to test scenario's outside of the market they're targeting?

Do you have any idea the cost of running validation tests? I'm not the least bit concerned that they haven't "validated" the ECC functionality. It's enabled, they know it works, it's the same ECC they use on server class chips, and if someone found a bug I have no doubt they'd issue microcode to fix it.

rincebrain · on April 18, 2017

I should have perhaps included the article which tried _enabling and using_ ECC on a Ryzen CPU+MB. [1]

Page 5 is perhaps the most important one, where it observes that neither Windows nor Linux appear to react by halting to a UE, and Windows can't quite figure out that ECC is enabled on the platform and parse the notifications it gets as such.

So, sure, I should concede that it is "enabled" on all parts, I was wrong. But that doesn't mean it should be trusted on any of them.

[1] - http://www.hardwarecanucks.com/forum/hardware-canucks-review...

tw04 · on April 18, 2017

I guess we can agree to disagree. AMD implementing it, motherboard mfgs implementing it, but Windows not having an updated driver to handle it in all situations isn't on AMD. And it doesn't mean it's not there - it means that Windows is lagging slightly behind on a brand new platform. Something that's been fairly common with AMD for decades now. There's a reason the acronym Wintel became a thing.

dmitrygr · on April 17, 2017

Precisely because it is a market segmentation tactic, they do allow ECC on lower end chips which they do not see stealing marketshare from Xeon.

My home media center PC / NAS runs an i3-4370 CPU with ECC RAM

pstadler · on April 17, 2017

The same is true for AMD's new Ryzen CPUs, ECC support is enabled. You still need a suitable motherboard though.

dmm · on April 17, 2017

On linux you can force enable ECC on an unsupported motherboard with

> ecc_enable_override=1

Most motherboards have the extra memory traces in place even if they don't enable ECC.

dwighttk · on April 17, 2017

"cost them basically nothing"

"using it as a differentiator between server and desktop"

so it would cost them SOMETHING...

coldtea · on April 17, 2017

Something related to artificial milking, as opposed to tangible production costs.

dwighttk · on April 17, 2017

Sure, but as a consumer it is kind of nice that my chips are probably a little less expensive because server companies are paying more for their chips.

coldtea · on April 17, 2017

Who said we aren't milked as consumers too?

Those are two segmented markets. If Intel's revenue is above their R&D and other expenses, then whatever profit they make milking enterprises/server companies is independent of what they make milking us.

brianwawok · on April 18, 2017

Why is this any different from tesla selling you software locked batteries?

coldtea · on April 18, 2017

Well, it could be, who said it isn't? Though i'm not familiar with the Tesla case.

But selling "locked batteries" in a product where the batteries are 80% of the innovation/feature set, and where after-market batteries could cause all kinds of issues, is one thing.

Whereas selling memory at triple or more the price just because you switched on some feature (ECC) that would have costed nothing to switch on for everybody is another thing.

brianwawok · on April 18, 2017

In Tesla, they literally sell everyone the 75 KW battery. Some people pay 80k for the car, some people pay 75k for the car, but the battery is software limited to 60KW. You can later pay 6k to software unlock the extra part of the battery sitting in your car.

Product market segmentation is a very reasonable thing to do. Why do people make it out like a bad thing? If you ever run a business, you will want to find a way to get big enterprise to pay X, and small business to pay X/4. ECC is something businesses care way more about than gamers, so why not charge more for it?

coldtea · on April 18, 2017

>Product market segmentation is a very reasonable thing to do. Why do people make it out like a bad thing?

Because most of us would rather pay a price that mostly reflects costs + some reasonable profit, not some artificially created segment, not fuel extravagant profits, not pay for future research, not pay for the company to have cash reserves, etc etc.

brianwawok · on April 18, 2017

As a consumer, your only choice is binary. Vote yes and buy, vote no and don't buy.

If it offends you that your device has some enterprise feature that you don't really need turned off unless you pay $x... sorry? But you don't really have a right to the feature for some margin % that you deem fair.

ivank · on April 17, 2017

Eh, I've seen an Intel SSD 320 eat a bunch of pages and an HGST 5K4000 4TB direct thousands of writes to the wrong sectors. There are a lot of things that can go wrong with storage devices, not just the optimistic case of a bit getting flipped after being written correctly.

adolph · on April 17, 2017

Aside from transferring from disk to every larger disk, how would memory corrupt an mp3? Given Apple's long-time tradition of putting metadata (playcounts/star ratings/etc) in a separate file, an mp3 on a MacOS computer is pretty much write once, read many.

gvb · on April 17, 2017

Static data on a flash device (e.g. SSD) is subject to wear leveling, which is a regular re-writing process. This is counter intuitive, but makes sense.

If your flash device never moved the static data, the only flash blocks that would get wear cycles would be the flash blocks that contained dynamic (normally changed and rewritten) data. The result would be the blocks of flash that were not static would quickly wear out and the blocks of flash that were static would have a lot of unused write cycles available.

In order to use all of the wear cycles of all of the blocks, the static data has to be moved regularly so the blocks all have a (roughly) equal number of wear cycles[1]. Every time the data is moved, there is an opportunity for data corruption.

The flash data blocks (typically) have ECC (error checking and correction) which is designed to prevent data corruption. There are limitations to ECC:

* ECC can only correct a limited number of errors.

* Flash memory is not a perfect storage medium, it can "bit rot" too - the primary reason for ECC with flash is to "hide" the inherent bit rotting of flash. "MLC"[2] flash chips aggravate the problem because their margins are smaller.

* If a memory controller does a wear leveling move and the source data is bad, beyond the ability of the ECC to correct, it has no way to correct that error and (generally) has no way to inform the user that their file (system) has suffered corruption.

In Jean-Louis Gassée's anecdote (which is typical), his notification that his wife's files were corrupt was an backup failure notification. The backup failure was telling him that it could not read files, but it was not clear to him (and would not be clear to most users) that the root cause was file corruption, not a backup problem per se.

[1] https://en.wikipedia.org/wiki/Wear_leveling#Static_wear_leve...

[2] https://en.wikipedia.org/wiki/Multi-level_cell

Symbiote · on April 17, 2017

I add parity archives to add some redundancy to my photographs. I've done it for years, but I think it's useful not to rely on a filesystem to handle this.

For example:

  par2create -r5 -n2 example.par2 *.jpg

creates two files, between them giving 5% redundancy. I think that should be more than enough to repair bitrot within a file, but depending how many photos there are, losing a whole file could be more than 5%.

  par2verify example.par2

will verify, and par2repair will repair corrupted or missing files.

https://en.wikipedia.org/wiki/Parchive

agumonkey · on April 17, 2017

Ha, I thought of writing a similar ad-hoc checksuming tool so many times. I should have checked. I now wonder how feasible it is to embed them invisibly in metadata fields )

indolering · on April 17, 2017

It would be neat if there was a parity scheme fast enough to preserve 2% of of all files on disk. It could even be tucked away behind savings from file-level compression.

mfukar · on April 17, 2017

> it's useful not to rely on a filesystem to handle this

In what way?

wtallis · on April 17, 2017

Relatively few filesystems offer thorough data checksumming. Hardly any offer erasure coding. RAID at the filesystem layer is a bit more common, but also more inconvenient. Doing erasure coding at the file archive level rather than in the filesystem gives you the freedom to move your archives onto standard everyday filesystems and devices without silently losing the protection.

jen20 · on April 17, 2017

You don't need _a lot_ of filesystems to offer it - just the one you use - and if this is important to you, just use ZFS.

rovr138 · on April 17, 2017

What if I use multiple computers and multiple operating systems want to be able to work on this data on them. Par2 would allow me to create the needed data on any computer and test it on any computer.

The other alternative is having a server that's up and running all the time, exposed to the internet (or complicate the setup with a VPN), so I can sync. Operations would take a long time (via the internet) or I would anyway need to transfer the data to my computer, work on it, sync it back. During this time, any protections that ZFS offers are null since anything could happen in my computer and I can't test for it locally.

ZFS is great. But it's not the answer to everything.

coldtea · on April 17, 2017

>You don't need _a lot_ of filesystems to offer it - just the one you use

Only if you use it everywhere. Including on your laptop.

RoyTyrell · on April 17, 2017

How often have you had to, and been able to, correct data using your parity archive?

Symbiote · on April 17, 2017

I think I've needed it once, and I was able to correct the error.

I have a cron job which runs every month and verifies each photo album; that picked up the error.

itsoggy · on April 17, 2017

Yep, I had a similar thing with audio files corrupting on my old MacBook, I was using time machine to backup and guess what.... the copy in the backup was fried too!

Switched back to vinyl for my music, my 20 year old Texhnics 1210 will probably out last my current laptop and the next!

DugFin · on April 17, 2017

The physical vinyl won't outlast your digital, it'll just fail more gracefully, slowly losing fidelity every time you scrape it with a needle. You're just replacing a small chance of catastrophic failure with a guaranteed gradual failure.

semi-extrinsic · on April 17, 2017

Wikipedia says

"Record wear can be reduced to virtual insignificance, however, by the use of a high-quality, correctly adjusted turntable and tonearm, a high-compliance magnetic cartridge with a high-end stylus in good condition, and careful record handling, with non-abrasive removal of dust before playing and other cleaning if necessary."

In other words, record wear is negligible if you buy good equipment and take good care of it. Just like the risk of bit rot, catastrophic crashes etc. is negligible if you buy good storage equipment and take good care of your data (good backups etc.)

Perhaps the one big advantage of vinyl over digital is that the shelf life of unused vinyl is larger than a human lifetime. A disk stored on a shelf, however, can suffer damage in many ways and is guaranteed to be very hard to connect to your computer in ten-twenty years.

itsoggy · on April 17, 2017

With a little calibration of the TT once in a while it will esily outlast me!

amdavidson · on April 17, 2017

You switched to a playback means that is inherently destructive to the media as a means to protect against data loss?