The case of the missing WAV audio files on the FAT32 SD Card (hanselman.com)
156 points by shanselman on March 15, 2020 | 60 comments



> I didn't want to take any chances so I picked up a 5 pack of 32GIG high quality SD Cards. [link to Amazon]

The lesson: never ever buy SD cards or USB sticks from Amazon. Buy them from a reputable company.


There was nothing wrong with the SD cards physically; they just weren't formatted properly for the sound recording device. There was zero data loss; it was a data access issue. An initial format would have helped avoid the problem. I'm not sure the lesson you state is the lesson the author was trying to share.


There’s a fairly good chance that the physical cards themselves were fake. That happens a lot on Amazon. Given that cards purchased from reputable manufacturers tend to be formatted without issue, it’s reasonable to guess that the real problem here is that those are counterfeit cards.


Agreed. This was probably someone trying to tweak the fat32 header for better fake-ability, for example putting the sector maps elsewhere.


It could be, for example, a 4GB SD card formatted as 32GB, which is fine if you only use 300MB like he did.


I thought the same. "I didn't want to take any chances," so I bought imports on a marketplace with a known history of legal gray areas and fake products?


Even if those cards aren't fake, they definitely aren't "high quality". Memory cards that don't use SLC NANDs (maaaybe pSLC too) shouldn't be trusted for anything serious.


You're being ridiculous. There are plenty of "serious" use cases that don't need a low-capacity industrial grade memory card.


If you're OK with randomly losing files without any warning then I'm not sure if your use case is serious.


There are options on the market that fall between questionably-sourced low-cost consumer grade cards and industrial-grade SLC cards.


But the lost files in this case weren't anything to do with the SD card hardware, were they? It was because of Zoom's implementation of FAT32. At least, that's what I understood from the article.


Of course, I'm not arguing that.


Wow, what a ride. 7-zip is awesome. For anything involving (suspected) data loss or "data disappearance", I usually try Testdisk/photorec from https://www.cgsecurity.org/


I've used 7-zip to extract data from vmware VMDK files. Nothing else has come close to opening obscure files for me.


That was an awesome article! The kind of rabbit hole I’d gladly go down :)

SD cards and USB sticks are just.... weird. I haven’t delved too deeply but maybe someone here knows.... why is this? At this point I almost think of them as ‘not disks’. The particular weirdness I’m thinking of is I’ve seen more than one drive just seem to die, just because you try and repartition them in Windows. Not even filling the drives up.

Is it the case that USB drive and SD card firmware is just crap? Like it makes assumptions about the device other than ‘this is just a set of blocks on flash memory’ and actually needs things on the disk at a file system level to be a certain way for it to work properly? I’m really curious.


> Like it makes assumptions about the device other than ‘this is just a set of blocks on flash memory’ and actually needs things on the disk at a file system level to be a certain way for it to work properly?

The wear-leveling algorithms are usually written to assume you're using FAT32 as the filesystem, with certain parameters, but those assumptions are optimisations for wear and speed, not requirements. AFAIK, the last time I looked at this stuff in any deep detail, they weren't so crazy as to try to read the block contents to determine the layout/filesystem --- but those were still the days when 100k-cycle SLC was the norm and 10k-cycle MLC was met with reliability skepticism.

The race for capacity means flash endurance has taken a steep nosedive with things like TLC/QLC, so if anything it's not really the firmware that's crap, it's the flash itself --- and the firmware is increasingly trying to compensate for it. For the cheapest USB/SD I think the firmware is actually stored in reserved areas of the flash itself, so any corruption has a much higher chance of rendering the device unusable.


Some drives try to interpret the FAT free list (the list of unallocated clusters) and try to erase those areas ahead of time for fast writes later.

If you reformat the disk and the partition alignment changes or the freelist moves, they'll randomly erase the wrong sectors of your data.


Do you have any references for that? It sounds horrifying, but very plausible.


I've seen that happen on Mac, but not Windows, unless it's a counterfeit card. If you "dd" to a mounted disk on Mac, you are going to have a bad time.

Also, people use "dd" a lot when all they need is "cat".


>people use "dd" a lot when all they need is "cat".

Well, GNU dd has nice things like 'status=progress' and options for direct-I/O :)


Or the awesome pipe viewer (pv command). It's basically cat with some feedback. I use it a lot to image stuff when the media is readable; if you have bad sectors you need something like ddrescue to image it.


Yes, USB drive and SD card firmware is just crap, as with any embedded software... firmware is always crap... That said, NAND flash storage is quirky.

Individual bits on a NAND flash can't be addressed; reads and writes happen in kB-sized "pages". Each page can be "programmed", aka written to, by applying the desired bit pattern (0b11001001 ... and it sticks). Programming can only clear bits to 0, never set them back to 1; only an erase can do that. The "depth" of a write can be increased for longer retention, or the cell's charge level can be used to represent multiple states in the same cell (e.g. 0V means 0b00, 0.33V means 0b01, 0.66V means 0b10, 1V means 0b11: two bits per cell). Current multi-level designs store 8 or 16 levels per cell (TLC/QLC, i.e. 3 or 4 bits).

That covers reads and writes. As said, program operations can clear bits but not set them (0b1101 can be overwritten with 0b0100, but not with 0b1111), so in real life the whole page has to be wiped clean before rewriting. The problem is that erase operations work on "blocks" of dozens of pages, a few MB in size, where you'd love to be able to target individual cells or pages. This means a one-byte change to a file on disk can necessitate a few MB worth of erasing. To make it even worse, one flash cell only lasts up to ~10k program-erase cycles on ancient chips, and can be as little as ~300 cycles on modern designs. Clearly a naive approach destroys NAND far faster than any commercially acceptable pace. Error margins are even worse; I've read that ECC correction is part of every normal read operation.

With all these quirks combined, the flash chip interface clearly cannot be exposed directly to the OS driver, and disk LBA addresses cannot be linearly mapped to chip banks the way DIP EEPROMs might have been wired in the 80s. Writes must be cached to minimize reprogramming, and constantly redirected to the least-used areas. Erases must be substituted by marking pages safe to discard, and compaction of live pages must be handled on the device itself so they don't end up scattered across half-used blocks. These features, loosely called wear leveling, are present on virtually any flash storage device, roughly since high-capacity cards started to exceed 16GB. From there, controllers grew to support MLC/TLC/QLC (multiple bits per cell), DRAM-less SLC caching (part of the TLC/QLC chip is used in SLC mode as a substitute for a DRAM write cache, which at one point was multiple GB of DDR3 in its own right), etc. etc.

In short, you're completely right that SD cards and USB sticks are "not disks". The large erase units and low endurance of NAND chips, which necessitate all this intricate nursing, are what make them "not". (I'm only a PC nerd, not a NAND expert. The numbers above in particular come from my Google-fu, so they may be off.)
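
To put rough numbers on the naive case (illustrative figures only, not the specs of any real card; real controllers exist precisely to avoid this arithmetic):

    # Back-of-the-envelope for a hypothetical controller that rewrote a
    # block in place for every small update, with no wear leveling at all.
    # Both constants below are assumptions for illustration.
    ERASE_BLOCK_BYTES = 4 * 1024 * 1024   # assume 4 MB erase blocks
    PE_CYCLES = 3_000                     # assume ~3k program/erase cycles

    # A one-byte update still costs a full block erase + reprogram...
    amplification = ERASE_BLOCK_BYTES / 1
    print(f"write amplification: {amplification:,.0f}x")

    # ...and that block is worn out after only PE_CYCLES such updates,
    # i.e. after a few kilobytes of useful one-byte writes.
    print(f"block dead after {PE_CYCLES:,} one-byte in-place updates")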


> The particular weirdness I’m thinking of is I’ve seen more than one drive just seem to die, just because you try and repartition them in Windows. Not even filling the drives up.

I’ve absolutely had this issue. Lots. I can’t figure out why either. I have about a half-dozen Micro SD cards that now say they’re not formatted, and that error out when I try to format them in Windows. All of them “broke” during a simple repartition in Disk Manager.

I’ve had minor success with some of them using fdisk to “repair” them but others just fail there too.


Have you tried Linux? It tends to be better at low-level access, or at least give a more informative error message.


I had a similar issue and clearing the disk (clean) in DISKPART allowed me to re-format through windows.

https://neosmart.net/wiki/diskpart/


I believe newer versions of MacOS dropped support for a specific partition scheme (the name of which I unfortunately don’t quite remember)

I spent some time on it just recently because the disk manager would not work with the official Ubuntu images.


If you fully zero the media,

then partition for proper cylinder alignment, even though experts have disparaged the need for this for two decades now,

then format the target FAT32 partition using MS-DOS while actually booted to the W98SE startup floppy, CD, or equivalent (accepting no DOS version later than W98SE of 1999; the later ones all proved to have lesser FAT consistency),

making sure both partition and format are accomplished using 255 heads & 63 sectors/track, correcting and re-doing if necessary, regardless of _native_ device geometry that may come up by default and need to be over-ridden,

you will almost never be as disappointed about compatibility as when you let other tools or devices prepare your FAT32 media.

________________

Even on devices which are intended never to boot the media, best results can often be obtained when there is master boot record (MBR) boot code placed at the beginning of sector 0 anyway, even though it's theoretically unnecessary. This is in addition to the partition table at the end of sector 0, which is actually essential either way, for booting and/or mere storage.

As an example, if you simply create a valid partition table alone on a fully blank USB drive using a Linux tool, you would not yet have DOS MBR code at the beginning of that same sector 0.

Regardless, the partition should be ready for recognition and formatting by MS-DOS as FAT32 using its default geometry for the detected situation. After FAT32 formatting there will then be a volume boot record (VBR) at the starting sector of the partition a certain distance away from sector 0 (best recognition is usually when the VBR is at sector 63).

In MS-DOS, without disturbing the partition table, you place a DOS MBR on sector 0 (overwriting the MBR boot-code area and current MBR) using an undocumented FDISK switch: FDISK /MBR. Do this when you are booted to the floppy with no drive hardware other than the target, and when the target is recognized as drive 0 by the BIOS and identified as C:\ by DOS even before formatting; supposedly it also works when HDDs other than the hardcoded drive-0 target are present.

In Windows 10, a BIOS MBR can be written to sector 0 of an HDD containing a lettered volume by BOOTSECT.EXE at the command line, using the /MBR option. This happens while you intentionally overwrite a target VBR with an NT6 (BOOTMGR) version, which is the main purpose of the BOOTSECT command. If that ends up being done, you're probably better off rebooting to DOS and reformatting (the /q option gives a quick format) to replace it with a DOS VBR before using the FAT32 volume.

But the Windows 10 MBR is a good MBR for an otherwise pure DOS HDD.

Unfortunately, occasional devices defy all the patterns which Windows versions, and sometimes MS-DOS itself, are capable of recognizing and/or generating. Designers have chosen poor (sometimes bizarre) partitioning and/or formatting layouts, far removed from what was consistent with everything that had come before, even though there could be superficial Windows compatibility apparent at one point. This was not progress: FAT32 with LFN is a long-established, stable, now patent-unencumbered standard, and it is more valuable than ever to maintain compatibility back to its foundational 1999 OS version, rather than compromising important features in ways which could unfairly make NTFS seem more appropriate by comparison.

For that you may just have to format media in the device itself to achieve its unique desired geometry & layout.


This advice should go without saying:

"Re-format the random SD cards you get from Amazon specifically on the device you're gonna use them."

Do this also for portable hard drives.


Since it comes up so often now, reformat the drive and then do what to verify that it's a legit drive, and not one that advertises as being larger than it actually is?

Put a bunch of large media files on it and verify that they still play properly?


I always use this tool https://github.com/AltraMayor/f3

It fills the entire device with data and then tries to read it all back. It can tell you how many bytes were successfully read, how many were corrupted and how many were written over by other writes.

Even on cards I know are real I still run the test because I have had a card that had a few bytes that got corrupted which caused loads of issues with my rpi.
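
The underlying idea is simple enough to sketch. The following is not f3 itself (f3 writes ordinary files on the mounted filesystem and its reports are more detailed); it's just a minimal illustration of fill-then-verify against a raw device, and the device path, chunk size, and test size are placeholder assumptions:

    import os

    # Sketch of fill-then-verify (the idea behind f3write/f3read).
    # WARNING: writing to a raw device destroys its contents; /dev/sdX is a
    # placeholder, and a real test would cover the WHOLE device, not 256 MiB.
    DEV = "/dev/sdX"
    CHUNK = 1024 * 1024              # 1 MiB per write
    CHUNKS = 256                     # how much of the card to exercise

    def pattern(i: int) -> bytes:
        # Derive each chunk from its index, so a fake card that silently
        # wraps addresses returns data that no longer matches on read-back.
        return i.to_bytes(8, "little") * (CHUNK // 8)

    with open(DEV, "r+b") as f:
        for i in range(CHUNKS):
            f.write(pattern(i))
        f.flush()
        os.fsync(f.fileno())

    # Note: to be rigorous you'd defeat the page cache before re-reading
    # (O_DIRECT, or unplug/replug the device); omitted here for brevity.
    bad = 0
    with open(DEV, "rb") as f:
        for i in range(CHUNKS):
            if f.read(CHUNK) != pattern(i):
                bad += 1
    print(f"{bad} of {CHUNKS} chunks failed verification")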


I usually use f3probe, which IIRC uses some heuristics and trades off accuracy for speed, i.e. it doesn't write to the entire drive.

The docs are lower down the page from the main f3 utils:

https://fight-flash-fraud.readthedocs.io/en/latest/usage.htm...


The test time can be pretty insane on high-capacity cards. I remember it took about 4 hours to test a 256GB card, but I find it's worth it to check the card just once, since it saves a lot of pain later when you find one tiny part of the card is failing.


I used h2testw on Windows to do this. It writes 1-GB files with a special pattern to the disk until it is full (or up to however many GB you specified before starting it), then tries to read them all back to see if they were written correctly.


Any storage device I buy gets at least a full pass of read/write testing; any error means it's being returned as not fit for purpose. SSDs and HDDs get 8 full passes.


Why 8, and not 4 or 7 or 16?


with a QLC drive, like the 660p, 8 cycles is 4% of your total drive life...


To be fair, he did test the storage device:

"I did a local recording right there and played it back. Sounds good. I played it back locally on the Zoom and I could hear the recording from the Zoom's local speaker"


That test doesn't really give you any more information than noticing that you didn't get a "card not found" error. A full-capacity write and read verification is a test that's actually informed by the possible failure modes that wouldn't be immediately apparent from ordinary usage.

If you're worried enough to take the deliberate step of manually testing your storage device, you should at least make sure it's a useful test that tells you something you don't already know.


7-Zip is pretty awesome. At one point I had a file that simply wouldn't delete in Windows. I came across a random tip online that said to try deleting it from within 7-Zip. Simple as that: 7-Zip was able to delete the file when everything else couldn't.


Another weird thing 7-Zip can do: extract files really, really, really correctly.

Often I download some Japanese games (not pirated stuff, I mean freeware indie games, often fan-made, like a fan game of some anime or something), and the built-in Windows zip handling somehow extracts them wrong: sometimes files are missing, or the game runs but with corrupted textures and so on.

If I extract same games using 7-zip then they work fine.

If there are files that use non-Latin characters then 7-Zip is mandatory; Windows just fails to extract those files properly.


That's a common issue with Windows. The internal file API is much more powerful than the user-accessible front-end programs, which can be very frustrating at times. I had similar issues with corrupted file names created by some application bug. There was no way to delete these invalid file entries using OS-provided tools.
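
As an illustration of that gap (a hedged sketch, not necessarily what 7-Zip does internally): on Windows, prefixing a path with \\?\ bypasses the Win32 path normalization that Explorer and cmd rely on, which is often enough to delete entries with trailing dots or spaces. The path below is hypothetical:

    import os

    # Hypothetical stuck file with a trailing space in its name, which
    # Explorer and `del` refuse to touch. The \\?\ prefix asks Windows to
    # skip Win32 path normalization and pass the name through as-is.
    bad_name = r"C:\temp\report. "
    os.remove("\\\\?\\" + bad_name)   # becomes \\?\C:\temp\report.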


That clears up some of the mystery. I was wondering how 7-Zip could do it but not Explorer or the command prompt. It makes a lot of sense that 7-Zip is just fully utilizing the internal file API.


I'm wondering if `binwalk`'ing the initial dd-image would have worked. That's the first thing I would try.


It absolutely doesn't, in this case.

    $ binwalk hanselman.img

    DECIMAL       HEXADECIMAL     DESCRIPTION
    --------------------------------------------------------------------------------
    13026788      0xC6C5E4        MySQL ISAM index file Version 6
    13064186      0xC757FA        MySQL ISAM index file Version 2
    15513282      0xECB6C2        YAFFS filesystem, little endian
    18368322      0x1184742       YAFFS filesystem, little endian
    42678040      0x28B3718       MySQL ISAM compressed data file Version 6
    59068786      0x3855172       YAFFS filesystem, little endian
    60315328      0x39856C0       YAFFS filesystem, little endian


Patch dosfsck to just dump every file it sees.


One of the reasons the FAT family of filesystems is in such widespread use is its simplicity, which also makes data recovery easier. Writing a FAT driver/reader is a common assignment for systems programming courses, and I've done it myself too.

I took a quick look at the truncated image he provided, and it looks like the fields in the MBR and boot sector are OK; but things start getting weird after that.

The root is at cluster 2 where it normally should be (byte offset C00000 in the image), but its second entry is that all-spaces directory which claims to start in cluster 4, and furthermore its creation date is recent (2020-03-12).

Cluster 3 (C08000) is a directory which points to ZOOM0002.hprj, ZOOM0002_Tr1.WAV, and ZOOM0002_Tr2.WAV, its "." is correctly pointing to itself, but its parent according to ".." is at cluster 4. It is directory "ZOOM0002".

Cluster 4 (C10000) is a directory which contains 3 directories ZOOM0001 through 3, its "." is correctly pointing to itself, and its parent is pointing to the root (indicated with a cluster of 0). This is, according to the root, the "all spaces" directory.

Cluster 5 (C18000) is ZOOM0001 directory, and its "." and ".." are correct.

Cluster 6 (C20000) is a file, ZOOM0001.hprj

Cluster 7 (C28000) is a file, ZOOM0001_LR.WAV

In other words, everything looks OK except for that all-spaces directory in the root. According to page 25 of https://www.zoom-na.com/sites/default/files/products/downloa... that should have been named something like FOLDER01 through FOLDER10, so fixing that should make it valid again.

It's hard to say what happened here without knowing the state of the filesystem before the Zoom started writing to it, but the fact that the creation date/time of that nameless directory (2020-03-12 12:29) matches exactly with that of the first file it wrote (ZOOM0001.hprj/ZOOM0001_LR.WAV) strongly suggests that the card did not already have a corrupted filesystem beforehand, and the Zoom somehow wrote a blank name directory where it should've written FOLDER01. A search for "FOLDER" in the image yields no results either, so it's not like it wrote them somewhere else (unless it did so beyond the first 500MB of the card.)

I've seen embedded devices become confused and write corrupted filesystems a few times before, but they were far more egregious than this; e.g. ignoring/assuming certain fields' values would result in filesystem structures being wholesale shifted by some offset, etc. This is an unusual case because nothing about the filesystem stood out as being "obviously suspicious/non-standard", and everything except for the name was fine --- hence why 7-zip could still operate on it; it didn't care about the name.

The value of 1458 reserved sectors (729KB!) initially stood out, but upon further thought, that may simply be to align the first cluster with the eraseblocks of the flash, as if the Zoom really was to ignore it, the filesystem structures would've been far more mangled.
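
For anyone who wants to poke at the image themselves, here's a minimal sketch of how you could dump those directory entries, assuming a standard MBR + FAT32 layout with the volume in the first partition entry (the image name is borrowed from the binwalk example above; this only walks the first cluster of each directory, which is enough here, and the all-spaces name shows up literally in the output):

    import struct

    IMG = "hanselman.img"   # truncated dd image of the card

    with open(IMG, "rb") as f:
        data = f.read()

    # Partition 0: MBR entry at offset 446, starting LBA at +8 within it.
    part_lba, = struct.unpack_from("<I", data, 446 + 8)
    bs = part_lba * 512                  # byte offset of the FAT32 boot sector

    bytes_per_sec, = struct.unpack_from("<H", data, bs + 11)
    sec_per_clus   = data[bs + 13]
    reserved_secs, = struct.unpack_from("<H", data, bs + 14)
    num_fats       = data[bs + 16]
    fat_size,      = struct.unpack_from("<I", data, bs + 36)  # sectors per FAT
    root_cluster,  = struct.unpack_from("<I", data, bs + 44)

    data_start = bs + (reserved_secs + num_fats * fat_size) * bytes_per_sec

    def dump_dir(cluster: int) -> None:
        # Walk the 32-byte entries in one cluster of a directory.
        off = data_start + (cluster - 2) * sec_per_clus * bytes_per_sec
        for i in range(sec_per_clus * bytes_per_sec // 32):
            e = data[off + i*32 : off + (i+1)*32]
            if e[0] == 0x00:                      # end-of-directory marker
                break
            if e[0] == 0xE5 or e[11] == 0x0F:     # deleted entry / LFN entry
                continue
            name = e[0:11].decode("ascii", "replace")   # raw 8.3 name
            start = (struct.unpack_from("<H", e, 20)[0] << 16) | \
                     struct.unpack_from("<H", e, 26)[0]
            size, = struct.unpack_from("<I", e, 28)
            print(f"{name!r:>14} attr=0x{e[11]:02x} cluster={start} size={size}")

    print("root directory is cluster", root_cluster)
    dump_dir(root_cluster)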


Thanks a lot for dissecting this, I was hoping for this ;) If I recall correctly, the recording device could read those files without a hitch too. Is that because it also ignored the malformed name, or is it something else? I'm a bit confused about how the device knew where to look to find those files, if not by the name of the parent directory.


Thanks!


It's a really nice read, but I'm somewhat sad they neither know what actually happened nor why it happened at all. I hope this gets some attention from someone here who is skilled enough with this kind of thing and willing to figure it out, as it kind of bugs me now...


Why not try a ready rolled file recovery utility first? Why grind out this reinventing of the wheel?


Because he's a curious mind and wanted to figure out what was happening here. Admittedly not the most pragmatic approach, but definitely the more entertaining and informative one :)


Exactly! Thanks ;)


Great read! I agree, 7-zip is amazing. You can open all kinds of files for exploration using it.


He did not try chkdsk, the default tool to fix file systems on Windows?


lesson: use linux end to end (unless you work for Microsoft). :)


He tried reading the SD image on Linux and that didn't work either.

"We can see them, but can't get at them with the vfat file system driver on Linux or with Windows."


The author eventually recovered their files using 7-zip, which is a Windows program.


p7zip, the portable version of 7-Zip, works on Linux, macOS and other Unix systems as well. It also has a Debian package.

Data recovery from that FAT32 filesystem would have worked identically in p7zip (and 7-Zip).


I know that p7zip exists. Nevertheless, the author used the Windows version and it worked for them, so I'm not sure why the Linux implementation is relevant to this conversation.


It's been available on all linux distros I have tried.



