There was nothing wrong with the SD cards physically; they just weren't formatted properly for the sound recording device. There was zero data loss; it was a data access issue. An initial format would have avoided the problem. I'm not sure the lesson you state is the lesson the author was trying to share.
There’s a fairly good chance that the physical cards themselves were fake. That happens a lot on Amazon. Given that cards purchased from reputable manufacturers tend to be formatted without issue, it’s reasonable to guess that the real problem here is that those are counterfeit cards.
I thought the same. "I didn't want to take any chances," so I bought imports on a marketplace with a known history of legal gray areas and fake products?
Even if those cards aren't fake, they definitely aren't "high quality". Memory cards that don't use SLC NANDs (maaaybe pSLC too) shouldn't be trusted for anything serious.
But the lost files in this case weren't anything to do with the SD card hardware, were they? It was because of Zoom's implementation of FAT32. At least, that's what I understood from the article.
Wow, what a ride. 7-zip is awesome. For anything involving (suspected) data loss or "data disappearance", I usually try Testdisk/photorec from https://www.cgsecurity.org/
That was an awesome article! The kind of rabbit hole I’d gladly go down :)
SD cards and USB sticks are just.... weird. I haven’t delved too deeply but maybe someone here knows.... why is this? At this point I almost think of them as ‘not disks’. The particular weirdness I’m thinking of is I’ve seen more than one drive just seem to die, just because you try and repartition them in Windows. Not even filling the drives up.
Is it the case that USB drive and SD card firmware is just crap? Like, it makes assumptions about the device beyond ‘this is just a set of blocks on flash memory’ and actually needs things on the disk at a filesystem level to be a certain way for it to work properly? I’m really curious.
> Like, it makes assumptions about the device beyond ‘this is just a set of blocks on flash memory’ and actually needs things on the disk at a filesystem level to be a certain way for it to work properly?
The wear-leveling algorithms are usually written assuming you're using FAT32 as the filesystem, with certain parameters, but those assumptions are optimisations for wear and speed, not requirements --- AFAIK, the last time I looked at this stuff in any deep detail, they weren't so crazy as to try to read the block content to determine the layout/filesystem, but those were still the days when 100k-cycle SLC was the norm and 10k MLC was met with reliability skepticism.
The race for capacity means flash endurance has taken a steep nosedive with things like TLC/QLC, so if anything it's not really the firmware that's crap, it's the flash itself --- and the firmware is increasingly trying to compensate for it. For the cheapest USB/SD I think the firmware is actually stored in reserved areas of the flash itself, so any corruption has a much higher chance of rendering the device unusable.
Or the awesome progress viewer (the pv command). It's cat with some feedback. I use it a lot to image stuff when the media is readable; if you have bad sectors you need something like ddrescue to image it.
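For the curious, the same "cat with feedback" idea is only a few lines of Python --- a rough sketch, not a pv replacement, and like plain cat it will simply die on a bad sector (which is exactly where ddrescue earns its keep):

  import sys

  # Copy a readable device/file to an image in chunks, reporting progress.
  # Usage (hypothetical paths): python3 slowcat.py /dev/sdb card.img
  CHUNK = 1 << 20  # 1 MiB

  copied = 0
  with open(sys.argv[1], 'rb') as src, open(sys.argv[2], 'wb') as dst:
      while True:
          buf = src.read(CHUNK)
          if not buf:
              break
          dst.write(buf)
          copied += len(buf)
          print(f'\r{copied >> 20} MiB copied', end='', file=sys.stderr)
  print(file=sys.stderr)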
Yes, USB drive and SD card firmware is just crap, as with any embedded software... firmware is always crap. That said, NAND flash storage is genuinely quirky.
Data on a NAND flash is addressed and read out in multi-kB "pages". Each page can be "programmed", aka written to, by applying the desired bit pattern until it sticks. The catch is that programming only moves bits in one direction (by convention, erased cells read as 1 and programming pulls them down to 0). The "depth" of a write can be pushed further for longer retention, or used to represent multiple states in the same cell (e.g. 0V means zero, 0.5V means one, 1.0V means two ... that's three levels). Current multi-level parts store something like 8 or 16 levels per cell (TLC/QLC, i.e. 3 or 4 bits per cell).
The above covers reads and writes. As said, program ops can only clear bits, never set them (0b1111 can be programmed down to 0b1101, but 0b0010 cannot be turned back into 0b1111). So, in real life, a page is wiped clean by an erase before rewriting. However, erase operations work on "blocks" of dozens to hundreds of pages, a few MB in total, where you'd love to be able to target individual cells or pages. This means a one-byte change in a file on disk can necessitate a few MB worth of erase. To make it even worse, one cell of flash only lasts around 100k program-erase cycles on old SLC chips, roughly 10k on MLC, and can be as little as a few hundred cycles on modern TLC/QLC designs. Clearly a naive approach destroys NAND far faster than any commercially acceptable pace. Error margins are even worse; I've read that ECC correction is simply part of every normal read operation on modern parts.
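To make that asymmetry concrete, here's a toy model in Python (just the program/erase rule, using the standard convention that erased bits read as 1; real NAND adds ECC, partial-page rules, timing, etc.):

  # Programming can only pull bits 1 -> 0; only a block erase sets them back.
  PAGE = 4096            # bytes per page (illustrative sizes)
  PAGES_PER_BLOCK = 64   # erase unit = 64 * 4096 = 256 KB here

  class ToyNand:
      def __init__(self, nblocks):
          # erased flash reads back as all 1s (0xFF)
          self.data = bytearray(b'\xff' * (nblocks * PAGES_PER_BLOCK * PAGE))

      def program_page(self, page, payload):
          off = page * PAGE
          for i, b in enumerate(payload):
              old = self.data[off + i]
              if b & ~old & 0xFF:           # would need a 0 -> 1 transition
                  raise ValueError('must erase the whole block first')
              self.data[off + i] = old & b  # 1 -> 0 only

      def erase_block(self, blk):
          size = PAGES_PER_BLOCK * PAGE
          self.data[blk * size:(blk + 1) * size] = b'\xff' * size

So flipping a single byte in an already-programmed page means erasing (and rewriting) an entire block, and every erase eats into that 100k/10k/hundreds cycle budget.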
Considering all these quirks combined, the flash chip interface clearly cannot be directly exposed to the OS driver; disk LBA addresses cannot be linearly mapped to chip banks the way DIP EEPROMs might have been wired in the 80s. Writes must be cached to minimize reprogramming, and constantly redirected to the least-used areas. Erases must be deferred, substituted by marking pages as safe to discard. Compaction must be handled on-device, to avoid stray active pages scattering around. These features, collectively called wear leveling, are present on virtually all flash storage devices, roughly since high-capacity cards started to exceed 16GB. From there, the controllers grew to support MLC/TLC/QLC (the multiple-bits-per-cell thing), DRAM-less designs with SLC caching (using part of the TLC/QLC chip in SLC mode as a substitute for the DRAM write cache, which at one point was multiple GB of DDR3 in its own right), etc. etc.
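A grossly simplified sketch of the write-redirection part (all names made up; a real flash translation layer also does garbage collection, bad-block management, ECC, power-loss handling, ...):

  # Toy flash translation layer: logical writes never overwrite in place.
  # Each write goes to a fresh physical page; the old copy is marked stale
  # and reclaimed later by erasing whole blocks.
  class ToyFtl:
      def __init__(self, total_pages):
          self.mapping = {}                 # logical page -> physical page
          self.free = list(range(total_pages))
          self.stale = set()                # garbage, awaiting block erase

      def write(self, lpage, payload):
          phys = self.free.pop(0)           # real firmware picks least-worn blocks
          if lpage in self.mapping:
              self.stale.add(self.mapping[lpage])
          self.mapping[lpage] = phys
          # ...program `payload` into physical page `phys` here...

      def read(self, lpage):
          return self.mapping.get(lpage)    # every access goes through this table

That indirection table is also why firmware/metadata corruption can brick the whole device: lose the mapping and the data is still in the flash, but nobody knows where.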
In short, you're completely right that SD cards and USB sticks are "not disks". The large erase units and low endurance of NAND chips, which necessitate all this intricate nursing, are what make them "not".
(I'm only a PC nerd, not a NAND expert. The numbers above in particular come from my Google-fu, so they may be off.)
> The particular weirdness I’m thinking of is I’ve seen more than one drive just seem to die, just because you try and repartition them in Windows. Not even filling the drives up.
I’ve absolutely had this issue. Lots. I can’t figure out why either. I have about a half dozen Micro SD cards that now say they’re not formatted, and that error out when I try to format them in Windows. All of them “broke” during a simple repartition in Disk Management.
I’ve had minor success with some of them using fdisk to “repair” them, but others just fail there too.
then partition for proper cylinder alignment, even though experts have disparaged the need for this for two decades now,
then format the target FAT32 partition using MS-DOS while actually booted to the W98SE startup floppy, CD, or equivalent (accepting no DOS version later than W98SE's; the later ones all proved to have lesser FAT consistency),
making sure both partition and format are accomplished using 255 heads & 63 sectors/track, correcting and re-doing if necessary, regardless of the _native_ device geometry that may come up by default and need to be overridden,
you will almost never be as disappointed about compatibility as when you let other tools or devices prepare your FAT32 media.
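For reference, the 255-heads/63-sectors arithmetic those old tools use --- this is also why classically aligned partitions start at LBA 63 (CHS 0/1/1):

  # Classic CHS -> LBA mapping with the 255/63 "geometry":
  #   lba = (cyl * HEADS + head) * SPT + (sect - 1)
  HEADS, SPT = 255, 63

  def chs_to_lba(cyl, head, sect):
      return (cyl * HEADS + head) * SPT + (sect - 1)

  print(chs_to_lba(0, 1, 1))  # 63    -> traditional first-partition start
  print(chs_to_lba(1, 0, 1))  # 16065 -> one full "cylinder" (255*63 sectors)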
________________
Even on devices which are intended to never boot the media, best results can often be obtained by placing master boot record (MBR) code at the beginning of sector 0 anyway, even though it's theoretically unnecessary. This is in addition to the partition table at the end of sector 0, which is actually essential either way, for booting and/or mere storage.
As an example, if you simply create a valid partition table alone on a fully blank USB drive using a Linux tool, you would not yet have a DOS MBR at the beginning of that same sector 0.
Regardless, the partition should be ready for recognition and formatting by MS-DOS as FAT32 using its default geometry for the detected situation. After FAT32 formatting there will then be a volume boot record (VBR) at the starting sector of the partition, a certain distance away from sector 0 (best recognition is usually when the VBR is at sector 63).
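If you want to check what's actually sitting in sector 0, the fixed parts of the MBR layout (boot code at offset 0, four 16-byte partition entries at 446, 0x55AA signature at 510) make it easy to inspect --- a quick sketch:

  import struct, sys

  # Dump the partition table from sector 0 of an image or device.
  with open(sys.argv[1], 'rb') as f:
      sector0 = f.read(512)

  print('boot signature ok:', sector0[510:512] == b'\x55\xaa')
  print('has MBR boot code:', any(sector0[:446]))  # all zeros = table only
  for i in range(4):
      entry = sector0[446 + 16 * i : 446 + 16 * (i + 1)]
      ptype = entry[4]
      start_lba, nsectors = struct.unpack_from('<II', entry, 8)
      if ptype:
          print(f'partition {i}: type 0x{ptype:02x}, VBR at LBA {start_lba}, {nsectors} sectors')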
In MS-DOS, without disturbing the partition table, you place a DOS MBR on sector 0 (overwriting the boot-code zone and any current MBR) using an undocumented switch in FDISK: FDISK /MBR. This works when you are booted to floppy and there is no drive hardware other than the target, with the target recognized as drive 0 by the BIOS and identified as C:\ by DOS even before formatting --- and supposedly also when HDDs other than the hardcoded drive 0 target are present.
In Windows 10, a BIOS MBR can be written to sector 0 of a chosen HDD containing a lettered volume with BOOTSECT.EXE at the command line, using the /MBR option. This happens while you intentionally overwrite a target VBR with an NT6 (BOOTMGR) version at the same time, which is the main purpose of the BOOTSECT command. If that ends up being done, you're probably better off rebooting to DOS and reformatting (/q for a quick format) to replace it with a DOS VBR before using the FAT32 volume.
But the Windows 10 MBR is a good MBR for an otherwise pure DOS HDD.
Unfortunately, occasional devices defy all patterns which Windows versions, and sometimes MS-DOS itself, are capable of recognizing and/or generating. Their designers selected poor (sometimes bizarre) partitioning and/or formatting layouts far removed from what was consistent with all that had come before, even when superficial Windows compatibility was apparent at one point. This was not progress: FAT32 with LFN is a long-established, stable, now patent-unencumbered standard, and it is more valuable than ever to maintain compatibility back to its foundational 1999 OS version, before important features were compromised in ways which could unfairly make NTFS seem more appropriate by comparison.
For that you may just have to format media in the device itself to achieve its unique desired geometry & layout.
Since it comes up so often now: reformat the drive, and then do what to verify that it's a legit drive and not one that advertises itself as larger than it actually is?
Put a bunch of large media files on it and verify that they still play properly?
It fills the entire device with data and then tries to read it all back. It can tell you how many bytes were successfully read, how many were corrupted and how many were written over by other writes.
Even on cards I know are real, I still run the test, because I once had a card where a few corrupted bytes caused loads of issues with my RPi.
The test time can be pretty insane on high-capacity cards. I remember it took about 4 hours to test a 256GB card, but I find it's worth it to check the card just once, since it saves a lot of pain later when one tiny part of the card turns out to be failing.
I used h2testw on Windows to do this. It writes 1-GB files with a special pattern to the disk until it is full (or until it reaches however many GB you specified before starting it). It then tries to read them all back to see if they were written correctly.
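The gist of what h2testw (and f3 on Linux) do, sketched in Python --- hypothetical file names, and the real tools also time the transfers to catch slow fakes:

  import os

  # Fill the card with pattern files, then read everything back.
  # A card faking its capacity wraps writes around and corrupts earlier files.
  GIB = 1 << 30
  CHUNK = 1 << 20

  def pattern(n):
      return bytes([n & 0xFF]) * CHUNK      # simplistic; h2testw's is fancier

  def fill(mountpoint, nfiles):
      for i in range(nfiles):
          with open(os.path.join(mountpoint, f'test{i:04d}.bin'), 'wb') as f:
              for _ in range(GIB // CHUNK):
                  f.write(pattern(i))

  def verify(mountpoint, nfiles):
      bad = 0
      for i in range(nfiles):
          with open(os.path.join(mountpoint, f'test{i:04d}.bin'), 'rb') as f:
              while (buf := f.read(CHUNK)):
                  if buf != pattern(i)[:len(buf)]:
                      bad += 1
      print('corrupted chunks:', bad)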
Any storage device I buy gets at least a full pass of read/write testing; any error means it's being returned as not fit for purpose. SSDs and HDDs get 8 full passes.
"I did a local recording right there and played it back. Sounds good. I played it back locally on the Zoom and I could hear the recording from the Zoom's local speaker"
That test doesn't really give you any more information than noticing that you didn't get a "card not found" error. A full-capacity write and read verification is a test that's actually informed by the possible failure modes that wouldn't be immediately apparent from ordinary usage.
If you're worried enough to take the deliberate step of manually testing your storage device, you should at least make sure it's a useful test that tells you something you don't already know.
7-Zip is pretty awesome. At one point I had a file that simply wouldn't delete in Windows. I came across a random thing online that said to try deleting it from 7-Zip. Simple as that, 7-Zip was able to delete the file when everything else couldn't.
Another weird thing 7-Zip can do: extract files really, really, really correctly.
Often I download Japanese games (not pirated stuff; I mean freeware indie games, often fan-made, like a fan game of some anime or something), and the built-in Windows zip extraction somehow extracts them wrong: sometimes files are missing, or the game runs but with corrupted textures and so on.
If I extract the same games using 7-Zip, they work fine.
If there are files that use non-Latin characters then 7-Zip is mandatory; Windows just fails to extract those files properly.
That's a common issue with Windows.
The internal file API is much more powerful than the user-accessible front-end programs, which can be very frustrating at times.
I had similar issues with corrupted file names created by some application bug. There was no way to delete these invalid file entries using OS-provided tools.
That clears up some of the mystery. I was wondering how 7-Zip could do it but not Explorer or the command prompt. It makes a lot of sense that 7-Zip is just fully utilizing the internal file API.
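If you ever need that trick without 7-Zip: the Win32 \\?\ path prefix tells Windows to skip most of its path normalization, which is what usually chokes on reserved names and trailing dots/spaces. A sketch (hypothetical path; on Windows, Python more or less hands this straight to the underlying DeleteFileW):

  import os

  # The \\?\ prefix bypasses Win32 path normalization, so otherwise
  # undeletable names like "aux", "con", or "file. " can be removed.
  os.remove(r'\\?\C:\temp\aux')  # hypothetical stuck file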
$ binwalk hanselman.img
DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
13026788      0xC6C5E4        MySQL ISAM index file Version 6
13064186      0xC757FA        MySQL ISAM index file Version 2
15513282      0xECB6C2        YAFFS filesystem, little endian
18368322      0x1184742       YAFFS filesystem, little endian
42678040      0x28B3718       MySQL ISAM compressed data file Version 6
59068786      0x3855172       YAFFS filesystem, little endian
60315328      0x39856C0       YAFFS filesystem, little endian
One of the reasons the FAT family of filesystems is in widespread use is its simplicity, which also makes data recovery easier. Writing a FAT driver/reader is a common assignment in systems programming courses, and I've done it myself too.
I took a quick look at the truncated image he provided, and it looks like the fields in the MBR and boot sector are OK; but things start getting weird after that.
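For anyone following along at home, the byte offsets below fall straight out of the boot-sector fields; a sketch of the arithmetic (standard FAT32 BPB layout, nothing Zoom-specific; for a whole-card image, add the partition's start offset):

  import struct, sys

  # Read the FAT32 boot sector and compute where a given cluster lives.
  with open(sys.argv[1], 'rb') as f:
      bs = f.read(512)

  bytes_per_sector    = struct.unpack_from('<H', bs, 11)[0]
  sectors_per_cluster = bs[13]
  reserved_sectors    = struct.unpack_from('<H', bs, 14)[0]
  num_fats            = bs[16]
  fat_size            = struct.unpack_from('<I', bs, 36)[0]  # sectors per FAT
  root_cluster        = struct.unpack_from('<I', bs, 44)[0]  # normally 2

  first_data_sector = reserved_sectors + num_fats * fat_size

  def cluster_offset(n):
      # data clusters are numbered from 2; cluster 2 is the first one
      return (first_data_sector + (n - 2) * sectors_per_cluster) * bytes_per_sector

  print('root directory at byte offset', hex(cluster_offset(root_cluster)))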
The root is at cluster 2 where it normally should be (byte offset C00000 in the image), but its second entry is that all-spaces directory which claims to start in cluster 4, and furthermore its creation date is recent (2020-03-12).
Cluster 3 (C08000) is a directory which points to ZOOM0002.hprj, ZOOM0002_Tr1.WAV, and ZOOM0002_Tr2.WAV, its "." is correctly pointing to itself, but its parent according to ".." is at cluster 4. It is directory "ZOOM0002".
Cluster 4 (C10000) is a directory which contains 3 directories ZOOM0001 through 3, its "." is correctly pointing to itself, and its parent is pointing to the root (indicated with a cluster of 0). This is, according to the root, the "all spaces" directory.
Cluster 5 (C18000) is ZOOM0001 directory, and its "." and ".." are correct.
Cluster 6 (C20000) is a file, ZOOM0001.hprj
Cluster 7 (C28000) is a file, ZOOM0001_LR.WAV
In other words, everything looks OK except for that all-spaces directory in the root. According to page 25 of https://www.zoom-na.com/sites/default/files/products/downloa... that should have been named something like FOLDER01 through FOLDER10, so fixing that should make it valid again.
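If one wanted to apply that fix directly, it's an 11-byte patch: a short directory entry is 32 bytes, and the space-padded 8.3 name occupies bytes 0-10. A sketch --- the offset here is derived from the offsets above (root at 0xC00000, second 32-byte entry), so verify it in a hex editor first, and of course patch a copy of the image, not the card:

  # Rename the all-spaces directory entry to FOLDER01.
  ENTRY_OFFSET = 0xC00020  # second entry of the root directory cluster

  with open('hanselman.img', 'r+b') as f:
      f.seek(ENTRY_OFFSET)
      entry = bytearray(f.read(32))
      assert entry[0:11] == b' ' * 11, 'not the all-spaces entry'
      entry[0:11] = b'FOLDER01   '  # 8-char name + 3-char blank extension
      f.seek(ENTRY_OFFSET)
      f.write(entry)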
It's hard to say what happened here without knowing the state of the filesystem before the Zoom started writing to it, but the fact that the creation date/time of that nameless directory (2020-03-12 12:29) matches exactly with that of the first file it wrote (ZOOM0001.hprj/ZOOM0001_LR.WAV) strongly suggests that the card did not already have a corrupted filesystem beforehand, and the Zoom somehow wrote a blank name directory where it should've written FOLDER01. A search for "FOLDER" in the image yields no results either, so it's not like it wrote them somewhere else (unless it did so beyond the first 500MB of the card.)
I've seen embedded devices become confused and write corrupted filesystems a few times before, but they were far more egregious than this; e.g. ignoring/assuming certain fields' values would result in filesystem structures being wholesale shifted by some offset, etc. This is an unusual case because nothing about the filesystem stood out as being "obviously suspicious/non-standard", and everything except for the name was fine --- hence why 7-zip could still operate on it; it didn't care about the name.
The value of 1458 reserved sectors (729KB!) initially stood out, but upon further thought, that may simply be there to align the first cluster with the erase blocks of the flash; if the Zoom really were to ignore that field, the filesystem structures would've been far more mangled.
Thanks a lot for dissecting this, I was hoping for exactly that ;) If I recall correctly, the recording device could read those files without a hitch too. Is that probably because it also ignored the malformed name? I'm a bit confused about how the device knew where to look to find those files, if not by the name of the parent directory.
It's a really nice read, but I'm somewhat sad they know neither what actually happened nor why it happened at all. I hope this gets some attention from someone here who is skilled enough with this kind of thing and willing to figure it out, as it kind of bugs me now...
Because he's a curious mind and wanted to figure out what was happening here.
Admittedly not the most pragmatic approach, but definitely the more entertaining and informative one :)
I know that p7zip exists. Nevertheless, the author used the Windows version and it worked for them, so I'm not sure why the Linux implementation is relevant to this conversation.
The lesson: never ever buy SD cards or USB sticks from Amazon. Buy them from a reputable company.