I worked on a project professionally to find out why we were corrupting SD cards...

more-coffee · on April 6, 2018

Been in a similar boat. Management said we couldn't afford industrial cards because it would kill the margin on the embedded devices they were used in, so we had to make it work with $3 p.o.s. cards, which would corrupt after 6-12 months.

Ended up mounting /var on a tmpfs - ensuring practically no writes to the card - and fetching the device configuration from a server in the network at boot. Plenty of work, with zero (or negative) profit for the company, but at least I learned a thing or two doing it.

amag · on April 6, 2018

Yup, been there too, way, way back.

We wrote simulation tools for typical access patterns and ran them heavily to test new cards, both for performance and failure rate. Luckily there was enough margin on the devices that we (R&D) could select the more expensive industrial-grade cards from good manufacturers, but I did get to see some really shitty cards that purchasing preferred (because they had gotten a great price on them).

However in our case the corruption was due to a buggy FAT-driver for the obscure RTOS we used. In the end though I learned a lot about FAT which was fun (<sarcasm>and is a highly marketable skill these days</sarcasm>).

monocasa · on April 6, 2018

Haha, I ended up tapping all reads and writes to the card, intending to catch our crappy FAT driver in the act of corrupting.

Turns out the card itself was shit and replaying the trace would just kill arbitrary cards.

amag · on April 7, 2018

Haha, I spent so much time debugging their crappy FAT-driver. Another time the whole system would just hang when you created a new file in a directory that already contained some files. Turns out that their long-to-short filename algo was:

For the first 4 files with an identical prefix, use the ~N scheme, as in LongFile.txt -> LONGFI~1.TXT.

For the following files it was:

1) "LONG" + hex_str(hash(long_file_name)) + ".TXT"
2) If there's a name collision, repeat step 1.

Compound that with the fact that hash() was implemented something like:

    int h = 0;
    for(int i = 0; i < strlen(long_file_name); ++i)
    {
      h = (h + long_file_name[i]) ^ 0x42424242;
    }

It's pretty easy to see where this falls apart...
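To spell out the failure, here is a minimal Python sketch of the scheme as described above. It is a port of the quoted loop (masked to 32 bits, which is an assumption about the original int width), and the names toy_hash, hashed_short_name, create and find_collision are made up for illustration, not taken from the driver. Since the hash depends only on the long name, repeating step 1 after a collision produces the same short name again, so the retry can never succeed. The retry cap exists only so the demo terminates.

```python
import itertools
import string

K = 0x42424242


def toy_hash(name):
    """Python port of the quoted C loop, masked to 32 bits (an assumption)."""
    h = 0
    for ch in name:
        h = ((h + ord(ch)) ^ K) & 0xFFFFFFFF
    return h


def hashed_short_name(long_name):
    """Step 1 from the post: first four letters + hex hash + extension."""
    stem, _, ext = long_name.rpartition(".")
    return f"{stem[:4].upper()}{toy_hash(long_name):X}.{ext.upper()}"


def create(long_name, directory, max_tries=1000):
    """Steps 1 and 2, with a retry cap so this demo terminates."""
    for _ in range(max_tries):
        candidate = hashed_short_name(long_name)
        if candidate not in directory:
            directory.add(candidate)
            return candidate
    return None  # every retry produced the same candidate


def find_collision():
    """Brute-force two long names that share a prefix and a hash value."""
    seen = {}
    for a, b in itertools.product(string.ascii_lowercase, repeat=2):
        name = f"long{a}{b}.txt"
        h = toy_hash(name)
        if h in seen:
            return seen[h], name
        seen[h] = name
    return None
```

Creating the first name of a colliding pair works. Creating the second returns None once the cap is hit, which is the bounded version of the hang. The weak mixing also makes such pairs easy to find: a search over two varying letters is enough.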

michaelt · on April 7, 2018

How does one tap all the reads and writes to a card?

hazeii · on April 7, 2018

Just intercept the read and write calls! On Linux you can just use 'strace $prog'; try it on a minimal program that reads and writes a file. For a custom approach, this can be done with a library (using LD_PRELOAD). There are plenty of articles about how to do this, e.g. http://www.linuxjournal.com/node/7795 (essentially the same technique works on Windows, or any other OS with shared libs). Slightly more hardcore is to hook the kernel syscalls; that's not normally necessary (on Linux this is effectively hooking the other side of glibc, where it talks to the kernel), but if all you've got is a static binary it's one approach.

Note that all these techniques only show you the read and write calls made to the library/OS and - importantly - not what actually happens to the card. To see that, the next level down is to instrument the card driver to track the actual I/O operations (i.e. so you see what the card is really being asked to do, sans all the caching and buffering).

Note that's not the end of the story; there's what the hardware controller decides to do and when the hardware actually reads/writes the flash array. That's the level where the quality of the firmware in the controller(s) matters.

monocasa · on April 7, 2018

I suspected the filesystem driver itself, so I had to modify the driver rather than strace (also this was WinCE quite unfortunately).

michaelt · on April 7, 2018

Doesn't that mean you then have to write an event to disk for every write-to-disk event?

sowbug · on April 7, 2018

I would log by RPC or whatever to a remote computer.

monocasa · on April 7, 2018

Modify the SD card driver to open a socket to a computer in the lab, and just log its commands and responses over TCP with timestamps. The network was orders of magnitude faster than the crappy SDIO interface, so it was easy this time. I've had to do a similar thing with SAS, and you basically just have to pay tens of thousands for a special purpose protocol analyzer, or build your own with an FPGA.
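As a rough illustration of that logging path (not the actual WinCE driver code, which isn't shown here), a sketch in Python might look like the following. The TraceSink class, the JSON-lines record format and the field names are all assumptions made for the example. A real driver would call something like record() from its command and response paths.

```python
import json
import socket
import time


class TraceSink:
    """Ships one timestamped record per card command over a connected socket."""

    def __init__(self, sock):
        self._sock = sock

    def record(self, command, argument, response):
        entry = {
            "t": time.monotonic(),  # timestamp taken on the device side
            "cmd": command,
            "arg": argument,
            "resp": response,
        }
        # One JSON object per line keeps the receiving end trivial to parse.
        self._sock.sendall((json.dumps(entry) + "\n").encode("ascii"))


def connect_sink(host, port):
    """Open a TCP connection to the (hypothetical) lab machine collecting the trace."""
    return TraceSink(socket.create_connection((host, port)))
```

Because the records leave the device over the network, taking the trace adds no writes to the card being observed.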

sitkack · on April 6, 2018

And write what you need to write to disk using an FEC so you can read it back off when it is corrupt. Effectively do what the card should be doing in the first place.

Image the system, and confirm the checksums on first initial boot, then never re-write any of them.
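One way the first-boot check could look, as a sketch only: build a manifest of SHA-256 digests when the image is made, then compare against it at boot. The function names and the in-memory dict manifest are assumptions for illustration; where the manifest itself is stored is left open.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return the relative paths that are missing or whose digest changed."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            bad.append(rel)
    return bad
```

An empty list from verify_manifest() means every imaged file still matches what was written.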

digi_owl · on April 6, 2018

Sounds like a good description of all those "It runs Linux because here is the kernel error" images that bounce around the web. Most of them are from signage solutions or similar that stay on for ages at a time, and it invariably comes down to the storage device going belly up.

xiaomai · on April 7, 2018

this is the story of my life (I run a signage business).

The rpi2 seemed to be the worst for fs corruption, I've never had a pi1 fail on me. Jury is still out on the pi3.

cghart123 · on April 7, 2018

Have a similar situation. It would be helpful (and I'd be grateful) if you would expand on what you learned to minimize writes.

pera · on April 6, 2018

Yeah, industrial-grade SD cards are the way to go. Here is an interesting graphic comparing different types of NAND memories:

https://www.cactus-tech.com/resources/blog/details/slc-pslc-...

0xcde4c3db · on April 7, 2018

You'll pay for it, though. Truly industrial-grade SD cards (i.e. SLC flash, industrial temperature range, specified terabytes written/MTBF) apparently cost over $10/GB unless you're buying in huge quantities. I guess that might be a good motivation to slim down your root filesystem.

monocasa · on April 7, 2018

A lot of the newer industrial cards are 'aMLC', which basically means you take an MLC NAND, but only stuff 11 or 00 into a cell and sort of treat it as SLC. You can get them for close to $1/GB at sane quantities.

chasil · on April 6, 2018

I bought a SwissBit card off eBay for my Raspberry Pi, because I wanted the longevity of SLC media.

The prices seem to be trending up for SwissBit.

Ahmed90 · on April 6, 2018

Are you allowed to share the results of your experiment? I have a similar issue with a very limited market (SD brands/models); it would be great if you could provide more info :)

monocasa · on April 6, 2018

I'd be hesitant to name any names due to NDAs and what have you.

I'll just say that all consumer level cards pretty much don't really care about your data, even the good brands.

logicallee · on April 6, 2018

I think I speak for everyone that we really appreciate what you did share here.

For how to use what you've shared if we trust you, can you throw us a bone more than:

>buy 'industrial' SD cards

For example can you name a specific card? Or give us enough clues that we can do so?

I for one take your advice very seriously but I don't know how to use what you've just shared. I've seen ATM-style Pi-based kiosks with corrupted SD cards that wouldn't boot. It looked expensive.

mrsteveman1 · on April 6, 2018

Take a look at ATP, specifically the AF4GUD3A and AF8GUD3A, for 4GB and 8GB respectively.

Digikey[1] and Arrow[2] generally sell the 4GB for $15 and the 8GB for $25. They make larger versions too, if you need them.

The 'A' stands for aMLC, they're using normal MLC flash (which most consumer SD cards no longer use, but is generally much more reliable than TLC) but they use it in 1-bit per cell mode like SLC. They make traditional SLC cards as well, but the price skyrockets.

The aMLC cards have very good endurance ratings, but they're still cheaper than SLC cards. The firmware and controller are designed to prevent sudden power loss issues, which is apparently the root cause of a lot of SD card corruption on the Pi.

They're also supposed to have lifetime (i.e. SMART) monitoring, but it's a vendor specific command set rather than something smartmontools can read. ATP has a tool for it that probably only runs on Windows.

I've been using those aMLC cards in a bunch of Pi3 and Pi Zero W devices for months, I've never seen them become corrupted or fail to boot even once, despite being pretty hard on them, compiling stuff, yanking the power, etc.

For comparison, a Samsung Ultra+ card became corrupted after a single power loss. The device was running Windows 10 IoT Core at the time, it never booted again and had to be re-flashed.

[1] https://www.digikey.com/product-detail/en/atp-electronics-in...

[2] https://www.arrow.com/en/products/af4gud3a-waaxx/atp-electro...

sitkack · on April 6, 2018

Could a lot of this be mitigated by having better file systems?

mrsteveman1 · on April 6, 2018

A small amount of it, yes, but it's the card itself causing corruption issues most of the time, it happens even on devices where the filesystem is read-only. The controller will write and move things around for maintenance purposes even if the host hasn't issued a write command.

The cheap SD cards just aren't designed for anything except being used in consumer devices with batteries, where sudden power loss is rare and losing data isn't going to cause a plane to crash or result in someone not receiving a dose of insulin.

So when they suddenly lose power, they aren't always capable of ensuring that whatever task they were carrying out at the time is actually completed and did not accidentally destroy data.

And apparently the consumer SD card controllers are really there to manage and remap parts of the flash that were defective before ever leaving the factory.

It's probably cheaper to build over-provisioned cards with a simple controller that can deal with manufacturing defects in the field, than to do QA on 200 million thumbnail sized NAND die every month and still try to profit while selling them for a fraction of a penny each.

zzzcpan · on April 6, 2018

It seems very much possible to fix data corruption from a broken flash translation layer on SD cards with yet another translation layer using erasure coding. Whatever the controller does or doesn't do, it can still only write in blocks; we just have to make sure the erasure codes are at least a block size apart from the data.

monocasa · on April 7, 2018

At least for me, the failure mode in most cases had the card simply erroring out on accesses (even writes!) to particular sectors. Other sectors would still accept writes, so it wasn't that the card had run out of sectors with write lifetime left. When you're not given anything to work with, not even corrupted data, extra ECC isn't going to do you any good. My conjecture (with no data) is that the card's FTL was itself corrupted.

EDIT: and in no cases I saw did the cards I was testing give me truly 'corrupt' data. Just either error codes, or stale data, or occasionally data from another sector entirely. They've got metric shittonnes of ECC internally (to make up for the crappy NANDs), and will do a better job than you can at detecting errors.

zzzcpan · on April 7, 2018

Right, so, if we use virtual blocks the size of the physical blocks and either the parity block or one of the data blocks is destroyed, it is still possible to get the data back. And with writes, if a write fails our layer can just try to write a bit farther and farther until it succeeds.
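The simplest version of that idea is single XOR parity, sketched below. This is an illustration under assumptions, not a proposal for a real layer: the function names are hypothetical, an unreadable block is modelled as None (matching the "card returns an error" failure mode described above), and one parity block can only repair one lost block per stripe.

```python
def xor_blocks(blocks):
    """XOR equal-length byte strings together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(out):
            raise ValueError("all blocks must have the same length")
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def make_stripe(data_blocks):
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe):
    """Rebuild a stripe in which at most one entry is None (unreadable)."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost block")
    stripe = list(stripe)
    if missing:
        # XOR of every surviving block (data and parity) equals the lost one.
        stripe[missing[0]] = xor_blocks([b for b in stripe if b is not None])
    return stripe
```

Whether the data and parity blocks actually land in different physical blocks is up to the card's controller, which is the hard part of the proposal.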

monocasa · on April 7, 2018

In general you don't have access to the physical block size. And it can change even within a given lot.

zzzcpan · on April 7, 2018

The exact block size is not necessary; a big, safe choice is fine too. But I suspect it's trivial to detect it with benchmarks, as writing even a byte over a block size would require overwriting two physical blocks instead of one, which is much slower.
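The arithmetic behind that suspicion can be sketched as follows. blocks_touched() is plain integer math. guess_block_size() is a hypothetical helper that takes a measure(size) callback (assumed to return some cost for an aligned write of that size, for example a median latency) and reports the last size before the cost first jumps. On a real card, caching and the controller's own behaviour may blur the step, so treat this as the shape of the experiment rather than a working benchmark.

```python
def blocks_touched(offset, length, block_size):
    """Number of physical blocks spanned by a write of `length` bytes at `offset`."""
    if length <= 0:
        return 0
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return last - first + 1


def guess_block_size(measure, sizes, jump=1.5):
    """Return the largest size measured before the cost first grows by `jump`x.

    `measure(size)` is a hypothetical callback supplied by the caller.
    Returns None if no jump is seen across `sizes`.
    """
    previous_size, previous_cost = None, None
    for size in sizes:
        cost = measure(size)
        if previous_cost and cost >= jump * previous_cost:
            return previous_size
        previous_size, previous_cost = size, cost
    return None
```

With a 4096-byte block, a 4096-byte aligned write touches one block and a 4097-byte write touches two, which is the step the benchmark would be looking for.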

cmurf · on April 6, 2018

Btrfs will detect these corruptions and result in EIO so at least corrupt data isn't propagating beyond the file system. While you can set metadata and data to DUP, and thereby get automatic recovery from corruption, it could still fail if the card colocates the two copies into the same page or block on the sd card. So if the corruption cause can affect multiple parts of a page or block, recovery may still not be possible. And also you're doing double the writes, so half the write speed, and half the lifetime. Another option is Btrfs raid1 if you can stick two cards into the device.

XFS has metadata checksums, enabled by default when using xfsprogs 3.2.3+ but data is a much bigger footprint so you can still get hit with silent data corruption. And ext4 any day now is going to start to default to metadata checksums as well.

For those file systems, you can use dm-integrity or dm-verity. https://gitlab.com/cryptsetup/cryptsetup/wikis/DMIntegrity

geofft · on April 6, 2018

Depending on what the errors are, you could design a filesystem where the apparent block size is a bit smaller than the underlying disk's block size, giving you room to add some sort of error correction code to the bits you write. Then a single-bit error (even if the controller moves a block around within the disk without telling you) can be corrected by the filesystem driver, and the correct data re-written.

(I also want a filesystem that does this so that you have room for a proper authenticated encryption mode for your full-disk encryption - if your apparent block size is the same as your physical disk block size, either you have no room for an authentication tag and you're using a pretty fragile scheme for making your ciphertext tamper-resistant, or you kill performance because you need to read the authentication tag from another block. Current disk encryption software tends to choose the former.)
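A sketch of the "smaller apparent block" layout, under stated assumptions: a 4096-byte physical block, a 32-byte HMAC-SHA256 tag stored at the end of the same block, and the block index mixed into the tag so that data returned from the wrong sector fails the check. It only authenticates (no encryption, no error correction), and the names seal_block and open_block are made up for the example.

```python
import hashlib
import hmac
import struct

PHYSICAL_BLOCK = 4096                      # assumed physical block size
TAG_LEN = hashlib.sha256().digest_size     # tag stored inside the block
PAYLOAD_LEN = PHYSICAL_BLOCK - TAG_LEN     # the "apparent" block size


def _tag(key, index, payload):
    # Binding the block index into the tag catches blocks that come back
    # intact but from a different location.
    return hmac.new(key, struct.pack("<Q", index) + payload, hashlib.sha256).digest()


def seal_block(key, index, payload):
    """Return one physical block: payload followed by its tag."""
    if len(payload) != PAYLOAD_LEN:
        raise ValueError("payload must be exactly PAYLOAD_LEN bytes")
    return payload + _tag(key, index, payload)


def open_block(key, index, raw):
    """Check the tag and return the payload, or raise ValueError."""
    if len(raw) != PHYSICAL_BLOCK:
        raise ValueError("raw block must be exactly PHYSICAL_BLOCK bytes")
    payload, tag = raw[:PAYLOAD_LEN], raw[PAYLOAD_LEN:]
    if not hmac.compare_digest(tag, _tag(key, index, payload)):
        raise ValueError("block failed authentication")
    return payload
```

Payload and tag come back in a single read, which is the performance point made above; the cost is 32 bytes of capacity per block.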

userbinator · on April 7, 2018

> Then a single-bit error (even if the controller moves a block around within the disk without telling you) can be corrected by the filesystem driver, and the correct data re-written.

In my experience, flash corruption of the type found in SD cards shows up as completely blank (00) or erased (FF) blocks, not single-bit errors. Remember that SD already has a layer of error correction to handle those from the raw flash.

monocasa · on April 6, 2018

ATP Inc. has treated us very well, both their products and their support.

But in general 'industrial' is the keyword to get the good shit from manufacturers who'll treat you like an adult.

pera · on April 6, 2018

Search for SLC type memory cards (in the Technology filter):

https://www.digikey.com/products/en/memory-cards-modules/mem...

monocasa · on April 6, 2018

Industrial is more than that. A big one for me was that the manufacturer will work with you and notify you when they change their BoM, letting you reverify and track internally.

stavros · on April 6, 2018

Why not name the manufacturer so we can avoid them?

monocasa · on April 6, 2018

NDAs.

24gttghh · on April 6, 2018

Wow I had no idea to even look for different NAND geometries for SD cards! More expensive obviously but probably well worth it.

jpm_sd · on April 6, 2018

Yes, SD cards are garbage. eMMC is far, far superior.

rwky · on April 7, 2018

Any idea where industrial micro SD cards can be purchased in Europe? A casual search didn't yield much info.

fest · on April 7, 2018

I've bought Swissbit cards from Mouser/Farnell.

pmorici · on April 7, 2018

RS Components has them

pingec · on April 6, 2018

Any recommendations?