All About Erasure Codes (2004) [pdf] (utk.edu)
44 points by Tomte on Nov 18, 2017 | hide | past | favorite | 18 comments


Using a general matrix inversion algorithm to decode Reed-Solomon erasures is an atrocity. The only difference between encoding and decoding erasures is that we restore m=n-k code word values at static locations, k+1..n, when encoding, and m code word values at dynamic locations when decoding. In both cases, computing the matrix is O(mk) - much faster than matrix inversion.


Can you elaborate more on the computation of the decoding matrix? Is it basically a special-purpose row reduction, taking advantage of the fact that the matrix to be inverted has a bunch of rows from the identity matrix? (I think it is, but would be curious if there's another algorithm out there!)


If an RS(n,k) code is defined by Sum[i=0..n-1] X_i Z_i^j = 0 for j=0..m-1, m=n-k, where {X_i} are the code word symbols and {Z_i} are distinct locators, then for any polynomial P(Z) of degree less than m we have Sum X_i P(Z_i) = 0, since P is a linear combination of the monomials Z^j, j < m.

Now, let's erase m values at locations E={E_j}, j=0..m-1. For each E_j, we can construct a polynomial P_j(Z) of degree m-1 that evaluates to 1 at the locator of E_j and to 0 at the locators of the other positions in E. From Sum X_i P_j(Z_i) = 0 we get X_{E_j} = -Sum[i not in E] X_i P_j(Z_i), where the sum runs over the known X_i only (and the minus sign disappears in GF(2^w)). {P_j(Z_i)} is in fact the decoding matrix. Computing it is O(nm), not O(mk) as I put above: evaluate M(Z) = Prod[t] (Z - Z_{E_t}) once at every known locator, after which each entry P_j(Z_i) = M(Z_i) / ((Z_i - Z_{E_j}) M'(Z_{E_j})) costs O(1).
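In case a concrete toy helps, here is a minimal Python sketch of this construction. Everything not stated above is my own illustrative choice: the prime field GF(257) (real implementations use GF(2^w)), the locator values, and the function names. It also evaluates each P_j(Z_i) naively in O(m) per entry for clarity, rather than via the faster product trick.

```python
# Toy model of the erasure construction above, over the prime field GF(257).
P = 257

def lagrange_basis(locs, j, p=P):
    """The degree m-1 polynomial P_j with P_j(locs[j]) = 1 and
    P_j(locs[t]) = 0 for t != j, returned as an evaluation function."""
    def Pj(z):
        num, den = 1, 1
        for t, lt in enumerate(locs):
            if t != j:
                num = num * (z - lt) % p
                den = den * (locs[j] - lt) % p
        return num * pow(den, p - 2, p) % p   # division via Fermat inverse
    return Pj

def solve(code, Z, missing, p=P):
    """Fill code[e] for each e in `missing` so that
    sum_i code[i] * Z[i]**j == 0 (mod p) for j = 0..m-1, m = len(missing).
    The same routine encodes (missing = parity slots) and decodes."""
    locs = [Z[e] for e in missing]
    known = [i for i in range(len(Z)) if i not in missing]
    for j, e in enumerate(missing):
        Pj = lagrange_basis(locs, j, p)
        # From sum_i X_i * P_j(Z_i) = 0:  X_e = -sum over known X_i * P_j(Z_i)
        code[e] = -sum(code[i] * Pj(Z[i]) for i in known) % p
    return code
```

Encoding a 4-of-8 code is `solve(data + [0]*4, Z, [4, 5, 6, 7])`; decoding after losing any four symbols is the same call with the lost positions as `missing`, which is exactly the encode/decode symmetry described above.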

I have that algorithm implemented in my ErrorZ project on GitHub: https://github.com/OlegMazurov/ErrorZ/blob/master/src/main/j... More advanced stuff, like decoding up to m-1 errors, can also be found there.


Ah, okay thanks. My coding theory is a bit weak, but it looks like this algorithm depends on the specific form of RS codes that you mentioned. I think the presentation outlines a form of RS erasure codes which is more general (e.g., allows Cauchy parity matrices instead of Vandermonde-derived ones).

Using a general matrix inversion is indeed inefficient, and becomes impractical for large k. However, the special-purpose row reduction algorithm I mentioned gets to O(nm^2), which is much better, although still worse than the O(nm) algorithm you described; that makes sense, given that yours relies on the more specific RS construction.
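For concreteness, here is roughly what I have in mind, as hypothetical Python of my own over a prime field GF(p) (a real implementation would use GF(2^w), and the names are mine). The surviving rows that come from the identity part of a systematic generator matrix give free pivots, so dense elimination is confined to a block whose size is the number of lost data symbols:

```python
# Hypothetical sketch: invert A = G[survivors], the k x k matrix of surviving
# rows of a systematic n x k generator matrix G over GF(p). Surviving data
# rows are rows of the identity, so pivoting on them touches one column;
# dense Gauss-Jordan is confined to the lost-data block.
def decode_matrix(G, survivors, p):
    k = len(survivors)
    A = [[x % p for x in G[s]] for s in survivors]
    Inv = [[int(r == c) for c in range(k)] for r in range(k)]
    data = {survivors[r]: r for r in range(k) if survivors[r] < k}
    parity = [r for r in range(k) if survivors[r] >= k]
    lost = [c for c in range(k) if c not in data]

    # Cheap pivots: clearing column s against a unit row e_s is O(1)/entry.
    for s, r in data.items():
        for r2 in parity:
            f = A[r2][s]
            A[r2][s] = 0
            Inv[r2][r] = (Inv[r2][r] - f) % p

    # Dense elimination only on the parity-rows x lost-columns block.
    unused = list(parity)
    for c in lost:
        piv = next(r for r in unused if A[r][c])   # A invertible => exists
        unused.remove(piv)
        f = pow(A[piv][c], p - 2, p)               # normalize pivot row
        A[piv] = [a * f % p for a in A[piv]]
        Inv[piv] = [a * f % p for a in Inv[piv]]
        for r2 in parity:
            if r2 != piv and A[r2][c]:
                f = A[r2][c]
                A[r2] = [(a - f * b) % p for a, b in zip(A[r2], A[piv])]
                Inv[r2] = [(a - f * b) % p for a, b in zip(Inv[r2], Inv[piv])]

    # A is now a permutation of I; align Inv rows with their pivot columns.
    out = [None] * k
    for r in range(k):
        out[A[r].index(1)] = Inv[r]
    return out
```

With m' lost data symbols, each cheap pivot costs O(m') and the dense phase costs O(k m'^2), which is where the O(nm^2) above comes from.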

Thanks for elaborating! It's reminding me that I really should brush up on coding theory, though -- do you have any resources/textbooks you'd recommend for someone familiar with the mathematical background? (Finite fields, etc.)


I'm a bit rusty myself (hence all those small mistakes). These days I'm more concerned with the engineering side of error correction: elegant implementations and, more importantly, a totally new class of dynamic error correction I have in another GitHub project. I can't recommend a good up-to-date textbook; I haven't read any in a long time.


No problem. Do you happen to have a source for the algorithm you explained, though? I recognize the parity-check matrix, but the rest I can't seem to find in any of the usual coding theory texts; usually, they treat erasures as part of the full RS decoding process.


The source of the algorithm I described is my mind. It's so elementary that I believe it is omnipresent but disguised by more complicated math, usually due to other ways to define RS codes.


I agree re. being disguised by complicated math. Although even if you think it's elementary, I encourage you to write it up somewhere, like a blog post. I'm sure there are people out there like me who would find it novel. :)

If I ever do use it or write about it, I'll make sure to cite you. Thanks again!


I'd like to see FEC used on a TCP/IP replacement to handle lost packets without an extra roundtrip.

When you're downloading a 1 MB web page split into ~800 1 kB Ethernet packets, and typical internet packet loss rates run around 0.5%, the chance that not a single packet gets lost is slim.
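A back-of-the-envelope check of that claim, assuming independent losses (which real networks don't guarantee):

```python
# Probability that all ~800 packets of a 1 MB page arrive
# with a 0.5% per-packet loss rate, assuming independent losses.
loss_rate = 0.005
packets = 800
p_clean = (1 - loss_rate) ** packets
print(f"chance of a loss-free transfer: {p_clean:.1%}")  # about 1.8%
```

So under those assumptions, roughly 98% of such transfers would need at least one retransmission round-trip, which is the case for FEC.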


I think typical packet loss rates are lower than 0.5% in some parts of the internet, and losses are not independent either; they tend to come in bursts. That said, FEC is already used for very high latency networking. Below some latency threshold, retransmitting becomes faster than having the hardware compute more FEC.


Unless I misread, the slides do not touch on fountain codes (RaptorQ etc.) at all, which seems like a significant omission.


"were first published in 2004": https://en.wikipedia.org/wiki/Raptor_code

(OP slidedeck is also 2004)

[ps:]

presentation on Raptor codes 2005: http://algo.epfl.ch/_media/en/output/presentations/raptor-ba....

paper 2006: https://gnunet.org/sites/default/files/raptor.pdf


LT codes were published in 2002...

https://en.wikipedia.org/wiki/Luby_transform_code


Those who have used Usenet to get binaries will be familiar with these, which work on the same principle: https://en.wikipedia.org/wiki/Parchive


Indeed. I use par2 on my local tar backups to protect against corruption of a few blocks.

People may also be familiar with RAID5/RAID6/RAID-Z, which use erasure codes to protect data from drive loss.


How do the schemes laid out here compare with zfec (https://github.com/tahoe-lafs/zfec)?


zfec is an implementation of Reed-Solomon coding, the class of codes discussed in the first part of the slides.


To be precise, zfec uses Vandermonde-based codes, and it is derived from my old fec library from 1996-97. See http://info.iet.unipi.it/~luigi/research.html#fec (in 90's web style).



