All About Erasure Codes (2004) [pdf] (utk.edu)
44 points by Tomte on Nov 18, 2017 | hide | past | favorite | 18 comments


Using a general matrix inversion algorithm to decode Reed-Solomon erasures is an atrocity. The only difference between encoding and decoding erasures is that we restore m=n-k code word values at static locations, k+1..n, when encoding, and m code word values at dynamic locations when decoding. In both cases, computing the matrix is O(mk) - much faster than matrix inversion.


Can you elaborate more on the computation of the decoding matrix? Is it basically a special-purpose row reduction, taking advantage of the fact that the matrix to be inverted has a bunch of rows from the identity matrix? (I think it is, but would be curious if there's another algorithm out there!)


If an RS(n,k) code is defined by Sum[i=0..n-1] X_i Z_i^j = 0 for j=0..m-1, m=n-k, where {X_i} are the code word symbols and {Z_i} are distinct locators, then for any polynomial P(Z) of degree less than m we have Sum X_i P(Z_i) = 0, since P is a linear combination of the monomials Z^j, j < m.

Now, let's erase m values at locations E={E_j}, j=0..m-1. For each E_j, we can construct a polynomial P_j(Z) of degree m-1 that evaluates to 1 at the locator of E_j and to 0 at the locators of the other positions in E. From Sum X_i P_j(Z_i) = 0 we get X_{E_j} = -Sum[i not in E] X_i P_j(Z_i), where the sum runs over the known X_i only (and the minus sign disappears in GF(2^w)). {P_j(Z_i)} is in fact the decoding matrix. Computing it is O(nm), not O(mk) as I put above: evaluate M(Z) = Prod[t] (Z - Z_{E_t}) once at every known locator, after which each entry P_j(Z_i) = M(Z_i) / ((Z_i - Z_{E_j}) M'(Z_{E_j})) costs O(1).
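In case a concrete toy helps, here is a minimal Python sketch of this construction. Everything not stated above is my own illustrative choice: the prime field GF(257) (real implementations use GF(2^w)), the locator values, and the function names. It also evaluates each P_j(Z_i) naively in O(m) per entry for clarity, rather than via the faster product trick.

```python
# Toy model of the erasure construction above, over the prime field GF(257).
P = 257

def lagrange_basis(locs, j, p=P):
    """The degree m-1 polynomial P_j with P_j(locs[j]) = 1 and
    P_j(locs[t]) = 0 for t != j, returned as an evaluation function."""
    def Pj(z):
        num, den = 1, 1
        for t, lt in enumerate(locs):
            if t != j:
                num = num * (z - lt) % p
                den = den * (locs[j] - lt) % p
        return num * pow(den, p - 2, p) % p   # division via Fermat inverse
    return Pj

def solve(code, Z, missing, p=P):
    """Fill code[e] for each e in `missing` so that
    sum_i code[i] * Z[i]**j == 0 (mod p) for j = 0..m-1, m = len(missing).
    The same routine encodes (missing = parity slots) and decodes."""
    locs = [Z[e] for e in missing]
    known = [i for i in range(len(Z)) if i not in missing]
    for j, e in enumerate(missing):
        Pj = lagrange_basis(locs, j, p)
        # From sum_i X_i * P_j(Z_i) = 0:  X_e = -sum over known X_i * P_j(Z_i)
        code[e] = -sum(code[i] * Pj(Z[i]) for i in known) % p
    return code
```

Encoding a 4-of-8 code is `solve(data + [0]*4, Z, [4, 5, 6, 7])`; decoding after losing any four symbols is the same call with the lost positions as `missing`, which is exactly the encode/decode symmetry described above.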

I have that algorithm implemented in my ErrorZ project on GitHub: https://github.com/OlegMazurov/ErrorZ/blob/master/src/main/j... More advanced stuff, like decoding up to m-1 errors, can also be found there.


Ah, okay thanks. My coding theory is a bit weak, but it looks like this algorithm depends on the specific form of RS codes that you mentioned. I think the presentation outlines a form of RS erasure codes which is more general (e.g., allows Cauchy parity matrices instead of Vandermonde-derived ones).

Using a general matrix inversion is indeed inefficient, and becomes impractical for large k. However, the special-purpose row reduction algorithm I mentioned gets to O(nm^2), which is much better, although still worse than the O(nm) algorithm you described; that makes sense, given that yours relies on the more specific RS construction.
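For concreteness, here is roughly what I have in mind, as hypothetical Python of my own over a prime field GF(p) (a real implementation would use GF(2^w), and the names are mine). The surviving rows that come from the identity part of a systematic generator matrix give free pivots, so dense elimination is confined to a block whose size is the number of lost data symbols:

```python
# Hypothetical sketch: invert A = G[survivors], the k x k matrix of surviving
# rows of a systematic n x k generator matrix G over GF(p). Surviving data
# rows are rows of the identity, so pivoting on them touches one column;
# dense Gauss-Jordan is confined to the lost-data block.
def decode_matrix(G, survivors, p):
    k = len(survivors)
    A = [[x % p for x in G[s]] for s in survivors]
    Inv = [[int(r == c) for c in range(k)] for r in range(k)]
    data = {survivors[r]: r for r in range(k) if survivors[r] < k}
    parity = [r for r in range(k) if survivors[r] >= k]
    lost = [c for c in range(k) if c not in data]

    # Cheap pivots: clearing column s against a unit row e_s is O(1)/entry.
    for s, r in data.items():
        for r2 in parity:
            f = A[r2][s]
            A[r2][s] = 0
            Inv[r2][r] = (Inv[r2][r] - f) % p

    # Dense elimination only on the parity-rows x lost-columns block.
    unused = list(parity)
    for c in lost:
        piv = next(r for r in unused if A[r][c])   # A invertible => exists
        unused.remove(piv)
        f = pow(A[piv][c], p - 2, p)               # normalize pivot row
        A[piv] = [a * f % p for a in A[piv]]
        Inv[piv] = [a * f % p for a in Inv[piv]]
        for r2 in parity:
            if r2 != piv and A[r2][c]:
                f = A[r2][c]
                A[r2] = [(a - f * b) % p for a, b in zip(A[r2], A[piv])]
                Inv[r2] = [(a - f * b) % p for a, b in zip(Inv[r2], Inv[piv])]

    # A is now a permutation of I; align Inv rows with their pivot columns.
    out = [None] * k
    for r in range(k):
        out[A[r].index(1)] = Inv[r]
    return out
```

With m' lost data symbols, each cheap pivot costs O(m') and the dense phase costs O(k m'^2), which is where the O(nm^2) above comes from.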

Thanks for elaborating! It's reminding me that I really should brush up on coding theory, though -- do you have any resources/textbooks you'd recommend for someone familiar with the mathematical background? (Finite fields, etc.)


I'm a bit rusty myself (hence all those small mistakes). These days I'm more concerned with the engineering side of error correction: elegant implementations and, more importantly, a totally new class of dynamic error correction I have in another GitHub project. I can't recommend a good up-to-date textbook; I haven't read any in a long time.


No problem. Do you happen to have a source for the algorithm you explained, though? I recognize the parity-check matrix, but the rest I can't seem to find in any of the usual coding theory texts; usually, they treat erasures as part of the full RS decoding process.


The source of the algorithm I described is my mind. It's so elementary that I believe it is omnipresent but disguised by more complicated math, usually due to other ways to define RS codes.


I agree re. being disguised by complicated math. Although even if you think it's elementary, I encourage you to write it up somewhere, like a blog post. I'm sure there are people out there like me who would find it novel. :)

If I ever do use it or write about it, I'll make sure to cite you. Thanks again!


I'd like to see FEC used on a TCP/IP replacement to handle lost packets without an extra roundtrip.

When you're downloading a 1 MB web page split into ~800 1 kB Ethernet packets, and typical internet packet loss rates run around 0.5%, the chance that not a single packet gets lost is slim.
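A back-of-the-envelope check of that claim, assuming independent losses (which real networks don't guarantee):

```python
# Probability that all ~800 packets of a 1 MB page arrive
# with a 0.5% per-packet loss rate, assuming independent losses.
loss_rate = 0.005
packets = 800
p_clean = (1 - loss_rate) ** packets
print(f"chance of a loss-free transfer: {p_clean:.1%}")  # about 1.8%
```

So under those assumptions, roughly 98% of such transfers would need at least one retransmission round-trip, which is the case for FEC.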


I think typical packet loss rates are lower than 0.5% in some parts of the internet, and losses are not independent either; they tend to come in bursts. That said, FEC is already used for very high latency networking. Below some latency threshold, retransmitting becomes faster than having the hardware compute more FEC.


Unless I misread, the slides do not touch on fountain codes (RaptorQ etc.) at all, which seems like a significant omission.


"were first published in 2004": https://en.wikipedia.org/wiki/Raptor_code

(OP slidedeck is also 2004)

[ps:]

presentation on Raptor codes 2005: http://algo.epfl.ch/_media/en/output/presentations/raptor-ba....

paper 2006: https://gnunet.org/sites/default/files/raptor.pdf


LT codes were published in 2002...

https://en.wikipedia.org/wiki/Luby_transform_code


Those who have used Usenet to get binaries will be familiar with these, which work on the same principle: https://en.wikipedia.org/wiki/Parchive


Indeed. I use par2 on my local tar backups to protect against corruption of a few blocks.

People may also be familiar with RAID5/RAID6/RAID-Z, which use erasure codes to protect data from drive loss.


How do the schemes laid out here compare with zfec (https://github.com/tahoe-lafs/zfec)?


zfec is an implementation of Reed-Solomon coding, the class of codes discussed in the first part of the slides.


To be precise, zfec uses Vandermonde-based codes, and it is derived from my old fec library from 1996-97. See http://info.iet.unipi.it/~luigi/research.html#fec (in 90's web style).



