Great stuff, and I love the concrete example of the ZK failure due to error logging -- a classic cascading failure mode. While it's true that I'm an inveterate disaster porn addict[1] and would therefore love this regardless, I think that Nathan's piece serves as a model in that it speaks to learning from failure rather than gloating about nascent success -- we collectively need much more of this! I also like that Nathan doesn't romanticize other engineering domains, as naive software engineers are wont to do; other engineering domains also struggle with failure -- it's just that their failures are so much more public (and so much more likely to involve loss of property and/or life) that they cannot evade collective introspection the way software engineering so frequently seems to. Very much looking forward to Part 2!
I enjoyed your talk, thanks for posting. One thing I'd like to know, though: given how thoroughly you optimize your debugging skills and environment, it surprised me that you love JavaScript. Don't get me wrong, it obviously has some of the best tooling thanks to its ubiquity, but doesn't it bug you that it tends to fail silently? I feel that there are quite a few error classes that need to be caught by unit tests in the case of JS, whereas in languages with stricter type checking (such as Python) they get caught as an exception right on the first run. Or is it that this uneasy feeling about everything you do in JS is what has spawned a culture of more thorough unit testing, such that in the end you're better off?
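To make the "caught right on the first run" point concrete, here is a minimal sketch. The function names are hypothetical and only for illustration; the mistake is passing a string where a number was intended. In JavaScript, `"100" + 8` quietly evaluates to the string `"1008"`, while Python refuses the same operation:

```python
def add_tax(amount, tax):
    # Hypothetical helper, for illustration only: adds a tax to an amount.
    return amount + tax


def try_add_tax(amount, tax):
    # Run add_tax and report a TypeError as text instead of crashing,
    # so both the good and the bad call can be shown side by side.
    try:
        return add_tax(amount, tax)
    except TypeError as exc:
        return f"TypeError: {exc}"


# Correct call: both arguments are numbers.
print(try_add_tax(100, 8))

# Mistaken call: the amount arrived as a string (say, from a form field).
# Python raises TypeError on the first run rather than concatenating.
print(try_add_tax("100", 8))
```

This only covers the errors that Python's runtime checks happen to reject; a misspelled dictionary key or a wrong but type-compatible value still needs a test in either language.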
The things that you need to unit test even when you have static typing typically overlap with the tests that would detect type errors anyway. The fact that there is no static typing also puts a bit more fire under your butt to test things.
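A rough sketch of that overlap, using hypothetical names that are not from the talk: a test written purely to check behavior also fails when the function returns the wrong type, without any type-specific assertion.

```python
def parse_port(text):
    # Hypothetical function under test: turns "8080" into the number 8080.
    return int(text)


def parse_port_buggy(text):
    # Hypothetical buggy variant: forgets the conversion and returns a string.
    return text.strip()


def behaves_correctly(parser):
    # An ordinary behavioral check: does parsing " 8080 " give 8080?
    # It says nothing about types, yet a string result fails it too,
    # because "8080" == 8080 is False in Python.
    return parser(" 8080 ") == 8080


print(behaves_correctly(parse_port))
print(behaves_correctly(parse_port_buggy))
```

The check would be written the same way with or without static typing, which is the sense in which the two kinds of test overlap.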
[1] http://www.infoq.com/presentations/Debugging-Production-Syst...