Knuth really is a fan of writing monolithic (rather than "modular") programs from scratch, in a way that goes against all the experience of software engineering accumulated over decades, so that criticism is well-deserved.
For example, his big programs TeX (1982) and METAFONT (1984) are each book-length and the source code of each is in a single large file amounting to about 20000+ lines of Pascal code. His programs do not contain much in the way of standard software-engineering practices like abstraction, modules (hiding implementation behind an interface), unit tests, libraries, etc. In fact, he has spoken out against unit tests and code reuse! [1]
> the idea of immediate compilation and "unit tests" appeals to me only rarely, when I’m feeling my way in a totally unknown environment and need feedback about what works and what doesn’t. Otherwise, lots of time is wasted on activities that I simply never need to perform or even think about. Nothing needs to be "mocked up." ...
> With the caveat that there’s no reason anybody should care about the opinions of a computer scientist/mathematician like me regarding software development, [...] I also must confess to a strong bias against the fashion for reusable code. To me, "re-editable code" is much, much better than an untouchable black box or toolkit. I could go on and on about this. If you’re totally convinced that reusable code is wonderful, I probably won’t be able to sway you anyway, but you’ll never convince me that reusable code isn’t mostly a menace.
Moreover, his sympathies always lay with the "other" side of the "structured programming" revolution (he still liberally uses GOTOs, etc -- still coding like a 1950s/1960s machine code programmer), and in his 1974 paper "Structured Programming With Go To Statements", he approvingly quotes something that might horrify many software engineers today:
> In this regard I would like to quote some observations made recently by Pierre-Arnoul de Marneffe:
> In civil engineering design, it is presently a mandatory concept known as the "Shanley Design Criterion" to collect several functions into one part . . . If you make a cross-section of, for instance, the German V-2, you find external skin, structural rods, tank wall, etc. If you cut across the Saturn-B moon rocket, you find only an external skin which is at the same time a structural component and the tank wall. Rocketry engineers have used the "Shanley Principle" thoroughly when they use the fuel pressure inside the tank to improve the rigidity of the external skin! . . . People can argue that structured programs, even if they work correctly, will look like laboratory prototypes where you can discern all the individual components, but which are not daily usable. Building "integrated" products is an engineering principle as valuable as structuring the design process.
> ... Engineering has two phases, structuring and integration: we ought not to forget either one...
(This comment is slightly tongue-in-cheek, but hopefully provocative enough.)
[0]: Hey it's been a couple of hours and there's no reply attacking my comment, guess I better do it myself. :-)
That's extremism, but I think we need people like that. Nowadays, it seems we've forgotten how to write monolithic programs from scratch. I think we went too far with code reuse. See the left-pad npm fiasco.
Knowing that the one who is possibly the most respected person in the field of computer science has an opinion that goes against the current trends gives us perspective.
In the same way, I don't fully agree with Richard Stallman's activism and Linus Torvalds's famous rants, but I'm glad there are people like that to shake things up.
> Knowing that the one who is possibly the most respected person in the field of computer science has an opinion that goes against the current trends gives us perspective.
Computer science is not software engineering. You don't get physicists and mathematicians to determine engineering best practices either, for precisely the same reason.
For example, the reference to Shanley's design principle is not only wrong at its core (the criterion comes from aerospace engineering, not civil engineering), it also completely misses the underlying design requirements and constraints. More precisely, aerospace pays a premium for weight, so design requirements emphasize optimizing structural systems with respect to structural weight. This design choice favours operational economy at the expense of production costs, maintainability, robustness, and ease of analysis.
None of this applies to software development, or even civil engineering structures.
Knuth also uses assembly in his book on algorithms.
But generally, algorithms researchers seem to not care about abstractions, as witnessed by TeX and LaTeX in multiple ways.
That's probably because when you really need to invent a fancy algorithm (which is their job), it will often not be built out of reusable components.
> "...goes against all the experience of software engineering accumulated over decades... standard software-engineering practices like abstraction, modules (hiding implementation behind an interface), unit tests, libraries, etc."
You need to be careful not to confuse experience and common practice with empirically proven benefit. Many of the practices that you mention are intended to increase the feasibility (perceived feasibility!) of industrial reuse, and/or division of labor, not to make software better in any other dimension (reliability, code size, power/time/space efficiency, fitness for purpose.)
> Many of the practices that you mention are intended to increase the feasibility (perceived feasibility!) of industrial reuse, and/or division of labor, not to make software better in any other dimension (reliability, code size, power/time/space efficiency, fitness for purpose.)
Actually, nowadays the main driving force for abstractions and modularity is to make the code easier to test, understand, and refactor.
Furthermore, "power/time/space efficiency, fitness for purpose" are concerns that don't really apply to software development. In general the main resource is labour, and all other requirements are secondary (computational reaources, latency, etc)
My personal experience would point to Knuth being in essence right on at least one point: the obsession with code reuse is truly unwise; re-editable code is much more important. That's not to say reuse doesn't have its place, nor that modularity is of no value. Instead, in the name of reuse people almost invariably write code that's too complicated, with opaque yet leaky abstractions that make the code much more brittle and harder to maintain, and that often have unfortunate non-functional consequences, like unexpected and unpredictable performance or security gotchas that require one to understand too much of the internals anyhow.
Part of the problem is the almost hero-worshipping attitude towards the underlying libs we rely on; and part of the problem is one of perception: even in today's reuse-fetishized culture, almost all code is likely of the non-reused kind; yet in any given program we see huge amounts of reused code imported from e.g. package managers, because those reused bits are often reused a lot.
We'd be much better at reuse if we were a little more skeptical of it, and didn't assume that design rules that hold for code that's been packaged for reuse also hold for the more pedestrian but nevertheless very common code that is not currently being reused. We want strict, leak-free abstractions, ideally covering both functional and non-functional aspects, for the reusable bits; but where we cannot do that or cannot yet afford to, we want the opposite: better clearly transparent code than a mess of leaky abstractions.
By the same token we don't copy and paste code enough. Sometimes a good abstraction is elusive, yet a pattern is still recognizable and useful. We have language features and a culture surrounding directly reusable code, but no such habits for derivative code, even though that would be quite useful. Essentially: I'm perfectly happy to deal with people using some stackoverflow answer to write code, but as soon as people do, it's like we regress into the dark ages: there's no structured citation, no support for detecting updates, no "package manager" that tracks updates (for stackoverflow clearly doesn't do that), and no diff or whatever to show how you tweaked the code segment. So instead people all too often just throw future maintainers under the bus with some random code-golfed answer, or some "reused" library that is hardly much easier to use than the reimplementation, with much harder to spot gotchas and perf/security issues, and often an API that isn't actually convenient for your use case.
So yeah: software needs reusable components, but 99% of the code you write should be re-editable, not reusable; and 100% of the time you should aim for re-editability, and only ever grudgingly accept the need for reusability after multiple use-cases are found (not just 2 or 3!), and there is a leak-free abstraction possible, and you've considered things like perf and debuggability and security.
"... you find only an external skin which is at the same time a structural component and the tank wall ... Engineering has two phases, structuring and integration: we ought not to forget either one..."
That could be interpreted in non-horrifying ways. Sure, the cost of a function call (especially with a modern compiler that can intelligently inline) is negligible in most cases compared with the engineering benefits it offers.
But what about cases where it's less clear? In databases, there are constantly cycles between:
Cycle phase 0: Abstract Storage: separate storage from computation to make the database system easier to test, replace, and reconfigure.
Cycle phase 1: Push computation down to storage: wow, look how much better performance we can get if we intermingle storage and computation!
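To make the two phases of this cycle concrete, here's a rough Python sketch (all class and function names are hypothetical, not from any real database system): phase 0 keeps storage behind a minimal scan interface with the computation outside it, while phase 1 pushes the predicate down into the scan so filtering happens where the data lives.

    class RowStore:
        """Phase 0: storage knows nothing about queries; it just yields rows."""
        def __init__(self, rows):
            self._rows = rows

        def scan(self):
            yield from self._rows

    def count_matching(store, predicate):
        """Phase 0: the computation lives entirely outside storage."""
        return sum(1 for row in store.scan() if predicate(row))

    class FilteringRowStore(RowStore):
        """Phase 1: same engine, but the predicate is pushed into the scan,
        so rows that don't match never leave the storage layer."""
        def scan(self, predicate=None):
            for row in self._rows:
                if predicate is None or predicate(row):
                    yield row

    rows = [{"id": i, "value": i % 7} for i in range(1000)]
    store = FilteringRowStore(rows)
    print(count_matching(store, lambda r: r["value"] == 0))       # phase 0 style
    print(sum(1 for _ in store.scan(lambda r: r["value"] == 0)))  # phase 1 style

Each turn of the cycle trades some of the clean separation for performance, and then some of that performance back for testability and replaceability.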
The interesting thing about this is that TeX continues to be reused 45 years after its inception; new libraries show up on CTAN regularly, and of course TikZ and LaTeX are far from being in the original design of TeX. Very few libraries have shown the survivability and versatility of TeX. So maybe the way we think about software reuse is wrong.
You seem to be claiming that 20,000 lines of code constitutes a large program. Is that really your intention? I mean libjpeg is 34,000 lines of code, and LAPACK 3.6.0 (one of the very few libraries that surpasses TeX in reusability and enduring value) is 685,000 lines of code, and each of them is just a small part of many programs. I would instead describe the monolithic parts of TeX and METAFONT as small programs of only 20,000 lines of code, omitting even dependencies on libraries.
Yeah as I said it was a tongue-in-cheek comment and I don't believe it, just wanted to provoke some discussion. :-) But in any case what I meant is that TeX/MF are among his biggest programs (that I know of), not that 20000 lines is a large program (he calls it "medium" IIRC).
(Ironically, in my previous job supposedly using "modern" programming practices, a single Python file had organically grown to over 25000 lines in length and people complained to GitHub about the file not being rendered in full in the browser.)
> The interesting thing about this is that TeX continues to be reused 45 years after its inception; (...) So maybe the way we think about software reuse is wrong.
Incidentally, you got it completely backwards. TeX is used because it's a convenient interface between higher level descriptions (i.e., book content) and the lower level output (pretty document formats). Thus, once again abstractions and interfaces show their value.
Additionally, TeX is used rarely by humans, while LaTeX is the tried and true workhorse. Again, an abstraction that targets an interface.
And how many TeX and LaTeX reimplementations are there? Again, the interface and abstractions show their value.
> TeX is used because it's a convenient interface between higher level descriptions…
I think you mean LaTeX, not TeX. (LaTeX is a macro layer on top of TeX that provides these convenient interfaces, while TeX is a low-level typesetter.)
> Additionally, TeX is used rarely by humans…
This sounds a bit contradictory with the previous statement, but maybe you mean that TeX is rarely used directly by many people (without LaTeX or some other macro layer). In any case, the LaTeX macros are implemented in TeX, so the TeX program is always the one being used (which I think was the point of the poster you're replying to).
> And how many TeX and LaTeX reimplementations are there?
I didn't understand the meaning or point of this, as the answer is either "very few" or "many" depending on what you're counting. Extensions of the TeX program include eTeX, pdfTeX, XeTeX and LuaTeX, not to mention a few others like pTeX and upTeX. (Confusingly, when we say “TeX” we often mean one of these programs as well, as a lot of their code comes from TeX—they are written/implemented as patches (changefiles).) Reimplementations of a small part of the TeX/LaTeX syntax for mathematical expressions (only) include MathJax and KaTeX. Are these a lot, or hardly any (would have expected a lot more)? Depends on your perspective I guess.
It's probably worth pointing out that nobody uses TeX bare without a LaTeX-like macro library. Knuth wrote his books with a macro library confusingly called "plain TeX", which ships with the TeX language interpreter. (That interpreter was the subject of this thread.) LaTeX relies on the plain TeX library, just as, for example, GLib relies on the C standard library. A third popular TeX macro library for document formatting, besides plain TeX and LaTeX, is a thing called ConTeXt. It seems to me that ConTeXt is less popular than LaTeX, but more popular than plain TeX.
But of course to invoke any of these you have to write code in TeX, just as to invoke Rails you have to write code in Ruby, or to invoke Numpy you have to write code in Python.
As far as reimplementations, I think the only reimplementations were done at Stanford in the late 1970s as Knuth and his students wrote a series of prototypes, culminating in the TeX language we know today in 1983.
It is absurd that my repeated, informed rebuttals of this rudely-phrased nonsensical misinformation are being flagged, so that visitors to the site will see only the aggressive misinformation and not the corrections. What kind of site are these people trying to turn this into?
This is not just a "poor me, I am being persecuted" issue. We can let Hacker News turn into Twitter, with insult contests being resolved by flagging campaigns that eventually hellban the accounts of the less-popular side of any issue, or into YouTube, dominated by conspiracy theories and hate; or we can stand up for reasoned discussion and informed comment.
Svat's reply is of course excellent and highly commendable, but they did not post it until after I posted my cri de cœur above. The thread consisted of [svat's reasonable comment], [my reasonable reply], [svat's reasonable reply], [troll post with reckless disregard for the truth], [flagged], [flagged], where the [flagged] posts were the ones where I was correcting the misinformation.
Also, note that the person who posted the misinformation never bothered to thank svat for their careful and courteous correction and their meticulous attempt to try to DWIM the troll comment into something that made some amount of sense. This makes more sense on the assumption that they were trolling than on the assumption that they were merely misinformed — someone who was interested in the truth would surely have offered thanks for such a helpful and polite correction. Instead, it seems clear that they were only posting here to make trouble and waste the time of reasonable, knowledgeable people like svat — particularly in light of https://news.ycombinator.com/item?id=22424205.
I had already come to that conclusion because the comment was clearly based on a set of misconceptions about what TeX and LaTeX are, and how they relate to each other, which could have been corrected by reading the introductory paragraphs of the Wikipedia article for either TeX or LaTeX. If someone isn't investing even that level of care in their comments, they don't care whether they're talking nonsense or not.
We can see that the same account continues to post aggressive comments rudely attacking other users, although since I'm not familiar with the areas they're talking about, I can't tell if they're employing the same reckless disregard for the truth in these cases: https://news.ycombinator.com/item?id=22501626 "I don't see the point of your post, and frankly sounds like nitpicking."
https://news.ycombinator.com/item?id=22499637 "Don't you understand where and why are there abstractions? ... having people [referring to the person he's replying to] naively complain"
Maybe you think this is the kind of dialogue we should be encouraging on here, but I don't. I think that in addition to setting an example of better behavior, as svat did, we should explicitly call out such misbehavior and explain why it isn't desirable, as I did.
Of course that's the case, but thankfully we have you, kragen, to thank for your work in accusing others of being ignorant and clueless while knowing nothing at all about anyone or anything, and adding nothing of value to the discussion in the process.
In cases of aggressive trolling like those statements, I think it's sufficient to point out that they're obviously wrong and assume that people will then check Wikipedia before relying on them. That doesn't work if my contradiction gets flagged, however.
Modularity incurs a complexity tax. If you are smart enough to keep it all in your head, like a Motie Engineer, you can simplify by omitting it. But if you're not...
That's the thing here. Knuth can easily juggle far more complexity in his head than the average programmer, and that's fine for software that only he has to maintain. But when something needs to be maintainable by average programmers, you need to write for them and avoid that complexity.
Who's writing for the average programmers? It's the average programmers. How can average programmers write code that is sufficiently abstract that it hides the complexity they can't handle? How can they change that code when it doesn't work? Does every company of 5 devs need to have a PhD?
> How can average programmers write code that is sufficiently abstract that it hides the complexity they can't handle?
Well said.
Hiding complexity is harder than handling it. Designing an effective way to expose it for use amounts to (part of) a "programming system" [Brooks].
So it's library writers who hide the complexity for average programmers, with a "programming product": stdlib, open source project or (rarer these days) commercial "engine".
When there is no such library, we get the current state of our art: many projects failing.
I think that's the point of literate programming. You no longer have to keep it all in your head. You have an article or a book. You go to the chapter that interests you, and modify it. Now those modifications get scattered throughout the monolith.
The point of literate programming is that you have a text which is structured in a way that humans are good at dealing with, which is neither some nifty mathematical abstraction like functional or OOP code nor coherent modules.
(Having worked with a badly structured monolith with a mixture of literal code and a million bad abstractions, I must admit the simplicity of working with the used-once literal code has its advantages. But so does the more abstract code. The only thing I know is that I want to control my entry point rather than go via a framework and suffer my concrete classes getting upcasted.)
But always keep in mind Brian Kernighan's observation: debugging is twice as hard as writing the code in the first place. So, if you write a program as cleverly as you can, you are by definition not smart enough to debug it.
"For example, his big programs TeX (1982) and METAFONT (1984) are each book-length and the source code of each is in a single large file amounting to about 20000+ lines of Pascal code."
He also used six-character identifiers. He had to. TeX and Metafont were intended to be compiled by lowest common denominator Pascal compilers. Modules, etc., were all vendor specific extensions at the time.
One might consider that to be good engineering.
"If you cut across the Saturn-B moon rocket, you find only an external skin which is at the same time a structural component and the tank wall."
Soda cans and plastic water bottles are the same; it's how they are cheap enough to be fit for purpose.
Agreed; I just didn't want to continue arguing with myself and post a counterpoint to the counterpoint; glad others are doing it :-) (Actually I posted a comment along the same lines in another thread: https://news.ycombinator.com/item?id=22409822). BTW, one of the features of WEB is/was that it allows arbitrary-length macro names, which works around the Pascal restriction that only the first 8 letters of identifiers are significant: https://texdoc.net/pkg/webman
"the idea of immediate compilation and "unit tests" appeals to me only rarely, when I’m feeling my way in a totally unknown environment and need feedback about what works"
I had to check myself, and indeed we have different ideas about what "speaking out against unit tests" means.
Did you know about all the tests he wrote and ran for TeX? There's a book about it as well.
For TeX he has written a single large test called the TRIP test: described in "A torture test for TeX" (http://texdoc.net/pkg/tripman). This is a conformance test (to weed out bad reimplementations), written after TeX was written and debugged. (See https://texfaq.org/FAQ-triptrap for details.) Apart from this report there isn't a separate book about testing TeX specifically, though there is a TeX manual (The TeXbook, Volume A of Computers and Typesetting), the TeX source code as a book (Volume B, roughly equivalent to this: http://texdoc.net/pkg/tex), and there's also a nice paper called The Errors of TeX (https://yurichev.com/mirrors/knuth1989.pdf), supplemented by a changelog (http://texdoc.net/pkg/errorlog).
By "unit tests" Knuth is speaking of tests that each test just one part of the program in isolation, mocking out the rest, and also more broadly about the practice of "test-driven development" which has this as its prerequisite. He means he finds it simpler to just write the whole program up front, and test it wholesale.
I learned Pascal, programming logic based on invariants, WP, ... and algorithms with Pierre-Arnoul de Marneffe between 1994 and 1998. That was a very good time and proved so interesting and useful. I knew he had something to do with Knuth and Oberon.
[1]: http://www.informit.com/articles/article.aspx?p=1193856