After 9 years of maintaining several dozen legacy codebases (which sometimes involved financial math) I am fully convinced that the only people who like terse code are people who write it and never read it after.
> Every abbreviation adds to the cognitive load the person has to maintain in their head.
I think your claim that abbreviations are always more cognitive load is wrong. For the sake of argument, let's take it at face value though. These are not even code abbreviations. They are literal translations of the math notation. The input "x" is not an abbreviation, it's a well-defined value in neural network terminology.
If they called it something more descriptive like "input_vector_to_first_layer_of_neural_net", this would be more cognitive load because someone reviewing this now needs to mentally map this to "x" in the algorithm anytime they are reviewing both.
Now, to your claim itself, I think it's unfair to say abbreviations are always more cognitive load. These variables are significantly more approachable because they follow convention. I see "x" and I know what that is. If every individual developer went ahead and rewrote every neural net with something they viewed as more interpretable in their personal context, it'd be a lot harder for everyone to understand what is going on. The variable name itself may be longer, but I might actually need a prose explanation of the variable's purpose because now I don't have the context of a well-established naming convention.
Are you suggesting that we should write all of our mathematical expressions using prose, and stop using symbols for operators?
That would be the original approach historically (before the past 500 years), e.g. for the quadratic formula, Brahmagupta (628 CE):
> To the absolute number multiplied by four times the square, add the square of the middle term; the square root of the same, less the middle term, being divided by twice the square is the value.
Go ahead and do what you like, but I doubt you’ll find many publishers who will accept your paper in the 21st century.
I know which one imposes more cognitive load for me. But disclaimer: I spent a lot of time from age 5–20 working with mathematical notation.
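For comparison, here is how I read Brahmagupta's sentence in modern symbols. This assumes the equation is written as ax² + bx = c, with "the square" being the coefficient a, "the middle term" being b, and "the absolute number" being c; that mapping is my reading, not something stated in the quote.

```latex
a x^{2} + b x = c
\quad\Longrightarrow\quad
x = \frac{\sqrt{4ac + b^{2}} - b}{2a}
```

Under that reading the whole sentence fits in one short line, and it matches the positive root of the usual quadratic formula once c is moved to the left-hand side.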
I am suggesting that just because software does "math" doesn't change how people read it. Bad coding practices like cryptic variable names and repetitions will affect it exactly in the same ways they affect software in all other domains.
> just because software does "math" doesn't change how people read it.
It absolutely does. Different problem domains (and different communities’ treatment of problems) involve differing types and amounts of formal structure, differing conventional notations, etc., and in practice the code looks substantially different (in organization, abstractions used, naming, ...) even if you try to standardize it to all look the same.
People who are reading “math” code can be expected to understand mathematical notation, e.g. to be capable of reading a journal paper where an algorithm or formula is described more completely including motivation, formal derivation, proofs of correctness, proofs of various formal properties, ...
Mathematical code is often quite abstract; unlike company-specific business logic, the same tool might be used in many different contexts with inputs of different meanings. There really isn’t that much insight gained by replacing i with “index”, x with “generic_input_variable”,
or to take it to an extreme, ax + b with add(multiply(input_variable, proportionality_constant), offset_constant)
or sin(x) with perpendicular_unit_vector_component_for_angle(input_angle_measure)
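To make the comparison concrete, here is a small Python sketch of the same affine expression written both ways. The function names `affine` and `affine_verbose` are made up for illustration; only the long parameter names come from the comment above.

```python
from operator import add, mul


def affine(a, x, b):
    # Conventional notation: reads like the formula a*x + b.
    return a * x + b


def affine_verbose(proportionality_constant, input_variable, offset_constant):
    # Same computation with the operators spelled out as named functions.
    return add(mul(input_variable, proportionality_constant), offset_constant)


# Both forms compute the same value; only the notation differs.
assert affine(2.0, 3.0, 1.0) == affine_verbose(2.0, 3.0, 1.0)
```

Both definitions do the same thing, so any difference in readability comes purely from the notation.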
The extra space overhead of the long variables and words instead of symbols is a killer for clarity.
If variable names are “cryptic” [as in, can’t be guessed at a glance by someone working in the field] then that is indeed a failure though. Short variable names should have limited scope (ideally fitting on one screenful of code) and obvious meaning in context, which might involve some explanatory comments or links to a paper.
>> People who are reading “math” code can be expected to understand mathematical notation, e.g. to be capable of reading a journal paper where an algorithm or formula is described more completely including motivation, formal derivation, proofs of correctness, proofs of various formal properties, ...
The majority of machine learning papers are very well stocked in terms of heavy mathematical-y notation, but are very, very low on formal derivation, proofs of correctness, proofs of anything like formal properties, or even motivation ("wait, where did this vector come from?"). Most have no theoretical results at all, only definitions.
So let's not overdo it. The OP is making a reasonable demand: write complex code in a way that makes it easily readable without being part of an elite brotherhood of adepts that know all the secret handshakes and shibboleths.
A great deal of complexity could be removed from machine learning papers by notating algorithms as algorithms rather than formulae. For example, you can say exactly the same thing with two "for i to j" loops as with two summations with top and bottom indices. Sometimes the mathematical notation can be more compact, but when your subscripts start having subscripted superscripts, it's time to stop and think about what you're trying to do.
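As a rough sketch of the loops-versus-summations point, here is a double summation over a small matrix written as two nested loops in Python. The function name `double_sum` and the sample matrix are invented for the example.

```python
def double_sum(a):
    """Compute sum over i of sum over j of a[i][j] with explicit loops."""
    total = 0
    for row in a:          # outer summation, over the rows (index i)
        for value in row:  # inner summation, over the columns (index j)
            total += value
    return total


a = [[1, 2, 3], [4, 5, 6]]  # made-up 2x3 matrix

# The loop form agrees with the compact "sum of sums" form.
assert double_sum(a) == sum(sum(row) for row in a)
```

The two loops carry the same information as the two sigma signs with their index ranges; which one is easier to scan probably depends on what the reader is used to.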
Besides, the OP did talk about code, not papers. Code has to be maintained by someone, usually someone else. Papers, not so much.
If you're working in a domain, is it really that much to ask to become familiar with it? Especially if the domain has a large theoretical component.
When we teach people software engineering we teach them concepts like "give your variables meaningful names". Now that we're in the sub-domain of implementing some mathematics in software, I'd argue that matching the variables and functions to their source (more or less) _is_ exactly "giving your variables meaningful names".
> A great deal of complexity could be removed from machine learning papers by notating algorithms as algorithms rather than formulae
And you would immediately lose the ability to quickly and easily recognise similar patterns and abstractions that mathematical notation so fluently allows.
>> If you're working in a domain, is it really that much to ask to become familiar with it? Especially if the domain has a large theoretical component.
If the domain has a large theoretical component. Here, we're talking about statistical machine learning and neural networks in particular, where this is, for the most part, not at all the case.
>> And you would immediately lose the ability to quickly and easily recognise similar patterns and abstractions that mathematical notation so fluently allows.
I disagree. An algorithm is mathematical notation, complete with immediately recognisable patterns and abstractions (for-loops, conditional blocks, etc).
And, btw, so is computer code: it is formal notation that, unlike mathematical formulae, which require some familiarity with the conventions of a field to read and understand, has an objective interpretation in the form of a compiler for the language used.
So machine learning papers could very well notate their algorithm in Python, even a high-level abstraction (without all the boilerplate) of the algorithm, and that would actually make them much more accessible to a larger number of people.
Mathematical notation, as in formulae, is not required; it's a tradition in the field, but that's all.
However, that's a bit of a digression from the subject of the naming of variables. Apologies. It's still relevant to the comprehensibility of formal notation.
In code that starts with something like "/* implements gambler et al. 2019 (doi:xxxxxxx) eqn 3 ... */", I really expect the code to go to great lengths to match the notation used in the paper. Anything else is adding to the cognitive load.
The exception is if the entire algorithm is discussed in the comments of the code without outside reference; then I want the code and comments to be extremely consistent.
Personally, I like the former as a shorthand for "don't mess with this without the paper in front of you, you'll probably screw it up".
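A minimal sketch of what I mean by matching the notation, using the textbook single-neuron formula y = σ(w·x + b) as a stand-in. No particular paper is assumed here; in real code the header comment would carry the citation and equation number, and the function names `sigma` and `neuron` are just illustrative.

```python
import math


def sigma(z):
    # Logistic sigmoid: sigma(z) = 1 / (1 + e^(-z)).
    return 1.0 / (1.0 + math.exp(-z))


def neuron(w, x, b):
    # y = sigma(w . x + b), with symbols kept identical to the formula
    # so the line can be checked against the source at a glance.
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return sigma(z)


# Made-up inputs, only to show the call shape.
y = neuron(w=[0.5, -0.5], x=[1.0, 1.0], b=0.0)
```

With w, x, b, and z named as in the formula, a reviewer can put the code next to the equation and compare them term by term.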
In the context of the equation, having long/non-matching variable names adds to the overhead. I find it orders of magnitude easier to read some Julia code examples that use matching symbols than I do the equivalent thing in Python with full variable names.
If I'm implementing some complex equation I try to match the symbols as much as possible and keep the representation compact because it means when I (or my teammates) come back to the code they can see the whole complex thing in one go and easily recognise the equation it comes from.
Personally, I stick to using a capital letter followed by an "s" for a list and the same capital letter alone for an element of a list, as above, though that is by no means a universal convention.
It's also common to use variables I,J,K,N,M for numbers and P,Q,R for predicate symbols (that can be passed as arguments to predicates, in Prolog). If the code sticks to the same convention throughout, it gets much easier to read than having to come up with special names for each variable ("Index", "Counter", "Next_value", "Length", etc).
And, if I remember correctly, the same kind of convention is common in Haskell, where I understand you can actually "dash" variables (as in x, x').
"set_to_random_numbers" is much less descriptive than np.random.normal: the former doesn't tell me what distribution the random numbers come from. Agreed that the weights should be stored in a list or the like in real life rather than duplicating code for each layer, but the examples in the blog post are clearly intentionally minimizing abstractions as much as possible so that an untrained reader can immediately tell exactly what each line is doing. It's obviously not intended to scale to larger models.
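For what it's worth, here is roughly what the "store the weights in a list" version might look like. This is a sketch, not the blog post's code: the layer sizes and the names `layer_sizes` and `weights` are hypothetical, and it assumes NumPy is installed.

```python
import numpy as np

# Hypothetical network shape: 3 inputs, 4 hidden units, 2 outputs.
layer_sizes = [3, 4, 2]

# One weight matrix per pair of adjacent layers. The call itself says
# which distribution the values come from: normal, mean 0, std dev 1.
weights = [
    np.random.normal(loc=0.0, scale=1.0, size=(n_in, n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
```

The list removes the per-layer duplication, at the cost of one more abstraction for a beginner to follow, which is presumably why the blog post avoided it.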
If you see that full-word variable names create too much clutter, it means your code structure is wrong. So you fix it.
You go from this:
To this: