So much this. I've encountered many codebases (in science and in tech) where the...

toastal · on July 5, 2020

This comes up very often and is probably a big part of the distaste many people have for jQuery. You see so much copypasta $(selector) that queries the entire DOM over and over again instead of storing the initial query in a variable, querying children based on a ParentNode, etc. This duplication is wasteful at best, and can hurt performance at worst.

But as others noted, this is usually a sign that the creator is either green or puts little focus on furthering their programming skills because they normally do other things; it is not malice or carelessness.

jpxw · on July 5, 2020

I saw a post on here recently about the “proportionality of code” (I think this was the term used) - as in, how much one line of code translates to in terms of work for the machine. Python was used as an example, in contrast with Go (list comprehensions vs Go’s verbose syntax).

I think a similar line of thinking is applicable here. $ hides a lot of work behind short syntax. The syntax isn’t “proportional” to the work. Not only that, but the amount of work depends on the argument. Perhaps it’s better that we’re forced to put the effort in and type out “document.getElementById” - it makes us think about what we’re doing.

platz · on July 5, 2020

> I've rarely seen somebody criticized for copypasta or for overly stupid code.

Do you think that is in the realm of what the article is concerned with?

alephnil · on July 5, 2020

Code like you describe is often the result when a program is written by someone who does not have programming as their main profession. I have seen code like you describe in code written by scientists (in other disciplines than computer science).

They may have very deep knowledge in their field, and have written a program to solve some problem they have, but are unfortunately not very good programmers. This often results in quite naive code that still tries to solve an advanced problem.

In code written by professional programmers, I have seen the pattern described in the article far more often than the naive style you describe. After all, programmers are trained to avoid duplication and find abstractions, and will often add one abstraction too many rather than one too few.

jonahx · on July 5, 2020

> but I've rarely seen somebody criticized for copypasta or for overly stupid code. Probably because we're too accidentally afraid to imply somebody can't code.

It's because it's a far more benign problem than too much abstraction.

Sure it's easy to poke fun at that code and lol at how the programmer can't even use the most basic kind of abstraction, but that code is still clear and easy to read. More importantly, it is trivial to fix that kind of error.

I would take code like this any day over code written by an experienced programmer too keen on abstraction.

klyrs · on July 5, 2020

    plot('graph1')
    plot('graph2')
    ....
    plot('graph100')

I've done a lot of that myself. What you might not be seeing is the for loop in a scripting language that was used to generate that text. It probably took less effort than looking up and implementing it the "right" way. It might make your eyes bleed but if you need to change "plot" to another function, that's just a find-and-replace-all away. Most importantly, the code works fine and doesn't actually need abstraction.
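A minimal sketch of the kind of generator loop described here, assuming Python is the scripting language. `plot` and the `graphN` names come from the snippet above; `generate_plot_calls` is a hypothetical helper made up for illustration.

```python
def generate_plot_calls(first=1, last=100, func="plot"):
    """Return source lines such as plot('graph1') ... plot('graph100')."""
    # func and the 'graph' prefix mirror the snippet; the range is an assumption.
    return [f"{func}('graph{i}')" for i in range(first, last + 1)]


if __name__ == "__main__":
    # Print the generated text so it can be pasted into the target program.
    print("\n".join(generate_plot_calls()))
```

Swapping "plot" for another function is then a one-argument change (`func="other_name"`) instead of a find-and-replace over the generated text.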

Sharlin · on July 5, 2020

Yes, writing a for loop in another language to generate code instead of just writing the same loop in the language you're already using? Common technique, nothing wrong with it whatsoever.
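For comparison, the same loop written directly in the host language might look like the sketch below, assuming the host language is Python. The `plot` defined here is only a stand-in that records its argument, not a real plotting API.

```python
plotted = []


def plot(name):
    # Hypothetical stand-in for whatever plot() does in the original program.
    plotted.append(name)


# One loop replaces the hundred copy-pasted calls.
for i in range(1, 101):
    plot(f"graph{i}")
```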

klyrs · on July 5, 2020

Yes, a lot of scientists use their computers in ways that horrify software developers. For example, learning exactly enough of a compiled language to do some wicked fast integer / floating point arithmetic, and not bothering to waste time on the mundane crap you find obvious. And that might mean falling back to a familiar language that makes string formatting easy.

If it ain't broke, don't fix it.

zbentley · on July 5, 2020

> If it ain't broke, don't fix it.

But scientific programming is deeply broken. Code presented along with publications often doesn't work, or is an incomplete subpart/toy example that's supposed to be invoked within some larger framework. That sounds great until you realize that "some larger framework" doesn't refer to a standardized tool, but some deeply customized setup (a la the one you're responding to, that uses e.g. ad hoc code generators across two--or sometimes more--languages because the original authors didn't know how to format a string in one of them).

Even if you do get lucky enough to find a paper with all requisite code included, in many cases it was only ever invoked on extensively customized, hand-configured environments. And that configuration was done by non-tech folks with a "just get it to where I can run the damn simulation" attitude, so configs are neither documented nor automated. And when I say configs, I'm talking about vital stuff--e.g. env vars that control whether real arithmetic or floating point is used.
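One low-effort way to at least document such settings is to write them out next to the results. This is a minimal sketch, assuming Python; `snapshot_environment` and the variable name `SIM_PRECISION` are hypothetical and not taken from any real project.

```python
import json
import os
import platform
import sys


def snapshot_environment(var_names):
    """Collect interpreter, platform, and selected env vars into a dict."""
    return {
        "python": sys.version,
        "platform": platform.platform(),
        # Unset variables are recorded as None so their absence is visible.
        "env": {name: os.environ.get(name) for name in var_names},
    }


if __name__ == "__main__":
    # SIM_PRECISION is an invented example of a result-affecting variable.
    print(json.dumps(snapshot_environment(["SIM_PRECISION"]), indent=2))
```

Saving this JSON alongside each run's output would not automate the setup, but it records which settings produced which result.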

Often as not, you hack your way to try to get something--anything--running, and it either fails catastrophically or produces the wrong result. Now you have to figure out which of several situations you're in: is the research bad? Were the authors just so non-technical they accidentally omitted a vital piece of code? Was the omission deliberate and profit-motivated (e.g. the PI behind the paper plans on patenting some of the software at some point, so didn't want to publish a special sauce)? Was the omission deliberate and shame-motivated (i.e. researchers didn't want to publish their insane pile of hacks written to backfill an incomplete understanding of the tools being used)? Is it an environment-dependent thing?

And all of that is just as pertains to code in published work--usually the higher-quality stuff. Assuming ownership of in-house code from other scientific programmers is much, much worse.

This isn't abstract moaning about best practices. The failure of labs, companies, publications, and universities to combat this phenomenon has direct, significant, and negative effects on the quality of research and scientific advancement in many fields.

TL;DR: it is "broke". When programmers complain about reproducibility crises in soft-science fields, they're throwing rocks from glass houses.

klyrs · on July 5, 2020

You're bringing in a whole host of issues inapplicable to the snippet OC found questionable. Don't disagree with ya, but "lack of obvious abstraction" isn't one of these "extreme sensitivity to environment vars" cases.

In fact, vociferously complaining about such cases is a great way to turn scientists away from code review as a concept. Fold the code away in your head (or edit your local copy), and dig for subtle issues like numerical sensitivity, environment, etc. That's the way to bring actual value to the process.

For the code in question, "oh by the way this can be done simpler", with the simplified snippet, is an appropriate approach to the review. But in my experience it's best to save your breath for actual problems.

humbledrone · on July 5, 2020

> the code works fine and doesn't actually need abstraction

Well, maybe it works fine. We didn't see the other 97 lines to verify that they actually include all the integers from 3-99 without skipping or duplicating any. (NB with a loop this verification would be trivial.)
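A sketch of that verification for the copy-pasted version, assuming the calls are available as text and follow the `plot('graphN')` pattern from the snippet. `check_plot_calls` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches calls shaped like plot('graph42') and captures the number.
CALL_PATTERN = re.compile(r"plot\('graph(\d+)'\)")


def check_plot_calls(source, first=1, last=100):
    """Return (missing, duplicated) graph numbers found in the source text."""
    counts = Counter(int(n) for n in CALL_PATTERN.findall(source))
    missing = [i for i in range(first, last + 1) if counts[i] == 0]
    duplicated = sorted(i for i, c in counts.items() if c > 1)
    return missing, duplicated
```

Running it over the pasted block returns two empty lists only if every number in the range appears exactly once.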

klyrs · on July 5, 2020

Maybe they deleted 57 because it triggers an edge case. Put it back if you dare. ;)

(no, that's the bad kind of tech debt that's unfortunately common and I actually hate)

_y5hn · on July 5, 2020

This is fine for code that belongs in the trash, i.e. just testing stuff, prototypes, debugging, learning the language/framework, etc.