
The comment:code ratio is higher than anything I write or that I’ve seen.

However, it does give me some comfort. When it’s not gamed, do other HNers also feel that a high comment:code ratio probably indicates quality?

There are reasons why this may be the case (more thought, more time, a large team, etc.).

I don’t advocate using this measure to reward anyone because it would be gamed immediately.



I think it's situationally useful. If I take the author of the OP code at their word, this is one of those situations.

Core, critical plumbing/logic at the kernel of business-critical, long-lived applications will be the source of my stress dreams long into the twilight years of my life, in the form of a lack of documentation and the presence of organic growth.

To criticize myself quite bluntly: If the core code I worked on at work looked like this, I'd feel a great deal more comfortable in some of the changes/digging that inevitably arises.

I would never use it as an absolute metric, but for the most sensitive bits of logic I'd use as a north star the level of comfort a new dev feels when looking at something that might otherwise be a spiderweb and saying "Oh, this makes sense" (as I do when looking at OP).


As one of the authors of the original code here, I can say this was the result of several days of intense work by a half dozen people working through every corner case we could dream up, and a bunch we thought of on the spot.

It is in no way a guarantee that we got them all, but after spending so much time reasoning through why those 'else' clauses were correctly empty, we thought it would be rude not to write it down.

In truth it was as much for future-me as anyone. My memory is known to be spotty. :)


Tim, does someone have a fuzzer running against this? Or even some static analysis ensuring that, say, the enums from various things are actually handled?


Probably not, in truth. It's a great idea and not just for this code, but lots of tricky stuff.
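On the static-analysis side, one common pattern is an exhaustiveness check. Here is a minimal Python sketch (the thread's code is Go; the `Phase` enum, its members and the `assert_never` helper are hypothetical stand-ins, not the Kubernetes types). A type checker such as mypy should report the `else` branch as reachable if a new member is added without a matching case, and a small runtime loop over every member backs that up.

```python
from enum import Enum
from typing import NoReturn


class Phase(Enum):
    # Hypothetical phases for illustration; not the real Kubernetes enum.
    PENDING = "Pending"
    BOUND = "Bound"
    RELEASED = "Released"


def assert_never(value: NoReturn) -> NoReturn:
    # A type checker flags any call where `value` could still be a real member;
    # at runtime this fails loudly instead of silently falling through.
    raise AssertionError(f"Unhandled value: {value!r}")


def describe(phase: Phase) -> str:
    if phase is Phase.PENDING:
        return "waiting for a claim"
    elif phase is Phase.BOUND:
        return "bound to a claim"
    elif phase is Phase.RELEASED:
        return "claim deleted, volume not yet reclaimed"
    else:
        # Only reachable if a member is added without a branch above.
        assert_never(phase)


def check_all_phases_handled() -> None:
    # Runtime complement to the static check: exercise every member once.
    for phase in Phase:
        describe(phase)
```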


As a user of this particular code, and someone who found several very hard-to-trace bugs in it in the early Kubernetes days, I applaud the hell out of this.


Why isn't the core code most of the code? Why isn't it all core code with a tiny dash of UX or data access sprinkled on it? With, maybe, one abstraction layer somewhere (but never two touching!).


Having written (and reviewed) safety-critical code, I can say it is very useful.

On the other hand, as soon as someone not safety minded gets their hands on it, trouble happens. Comments aren't updated (and there's no way to make sure they are checked, other than GREAT code reviews by the original authors, usually with at least two or three people doing critical reviews). Then the comments can become misleading and a liability as people will take them for truth, as they should.

If you have code that has a lot of subtle dependencies or edge cases, really great comments can help enormously.


I generally find that a high comment to code ratio, if the comments are of the form "Do this thing because of this reason", indicates quality. It indicates the programmer both knows what they wrote and why they wrote it, and it helps future maintainers figure out under what circumstances the code can be changed, refactored, or removed and under what circumstances the original behavior must be kept.

A high comment to code ratio, where the comments are of the form "Do this thing in this way," indicates a lack of quality - generally a sign that the programmer is not confident enough in the language that they're writing in, and is trying to solve language-level problems instead of business-level problems.

Uncommented code had better come with some reference for why the code exists in the form it does. Sometimes commit logs and the VCS "annotate"/"blame" feature work. Sometimes commit logs link to bug trackers or feature requests. Sometimes there's a README. If you don't have any of those, I tend to find that it's generally low-quality code.

Our purpose is to deliver business value. (Or non-business value, as the case may be; if you're writing a free video game for fun, you want people to successfully have fun.) Our purpose is not to generate lines of code. All code is, to some extent, legacy code; comments can help it be manageable legacy code, or make it even more unmanageable.


The way I think about comments is that you should always be able to articulate what the consequences of deleting any line of code are. If the code itself is insufficient to do that, it needs a comment.

There are three kinds of comments: why, what, and how. How comments are almost always a sign that the design is poor or the complexity is too clever. Why comments are necessary to understand the code and are almost always a good thing. What comments can be useful guideposts for skimming code, but they are also extremely prone to code rot. I suspect what comments generally end up being neutral in a net value proposition.

You want a high ratio of why comments to code, but I suspect most high comment-to-code ratios arise from what comments, which severely attenuates the utility of a pure comment-to-code ratio.
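To make the three kinds concrete, here is a small Python sketch with one comment of each kind on the same line of code. The function and the case-insensitive-username requirement are hypothetical.

```python
def normalize_username(raw: str) -> str:
    # HOW (restates the code, adds nothing): strip whitespace, then casefold.
    # WHAT (useful for skimming, but rots if the behavior changes):
    #   normalize a username so it can be used as a lookup key.
    # WHY (the kind worth keeping): usernames are compared case-insensitively
    #   (a hypothetical requirement), and casefold() also folds characters such
    #   as German "ß" to "ss", which lower() leaves unchanged.
    return raw.strip().casefold()
```

Deleting the WHY comment is what would make a future maintainer "simplify" `casefold()` back to `lower()` and reintroduce the bug.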


The comment:code ratio is similar to some legacy enterprise C/C++ systems I've worked on.

I've been on Rails/React teams where comments were seen as a possible smell. I'm not talking about useless literal comments; rather, the need for a comment was seen as pointing to possible bad design, and a well-factored codebase was expected to be self-documenting, i.e., if you had to comment something, perhaps methods/vars were poorly named, SOLID principles were not adhered to, methods needed to be broken out, or it was just a sloppy approach. Even explaining design decisions was considered more the domain of git messages and nicely packaged atomic commits.

While I see that aspect of it, there's no getting around the constraints of the real world and that some problems are just difficult and much easier to grok with a user guide in plain English, so to speak. And mission-critical stuff needs as many safeguards as possible.

That said, inaccurate comments can be dangerous and when your code is highly commented there is real danger things can get out of sync. If you're working on a 5000 line file that 100 developers have touched over a 20 year period... and no one has taken it upon themselves to do a recent comment audit, there be dragons.


I have worked on teams with this same attitude, and in my case it was just a systemic way for the group to rule-away having to write comments. The codebase suffered for it.


Probably depends a lot on the programming language.


This. The code base should be the authoritative source of the behavior of the system. The comments should be the authoritative source of what is expected of the system.


I comment to myself before I write code. It’s in English. Then I write code. So every line of code is commented by default.

I think this provides me higher quality, less bug ridden results. So if others use comments in this style I would tend to believe it does increase code quality.

If a line of code doesn’t match the comment, something is clearly wrong. ;)
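A small sketch of that comment-first workflow in Python (`median` is just an illustrative example): the English outline is written first, one comment per step, and each step is then filled in underneath its comment.

```python
from __future__ import annotations


def median(values: list[float]) -> float:
    # Reject empty input: the median of nothing is undefined.
    if not values:
        raise ValueError("median() of empty list")
    # Sort a copy so the caller's list is not reordered.
    ordered = sorted(values)
    # Find the middle index.
    mid = len(ordered) // 2
    # Odd length: the middle element is the median.
    if len(ordered) % 2 == 1:
        return ordered[mid]
    # Even length: average the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2
```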


Maybe a loose correlation? Highly commented code was probably not written under tremendous time pressure; uncommented code can go either way. Wrongly commented code is painful, though.

And then there's something I recall running into, a decade ago:

    using namespace std; // using namespace standard


I always wondered what std stood for /s


Inorite?

It wasn't even "using standard namespace"...


I actually would say it’s almost the opposite, if you’re writing clean, expressive code it shouldn’t need explaining.

And if your code is clean, you shouldn’t have a bunch of redundant comments explaining the obvious.


I'm not sure there are many cases where there should be large amounts of expressive code.

If you're doing something obvious, you should generally be able to program it concisely, in which case you have a high comment-to-code ratio because the amount of code is low. Sometimes this will be because you're importing an external library to do something, or because you're calling out to an internal library. Sometimes this will be because you found a straightforward implementation. If you're finding yourself writing hundreds of lines of code to do a single obvious task then chances are high you're implementing it poorly (and, specifically, in a way where your defect rate is likely proportional to the number of lines of code).

And if you're doing several obvious things, then the point of the comments is not to explain what the code is doing, but why it's doing that. What is the business purpose of the code? Which customer cares about this edge case that you're handling, and under what circumstances can you stop handling it? Why did you decide that the common library wouldn't actually work here? If you're converting data from an awful legacy format, why are your ingesters / parsers for the legacy format designed in this way? If you're micro-optimizing for performance, why are the optimizations sound (i.e., why do they accomplish the same thing as the unoptimized version), how do they work, and why did you decide these spots need to be optimized? Each individual thing you do might be obvious on its own, but the arrangement of the whole thing needs comments for each step, which again gives you a high comment-to-code ratio.


I might have misused the word expressive, I don't mean bloated code with more than necessary logic.

I just meant simple to understand variable, function, and class names. That combined with small classes and functions, makes following the logic of your program extremely easy.

Following concepts like DRY (don't repeat yourself) and the single responsibility principle ensure that you're making more easily testable code, and I'm sure less overall LOC.


While your fundamental point is very valid, there are plenty of times where a comment to flag up a fine point of your clean and precise code will save future-you hours of head-scratching.

I absolutely do not comment enough, but knowing this, I try to stick to the principle that if I have had to stop and think through an expression before I write it, then I am likely to eventually thank myself for leaving a short explanatory note.

And moreover, it may not be me scratching my head over that nest of ternaries in a year's time - it may be some other poor soul. And while that poor soul won't thank me for leaving a comment, he or she will certainly curse my name - quite possibly vocally and publicly - for not leaving one.


Of course, there are certain times like you said when you should 100% add a comment.

There’s nothing worse than going back to a codebase from a year ago and seeing a couple magic numbers and having no clue how they came to be, haha.
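A minimal Python sketch of the usual fix for a magic number: give it a name, and put the why next to the name. The 30-minute timeout and its justification are hypothetical.

```python
# Why 30 minutes: hypothetical security-review requirement that idle sessions
# end within half an hour. Named so the number's origin is never a mystery.
SESSION_IDLE_TIMEOUT_SECONDS = 30 * 60


def session_expired(last_seen: float, now: float) -> bool:
    """Return True if more than the idle timeout has passed since last_seen."""
    return now - last_seen > SESSION_IDLE_TIMEOUT_SECONDS
```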


Well, let's consider an example from this very code:

  // The binding is two-step process. PV.Spec.ClaimRef is modified first and
  // PVC.Spec.VolumeName second. At any point of this transaction, the PV or PVC
  // can be modified by user or other controller or completely deleted. Also,
  // two (or more) controllers may try to bind different volumes to different
  // claims at the same time. The controller must recover from any conflicts
  // that may arise from these conditions.
How would you rewrite the code so that this information was explicit in the code alone, and as obvious as when it is stated in these comments? Note that simply being able to handle any conflicts that may arise from these conditions is not necessarily the same as saying that they can occur and must be handled, as any particular implementation is invariably an over-specification.

As for redundant comments explaining the obvious, that seems to be something of a straw man, at least in my experience - personally, I have very rarely seen such code. The person who is not motivated to write useful comments apparently prefers to write no comments rather than useless ones.
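To make the point concrete, here is a deliberately toy Python model of the two-step bind described in the quoted comment. `Store`, `Conflict` and `bind` are hypothetical and are not Kubernetes APIs; the real controller is Go and far more involved. Note that the retry loop handles the conflicts, but only the docstring says why they can happen at all.

```python
from dataclasses import dataclass, field


class Conflict(Exception):
    """The object changed between our read and our write (stale version)."""


@dataclass
class Store:
    # Toy versioned store: key -> (value, version). Each write bumps the version,
    # and a write based on a stale version is rejected (optimistic concurrency).
    data: dict = field(default_factory=dict)

    def get(self, key):
        return self.data.get(key)

    def update(self, key, value, expected_version):
        current = self.data.get(key)
        if current is None or current[1] != expected_version:
            raise Conflict(key)
        self.data[key] = (value, expected_version + 1)


def bind(store, pv, pvc, attempts=3):
    """Two-step bind: set the PV's claimRef first, then the PVC's volumeName.

    Either object may be edited or deleted between the two steps, so every
    write is version-checked and the whole sequence is retried on conflict.
    Repeating step 1 is harmless because it writes the same claimRef again.
    """
    for _ in range(attempts):
        pv_obj, pvc_obj = store.get(pv), store.get(pvc)
        if pv_obj is None or pvc_obj is None:
            return False  # deleted mid-transaction: nothing left to bind
        (pv_val, pv_ver), (pvc_val, pvc_ver) = pv_obj, pvc_obj
        if pv_val.get("claimRef") not in (None, pvc):
            return False  # volume already bound to a different claim
        if pvc_val.get("volumeName") not in (None, pv):
            return False  # claim already bound to a different volume
        try:
            store.update(pv, {**pv_val, "claimRef": pvc}, pv_ver)  # step 1
            store.update(pvc, {**pvc_val, "volumeName": pv}, pvc_ver)  # step 2
            return True
        except Conflict:
            continue  # someone wrote in between; re-read and try again
    return False
```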


My comment was directed to the OP's question of comment:code ratio in general, not in this exact circumstance.

Additionally, in no way am I advocating for no comments; that's obviously not possible (as in your example). Comments are useful, even necessary, for code whose logic might otherwise be confusing.

I've seen plenty of code with documentation for a method with nothing more than:

  /**
   * Bills the user
   *
   * @param user The user to bill
  */
  public void billUser(User user) {
    //
  }
In my opinion, that comment is completely redundant, and I think it's driven by the idea that we should comment EVERYTHING.


Though there should definitely be documentation here. What happens if `user` is null? What if the user doesn't have enough balance for the transaction to complete? What if the transaction fails?

I see this as someone trying to fool a linter that demands they have documentation. I think it's better to say "comment everything" because it puts documentation as a first-class consideration rather than an afterthought.
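As a sketch of what answering those questions could look like, here is a Python version in which the docstring states the behavior for each case. The `User` type, the `InsufficientFundsError` exception, the use of integer cents and the chosen behaviors are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


class InsufficientFundsError(Exception):
    """Raised when a user's balance cannot cover a charge."""


@dataclass
class User:
    name: str
    balance_cents: int


def bill_user(user: Optional[User], amount_cents: int) -> None:
    """Charge ``amount_cents`` to ``user``.

    Raises:
        ValueError: if ``user`` is None or ``amount_cents`` is negative.
        InsufficientFundsError: if the balance is too low. The balance is
            left unchanged, so a failed charge never partially applies.
    """
    if user is None:
        raise ValueError("user is required")
    if amount_cents < 0:
        raise ValueError("amount_cents must be non-negative")
    if user.balance_cents < amount_cents:
        raise InsufficientFundsError(f"{user.name} cannot cover {amount_cents} cents")
    user.balance_cents -= amount_cents
```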


If it is a general principle, then would one not expect it to apply in this case? More importantly, this is not a corner case; situations where there are specific conventions and protocols that have to be handled consistently in various cases are extremely common in software. It is also not uncommon to see optimized code that is much easier to understand when it is explained as a modification of a simpler implementation.

With regard to the sample, then if that is your experience, I cannot deny it, but, irritating as it may be, it seems fairly harmless. It appears to date from a time when it was thought that extremely prescriptive coding style standards were the way to fix programming - an even less realistic belief than the idea that code can be entirely self-documenting.


No, in cases like this, it's mostly for documentation generation, though you may find it useless.


Worse than useless. It harms your ability to take in multiple things at once on your screen.


Code shows how something happens (i.e., a string comes into a function and is parsed and only the date from it is returned), but it's so bad at showing WHY something needs to happen. My comments are almost always about why I'm doing it in the way I am, complete with examples of test cases where the users broke things in ways I wasn't originally expecting. Ten years from now, the code part will be rewritten using whatever crazy new stuff the language supports, but the underlying need for doing it at all will probably still be around.


As a counterexample, here is a C file of 20,000 lines and no comments. I pushed it to GitHub a long time ago, as it was the most gigantic "real" C file I had encountered.

https://github.com/miohtama/aliens-vs-predator/blob/master/s...

Comments are very bare-bones. There is structure, but needing to mess with this kind of code would be scary. Granted, most games are written once and never revisited.


You made a good point because this code is neither clean nor expressive. The parent comment talked about clean and expressive code. Can you show some clean and expressive code and try to make the same argument again? (I am not holding my breath that you will.)


Precisely. I tend to only write comments that explain reasons for making non-idiomatic decisions like:

// It might look like you should do X here but esoteric reason Y dictates that you should do this instead.
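An illustrative Python instance of that kind of comment. The ordering requirement is a hypothetical reason; the fact that `set()` does not preserve insertion order while `dict` keys do (guaranteed since Python 3.7) is standard Python behavior.

```python
def unique_in_order(items):
    # It might look like you should write list(set(items)) here, but callers
    # (hypothetically) display these in the order they were entered, and set()
    # does not preserve order. dict keys do, so dict.fromkeys() dedupes safely.
    return list(dict.fromkeys(items))
```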


I use comments to document assumptions that are likely to be wrong, either now as I write it, or later when someone (probably myself) changes it.

It is absolutely useful to do, and really not too difficult.

Languages that allow for more formal assumption-checking (especially before runtime) are even better, but comments have an additional benefit of being understood by a human directly.

I wish languages with static analysis could somehow allow authors to encode human-friendly/semantic errors that you often see as runtime exceptions into the static analysis itself...
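A minimal Python sketch of pairing a stated assumption with a check of it. The function and the "collector never emits an empty window" assumption are hypothetical. It uses an explicit `raise` rather than `assert`, because Python drops `assert` statements when run with `python -O`.

```python
from __future__ import annotations


def average_latency_ms(samples_ms: list[float]) -> float:
    # ASSUMPTION (hypothetical): the collector never emits an empty window.
    # The comment tells humans; the check below makes a wrong assumption fail
    # loudly instead of letting the comment silently go stale.
    if not samples_ms:
        raise ValueError("assumed non-empty sample window")
    return sum(samples_ms) / len(samples_ms)
```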


> do other HNers also feel that a high comment:code ratio probably indicates quality?

I consider it a big risk of errors.

When some code is changed, will all related comments be rewritten too? I doubt it.

And then you end up with a codebase which indicates A but comments which clearly spell out B, and you as a maintainer have no idea what to believe.

DRY. Don’t repeat yourself. The comments should not double up for the code. That’s just future maintenance nightmare.


I've seen horrible inheritance/convoluted refactors done in pursuit of DRY.

I'm a bigger fan of WET (Write Everything Twice). Usually, on the first iteration of a component, you don't understand the domain space well enough to get the abstractions right. So use that first attempt to explore the issues/problems/corner cases. Once they are well understood, rewrite it into something concise and well abstracted.

I've also found that if you try to rewrite it a third time, you'll end up being too clever in trying to predict where the system will evolve, and you'll get right back into the same situation as the first iteration.


Commonly referred to as rule of 3. Write it out in full twice, on the 3rd time it must be a real abstraction, therefore refactor.


> I've seen horrible inheritance/convoluted refactors done in pursuit of DRY.

Everything in moderation, including moderation itself.


Which is the first iteration, the code or the comment?


> And then you end up with a codebase which indicates A but comments which clearly spell out B, and you as a maintainer have no idea what to believe.

Can you name a few examples where you encountered this? In my career (30 years programming) I've never seen it. I believe it's a common, poor excuse for not writing enough comments.

The benefits of comments are well understood. For me personally they often helped compensate for sloppy code and (non-obvious) assumptions, and prevented bad solutions because I reconsidered while writing (embarrassing) comments.

When you have to maintain a large codebase modified thousands of times in 15+ years, every single comment is invaluable.


Then you must have been very lucky. I have seen it happen probably hundreds of times in a mere 15 years on the job. The inconsistencies I experienced ranged from docstrings saying to pass a parameter that didn’t exist any more, to parameters with different names, to parameters with the correct names but in a different order. As for actual comments, I have many times seen comments like

    //here we go baby!
    //do not touch!
    //1 ... //2 ... //3 .. //8
The last one apparently was to indicate the order of some overengineered stuff that could have been written properly.

//This happens only for CompletedOrders (while in reality it was the opposite)

//This calculates the notional in the same currency (while it was converting it into GBP)

//This is extremely important, never delete (and after that there was a bunch of commented code)

Honestly, I don’t remember all of them (I tend to use my memory for more important things), but I bet that if you had seen only 1/10 of the bad comments that I have encountered, you would have a different opinion. Btw, in your post you explain exactly why I hate comments:

“For me personally they often helped compensate for sloppy code” - the solution is obviously to fix the sloppy code, not to write a comment because you are too lazy to fix it

“(non-obvious) assumptions” - this is probably the only legitimate reason to write a comment.

“and prevented bad solutions because I reconsidered while writing (embarrassing) comments.” - this has nothing to do with the comments given that at the end you didn’t write it. Thinking more about the code instead of trying to comment it gives you the same result, if not even better.


I was asking specifically about comments that directly contradict the code in a way that confuses the reader. If obsolete comments like "never delete" stand before commented code, what makes you think the commented code is actually used or useful?

> “(non-obvious) assumptions” - this is probably the only legitimate reason to write a comment.

If that's what you think, I rest my case...


Are you serious? Almost every codebase that has comments has code that doesn’t exactly match the comments. Main obvious reason being that the comments aren’t executed whereas the code is. Sure meticulous attention to detail could keep the two in the sync. But you are suggesting that they have always been in sync for any code you’ve looked at?

As someone who is clearly a fan of comments, are you saying you’ve never modified a comment to make it more accurate to what the code is doing?


Short functions help here.

If every function is just a few lines long, the comments are easier to keep synchronized, and if a function drops out of service, it should eventually be garbage collected with its now-irrelevant comments.


But what if comments pertain to unexpected states the system as a whole can be in?

Kind of the whole problem is when there are weird corner cases going on that straddle function boundaries.

I'm not saying that's a good thing, mind you - but nor is it always trivially avoidable, especially if code needs to be concurrency- and/or exception-safe - or, in general, whenever the statements your function consists of have surprising and opaque behavior based on system state, particularly if said state is hard to grasp due to being implicit or dynamic, or simply large and complex.


> Kind of the whole problem is when there are weird corner cases going on that straddle function boundaries.

If the problem has "hub and spokes" topology, i.e. it's relevant to multiple places in code that all reference a single location, put a comment describing the issue in that single location, and everywhere else put a comment with a reference. //Warning. See comment in [that location].

If there's no single best place for the detailed comment, put it in some design notes file, and put a comment with a reference to that file in all the affected places.

DRY can, and should, be applied to comments as well.
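A small Python sketch of the hub-and-spokes pattern: one detailed note in a single location, and short references to it at each affected site. The clock-skew issue, the 5-second tolerance and the function names are hypothetical; the `NOTE [name]` tag style is similar in spirit to the `Note [...]` convention used in GHC's source code.

```python
# NOTE [clock-skew]: timestamps from worker nodes may run up to a few seconds
# ahead of or behind the coordinator's clock (hypothetical issue). Every
# comparison between a worker timestamp and "now" must tolerate that skew
# instead of treating the worker's clock as authoritative. This is the one
# place the issue is explained; other sites only refer back here.
CLOCK_SKEW_TOLERANCE_S = 5.0


def is_from_future(worker_ts: float, now: float) -> bool:
    # Warning: see NOTE [clock-skew].
    return worker_ts > now + CLOCK_SKEW_TOLERANCE_S


def is_stale(worker_ts: float, now: float, max_age_s: float) -> bool:
    # Warning: see NOTE [clock-skew].
    return now - worker_ts > max_age_s + CLOCK_SKEW_TOLERANCE_S
```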


Centralized comment references sounds like a good+simple idea - I'll try to remember it, and hope I never have to of course ;-).


Yeah, references to a centralized document are such an obvious thing... once you read about it. It's another thing I recently picked up from Ousterhout's book, and looking back, I can now see the places in past codebases where I wish I had thought of that myself.


Agreed, but that’s why I prefer writing code that has no global state. In Erlang, for example, it’s unusual (and the language lends itself via pattern matching to having short function clauses).

It gets a little tiresome threading the relevant state to every function that needs it, but it’s worthwhile in the end.
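A minimal Python sketch of threading state explicitly instead of keeping it global: each function takes the current (immutable) state and returns the new one, loosely analogous to an Erlang process passing its state to the next loop iteration. The `Cart` type and its fields are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Cart:
    # Immutable state value: "changing" it means building a new Cart.
    items: tuple[str, ...] = ()
    discount_code: Optional[str] = None


def add_item(cart: Cart, item: str) -> Cart:
    # The state arrives as an argument and the new state leaves as the return
    # value, so nothing outside the call can change what this function sees.
    return replace(cart, items=cart.items + (item,))


def apply_discount(cart: Cart, code: str) -> Cart:
    return replace(cart, discount_code=code)
```

The cost the poster mentions is visible here: every caller has to pass the cart along, but any function's behavior can be understood from its arguments alone.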


Even a pure function has "state" - namely its (arbitrarily complex) inputs. But sure, it's a little less of a landmine.

The fundamental issue remains that sometimes your knowledge about that state (whether the classical kind or a proper parameter) can be complex and dependent on what happened elsewhere, especially if the codebase you're in grew into that situation rather than being designed that way. A comprehensible set of preconditions and postconditions isn't always a luxury you have, certainly not at first.


If a function is only a few lines long its action should be entirely documented by its name.


The action, sure, but there’s more context than that in many cases. See my other comment.

https://news.ycombinator.com/item?id=18773167


I think it indicates pride more than anything. If I write something that is just business as usual or commonplace my comments are pretty lacking. If it is something very interesting or that I am proud to have done, I usually write some very detailed comments. This probably correlates to better quality just because it was something I was interested in doing rather than shoveling code.


Knuth allegedly attributes the stability of TeX to his literate programming style.


I was reminded of Knuth, too. The code/comment blend encourages reading it like a white paper.


It's really only useful in areas of codebases that either a) are very complex, b) touched by many people or c) both. When that happens, everyone prefers that there is a lot of documentation, especially about the why. With older codebases the question is always whether this is an actual bug from the developer or is there a reason why it's doing this super-weird thing and if so is it still applicable. What's happened over the last 5 years is that automated testing has become so mainstream that places without tests are the exception AND the tests have replaced the need for comments.


You've missed d): the codebase lives longer than a few months and someone other than the original author has to make changes. Comments describing the intent and caveats are extremely useful in ensuring the future developer gets adequate understanding quickly, and they reduce the chance that developer will introduce bugs.

Tests can help understand the interface, but they don't help to understand the rationale behind it, the underlying abstraction, or implementation caveats.


Agree 100%, along with accompanying documentation that lays out architecture, rationale, challenges, etc. All of those are invaluable for any code that will outlive the tenure of the developers who built it. And given how often people in tech change jobs that's virtually all code.


Oh yes. And even disregarding tenure, you have cases like illness (see e.g. the Word 1.0 postmortem[0], page 14, talking about losing a key developer), or people changing project.

One time, I inherited a big steaming pile of spaghetti my co-worker wrote to meet a tight deadline before being shifted to another project. That code implemented one of the key functions of the application, and half a year later, the customer demanded extensive changes. Believe me, I would have paid half my monthly salary just to have a third of the comments that we see in this Kubernetes file.

--

[0] - http://antitrust.slated.org/www.iowaconsumercase.org/011607/...


'A comment is a failure to express yourself in code. If you fail, then write a comment; but try not to fail.' - https://twitter.com/unclebobmartin/status/870311898545258497...

And a bit more on the same from Clean Code: http://www.kyleblaney.com/software-blog/2012/6/29/comments-a...


How do you successfully express "We need to treat all transactions on February 29 as happening on February 28, see customer ticket #4321 for rationale" in code?


I think it's fair to express rationale in comments, but the following works for me without comment:

  transactionsLeapDayAdjusted = transactions.map(t => t.date === '29-Feb' ? {...t, date: '28-Feb'} : t)
One thing I always push back on is references to ticket numbers or other external systems (except perhaps e.g. ISO standards). Repos should be self contained and perpetual. One might not have access to the ticket system now, or ten years from now.


Since we're referring to Bob Martin, I suspect the answer is through a functional test that captures that requirement. Thus, if a naive editor makes a change that breaks the requirement, it will not pass the test and cannot be committed to mainline.
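A minimal Python sketch of that approach using the standard `unittest` module. `reporting_date` and the test names are hypothetical; the Feb 29 rule and ticket number come from the example above. Putting the rationale in the test's name and docstring also gives a reader who finds the test the why, not just the what.

```python
import datetime as dt
import unittest


def reporting_date(d: dt.date) -> dt.date:
    # Hypothetical rule from the thread's example: report Feb 29 as Feb 28.
    if d.month == 2 and d.day == 29:
        return d.replace(day=28)
    return d


class LeapDayReportingTest(unittest.TestCase):
    def test_feb_29_is_reported_as_feb_28(self):
        """Customer ticket #4321: Feb 29 transactions are reported as Feb 28."""
        self.assertEqual(reporting_date(dt.date(2024, 2, 29)), dt.date(2024, 2, 28))

    def test_other_days_are_unchanged(self):
        self.assertEqual(reporting_date(dt.date(2024, 3, 1)), dt.date(2024, 3, 1))
```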


That catches a change, but doesn't give the reader the answer for why the code is like it is until they find the right test.


True enough, but in my limited experience it's more likely that such little requirements are hidden in code (and possibly commented) than that they have been properly captured by specifications and tests.


By naming as much as possible. I'd need to know the rationale in the ticket to be able to try and codify it, but here's how I'd try and do the rest: https://codepen.io/anon/pen/Jwyzdv
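The linked CodePen isn't reproduced here, but the general shape of that approach, using the two function names mentioned in the reply (translated to Python's snake_case), might look like this sketch. The `Transaction` record and `adjust_for_leap_day_reporting` are hypothetical.

```python
from __future__ import annotations

import datetime as dt
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Transaction:
    # Hypothetical transaction record for illustration.
    id: int
    date: dt.date


def date_is_on_leap_day(d: dt.date) -> bool:
    return d.month == 2 and d.day == 29


def convert_leap_day_to_previous_day(d: dt.date) -> dt.date:
    return d - dt.timedelta(days=1)


def adjust_for_leap_day_reporting(transactions: list[Transaction]) -> list[Transaction]:
    # Each rule lives behind a name; this function only composes them.
    return [
        replace(t, date=convert_leap_day_to_previous_day(t.date))
        if date_is_on_leap_day(t.date)
        else t
        for t in transactions
    ]
```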


I'm not sure that reducing the comments-to-code ratio by increasing the complexity of the code really helps anything. You've made the code more generic for what you currently think future changes are going to look like, which may or may not be accurate. And in the process you've split dateIsOnLeapDay and convertLeapDayToPreviousDay into separate functions, so if someone is tracking down a bug in line 10, they need to jump to lines 20 and 21 to figure out that the associated code is in line 15 (think "wait, did you say leap day? I meant leap second"). In a large program these would get even further separated over time - someone is going to decide that dateIsOnLeapDay should be in a common utils class because they want to use it somewhere else - and I think there's a lot of merit in keeping lines 4 and 5 next to each other.


> I'm not sure that reducing the comments-to-code ratio by increasing the complexity of the code really helps anything.

More lines doesn't mean more complex, it's the same logic just the logic is named now and more reusable. It's possibly not the best example, as the logic is minimal, but when the logic becomes more complex, wrapping it and naming it becomes very powerful. We're creatures of abstraction.

> You've made the code more generic for what you currently think future changes are going to look like, which may or may not be accurate.

I'd argue that I've reduced the number of reasons the code has to change, which should be a goal while programming. If we change how we calculate a leap day, we don't touch how we modify a leap day, which means we're less likely to cause adverse side effects.

> And in the process you've split dateIsOnLeapDay and convertLeapDayToPreviousDay into separate functions, so if someone is tracking down a bug in line 10, they need to jump to lines 20 and 21 to figure out that the associated code is in line 15

They should be separate functions, they are separate things.

> In a large program these would get even further separated over time

Is it really a problem if they are separated? What links them? There could be plenty of reasons for wanting to call one without the other.


> I'd argue that I've reduced the number of reasons the code has to change, which should be a goal while programming. If we change how we calculate a leap day, we don't touch how we modify a leap day, which means we're less likely to cause adverse side effects.

Maybe this is just different instincts/experience and I'm not saying you're wrong, but my feeling here is that you do actually want to change them at the same time. Suppose we decide instead of adding a day to February 29, we keep the months the same and add a festival day at the end of every fourth year, numbered 13/1. Then modifying the festival day to 13/0 is wrong - the day before 13/1 is now 12/31.

If you have one function for "fix leap days for reporting purposes" then you're fine, and you've set the abstraction in a good place (or at least good for my example case, I will totally concede there are other examples!). When you edit the is-it-the-leap-day line of code, you'll see the subtract-one line of code directly below, and if you forget, your reviewers are likely to notice. And you haven't really made things noticeably worse for the case where the customer says "Actually we need February 29 rounded to March 1, instead", it's not distracting to have that line of code above where you are (and if anything it's useful to have that comment, so that if this is a different customer asking you realize that you need to not break expectations for your first customer).

I am something of a skeptic of reusing very short pieces of code - for instance, my team's own codebase has a poorly-designed function for calling a subprocess and swallowing certain types of errors from a very specific command, and in a code review I had to tell someone to just use subprocess.check_output(), which does the same thing but without the modified behavior which they probably didn't want. Abstraction makes sense when there is a meaningful concept to abstract. (Similarly, I am also very much not a fan of getters and setters; I think most people are better off with a structure with public fields, because I have very rarely seen it be useful to convert a trivial getter/setter to a non-trivial one without bothering to look at how callers use it, and it is useful to rename the field and see which code fails to compile / no longer passes tests now.)
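For readers unfamiliar with it, this is the standard-library behavior being recommended: `subprocess.check_output` returns the command's output and raises `CalledProcessError` on a non-zero exit status instead of swallowing it. The poster's team wrapper isn't shown in the thread, so nothing here models it; the commands below just run the current Python interpreter.

```python
import subprocess
import sys

# Succeeds: returns the command's stdout (decoded to str because text=True).
out = subprocess.check_output([sys.executable, "-c", "print('hello')"], text=True)

# Fails: a non-zero exit status raises CalledProcessError rather than being
# silently swallowed, so the caller decides what the error means.
failed_code = None
try:
    subprocess.check_output([sys.executable, "-c", "import sys; sys.exit(3)"])
except subprocess.CalledProcessError as exc:
    failed_code = exc.returncode
```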


def we_need_to_treat_all_transactions_on_february_29_as_happening_on_february_28_see_customer_ticket_4321_for_rationale():

obviously


The sad thing is, you're not entirely wrong...


Perhaps a function that returns the "true" transaction date for a given transaction date, with the "why" explained by a helper function whose name briefly describes the rationale.
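A rough Python sketch of that suggestion, assuming dates are `datetime.date` values. Both function names and the ticket number (borrowed from the joke above) are hypothetical. The rationale lives in the predicate's name instead of a comment.

```python
from datetime import date, timedelta


def customer_4321_reports_feb_29_as_feb_28(d: date) -> bool:
    """Hypothetical rule, named after its rationale (ticket 4321 is made up)."""
    return d.month == 2 and d.day == 29


def true_transaction_date(d: date) -> date:
    """Map a raw transaction date to the date used for reporting."""
    if customer_4321_reports_feb_29_as_feb_28(d):
        return d - timedelta(days=1)
    return d
```

The trade-off versus a single commented function is that the name must be kept accurate just like a comment would, but it shows up at every call site.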


Finally, can’t believe I had to read this far down in the comments to find a reference to clean code.


Sometimes it's good to have a block comment explaining the motivation behind a chunk of code or particular line, or just to clarify the individual steps in small modules, but you have to remember you aren't writing prose.

I think around 10% of your code as comments is a good measure, but also remember that you may not revisit a module for years, and you will come to appreciate each and every breadcrumb you left which leads back to your original state of mind when you wrote it. If you measure code quality as maintainability, then comments can indeed increase code quality, just non-linearly.
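If you want to check a figure like 10% against your own code, a minimal Python sketch follows. It assumes "comment ratio" means the fraction of non-blank lines that carry a `#` comment. It uses the standard `tokenize` module so that a `#` inside a string literal is not counted, and it deliberately does not count docstrings (an assumption; other tools may count them).

```python
import io
import tokenize


def comment_ratio(source: str) -> float:
    """Fraction of non-blank lines in Python source that contain a comment.

    A line with code plus a trailing comment counts as a comment line.
    Docstrings are not counted as comments here.
    """
    nonblank_lines = {
        n for n, line in enumerate(source.splitlines(), start=1) if line.strip()
    }
    comment_lines = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comment_lines.add(tok.start[0])  # start is (row, col)
    return len(comment_lines) / len(nonblank_lines) if nonblank_lines else 0.0
```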


Long comments tend to scare me because they often detail some horrific hack that I'm going to have to deal with.


So true.


Nothing to do with ratio, but I’ve found that the quality of the comments often reflects the quality of the code.


No comments are good comments.


>However, it does give me some comfort. When it’s not gamed, do other HNers also feel that a high comment:code ratio probably indicates quality?

I think people should spend more time commenting/documenting across the board. I'd much rather have verbose, unhelpful comments that I can skip than minimal comments and code that is overly optimized and hard to parse. The less thinking I have to do to pick up where the last person left off, the better.

I would say that yes, in general a high comment:code ratio tends to indicate higher quality.


I think it's usually a sign of a poorly understood domain or a poorly modeled problem. It's not a good sign; it's not necessarily a bad sign; it's (at best) an admission of one's limitations.

Comments become useful when behavior is implicitly tricky. Ideally you'd make the "trickiness" tangible and expressible in-whatever-language you're in, but that's not always easy to do.


> When it’s not gamed, do other HNers also feel that a high comment:code ratio probably indicates quality?

Just the opposite. It typically indicates a history of rigorously documenting terrible code. Sometimes, comments come from complex business requirements or other external constraints. Documenting the former is largely an anti-pattern, while documenting the latter is hugely useful.


I think repeating the same concept in different ways just makes things harder to read, not easier. Comments also sometimes reference code outside of where they are placed. This leads to them becoming misleading and incorrect.

I find that comments can be a last-ditch effort to make hacky, convoluted code look better than it is. They can be an indication of a lack of thought and planning, with obsessive documentation added later to make up for it.


> do other HNers also feel that a high comment:code ratio probably indicates quality?

That's always been my instinct.


There's a lot of talk in the replies about comments becoming stale and code being self-documenting, which makes me wonder: do people genuinely not read comments and just make code changes without updating the comments? And do reviewers not look at the context of the surrounding code and just let commits in? What's the point of having code reviews, then?


It's very easy to miss comments, since they aren't type-checked, nor are they unit-testable.

Comments don't need to be near the code they affect. They don't even need to be in the same file: consider if you've organized your code by features, but the comment relates to a layer that spans multiple features.
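One narrow exception in Python is `doctest`: usage examples written in a docstring can be executed and checked, so that particular kind of comment fails loudly when it drifts from the code. It does nothing for free-form prose or for comments that live far from the code they describe. A minimal sketch follows. `clamp` is just an illustrative function, and `check_docstring_examples` is a hypothetical helper built on the standard `doctest.DocTestFinder` and `doctest.DocTestRunner` classes.

```python
import doctest


def clamp(x: int, low: int, high: int) -> int:
    """Limit x to the closed range [low, high].

    >>> clamp(5, 0, 10)
    5
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(42, 0, 10)
    10
    """
    return max(low, min(x, high))


def check_docstring_examples(func) -> tuple[int, int]:
    """Run the examples in func's docstring; return (failed, attempted)."""
    runner = doctest.DocTestRunner(verbose=False)
    failed = attempted = 0
    for test in doctest.DocTestFinder().find(func, globs={func.__name__: func}):
        result = runner.run(test)  # failures are reported on stdout
        failed += result.failed
        attempted += result.attempted
    return failed, attempted
```

In a real project you would more likely run `python -m doctest yourmodule.py` or enable doctest collection in your test runner.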


I once spent two hours figuring out why a log file wasn't being modified when there was an error. I knew the location of the file, but it just wasn't showing that error.

Eventually I tracked down this line:

    // writes to the log file at c:\...\xyz.log
    AppendToLog(message);
Yeah, that was the correct path and everything, and yet the line wasn't executing!

Eventually I looked inside the AppendToLog method. It was writing to another file in a completely different path :)

That was when I stopped bothering to read comments. They always lie, and I couldn't even blame the programmer who changed the AppendToLog method -- the comment wasn't inside the method, it was on a call. I can't honestly expect someone who changes a method to look for all the places where that method is called and make sure any existing comments match the change.
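One way to avoid this particular failure mode is to make the destination part of the call instead of a comment next to it. The call site then can't silently disagree with what the function does. A minimal Python sketch, with `append_to_log` as a hypothetical stand-in for `AppendToLog` (not the original code) and a temporary directory standing in for the real, elided path:

```python
import tempfile
from pathlib import Path


def append_to_log(message: str, log_path: Path) -> None:
    """Append one line to the log file at log_path.

    The destination is a parameter rather than a comment at the call site,
    so the call itself shows where the message goes.
    """
    with log_path.open("a", encoding="utf-8") as f:
        f.write(message + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_path = Path(tmp) / "xyz.log"  # hypothetical file name
        append_to_log("first error", log_path)
        append_to_log("second error", log_path)
        print(log_path.read_text(encoding="utf-8"), end="")
```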


Can't you modify the comment to improve overall hygiene instead of emerging with code nihilism?


By all means, feel free to improve that comment :)


Comments straddle this weird line between code and human process. They're in the source code files themselves, and there is an appeal to trying to evaluate a project by looking at the source code alone and trying to gauge abstract technical merit. But I think the real truth here is that comments are a tool in the service of a development process, which includes things like having code reviews, having code reviewers be sufficiently motivated (intrinsically or extrinsically) to care in useful ways and neither nitpicking nor rubber-stamping, having motivated people on the project in the first place, having shared values about what code you're going to write and what code you're not, having tests and doing the operational work to keep tests running, retaining people on the team, etc.


I'd say it's mostly a meme. If a code review doesn't involve reviewing comments around the modified code, it's a bug in the process.

Elsewhere in this thread Ousterhout's book is mentioned; I like his advice about always placing comments in the most obvious places and as close to the code they affect as possible. This way, you can't miss them, and it's hard to forget to update them.


Everyone can have comment blindness to some extent, but I've worked with two people who auto-collapsed docstrings and hence didn't read or update comments, which is enough (one person writing code without updating comments/docstrings and one person reviewing inadequately). Sure, the problem only appears in a bit of the code, but it means people stop trusting all the comments.


> auto-collapsed docstrings

Woah, that sounds like a pretty dumb feature. Auto-collapsing whole functions is useful, but auto-collapsing docstrings sounds like a recipe for disaster. People write docstrings and inline comments for a reason.


No. Auto-collapsing docstrings is not dumb at all; it’s an amazing feature. In my experience, docstrings are completely useless most of the time when you are writing code. They may be useful to the caller, because they give you info in autocompletion (and most of the time they just say "get this value" or "set this value"), and they can be used to automatically generate API docs. If they are used extensively in a private project that no one will call from a different one, I will auto-collapse them. They take up so much space for nothing, and they slow me down terribly. And the projects in which I have seen this behaviour had horrible method naming, clueless architecture, and completely arbitrary method subdivision. The docstrings were just the wrong solution to the wrong problem.


The logical conclusion to more comments is Literate Programming. This book for instance is also a program: http://www.pbr-book.org/3ed-2018/contents.html

This file isn't that heavily commented. Do you look at many OSS projects for comparison? Though when things get complicated with many branches and function reentries it makes me wonder whether the problem would have been better solved with declarative logic that handles the procedural mess for you. (It might also be much higher quality since you may unlock access to various formal methods and go beyond unit tests. Though perhaps for example there's a vetted TLA+ spec not shown that this controller is based on.)

I don't think doc'ing every function is unusual; doing so usefully is less common, though. Comments in the function body also aren't that rare, though they might indicate a place for better factoring, e.g. just more function calls with descriptive/suggestive names. (Having more functions also means you don't have to stub out, and deeply stub, so much in a behavioral test, since you can get away with just mocking the function call instead of the potentially hairy state logic the function does underneath.)
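A minimal Python sketch of the mocking point, with hypothetical names throughout: because the state check is a named function, a test can replace just that call rather than build a claim whose internals happen to look pending. In a real package you would normally write `mock.patch("yourpackage.controller.claim_is_pending", return_value=True)`. Patching `globals()` with `mock.patch.dict` here only keeps the sketch self-contained in one file.

```python
from unittest import mock


def claim_is_pending(claim: dict) -> bool:
    """Hypothetical predicate standing in for some fiddly state logic."""
    return claim.get("volume_name", "") == ""


def sync(claim: dict) -> str:
    """Hypothetical controller step that branches on the predicate."""
    if claim_is_pending(claim):
        return "bind"
    return "check-binding"


def test_sync_binds_pending_claims() -> None:
    # Replace the named function for the duration of the test instead of
    # constructing a claim whose internal state makes it look pending.
    with mock.patch.dict(globals(), claim_is_pending=mock.Mock(return_value=True)):
        assert sync({"volume_name": "pv-1"}) == "bind"
```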

I see an example at a random spot for a couple of improvements in naming (in my ignorant opinion; I don't know Kubernetes), though the fact that I feel able to express even a weak opinion on an improvement suggests the comments were reasonable. I've seen less hairy code with no comments or useful tests, and unless I really need to understand it, I just want to move along pretending I saw nothing.

Look at the set of ifs at L591. The first if is a null check with part of the explanation on L592; it would be better to remove that part and have a function call, something like "claimWasDeleted(claim)". The matching else if on 615 checks for an empty string name. I'm not sure, but I think its explanation is at L634, and the empty string check could be "isClaimPending(claim)"; maybe also move the mode mismatch check to its own else if before the isClaimPending block and give it a better name. I appreciate the comment on L635 telling me why the next line of code on 641 is done (that may well not be clear from the commit history, which can be another place for whys), though with the isClaimPending change the comment and code might be replaced with a fn call with the details in the fn doc. I'm also reminded of an idea in more expressive languages: annotating purely optimization metadata of any kind (inlining being the simplest) and being able to toggle it on/off for extra QA in a test suite. Anyway, the next elif on 643 and its comment could become something like "isVolumeBoundToClaimProperly(claim, volume)". You get the idea.
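To make the suggested refactoring concrete, here is a small Python sketch of an if/elif chain whose conditions are extracted into named predicates. This is a deliberately simplified, hypothetical model (the `Claim`/`Volume` fields and return values are made up). It is not the actual Kubernetes controller logic; the predicate names are adapted from the comment above into Python style.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    volume_name: str  # "" while the claim is still pending (hypothetical)
    access_mode: str


@dataclass
class Volume:
    name: str
    access_mode: str
    claim: Optional[Claim]  # None once the claim was deleted (hypothetical)


def claim_was_deleted(claim: Optional[Claim]) -> bool:
    return claim is None


def access_modes_mismatch(claim: Claim, volume: Volume) -> bool:
    return claim.access_mode != volume.access_mode


def is_claim_pending(claim: Claim) -> bool:
    return claim.volume_name == ""


def is_volume_bound_to_claim_properly(claim: Claim, volume: Volume) -> bool:
    return claim.volume_name == volume.name


def sync_volume(volume: Volume) -> str:
    # Each branch reads as a sentence; the "why" behind each condition can
    # live in the predicate's docstring instead of inline here.
    claim = volume.claim
    if claim_was_deleted(claim):
        return "release"
    if access_modes_mismatch(claim, volume):
        return "report-mismatch"
    if is_claim_pending(claim):
        return "bind"
    if is_volume_bound_to_claim_properly(claim, volume):
        return "ok"
    return "report-conflict"
```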


I think it's likely to indicate low quality. Comments are for where the code wasn't clear enough.


It is rare to find code that comprehensively explains (without comments) why it exists, or often more importantly, why some superficially-equivalent code doesn’t exist there.

Comments when done correctly are vital.


So are politicians when not lying and well-behaved children.

Correctly done comments are a one in a million thing. In my experience, they are utterly surpassed by "i = i + 1; // increment i" style comments. (Seriously, I'm working on code written by someone who teaches programming and he writes this type of comment.)


In an actual teaching context, a comment like that can make sense.


Some languages, like golang, don't prioritize concise code; it often takes a few lines to do something trivial, such as finding the object in a list with the lexicographically lowest value of some property.

The code to do this is simple, but not concise, leaving a comment so I can scan the function and skip 5-10 lines doing something trivial is nice.
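To illustrate in Python: the first function below is the kind of explicit loop you would typically write in Go, where a one-line comment lets a reader skip the body. Python's built-in `min()` with `key=` and `default=` collapses it to one line. `Item` and `label` are illustrative names. Both versions return the first item with the lowest label when there are ties.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    name: str
    label: str


def lowest_label_loop(items: list[Item]) -> Optional[Item]:
    # Find the item whose label sorts first lexicographically.
    best = None
    for item in items:
        if best is None or item.label < best.label:
            best = item
    return best


def lowest_label(items: list[Item]) -> Optional[Item]:
    return min(items, key=lambda item: item.label, default=None)
```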


This sounds like a self-fulfilling prophecy. If you only think comments are for telling you literally what a line of code does, then you're only going to see comments where the code is obscure. If you use them as a form of high-level communication to help the reader understand the broader context and reasoning behind the code (like at the top of the linked page), then they will be useful, because the person writing the comments understands why documenting code is a good thing.


Code can and should be a form of high-level communication too. In a well-structured program, the high-level code will explain the high-level context, and the low-level detailed code will explain the low-level details.


> do other HNers also feel that a high comment:code ratio probably indicates quality?

Nope, imho code with lots of comments is generally crappy code. It's littered with comments to explain the sloppy code they couldn't make clear because they're bad programmers. Good programmers use few comments, write simple clear code that doesn't require explaining, and leave comments about why something was done rather than simply trying to explain what the code does.

Code never lies, comments often do; don't trust comments that explain the code.


I would say, like you said, that it depends on the quality of the comments documenting the code. If they are correct, they show a thorough understanding of what is written in the code, and beyond that, a complete stranger to the code can easily find what they need. However, as you said, you will need to maintain comments in addition to code, which is a pain and will lead to inconsistencies in the comments. That leaves a crappy file with meaningless junk scattered through it, which in turn means you can never trust the comments, so they're kind of useless to have. :D But since that's a circular argument, and those tend to be cynical in nature, I do prefer properly commented code over uncommented code. I'd comment less verbosely myself, though, so I don't need to maintain as much of it and can keep it more consistent over time.


No, I do not find it indicates quality.

To me, comments are noise, and code is signal; the code is what actually executes.

It's one thing to have a summary of intent at the start of a listing; that should not count towards the code:comments ratio.

Once the code begins however, there should be a minimum of comments necessary - especially in a high-level language not constrained to assembly-level instructions.

In assembly listings it was common to have two columns: the code on the left, and comments on the right that often resembled high-level pseudocode. Here's some representative Apollo Guidance Computer source:

  MAKEPRIO  CAF   ZERO
            TS    COPINDEX

            TC    LINUSCHR
            TCF   HIPRIO      # LINUS RETURN
            CA    FLAGWRD4
            MASK  OCT20100    # IS PRIO IN ENDIDLE OR BUSY
            CCS   A
            TCF   PRIOBORT    # YES, ABORT

When you're already working in a high-level language like C or Golang, you should be able to clearly communicate what is going on without the need for littering it all with comments.


>To me, comments are noise, and code is signal; the code is what actually executes.

Comments are noise to the compiler, but code is both a communication between humans and from humans to machines. To imply that only what executes is signal and all else noise is to ignore half the purpose of code, which is documentation.

And despite what a lot of people want to believe, code itself is often not sufficiently self-documenting.


More importantly, today's code is an optional basis of tomorrow's code. When NASA wrote the code for Skylab, I bet the comments in this codebase were the only useful thing in it and the code was all noise.

If you don't want to spend your career rewriting the same thing over and over again for slightly different business use cases and platforms, comments are incredibly valuable. (On the other hand, I guess there's a lot of job security in being hired to write the same thing many times....)


Why do you want to abort when prio is in endidle or busy, and not in other cases?


Exactly. True, code could be self-explanatory, but sometimes you need to explain why code does what it does.


If you care to understand the code I pasted, here's the full listing:

https://github.com/chrislgarry/Apollo-11/blob/master/Comanch...


Well, the first 500-ish lines of that 1500-line file are comments, and there are block comments throughout, too...

You did, to be fair, say that comments at the top shouldn't count. But I think that depends a lot on personal (and language) style towards multiple files and multiple units within a file - for instance, none of this code is object-oriented, and comments above each class make sense in an object-oriented language. I think as the language gets more concise - Go is a lot more concise than the Apollo assembly language - you're going to need to have the same amount of prose to explain what you're doing but a lot fewer lines of code to get anything done, and it makes sense to have comments above each function or each block, because that's really the comparable unit.



