Hacker News new | past | comments | ask | show | jobs | submit login
Writing Well-Documented Code – Learn from Examples (codecatalog.org)
333 points by ainzzorl on Sept 4, 2021 | hide | past | favorite | 142 comments



Personally I found most of those examples to be not great. I think they would be improved better names overall, more well named private methods and most importantly well structured companion test classes proving the behavior.

I really dont want to read another developers comments on their mental state while I am feverishly debugging their code during a 3am page.

The number of times I have been misled by an outdated comment. I generally open git blame to see if the comments predate any code now. Tests on the other hand almost never lie.

Readmes or comments explaining why something exists vs what it allegedly does are generally much better. If you can’t write expressive code why would you assume your English is much better?


The problem with all discussions of good commenting style is that it's like debating good footware. Good for what? Office? Hiking in snow? Walking on the beach? Except that we have different footware for different uses but only one set of comments.

Having such a limited commenting syntax is like assuming one permanent type of footware and debating which one would be best. I'd much rather have a smarter commenting system.

I'm not talking about supplementary docs stashed elsewhere to gather dust. I'd like to have expandable comments right where they are needed but usable differently for a glance at familiar code or puzzling through unfamiliar code. Something very brief that doesn't clutter the code, a couple of words or no words at all, with an expansion arrow to something more detailed, expanding to something fully documented with links to everything.

And comments with scope (this comment applies to this line, this one to the function, this to a group of functions...) that actively use the scope. By active I mean that if you make any edits to code in the scope of a comment, that comment changes color or some other mechanism similar to a Git merge conflict: You changed the code without changing the comment, so check and make sure the comment says what you want and click OK. You don't have to fix the comments immediately for each edit. Make a few changes, test, make sure it's working, but when it is, you can see at a glance which comments need to be checked.

The general idea is instead of debating The Best Way to Comment, consider ways to improve the commenting system itself.


> (this comment applies to this line, this one to the function, this to a group of functions...)

This is probably the major reason I much prefer introducing new variables and functions solely so I can give them descriptive names over using comments. Comments aren't scoped so they're often ambiguous and imprecise compared to code, which also contributes to them becoming out-of-sync with the code over time. Comments above a function definition are generally better than comments inside a function for example as it's clearer what the scope is.


Inline graphviz annotations would be very nice. Store the text, IDE "codelens" renders it out


>The number of times I have been misled by an outdated comment

Imo a problem with code review and not leaving comments. Just like tests need changed during refactoring, documentation and comments should also be cleaned up as part of the change

I have found those as well but place the blame on the person who made the subsequent change and ignored them


There’s not many code review platforms that are great at showing you relevant comments in a diff because they are often in a much different part of the file, so I kind of think it’s hopeless in a large code base to catch this sort of thing in a code review


I now use VS code for reviewing PRs and it's easy to spot and then touch up comments.

With GitHub, not so much.

Not sure why code base size would make it harder... Workflow? Code unit size? The larger the code base, the more comment discipline matters IMO


I was mostly thinking about how small changes can sometimes touch a larger quantity of files on moderate or large code bases, so a bit more things to wade through when doing a review, but point taken.


Comments are not inherently bad but they do have inherent short comings. Still a useful tool but one that should be reached for rarely and last.


Comments are "reached for rarely and last". Whatever shortcomings comments have pale in comparison to being faced with and trying to figure out undocumented or too sparsely commented code, which, in my long experience, has been an order of magnitude more prevalent than sufficiently documented/readable code. Heck, even a wrong comment is sometimes preferable to no comment if it at least gives me a clue as to what the purpose of that section of code was at some point in its evolution.


I personally strive for code that can be read as a sentence, especially in complicated sections of code.

This sometimes leads to weird intermediate variables which might look superfluous, e.g. intermediates like:

foo_has_at_least_1 = len(foo) >= 1

(not a great example, usually it’s a bit more semantics. than that) but later code using those intermediates reads as english, which can reduce cognitive load when reading the code and obviate comments, and preconditions are more readily verified.

It’s literate programming and I prefer that to long comment blocks like those in some of the examples. All that said, it can be hard to get coworkers into such a habit, especially those who feel brevity and abbreviation is a virtue.


The problem comes not with what the code does but why the code exists in the first place.

The only time I ever feel the need to leave comments is to explain code that doesn't appear to be needed or isn't obvious why it exists. Usually this comes from agreed upon technical debt, short notice requirements with tight deadlines, or some other form of tradeoff.

"We're doing a simple check and throwing an exception here since we have work to introduce a more comprehensive permission system in xyz epic"

or something like

"There's a complicated race condition in xyz so we're doing a simple sleep here since the proper fix involves introducing a distributed lock"

Sure those types of things are cases you want to avoid but unfortunately you occasionally end up there anyway and leaving a short explanation about a compromise is better than ambiguous code that appears to serve no purpose


I dont think that held in the examples or a lot of what I have seen in practice once comments become the norm. Most classes should not need a comment in them at all. Most methods should be self-explanatory. A comment detailing every line of code is a huge smell.


…and on the reviewer(s) who accepted the code change even though the comments no longer match the code.


I agree, but git and most code reviewing tools I've used only show a few lines of surrounding context so the reviewers might not even realize there's a comment further up that needs updating.


Generally speaking, for all but the smallest of changes you should never only review the diff, but pop open the actual file on that branch (GitHub makes this fairly easy, I can't speak to other tooling) and consider the change in the context of the surrounding code. Does it contradict comments? Needlessly duplicate code from elsewhere in the file instead of creating a single function to be used in both places? Shadow variables confusingly? Add yet another 5 lines to a 100 line function instead of leaving the code better than it found it?

You can't really judge any of this if you're just eyeballing a diff. Diffs are just the starting point for a review, and while tooling could make this easier, the full file is always just a git checkout away.


That is how I feel about these comments.

In a vacuum I like them and when I bump into a piece of 10 year old code commented like this it is great.

There is one caveat however, they have to be meticulously maintained and always updated to reflect every change made to the code. It is a huge caveat because more often than not the comments are out dated and no longer reflect the actual code and at that point they are detrimental.


That's why Knuth had the idea of creating literate programs where documentation and code was maintained in a single block. Unfortunately, this idea was never used in large scale.


Have you ever tried to read the soup he created with code blocks rearranged to suit the generated text rather than any future maintenance coder?


Yeah, I think there's a fundamental problem with the implementation -- with small, single-file programs where the code can be written in a chronologically linear manner it works well. But with that there's no need for reordering logic, so much of the more complex parts of weaving are redundant. Scripts, for example, are pretty easy to set up as [for example] markdown files, which works great (particularly in a project that requires a set of helper scripts which need why communicated -- having a set of markdown files in a folder makes it IME much more ergonomic for other devs). Anything larger and it gets increasingly fiddly and (again IME) not worth it.


> If you can’t write expressive code why would you assume your English is much better?

Because English is optimized for communicating human intent?


I guess aside from assembly and very low level or highly specified languages aren’t programming languages as well?


In my experience, it's much easier to get just the right level of abstraction for quick reader comprehension in natural English, compared to any programming language.


I have to agree. Unit tests are some of the most helpful things I’ve used whilst refactoring the Libreoffice vcl module. In fact, I’m now at the stage where if I only move a class into a new header I’m writing a test that covers that class.

An example I’m currently working on is:

https://gerrit.libreoffice.org/c/core/+/121403/5/vcl/qa/cppu...


Yup, the rate limiter comments are unhelpful. A top level comment explaining the idea is fine (and very useful) but code should largely be self documenting.


Which it often is, but they still wrote a comment, leading to ridiculous stuff along the lines of

> // Returns None is x = 0

if x = 0 { return None }


To me, comments in code are like descriptions in a novel.

I would agree that comments which unnecessarily describe what is to come are bad, in the same way that a novelist should prefer to show, not tell, unless there is a compelling reason for the reader to keep that description in mind as the immediate events unfold.

But to require almost all comments gone in favour of well-named code and external docs feels wrong too, in the same way that one would probably not like to read a book with no chapter headings, no visuals, simply describing the story in a dry (but informative) language, and relying on cliffnotes for actual context.

And if you say code is more like a technical report than a novel, I will disagree. Good code is always a pleasure to read, and typically for similar reasons that a good novel is pleasurable.


Agreed, I think that the code itself should be readable through better names, with the comments being a bonus rather than critical to the understanding of the code.


Exactly. In my opinion, Thinking Forth by Leo Brodie should be required reading for anyone who touches code.


Speaking of source code comments, antirez (of Redis fame) wrote a fantastic article[0] about that topic some time ago and I still recommend it to colleagues whenever they make the hollow statement that "code should and can be intelligible on its own, without any comments".

[0]: https://web.archive.org/web/20210226004600/http://antirez.co... (I still don't understand why he deleted his blog)


I love this line: "Comments are rubber duck debugging on steroids, except you are not talking with a rubber duck, but with the future reader of the code, which is more intimidating than a rubber duck, and can use Twitter."


I agree with you. Code should be intelligible on its own without comments. It should also be well documented. One does not preclude the other.


I've forgotten who first wrote this, so I can't credit them, but there's a saying that if someone can't write clear, clean, and understandable code, why does anyone expect them to be able to write clear, clean, and understandable English (or whatever language)?

It's quite a bit easier to write something the compiler or runtime can understand, there's no need for meaningful names, consistent structure, or conceptual coherence. As long as it builds and passes whatever tests, the computer will accept it.


> if someone can't write clear, clean, and understandable code, why does anyone expect them to be able to write clear, clean, and understandable English

Isn't the answer to that obvious? The code is significantly constrained in form by having to be executable. The comments have no such restriction.


While surely not the source of the saying, antirez argues in a similar direction:

> You may think that writing comments is a lesser noble form of work. After all you can code! However consider this: code is a set of statement and function calls, or whatever your programming paradigm is. Sometimes such statements do not make much sense, honestly, if the code is not good. Comments require always to have some design process ongoing, and to understand the code you are writing in a deeper sense. On top of that, in order to write good comments, you have to develop your writing skills. The same writing skills will assist you writing emails, documentation, design documents, blog posts, and commit messages.

In any case, you're making a good point, though I would argue that the reverse of the saying (with the modification that "expectations" get replaced with "trust") also holds:

If someone is not able to write intelligible, stringent English, I wouldn't trust them to write good code, either, because writing good code requires similar qualities (good stringent structure, empathizing with the future reader of your code etc.) and, as antirez argues convincingly, it also requires writing comments (which are in plain English).


> if someone can't write clear, clean, and understandable code, why does anyone expect them to be able to write clear, clean, and understandable English (or whatever language)?

The comments don't say the same thing as the code. In the comments, the author might tell us - albeit not clearly or in proper English, what they want to achieve, or why they're doing it this way.


If someone takes a dozen lines of convoluted code to express what can be done in 2-3 lines concisely, the explanation isn't going to help. Think of the worst mess of if/then/else clauses you've seen, or someone who switches on a type instead of using polymorphism. Even if they can explain it well enough that you can follow it, the comments aren't going to improve the understanding much. They are more likely to add cognitive load and make it harder to grasp the intent.


Comments are not for fixing bad code, comments are for explaining complex (and good) systems.


I first heard that in a talk by Kevlin Henney [1]. I'm not sure if he's the one who came up with it though.

[1]: https://twitter.com/KevlinHenney/status/381021802941906944


I'm a big fan of Henney and have watched a lot of videos of his talks, so that's probably it.


I'm not sure we agree. But I guess the question comes down to what you and I mean by "intelligible".

antirez provides several examples where the code without the comments would be much harder to understand and/or maintain. Whenever this is the case, I would argue that comments are a must and without them the code is at least barely intelligible.


Very often it is just not possible to truly understand code without comments (or other form of documentation). This becomes obvious if you realize that code by itself can only tell you what is being done, but not why.


Huh? That URL is still alive: http://antirez.com/news/124


Maybe it's a buggy HTTPS redirect in GP's browser? Due to HSTS or something?

The HTTP version of your URL works fine. The HTTPS version has an expired cert. If I click past that, it redirects to redis.io, then gives a 404.

I know antirez is a smart guy, but if this is some kind of "I'm taking a bold stance against HTTPS" then I'd rather read the archive.org link with functioning HTTPS. My threat model prefers trusting archive.org to correctly reproduce the blog, over trusting everyone between me and antirez.com.


Wow, you just opened my eyes – I really thought he had deleted his blog! Must be the HTTPS Everywhere extension I'm using – though I really don't understand why it redirects me to redis.io just because antirez's TLS certificate is for redis.io, not antirez.com. I would really prefer a clear error message here.


The web server(s) at antirez.com are weirdly configured. So it’s not so easy for the web browser to display a very clear error message in this case.

If you run this command you can see a little more clearly what’s going on:

  wget --no-check-certificate -S -O - https://antirez.com/news/124
If you use the plain HTTP URL, everything works fine.

If you try https://antirez.com/news/124 the web server at antirez.com:443 will answer this:

  HTTP/1.1 301 Moved Permanently
  Server: nginx/1.10.2
  Date: Sat, 04 Sep 2021 18:15:46 GMT
  Content-Type: text/html
  Content-Length: 185
  Connection: keep-alive
  Location: https://redis.io/news/124
So it’s not HTTPS Everywhere that redirects, it’s the web server itself.

However, many clients won’t heed this response, since antirez.com:443 serves a certificate that’s not valid for that hostname.


Looks like the cert is for a different domain (redis.io) AND it expired August 7, 2020.


Let's hear more about your threat model that causes you to wory about nefarious content being injected into a blog post about comments in code


Not OP, but residential ISPs were caught injecting ads into HTTP sites at least as early as 2014. [0]

I certainly wouldn't trust Comcast to keep malicious ads out of their ad network, either.

[0] https://www.techdirt.com/articles/20140908/07191228453/comca...


How about not opening the gates for third parties to inject ads, trackers and crypto miners


I agree that in this case (reading an article) HTTP is not a huge threat. But in 2021 I consider HTTPS + HSTS to be a basic hygiene factor.


A bit snarky but I agree, it doesn’t seem like a huge threat. I’d love to be proven wrong though.


Huh? I've been getting

> Sorry, I can't find that page :-/

for several months


Works with pure HTTP. Try it in curl? Also see my sister comment for elaboration


For the first example with conditions, I much prefer rolling the "why" of the comments into boolean variable names where possible e.g.

   // We still have burst budget for *all* tokens requests.
   if self.one_time_burst >= tokens {
      ...
   } else {
      // We still have burst budget for *some* of the tokens requests.
becomes something like (I'm missing context but you get the idea):

   enoughBudgetForAllTokenRequests = self.one_time_burst >= tokens

   if enoughBudgetForAllTokenRequests
      ...
   else
      ...
I don't see this often though. I see the commenting pattern a lot where the comment usually duplicates the "what" of the conditional code, and it's often ambiguous if the comment is talking about the condition or a whole branch.

I think a descriptive variable makes the code easier to skim, it's more precise and less likely to become inaccurate. For a big complex conditional, you can break in up into several well-named boolean variables you compose together (but obviously comment the bits that can't be covered by a concise variable name).


Long variable names mean you perceive your simple code as being too complicated. It's an if statement. I would refactor your long variable name to something more compact like enough_budget.


enough_budget for what though? That's not a very clear variable name.


IMO that should be in a comment, instead of duplicating that information every time you mention the variable. (This applies more for local variables in short blocks than for, say, global variables.) At some point the length of the name makes everything around it harder to read, since it’s been pushed so far apart.


As a general rule I completely agree.

On the other hand, token bucket implementations are hard to understand and hard to debug. And an ambiguous variable name can cause trouble because some devs might think it "obviously" refers to enough budget for some other slice of requests.

I think my preferred solution here is to extract the condition into a boolean function with a long name like HasBudgetForAllRequests. This solves the long variable name issue and also gives you a useful place to add to your budget computation logic if you need to.


The words Has and For are unnecessary words in that variable name.

Think of code as mathematical notation. Some degree of familiarity with the domain is expected. You can use shorthand symbols or words as variable names.

Otherwise you're trying to write English sentences into compact notation and that's just noise. Hard to read.

Try read the A* algorithm when the variables have all been expanded out and you'll find it's impossible to see the forest for the trees. Sometimes G is the correct name for the collection of the cheapest scores known so far. It's compact and is best explained with a comment.


total_traffic_budget


I prefer this too, but sometimes poor type-narrowing support hinders it. For example, TypeScript only recently added support for aliased type-narrowing:

https://devblogs.microsoft.com/typescript/announcing-typescr...


This. And the extra documentation is provided by tests, which are living code and evolve together with the production code. That should be enough to understand the code.

The top of the file comment is good and well received. At the same time, you could have a wiki page with that info, where you can also have diagrams etc.


some types of comments are redundant.

These days I write my code as comments first and then write the code.

But for example this one

        // If either token bucket capacity or refill time is 0, disable limiting.
        if size == 0 || complete_refill_time_ms == 0 {
            return None;
        }
The comment is the same as the code. It is pointless. Instead I might say under what circumstances I expect them to be zero or why Im disabling limiting when they are zero.

This one is great, though at first glance I cant relate the actual code to the comment, at least I understand why the code is there.

            // We still have burst budget for *some* of the tokens requests.
            // The tokens left unfulfilled will be consumed from current `self.budget`.
            tokens -= self.one_time_burst;
            self.one_time_burst = 0;


In defense of the first comment, when first approaching a body of code like this, it's not always going to be clear what returning None is going to mean in any given method.

Also, one thing I think doesn't get mentioned enough is the use of comments as anchors for text search. Having "disable limiting" expressed in English that way makes it easier for newcomers to explore the codebase.


Good point. Also the phrases in the initial comment "token bucket capacity" and "refill time" may serve as hints and can be used as entry points to searching external documentation.


I agree, though I think it would be even better if variables were named "token bucket capacity" and "refill time" in the first place.


It’s hard to judge out of context. IMO None meaning “disable limiting” should be a docstring/comment at the method level, describing what its return values mean, not inline next to this code.


I don't think the comment is the same as the code. If "size" was called "token_bucket_capacity" I would agree but since it's not, if you remove the comment you lose some context.


> understand why the code is there

You hit the nail on the head here. This is the single most important reason for having comments in code (aside from just being a part of a coding standard).


The Protobuf documentation made me cringe. Every character of its "personality" adds a cognitive burden for the reader and a maintenance burden for anyone who has to update it. If a developer wants to be cute with the same brevity and clarity that the comments would have otherwise, fine. Otherwise it's self-indulgent performative trash that should be rejected in review.


Absolutely. That was not a good example for me.

> self-indulgent performative trash

A little harsher than I would have used; but now that you say it.

>that should be rejected in review.

Without question. I would have absolutely denied that until it was trimmed to at most 10% of what was there.


I just prefer nice, long, descriptive function names.

I was working on an unfamiliar codebase yesterday and had to work with the postMessage function. It was React, so one might assume that it sends the message to the backend, which was my initial assumption. Nope. It posted it to the page. Didn't help that another utility had another postMessage that was about what to do after the message was sent.


Better than writing really long descriptive function names is to not get carried away but write well described function names that show some intent or description and are chosen carefully. I worked on some code bases using long descriptive function names (inherited from another dev) and it wasn’t always useful. It was overkill. The descriptions were either too generic, too descriptive (redundant in half the cases) or handling such long saussage names was simply tiring. When you have to scroll right to read a name it becomes a problem..

The takeway is that merely writing very long descriptive names isn’t a silver bullet and could cause harm despite good intentions.

I suggest going back a after a few days for a refactoring (renaming functions or variables) and see how it reads to you. Then repeat the process a few days later when you’re even more detached from that code.

In many cases self documenting code is going to help you remember the model you had in your head when building it. And if done well it will help others as well, your code will end up not giving headaches to future maintainers.


x100 on intent!

Even something simple like “if the function starts with “is” it will return a Boolean has helped me organize things that return status codes and have side effects. Small thing, but I like it.


I worked with a guy who wrote a CPU simulator and exceeded the MSVC 256-char function name limit. It's hard when you like 80/100 col code and a function name is longer than that.


Does escape new line work in function names in C? Brb.

Edit: Hah. It does.


I have a simple rule to go by. Comments should describe “why” and the code should describe “what”.


While I agree with the general sentiment, the first example in the article IMO demonstrates how "what" comments make understanding code easier. When I first read that code, I was pleasantly surprised how easy it was to understand everything going on there, even though I had no familiarity with the code base. This is rare for non-trivial projects. I'm not sure if it could be made as clear without those comments.


Agree; could be very helpful if somebody not fam with token buckets arrives at this code for some reason and needs to make sense of it. Also, I think one person's "what" is another person's "why":

> if self.one_time_burst > 0 {

What is this code doing? It's setting up a flow control statement. It's dereferencing the "one_time_burst" property on the object self and comparing the value to 0. If the value is greater than 0, it will execute the proceeding code block.

Why is this being done?

> [to] consume the one-time-burst budget.


"Why not" comments are also very helpful

Self-explanatory code can't explain themselves if they do not exist


I upvoted you, but there are plenty of exceptions. For example:

* Common name of the algorithm you're using

* Reference to somewhere else in the code that's related

* "You may think this code only does X, but it really also does Y"

* "We used to also do XYZ here, but that was removed because of ABC"

and so on.


How do you know what a function does, without a "what" comment to explain it? Even "obvious" functions like Math.max have edge cases, and I rely on "what" comments to understand how those cases are handled.


WHY comments drastically improved my code.

I’ve similarly been paying much more attention to convey INTENT which I think is massively missing from computer science. The code might suck, but if the intent is clearly defined, it helps to refactor later.


I agree and first example in the article is all comments describing what the code is doing. There is not a single one explaining why.

Well I stopped reading further.


Agreed.

That's what works for me.


A lot of this I find to be kinda bad commenting, you've got to remember NOT to just directly comment on what the code is doing, but rather explaining why. With future updates your comments will eventually be wrong if they do anything other than give context.

If all your comment does is say what the code on the next line is doing, don't. Instead just try to make the next line more readable.


More and more I feel these huge comment blocks with the following inline comments sprinkled everywhere should really be a separate documentation file with its own structure where anyone not familiar enough can get up to speed, and understand the core principles behind the code written here.

For instance in the firecracker example, they don’t write 3 pages of block comments to explain the token bucket algorithm (or do they?), as anyone could go read the wikipedia article instead.

Comments living with the code have their benefits, but here I can’t imagine these huge comment blocks or algorithm related warnings get revised anytime outside of project wide refactoring.

What we also miss by having these comments inlined is the higher perspective of how it works with the other classes or how it fits in their running context. Anything that spans multiple class and/or methods makes it difficult to decide where to put the info and what scope it should have. People usually end up explaining the same things at 3 or 4 places in sublty different ways, which makes no one happy.

An exemple of that, this struct description

> /// TokenBucket provides a lower level interface to rate limiting with a /// configurable capacity, refill-rate and initial burst.

This comment is hinting at a sequence where ‘TokenBucket’ needs to be a configurable lower level interface. But what is that sequence, what needs it to be lower level, is there higher level interfaces ? I’ll sure end up spelunking the rest of the doc to get answers to that.


Exactly, I dont want to have to go in and refactor a bloated class into 4 classes and then have to go refactor the short story they wrote about it.


But the new class(es) needs to be documented regardless as the relation to the code has changed from the old.

Keeping documentation in another document would see this going out of sync over time until the document doesn't at all reflect how the code looks. Not updating anything at all will leave others clueless of why the change was needed 4 months later and might see the other classes grow to the same size. Thus, eventually the code base will once again be opaque unless you ask the nearest person that worked on it last.

A nicer way of handling the flow of the document would be to use a literate programming tool that allows for keeping code snippets in an order which makes sense for the documentation rather than sprinkling as comments between the code. That way, code review happens at the same time as documentation review with code being local to the text documenting it in a format which is easy to understand for both.


I like meta comments about the business or technical decisions behind the code. You can come up with clever method and variable names to help enlighten the reader, but nothing beats an intimate, first-hand account explaining the why.


I agree with others, code commentary shouldn't be a novel. Comments should only provide info that the code can't make easy to understand. Why it was done this way, what it's intended to do, complex or nonlinear interactions that aren't clear from the code. But adding things that make reading the code sound more like English isn't helpful, it's extra work and prone to getting out of sync.


The code should explain what it's doing (self documenting code) and tests should explain why it's doing it.

Comments tend to just become a place for misinformation or get disconnected from the actual logic.

Adding more comments sometimes doesn't clarify the situation, it just acts as a second source of truth.


> The code should explain what it's doing (self documenting code) and tests should explain why it's doing it.

I agree at a surface level, however:

  - this makes us assume that there are tests in the first place
  - this makes us assume that someone will read these tests instead of asking the developer
  - this makes us assume that these tests will be readable enough on their own to replace free form text
  - this makes us assume that these tests will even be able to contain a representation of why the code exists
  - personally, i have never seen a test that explains a business requirement with enough context not to miss out on details
  - i have also never seen code that encapsulates these requirements 1:1, without getting too mixed up in implementation details
I think that code definitely should describe what it's doing, tests should explain how it's doing it as far as possible (though this requires discipline to have loosely coupled and testable code, which isn't always the case) and that there still should be comments that explain the realities of needing technical solutions for people problems and all of the aspects that are carried with this.

Doing anything less feels like giving ourselves constraints that will lead to suboptimal results because the tooling and testing solutions aren't quite there yet. Maybe when frameworks and languages will become more expressive, we'll be able to live without comments. However, that day may not come anytime soon.

Nor will companies be interested in the overhead that writing extensive and good tests entail, nor will they want to wait for their engineers to figure out how to convey everything in computer code, as opposed to just dropping some comments into the codebase, right next to the actual code that they concern. Maybe when companies are ready for 50% of the development time to be spent on testing software, that won't be an issue. That day may also not come anytime soon.

As a possible counterpoint, look at RSpec, i think it's headed in the right direction: https://rspec.info/


> and tests should explain why it's doing it

I was going to say "So for performance "hacks" there should be a clean implementation that's benchmark against the current implementation for example?" as a way to disprove what you said, but while writing it I realized that it may actually be a pretty good idea.


Yes, there should pretty much always be this.

You've just written some messy unclear code for performance reasons, so it deserves more thorough tests than average.

And you've already written the clean version anyway, since you hopefully didn't write the optimized version until after you profiled the slow one. So it's not even significant extra work to turn that into your test suite.


This could also be a great way to check if your optimized version is still necessary with new versions of your language implementation. The more I think about it, the more it seems like a great idea.


I wouldn't generally throw benchmarking into the unit test process, because it can be brittle. What if something else clobbers the CPU on your test machine at just the wrong time. What you really want from your unit tests is for them to run quickly, so you run them as often as possible, and reliably, so any failure means something is wrong. I'd use the slow implementation just for checking correctness.

But separately, if you want to add a performance benchmarking process (great!), you have a reference baseline implementation for free.


Maybe not directly in the unit tests that you always run on your machine, but having them around could be a good idea.


Worth comparing: http://steve-yegge.blogspot.com/2008/02/portrait-of-n00b.htm... A "n00b" is scared of code, and wants as much help to understand it as possible. An "expert" already knows all they need to know to work on the code, and is more productive by putting as much code on the screen as they can.

Rather.. I think part of the point is, context matters. If the team is small, and the project is changing rapidly.. I think the needs and expectations for documentation are different than if the project or team is large and change is slow.


> an "expert"

Here's the problem: "Experts" (or "veterans" as the author calls them) are not created equal: There's the expert that already knows the code because she's worked on it countless times and already has a very detailed mental model of the code and there's the expert who's a very experienced programmer but has never seen the code before.

Even experts of the latter category appreciate good comments when they read code (surely not comments of the "n00b" kind portrayed in the article but comments nonetheless).


Time and active readers.

Want well documented (and tested and thought out) code?

Then give the coders people outside of their team / hierarchy who are going to check it / read it / use it.

And then give them time.

And repeat.


Don't sleep on /usr/share/doc/, /usr/share/info/, and /usr/share/man/ either. Code comments are not a replacement for real documentation that is written from the perspective of either a user or a developer. If you can somehow auto-generate meaningful documentation, that's great, but often whatever is auto-generated is only good in theory.


The view I've developed on docuemntation over time is that it should be addressed outside in. What I look for--and also create, where possible--is a high-level architecture diagram that depicts all major components of the system. Then comes some description of typical control flow through the system, either visually or via. textual description. For example, "the request hits the server, then it goes here, does that etc.).

What would be even better is to create a model of the system using something externally altogether, like TLA+. It need not be perfect right at the beginning, but over time, invariants can be put in place, and even possibly mechanically tested by a model checker. Just even the Next state disjunct would work great - it'll give me the superset of actions the system is capable of.

To me, well-commented functions are all well and good; but without a high level understanding of how everything fits and works together, I doubt it'd be very helpful. It's like winning the battle but losing the war.

I'm not a fan of relying on IDE-centric features either. What if my IDE is not yours?!


I had a lot of trouble reading that code. I kept getting distracted by the comments and missed the code. The code seems pretty self-explanatory. I have some changes I would make, but they really depend on performance vs depth (modulo the burst vs budget bug).

  // Hit the bucket bottom, let's auto-replenish and try again.
  self.auto_replenish();
Is ugly. Why repeat in English exactly what the code says? I don't even know what language this is, but I know it's replenishing the tokens in the bucket.

All of the comments in reduce are redundant, this is what I would write at the top, so that anyone using the method doesn't have to page through to figure out what it does, it also makes the "smells" more obvious:

  // reduce: attempt to consume tokens from bucket, 
  // starting with one_time_burst, overflowing to 
  // budget. 
  // (keep comment only if _expensive_):Performs
  // auto-replenish when
  // burst is emptied, and budget has insufficient
  // tokens.
  // Asserts size, but only on replenishment.
  // On OverConsumption, entire bucket is emptied. 
  // On Failure, only one_time_burst is emptied.
  // In all cases, tokens is reduced by amount 
  // fulfilled.
I don't understand why OverConsumption is different to Failure. Both will result in throttling by the caller. The reason for the difference should be documented.

I'm also curious why burst is consumed, then budget. I would expect _budget_ to be consumed first (with refill) with overflow into burst? My expectation is for burst and budget to have different refill schedules in auto_replenish, so using burst first would result in more failures by missing refill opportunities.

I'm also curious why the size assertion is only on requests that result in refills. Is it guaranteed that size = max(one_time_burst + budget)? Why care about the difference? Is hogging the tokens "bad"?

Then, I'd work to remove the need for those comments - particularly the need to explain what happens on error - which should be "nothing", or that the bucket is emptied of whatever is available, not something in between.

Finally I'd remove the mutation of tokens, returning either unfulfilled or fulfilled in the return value.


If we change the code, getting rid of if statements, it's easier to see what happens by removing if's.

I've changed BucketReduction to be a pair, and fixed the strange behavior on error. Swapping "fromBurst" and "auto_replenish" would reduce the number of throttling events. This solution does have more branches than the original, hence readability vs performance. It's still not exception safe, auto_replenish could throw and leave the state undefined.

    BucketReduction reduce(long tokens) {
        long fromBurst = min(tokens, burst);
        burst -= fromBurst;
        tokens -= fromBurst;

        if (tokens > budget) {
            auto_replenish();
        }

        long fromBudget = min(tokens, budget);
        budget -= fromBudget;
        tokens -= fromBudget;

        long totalConsumed = fromBurst + fromBudget;
        if (tokens == 0) {
            return new BucketReduction(Success, totalConsumed);
        }

        if (totalConsumed + tokens > size) {
            return new BucketReduction(OverConsumption, totalConsumed);
        }

        return new BucketReduction(Failure, totalConsumed);
    }


This is a great comment: instead of making generic arguments, you actually tried to show how to do it better. Thank you.

I don't find the comments in the original code distracting, but I do like your version better.

> I'm also curious why burst is consumed, then budget. I would expect _budget_ to be consumed first (with refill) with overflow into burst? My expectation is for burst and budget to have different refill schedules in auto_replenish, so using burst first would result in more failures by missing refill opportunities.

This behavior is documented in the public API [0], so whatever is the reason why it was chosen, I don't think it can ever be changed.

> I don't understand why OverConsumption is different to Failure. Both will result in throttling by the caller. The reason for the difference should be documented.

My understanding is this. If the number of tokens requested is greater than the remaining budget but less than the size of the bucket, the call is rejected and the caller is blocked until it has enough tokens. But if the number of requested tokens is greater than the size of the bucket, the caller will never have enough tokens. Instead of blocking the caller forever, the rate limiter lets the call go through, but then blocks the caller for a while to compensate for the over-consumption. Here's the handling [1]. I wish it was documented better.

[0] https://github.com/firecracker-microvm/firecracker/blob/fc2e...

[1] https://github.com/firecracker-microvm/firecracker/blob/2f92...


There are too many comments in there.

Comments are not the first tool people should try to reach because:

1. That's a lazy way to make the code clear, most code should be self-descriptive. If the teams are spamming comments it's likely to be a bad code base already.

2. There's no guarantee the comment is compatible with the code: It's natural language and there's no obvious way to test that. After versions of modification, the comments might still look like they make sense but that's not how the code work anymore.

Two alternatives:

1. Declarative programming: Try to leverage declarative programming styles more. For example, top-down style dynamic programming or memoization is always more readable and less error-prone than the bottom-up ones. You don't have to fully adopt functional programming or logic programming, but do take some inspiration from them.

2. Testing. Tests are like comments, they reflect some meaning of the code. Tests are not like comments, if the code breaks, they break.

Of course, comments are very valuable, but it's most valuable when applies in the right spots only.


When I see code with a lot of inline comments, I can’t help but think that a lot of them would make sense to implement as trace log messages instead. If you’re going to write all these messages documenting the flow of the program, why not give yourself the ability to flip a switch and output them somewhere?


I have a really similar motivation behind a project that aims to teach Jetpack Compose in the same fashion and funny enough, the name is also pretty similar to the title of this thread - Learn Jetpack Compose By Example. You can find it here - https://github.com/vinaygaba/Learn-Jetpack-Compose-By-Exampl...

There's a ton of comments in every example (even though they might be redundant across the examples) and the idea is that you should be able to learn this new framework by merely reading through the examples.


On my team fit/engineering values interview I always asking about benefits of autotests. People who said that tests written alongside the code are the best documentation are getting a mental "plus one point".


I have seen two extremes on this over the years... comment a lot, or do not comment at all. In the teams I work in, I would love to see a more moderate and practical view on the matter.

I get the whole document with your code, and practice that. But sometimes we end up with variable names that get very long and start leading not the most aesthetic code that gets formatted in a very vertical way.

Don't know the right answer, but suspect the two extreme views are not correct.


IMO, comments are needed when the code itself can't express something necessary to understand its overall function.

I've found that hence e.g. for rather complex statements where the code isn't lacking information, not commenting is the correct approach.

Whereas sometimes, when I'm forced to write exceptional code because of e.g. a workaround, a comment can nicely introduce my future self and other readers to the context.


Well documented code is respectful of the maintainer's time. Nobody wants to read a monologue full of noise to understand what is happening.

Comments are not monologues. Explain what the line you are commenting does, and perhaps elaborate why. But always acknoweledge: more words = more patience required = more maintenance cost.

Also, nobody will care about what your name is in 2 years. Do not make the code about you.


On the other hand, if something is missing in the comments, it could take days to find out exactly what you need. Maybe a "level of comments" like log levels could help here?


Making it easy to hide comments == making it easy to forget to update comments.


That's a good point. Maybe add a check if the function changed and the comment didn't to remind people? But comment are already easy to forget to update, as they are not code.


I find that doing a "headerdoc" kind of thing works for me.

I did write up a thing about it, but it's a long read, so folks don't usually read it: https://littlegreenviper.com/miscellany/leaving-a-legacy/

Has some pretty heavy-duty examples.


I went from my last job where every method and variable declaration had to be commented to my current job where there isn't a specific rule against comments in code but if you put in a PR with comments in it, you're going to get rejected after being laughed at.

I thought I would miss it, but I don't miss the documentation. Code doesn't lie.


But code doesn't always explain why it is the way it is.


It does a hell of a lot better job than comments.

I can't believe you've got me justifying this policy. But it does seem to work. shrugs


You really can’t believe someone wants you to justify a stupid rule and toxic workplace environment?


Sounds like a toxic environment.


Comments like

x = 4 # Sets x to 4

Are of course silly and I'd laugh at them too, but comments when used correctly are hugely valuable.

Even a stale comment can be helpful in giving you a sense of what people were trying to do.

Do you not use commit messages either? Do you have any documentation at all?


You're an idiot.


I was not very impressed by the token bucket example. Firstly, it is totally ridiculous to need to write your own implementation of gcd and the comment is silly and unnecessary.

Secondly, the implementation for the token bucket seems to just be bad. A simpler implementation is as follows:

- Input number of tokens and refill time.

- Output the constant state describing the limiter: the max number of tokens and the time to get a single token (= refill time / size, ignoring rounding issues which shouldn’t generally matter if your times are in nanos)

- The only mutable state is a single time instant: the time when the limiter becomes full

- to attempt to take a token, compute max(0,full_time - now) / new_token_time and if this is less than or equal to the size of the bucket minus the number of tokens you want, you may withdraw and set full_time := max(full_time, now) + tokens_to_withdraw * new_token_time

If that division is concerning then you should either attempt to withdraw from the rate limiter less frequently or you should not be using one at all.

The rest of the code is about scheduling events to happen when there are free tokens in the ratelimiter. This is a different problem and also over-complicated. You should either poll and retry periodically with your most important event (rather than whatever event you thought was most important when the bucket became empty) though you then want to be careful about stale tasks never running, or you should just have a queue of pairs (Task, cost) where cost is an int. after withdrawing tokens from the ratelimiter or when the queue transitions from empty to non-empty, peek the top item and set a timer for rl.full_time - rl.new_token_time * (rl.size - cost) and when the timer fires (or immediately if the time was in the past) you can dequeue, withdraw cost tokens, and run that top task (then repeat.)


The first has an enormous amount of visual noise and it's definitely over commented.

The second one is ok, but please don't follow the first one as a good example, it takes away the ability to scan the code


> The second one is ok, but please don't follow the first one as a good example, it takes away the ability to scan the code.

IMO if at all, this type of comment ("Why was X designed this way") should go to the very bottom of the file (maybe with a very short comment at the top of the file referencing it), so it doesn't bother anyone who works on the code on a regular basis. And one could (should) also decrease its verbosity.

EDIT: Confused first and second example


A lot of code files starts with lengthy copyright notices, and no one seems to mind. I find comments at the top very easy to ignore if I work on that code regularly. Also, I don't think I've ever seen comments at the bottom of a file!

I wonder if Protobuf wasn't open-sourced by Google, that comment would've been replaced with a link to a design doc. Google docs weren't a thing in early 2000s when Protobuf was originally written though.


> A lot of code files starts with lengthy copyright notices, and no one seems to mind.

Hmm, maybe I'm the odd one out but I actually find them quite annoying if they're longer than just a few lines.

In any case: People often just scroll past and ignore lengthy comments at the top (maybe precisely because they're used to copyright notices), so I think a short comment at the top like "For design considerations and reasons this code has been written the way it is, see the very bottom of this file" will attract much more attention.

> Also, I don't think I've ever seen comments at the bottom of a file!

True, but how often do you see code comments about design considerations in the first place?


I agree that some comments in the first example are unnecessary, e.g. the one pointed out in [0], but I find comments like that at worst useless, but not harmful. They have a potential to become harmful if someone updates the code but doesn't bother to update the comment, but I can't see how it takes away the ability to scan the code.

[0]: https://news.ycombinator.com/item?id=28416777


It’s tricky, I think I remember reading a paper by Microsoft research (I’m really sorry; looking for a source) that showed a correlation between bugs and all lines of code including comments.

So if that’s true redundant comments do hurt you.

Anecdotally in dynamic languages incorrect comments related to types seem to be a constant pain for me. They’re unverified and so a second source of truth that often wrong and can’t be trusted.


> correlation

> So if that’s true redundant comments do hurt you

Anyway, even if there was causation proven it might need to be weighed against other priorities.


The "first things first" comment was bad, but I'd rather have it + the other comments than have none of them.


IMHO writing a well-documented code requires a discipline and a set of good habbits.

One of those habbits is to update documentation (tests/comments) when software is evolving.

Unfortunately it is a very rare habbit it seems.


If you must dump a 5-paragraph essay about why something exists, you put it in the git commit message so when you git blame those lines, you see the history of the code.

These examples are a fucking nightmare.


Code Catalog has such great potential. I really hope to see it succeed. Thanks to all of the contributors who are making it possible.


I like the idea of codecatalog. Would be nice to see some functional Kotlin examples that go beyond list processing.


Thanks! If you know some good examples, please consider proposing them: https://github.com/ainzzorl/goodcode/issues/new?assignees=&l...


i've found that well written code + useful comments + comment with link to design document works pretty well.

i got an email about code i'd written 10 years ago...thanking me for taking the time to document everything.


I wish there was an option for dark mode or something.


Way to much




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: