Personally I found most of those examples to be not great. I think they would be improved by better names overall, well-named private methods, and most importantly well-structured companion test classes proving the behavior.
I really don't want to read another developer's comments on their mental state while I'm feverishly debugging their code during a 3am page.
I've lost count of the number of times I have been misled by an outdated comment. These days I open git blame to check whether the comments predate the code. Tests, on the other hand, almost never lie.
Readmes or comments explaining why something exists vs what it allegedly does are generally much better. If you can’t write expressive code why would you assume your English is much better?
The problem with all discussions of good commenting style is that it's like debating good footwear. Good for what? Office? Hiking in snow? Walking on the beach? Except that we have different footwear for different uses but only one set of comments.
Having such a limited commenting syntax is like assuming one permanent type of footwear and debating which one would be best. I'd much rather have a smarter commenting system.
I'm not talking about supplementary docs stashed elsewhere to gather dust. I'd like to have expandable comments right where they are needed but usable differently for a glance at familiar code or puzzling through unfamiliar code. Something very brief that doesn't clutter the code, a couple of words or no words at all, with an expansion arrow to something more detailed, expanding to something fully documented with links to everything.
And comments with scope (this comment applies to this line, this one to the function, this to a group of functions...) that actively use the scope. By active I mean that if you make any edits to code in the scope of a comment, that comment changes color or some other mechanism similar to a Git merge conflict: You changed the code without changing the comment, so check and make sure the comment says what you want and click OK. You don't have to fix the comments immediately for each edit. Make a few changes, test, make sure it's working, but when it is, you can see at a glance which comments need to be checked.
The general idea is instead of debating The Best Way to Comment, consider ways to improve the commenting system itself.
> (this comment applies to this line, this one to the function, this to a group of functions...)
This is probably the major reason I much prefer introducing new variables and functions solely so I can give them descriptive names over using comments. Comments aren't scoped so they're often ambiguous and imprecise compared to code, which also contributes to them becoming out-of-sync with the code over time. Comments above a function definition are generally better than comments inside a function for example as it's clearer what the scope is.
>The number of times I have been misled by an outdated comment
IMO that's a problem with code review, not with leaving comments. Just like tests need to be changed during refactoring, documentation and comments should also be cleaned up as part of the change.
I have found those as well, but I place the blame on the person who made the subsequent change and ignored them.
There aren't many code review platforms that are great at showing you relevant comments in a diff, because they are often in a very different part of the file, so I kind of think it's hopeless to catch this sort of thing in a code review on a large code base.
I was mostly thinking about how small changes can sometimes touch a larger quantity of files on moderate or large code bases, so a bit more things to wade through when doing a review, but point taken.
Comments are "reached for rarely and last". Whatever shortcomings comments have pale in comparison to being faced with and trying to figure out undocumented or too sparsely commented code, which, in my long experience, has been an order of magnitude more prevalent than sufficiently documented/readable code. Heck, even a wrong comment is sometimes preferable to no comment if it at least gives me a clue as to what the purpose of that section of code was at some point in its evolution.
I personally strive for code that can be read as a sentence, especially in complicated sections of code.
This sometimes leads to weird intermediate variables which might look superfluous, e.g. intermediates like:
foo_has_at_least_1 = len(foo) >= 1
(not a great example; usually it's a bit more semantic than that) but later code using those intermediates reads as English,
which can reduce cognitive load when reading the code and obviate comments, and preconditions are more readily verified.
It’s literate programming and I prefer that to long comment blocks like those in some
of the examples. All that said, it can be hard to get coworkers into such a habit, especially those who feel brevity and abbreviation is a virtue.
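A small TypeScript sketch of the same idea (the order-shipping scenario is invented for illustration):

```typescript
// Hypothetical validation code: the intermediate variable exists
// purely so the final condition reads like a sentence.
function canShipOrder(items: string[], addressIsVerified: boolean): boolean {
  const orderHasAtLeastOneItem = items.length >= 1;
  // Reads as English: ship only if the order has at least one
  // item and the shipping address is verified.
  return orderHasAtLeastOneItem && addressIsVerified;
}
```

The precondition is stated by the name itself, so no comment is needed where the check happens.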
The problem comes not with what the code does but why the code exists in the first place.
The only time I ever feel the need to leave comments is to explain code that doesn't appear to be needed or isn't obvious why it exists. Usually this comes from agreed upon technical debt, short notice requirements with tight deadlines, or some other form of tradeoff.
"We're doing a simple check and throwing an exception here since we have work to introduce a more comprehensive permission system in xyz epic"
or something like
"There's a complicated race condition in xyz so we're doing a simple sleep here since the proper fix involves introducing a distributed lock"
Sure, those are cases you want to avoid, but unfortunately you occasionally end up there anyway, and leaving a short explanation about a compromise is better than ambiguous code that appears to serve no purpose.
I don't think that holds in the examples, or in a lot of what I have seen in practice once comments become the norm. Most classes should not need a comment in them at all. Most methods should be self-explanatory. A comment detailing every line of code is a huge smell.
I agree, but git and most code reviewing tools I've used only show a few lines of surrounding context so the reviewers might not even realize there's a comment further up that needs updating.
Generally speaking, for all but the smallest of changes you should never only review the diff, but pop open the actual file on that branch (GitHub makes this fairly easy, I can't speak to other tooling) and consider the change in the context of the surrounding code. Does it contradict comments? Needlessly duplicate code from elsewhere in the file instead of creating a single function to be used in both places? Shadow variables confusingly? Add yet another 5 lines to a 100 line function instead of leaving the code better than you found it?
You can't really judge any of this if you're just eyeballing a diff. Diffs are just the starting point for a review, and while tooling could make this easier, the full file is always just a git checkout away.
In a vacuum I like them and when I bump into a piece of 10 year old code commented like this it is great.
There is one caveat, however: they have to be meticulously maintained and always updated to reflect every change made to the code. It is a huge caveat because more often than not the comments are outdated and no longer reflect the actual code, and at that point they are detrimental.
That's why Knuth had the idea of creating literate programs where documentation and code were maintained in a single block. Unfortunately, this idea was never adopted at scale.
Yeah, I think there's a fundamental problem with the implementation: it works well for small, single-file programs where the code can be written in a chronologically linear manner. But then there's no need for reordering logic, so much of the more complex machinery of weaving is redundant. Scripts, for example, are pretty easy to set up as markdown files, which works great (particularly in a project that requires a set of helper scripts whose "why" needs communicating; having a set of markdown files in a folder is IME much more ergonomic for other devs). Anything larger and it gets increasingly fiddly and (again IME) not worth it.
In my experience, it's much easier to get just the right level of abstraction for quick reader comprehension in natural English, compared to any programming language.
I have to agree. Unit tests are some of the most helpful things I’ve used whilst refactoring the Libreoffice vcl module. In fact, I’m now at the stage where if I only move a class into a new header I’m writing a test that covers that class.
Yup, the rate limiter comments are unhelpful. A top level comment explaining the idea is fine (and very useful) but code should largely be self documenting.
To me, comments in code are like descriptions in a novel.
I would agree that comments which unnecessarily describe what is to come are bad, in the same way that a novelist should prefer to show, not tell, unless there is a compelling reason for the reader to keep that description in mind as the immediate events unfold.
But to require almost all comments gone in favour of well-named code and external docs feels wrong too, in the same way that one would probably not like to read a book with no chapter headings, no visuals, simply describing the story in a dry (but informative) language, and relying on cliffnotes for actual context.
And if you say code is more like a technical report than a novel, I will disagree. Good code is always a pleasure to read, and typically for similar reasons that a good novel is pleasurable.
Agreed, I think that the code itself should be readable through better names, with the comments being a bonus rather than critical to the understanding of the code.
Speaking of source code comments, antirez (of Redis fame) wrote a fantastic article[0] about that topic some time ago and I still recommend it to colleagues whenever they make the hollow statement that "code should and can be intelligible on its own, without any comments".
I love this line: "Comments are rubber duck debugging on steroids, except you are not talking with a rubber duck, but with the future reader of the code, which is more intimidating than a rubber duck, and can use Twitter."
I've forgotten who first wrote this, so I can't credit them, but there's a saying that if someone can't write clear, clean, and understandable code, why does anyone expect them to be able to write clear, clean, and understandable English (or whatever language)?
It's quite a bit easier to write something the compiler or runtime can understand: there's no need for meaningful names, consistent structure, or conceptual coherence. As long as it builds and passes whatever tests exist, the computer will accept it.
> if someone can't write clear, clean, and understandable code, why does anyone expect them to be able to write clear, clean, and understandable English
Isn't the answer to that obvious? The code is significantly constrained in form by having to be executable. The comments have no such restriction.
While surely not the source of the saying, antirez argues in a similar direction:
> You may think that writing comments is a lesser noble form of work. After all you can code! However consider this: code is a set of statement and function calls, or whatever your programming paradigm is. Sometimes such statements do not make much sense, honestly, if the code is not good. Comments require always to have some design process ongoing, and to understand the code you are writing in a deeper sense. On top of that, in order to write good comments, you have to develop your writing skills. The same writing skills will assist you writing emails, documentation, design documents, blog posts, and commit messages.
In any case, you're making a good point, though I would argue that the reverse of the saying (with the modification that "expectations" get replaced with "trust") also holds:
If someone is not able to write intelligible, stringent English, I wouldn't trust them to write good code, either, because writing good code requires similar qualities (good stringent structure, empathizing with the future reader of your code etc.) and, as antirez argues convincingly, it also requires writing comments (which are in plain English).
> if someone can't write clear, clean, and understandable code, why does anyone expect them to be able to write clear, clean, and understandable English (or whatever language)?
The comments don't say the same thing as the code. In the comments, the author might tell us (albeit not clearly or in proper English) what they want to achieve, or why they're doing it this way.
If someone takes a dozen lines of convoluted code to express what can be done in 2-3 lines concisely, the explanation isn't going to help. Think of the worst mess of if/then/else clauses you've seen, or someone who switches on a type instead of using polymorphism. Even if they can explain it well enough that you can follow it, the comments aren't going to improve the understanding much. They are more likely to add cognitive load and make it harder to grasp the intent.
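As a rough illustration of that last point, here is a hypothetical Shape example: each branch of a type-switch would need its own explanatory comment, while the polymorphic version documents itself because each type carries its own behavior.

```typescript
// Hypothetical example: each shape knows its own area, so callers
// never switch on a type tag and there is nothing to explain.
interface Shape {
  area(): number;
}

class Circle implements Shape {
  constructor(private radius: number) {}
  area(): number {
    return Math.PI * this.radius * this.radius;
  }
}

class Square implements Shape {
  constructor(private side: number) {}
  area(): number {
    return this.side * this.side;
  }
}

// No if/else ladder over kinds, hence no per-branch comments.
function totalArea(shapes: Shape[]): number {
  return shapes.reduce((sum, s) => sum + s.area(), 0);
}
```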
I'm not sure we agree. But I guess the question comes down to what you and I mean by "intelligible".
antirez provides several examples where the code without the comments would be much harder to understand and/or maintain. Whenever this is the case, I would argue that comments are a must and without them the code is at least barely intelligible.
Very often it is just not possible to truly understand code without comments (or other form of documentation). This becomes obvious if you realize that code by itself can only tell you what is being done, but not why.
Maybe it's a buggy HTTPS redirect in GP's browser? Due to HSTS or something?
The HTTP version of your URL works fine. The HTTPS version has an expired cert. If I click past that, it redirects to redis.io, then gives a 404.
I know antirez is a smart guy, but if this is some kind of "I'm taking a bold stance against HTTPS" then I'd rather read the archive.org link with functioning HTTPS. My threat model prefers trusting archive.org to correctly reproduce the blog, over trusting everyone between me and antirez.com.
Wow, you just opened my eyes – I really thought he had deleted his blog! Must be the HTTPS Everywhere extension I'm using – though I really don't understand why it redirects me to redis.io just because antirez's TLS certificate is for redis.io, not antirez.com. I would really prefer a clear error message here.
For the first example with conditions, I much prefer rolling the "why" of the comments into boolean variable names where possible e.g.
// We still have burst budget for *all* tokens requests.
if self.one_time_burst >= tokens {
...
} else {
// We still have burst budget for *some* of the tokens requests.
becomes something like (I'm missing context but you get the idea):
let enough_budget_for_all_token_requests = self.one_time_burst >= tokens;
if enough_budget_for_all_token_requests {
    ...
} else {
    ...
}
I don't see this often though. I see the commenting pattern a lot where the comment usually duplicates the "what" of the conditional code, and it's often ambiguous if the comment is talking about the condition or a whole branch.
I think a descriptive variable makes the code easier to skim, and it's more precise and less likely to become inaccurate. For a big complex conditional, you can break it up into several well-named boolean variables you compose together (but obviously comment the bits that can't be covered by a concise variable name).
Long variable names mean you perceive your simple code as being too complicated. It's an if statement. I would refactor your long variable name to something more compact like enough_budget.
IMO that should be in a comment, instead of duplicating that information every time you mention the variable. (This applies more for local variables in short blocks than for, say, global variables.) At some point the length of the name makes everything around it harder to read, since it’s been pushed so far apart.
On the other hand, token bucket implementations are hard to understand and hard to debug. And an ambiguous variable name can cause trouble because some devs might think it "obviously" refers to enough budget for some other slice of requests.
I think my preferred solution here is to extract the condition into a boolean function with a long name like HasBudgetForAllRequests. This solves the long variable name issue and also gives you a useful place to add to your budget computation logic if you need to.
The words Has and For are unnecessary in that name.
Think of code as mathematical notation. Some degree of familiarity with the domain is expected. You can use shorthand symbols or words as variable names.
Otherwise you're trying to write English sentences into compact notation and that's just noise. Hard to read.
Try reading the A* algorithm when the variables have all been expanded out and you'll find it's impossible to see the forest for the trees. Sometimes G is the correct name for the collection of the cheapest scores known so far: it's compact, and is best explained with a comment.
I prefer this too, but sometimes poor type-narrowing support hinders it. For example, TypeScript only recently added support for aliased type-narrowing.
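For reference, a sketch of the aliased narrowing that TypeScript 4.4 introduced (the `describe` function is a made-up example):

```typescript
function describe(value: string | number): string {
  const isString = typeof value === "string";
  if (isString) {
    // `value` is narrowed to string here through the alias; in older
    // TypeScript the typeof check had to appear inline in the `if`.
    return `string of length ${value.length}`;
  }
  // Control flow narrows `value` to number here.
  return `number ${value.toFixed(2)}`;
}
```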
This. And the extra documentation is provided by tests, which are living code and evolve together with the production code.
That should be enough to understand the code.
The top of the file comment is good and well received. At the same time, you could have a wiki page with that info, where you can also have diagrams etc.
These days I write my code as comments first and then write the code.
But for example this one
// If either token bucket capacity or refill time is 0, disable limiting.
if size == 0 || complete_refill_time_ms == 0 {
return None;
}
The comment is the same as the code, so it is pointless. Instead I might say under what circumstances I expect them to be zero, or why I'm disabling limiting when they are zero.
This one is great; though at first glance I can't relate the actual code to the comment, at least I understand why the code is there.
// We still have burst budget for *some* of the tokens requests.
// The tokens left unfulfilled will be consumed from current `self.budget`.
tokens -= self.one_time_burst;
self.one_time_burst = 0;
In defense of the first comment, when first approaching a body of code like this, it's not always going to be clear what returning None is going to mean in any given method.
Also, one thing I think doesn't get mentioned enough is the use of comments as anchors for text search. Having "disable limiting" expressed in English that way makes it easier for newcomers to explore the codebase.
Good point. Also the phrases in the initial comment "token bucket capacity" and "refill time" may serve as hints and can be used as entry points to searching external documentation.
It’s hard to judge out of context. IMO None meaning “disable limiting” should be a docstring/comment at the method level, describing what its return values mean, not inline next to this code.
I don't think the comment is the same as the code. If "size" was called "token_bucket_capacity" I would agree but since it's not, if you remove the comment you lose some context.
You hit the nail on the head here. This is the single most important reason for having comments in code (aside from just being a part of a coding standard).
The Protobuf documentation made me cringe. Every character of its "personality" adds a cognitive burden for the reader and a maintenance burden for anyone who has to update it. If a developer wants to be cute with the same brevity and clarity that the comments would have otherwise, fine. Otherwise it's self-indulgent performative trash that should be rejected in review.
I just prefer nice, long, descriptive function names.
I was working on an unfamiliar codebase yesterday and had to work with the postMessage function. It was React, so one might assume that it sends the message to the backend, which was my initial assumption. Nope. It posted it to the page. Didn't help that another utility had another postMessage that was about what to do after the message was sent.
Better than writing really long descriptive function names is to not get carried away, but to write function names that are chosen carefully and convey intent or description. I worked on some code bases using long descriptive function names (inherited from another dev) and it wasn't always useful. It was overkill. The names were either too generic, too descriptive (redundant in half the cases), or simply tiring to handle as long sausage names. When you have to scroll right to read a name, it becomes a problem.
The takeaway is that merely writing very long descriptive names isn't a silver bullet and can cause harm despite good intentions.
I suggest going back after a few days for a refactoring pass (renaming functions or variables) and seeing how it reads to you. Then repeat the process a few days later when you're even more detached from that code.
In many cases self documenting code is going to help you remember the model you had in your head when building it. And if done well it will help others as well, your code will end up not giving headaches to future maintainers.
Even something simple like "if the function starts with is, it will return a Boolean" has helped me organize things that return status codes and have side effects. Small thing, but I like it.
I worked with a guy who wrote a CPU simulator and exceeded the MSVC 256-char function name limit. It's hard when you like 80/100 col code and a function name is longer than that.
While I agree with the general sentiment, the first example in the article IMO demonstrates how "what" comments make understanding code easier. When I first read that code, I was pleasantly surprised how easy it was to understand everything going on there, even though I had no familiarity with the code base. This is rare for non-trivial projects. I'm not sure if it could be made as clear without those comments.
Agree; could be very helpful if somebody not fam with token buckets arrives at this code for some reason and needs to make sense of it. Also, I think one person's "what" is another person's "why":
> if self.one_time_burst > 0 {
What is this code doing? It's setting up a flow control statement. It's dereferencing the "one_time_burst" property on the object self and comparing the value to 0. If the value is greater than 0, it will execute the following code block.
How do you know what a function does, without a "what" comment to explain it? Even "obvious" functions like Math.max have edge cases, and I rely on "what" comments to understand how those cases are handled.
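For instance, two real Math.max edge cases that the call site alone would not make obvious:

```typescript
// Edge cases of Math.max that only documentation reveals:
const noArgs = Math.max();        // -Infinity, not 0 and not an error
const withNaN = Math.max(1, NaN); // NaN: any NaN argument poisons the result
```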
I’ve similarly been paying much more attention to convey INTENT which I think is massively missing from computer science. The code might suck, but if the intent is clearly defined, it helps to refactor later.
A lot of this I find to be kinda bad commenting; you've got to remember NOT to just directly comment on what the code is doing, but rather to explain why. With future updates your comments will eventually be wrong if they do anything other than give context.
If all your comment does is say what the code on the next line is doing, don't. Instead just try to make the next line more readable.
More and more I feel these huge comment blocks with the following inline comments sprinkled everywhere should really be a separate documentation file with its own structure where anyone not familiar enough can get up to speed, and understand the core principles behind the code written here.
For instance in the firecracker example, they don’t write 3 pages of block comments to explain the token bucket algorithm (or do they?), as anyone could go read the wikipedia article instead.
Comments living with the code have their benefits, but here I can’t imagine these huge comment blocks or algorithm related warnings get revised anytime outside of project wide refactoring.
What we also miss by having these comments inlined is the higher perspective of how it works with the other classes or how it fits in their running context. Anything that spans multiple classes and/or methods makes it difficult to decide where to put the info and what scope it should have. People usually end up explaining the same things in 3 or 4 places in subtly different ways, which makes no one happy.
An example of that: this struct description
> /// TokenBucket provides a lower level interface to rate limiting with a
/// configurable capacity, refill-rate and initial burst.
This comment hints at a context where ‘TokenBucket’ needs to be a configurable lower-level interface. But what is that context, what needs it to be lower level, are there higher-level interfaces? I'll surely end up spelunking the rest of the doc to get answers to that.
But the new class(es) needs to be documented regardless as the relation to the code has changed from the old.
Keeping documentation in another document would see this going out of sync over time until the document doesn't at all reflect how the code looks. Not updating anything at all will leave others clueless of why the change was needed 4 months later and might see the other classes grow to the same size. Thus, eventually the code base will once again be opaque unless you ask the nearest person that worked on it last.
A nicer way of handling the flow of the document would be to use a literate programming tool that allows for keeping code snippets in an order which makes sense for the documentation rather than sprinkling as comments between the code. That way, code review happens at the same time as documentation review with code being local to the text documenting it in a format which is easy to understand for both.
I like meta comments about the business or technical decisions behind the code. You can come up with clever method and variable names to help enlighten the reader, but nothing beats an intimate, first-hand account explaining the why.
I agree with others, code commentary shouldn't be a novel. Comments should only provide info that the code can't make easy to understand. Why it was done this way, what it's intended to do, complex or nonlinear interactions that aren't clear from the code. But adding things that make reading the code sound more like English isn't helpful, it's extra work and prone to getting out of sync.
> The code should explain what it's doing (self documenting code) and tests should explain why it's doing it.
I agree at a surface level, however:
- this makes us assume that there are tests in the first place
- this makes us assume that someone will read these tests instead of asking the developer
- this makes us assume that these tests will be readable enough on their own to replace free form text
- this makes us assume that these tests will even be able to contain a representation of why the code exists
- personally, I have never seen a test that explains a business requirement with enough context not to miss out on details
- I have also never seen code that encapsulates these requirements 1:1, without getting too mixed up in implementation details
I think that code definitely should describe what it's doing, tests should explain how it's doing it as far as possible (though this requires discipline to have loosely coupled and testable code, which isn't always the case) and that there still should be comments that explain the realities of needing technical solutions for people problems and all of the aspects that are carried with this.
Doing anything less feels like giving ourselves constraints that will lead to suboptimal results because the tooling and testing solutions aren't quite there yet. Maybe when frameworks and languages will become more expressive, we'll be able to live without comments. However, that day may not come anytime soon.
Nor will companies be interested in the overhead that writing extensive and good tests entail, nor will they want to wait for their engineers to figure out how to convey everything in computer code, as opposed to just dropping some comments into the codebase, right next to the actual code that they concern. Maybe when companies are ready for 50% of the development time to be spent on testing software, that won't be an issue. That day may also not come anytime soon.
As a possible counterpoint, look at RSpec, i think it's headed in the right direction: https://rspec.info/
I was going to say "So for performance "hacks" there should be a clean implementation that's benchmark against the current implementation for example?" as a way to disprove what you said, but while writing it I realized that it may actually be a pretty good idea.
You've just written some messy unclear code for performance reasons, so it deserves more thorough tests than average.
And you've already written the clean version anyway, since you hopefully didn't write the optimized version until after you profiled the slow one. So it's not even significant extra work to turn that into your test suite.
This could also be a great way to check if your optimized version is still necessary with new versions of your language implementation. The more I think about it, the more it seems like a great idea.
I wouldn't generally throw benchmarking into the unit test process, because it can be brittle. What if something else clobbers the CPU on your test machine at just the wrong time?
What you really want from your unit tests is for them to run quickly, so you run them as often as possible, and reliably, so any failure means something is wrong. I'd use the slow implementation just for checking correctness.
But separately, if you want to add a performance benchmarking process (great!), you have a reference baseline implementation for free.
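A toy sketch of the oracle pattern (the "optimization" here is a trivial closed-form formula, standing in for genuinely messy optimized code):

```typescript
// Reference (clear but slow) implementation used as the oracle.
function sumOfSquaresNaive(n: number): number {
  let total = 0;
  for (let i = 1; i <= n; i++) total += i * i;
  return total;
}

// Optimized version; its correctness is exactly what the oracle
// test protects across refactors and language upgrades.
function sumOfSquaresFast(n: number): number {
  return (n * (n + 1) * (2 * n + 1)) / 6;
}

// Correctness check: the slow version defines "right". A separate
// benchmark (not a unit test) would compare their speed.
for (const n of [0, 1, 10, 1000]) {
  if (sumOfSquaresNaive(n) !== sumOfSquaresFast(n)) {
    throw new Error(`mismatch at n=${n}`);
  }
}
```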
Worth comparing: http://steve-yegge.blogspot.com/2008/02/portrait-of-n00b.htm... A "n00b" is scared of code, and wants as much help to understand it as possible. An "expert" already knows all they need to know to work on the code, and is more productive by putting as much code on the screen as they can.
Rather, I think part of the point is that context matters. If the team is small and the project is changing rapidly, the needs and expectations for documentation are different than if the project or team is large and change is slow.
Here's the problem: "Experts" (or "veterans" as the author calls them) are not created equal: There's the expert that already knows the code because she's worked on it countless times and already has a very detailed mental model of the code and there's the expert who's a very experienced programmer but has never seen the code before.
Even experts of the latter category appreciate good comments when they read code (surely not comments of the "n00b" kind portrayed in the article but comments nonetheless).
Don't sleep on /usr/share/doc/, /usr/share/info/, and /usr/share/man/ either. Code comments are not a replacement for real documentation that is written from the perspective of either a user or a developer. If you can somehow auto-generate meaningful documentation, that's great, but often whatever is auto-generated is only good in theory.
The view I've developed on documentation over time is that it should be addressed outside in. What I look for--and also create, where possible--is a high-level architecture diagram that depicts all major components of the system. Then comes some description of typical control flow through the system, either visually or via textual description, for example "the request hits the server, then it goes here, does that, etc."
What would be even better is to create a model of the system using something externally altogether, like TLA+. It need not be perfect right at the beginning, but over time, invariants can be put in place, and even possibly mechanically tested by a model checker. Just even the Next state disjunct would work great - it'll give me the superset of actions the system is capable of.
To me, well-commented functions are all well and good; but without a high level understanding of how everything fits and works together, I doubt it'd be very helpful. It's like winning the battle but losing the war.
I'm not a fan of relying on IDE-centric features either. What if my IDE is not yours?!
I had a lot of trouble reading that code. I kept getting distracted by the comments and missed the code. The code seems pretty self-explanatory. I have some changes I would make, but they really depend on performance vs depth (modulo the burst vs budget bug).
// Hit the bucket bottom, let's auto-replenish and try again.
self.auto_replenish();
is ugly. Why repeat in English exactly what the code says? I don't even know what language this is, but I know it's replenishing the tokens in the bucket.
All of the comments in reduce are redundant. This is what I would write at the top, so that anyone using the method doesn't have to page through to figure out what it does; it also makes the "smells" more obvious:
// reduce: attempt to consume tokens from bucket,
// starting with one_time_burst, overflowing to
// budget.
// (Keep this line only if _expensive_:) Performs
// auto-replenish when burst is emptied and budget
// has insufficient tokens.
// Asserts size, but only on replenishment.
// On OverConsumption, entire bucket is emptied.
// On Failure, only one_time_burst is emptied.
// In all cases, tokens is reduced by the amount
// fulfilled.
I don't understand why OverConsumption is different to Failure. Both will result in throttling by the caller. The reason for the difference should be documented.
I'm also curious why burst is consumed, then budget. I would expect _budget_ to be consumed first (with refill) with overflow into burst? My expectation is for burst and budget to have different refill schedules in auto_replenish, so using burst first would result in more failures by missing refill opportunities.
I'm also curious why the size assertion is only on requests that result in refills. Is it guaranteed that size = max(one_time_burst + budget)? Why care about the difference? Is hogging the tokens "bad"?
Then, I'd work to remove the need for those comments - particularly the need to explain what happens on error - which should be "nothing", or that the bucket is emptied of whatever is available, not something in between.
Finally I'd remove the mutation of tokens, returning either unfulfilled or fulfilled in the return value.
If we restructure the code to get rid of the nested if statements, it's easier to see what happens.
I've changed BucketReduction to be a pair and fixed the strange behavior on error. Swapping "fromBurst" and "auto_replenish" would reduce the number of throttling events. This solution does have more branches than the original, hence the readability-vs-performance tradeoff. It's still not exception safe: auto_replenish could throw and leave the state undefined.
BucketReduction reduce(long tokens) {
long fromBurst = Math.min(tokens, burst);
burst -= fromBurst;
tokens -= fromBurst;
if (tokens > budget) {
auto_replenish();
}
long fromBudget = Math.min(tokens, budget);
budget -= fromBudget;
tokens -= fromBudget;
long totalConsumed = fromBurst + fromBudget;
if (tokens == 0) {
return new BucketReduction(Success, totalConsumed);
}
if (totalConsumed + tokens > size) {
return new BucketReduction(OverConsumption, totalConsumed);
}
return new BucketReduction(Failure, totalConsumed);
}
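For what it's worth, here's that sketch wrapped in a minimal self-contained harness, in the companion-test spirit mentioned upthread. The Status names come from the snippet above; the auto_replenish refill policy is a stand-in I invented so all three outcomes are reachable, not the real one.

```java
// Minimal harness around the reduce sketch; auto_replenish is a stand-in.
enum Status { Success, OverConsumption, Failure }

class BucketReduction {
    final Status status;
    final long consumed;
    BucketReduction(Status status, long consumed) {
        this.status = status;
        this.consumed = consumed;
    }
}

class Bucket {
    long burst, budget;
    final long size;

    Bucket(long burst, long budget, long size) {
        this.burst = burst;
        this.budget = budget;
        this.size = size;
    }

    // Stand-in refill policy: add a couple of tokens per replenish call.
    void auto_replenish() { budget += 2; }

    BucketReduction reduce(long tokens) {
        long fromBurst = Math.min(tokens, burst);
        burst -= fromBurst;
        tokens -= fromBurst;
        if (tokens > budget) {
            auto_replenish();
        }
        long fromBudget = Math.min(tokens, budget);
        budget -= fromBudget;
        tokens -= fromBudget;
        long totalConsumed = fromBurst + fromBudget;
        if (tokens == 0) {
            return new BucketReduction(Status.Success, totalConsumed);
        }
        if (totalConsumed + tokens > size) {
            return new BucketReduction(Status.OverConsumption, totalConsumed);
        }
        return new BucketReduction(Status.Failure, totalConsumed);
    }
}
```

With a real refill policy the Failure case depends on how much a replenish adds; the stand-in deliberately adds only a couple of tokens so Success, OverConsumption, and Failure can all be exercised by a test.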
This is a great comment: instead of making generic arguments, you actually tried to show how to do it better. Thank you.
I don't find the comments in the original code distracting, but I do like your version better.
> I'm also curious why burst is consumed, then budget. I would expect _budget_ to be consumed first (with refill) with overflow into burst? My expectation is for burst and budget to have different refill schedules in auto_replenish, so using burst first would result in more failures by missing refill opportunities.
This behavior is documented in the public API [0], so whatever is the reason why it was chosen, I don't think it can ever be changed.
> I don't understand why OverConsumption is different to Failure. Both will result in throttling by the caller. The reason for the difference should be documented.
My understanding is this. If the number of tokens requested is greater than the remaining budget but less than the size of the bucket, the call is rejected and the caller is blocked until it has enough tokens. But if the number of requested tokens is greater than the size of the bucket, the caller will never have enough tokens. Instead of blocking the caller forever, the rate limiter lets the call go through, but then blocks the caller for a while to compensate for the over-consumption. Here's the handling [1]. I wish it was documented better.
Comments are not the first tool people should try to reach because:
1. They're a lazy way to make the code clear; most code should be self-descriptive. If a team is spamming comments, it's likely a bad code base already.
2. There's no guarantee the comment matches the code: it's natural language, and there's no obvious way to test that. After several rounds of modification, the comments might still look like they make sense, but that's no longer how the code works.
Two alternatives:
1. Declarative programming: Try to leverage declarative programming styles more. For example, top-down style dynamic programming or memoization is always more readable and less error-prone than the bottom-up ones. You don't have to fully adopt functional programming or logic programming, but do take some inspiration from them.
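To make that contrast concrete, here is a hypothetical top-down memoized example (minimum coins to make an amount; the problem choice and all names are mine). The recurrence reads like the problem statement, whereas a bottom-up version forces the reader to reconstruct the iteration order:

```java
import java.util.HashMap;
import java.util.Map;

// Top-down memoization: minCoins(n) = 1 + min over each coin c of
// minCoins(n - c), and zero coins make amount 0. The code states the
// recurrence directly instead of describing a table-filling order.
class CoinChange {
    private final int[] coins;
    private final Map<Integer, Integer> memo = new HashMap<>();

    CoinChange(int[] coins) { this.coins = coins; }

    // Returns the minimum number of coins to make `amount`,
    // or Integer.MAX_VALUE if the amount is unreachable.
    int minCoins(int amount) {
        if (amount == 0) return 0;
        Integer cached = memo.get(amount);
        if (cached != null) return cached;
        int best = Integer.MAX_VALUE;
        for (int c : coins) {
            if (c <= amount) {
                int sub = minCoins(amount - c);
                if (sub != Integer.MAX_VALUE) best = Math.min(best, sub + 1);
            }
        }
        memo.put(amount, best);
        return best;
    }
}
```

The memo map makes it as fast as the bottom-up table, but the reader never has to verify that subproblems are computed in the right order; the recursion guarantees it.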
2. Testing. Tests are like comments, they reflect some meaning of the code. Tests are not like comments, if the code breaks, they break.
Of course, comments are very valuable, but they're most valuable when applied only in the right spots.
When I see code with a lot of inline comments, I can’t help but think that a lot of them would make sense to implement as trace log messages instead. If you’re going to write all these messages documenting the flow of the program, why not give yourself the ability to flip a switch and output them somewhere?
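As a sketch of that idea (the class and its refill policy are invented for illustration), the narration that would otherwise live in inline comments becomes FINE-level messages via java.util.logging: silent by default, but switchable on with logging configuration when you need to watch the flow.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical consumer: the "// hit the bucket bottom, replenish" style
// comments become trace messages you can actually turn on in production.
class Replenisher {
    private static final Logger log =
            Logger.getLogger(Replenisher.class.getName());
    private long budget;

    Replenisher(long budget) { this.budget = budget; }

    // Grants up to `tokens`, replenishing first if the budget is short.
    long consume(long tokens) {
        if (tokens > budget) {
            // Instead of an inline comment, a flippable trace message:
            log.log(Level.FINE, "budget {0} below request {1}, replenishing",
                    new Object[]{budget, tokens});
            budget += 100; // stand-in for a real refill policy
        }
        long granted = Math.min(tokens, budget);
        budget -= granted;
        log.log(Level.FINE, "granted {0} tokens, {1} left",
                new Object[]{granted, budget});
        return granted;
    }
}
```

Unlike a comment, a stale trace message shows up as obviously wrong output the moment you enable it.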
I have a really similar motivation behind a project that aims to teach Jetpack Compose in the same fashion and funny enough, the name is also pretty similar to the title of this thread - Learn Jetpack Compose By Example. You can find it here - https://github.com/vinaygaba/Learn-Jetpack-Compose-By-Exampl...
There's a ton of comments in every example (even though they might be redundant across the examples) and the idea is that you should be able to learn this new framework by merely reading through the examples.
In my team-fit/engineering-values interviews, I always ask about the benefits of automated tests. People who say that tests written alongside the code are the best documentation get a mental "plus one point".
I have seen two extremes on this over the years... comment a lot, or do not comment at all. In the teams I work in, I would love to see a more moderate and practical view on the matter.
I get the whole "document with your code" idea, and I practice it. But sometimes we end up with very long variable names that lead to not-the-most-aesthetic code, formatted in a very vertical way.
Don't know the right answer, but suspect the two extreme views are not correct.
IMO, comments are needed when the code itself can't express something necessary to understand its overall function.
I've found that, for example, for rather complex statements where the code isn't lacking any information, not commenting is the correct approach.
Whereas sometimes, when I'm forced to write exceptional code because of e.g. a workaround, a comment can nicely introduce my future self and other readers to the context.
Well documented code is respectful of the maintainer's time. Nobody wants to read a monologue full of noise to understand what is happening.
Comments are not monologues. Explain what the line you are commenting does, and perhaps why. But always acknowledge: more words = more patience required = more maintenance cost.
Also, nobody will care about what your name is in 2 years. Do not make the code about you.
On the other hand, if something is missing in the comments, it could take days to find out exactly what you need. Maybe a "level of comments" like log levels could help here?
That's a good point. Maybe add a check for when the function changed but the comment didn't, to remind people? But comments are already easy to forget to update, since they aren't code.
I went from my last job where every method and variable declaration had to be commented to my current job where there isn't a specific rule against comments in code but if you put in a PR with comments in it, you're going to get rejected after being laughed at.
I thought I would miss it, but I don't miss the documentation. Code doesn't lie.
I was not very impressed by the token bucket example. Firstly, it is totally ridiculous to need to write your own implementation of gcd and the comment is silly and unnecessary.
Secondly, the implementation for the token bucket seems to just be bad. A simpler implementation is as follows:
- Input number of tokens and refill time.
- Output the constant state describing the limiter: the max number of tokens and the time to get a single token (= refill time / size, ignoring rounding issues which shouldn’t generally matter if your times are in nanos)
- The only mutable state is a single time instant: the time when the limiter becomes full
- to attempt to take a token, compute max(0,full_time - now) / new_token_time and if this is less than or equal to the size of the bucket minus the number of tokens you want, you may withdraw and set full_time := max(full_time, now) + tokens_to_withdraw * new_token_time
If that division is concerning then you should either attempt to withdraw from the rate limiter less frequently or you should not be using one at all.
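A sketch of that single-timestamp limiter, under my own naming, with times in nanoseconds, the bucket starting full, and rounding ignored as noted above:

```java
// The only mutable state is fullTime: the instant the bucket becomes full.
// Tokens currently missing = max(0, fullTime - now) / newTokenTime.
class SingleClockBucket {
    private final long size;          // max tokens in the bucket
    private final long newTokenTime;  // nanos to regenerate one token
    private long fullTime;            // bucket is full from this instant on

    SingleClockBucket(long size, long refillTimeNanos, long now) {
        this.size = size;
        this.newTokenTime = refillTimeNanos / size; // rounding ignored
        this.fullTime = now;                        // start full
    }

    // Withdraw `tokens` if available; O(1), one division, one timestamp write.
    boolean tryTake(long tokens, long now) {
        long missing = Math.max(0, fullTime - now) / newTokenTime;
        if (missing > size - tokens) return false;  // not enough available
        fullTime = Math.max(fullTime, now) + tokens * newTokenTime;
        return true;
    }
}
```

Note that a request larger than the bucket always fails here, since `size - tokens` goes negative; whether to special-case that (as the over-consumption discussion above does) is a policy choice layered on top.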
The rest of the code is about scheduling events to happen when there are free tokens in the rate limiter. This is a different problem, and also over-complicated. You should either poll and retry periodically with your most important event (rather than whatever event you thought was most important when the bucket became empty), being careful that stale tasks don't starve; or you should have a queue of (Task, cost) pairs, where cost is an int. After withdrawing tokens from the rate limiter, or when the queue transitions from empty to non-empty, peek the top item and set a timer for rl.full_time - rl.new_token_time * (rl.size - cost); when the timer fires (or immediately, if that time is in the past), dequeue, withdraw cost tokens, and run the top task (then repeat).
> The second one is ok, but please don't follow the first one as a good example, it takes away the ability to scan the code.
IMO if at all, this type of comment ("Why was X designed this way") should go to the very bottom of the file (maybe with a very short comment at the top of the file referencing it), so it doesn't bother anyone who works on the code on a regular basis. And one could (should) also decrease its verbosity.
A lot of code files start with lengthy copyright notices, and no one seems to mind. I find comments at the top very easy to ignore if I work on that code regularly. Also, I don't think I've ever seen comments at the bottom of a file!
I wonder whether, if Protobuf hadn't been open-sourced by Google, that comment would've been replaced with a link to a design doc. Google Docs wasn't a thing in the early 2000s when Protobuf was originally written, though.
> A lot of code files starts with lengthy copyright notices, and no one seems to mind.
Hmm, maybe I'm the odd one out but I actually find them quite annoying if they're longer than just a few lines.
In any case: People often just scroll past and ignore lengthy comments at the top (maybe precisely because they're used to copyright notices), so I think a short comment at the top like "For design considerations and reasons this code has been written the way it is, see the very bottom of this file" will attract much more attention.
> Also, I don't think I've ever seen comments at the bottom of a file!
True, but how often do you see code comments about design considerations in the first place?
I agree that some comments in the first example are unnecessary, e.g. the one pointed out in [0], but I find comments like that useless at worst, not harmful. They have the potential to become harmful if someone updates the code but doesn't bother to update the comment, but I can't see how they take away the ability to scan the code.
It’s tricky. I think I remember reading a paper by Microsoft Research (I’m really sorry; still looking for a source) that showed a correlation between bugs and total lines of code, including comments. So if that’s true, redundant comments do hurt you.
Anecdotally, in dynamic languages, incorrect comments about types are a constant pain for me. They’re unverified, and so a second source of truth that is often wrong and can’t be trusted.
If you must dump a 5-paragraph essay about why something exists, you put it in the git commit message so when you git blame those lines, you see the history of the code.