Anti-If: The missing patterns (2016) (joejag.com)
223 points by fagnerbrack on June 27, 2018 | 229 comments


Another pattern I've found helpful is to have branches "exit early"; if you have code like:

    if !cache.contains(key) {
      v = db.query("..")
      expiry = calcexpiry(v)
      cache.put(key, v, expiry)
      return v
    } else {
      return cache.get(key)
    }
By the time you get to the "else", your mental stack is filled up with the cache loading logic - you have to work to recall what this else branch is triggered by.

Instead, make the branch the least complex option and exit early, skipping the else. That way, it can be read as "if this condition, do this tiny logic and exit", and that branch can then be mentally pruned.

    if cache.contains(key) {
      return cache.get(key)
    }

    v = db.query("..")
    expiry = calcexpiry(v)
    cache.put(key, v, expiry)
    return v


I have heard this called early-returns. I like to write code this way too. Proponents of one-return-per-function probably write code in languages like C, which has neither GC nor RAII, so having only one exit point makes it easier to ensure all resources allocated by the function have been released.

There are ways to do early returns in C without duplicating resource releasing code though, using gotos (it's one of the few cases where gotos make sense, IMHO). Code looks like this:

  int aFunctionWhichMightFail() {
      int result = FAIL;
      Foo* foo = allocateSomething();
      if (!someFunction(foo)) {
          goto fail;        /* bail out, but still release foo */
      }
      moreWork(foo);
      result = SUCCESS;
  fail:
      releaseFoo(foo);      /* single cleanup point for every path */
      return result;
  }
The Linux kernel makes heavy use of this.


> There are ways to do early returns in C without duplicating resource releasing code though, using gotos (it's one of the few cases where I think gotos make sense).

Gotos like that are fine. Despite the title of the paper I don't think Dijkstra's meaning was that gotos should be removed entirely.


Yeah. Dijkstra was talking about not building your whole program out of if and goto, and using functions and for/while/etc. instead. In particular, he was talking about Dartmouth BASIC, which didn't have anything else.

Like, imagine the spaghetti Lovecraftian horror of a non-trivial program that's just if and goto. At the time, there was a particularly vocal camp that essentially argued it shouldn't matter, since if/goto is equivalent to what exists in the machine code anyway, and you can technically do all the same things.


Keep in mind that adopting any strategy beyond the if/goto that exists in machine code comes with reduced efficiency. So we trade efficiency for readability. Some talented genius might make a super-efficient piece of art by restricting themselves to if/goto only, one that no other approach could beat performance-wise. I guess anyone who does low-level machine-code optimization is in that camp most of the time.


Not really. You want structured constructs, even in asm, because they play nicer with the speculation and prefetching hardware. Going nuts and throwing away all of the conventions of higher level languages typically kills your perf.


Well, it depends; if you can employ locality of code, cache lines etc. to your advantage, it might pay off to do crazy things to optimize the innermost loop and the like. Though as you said, you'd structure whole blocks of functionality in a more high-level way. On some embedded systems it still pays off to modify code during execution, e.g. replacing a constant in the innermost instruction with a pre-computed value from a higher level to avoid a memory access penalty.


It doesn't make sense to me. The absence of "advanced" control flow constructs is one of the things that defines assembly language - only branches and conditional branches, plus sometimes conditional execution of single instructions. So I don't see how you can claim that a non-existent thing can play nicer with speculation and prefetching.


There's a lot of optimization at the microarch level towards the kind of code a compiler emits. Modern chips are designed to run C very well, not arbitrary, technically allowed assembly. You'll hit a lot of perf bottlenecks if you throw weird code/data flow graphs at them.


Well, I'd like to see that. Instruction sets designed with high level languages in mind, sure; but speculation and caches disturbed by code that is not shaped like what a compiler does (which compiler, anyway?), that's doubtful to me.


Dijkstra's meaning was that gotos should be removed entirely - by replacing them with the high-level control-flow constructs they were being used to implement. If your language doesn't have a high-level equivalent of `goto fail`... well, Dijkstra would probably say you shouldn't use that language in the first place, but given that you are, it's no worse than using `jlt .forloop` in assembly.


You are going to wind up with some kind of 'goto' regardless, whether it's having to navigate through independent files or various other constructs one takes for granted as being immutable. The point being, when you notice them, automate them out of the reasoning space so you don't drown in the complexity of one mundane detail being used to represent the overall state of the program (all the stuff you have to shove in your head to understand what your program is doing).

These things are hard to notice, and even harder to find elegant and effective solutions for!

The point is to reduce the problem space so all you are left to look at, when you do come back to your code, is what's important to reason about without removing important details or adding trivialities. These problems haven't gone away just because the literal word goto isn't used anymore.


> If your language doesn't have a high-level equivalent of `goto fail`

Such as?

I am aware of writing this in different ways, of architecting the logic quite differently. But the only control-flow constructs I've seen as alternatives are to abuse exceptions or labelled whiles with named breaks.

What were you thinking of when you wrote this? I'm genuinely interested.


Golang's defer statement gives some of that power (and some extra guarantees), but as far as I know it doesn't allow return statements.

You can however give named return values and reference those names, and I believe it will work.

I've not seen any other constructs in any language that would come close, and I still maintain `goto fail` is a reasonable construct.
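
For what it's worth, a deferred closure can indeed reference a named return value and even modify it. A hedged sketch of the common idiom for surfacing a Close error (writeConfig is a hypothetical name):

    import "os"

    func writeConfig(path string, data []byte) (err error) {
        f, err := os.Create(path)
        if err != nil {
            return err
        }
        defer func() {
            // runs on every return path; may override the named result
            if cerr := f.Close(); cerr != nil && err == nil {
                err = cerr
            }
        }()
        _, err = f.Write(data)
        return err
    }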


Deferred statements (in go) are really useful for delaying the runtime of blocks of code to the end of a context, but leaving them in the logical context where you're setting up a resource.

I'm actually not sure offhand how multiple deferred statements are handled. I could envision that the specification might make no explicit guarantee about the runtime order or it might create and pop off of a stack (at least in behavior).
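
For the record, the Go spec does pin this down: deferred calls are pushed onto a stack and run in last-in, first-out order when the surrounding function returns. A quick sketch:

    package main

    import "fmt"

    func main() {
        defer fmt.Println("deferred first, runs last")
        defer fmt.Println("deferred second, runs second")
        defer fmt.Println("deferred third, runs first")
        fmt.Println("function body")
    }
    // Prints "function body", then the deferred lines in reverse order.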


Thank you. For clarity, here would be the GP example:

   foo := acquireFoo()
   defer releaseFoo(foo)

   if !isValid(foo) {
      return FAIL
   }

   moreWork(foo)
   return SUCCESS
That is impressively clear. I don't use Go, myself.

I agree, I am quite comfortable using goto in C/C++ in this context. Though it may be better in C++ to use RAII in this specific example.


Common Lisp has `with-open-file`, which guarantees files get closed regardless of what happens in the body, even if an exception is thrown (the macro closes the file and then propagates the exception in that case):

    (with-open-file (fhandle path)
        (if (has-record fhandle)
            (process-record (get-record fhandle))))
Python basically has this, but Python doesn't give you the macros needed to write your own for other kinds of resource.

http://clhs.lisp.se/Body/m_w_open.htm

> If a new output file is being written, and control leaves abnormally, the file is aborted and the file system is left, so far as possible, as if the file had never been opened.

Shades of PCLSRing!


> Python doesn't give you the macros needed to write your own for other kinds of resource.

Python doesn't use macros, true. But it absolutely gives you the ability to do 'with' for other kinds of resource (and anything else). Look for the with-statement and documentation on defining your own context manager.


with-open-file is just a convenience macro around the underlying functionality, which is provided by unwind-protect - pretty much identical to try/finally in Java.

The ease of writing macros such as with-open-file means that most libraries whose resources need releasing also provide similar functionality.


Java and C# have the `finally` block which is essentially the `goto fail`. Typically it is used like:

  try {
    doSomething();
  }
  catch (Exception e) {
    handleException();
  }
  finally {
    cleanUpResources();
  }
When the code exits the try block, you are guaranteed that it will run the finally block, possibly entering the catch block first if an exception was thrown. Since the most common use of the `goto fail` pattern is freeing memory, the `finally` block isn't actually used a lot in Java/C# code in practice.


Thank you. Since the example was nothing to do with exceptions, and we don't want to abuse exceptions by using them when they are not needed, I will rewrite your example to cover the GP case:

   Foo foo = acquireFoo();
   try {
       if (!isValid(foo)) {
          return FAIL;
       }
       moreWork(foo);
       return SUCCESS;
   } finally {
       releaseFoo(foo);
   }
Python could also use the same approach. Though it also has context managers / the with-statement for this specific application.


> What were you thinking of when you wrote this? I'm genuinely interested.

That there is no high-level equivalent, and that Dijkstra, while he sometimes had useful insights, was ultimately a self-righteous twit.

Sorry to disappoint; I would also be interested in a (not-even-worse-than-goto-like-exceptions-are) answer to your question.


I could be misunderstanding here, but I'm fairly sure that in the spirit of the paper `goto fail` would be preferable to throwing an exception and unwinding the stack at any arbitrary function call.


Dijkstra's original title was "A case against the goto statement". The clickbait title "goto considered harmful" was chosen by the editor of the CACM.

[0] https://www.cs.utexas.edu/users/EWD/transcriptions/EWD02xx/E...


I've extensively written code like this, though I don't seem to see others do it very often. I suppose if you really hate gotos, then a "do { } while(0);" with a "break;" in the middle of it is equivalent.

There's just too much cleanup, such that avoiding gotos means you're either nested 200 columns into the screen or have half your function filled with partially-duplicated cleanup code.


gcc has __attribute__((cleanup)) extension for this:

https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attribute...


I used this pattern a lot writing Win32 / COM code.


Somewhere, in the long dark ago, a bunch of people got upset about long functions with return statements peppered throughout.

So we ended up with a Best Practice of one return clause per function. Which is triply stupid because 1) long functions are part of the problem, 2) break-to-the-end with multiple assignments to the same variable is almost as bad, and 3) it's just trading one stupid for another, which is a waste of time and energy.

And then for many years a loud minority of us, some of whom actually have a vague understanding of how the human brain processes data, insisted on the pattern you describe here. Some of us call it “book ending”. All the returns are fast or proceed to the end. Any method that can’t accomplish either has too high a complexity anyway and gets split until you get either one return, or book ends.

[another responder calls them guard clauses, but this doesn't quite apply in all situations. A clause that handles a default or a bad input is why you can rearrange, but it misses why you should: ergonomics]


The single return best practice comes from the old days of C. Back then, compilers didn't inline function calls as aggressively as they do now, and the overhead of a call was a relatively significant fraction of execution time. That led to longer functions being a good idea.

At the same time, C does not have any features for reliably managing resources. No GC. No C++'s RAII. Instead, you have to be careful to release every resource you acquire in the body of a function, along every execution path.

In that world, avoiding multiple returns makes a lot of sense because every execution path reliably reaches the end of the function, making it easier to verify that resources are released.

We aren't in that world anymore, so the guideline no longer applies. But it wasn't a stupid rule. It's just unhelpful to take it out of its original context and expect it to apply to a new one.

"Check your boots for scorpions before putting them on" isn't a dumb rule in the desert, but it is in Antarctica.


Ah yes, that's true about C. I think my aggressive distaste comes from seeing people cargo cult this into languages like Java, Python, and Ruby.

Makes no sense to turn your code inside out for resource management when you picked a language where you don't have to turn your code inside out for resource management.


Even in C this kind of early return can be good. In the example above you would be returning before allocating any memory. That makes the memory you do have to manage simpler.


> Somewhere, in the long dark ago, a bunch of people got upset about long functions with return statements peppered throughout.

I was taught this matter of style in the context of 90s C/C++. The general idea was that many return points likely means many points where you have to clean up any state you've been trying to manage/encapsulate with the function... in particular the state of manually managed memory. Which probably means mistakes.

It's a reasonable argument for manually managed memory, but it loses much of its power once you're not managing memory by hand. And these days, most of us are not.


I believe this is commonly referred to as a guard clause? I tend to prefer writing things this way too... but I could see an argument for putting something in an else block instead of just leaving it there.


Personally, I find the first version easier to read since it makes the structure explicit. I.e. there are two blocks and you will execute one or the other.

To me the second version makes it more difficult to differentiate the structure:

  if (...) {
    foo()
    return ...
  }

  return bar()
from:

  if (...) {
    foo()
  }

  return bar()
I find that when I'm reading code I'm generally more interested in the overall structure than in the specific implementation details. Especially since if the first block was complicated enough to tax my cognitive overhead, then I would probably extract it into a function anyways.


There's a reason the pattern is referred to as "exiting early". Another description might be using guards. Basically, you can imagine each of the if statements being a "guard" against a condition that might cause you to exit early. This becomes much clearer if you have multiple such clauses, and add the mental note that each conditional should contain only a return statement with a single expression.

    function foo(bar) {
        if is_null(bar) {
            return TypeError
        }

        if !validate_input(bar) {
            return ValueError
        }

        if cached(bar) {
            return cache(bar)
        }

        // Added as a workaround for client XYZCorp since...
        if bar == "A special value" {
            return "A special result"
        }

        return do_a_complex_thing_to(bar)
    }
You could equivalently structure it as a bunch of else ifs, but there's no real reason to do it that way.


I like and use early exits when they fit well, which is often. And in this case I'd probably do as you suggest. But it's worth mentioning that even if you stick with if/else it's often worth it to put the simple case first to reduce mental load:

    if cache.contains(key) {
      return cache.get(key)
    } else {    
      v = db.query("..")
      expiry = calcexpiry(v)
      cache.put(k, v, expiry)
      return v;
    }


This example makes me wish for an early return even more. Wouldn't it be sweet?


This. I would almost say: Don't use 'else' (if you can).

I also would like to add that it helps to fail early.

The article shows a null check for an add function. But the question is: why not throw an exception when the input is not what you expect?

This helps to find bugs very early.


I think 'else' shouldn't be shunned where appropriate. I usually like a lack of else when checking function pre-conditions, whereas I'll include an else (even with both branches returning) if the two branches represent comparable operations. English can be horribly abused to fit anything, but usually saying it out loud can help: "If the user id isn't numeric - we're done. If the user doesn't have access - we're done. If the user wants to view the resource in purple shift, let's format it like so and give it back to them, otherwise we'll use a yellow shift and hand it off."


If it's a bug, sure. However, it might simply be a "not found" - for example, you're trying to decide whether to insert or update a record. You look for an existing one and don't find it - you shouldn't throw here, because "record not found" is a valid outcome.


Unless it’s a one-off thing, it’s also worth considering adding a cache class. This can have the advantages of further decoupling the cache and computation logic, and being able to switch the cache implementation (e.g. to use a no-op cache for debugging). It also hides the “if” by introducing a “cache pattern”:

    return dbcache.getOrCompute(key, function() {
        v = db.query("..")
        expiry = calcexpiry(v)
        return (v, expiry)
    });
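
The wrapper itself might look something like this in Go - a hedged sketch, where Cache, get, and put are hypothetical and expiry handling is elided:

    import "time"

    type Cache struct{ m map[string]string } // hypothetical; no real expiry logic

    func (c *Cache) get(key string) (string, bool)       { v, ok := c.m[key]; return v, ok }
    func (c *Cache) put(key, v string, expiry time.Time) { c.m[key] = v }

    // GetOrCompute hides the cache "if" in one place instead of at every call site.
    func (c *Cache) GetOrCompute(key string, compute func() (string, time.Time)) string {
        if v, ok := c.get(key); ok {
            return v
        }
        v, expiry := compute()
        c.put(key, v, expiry)
        return v
    }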


Thanks for posting this! I recently had quite a few discussions about this topic with people who insist on _always_ using if/elseif/elsif/... constructs in such situations. I tried to argue that in most situations a "if (a) {return}; if (b) {return}; do_c()" construction is much easier to understand, but I had little success convincing them.


I always say that if you know the answer, why convolute the code? Just return the damn answer and we're done here.

I see these early returns as a list of assumptions. It helps to line them up nicely, with perhaps a comment before each, and an empty line between them.

And then when you get below to the, now non-nested, code that actually does some work you have a smaller mental burden needed to understand what is actually done.


Personally, I find it better to have functions small enough that they fit in a handful of lines, where the if-else structure makes it obvious that you are doing one thing or the other, rather than having separate if structures at the top of the function that may or may not exit before you reach the main body of the function.


Your code should be written this way instead:

    if !cache.contains(key) {
      v = db.query("..")
      expiry = calcexpiry(v)
      cache.put(key, v, expiry)
    }

    return cache.get(key)
Only one return, and everything is still in the logical order.


I don't think there is anything virtuous about a single return in languages that have catch, finally, and garbage collection.


I think it’s a judgement call: I like clearly indicating when there’s always a single output path but wouldn’t use the single return style in that cache example if it involved an extra lookup operation for a value which is already known.


Yes - the cache example is bad; my point is the general idea of avoiding branches that merge back into the main function.

In your example above, the branch does merge back, and clearly that's fine for a small function - and common pattern - like this. But the general idea would be to avoid it if possible, because that removes unnecessary complexity.


I have used this pattern too. I like it.

One small problem is that depending on how your threading and cache works it might be possible for the cache to be cleared after the contains call but before the get.


Or for the put operation to finish first every time on development machines and only sometimes in production, thus creating a very tricky bug.


IMO, both examples are bad because they search for the key twice.

That’s inefficient and, worse, introduces a concurrency bug. The cached value may expire between the check using ‘contains’ and the call to ‘get’.

A good API would have something akin to C#’s TryGet (https://msdn.microsoft.com/en-us/library/bb347013(v=vs.110)....).
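
Go's comma-ok map lookup gives the same single-operation check-and-fetch, so nothing can expire between "contains" and "get" (assuming any locking wraps the whole read). A sketch with a plain map standing in for the thread's cache pseudocode; db and calcExpiry are hypothetical:

    // cache is a map[string]string here; a concurrent cache would
    // need its own locking around the whole read-or-load.
    if v, ok := cache[key]; ok { // one lookup: check and fetch together
        return v
    }
    v := db.Query("..")
    cache[key] = v // a real cache would also store calcExpiry(v)
    return v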


imho, a get() from cache when you already have the value seems like an unnecessary waste of cycles and network traffic.

plus this arrangement would seem vulnerable to a pathological case where you found the V from the db, but have a full/failed/partitioned cache and end up faulting or otherwise not returning the V.


I’ve used Python’s decorators for a similar pattern. The syntax and scope are a bit different, but it's the same structure at a general level.


Good practice, something I tend to catch only after writing something inefficient.


These things really should be optimized by the compiler. No reason for it not to be.


This is an optimization for readability, not execution speed.


Don't use boolean parameters to toggle behaviour - they're indecipherable from the calling context. But don't then mindlessly create the cross-product of methods from every combination of true and false for your boolean parameters: If the caller would have used an expression rather than a literal to determine which version of the method to call, use enums instead of booleans or multiple methods. And don't turn setEnabled into enable/disable pairs, or setVisible to show/hide pairs - these are understood as properties, not parameterized methods.

Enums vs inheritance isn't straightforward. Enums make it easy to add new kinds, while inheritance makes it easy to add new methods. They are more extensible in different directions. If it's very rare for you to add new values to your enum, and more frequent for you to switch on that enum, prefer enums (and always throw in your default clause!). If you are adding enum values all the time, and only have a couple of methods that switch on them, then prefer inheritance. Although if you're using Java, consider methods on your enum values instead.

To understand the enum / inheritance tradeoff better, examine the Expression problem: https://en.wikipedia.org/wiki/Expression_problem
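
To make the two extension axes concrete, a small Go sketch (Kind, Shape, and the shapes themselves are hypothetical):

    import "math"

    // Enum + switch: adding a new operation is one new function, but
    // adding a new Kind means revisiting every switch - hence the
    // "always throw in your default clause" advice.
    type Kind int

    const (
        Circle Kind = iota
        Square
    )

    func Area(k Kind, size float64) float64 {
        switch k {
        case Circle:
            return math.Pi * size * size
        case Square:
            return size * size
        default:
            panic("unhandled Kind")
        }
    }

    // Interface: adding a new kind is one new type, but adding a new
    // operation means touching every type.
    type Shape interface {
        Area() float64
    }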


> Don't use boolean parameters to toggle behaviour - they're indecipherable from the calling context

This is mostly solved in languages with named parameter calling (if you use it). Example: createFile(name, contents, temporary=True)

Sometimes it is already obvious enough. In C I have no problem with set_visible(item, true);
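
Where named arguments aren't available, a small defined type can carry the name instead of a bare bool. A Go sketch (createFile and FileLifetime are hypothetical):

    type FileLifetime int

    const (
        Persistent FileLifetime = iota
        Temporary
    )

    func createFile(name, contents string, lifetime FileLifetime) error {
        // the call site now reads createFile(name, contents, Temporary)
        return nil
    }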


You can mimic this in most languages: createFile(name, contents, /* temporary= */ true). Of course that doesn't guarantee that that's what that parameter is actually named, but you can write a lint for that (with quite a bit of effort).


> Of course that doesn't guarantee that that's what that parameter is actually named

I’ve found the odds of that happening approach certainty once you have more than one or two parameters. If you’re using a language without named arguments I usually treat adding a third parameter as a cue to reconsider using a more idiomatic interface, whether that’s something like classic Java OOP (.setTemporary(true)) or limiting that knowledge to one place by e.g. creating a subclass, struct, or a specialized function so you call createTemporaryFile() and know that ordering check only needs to be done in one place rather than by every caller.


I'm not sure about this, because it's easy to mix up two arguments of the same type.

For example, you could mean

    createFile(name, contents, /* temporary= */ true, /* overwrite= */ true)
but write

    createFile(name, contents, /* overwrite= */ true, /* temporary= */ true)
by accident.


Now I want to see something like Prettier (a code formatter that enforces _exact_ whitespace and specific optional syntax) that will parse C code and fill in the correct parameter names, in comments, based on the headers, for any boolean parameters (maybe excluding single boolean parameters?).


A decent linter checks those Boolean comments for accuracy.

Writing documentation that merely copies existing code is terrible: it's either going to be wrong (because it's not automatically checked and will go sour in a refactoring), or it's right because it's so obvious that no one would guess wrong when looking at it.


Even something like

    int temporary = true;
    create_file(path, contents, temporary);


In some languages you can use the named-parameter version to indicate what's being true-ified:

  create_file(path, contents, temporary:=true);


C++ can have named parameters, too, if you want it to.


Would like an example; this is hardly "solved" in C++. There is an old Boost library that does this, but it's not idiomatic and rarely used.

Unless you are using a different definition of named parameters, I've never seen C++ code in the wild that used them.


That’s a bold statement, I’d love to see an example to show this.


Sounds like the "named parameter idiom": the function comes with a struct that contains all the possible parameters, and the caller passes one of those to the function instead of each parameter separately.

http://www.cs.technion.ac.il/users/yechiel/c++-faq/named-par...


I'm also usually pro enum and against inheritance. I find inheritance is harder to read when you jump into a codebase. The expression problem of course is to be considered.

Many languages now also have discriminated unions (enums with data), like F# and Rust, which make enum the better choice in more cases.


It always depends on the language that you are using. In C#, for simple cases, enums are perfect, but otherwise you should switch to class hierarchies. In Java you can use them in more complex cases because you can encode data quite easily. If the amount of encoded data is becoming too much (let's say more than a couple/three fields per case) then you should switch to classes. In F# you can use a combination of sum and product types and have a happy life with automatic deconstruction, and a somewhat less happy one with lenses/optics.


Erlang provides an 'if' expression as syntactic sugar for 'case' (switch), which must always have a true (else) clause. The 'if' form is strongly deprecated.

Even the 'case' expression is mildly deprecated in favor of pattern matching and guards in function heads (definitions).

There are no null values, no classes and no inheritance, but the presence of atoms (arbitrary tokens that can be used like null, true, false or enumerations), and pattern matching (data polymorphism and destructuring), means that conditional execution is elegantly implemented by functions.

This is the second, often forgotten, meaning of functional programming. Providing functions as first-class types, supporting anonymous lambdas (closures), and applying functions over collections are given all the glory, but deprecating conditionals in favor of pattern matching over many fine-grained functions is just as significant.


Now that I'm comfortable with Erlang, writing in any language that doesn't support branching at the function heads is so depressing.


Also, not being able to write multiple versions of a function with different arity is annoying.


I recently started learning Clojure and using this pattern is immensely satisfying as well.


Very very much so, good point.


> There are no null values,

Unbound variables are the equivalent (detected if used, at compile time), but a much saner way to deal with value-less variables


I disagree with the comments here. This is mostly not good advice.

Ifs exist because most real world problems require them. “If the box is checked, do this, otherwise do that”.

Masking if statements by syntactic sugar doesn’t serve any purpose in my opinion and if anything it makes the code more opaque, or worse, may force you to duplicate code...


I was wondering which of the examples you think particularly fall into this category of making the code more opaque?

I actually went into this with similar misgivings. A lot of "get rid of ifs" advice, particularly from OOP people, ends up making things more complex IMO. However, most of the advice other than the "Switch to Polymorphism" seems very down to earth and a simplification. Less about getting rid of ifs as if their mere presence were a stain on humanity, and more about how to rewrite what they are doing more clearly.


I still "believe in" OOP, but there's something that is increasingly troubling me lately: I keep thinking about the difference between the unbounded set of possible derived classes which could be substituted for a base class, versus the small (probably 1-3) number of different actual implementations are ever used in a real program. It seems like this disparity indicates the mechanism we're using is overly general. Maybe this is an argument for Sum Types as opposed to runtime polymorphism.

It seems the biggest thing OOP polymorphism has going for it is that it is "open", that is, the possible derived classes need not be known in advance. But, the fact that our toolchains use a separate link step, and that linkers remain primitive and link-time code generation is not widespread, is merely an historical accident. There's no a priori reason things have to be this way.

For example, an alternative would be if all of the source for an executable were in a single image file a la Smalltalk, and compilation consisted of converting this "database of code" into a single executable (no shared libraries!) in one big-bang compilation step (no object files, no static libraries!) You could even have something which syntactically resembled open polymorphism, maybe even using existing languages almost as-is, but instead of compiling down to type erasure and dynamic pointer casts, would compile down to Sum Types (variants). (The compiler could scan the image to find all of the derived types it could be and could construct a variant out of that.)

I'm not saying this is better, just putting it forth as an example of how things could be radically different than they are now (even for statically typed, compiled languages), had we made different choices. Consider the alternative, the situation we have now: we use a technique which could support anywhere from 1 to infinity polymorphic derived classes to implement interfaces which probably have approximately 2 different implementations in a typical finished program.

Perhaps an analogy would help: If we approached hardware the way we approach software, we would be using connectors which support between 1 and infinity pins everywhere when connectors which support N pins would suffice.


Sum types are a huge benefit to branchy code. Philosophically, OOP polymorphism can do a similar thing, but it doesn't really cover it as succinctly: if your operation is of the "similar data, mostly the same algorithm but with variations at points depending on the data's type", going down the polymorphic route requires a lot more naming of things to cover each variation, and then when you go to inspect it, the code is twisty and jumps all over the place. I've had a lot of bugs introduced by method boundaries obscuring the execution flow.

What polymorphism is good at is building the black box, the soft boundaries, and that isn't a good thing to do for fine-grained details. I do occasionally find a use for extension, but it's at a scale closer to "call this entire subprogram as a kind of state machine", vs "write this algorithm as a collaboration of objects abstracted away from each other".

The "alternate hard and soft layers"[0] pattern comes to mind: if what you really need is modularity, going towards full dynamism and reflection seems like a better choice than to try to extend everything statically, which creates a very large latency issue(the default response to every form of change in a static system is "recompile from the beginning, precompute all answers"). At the same time, it's not a good fit to have a vast codebase do everything dynamically since the comprehensibility and throughput suffer.

[0] http://wiki.c2.com/?AlternateHardAndSoftLayers


I didn’t mean to do down OOP. I think there are a lot of useful concepts to be had looking at it. Polymorphism for extension is definitely one of those. It’s more a lot of the more ‘interesting’ takes I’ve seen to replace ifs come from that direction and seem to be an exercise in decreasing complexity in the small (easier to understand methods) whilst increasing it a lot in the large (harder to understand architecture).

Your idea of replacing polymorphic variants with sum types sounds interesting. It also strikes me as quite similar to compile time polymorphism (e.g. generics) where you are 1 to infinity in the design space but the generated program boils down to only the variants created. Language wise I really like the mix of sum types, traits and generics in Rust.


Sorry, I answered as a top-level comment. I'm on my phone.


No problem!

It seems that the issue is less that the code is made more opaque and more that you can quickly think of some counter-examples or nitpicks that make the advice less valid to you?

I think the problem of not being universal is true of most programming advice. So I don't necessarily see it as a reason to discount it. I also think that, in contrast to a lot of articles, the author has done a great job of couching their advice as contextual. For example, for your criticism of Pattern 4, the author does actually point out the obvious solution to these complex expressions: split them out into several parts. Which is what you get with if statements to an extent, but spread out much more, with a lot of clutter.

Some of your criticism also seems to be based on a misreading of the author's intent. For example Pattern 5, where the goal isn't to remove the if statement but to prevent having to repeat the same error checking pattern everywhere.


Well despite a few sentences like “remember, if statements are not all bad”, the author still makes a case that if statements are generally to be avoided.

My point is that this in itself is bad advice.

Overuse of oop concepts is just as harmful (if not more) as overuse of if statements.


Overuse of anything is generally harmful (since it suggests doing too much of it) otherwise it would just be use!

But I'm fairly convinced the author isn't saying that they are to be generally avoided, to quote directly:

> If statements usually make your code more complicated. But we don’t want to outright ban them. I’ve seen some pretty heinous code created with the goal of removing all traces of if statements. We want to avoid falling into that trap.


> Overuse of oop concepts is just as harmful

The author's examples are not all about oop -- many of them having nothing to do with oop, they are just good advice for clean code in general.


It depends on the language. Languages with pattern matching usually solve the problem with code like this (pseudocode)

    fn box(status=checked)
      do something

    fn box(status=*)
      do something else

My Elixir projects have 0 to 10 ifs and they do solve real problems. I use plenty of ifs in other languages, but I follow some of the advice from the post.


I don't consider this style of pattern matching to be semantically different from ifs or switch cases. There's literally no "if", but they are fundamentally similar to having a single function with a switch/if inside for the "doing something".

Most pattern matching in languages is basically souped-up switch statements much of the time. Note that pattern matching is not one of the solutions mentioned by the author.


`switch boolValue` insists you do something with the false part; `if boolValue` does not. This is just looking at it from the most atomic level possible. If you have a group of booleans you get pretty complex lines of code. Switching on a tuple of three booleans where you can have cases like "if the first two are true, ignore the third one" is super powerful. It becomes even better if you return an enum that indicates what it really means, like CONNECTED, CONNECTING, DISCONNECTING, DISCONNECTED. Now every possible meaningful value of those three booleans has a word attached to it.

A Swift enum allows you to have methods inside it:

    enum ConnectionStatus {
        case connected
        case connecting
        case disconnecting
        case disconnected

        static func `for`(isConnected: Bool, isConnecting: Bool, isDisconnecting: Bool) -> ConnectionStatus {
            switch (isConnected, isConnecting, isDisconnecting) {
            case (true, _, false): return .connected
            case (false, true, _): return .connecting
            case (true, _, true): return .disconnecting
            case (false, false, _): return .disconnected
            }
        }
    }

Then somewhere inside your code:

    public func connect() {
        switch ConnectionStatus.`for`(isConnected: self.isConnected, isConnecting: self.isConnecting, isDisconnecting: self.isDisconnecting) {
        case .connected, .connecting: return // We're already good
        case .disconnecting: reconnectWhenDisconnected()
        case .disconnected: startConnection()
        }
    }
The number of ifs that would have to be juggled to do the same, and the resulting code that's hard to parse when you're maintaining it, are definitely more problematic.


Whose comment are you responding to? It seems you agree with my comment about pattern matching and switch cases.


I must've misunderstood your comment. I thought you were saying these pattern matching rules were basically souped up if's. Elixir has pattern matching on function level but it's also heavily ingrained in the whole philosophy. It's quite the antithesis to Java.


The don't-care _ is doing a lot of work keeping that code less verbose.


In Haskell, passing a bool is still considered an anti-pattern; it's called boolean blindness. This is mostly because bools don't carry semantic information.

Same goes for `Optional<T>` in an argument position, though https://github.com/quchen/articles/blob/master/algebraic-bli...


Of course there are legitimate cases to use if/else.

This blog post suffers from the classic problem of blog post code examples - to fit the example in a blog post it needs to be trivial, and because it is trivial it doesn't demonstrate the real value of these techniques.

I see if/else overused all the time - every single day in fact - when the techniques in the blog post should have been applied.


My previous team leader was a pretty poor programmer and would always describe the output of SQL queries he wanted me to generate with lots of "if, then".

It was then my job to translate that into SQL which doesn't really have "if" in the same sense as an imperative language does.

Sure, I could have pulled out all the data and done the branching in Python, but that would have been a performance hit - and mainly it would be ugly, require a lot more code and likely be a lot buggier. These days I try to keep as much work as possible in the database, because the declarative nature of SQL generally works out to a lot fewer bugs, as well as the performance benefit.

Sure it's not going to be possible for every case, but I have noticed that the more experienced I get, the less I like "if"s.


All rules for how to code better should be considered but potentially ignored while you're coding in practice. This includes the rule in the article ("never use if", although it doesn't really say that) and the rule in your comment ("never replace if with polymorphism", although this too is an exaggeration of what you said).

If your code contains quite a lot of branching - especially the same pattern of branching in several places - then you should certainly consider replacing that with polymorphism, in the form of virtual methods or perhaps templates/generics (compile-time polymorphism). But you should also consider leaving the branching in place if that is simpler overall. That doesn't make the article bad advice IMO, you just need to not take it too seriously.


Fair enough.

It’s annoying though (and kind of strange, really) that these articles (which discuss very basic concepts) make it to the front page, and may influence a bunch of rookie developers the wrong way.


Eh, there's no shortcut to experience. Beginners are going to get ideas from _somewhere_, without being able to thoroughly judge them. And they'll try some, and eventually they'll figure out what works and what doesn't. I thought this article was reasonably clear that it did not offer a silver bullet, just some structures to watch out for, consider, and maybe improve.


I would rather have new developers think in a more object oriented or functional way instead of procedural. If statements imho promote procedural thinking, while pattern matching, polymorphism, or to a somehow lesser extent, if expressions promote a more OO or functional way of thinking.


I agree that this stuff is mostly counterproductive.

As design theory? Fine.

As HN frontpage stuff? I mostly see it get discussed in the form of new devs or students asking things like "wait, how should I remove all the 'ifs' in my Java code?" (Yes, that's a real example.)

Spreading the good news about case statements, function passing, and so on is great. But this stuff is so often written up as absolutes, when simple conditionals really are a part of most practical cases.


> especially the same pattern of branching in several places - then you should certainly consider replacing that with polymorphism

As a first step, I'd say better to just isolate the branching in one place, as a separate function, rather than immediately jump to polymorphism which could have high refactoring costs in some cases (even if it might eventually be worth it).

Because premature abstraction is just as bad as premature optimization.


> Ifs exist because most real world problems require them. “If the box is checked, do this, otherwise do that”.

There's a notable difference between "if the box is checked, do [thing 1], otherwise do [completely different thing 2]" and "if the box is checked, do [thing 1], otherwise do [basically thing 1, but with this additional metadata from service y]."

And that difference, I think, is what tells you where to put the `if` statements. If things are distinct enough, you have some easy-to-read top-level ifs and switch based on that. Otherwise it's more complicated to trade readability vs code reuse/deduplication.

But where you get into big trouble is when you don't think about the dangers of if's at all, and end up with call stacks 5+ methods deep with no rhyme or reason to why certain things are done in if statements in level 5 and other things are done with if statements in level 1. Creating a mental map of what's done where in what conditions for that kind of code is really hard - as is thoroughly testing it, since this often means you're passing a lot of context really deep and don't have easily separable units. (And yeah, "Switch To Polymorphism," I think, is a risky one here - that can turn into "still have the deeply nested if statements, but make them invisible.")

I don't think the article really hits that well, though.


Yeah, as if a line with condensed conditional logic using Boolean operators really got rid of the branching and the need to run it through your head.

My attempted better advice would be to keep methods short and the nesting/indentation limited. Use good method names. Make it easy to understand each method in isolation.


I guess if you’re not familiar with Boolean operators that would be a thing?

Seriously are you saying that

    if (x) {
        return y;
    } else {
        return false;
    }
is as easy to read as

    return x && y;


Depending on the language and values, the latter does not mean the same thing.

    <?php
    $x = 1;$y = 2;

    function test1($x, $y) {
        if ($x) {
            return $y;
        } else {
            return false;
        }
    }

    function test2($x, $y) {
        return $x && $y;
    }

    var_dump(test1($x,$y)); // int(2)
    echo "\n";
    var_dump(test2($x,$y)); // bool(true)


Since we’re discussing readability, let’s be generous and assume the code behaves the same.

Since otherwise, it’s not a question of readability anymore. (But yes, incorrect transformations are incorrect.)


I agree, there shouldn't be any argument on boolean logic being easier than the branching logic. The other examples are more discussion worthy.


I'd definitely say it was. The branch is much clearer and more explicit.


Branches are often important things to consider such that hiding them inside in-line conditional expressions can make a code reader miss a key piece of logic. Use in-line branches with caution.


The former can be understood by anyone with a basic grasp of English grammar, while the latter requires understanding the Boolean operator. So yes, the former is more readable.


Borrowing the variable examples from another comment, do you believe that the former is easier to understand for someone who knows English grammar than the latter? (I'm using Python syntax since we're going for English-like readability, and I've added parentheses to the boolean operation to indicate order of operations for the liberal arts major that we've decided is the target audience of this code):

  if isAdmin:
    return False
  if not isActive:
    return False
  return True
vs

  return (not isAdmin) and isActive


If someone doesn’t have a basic grasp of Boolean logic then maybe programming is not the right choice..


Doesn't matter. Unless one option leads the compiler to generate better code than the other, it's a matter of aesthetics, not correctness. And since you don't even need to be a programmer to understand what the first one means, it's still easier to read.


An observation that comes from studying foreign languages: you learn “and”, “or” and “not” a long time before “if”.


Earnestly: Yes. But there may well be programming tasks or jobs that they are suited to, for which understanding of boolean logic will make no difference on a day-to-day basis.


Huh, is this python-style logical evaluation, i.e. "x && y" need not be a boolean but could be whatever "y" is?

Regarding which way is better, there's not much benefit arguing one way or the other. This is a leaf decision. It doesn't affect the structure of other parts of the code.


I'm curious, what about "x && y" is "python-style" ?

It's been in nearly all languages, and for decades before python existed.


It was not about "x && y" but about the construct from the parent post as a whole. It never uses y in a boolean context, only as a value. So I was thinking y could be anything. In Python this is how it works:

    >>> 0 and 'asdf'
    False
    >>> 3 and 'asdf'
    'asdf'
    >>> 3 and ''
    ''
'and' and 'or' in python are short-circuited as in e.g. C, but it's even more short-circuited! The "last" component is not evaluated in a boolean context. For a complete evaluation you have to use the expression in an if or while statement or apply the bool() function. I would say it's like a monad, if that helps.


What you are talking about has nothing to do with short circuiting.


Agreed, bad choice of words. It's only somewhat similar. Maybe "lazy" would be a better term. Whatever, man! I hope it's clear what I meant.


It’s not that you’re unclear, it’s that it is irrelevant to the discussion, as you were talking about a fairly trivial and unrelated language feature — and it is also a feature that exists in many languages, hardly unique to python.


Whatever, man. Maybe it was not totally related. Agreed. Maybe I had a bad day. It started as a single line that wasn't even the point of my comment. Ok?


No, but:

    if (a) return false;
    if (!b) return false;
    return true;
May be easier to read than:

    return !(a || !b);


But is it easier than

    !a && b
? :)


Rename a and b to reasonable names:

a -> isAdmin

b -> isActive

    const isDeletableUser = !isAdmin && isActive;

    return isDeletableUser;
(Better would be naming the function "isDeletableUser" instead of the intermediate const, but was keeping with the original.)

The point for me is that booleans with readable names show how eliminating ifs becomes not only doable but arguably preferable.


Seriously, it took me something like 10 seconds to parse your parent post. How a negated chain of if-else would ever be easier to understand than the much simpler !a && b really defies my understanding...


What about:

    return a ? false : b;


I still prefer the Boolean logic version.

I’m a heavy user of ternary, but it’s still an “if”. And and Or are (not very much) higher-level operations.


I was being mostly facetious.


De Morgan's laws are seriously your friends when trying to make complex boolean expressions optimally readable.


I agree with you on the keep indentation / nesting limited, but I am not a big fan of splitting things into short methods for the sake of it. Provided indentation isn't going too deep I often find it easier to read things in a script like manner instead of jumping in and out of lots of shortish methods.


I completely agree with you for the case when you say that avoiding an if brings you to code duplication. In that case from my point of view it’s a big NO. But in all the other cases, if you are using an OO language, there are always better tools than ‘if statements’. If you are using functional programming with if expressions instead of if statements then the ‘best balance’ moves a bit in favour of the ifs or a lot towards exhaustive pattern matching. On the whole I would say that I agree with the article discussed here, but with some caveats.


If you really think it doesn’t serve any purpose this means you should always use if and never the proposed constructs. This is clearly wrong and I’d recommend you rethink your position.


I don't think the argument is that the constructs are useless. The argument is that it's useless (not good advice) to use them if it is only for the purpose of avoiding if statements.


As per another commenter I used Elixir for 6 months a couple of years ago and very quickly started using pattern matching and guards to provide most of the If-type control flow. It didn't cover every case but it came very naturally in most cases where it was useful.

The overall logic remains the same, it's really just giving one a clearer view of a specific path.


Like anything else in life it is about balance. If you abuse ifs, then your code is unreadable, hard to debug and hard to maintain. If you abuse the alternative patterns then your code, again, becomes hard to grasp and hard to maintain.

You need to look for a balance between ifs and alternative patterns.


> Problem: Any time you see this you actually have two methods bundled into one. That boolean represents an opportunity to name a concept in your code.

I like this statement. After 30 years of coding, I've found that trying to do something quick/clever without naming it explicitly, for fear of letting the code get bigger, is what often leads me to write confusing code and harder to understand ifs.

I think maybe it's that naming in new code is easy, but adding new concept names to existing code as it grows gets a lot harder. I'm automatically naming when I create a new class, but when I fix bugs or touch existing files, I'm actively trying to avoid naming things and I'm trying to touch the least possible code, so over time it trends toward having unnamed concepts.

I usually try to apply naming to Pattern 4. So maybe instead of:

  return foo && bar || baz;
I might:

  bool bad = foo && bar;
  bool worse = baz;
  bool isHorrible = bad || worse;
  return isHorrible;


Just today, I was talking with a teammate about this. We had a piece of code like this:

    public boolean foo() {
      if (!this.doesSomethingWithA()) {
        return false;
      }

      if (!this.doesSomethingWithB()) {
        return false;
      }

      if (!this.fooBar2000.isEmpty()) {
        return false;
      }

      if (!this.anotherLongAttribute) {
        return false;
      }

      if (!this.anotherMethod()) {
        return false;
      }

      return true;
    }
We tried to replace it with a "one liner": return A && B && C ... Well, the expression was very long, and we noted that it was harder to read than the multiple if's. I suggested splitting the one-liner expression across multiple lines, something like this:

    return this.doesSomethingWithA()
        && this.doesSomethingWithB()
        && this.fooBar2000.isEmpty()
        && this.anotherLongAttribute
        && this.anotherMethod();
However, the code formatter (Eclipse) changes it every time back to the hard-to-read one-liner expression. So we ended up using the multiple if's.


There's a semantic thing going on with this expression that is rather interesting.

First, consider this reworking:

    boolean a = this.doesSomethingWithA();
    if (a) { a = this.doesSomethingWithB(); }
    if (a) { a = this.fooBar2000.isEmpty(); }
    if (a) { a = this.anotherLongAttribute; }
    if (a) { a = this.anotherMethod(); }
    return a;
And then this one:

    boolean a = this.doesSomethingWithA();
    a = a && this.doesSomethingWithB();
    a = a && this.fooBar2000.isEmpty();
    a = a && this.anotherLongAttribute;
    a = a && this.anotherMethod();
    return a;
In the first version and your original, it's clear that there's short circuiting: once a check fails, the rest are skipped. But automatic formatting will bloat up the vertical size of the code. In the one-liner there's also short circuiting, but it doesn't format well. In my second version you get the formatting, and && still skips the later method calls once a is false, though every assignment statement itself runs.

In general, when I'm favoring code formatting, the variable declarations come out. Quickly aliasing something into a name when it could exist purely as an expression adds a degree of conceptual flexibility. It keeps the code local(no new function name and jumping over into it). But it does also result in this kind of unnecessary computation.


Ah yeah, the one-liner would get killed by my work's code formatting rules too. (clang-format)

Total tangent, but I'm very much looking forward to more and better alignment rules becoming pervasive. I love it when similar things line up, it's easier to read, easier to modify, easier to see mistakes, better supports column selection and multi selection. I'll take alignment over almost any other code formatting issue that programmers debate endlessly. ;)

Having a code formatter is a greater good though, so yeah sometimes multiple ifs is just a necessity. At least the original version is clear and easy to follow.


Just surround it with // @formatter:off // @formatter:on


Yeah the first version looks like something that many people would try to 'optimize' but in reality it is extremely readable and should be very easy to change/extend.


I suggest reading "Code Complete". It contains a whole section about turning this kind of implicit behavior into self-documenting code, plus a lot more. A great deal of a lot more. Really a mandatory read for all developers.


While I can see some value, the problem I seem to be running into most lately is over-abstracted code. Untangling the web of naming and redirections leading to small pieces of code or single variables becomes extremely frustrating. Maybe it comes from my education in an engineering background.

I would much rather see

    return foo && bar || baz;
and a comment explaining the behavior (although perhaps with parens so I wouldn't have to reason about order of operations)


Overall, very good advice. I disagree on polymorphism through virtual methods being a solution to the maintainability problem switches pose, though.

With such polymorphism it's difficult to know the things that can happen. Implementations of the virtual method are not closed - they could come from anywhere in the codebase.

Another problem is that it's hard to cleanly separate the common code from the specialized code in each implementation class. That's why one ends up with ugly helper functions that receive a lot of state, or alternatively with unmaintainable class hierarchies.

IME the only solution is to minimize usage of data structures that have choices in them ("ADTs") and minimize the coupling of the code to the choices. The latter is done by separating code paths early, and by only switching if absolutely necessary.


I disagree that avoiding ADTs is the solution; it only exacerbates the issue further. When you only have flat records, they often have a lot of data that is or isn't set based on the state of the record. For example, a Workflow data structure that doesn't have an ErrorMessage field set when there's been no error. This could be a data structure with a FailedWorkflow and a SuccessfulWorkflow, and the ErrorMessage only on the FailedWorkflow.

Helper functions that receive a lot of state aren't ugly, they are pure. Pure code is easily tested. It is easy to reason about and reuse, because it's clear what the expected inputs and outputs are.

http://deliberate-software.com/anemic-domain-model/
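
A minimal sketch of that idea, assuming modern Java (21+) with sealed interfaces and records (names are illustrative, not from the article):

    sealed interface Workflow permits SuccessfulWorkflow, FailedWorkflow {}
    record SuccessfulWorkflow(String id) implements Workflow {}
    // The error message exists only where it is meaningful.
    record FailedWorkflow(String id, String errorMessage) implements Workflow {}

    class WorkflowDemo {
        static String describe(Workflow w) {
            // Exhaustive: the compiler rejects this switch if a case is missing.
            return switch (w) {
                case SuccessfulWorkflow s -> "ok: " + s.id();
                case FailedWorkflow f -> "failed: " + f.errorMessage();
            };
        }
    }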


Yes, deliberate denormalization is not a solution. If you need an ADT, use an ADT. However, you can put all the case-by-case data in separate non-ADT structures. Then think hard if you really need to "unify" them and only if so make an ADT that simply links to the other structures and contains no additional data. Works well for me.

(This is not a hard rule. For example, layouts for disk serialization might be better done as a single union with a discriminator field. In this case we're not optimizing for code but for disk accesses).


Polymorphism and inheritance often assume that differences are hierarchical. The real world is often NOT hierarchical, or at least may easily change to not be hierarchical in the future. Noun "classification" of most domain things is messy. It's one of the reasons for the backlash in the 2000's against OOP as a domain modelling system.

IF's often handle unpredictable variations better. Features are often better viewed as a buffet restaurant instead of an animal kingdom classification tree as found in biology. The biology classification-like approach popular in the 90's usually fails outside of biology. And, "lots of little methods" can make code rather hard to read & follow in my opinion. Your eyes may differ.


> The real world is often NOT hierarchical, or at least may easily change to not be hierarchical in the future.

The real world is an undifferentiated unified whole, but you can't actually work with it without dividing it up and labelling/modelling it, and most useful models have a hierarchical structure. OTOH, they usually aren't single-inheritance hierarchies, and most OOP languages either handle multiple inheritance poorly or not at all.


OOP is lousy as an ontological model of the stuff making up the real world. But that doesn't make it not useful for building designed systems.


True, but that doesn't imply one should turn most IF statements into polymorphism, or other OOP constructs. OOP is one of multiple tools in our toolbox.


Are you sure you meant to reply to me? I agree with you. But note that even better than a fixed number of if/elses or switches is physically separating the data for the different cases from the start (where possible). This way if statements _can_ be avoided. That was my point.

It's the approach taken by the relational model, used in RDBMSes.


The devil's in the details and each situation is different. One has to know the domain well to make decent guesses about likely future change patterns. Making code future-friendly is hard and requires experience both in coding and the domain.

Let's look at UI's. Should "buttons" and "hyperlinks" be considered different "types" of things? Or should we blur the distinction to be flexible? In this view, a button is a style applied to a hyperlink rather than a distinct "kind" of thing. Once you pre-type them into distinct things, it's hard to merge them: the code base hard-wires a clear distinction and too many things rely on a type-centric arrangement.

A more flexible approach is to take the view that things, perhaps ANY thing, want the ability to "act like" a hyperlink and/or button. Linking is then viewed as an optional feature, not part of some base "type". Same with drop-downs versus combo-boxes versus text boxes: we can model these via feature smorgasbords instead of sub-types, and probably get a more flexible framework in the process that may also be reusable in other widgets. But the flip side is that such flexibility can confuse newbies. Hard-wired hierarchies are quicker to learn and test.
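
A tiny sketch of the smorgasbord idea (hypothetical names): linking becomes an optional capability any widget can carry, rather than a subtype.

    import java.util.Optional;

    class Widget {
        final String label;
        final Optional<String> linkTarget; // empty = not clickable

        Widget(String label, Optional<String> linkTarget) {
            this.label = label;
            this.linkTarget = linkTarget;
        }
    }

    // A "button" is then just a styled Widget whose linkTarget is present,
    // not a distinct node in a type hierarchy.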


ADTs are not choice types. Algebraic Data Types are composed of product types, commonly referred to as records, and sum types, called choices or discriminated unions or probably something else. You mean to minimise sum types, but I don't really agree with that. Sum types are invaluable in domain modelling.


Thanks. Yes, sum types is what I wanted to say.


> the only solution is to minimize usage of datastructures that have choices in them ("ADTs")

And then what? This only works if you can eliminate the "choice" altogether. Otherwise the "choice" will still be encoded in your system, just much less obviously.

Think using null/nil instead of an optional type. Now everywhere you use the value (including when passed to other functions) you have to test the value for not being null. With a proper ADT (Maybe, Optional, what you prefer) it is obvious when you have a value and when you don't know.
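
For illustration, a minimal Java sketch (hypothetical lookup): with Optional the absent case is visible in the signature and gets handled once, instead of defensive null checks at every use site.

    import java.util.Optional;

    class UserLookup {
        Optional<String> findEmail(String userId) {
            return "42".equals(userId) ? Optional.of("a@example.com")
                                       : Optional.empty();
        }

        String emailOrDefault(String userId) {
            // The "don't know" case is handled exactly once, here.
            return findEmail(userId).orElse("unknown@example.com");
        }
    }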


If you follow a separate-tables approach, you can process the choices separately, and independently, in most cases, simply by iterating over the individual tables. So the choice is indeed largely eliminated. In the majority of cases you're only interested in one of the choices, or are following links that point to instances of a specific known choice. So, no switching code there.

> you have to test the value for not being null

Don't assume the possibility of null if it's not a valid value in your data schema. There are so many codebases rotted by never-ending null checks. There are no benefits from them, and only drawbacks: they confuse the reader by checking for a situation that should never happen, giving the impression it could be a valid situation. (This is actually a point the OP makes).

Let me exaggerate a little bit. Superfluous null checks are similar to something like

    def add(x: int, y: int) -> int:
        return x + y

    def caller() -> int:
        z = add(1, 2)
        if z != 3:  # can never happen; mirrors a superfluous null check
            return 0
        else:
            return 3
Just because it's "possible" that z is not 3 because there are other values that are also integers, that doesn't mean you have to cover them. Type systems are much less helpful than many FP advocates want you to think, and if you depend too much on them you get a bad, bad codebase (and you deserve it).

The main benefits that type systems bring: 1) they catch your typos early, 2) they support compilation to efficient machine code.

If you use tables and indexes, there are no pointers and no null anyway. Then again, I've used -1 as a sentinel in the past, and there are many more unused negative values available if you need them :-)


How do you reconcile saying that type systems aren't useful with saying that data schemas are useful? SQL is a typed language, and it would probably be insane if it wasn't.


I didn't say that! I even said where I think they are useful. I only said that just because a type system might have no facility for expressing that null is not possible (for better or worse), it does not mean you should write a terrible program!

Having structure in your programs is of utmost importance. I just don't think type systems can capture all of the programs' finer points. And if they could, why would we write any program code at all?

Why are relational data schemas a good idea? Because they are simple. They bring clear benefits and rarely any problems for implementation. (I mean, if we restrict ourselves to simple record definitions: they are basically a physical description of your data, so can hardly be a constraint).


>With such polymorphism all hope is lost to be able to know the things that can happen. Implementations of the virtual method are not closed - they could come from anywhere in the codebase.

Anywhere in the codebase that satisfies the interface (or inherits from some base class) -- which is trivial to find, and what you want.

Who said they had to be closed?

And what if you need a plugin architecture for example?


If the cases are fixed, that should be reflected in simpler code. The code can only be simpler if it takes advantage of the fact that the cases are fixed. Because of that it's kind of required to "close" them.

A plugin architecture is an entirely different kind of beast, and rarely needed. It requires some sort of interface to achieve the decoupling. Inheritance is one option here, albeit an ugly one.


> With such polymorphism all hope is lost to be able to know the things that can happen. Implementations of the virtual method are not closed - they could be coming from anywhere in the codebase.

Well, they could be implemented in any class that derives from the ancestor class that declared the virtual function. Unless you have one main base class, that should be a very small subset of the code. What's more, it has to have the same name. A recursive grep should show every implementation in the codebase.


Reformulated "all hope is lost".

In terms of ergonomics, I'd say having to recursively discover implementations is quite a stretch from reading a simple switch block that contains all the possible continuations case-by-case. Psychologically, obstacles like this can prevent you from maintaining the code.


In OOP, polymorphism is the way. In functional programming, if you have exhaustive pattern matching, then that is the best answer. Given that the examples are in Java 8, I would without doubt use polymorphism.


It's important to take this kind of advice in its context, which I would hand-wavingly describe as "imperative old-fashioned OO".

For example, the advice "Pattern 2: Switch to Polymorphism" definitely does not hold in languages with algebraic data types, where the compiler will check that your "switch" expressions cover all possible cases. Given that at least Rust, Swift, Scala, and TypeScript can all do this, this kind of advice is becoming increasingly outdated.


> Pattern 2: Switch to Polymorphism

As a matter of fact it does. Polymorphism comes in many forms (pun intended). The OO pattern of class inheritance is simply one of them, and not a particularly good one if you ask me :)

The author seems to believe that polymorphism is OO-specific. But it isn't. As a matter of fact, if/switch statements are _also_ polymorphic, often used in more procedural code.

The more functional way to match on different type constructors is a fine example of a polymorphic function.


> As a matter of fact, if/switch statements are _also_ polymorphic, often used in more procedural code.

Polymorphism has a straightforward meaning, that the same code is operating on different data types. Switch statements (if's are just a special case) are running different code on different types, so they're monomorphic.

> The more functional way to match on different type constructors is a fine example of a polymorphic function.

I think I see where you're coming from. Here's an example of what you're talking about.

    data Foo = Foo Int | Bar String | Qux

    what :: Foo -> Foo
    what (Foo num) = Bar $ show num
    what (Bar str) = Qux
    what Qux = Foo 0
That's not polymorphic; there's a single type there, and any given code is strictly acting on a single type of data. It's just a nice way of writing a case.

And, in fact, you find out it's not magic because it gets just as messy and confusing once you have to nest logic as with imperative programming, for example, if Foo contained another ADT, I'd often have to use a case statement or equivalent to manage it.


It's also important to consider comments in context :-) The advice I was commenting on explicitly advised against using switch statements.

I agree with everything else you've said.

(For the pedantic: one can argue about the definition of polymorphism. "Type constructors" do not create distinct types in an algebraic data type as implemented in ML and Haskell, so arguably destructing an algebraic data type using pattern matching is not an instance of polymorphism as the input type is not varying. However that is not germane to this discussion.)


Fascinating. I love constrained writing and constrained poetry.

No If-statements in code reminds me a bit of the E-Prime movement (https://en.wikipedia.org/wiki/E-Prime) which was about removing the verb 'to be' and all its variants like 'is', 'was' etc from English language. Supposedly it made English easier to understand and encouraged clarity and precision.


Although I generally agree with the concept, the truth is it's not easy to stop using `if` or `switch` completely. By applying the rules described in the post you are just pushing the problem around. At some point, somehow, a decision must be made about which branch to follow. Want to create two methods instead of an `if`? Fine, do it; how are you going to decide which of them to call? Want to use polymorphism? Fine, do it; how are you going to decide which class to instantiate? Want to use enums with methods on them? Fine; how are you going to decide which enum to use? You may say "Oh, I am just deserializing user input to an enum. I don't have to decide." Fine, so you've just pushed the decision onto the users of your application. Sure, you may use other methods, especially when you must choose from multiple options (a map instead of a switch, etc.), but you should always remember: avoiding decision making in one place doesn't mean you've completely eliminated it from the business process.


>... By applying rules described in the post you are just pushing the problem around. At some point, somehow, a decision must be made which branch to follow.

Right, and you should try and make that decision as early and as few times as possible. For example, let's say you work with older systems that have grown. You went from one type of config to two types, so now you have If (configOne) ... else ... scattered all over your code. You can't really get rid of the decision, it has to be made, but you can refactor the code to make the decision only once when the app starts up.
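
A sketch of that refactoring (hypothetical Exporter types): the config decision is made exactly once, at construction time, and everything downstream is if-free.

    interface Exporter { void export(String data); }

    class CsvExporter implements Exporter {
        public void export(String data) { /* ... */ }
    }

    class JsonExporter implements Exporter {
        public void export(String data) { /* ... */ }
    }

    class App {
        private final Exporter exporter;

        App(boolean configOne) {
            // The only place the decision lives.
            this.exporter = configOne ? new CsvExporter() : new JsonExporter();
        }

        void run() {
            exporter.export("payload"); // no config checks here or below
        }
    }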


After all the regularities have been abstracted away, one is left, inevitably, with the messy details.


Interesting article and definitely some good advice, but I completely disagree with example 4.

Surely I can't be the only person who finds

    if (foo) {
        if (bar) {
            return true;
        }
    }
    
    if (baz) {
        return true;
    } else {
        return false;
    }
Much easier to parse than

    return foo && bar || baz;
It's a bit better to my eye with parentheses:

    return (foo && bar) || baz;
Of course the first version is far from perfect (no need for the nested if, etc.) but I've always found too many comparisons on one line just looks messy.


Personally, I find

    return (foo && bar) || baz;
the most readable.

The top version just has too many curly braces and parentheses. I can keep each branch in mind easily, but keeping the whole program flow at once requires me to first parse it and translate it.

The last version, I can read as "either baz, or both foo and bar", which is easy to keep in mind at once, and would be much easier to pattern match for errors ("Oh no, I wanted foo and baz, not foo and bar") than the top version, for which I would possibly have to iterate each potential execution path in my mind to spot a mistake.

Consider, more concretely, (since I can't think of a better example), knowing whether to give someone an alcoholic drink. Let's go US style. Either they order a soda, or they are old enough and have ID. Three variables: isOver21, isSoda, isIdPresent.

(Putting aside the fact that the order here is weird and it could be designed better) we can either go:

    if (isIdPresent) { if (isOver21) { return true; } }
    if (isSoda) { return true; } else { return false; }

Or we can go:

    return (isIdPresent && isOver21) || isSoda;

Now what if we accidentally swapped two variables:

    if (isSoda) { if (isIdPresent) { return true; } }
    if (isOver21) { return true; } else { return false; }

vs

    return (isSoda && isIdPresent) || isOver21;

Note how in the first case, if I am tired, I could miss it. Most of the code paths seem correct. If someone is ordering a soda and has an ID they should get the drink. If they are not ordering a soda, and they aren't over 21, they shouldn't get anything. If they are over 21, they should probably get something.

In the second case, I can see all the logic in one line, and I can say "hold up, why are we asking for soda and ID, or age? Something seems off." Not that I can't do that with the multi-line if, just that it might take a bit longer.

Source: I've been bitten by multi-stage if statements on a few occasions.


Doesn't the if version also have the possibility of an unintended undefined return value (maybe JS only)?

I've been bitten by that before and thinking through all the possibilities when using multiple ifs is harder to me than with a boolean expression where a truth table can be used as a formal proof of validity.

The other downside to ifs is that the order of the ifs matters. That makes the code more brittle and requires the reader to parse the code on two levels: thinking about the booleans and keeping their values in mind over multiple lines of code. In a short sequence it's not a big issue, but it can be annoying when a very distant if statement affects your reasoning much later. Of course order matters without ifs, but everything important is on one line.

So the boolean version keeps all the relevant reasoning in one place -- no need to scan multiple lines of code (and risk missing an important guard).


I think it depends on what you're trying to do. Is the expression a fundamental business rule? Then splitting it up runs the risk of introducing error/inconsistency, I think.

But if you're just trying to condense down separate business rules, where it happens that today you want the same thing in the case of `baz` as for `(foo && bar)`, then you're setting yourself up for a more painful change when someone comes to you and says "hey actually let's have `baz` do the other thing after all." Yeah, in the trivial case where it's just returning bools, that's not really an issue, but when I see code that looks like the former it's usually more complicated, and the overlap is more often incidental rather than fundamental.

So my approach is more "remove nested ifs" than "simplify it into as few cases as possible."


I guess it may be because you don't remember the Boolean operator precedence? For me it's much, much, much easier to parse the second expression in your post than the first monstrosity. Obviously it depends on the complexity of the expression; just 1-2 weeks ago I refactored an over-complicated Boolean expression into an if-else. But in this case I'm really overwhelmingly in favour of the extremely simple Boolean expression, and whenever I see an over-complicated if I refactor it to a simpler Boolean expression if possible.


Also, it's not really removing the `if`, it's just turning it into `&&` and `||`, which are shorthand conditionals anyway. So it's a bit misleading to call this an anti-if strategy.


If you have a function that's using if statements to decide which boolean to return, it's almost always better to return the expression directly.

`if (foo) { return true } else { return false }`

is the same as `return Boolean(foo)` (or simply `return foo` when `foo` is already a boolean)


Fair point. It's basically syntactic sugar.

Hard to say how to properly refactor out the if statements without knowing the problem domain. And I guess that's basically the point - using if statements means that you are writing imperative code rather than modelling the problem. You are specifying to the compiler how you want things done, rather than what it is that you want done.


It is a fairly well-known fact that -- most of the time -- fewer lines of code translate to fewer bugs and decreased maintenance costs.

Moving from 11 lines to 1 line is, for me, vastly superior readability. And anyone who has been programming for at least a couple of years would have no problem parsing the order of operations and precedence between && and || (logical 'and' is always higher precedence than logical 'or' in every language I've seen), but parentheses are a trivial addition if desired.


Pattern 1 is impracticable: the most common reason to add a Boolean parameter to a function is to avoid code duplication:

    void blah(int x, bool logInfo) {
        ...
        if (logInfo) ...
        ...
    }

Pattern 2 can be horrible when applied to the wrong problem. As in: you can use polymorphism to implement fact(int x) without if statements, but it’s not going to be pretty...

Pattern 3 is not a pattern, it just says “if the if is useless remove it”...

Pattern 4 is particularly confusing, especially when you start to have nots in there:

    var x = !(a || !b) // ugh

Pattern 5: doesn’t remove the if...


For Pattern 1, you can pass in a logger there, or a no-op logger, and not have any duplication.
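
Something like this minimal sketch (hypothetical Logger interface): the boolean flag and its if disappear because the "don't log" decision is an object.

    interface Logger { void info(String msg); }

    class NoopLogger implements Logger {
        public void info(String msg) { /* intentionally nothing */ }
    }

    class ConsoleLogger implements Logger {
        public void info(String msg) { System.out.println(msg); }
    }

    class FileOps {
        static void createFile(String path, Logger log) {
            log.info("creating " + path); // no if (logInfo) needed
            // ... actual work elided ...
        }
    }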


It's not my job to pass a logger to some given class method. Should I pass in a database handle and a DNS resolver, too?


Well, I could do operations for logging that are otherwise not required...

   if(log) { logger.write(computestuff()); }


Depending on language if you have first class functions you could also do:

  logger.write(computestuff);
Or for parameterized functions:

  logger.write(() => computestuff(a, b, c));
Then all that logging policy stuff can live with the logging and doesn't need to be checked explicitly throughout the codebase.


This means interrupting whatever you're doing to implement the logger and do the associated refactoring.


I would suggest using these as a sign that "there is another way to write this code", rather than following them without knowing why you need them.


Am I reading this right? It seems number 3 is suggesting I accept Optional as a function/method parameter? That is an anti-pattern if ever there was one, and it also adds another layer of ifs instead of removing one. An empty collection is fine, but if I accept an Optional, I first need to check that the Optional itself isn't null (because it can be!) and, if it's not, check whether it has a value (basically another null check, yay!). Optionals should only ever be used as a return type. I hear a lot of people say they should be used as parameters to indicate that a parameter is not necessary, but that's what overloads are for!


It's very weird to see people genuinely suggest, "Conditional logic is complicated, so try burying it in inheritance."


But it is complicated when that (I'm assuming you mean Pattern 2) switch-case expression is in a large number of locations. Every time you add or remove (in that example) species from your enumeration, you have to address that switch-case in every location (or verify that the default is fine).

By addressing it via (well, I'd use an interface or similar) inheritance the conditional logic isn't removed, but it is isolated to a few decision points. All you need to know is that you have a Foo, and my implementation whether it's a Bar or a Baz will provide the needed functions.


Semi-serious: switch cases are only complicated when you're using compilers that can't check totality for you.


I'll mostly agree with that. If you have a compiler that can check totality (I've been playing with Idris lately at home, that feature is fantastic), then the work of adjusting your switch-cases becomes much easier. But there's still a lot of work relative to, for instance, creating a type class (or in Idris terms an interface) and implementing the functions of that type class/interface for your specific data type.

Similarly, in OO-languages with interfaces, you can apply this same approach. It doesn't eliminate the conditional, it still exists in the code. But it moves where you have to think about it and reduces the redundancy across your code base.

With these things you don't even have to modify every function when you add a new value to your equivalent to an enum, you just add the implementations for that new type/class.

Though it does shift the difficulty when you want to add a new function/method to the interface. Now you will have to go to each implementation and add this new capability. But the nice thing, here, is that most languages with static typing (whether fancy dependently typed languages like Idris, or relatively simpler ones like Java) will make it immediately clear which implementations are incomplete.


IF statements increase the cost of unit tests, is what I have found. By enforcing tight code coverage standards you start seeing IF statements disappear from code, as developers do not want to write a test case for each of those IF statements.

With tools like JaCoCo we track the number of branches and the cyclomatic complexity in our code (which is a measure of IF statements in most cases, try/catch being the other big source of branches). We try to keep that number down to 1 as much as possible. I was once able to take a method with over 160 branches down to 1 by eliminating every IF statement. The unit test was then very trivial and only had one test.


This doesn’t make any sense to me. You should still be testing the different behaviors, what difference does it make to testing if they are in a branch or in another class?


Garrett Smith tackled similar themes. While his emphasis is on Erlang with its function heads (a feature nearly every other language could really use) he also shows examples in Python.

His ideas changed the way I think about clean code.

* http://www.gar1t.com/blog/solving-embarrassingly-obvious-pro...

* http://www.gar1t.com/blog/more-embarrassingly-obvious-proble...


> Moderation in all things, especially moderation

Brilliant!

> being defensive inside your codebase probably means the code that you are writing is offensive. Don’t write offensive code.

There are some catchy insights here.


Looking at the code in the examples, the first style thing that I would suggest fixing is that the name of a boolean variable should be a yes/no question which the value is an answer to. So don't name the boolean "temporary". Name it "isTemporary" instead.

Moving on to the actual point, for a balanced view just look at cyclomatic complexity. In a function, count the decision points. Every if and loop counts. Any function with 10+ decision points is too complex and needs to be refactored to make the control flow more obvious.

Following this rule will let you use if quite happily, while also avoiding the problems with it.

(A side tip. If your language has some variant of "unless", don't use it. Ever. Our brains do not do de Morgan's laws naturally. So you can write code that is clearer upon writing, but is much harder to keep straight when you're debugging. The time that you need to understand your code best is while debugging, so ease of debugging is more important.)


It could be renamed "Things that suck about Java" and you wouldn't need to remove much. It's true that if-statements are something you might want to avoid in any language but Java doesn't give you as many ways to do it as I would like to see. The first part about the boolean is pretty universal for most languages.

For example the switch on type: wouldn't it be nice if you could simply get a compiler error whenever you forgot to either add a default or cover all values? I use enums a ton in other languages that do this, and it's like a to-do list of code to implement if I were to add, for example, another species. Java just doesn't, so the only compile-time-safe way to have it is to force subclassing.

Equally the argument about not checking nulls inside your own application. Nullable pointers are a language problem. The argument to that function should not be nullable to begin with. It doesn't make sense to call it with null. Ever.

I find foo && bar || baz is pretty hard to parse as well. It might be easier to throw all three arguments in a switch / case statement and give all relevant states their own case. But that doesn't work in Java.

The hidden "null means something" example could easily be rewritten to a response that simply has a Error or Result state that either contain the error or the result we wanted to have. Java only allows you to return an object that you need to cast to use (Error extends BaseResult, Result extends BaseResult) or declare it might throw a particular type of error.

I think most if not all of these problems are solved in Kotlin. Java is just not a great language for expressing problems in a short and succinct way without introducing side effects or massive amounts of boilerplate (sub sub sub sub classing).

(I'm more experienced in Swift, but Kotlin devs can follow the patterns I use.)


Let's eradicate nulls first, then do the ifs in your own time ;)


I agree that null/nil/None is one of the worst programming constructs ever invented [0] and deserves more attention than if statements.

[0] https://www.infoq.com/presentations/Null-References-The-Bill...


I'd argue against using abstract classes in general (just use interfaces) and stuffing logic in domain classes. Domain classes should be simple, not contain any algorithms or logic, and preferably be immutable. If you are putting ifs in domain classes you are doing it wrong. If you are moving logic from service classes to domain classes, you are no longer separating concerns.

If you have sixteen polymorphic methods in one service class (which is of course possible), maybe using a switch statement would actually be a bit shorter and easier on the eyes. It depends. I think about this in terms of unit tests and code paths. More ifs means more paths. Units with lots of code paths are complex and hard to test. On the other hand, having code paths cross many classes and methods just moves the problem. It's a tradeoff. A few other tricks you can apply here are using enums with e.g. lambdas and other feature flags. That's a nice way to tie functionality to types or associate some default functionality with domain classes without littering them with logic.

Not using abstract classes is somewhat controversial for some, but I have disliked them for quite a while and find I don't need them and can trivially refactor them away whenever I encounter them in a codebase. Preferring delegation over inheritance is a good thing IMHO, and using inheritance generally is something I end up regretting. I find framework code (Spring, cough) that uses abstract classes to be mostly unreadable and convoluted. You have a Thingy that is also an AbstractThingy that inherits an EvenMoreAbstractThingy that implements a gazillion interfaces that are also implemented by other thingies and yet more AbstractClasses, etc. If you have to control-click through six classes to actually get to a place that actually has logic, something is deeply wrong. Deep inheritance hierarchies are a good sign that your type system is struggling with your abstractions; they set you up for maintenance problems when the cohesiveness of your classes drops and coupling increases as you are forced to add stuff in places where it really shouldn't be for API compatibility reasons.


> Pattern 5: Give a coping strategy

> Context: You are calling some other code, but you aren’t sure if the happy path will succeed.

I’d love to see some good examples of this if-less code in React.

I find myself writing conditionals in data provider component render methods for things like “isLoading”, “hasError”, and “items.length == 0”.


Testing if(null) inside my code always makes me feel like I've been punched in the face. Why should my code have to be on the lookout for bad parameters all of the time? But on the other hand, my code should be on the lookout for bad parameters all of the time. Advice?


If it's not an interface, then it is probably correct to crash when called incorrectly. Problems should be found early on. If it is an interface, you may need special processing. A clear way of doing this is to do the sanity checking in the interface method and then pass to a module-private method that does the work. Internal logic that calls the same method can bypass the sanity check.
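
A minimal sketch of that split (hypothetical names): validate once at the public boundary, and let internal callers skip the check.

    class Api {
        public int publicLength(String s) {
            if (s == null) {
                throw new IllegalArgumentException("s must not be null");
            }
            return doLength(s);
        }

        // Internal callers guarantee non-null, so no defensive check here.
        private int doLength(String s) {
            return s.length();
        }
    }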


>> Solution: Use a NullObject or Optional type instead of ever passing a null. An empty collection is a great alternative.

That doesn't work if you are building a library, or working with other software developers. The alternative is to have a strictly typed language that forces the user to pick the type of the variable.

It also doesn't fix the problem of whether your function is returning a successful result or an error.

That's one of the things I like about idiomatic Rust. The Result/Option types are implemented as standard language features. You get to use them from the get-go, and they reduce a lot of if/then/null friction.


While I agree on reducing the number of ifs in the code (especially overly long or nested ones), I disagree with some of the proposed patterns, especially #3.

Sometimes we might want to use "null" rather than an empty collection to signal something different, i.e. null -> error, empty collection -> no result (avoiding the debate on whether we should use exceptions for all error management). But generally speaking, creating a new object every time instead of passing/checking null looks like a real potential performance loss in some (not so rare) cases.


Instead of passing/returning null, there are much clearer things for all your examples, depending on the situation: Optionals, Empty Containers, Errors. Just using null communicates very little.


An empty container isn't a much better communication aid, except in some specific situations. Much better would be for the VM to save the stack trace at the time the "null" value was created and later attach it to the NullPointerException. Perhaps it's finally time to adopt this from Lisp :D


Said so many times. Do not ever use nulls as flags.

It's much like Python exception handling: it works well until you try to work with a somewhat complex codebase, e.g. paramiko. The IOErrors can be raised by any layer of the stack. So how do I handle this in the code?

Well, turns out I can’t definitely tell one from another.

The same goes for null values: never use null as a flag unless you're absolutely sure that null value will not be boxed in another "Optional".


So your understanding of pattern #3 is that null was being used instead of an empty list, and in that case it's better to use an empty container. Put that way it makes sense; that's not how I understood it at first.

Oh and, I agree about the not using null thing at multiple levels.


Pattern 1: Reasonable

Pattern 2: Yes absolutely

Pattern 3: Seems a bit inefficient, allocating memory when a simple if statement would avoid calling the method at all.

Pattern 4: Would prefer

    if (foo && bar) { return true; }
    return baz;

to the proposed solution, which is harder to read.

Pattern 5: Just no! The repository getRecord method should have no care about coping strategies, which may change depending on the calling context. It should instead throw a RecordNotFoundException. You could also use success and error callbacks in the method signature.
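
As a sketch of the callback suggestion (hypothetical signature): the not-found decision is made once inside the repository, and each caller supplies its own coping strategy.

    import java.util.function.Consumer;

    class Repository {
        void getRecord(String id, Consumer<String> onFound, Runnable onMissing) {
            String record = lookup(id); // hypothetical lookup, may yield null
            if (record != null) { onFound.accept(record); } else { onMissing.run(); }
        }

        private String lookup(String id) { return null; /* stub */ }
    }

A caller would then write something like repository.getRecord(id, this::render, this::showError), with names purely illustrative.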


Regarding pattern 3, I think the most correct way to do what the author proposes is to use Collections.emptyList() [1]. You get an immutable empty list, which by virtue of being immutable need only be allocated once.

[1] https://docs.oracle.com/javase/7/docs/api/java/util/Collecti...
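
For reference, a minimal usage; the returned list is a shared immutable instance, so there is no allocation per call:

    import java.util.Collections;
    import java.util.List;

    class EmptyDemo {
        public static void main(String[] args) {
            List<String> none = Collections.emptyList(); // same instance every time
            System.out.println(none.size()); // 0
        }
    }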


Surprised it didn't mention passing closures around ...



It probably makes me rather pedantic, but if you write about programming style, I expect you to format your code consistently. It's either `if(record == null) {` or `if (result != null) {`. Make up your mind (I know which one I prefer, but at least be consistent)


I'm not sure if it's that simple. Sometimes, you want to guard your code with an early return, which means `if (x == NULL) return errorvalue;`. However, if the function should handle both cases, I prefer to start with the "happy path", which usually means `if (x != NULL) { happy; } else { handle_sad_case; }`.


Yeah, I was addressing the inconsistent space between "if" and opening parentheses.


In "Pattern 2: Switch to Polymorphism", he recommends not writing huge switch-statements, but instead recommends inheritance. Polymorphism is entirely doable without inheritance using interfaces and parametric polymorphism. Besides that, great advice.


Mostly good advice, except that being able to execute the code in your head is a good thing. You know you have it bad when you can't do this reliably and have to break out the debugger.

(Of course, debuggers are useful too, but code that requires them is not good.)


Dynamic combinators can compose logic without if statements or switches.

Imagine a combinator that takes an array of curried functions composed from a filter or reducer. No if statement is needed if you can dynamically compose a functional pipe.
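
A small sketch of that idea using Java streams (the rules are illustrative): the predicates are folded into one, and the pipeline itself branches nowhere.

    import java.util.List;
    import java.util.function.Predicate;

    class PipeDemo {
        public static void main(String[] args) {
            List<Predicate<Integer>> rules =
                    List.of(n -> n > 0, n -> n % 2 == 0);
            // Fold the rule list into a single predicate: no if statements.
            Predicate<Integer> all =
                    rules.stream().reduce(x -> true, Predicate::and);
            System.out.println(
                    List.of(-2, 3, 4, 8).stream().filter(all).toList());
            // prints [4, 8]
        }
    }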


The trouble with solution 1, removing a boolean method parameter and instead creating two methods like:

    public class FileUtils {
        public static void createFile(...) { .. }
        public static void createTemporaryFile(...) { .. }
    }
is that there is now no way to create a single wrapper function.

Let's say you want to create a function "createFileAndLog" which takes the boolean parameter and passes it to the underlying "createFile" function, now you have to create two "createFileAndLog" functions rather than just one, even though you didn't have the if statement in your code.

I haven't provided a great example, I know, but I've definitely split up such methods myself, felt good about it for increasing readability, then come to regret it later.


Pass the "create file" function itself as a parameter to the "create file and log" function?


There is actually a recent book out that has 70 items of similar advice, Java by Comparison: https://java.by-comparison.com/


Seems like an interesting book, but for some reason (your username and the fact that the only two comments you have ever written were about this same book) I am getting a feeling that you might be one of its authors.

IMO, if you are, you should disclose that when you write a comment. I am all for people promoting their stuff, as long as they do so in a way that makes it clear they stand to gain from sales.


All of this is not new. For anyone interested in similar patterns, check out Refactoring: Improving the Design of Existing Code by Martin Fowler.


I've disagreed with many of Martin Fowler's opinions. He assumes future changes will mirror current variations. That's often false: it over-extrapolates the currently known pattern onto a future that may not fit it.


Mostly good advice, but without multiple dispatch, polymorphism can only work in the simplest cases, and even then it's often not worth the boilerplate.



