This is neat! Some of the alternative solutions discussed here seem to confuse compilation with evaluation. Fortran is trying to rewrite the computation in such a way that, later on, a machine that knows nothing about precedence can evaluate the expression. It's incredible that such a rewrite can be performed from left to right, without recursion, and while using only O(1) memory at a time.
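A minimal sketch of that rewrite, assuming just two precedence levels (+/- and */); the table and helper name are my reconstruction of the trick in Python, not anything from the original compiler:

```python
# Replace each operator with a parenthesis-heavy substitute and wrap the whole
# expression, so an evaluator that only understands parentheses ends up with
# the right grouping. The rewrite runs left to right, one token at a time,
# with no recursion and no lookahead.
SUBSTITUTIONS = {
    "+": "))+((",   # lowest precedence: close both levels, reopen both
    "-": "))-((",
    "*": ")*(",     # higher precedence: close and reopen only the inner level
    "/": ")/(",
    "(": "((",      # user parentheses must outrank every operator
    ")": "))",
}

def fully_parenthesize(tokens):
    out = ["(("]                          # open both levels around the whole expression
    out += [SUBSTITUTIONS.get(tok, tok) for tok in tokens]
    out.append("))")
    return " ".join(out)

print(fully_parenthesize("a + b * c".split()))
# (( a ))+(( b )*( c ))   -- i.e. a + (b * c), without ever comparing precedences
```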
The Wikipedia article on Operator Precedence Parser [1] is very informative. It describes this technique as well as others, such as the Pratt parser and precedence climbing.
I would suggest against relying too closely on this article in its current state. The "precedence climbing" code is needlessly complicated -- it has a nested loop that serves no purpose; only one loop is necessary -- and it doesn't really explain what's going on (which is a relatively simple idea), it just writes out some pseudocode and mechanically steps through it.
Pretty neat! It takes advantage of the fact that the parser needs recursion or a stack to handle parentheses. If it can handle parentheses, it has all the tools necessary to handle operator precedence. If we think of how a recursive descent parser would implement operator precedence, with one level of recursion per precedence level, this trick is basically encoding that in terms of parentheses.
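For comparison, a minimal recursive descent sketch with one function per precedence level (a toy grammar with illustrative names, not anything from the article):

```python
# One function per precedence level: parse_expr handles +/-, parse_term
# handles */, parse_factor handles numbers and parentheses.

def parse_expr(tokens, pos):
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in "+-":
        op = tokens[pos]
        rhs, pos = parse_term(tokens, pos + 1)
        value = value + rhs if op == "+" else value - rhs
    return value, pos

def parse_term(tokens, pos):
    value, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in "*/":
        op = tokens[pos]
        rhs, pos = parse_factor(tokens, pos + 1)
        value = value * rhs if op == "*" else value / rhs
    return value, pos

def parse_factor(tokens, pos):
    if tokens[pos] == "(":
        value, pos = parse_expr(tokens, pos + 1)
        return value, pos + 1          # skip the closing ")"
    return float(tokens[pos]), pos + 1

print(parse_expr("2 + 3 * 4".split(), 0)[0])   # 14.0
```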
The problem where it messes up precedence if you already use parentheses, so that you need to add more parentheses to the substitutions, reminds me of the problem with one common explanation of CSS specificity. Simplifying the model a bit, specificity is a tuple of (id, class, tag), so that `#foo #bar` is (2, 0, 0) and greater than `#foo .baz`, which is (1, 1, 0). Well, some tried saying “treat it as a number”, leading to 200 and 110, and you can clearly see which number is bigger. The trouble with that approach is the numerical base: no number of classes will outweigh an ID, so (1, 11, 0) is not greater than (2, 0, 0), but if you want to represent it as a three-digit number, well, you’re stuck. Just as you can add more parentheses to the substitution here to keep it working, you could increase the numeric base; e.g. at this point hexadecimal would still work, 0x1b0 < 0x200.
Not sure exactly what to call this category of limitation, but you see it in diverse places where people have designed things without considering the upper limits of what they can cope with.
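To make the specificity analogy concrete, a quick Python illustration (the helper name and base handling are mine): lexicographic tuple comparison matches the CSS rule, while flattening into a fixed-base number breaks as soon as a field overflows a digit.

```python
# Specificity as (ids, classes, tags); tuples compare lexicographically,
# so an extra class can never outweigh an id.
print((2, 0, 0) > (1, 11, 0))          # True -- the id still wins

def as_number(spec, base=10):
    """Flatten the tuple into a single number in the given base."""
    ids, classes, tags = spec
    return (ids * base + classes) * base + tags

print(as_number((2, 0, 0)) > as_number((1, 11, 0)))                     # False: 200 < 210, wrong
print(as_number((2, 0, 0), base=16) > as_number((1, 11, 0), base=16))   # True: 0x200 > 0x1b0
```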
Implementing operator precedence isn't that hard actually.
Essentially you take note of the first operator's precedence and keep consuming input until you reach an operator with lower precedence than you currently have. The precedence you use for comparison is updated each step.
Only thing you have to watch here: with a recursive descent parser you might need an extra step to fix the associativity (right-to-left vs. left-to-right) if that's important for your application.
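A hedged sketch of that loop in Python (precedence climbing with a tiny made-up operator table; real code would also parse primaries, unary operators, and parentheses):

```python
# Parse a primary, then keep consuming operators whose precedence is at least
# the current minimum; the right-hand side is parsed with a raised minimum,
# which is what makes these left-associative operators group to the left.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def parse(tokens, pos=0, min_prec=1):
    lhs, pos = tokens[pos], pos + 1            # the "primary" is just one token here
    while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
        op = tokens[pos]
        rhs, pos = parse(tokens, pos + 1, PRECEDENCE[op] + 1)
        lhs = (op, lhs, rhs)                   # build an AST node
    return lhs, pos

print(parse("a + b * c - d".split())[0])
# ('-', ('+', 'a', ('*', 'b', 'c')), 'd')
```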
It's a problem that's been around for quite a while. Like, since the dawn of computers. And it's been solved numerous times in compiler generators, explained in textbooks, etc. Right now most solutions compete on clarity, simplicity and adaptability, not on solving the problem itself.
These days the general consensus is to use the Pratt parser, which is somewhat harder to understand, but ends up not too hard to build and integrate into a recursive descent parser, and is super easy to modify.
> What I find funny is that Pratt's parser was never really a theoretical result and was never properly covered in popular compiler textbook.
The theory behind precedence parsing was studied a decade earlier by Floyd (who is cited in Pratt's paper). Pratt's own paper is more about justifying the view that semantic importance attaches to the operators themselves rather than to productions of nonterminals.
One recent book that covers it is Crafting Interpreters, by Bob Nystrom.
Maybe if we give it some more time it will start counting as a popular compiler textbook ;)
What I wanted to say is that it's a popular practical algorithm. But theoretically speaking it's a variant of the shunting yard algorithm. Books typically describe the latter.
It's only not hard if you have a parser that lets you carry that kind of context around. Admittedly, operator precedence is so important that most parsers (and parser generators) have some canonical way to handle it. But try to implement a language with user-defined binary operators and consider the mess on your hands.
Pratt parsing (as described in https://matklad.github.io//2020/04/13/simple-but-powerful-pr...) handles operator precedence and different operator types (prefix, infix, postfix and mixfix) in a very neat manner, and it's easy enough to stuff user-defined operators in there.
What complexity?! The precedence is a simple integer value. Even when you include associativity, there are only two options: right-to-left, left-to-right.
Doesn't matter whether that information is language defined or user defined.
For instance in Haskell, that's all the information you provide. You'd use either `infixl` or `infixr` (depending on which associativity you want) followed by an integer defining the precedence for the given operator.
The parser just needs a table to look up the precedence (and associativity) for a given operator. But a parser already needs a bunch of different tables in order to work, so that's certainly not a problem.
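For example, a table-driven precedence-climbing sketch in Python of what such a lookup might look like; the fixities for `+`, `*` and `^` follow Haskell's standard declarations, and the `<>` row stands in for a user-defined operator:

```python
# Operator table keyed by symbol: (precedence, associativity). A user-defined
# operator just adds a row to this table before parsing starts.
OPERATORS = {
    "+":  (6, "left"),     # cf. Haskell's  infixl 6 +
    "*":  (7, "left"),     # cf. Haskell's  infixl 7 *
    "^":  (8, "right"),    # cf. Haskell's  infixr 8 ^
    "<>": (6, "right"),    # a user-defined operator declared as  infixr 6 <>
}

def parse(tokens, pos=0, min_prec=0):
    lhs, pos = tokens[pos], pos + 1
    while pos < len(tokens) and tokens[pos] in OPERATORS:
        prec, assoc = OPERATORS[tokens[pos]]
        if prec < min_prec:
            break
        op = tokens[pos]
        # Left-associative operators require the right-hand side to bind
        # strictly tighter; right-associative ones allow equal precedence.
        rhs, pos = parse(tokens, pos + 1, prec + 1 if assoc == "left" else prec)
        lhs = (op, lhs, rhs)
    return lhs, pos

print(parse("a ^ b ^ c".split())[0])   # ('^', 'a', ('^', 'b', 'c'))  -- right-associative
print(parse("a + b + c".split())[0])   # ('+', ('+', 'a', 'b'), 'c')  -- left-associative
```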
> The parser just needs a table to look up the precedence (and associativity) for a given operator. But a parser already needs a bunch of different tables in order to work, so that's certainly not a problem.
Think about how you'd implement an independent parser for such a language, say for the sake of pretty-printing it to HTML. Module A uses a user-defined operator that has been defined in Module B. You effectively cannot parse A correctly before you parse and load B, but the fact that you need to parse and load B is "hidden" in some import declaration that in turn needs to be parsed first. That's what I call a mess.
What alternatives do you believe offer easier support for user-defined binary operators? Almost all parsing technologies are more rigid and more complicated than what numerical precedence offers.
I also find operator precedence quite easy to implement.
When my parser encounters two chained binary operators, as in `a * b + c`, it always initially parses them as `a * (b + c)` into temporary AST nodes. Then it compares the precedence of the two operators and, if needed, rewrites the nodes to form the correct expression.
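A sketch of what that fix-up might look like on tuple-shaped AST nodes (the node layout and precedence table here are assumptions for illustration, not the commenter's actual code):

```python
# After naively parsing  a op1 (b op2 c), rotate the tree whenever op1 binds
# at least as tightly as op2 (the >= also yields left-associativity for ties).
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def fix_precedence(node):
    op1, left, right = node
    if isinstance(right, tuple):
        op2, right_left, right_right = right
        if PRECEDENCE[op1] >= PRECEDENCE[op2]:
            # a op1 (b op2 c)  ->  (a op1 b) op2 c
            return (op2, fix_precedence((op1, left, right_left)), right_right)
    return node

naive = ("*", "a", ("+", "b", "c"))    # a * b + c parsed naively as a * (b + c)
print(fix_precedence(naive))           # ('+', ('*', 'a', 'b'), 'c')  i.e. (a * b) + c
```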
> Implementing operator precedence isn't that hard actually.
Like all the claims about making languages: it's true until it turns out to be a big fat lie.
Trying to do full precedence for languages more complex than formula evaluators can become harder than expected (you think you've nailed it, then an edge case shows up).
There are other unary operators (!, ~), although I'm almost sure FORTRAN didn't support those. But wouldn't `1 - -13` => `1 - 0 - 13` = -12 fail, given that the intended value is 14?
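For what it's worth, a quick check of the two rewrites (plain Python arithmetic, just to see the numbers): the bare `0 -` substitution changes the value, while a parenthesized substitution keeps it.

```python
print(1 - -13)        # 14, the intended value
print(1 - 0 - 13)     # -12, the bare "0 -" substitution gets it wrong
print(1 - (0 - 13))   # 14, the parenthesized substitution is fine
```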