XL features 8 simple node types, 4 leafs (integer, real, text, name/symbol) and 4 inner nodes (infix, prefix, postfix and block). With that, you can use a rather standard looking syntax, yet have an inner parse tree structure that is practically as simple as Lisp. The scanner and parser together represent 1800 lines of C++ code with comments. In other words, with such a short parser, you have a language that reads like Python but has an abstract syntax tree that is barely more complicated than Lisp.
It's also the basis for the Tao 3D document description language used at Taodyne, so the approach has demonstrated its ability to work in an industrial project.
I designed the Earl Grey language (http://breuleux.github.io/earl-grey/repl/) on pretty much exactly the same principle, it's pretty interesting to see someone else had the same idea. Operator precedence grammars are surprisingly powerful and pretty simple to write (you can write a working parser in ~100 lines of Python or JavaScript, including support for multifix).
I did simplify the AST further than XL and added a few invariants. I have 3 leafs (literal, symbol, void) and 3 inner (send, multi, data). The inners roughly work like this:
f x (send f x)
[x] x
[x, y, ...] (multi x y ...)
{x, y, ...} (data x y ...)
a + b (send + (data a b))
+ a (send + (data (void) a))
a + (send + (data a (void)))
[] serve as grouping brackets and are therefore purposefully erased from the parse tree if they only contain one expression. Prefix/postfix operators are interpreted as infix operators that have a blank operand on either side. Infix operators are themselves reduced to send/data, so "a + b" is equivalent to "[+]{a, b}". I find that a bit more robust and easier to manipulate generically.
Judging from the article, if you parsed "if 3", you'd get back out
(prefix if 3)
instead of an error, since `if` is just a regular prefix operator. So, presumably there's some sort of well-formedness checking that goes on after parsing? Does anyone know more? The website says very little.
True, 'if 3' will stay there if there is no rewrite for it. You can also use treat a lack of possible rewrite as an error, this is what the Tao 3D document description language does. You'd get an error like "No form matches 'if 3'".
Underneath, there is a rewrite operator, ->, that means to transform the shape on the left into the shape on the right.
The precise semantics are a tiny bit more complicated than that, but it iterates until there is no more transformation that applies. Some specific rewrites will be transformed into "opcodes" of the underlying machine model. So A+B with integers on each side will execute an addition (the inner representation of the transform is something like A+B -> opcode Add).
Yeah that puzzles me a bit too. Multifix operators are not more complicated than prefix/postfix/infix (well, only marginally). For instance, you can give each operator a left priority and a right priority and merge them when they meet with the same priority. Then `if x then y else z` would become (if/then/else x y z) or something like that. I think that's saner and easier to handle than what he does.
In XL, a multifix is a combination of infix, prefix or postfix. Note that the various nodes exist only for the programmer's convenience, i.e. they represent a user-visible syntax. Internally, I could do with only a node with two children.
Something like [A,B,C,D] writes as:
Block with [] containing
Infix , containing
Symbol A
Infix , containing
Symbol B
Infix , containing
Symbol C
Symbol D
So this allows me to represent multifix along with their expected syntax. I could also represent for example:
[A, B; C, D]
if A then B else C unless D
for A in B..C loop D
The last one is actually one of the for loop forms in XL
Multifix is more general than a combination of infix, prefix and postfix, though. For instance, a [] block is a multifix operator, but you can't implement it as a combination.
The advantage of supporting multifix directly is that you get a more convenient and less confusing representation. For instance, given "if A then B else C" why should I expect (else (then (if A) B) C) and not (if (then A (else B C))? It's not obvious, and both are unwieldy compared to a ternary operator like (if/then/else A B C).
Multifix is not more general than combinations of the 8 node types in XL, only more powerful than a combination of some subset.
It's not simpler. XL started with a multifix representation, see http://mozart-dev.sourceforge.net/notes.html. Switching to the current representation was a MAJOR simplification.
The current representation captures the way humans parse the code. Infix captures "A+B" or "A and B". Prefix captures "+3" or "sin x". Postfix captures "3!" or "3km". Block captures "[A]", "(A)", "{A}" or indentation. Since humans perceive a difference, you need to record that difference somewhere. XL records that structure in the parse tree itself, not on side data structures such as grammar tables.
This approach also enables multiple tree shapes that overlap. Consider the following XL program (I replaced asterisks with slashes because the asterisk means "italics" for HN):
A/B+C -> multiply_and_add A, B, C
A+B/C -> multiply_and_add B, C, A
A+B -> add A, B
A*B -> mul A, B
In that case, I can match a multifix operator like multiply_and_add without needing a representation that would exclude matching A+B or A/B in other scenarios. This is especially important if you have type constraints on the arguments, e.g.:
A:matrix/B:matrix+C:matrix -> ...
A:matrix/B:real+C:real -> ...
Those would be checked against X/Y+Z in the code, but if X, Y and Z are real numbers, they would not match.
I have gone the other way with the languages I designed, from the system you describe to multifix, and have had the opposite experience. A proper implementation of general operator precedence grammars is much simpler than implementing prefix, postfix, infix and blocks specifically.
Now, I'm not sure how you implemented multifix, but a good general implementation for it is not immediately obvious, so I suspect you simply used an algorithm that's more convoluted than necessary and made you overestimate the scheme's actual complexity. Implemented properly, though, it's ridiculously simple. To demonstrate, I have hacked together a JavaScript version here (sorry, I didn't get around to commenting the code yet):
The core of the parser is the oparse function, which clocks in at 32 lines of code. Along with a basic tokenizer, order function, evaluator and some fancy display code, the whole thing is barely over 200 LOC. I dug around for XL's source and from what I can tell, what you have is not simpler than this.
I also never suggested making multifix operators like multiply_and_add. Multifix is for operators which are inherently ternary, quaternary and so forth. For instance, when I see "a ? b : c" I don't parse it as prefix, postfix or binary infix, I parse it as ternary.
XL features 8 simple node types, 4 leafs (integer, real, text, name/symbol) and 4 inner nodes (infix, prefix, postfix and block). With that, you can use a rather standard looking syntax, yet have an inner parse tree structure that is practically as simple as Lisp. The scanner and parser together represent 1800 lines of C++ code with comments. In other words, with such a short parser, you have a language that reads like Python but has an abstract syntax tree that is barely more complicated than Lisp.
It's also the basis for the Tao 3D document description language used at Taodyne, so the approach has demonstrated its ability to work in an industrial project.