Generators are a fairly natural way to write recursive descent parsers. The alternatives are either to parse everything and return it in one big structure (which can be awkward for large documents) or to supply Visitors (which works, but often doesn't match your mental model).
It's nice to be able to write "a lexer just yields a stream of tokens" and "expr ::= term op expr" as:
function* expr() {
  const x = term();             // parse the left-hand term
  const operator = op();        // parse the operator
  const y = expr();             // recurse for the right-hand expr
  yield apply(operator, x, y);  // yield the combined node
}
Backtracking takes a little setup, but overall it's a very elegant way to write code.
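For example, one minimal way to set that up (just a sketch; the shared token array, the pos cursor, and the parse functions named below are hypothetical) is to snapshot the cursor before trying an alternative and restore it if that alternative fails:

// Hypothetical shared lexer state: a token array and a cursor into it.
let tokens = [];
let pos = 0;

// Run a parse function; if it throws (parse failure), rewind the cursor
// so the caller can try a different alternative from the same spot.
function attempt(parse) {
  const saved = pos;
  try {
    return parse();
  } catch (err) {
    pos = saved; // backtrack
    return null;
  }
}

// e.g. try one alternative, then fall back to another from the same position:
// const node = attempt(parseNumber) ?? attempt(parseIdent);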
Not many people really need to write parsers, but even if you're using a black box from somebody else, it's fairly elegant to consume when it's exposed as an AST generator or result generator.
+1. There is a lot of code that is easier to write in "push style", where the code runs blocking and "pushes" results (to a queue, a callback, or a result array returned at the end), but that is better for the consumer in "pull style", where they can process items as they receive them and do streaming/incremental parsing. Language-supported generator functions/coroutines make it very easy to write the code in push style while still letting callers consume it in pull style.
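A small JavaScript sketch of that (the records generator and the CSV-ish input are made up for illustration): the producer reads like blocking, push-style code where each yield "pushes" one result, while the consumer pulls results one at a time and processes them incrementally:

// Producer: written as straight-line "push style" code; each yield
// hands one record to the caller instead of appending to an array.
function* records(text) {
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines
    yield line.split(",");            // "push" one parsed record
  }
}

// Consumer: "pull style" - processes each record as it arrives,
// without waiting for the whole input to be parsed.
for (const record of records("a,1\n\nb,2")) {
  console.log(record);
}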
An example of what you’re describing in a different domain is how we use channels between tasks in FreeRTOS in our embedded firmware. Other tasks just push commands or data into a channel that the consumer can pull from at its leisure (modulo the size of the channel, of course; we have fun constraints!)
In my own lexer in JS, I considered using a generator but instead used a class with an index property. In your experience, what advantages does a generator have over that approach?
The big advantage (and possibly disadvantage) is that the class properties (i.e. the internal state of the iterator/lexer) are implicit, which means you don't need to worry about setting them and restoring them each time. For example, you could write something like this:
function* lex(input) {
  let idx = 0;
  while (idx < input.length) {
    switch (input[idx]) {
      case '"': {
        // Lex a string literal starting at idx, then skip past it.
        const [token, length] = lexString(input.slice(idx));
        yield token;
        idx += length;
        break;
      }
      // case '...': handle the other token kinds here
    }
  }
}
Here, we don't need to worry about saving the state of the index, because it's implicitly "saved" at the yield, and we know we'll end up back on the line after the yield once control is returned to the generator.
In this case, where the only relevant state is the index, that's not much of an advantage, but it matters more when a generator function has more internal state - for example, a parser using a state machine to determine which tokens are valid next.
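As a rough sketch of that kind of thing (the token shapes and the mode values here are invented, not from a real parser), the extra state is just ordinary local variables, and all of it is implicitly preserved across every yield:

// Toy lexer for "key=value" lines; `mode` tracks which token kind is
// valid next, and both `mode` and `start` survive every yield untouched.
function* lexPairs(input) {
  let mode = "key";   // expecting a key or a value?
  let start = 0;      // where the current token began
  for (let i = 0; i <= input.length; i++) {
    const ch = input[i];
    if (mode === "key" && ch === "=") {
      yield { type: "key", text: input.slice(start, i) };
      mode = "value";
      start = i + 1;
    } else if (mode === "value" && (ch === "\n" || ch === undefined)) {
      yield { type: "value", text: input.slice(start, i) };
      mode = "key";
      start = i + 1;
    }
  }
}

// [...lexPairs("a=1\nb=2")] yields key "a", value "1", key "b", value "2".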
That's really cool. Do you have any further, more detailed examples of writing such a parser using generators? It's quite a coincidence, because I've been learning about this topic recently.