Rx is a perfect thing to bring up in this discussion, and I encourage everyone to check it out. To me, Rx showcases two things:
1. What a high-readability, consistent, structured, sugar-free syntax for regular expressions looks like - any code you'll hand-roll to replace your regexes will be strictly inferior to feeding an Rx expression to a regex engine.
2. Why you'll still want to use plain regular expressions anyway. Rx expressions grow large very quickly, so for any non-trivial problem, you'll quickly reach the point past which they're less readable than the raw regexp, by virtue of sheer size. And, again, whatever your language, your replacement for a regular expression is unlikely to beat Rx.
Those two points together form an argument that regular expressions are often the right tool for the job, and while seemingly requiring more concentration up front, they'll be easier to work with due to lower demand on your working memory.
There is a third point that trumps these two: having a structured format allows you to programmatically compose regular expressions.
In most string encodings, it is hard or unergonomic to safely embed regexes or string literals inside other regexes.
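To make the embedding problem concrete outside of Emacs Lisp, here is a minimal Python sketch (the names and data are illustrative, not from the thread): splicing a literal into a regex string silently changes its meaning unless you remember to escape it at every splice point, which is exactly the kind of bookkeeping a structured representation removes.

```python
import re

user_input = "price (USD)"  # a literal that happens to contain regex metacharacters

# Naive string splicing: "(" and ")" become a capture group, not literal parens.
naive = re.compile(user_input + r":\s*\d+")
# The pattern now demands the substring "price USD: ..." and misses the input.
assert naive.search("price (USD): 42") is None

# Safe embedding requires an explicit escaping step at every splice point.
safe = re.compile(re.escape(user_input) + r":\s*\d+")
assert safe.search("price (USD): 42") is not None
```

The failure mode is silent: the naive pattern compiles fine and simply never matches, which is what makes string-level embedding error-prone compared to composing structured expressions.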
In this sense a regex is quite similar to SQL, just used for simpler operations generally.
Note that while regexes are used to parse (mostly) regular languages, the language of regexes themselves is context-free.
Personally, I already dislike manually embedding context-free languages inside a regular encoding (strings) that is itself embedded inside another context-free language that could have just added support for structured regexes in the first place.
> Rx expressions grow large very quickly, so for any non-trivial problem, you'll quickly reach the point past which they're less readable than the raw regexp, by virtue of sheer size.
That's not even the case. You can define and compose rx expressions. This allows you to build larger more complex rx expressions from smaller simpler ones in much the same way as you manage the complexity of a large program by building it up from subroutines.
You can compose raw regexes too. There are a few more things to keep track of than with composing clean s-expressions, but you can do it, and if you try, you'll hit the same problems as with composing Rx.
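For illustration, a Python sketch of composing raw regexes from named fragments (the fragment names are hypothetical): the extra thing to keep track of is precedence, which wrapping each fragment in a non-capturing group takes care of.

```python
import re

def group(pattern: str) -> str:
    """Wrap a fragment in a non-capturing group so alternation and
    repetition compose without precedence surprises."""
    return f"(?:{pattern})"

# Small named fragments, composed into larger ones like subroutines.
integer = r"-?\d+"
number = group(integer) + group(r"\.\d+") + "?"     # optional fractional part
pair = group(number) + r"\s*,\s*" + group(number)
point = r"\(" + group(pair) + r"\)"                 # e.g. "(3.5, -2)"

assert re.fullmatch(point, "(3.5, -2)")
assert not re.fullmatch(point, "(3.5,)")
```

This works, but every splice point needs the `group` discipline by hand; an s-expression syntax gives you the same composition with the grouping implicit in the structure.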
> build larger more complex rx expressions from smaller simpler ones in much the same way as you manage the complexity of a large program by building it up from subroutines.
Yes, and with it comes a problem: factoring out building blocks and composing the solution out of them reduces complexity, but it sacrifices locality. There's no free lunch: some things get easier because you get to ignore irrelevant detail; other things get harder, because the relevant details are scattered all over the place. Humans have a limited working memory, so too much composition, or factoring along the wrong dimension of the problem you're solving, destroys readability - the solution no longer fits in your head, and you keep chasing pointers, constantly evicting one piece of the puzzle from your head to fit another one.
In this context, terseness and locality become desirable qualities. They make you expend more cognitive effort up front, but save you the working memory overhead of extra abstractions that come with composable pieces, and eliminate the cost of pointer chasing, as the whole thing is literally in front of your eyes all the time.
This is, to my understanding, the actual reason math-heavy fields (including mathematics itself) stick to dense equations built of single-character names chosen from several alphabets - it ultimately saves time. A single line in a math paper may fully express a complex thought which, were you to rewrite it in "clean code" style, would take 10 pages and involve several extra layers of abstractions.
I'm not advocating we should rewrite everything in APL (though I'm also not convinced it wouldn't be better on net) - abstraction and composition are the fundamental tools that let us deal with complexity. But they have a cost, and sometimes that cost is too high. Regular expressions are, in my experience, usually a case of that.