Just wondering, what is it about testing a repetition with an upper bound, like [a-z]{1,256}, that's so heavy? Intuitively it feels like the greedy [a-z]+ should actually be worse, since it has to work back from the end of the input.
This depends heavily on how repetition is implemented.
With a backtracking-search implementation of regexes, bounded iteration is pretty easy.
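For instance, here's a minimal sketch (illustrative only, not any particular engine's code) of how a backtracking matcher can handle a bounded repeat with nothing more than a counter:

    # illustrative backtracking sketch: the {lo,hi} bound is just a counter
    def match_bounded(text, pos, pred, lo, hi, rest):
        # greedily consume up to hi characters satisfying pred...
        end = pos
        while end - pos < hi and end < len(text) and pred(text[end]):
            end += 1
        # ...then give them back one at a time until the rest of the pattern matches
        while end - pos >= lo:
            if rest(text, end):
                return True
            end -= 1
        return False

    # [a-z]{1,256} followed by "!", anchored at the start of the input
    is_lower = lambda ch: "a" <= ch <= "z"
    rest = lambda t, p: t[p:] == "!"
    print(match_bounded("hello!", 0, is_lower, 1, 256, rest))  # True
    print(match_bounded("HELLO!", 0, is_lower, 1, 256, rest))  # False

Raising or lowering the bound just changes the value of hi; the matcher itself doesn't grow at all.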
But the linked webpage appears to compile regexes to finite state machines (it shows you the finite-state machine for each regex, for instance), and e.g. [a-z]{1,256} will have 256 states: 256 copies of the single state needed for [a-z]. If the repeated sub-expression were a complex regex rather than [a-z], you could get a combinatorial explosion.
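Roughly, if the compiler unrolls x{m,n} into m mandatory copies of x followed by n-m optional ones, each copy contributes its own states (this is an assumption about how such a compiler might work, not necessarily the site's actual scheme):

    # unroll x{m,n} into x...x (x?)...(x?): one copy of x per allowed repetition
    def unroll(x, m, n):
        return x * m + ("(" + x + ")?") * (n - m)

    print(unroll("[a-z]", 1, 4))                   # [a-z]([a-z])?([a-z])?([a-z])?
    print(unroll("[a-z]", 1, 256).count("[a-z]"))  # 256 copies of [a-z]

So a 10-state sub-expression under {1,256} would already mean a few thousand states, and nesting bounded repeats multiplies the counts.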
This alone probably isn't the issue? 256 is not a very large number. But I suspect there are follow-on algorithmic issues. This is just speculation, but I wouldn't be surprised if that 256-state machine were produced by determinizing (and then minimizing) a more naively generated machine; the subset construction used for determinization has worst-case exponential running time.
you're right. inclusion/intersection/etc. aren't actually computed via DFAs but instead are computed directly on the regular expression representation itself, and large disjunctions (with 256 branches) are what make those operations so heavy.
(it's possible to instead do these operations on DFAs, but at the time i found it hard to get from an automaton back to a reasonable-looking regular expression.)
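for context, a standard way to work directly on the regular expression representation is brzozowski derivatives. i'm not saying that's exactly what the library does, just sketching the idea: matching walks the input by repeatedly taking derivatives, and equality/inclusion checks can be built on the same operation without ever constructing a DFA.

    from dataclasses import dataclass

    class Re: pass

    @dataclass
    class Empty(Re): pass          # matches nothing
    @dataclass
    class Eps(Re): pass            # matches only the empty string
    @dataclass
    class Chr(Re): c: str          # a single character
    @dataclass
    class Alt(Re): a: Re; b: Re    # a|b
    @dataclass
    class Cat(Re): a: Re; b: Re    # ab
    @dataclass
    class Star(Re): a: Re          # a*

    def nullable(r):
        # does r accept the empty string?
        if isinstance(r, (Eps, Star)): return True
        if isinstance(r, Alt): return nullable(r.a) or nullable(r.b)
        if isinstance(r, Cat): return nullable(r.a) and nullable(r.b)
        return False

    def deriv(r, c):
        # regex matching the suffixes of r's strings that start with c
        if isinstance(r, Chr): return Eps() if r.c == c else Empty()
        if isinstance(r, Alt): return Alt(deriv(r.a, c), deriv(r.b, c))
        if isinstance(r, Star): return Cat(deriv(r.a, c), r)
        if isinstance(r, Cat):
            d = Cat(deriv(r.a, c), r.b)
            return Alt(d, deriv(r.b, c)) if nullable(r.a) else d
        return Empty()             # Empty, Eps

    def matches(r, s):
        for c in s:
            r = deriv(r, c)
        return nullable(r)

    # ab|a matches "a" and "ab" but not "b"
    r = Alt(Cat(Chr("a"), Chr("b")), Chr("a"))
    print(matches(r, "a"), matches(r, "ab"), matches(r, "b"))  # True True False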
the library uses a fairly simple data representation where x{m,n} is compiled using concatenation and disjunction, so x{1,4} ends up being represented as x|xx|xxx|xxxx.
this simplifies the code for testing equality and inclusion, since logically x{n} is just xx... (n times) and x{m,n} is just x{m}|x{m+1}|...|x{n}.
but when you have x{m,n} and n-m is large, you can imagine what kind of problems that causes.
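to make that concrete, here's a tiny hypothetical helper (not the library's actual code) that spells out the expansion:

    # expand x{m,n} into x{m} | x{m+1} | ... | x{n}
    def expand_bounded(x, m, n):
        return "|".join(x * k for k in range(m, n + 1))

    print(expand_bounded("x", 1, 4))         # x|xx|xxx|xxxx
    print(len(expand_bounded("x", 1, 256)))  # 33151 characters for x{1,256}

and every one of those 256 branches then participates in the equality/inclusion checks.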
if you replace that with (...)+ then it seems to work (at least for me). smaller expressions like (...){1,6} should be fine.
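if you want to automate that workaround, a naive rewrite along these lines works as a rough sketch (not part of the library; note it deliberately drops the upper bound, so the described language changes, and it should only be used when the bound was just a sanity limit):

    import re

    # hypothetical helper: loosen big {1,n} bounds into + before handing the
    # pattern to the library. this drops the upper bound, so it changes the
    # language being described.
    def loosen_large_bounds(pattern, threshold=16):
        def repl(m):
            lo, hi = int(m.group(1)), int(m.group(2))
            return "+" if lo == 1 and hi - lo >= threshold else m.group(0)
        return re.sub(r"\{(\d+),(\d+)\}", repl, pattern)

    print(loosen_large_bounds("[a-z]{1,256}"))  # [a-z]+
    print(loosen_large_bounds("[a-z]{1,6}"))    # [a-z]{1,6}  (small bound kept)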