Hacker News new | past | comments | ask | show | jobs | submit login

> 1. Only supporting DFA-able Rust regexes. I'd love to use a lighter-weight regex library in ag, but users are accustomed to full PCRE support. Switching would cause me to receive a lot of angry emails. Maybe I'll do it anyway. PCRE has some annoying limitations. (For example, it can only search up to 2GB at a time.)

The standard trick here is to use the faster method for searches that it supports, and use the slower but more capable method only for searches that require it. Parse the regex, see if a DFA will work, and only use PCRE for expressions with backreferences and similar.




Do you know any engines that actually do this? As in, is it really standard? I thought maybe Spencer's Tcl regex engine did it? Although I confess, I've never read the source.

I guess RE2/Rust/Go all kind of do it as well. For example, RE2/Rust/Go will actually do backtracking in some cases! (It's bounded of course, maintaining linear time.) But this doesn't actually meet the criteria of being able to support more advanced features.


This is probably a digression.. While it's not a regex engine, and I suppose this is standard in compiler development, the Neo4j query planner team uses this approach extensively to incrementally introduce new techniques or to add pointed optimizations.

For instance, Neo4j chooses between a "rule" planner, which can (at least while I still worked at Neo) solve any query and a "cost" planner, which can solve a large subset. For those queries the cost planner can solve, it usually makes significantly better plans, kind of like the example with regex engines here.

For the curious, that happens here: https://github.com/neo4j/neo4j/blob/3.1/community/cypher/cyp...

Likewise, once a plan is made, there are two runtimes that are able to execute them - an interpreting runtime that can execute any plan, and a compiling runtime that converts logical plans to JVM bytecode, much faster but only supports a subset.

That choice is made here: https://github.com/neo4j/neo4j/blob/3.1/community/cypher/cyp...

This goes on in finer and finer detail. Lots of similar examples in how the planners devise graph traversal algorithms on the fly, by looking for patterns they now and falling back to more generic approaches if need be.

FWIW, the overhead of this has, I would argue, massively paid for itself. It has made extremely ambitious projects, like the compile-to-bytecode runtime and the cost-based query planner safely deliverable in incremental steps.



> Do you know any engines that actually do this? As in, is it really standard?

I don't know about choosing between multiple regex engines, but GNU grep and other grep implementations check for non-regex fixed strings or trivial patterns, and have special cases for those.


Well... yeah... I was more thinking about really big switches like between PCRE and a DFA.

It looks like the sibling comment found one that I think qualifies based on the description (assuming "NFA" is used correctly :P).


Or use fancy-regex. Not ready for prime-time, but potentially the best of both worlds.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: