I don't think I'm all *that* well-known. :-) Staying away from streaming is a go...

I don't think I'm all that well-known. :-)

Staying away from streaming is a good idea for a lot of designs. We all lost a lot of our hair making streaming work in Hyperscan, and it was often the thing suppressing easier optimizations. Or at least we would have to have a separate approach for streaming and block mode; not exactly great s/w engineering.

Word-based approaches are interesting. So is large scale. Hyperscan does embarrassingly badly on natural language stuff at scale due to some baked in assumptions about relative frequencies of literal factors and the literal matchers tend to fall apart on really huge literal sets in this domain. So there's definitely an opportunity here.

I would strongly encourage you to produce some repeatable benchmarks (at least, things where we can see what the task is for something like RE2 or Hyperscan). We spent many years in stealth mode with a lot of superlatives and NDA-only Powerpoint decks and I don't think it did ourselves any favors. It means that you wind up being the only guardian of your own honesty, which is ... tricky ("What the hell is a medium-complexity pattern, anyhow?").