Monoid machines: a O(log n) parser for regular languages (2006) [pdf]

pjscott · on July 12, 2018

If I'm reading this right, it's a very cute trick. If not, maybe someone will correct my summary:

You can parse regular languages with a DFA -- think of a set of states and a transition function saying "if I'm in state x and we see letter c, go to state y".

Equivalently, for any letter you can ask "for each DFA state, where will this letter take me?" Like, "if we see an 'a', state 1 will go to state 5, state 2 will go to state 1, ...". You could represent this as a lookup table with an entry for each state, if you like.

These functions can be composed in the obvious way: just look up where the first one will take you, then look up where the second will go from there. So e.g. from the functions for 'c', 'a', and 't', you can get the function for 'cat', that tells what the DFA will do -- starting at any of its states -- if it next sees 'cat'.

(For a DFA with a constant number of states, composing two of these functions takes O(1) time.)

Composition is associative, of course. If you see 'ca' and then 't', this is the same as seeing 'c' then 'at', which is the same as seeing 'cat'.

To run a DFA on some string, you can think of it as equivalent to using the first letter's function to see where you go from the start state, then using the second letter's function on whatever state you end up in after that, and so on... which is the same as using the entire string's function to get to the final state in one jump.

How do you compute the whole string's function? Just compose the ones for its letters. This composition is associative, so you can do it in parallel as a tree-structured thing. For a string of length n, you can do this in O(log n) time as long as you can throw enough processors at it.

Boom, logarithmic-time parallel regular expression matching.

EDIT: I thought this sounded familiar, and sure enough: Danny Hillis and Guy Steele wrote essentially the same thing back in 1989, as one of the examples in "Data Parallel Algorithms". They also give a very clever explanation of how this can be extended to make a parallel lexer, also in logarithmic time:

http://cva.stanford.edu/classes/cs99s/papers/hillis-steele-d...

glangdale · on July 13, 2018

I mention the same trick, without the theoretical underpinnings, in https://branchfree.org/2018/05/25/say-hello-to-my-little-fri... in my notes, although the parallelism here serves no practical purpose on modern hardware at this time.

I do this stuff with permutes and of course permutes are associative.

The limitation is that you can do this only on DFAs that are the size of your biggest usable permute (currently 16, but will be 64 once AVX512VBMI comes out - looking dicey now with the logjam on 10nm Cannonlake :-( ).

The idea also appears in Guy Blelloch's parallel prefix paper https://www.cs.cmu.edu/~guyb/papers/Ble93.pdf. To my knowledge actual practical regex matching using this technique is thin on the ground. For a start, detecting not just the fact that something accepts but that it accepts at a particular state at a particular offset (not part of the formal regular expression definition, but very useful) is less straightforward in these parallel formulations.

batmansmk · on July 13, 2018

Thanks! The paper is hard to read, mostly due to poor editing and writing skills :S. I didn't know we could call a process log N assuming we have infinite processors in parallel.

abecedarius · on July 13, 2018

Yes, that matches my memory from Hillis's book based on his thesis (iirc it came out a few years before that paper).

I imagine you could also do this for NFAs -- composing them is just a little extra work.

pkhuong · on July 13, 2018

Yes, you can do the same thing with set of states. In either case, https://en.wikipedia.org/wiki/Method_of_Four_Russians might be more practical.

foota · on July 13, 2018

I'm missing something here, how do you know your start state when you do it in parallel?

Tarean · on July 13, 2018

You don't compute an end state, you compute a transition table. For each character you parse you look up the corresponding table and then you compose them.

Here is a paper on how to do this efficiently with simd instructions https://www.microsoft.com/en-us/research/publication/data-pa...

pjscott · on July 13, 2018

The part you do in parallel is computing the function "If the DFA is in state x, and it sees the whole input string, where will it end up?" You can represent that function as a table, one entry for each state. Once you've got that table, just look in the entry for the start state.

Dylan16807 · on July 13, 2018

So each thread might have to do up to m times the work, where m is the number of states? That sounds like it might be a pretty bad hit on the number of processors you need just to break even.

mac01021 · on July 12, 2018

Interesting. One thing they leave implicit is the amount of work that parsing requires.

The result is that you can parse in time O(log n) if you use O(n) processors, so parsing requires O(nlogn) work. (This is probably to be expected.) I think this means that, if you have o(n) strings to parse, you're better off with the good old DFA solution which uses O(n) work and parsing the different strings in parallel.

gue5t · on July 13, 2018

Yes, given no context one assumes serial computation, where work and depth are identical. An O(log n)-work parser would be astounding, because it's impossible--it would have to ignore effectively all of the input. This isn't clickbait because it's a real article and a good technique, but it's definitely exaggerating the surprise factor with misleading phrasing.

smadge · on July 13, 2018

Pretty cool! I was having trouble understanding the relationship between monoids and deterministic finite automata, but once I worked out an example by hand I was able to implement that transformation in Haskell:

    data MonoidMachine monoid input = Monoid monoid =>  MonoidMachine
      (input  -> monoid)
      (monoid -> Bool)

    -- From the proof of Theorem 1 in the paper.
    translate :: DFA state input -> MonoidMachine (Endo state) input
    translate (DFA _ _ delta initial accepting) = 
    MonoidMachine f g where
      f a = Endo (\q -> delta q a)
      g (Endo h) = h initial `member` accepting

    runMonoidMachine :: MonoidMachine monoid input -> [input] -> Bool
    runMonoidMachine (MonoidMachine f g) = g . mconcat . map f

The rest is here: https://gist.github.com/newjam/428613f083d289b83de0ddfeedf4e...

dwenzek · on July 13, 2018

Here is a very smart blog about "Incremental Regular expressions" which is based on this relationship with monoids.

http://jkff.info/articles/ire/#f7