Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Damn Cool Algorithms: Levenshtein Automata (notdot.net)
99 points by mbowcock on March 17, 2011 | hide | past | favorite | 8 comments


For those of you like me who tried to read this article without the proper education to understand everything at first pass, here is the definition of a couple of acronyms he used without defining first:

NFA = Nondeterministic Finite Automata (http://en.wikipedia.org/wiki/Nondeterministic_finite-state_m...) DFA = Deterministic Finite Automata (http://en.wikipedia.org/wiki/Deterministic_finite-state_mach...)


I wonder if this method could extend to character transposition and weight-able costs; e.g., allow "hello"/"ehllo" = distance 1, as logically this is a single typing mistake of whacking the 'h' a bit early.

I've used the perl String::Approx module to great effect, but it takes a lot of resources to blast through a large dictionary.


You can do both supporting variable weights and transposition. The key is understanding the initial construction of the DFA.

It's pretty general and clever in its simplicity. It's really just trying all possible edits, recording the cost and progress until the costs exceed the budget or you've reached the goal.

To handle transposition, you need the state to remember the last character typed if it happens to match the next character, so the state space gets bigger. In your example for hello, from the start state you'll introduce a state for starting with e. This state encodes typed 'last char was e, matched nothing, cost 1'. From here, when you get an h, you move to the 'matched first two characters, used cost of one' state.


I thought for a moment that Nick is back writing blog posts, but this is from the archives.

If you are a developer, esp Python and AppEngine, you really should read all the archives - it is one of the best general online sources for python, appengine (and the datastore) and dev.


I wasn't familiar with the site but found this post interesting. After digging around for a while there is a lot of good content.


Unfortunately, "food" is a poor example for explaining how Levenshtein Automata work in general because it has a double letter. This means that in one state, you have the "o" transition to more states than usual because it is both the next letter and the next letter after that. If you want to generalize his automaton, do the powerset construction for some other word like "news".


I use levenshtein heavily for one of my products http://www.getatomiq.com


Me too, and for a rather similar purpose, though for a different market. http://www.myhandin.com




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: