Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

If I understand correct, you're wondering how you might match "a bro do" with "a brown dog" or "a brown door", etc. Similar to what Google does. I've done this before. It's not a single data structure, but a process.

First, you normalize your strings to index (convert to lower case, throw out stop words such as "a", "the", "is", etc.) then you generate prefixes (or n-grams, I suppose). Say you want to do musicians and you have "Bob Dylan" as an entry. That would become "bob dylan" which would generate the prefixes "bo", "bob", "bob d", etc. You include spaces but once you hit a space, you start generating new prefixes. So you would also start doing "dy", "dyl", "dyla", and "dylan". The max length of the prefixes depends on how much memory you're willing to use.

Lets say Bob Dylan is mapped to some musician table in your DB and he has id 1000. I've done this with Redis, but you can use a trie as well. You'd store every prefix in the trie and the data on the node would be an array of musician ids. Now the magic: if you search "bo dyl" you would look up the prefix "bo", grab all the ids and then look up the prefix "dyl" and grab the ids (you would also try the full "bo dyl" first, but let's just assume nothing is there). Once you have those two arrays, you would take the intersection of them. Which, in our case, would leave an array containing id = 1000 along with any others that matched both.

That's, in general, how a simple autocomplete works. But you're better off using Redis or Elasticsearch if your data set is large. And if it's not, well... a simple hashmap is probably competitive and easier than tries.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: