Hacker News new | past | comments | ask | show | jobs | submit login

Go's implementation seems to mostly live in https://golang.org/src/strings/strings_amd64.go and https://github.com/golang/go/blob/master/src/runtime/asm_amd... (search 'indexbyte' and 'indexshortstr'). At least some of the stuff to use newer instruction sets was added by Ilya Tocar of Intel in https://go-review.googlesource.com/c/go/+/22337 . It looks like:

For short needle + haystack that fits in a vector reg, use SSE/AVX2 string search.

For short needle and long haystack, use SSE to find the first byte of the needle jumping a whole vector-register width at a time, then compare. If that produces too many false positives, fall back to SSE/AVX2 search for the whole needle.

For long needle, use a Rabin-Karp rolling hash to find a likely match, then verify.

The string search optimization rabbit hole can go pretty deep, especially because different implementations might win for different values passed to the function. Like, I'm guessing Go tries that search for just the first byte because, empirically, it helps pretty often, but it's not inherently faster for every possible needle and haystack (think needle ab haystack aaaaaaaaaaaaaa...ab), hence the escape hatch after too many false matches.

It's kind of wild what clever machinery gets invoked when you try to do simple things.




There's a pretty good discussion and benchmarks here: http://0x80.pl/articles/simd-strfind.html

and here: http://0x80.pl/articles/simd-friendly-karp-rabin.html


Whoa, that's by Wojciech Muła, and I've used his pydvi2svg project. Neat coincidence.

Only sort of related to these links, but one thing I know very little about is whether you can use tricks like these to speed up related search-y ops that aren't exactly strstr or strchr. For example, finding an occurrence of any of a set of strings, or chars needing escaping for JSON or HTML/XML output. Of course, makes sense that most of the work goes to the most used fundamental functions; I'm just wondering out loud.


His work on Base64 decoding might be a start. In many cases it's a matter of how small an FSM or transducer is.

For multiple string match I was looking at doing Aho-Corasick for long strings in a SIMD-friendly way, but it's more a matter of an FSM with relatively few branch points, so that SIMD can digest long label matches quickly just as it does with long strcmp's.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: