> It’s a bit complicated to set up […] it’s really fast
Sigh, just use Perl. Writing the code with the general-purpose regex engine took me only a minute of effort, yet it already runs nearly 500× faster than codeplea's optimised special-purpose code.
Why 100000 loops and not 10 as in the original code? With fewer iterations, Benchmark.pm shows "(warning: too few iterations for a reliable count)".
----
benchmark.php 100000 loops:
Loaded 3000 keywords to search on a text of 19377 characters.
Searching with aho corasick...
time: 329.3522541523
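For readers who want to see the shape of the "general regex engine" approach, here is a minimal sketch in Python rather than Perl. It is not the benchmark.pl code from this comment (that code is not shown here); the keywords, text, and function names are made up for illustration. It builds one alternation from the escaped keywords, compiles it once, and times many iterations so that a single run does not dominate the measurement.

```python
import re
import timeit


def build_pattern(keywords):
    # Escape each keyword so it is matched literally. Longer keywords go
    # first so that a keyword is not shadowed by one of its own prefixes.
    ordered = sorted(keywords, key=len, reverse=True)
    return re.compile("|".join(re.escape(k) for k in ordered))


def find_keywords(pattern, text):
    # Return (offset, keyword) for each non-overlapping match, left to right.
    return [(m.start(), m.group(0)) for m in pattern.finditer(text)]


# Hypothetical stand-ins for the 3000 keywords and the 19377-character text.
keywords = ["regex", "perl", "aho", "corasick"]
text = "aho corasick versus a perl regex " * 100

pattern = build_pattern(keywords)
matches = find_keywords(pattern, text)

# Repeat the search many times and report the average time per search.
loops = 200
seconds = timeit.timeit(lambda: find_keywords(pattern, text), number=loops)
print(f"{len(matches)} matches, {seconds / loops * 1e6:.1f} us per search")
```

Timings from this sketch will not transfer to the numbers above. As far as I know, Perl (since 5.10) compiles an alternation of literal strings into a trie, while Python's `re` tries the alternatives one by one, so the same pattern can behave very differently with thousands of keywords.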
Does your solution break in cases where keywords are prefixes or suffixes of each other? That situation is very common in my use case. Also, does your solution work if a keyword appears multiple times?
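To make the question concrete, here is a small Python sketch (the keywords and text are hypothetical, and this says nothing about what the Perl code in the parent comment actually does). A plain alternation reports only non-overlapping matches, so keywords that are prefixes or suffixes of one another can be skipped. A brute-force scan that tests every keyword at every offset reports all of them, which is the same set of hits an Aho-Corasick matcher is designed to report in a single pass.

```python
import re

# Hypothetical keywords that overlap as prefixes and suffixes of each other.
keywords = ["he", "she", "hers"]
text = "ushers and she"

# Plain alternation: once "she" is consumed, the "he" and "hers" that
# start inside it are never reported.
plain = re.compile("|".join(re.escape(k) for k in keywords))
non_overlapping = [(m.start(), m.group(0)) for m in plain.finditer(text)]


def all_occurrences(keywords, text):
    # Brute force: test every keyword at every offset, so overlapping and
    # repeated occurrences are all reported. Slow, but easy to verify.
    hits = []
    for i in range(len(text)):
        for k in keywords:
            if text.startswith(k, i):
                hits.append((i, k))
    return hits


overlapping = all_occurrences(keywords, text)

print(non_overlapping)
print(overlapping)
```

Both lists include the repeated "she", so multiple occurrences are handled either way. Only the second list contains the "he" and "hers" hidden inside "ushers".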
I get what you're saying, but it's not quite as easy as you imply.
Pulling in an entire programming language is a much bigger dependency and maintenance cost than spending a couple hours writing an algorithm. It would make more sense to just use a C extension.
----
benchmark.pl 100000 loops:
----
benchmark.pl (fill in the abbreviated ... parts from benchmark_setup.php):