> Those are things that you can do just as well in C. [..] But the benefit is not "speed", but "speed plus security".
But nobody said otherwise? I don't understand your point. Speed + safety is indeed precisely the point. I would implore you to do your own comparative analysis by looking at the types of bugs reported for these search tools. (I can't do this for you. If I could, I would.)
> What makes ripgrep fast (AFAIK) is mainly using mmap() instead of open()/read() to read files,
I think you're confused. This is what the author of the silver searcher has claimed for a long time, but with ripgrep, it's actually precisely the opposite. When searching a large directory of files, memory mapping them has so much overhead that reading files into intermediate fixed-size buffers is actually faster. Memory maps can occasionally be faster, but only when the file is large enough to overcome that overhead. A code repository has many, many small files, so memory maps do worse. (N.B. My context here is Linux. This doesn't necessarily apply to other operating systems.)
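To make the "intermediate fixed-size buffer" idea concrete, here is a minimal sketch using only the standard library. It is not ripgrep's actual code: the names `count_matching_lines` and `contains` are made up for illustration, and the substring scan is deliberately naive. The point is only the shape of the loop: one buffer is reused for every read, complete lines are searched, and the partial line at the end is rolled over to the front.

```rust
use std::io::{self, Read};

/// Naive substring check, for illustration only (hypothetical helper).
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    needle.is_empty() || haystack.windows(needle.len()).any(|w| w == needle)
}

/// Counts lines containing `needle`, reading through one reusable buffer
/// of (initially) `cap` bytes instead of memory mapping the input.
fn count_matching_lines<R: Read>(mut rdr: R, needle: &[u8], cap: usize) -> io::Result<usize> {
    let mut buf = vec![0u8; cap.max(1)];
    let mut len = 0; // number of valid bytes currently held in `buf`
    let mut count = 0;
    loop {
        if len == buf.len() {
            // A single line is longer than the buffer, so grow it.
            let new_len = buf.len() * 2;
            buf.resize(new_len, 0);
        }
        let n = rdr.read(&mut buf[len..])?;
        if n == 0 {
            // EOF: whatever is left is a final line without a terminator.
            if len > 0 && contains(&buf[..len], needle) {
                count += 1;
            }
            return Ok(count);
        }
        len += n;
        // Search complete lines only; keep the partial tail for the next read.
        if let Some(last_nl) = buf[..len].iter().rposition(|&b| b == b'\n') {
            for line in buf[..last_nl].split(|&b| b == b'\n') {
                if contains(line, needle) {
                    count += 1;
                }
            }
            buf.copy_within(last_nl + 1..len, 0);
            len -= last_nl + 1;
        }
    }
}
```

Anything implementing `Read` works as input, e.g. a `std::fs::File`. When walking a directory, the same buffer would be handed from file to file, so thousands of small files cost no per-file mapping setup.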
> and relying on Rust's regex library that compiles regexes to DFAs which can run in linear time.
Nobody is claiming that such things can't be done in C. GNU grep also uses a lazy DFA, for example, and is written in C.
The "linear time" aspect doesn't show up too often, and none of my benchmarks[1] actually exploit that.
There's a lot more to the story of how ripgrep beats the silver searcher. "About as fast" is fairly accurate in many cases, but to stop there would be pretty sad because you'd miss out on other cool things like:
- SIMD for multiple pattern matching
- Heuristics for improving usage of memchr
- Parallel directory iterator (all safe Rust code)
- Fast multiple glob matching
In many cases, this can make a big difference. Try searching the MySQL server repository, for example, and you'll find that the silver searcher isn't "about as fast" as ripgrep. (Hint: Take a peek at its .gitignore file.[2] This has nothing to do with memory maps, SIMD or linear time regex engines.)
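To illustrate why matching many globs at once matters when an ignore file holds a lot of patterns, here is a rough sketch of one possible approach. It is an assumption for illustration, not a description of what ripgrep does: `SimpleGlobSet`, `is_meta` and `is_fast_match` are hypothetical names, and only two trivial pattern shapes (`*.ext` and plain file names) get a fast path. Those are sorted into hash sets up front, so checking a path costs a couple of lookups instead of one attempt per glob.

```rust
use std::collections::HashSet;
use std::path::Path;

/// Characters treated as "not a plain literal" in this sketch.
fn is_meta(c: char) -> bool {
    matches!(c, '*' | '?' | '[' | ']' | '/' | '\\' | '!')
}

/// Hypothetical, heavily simplified glob set.
#[derive(Default)]
struct SimpleGlobSet {
    extensions: HashSet<String>, // from globs like `*.o`
    basenames: HashSet<String>,  // from globs like `Makefile`
    other: Vec<String>,          // everything else; needs a real glob matcher
}

impl SimpleGlobSet {
    fn new(globs: &[&str]) -> Self {
        let mut set = SimpleGlobSet::default();
        for &g in globs {
            if let Some(ext) = g.strip_prefix("*.") {
                // Only a single literal extension qualifies, e.g. `*.o`.
                if !ext.is_empty() && !ext.contains(|c: char| is_meta(c) || c == '.') {
                    set.extensions.insert(ext.to_string());
                    continue;
                }
            }
            if !g.is_empty() && !g.contains(is_meta) {
                set.basenames.insert(g.to_string());
                continue;
            }
            set.other.push(g.to_string());
        }
        set
    }

    /// Checks only the hash-based fast paths. Patterns in `other` are NOT
    /// consulted here; a real matcher would fall back to them.
    fn is_fast_match(&self, path: &Path) -> bool {
        let name = match path.file_name().and_then(|n| n.to_str()) {
            Some(name) => name,
            None => return false,
        };
        if self.basenames.contains(name) {
            return true;
        }
        match name.rsplit_once('.') {
            Some((_, ext)) => self.extensions.contains(ext),
            None => false,
        }
    }
}
```

With globs like `["*.o", "*.so", "Makefile"]`, every path is answered by at most two hash lookups no matter how long the list grows. Directory-only patterns, negations and anything with a path separator land in `other` and would still need a general matcher.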
And yes, I could have done all of this in C. But it's likely I would have given up long before I finished.
[1] - http://blog.burntsushi.net/ripgrep/
[2] - https://github.com/mysql/mysql-server/blob/5.7/.gitignore