Yep, that's why I've built my own; the existing ones don't give out a list of the links or are super slow. A co-worker made the first one in Python, but it was so slow that it took hours (6+ sometimes) to finish a site, and I thought "you can do that faster".
Some sites are problematic because they don't use <a href="asd.com"> tags, and that's what my crawler is looking for.
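To make that limitation concrete, here is a minimal sketch in Python (standard library only) of a crawler step that collects only `<a href>` links. It is not the actual crawler from this thread; `LinkCollector` and `extract_links` are hypothetical names, and the page content is made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags only (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Anything that is not an <a> tag is ignored, which is exactly
        # why JavaScript-driven navigation goes unnoticed.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return absolute URLs of all <a href> links found in html."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links
```

A page that navigates via something like `<div onclick="location='/hidden'">` would yield no link for `/hidden` here, so a crawler built this way never visits it.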
C#, Elixir and Rust were the other options I thought about, and I want to build the same crawler in these languages (relatively easy to do with ~300 LOC) to compare them for network / server / CLI stuff, but that has to wait till next year.
The biggest headache with the C# implementation was the threading. A lot of the out-of-the-box threading structures (pools, etc.) have limitations you might not think to check for; e.g. you can't set the number of threads lower than the CPU count on the machine with some of the official .NET ThreadPool helpers. You can try, but it will just silently ignore you.
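For contrast, a rough sketch in Python (not C#, and not the implementation described above) of a pool with an explicit worker cap that fails loudly on a bad value instead of ignoring it. `crawl_with_cap` and its `fetch` parameter are hypothetical; `fetch` stands in for whatever downloads a page.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_with_cap(urls, fetch, max_workers=2):
    """Call fetch(url) for every URL using at most max_workers threads."""
    if max_workers < 1:
        # Reject a bad cap explicitly rather than silently adjusting it.
        raise ValueError("max_workers must be at least 1")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them up.
        return dict(zip(urls, pool.map(fetch, urls)))
```

The cap here is independent of the CPU count, so `max_workers=2` on a 16-core machine still runs at most two fetches at once.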
There is some super useful stuff too, though, that made it easy to write a generic, extensible crawler. My implementation ended up supporting separately compiled plugins you could just dump in a 'plugins\' directory, which responded to events and had full ability to manipulate the output pipeline. Doable in lots of languages, but C# has some formalized helpers around it that make it super easy.
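The general shape of such an event-driven plugin pipeline can be sketched in a few lines of Python. This is only an illustration of the idea, not the C# design described above: `PluginHost`, the `link_found` event name, and both example plugins are made up, and loading plugins from a directory is left out.

```python
from urllib.parse import urldefrag


class PluginHost:
    """Dispatches events to registered handlers (hypothetical design)."""

    def __init__(self):
        self._handlers = {}

    def on(self, event, handler):
        # Handlers run in registration order.
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event, item):
        # Each handler may transform the item, or return None to drop it.
        for handler in self._handlers.get(event, []):
            item = handler(item)
            if item is None:
                return None
        return item


def register_strip_fragment(host):
    """Example plugin: remove '#fragment' parts from found links."""
    host.on("link_found", lambda link: urldefrag(link).url)


def register_drop_mailto(host):
    """Example plugin: drop mailto: links from the output."""
    host.on("link_found", lambda link: None if link.startswith("mailto:") else link)
```

Each plugin only needs a reference to the host to hook in, so the crawler core stays unaware of which plugins exist.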