and you have that `count` function skip words with len < 2 then I can match results with the other 3 (Nim, Rust, C++) versions. Also, your C version runs in just 7 ms on Tale Of Two Cities, about half the time of the PGO optimized Nim.
Cool. You could probably simultaneously speed it up and simplify it by changing from linked list collision to open addressing with linear probes (but you may need a larger TSZ). Besides being more cache friendly for larger files, at the end you can then just qsort your table instead of first building a flat copy. :-) You could also use the unreduced hash value as a comparison prefix (only calling strcmp on different values). But it's not bad for potato code. ;-)
I didn't do open addressing because I didn't want to write code to resize the whole table. :)
I thought about also keeping the top 10 as I go instead of copying the whole table. But I'm guessing that virtually all the time this program spends is in I/O.