It’s not an apples-to-apples comparison, unfortunately. He’s using custom assembly for the direct-call benchmark but C code for the indirect-call benchmark.
The C code contains no optimization annotations either; the compiler could be inlining the indirect benchmark and/or devirtualizing the indirect call itself.
The benchmark quantifiably shows that direct calls are faster than indirect calls, which was your original question. Could a hypothetical C compiler transform an indirect call into a shared library into a direct call? Maybe, but that is a different question from your original one about the performance of direct vs. indirect calls on modern x86 architectures, which, as this benchmark shows, differs between the two.
Unless he uses the same custom assembly, just with an indirect call, it’s not a good comparison: we can’t be sure the increase in runtime is due to the indirect call.
All the details are right in the article, along with a link to a git repo, so I'm not sure what there is to speculate about. If you have an issue with the actual benchmark, you can certainly point it out; otherwise you're basically asking us to restate the article, which explains these details much better.
Hmm, I’m not sure you’re responding to what I’m saying. When comparing the speed of an indirect call to a direct call, the C code is not an apples-to-apples comparison with the custom assembly. Do you deny that?
Yes, I do deny that, especially since the article explicitly addresses this issue and takes care to avoid it, and it links a git repo you can use to verify this yourself. If you have a specific criticism, go ahead and point it out concretely instead of speculating.