Author here. This post was from a few years ago but from memory I controlled for that. So the problem wasn't FFI.
I loaded the whole history into wasm before I started timing, then processed it in an inner loop that was written in rust, running in the wasm context itself. There were only 2 calls to wasm or something. The 4x slowdown wasn't FFI. The algorithmic code itself was actually running 4x slower.
It'd be interesting to rerun the benchmark now. I assume compilers are better at emitting wasm, and wasm runtimes have gotten faster. I'm sure I've still got the benchmarking code around somewhere.
I loaded the whole history into wasm before I started timing, then processed it in an inner loop that was written in rust, running in the wasm context itself. There were only 2 calls to wasm or something. The 4x slowdown wasn't FFI. The algorithmic code itself was actually running 4x slower.
It'd be interesting to rerun the benchmark now. I assume compilers are better at emitting wasm, and wasm runtimes have gotten faster. I'm sure I've still got the benchmarking code around somewhere.