How can any performance gain match the benefits of running on a tuned SMP kernel...

baruch · on Oct 1, 2017

You either run it on a single CPU or the "os" layer needs to provide for thread migration. It also obviously needs to provide thread scheduling.

fiokoden · on Oct 1, 2017

Which is to say, DIY symmetric multiprocessing, or BYO kernel code in-application.

I'd have to read a compelling technical explanation before believing this could perform better than a Linux or BSD kernel.

In most cases, the Go code is going to run on a single CPU, and that ain't the way the world works anymore. There's going to be a bunch of wasted computing power on that VM.

shub · on Oct 2, 2017

Where people seem to really care about perf is networking. What that means in non-crazy land is you cut out the kernel from that part, do networking in userspace, and pin a thread to a CPU to poll for packets. Like it's all MMIO and DMA anyway so ring 0 doesn't have to get involved if it doesn't want to. Then for other stuff you have convenient slow OS facilities. Including threads, processes, and disk I/O! Luxurious.
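To make the "dedicate a thread to polling" idea concrete, here is a minimal sketch in Go, under loud assumptions: a plain in-memory single-producer/single-consumer ring stands in for a NIC descriptor ring, and the names `ring`, `pollSum`, and `run` are made up for illustration. `runtime.LockOSThread` only ties the goroutine to one OS thread; actually pinning that thread to a core would need an OS-specific affinity call, which is left out here.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

const ringSize = 8 // hypothetical ring capacity

// ring is a single-producer/single-consumer queue standing in for a
// NIC descriptor ring. head and tail only ever increase.
type ring struct {
	buf  [ringSize]int
	head atomic.Uint64 // next slot the consumer reads
	tail atomic.Uint64 // next slot the producer writes
}

// push adds v and reports false if the ring is full.
func (r *ring) push(v int) bool {
	t := r.tail.Load()
	if t-r.head.Load() == ringSize {
		return false
	}
	r.buf[t%ringSize] = v
	r.tail.Store(t + 1) // publish the slot to the consumer
	return true
}

// pop removes the oldest value and reports false if the ring is empty.
func (r *ring) pop() (int, bool) {
	h := r.head.Load()
	if h == r.tail.Load() {
		return 0, false
	}
	v := r.buf[h%ringSize]
	r.head.Store(h + 1) // hand the slot back to the producer
	return v, true
}

// pollSum busy-polls the ring from one OS thread until it has seen n
// values, never blocking in the kernel while it waits.
func pollSum(r *ring, n int) int {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()
	sum := 0
	for got := 0; got < n; {
		if v, ok := r.pop(); ok {
			sum += v
			got++
		}
	}
	return sum
}

// run feeds 1..n through the ring and returns what the poller summed.
func run(n int) int {
	r := &ring{}
	done := make(chan int)
	go func() { done <- pollSum(r, n) }()
	for i := 1; i <= n; {
		if r.push(i) {
			i++
		} else {
			runtime.Gosched() // ring full: let the poller drain it
		}
	}
	return <-done
}

func main() {
	fmt.Println(run(100))
}
```

The poller burns a whole core spinning, which is the trade being described: latency in exchange for a CPU that does nothing else.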

So it's not just that this has to perform better than a kernel, it has to perform better than not involving the kernel in the critical path, and it has to perform so much better that it's worth wanting to beat yourself in the face with a hammer after trying to diagnose the latest lockup. Unikernels are interesting to me, but like the demoscene or a semi tractor doing wheelies. Not for production usage.

farazbabar · on Oct 1, 2017

You can bind processing for a consistent subset of requests to an individual CPU core, essentially sharding requests across CPU cores and benefiting from very high L1 and L2 cache utilization. The idea is to treat a multi-core system as a bunch of single-CPU nodes connected via a bus instead of a network, without the unnecessary overhead of thread and process context switching.
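A rough sketch of that sharding pattern in Go, with assumptions flagged: `request`, `shardFor`, and `runSharded` are hypothetical names, the key is hashed with FNV-1a, and each worker owns its own map so no locks are needed. Note that plain goroutines are not pinned to cores; the Go scheduler is free to move them, so getting the cache benefit described above would additionally need `runtime.LockOSThread` plus an OS-level affinity call, which this sketch omits.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"runtime"
	"sync"
)

// request is a hypothetical unit of work, keyed by something stable
// such as a user or connection ID.
type request struct {
	key   string
	delta int
}

// shardFor maps a key to a shard index, so the same key always lands
// on the same worker.
func shardFor(key string, shards int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(shards))
}

// runSharded starts one worker per shard. Each worker is the only
// writer of its own map, so shard state needs no locking.
func runSharded(reqs []request, shards int) []map[string]int {
	state := make([]map[string]int, shards)
	inboxes := make([]chan request, shards)
	var wg sync.WaitGroup
	for i := range inboxes {
		state[i] = make(map[string]int)
		inboxes[i] = make(chan request, 64)
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for r := range inboxes[i] {
				state[i][r.key] += r.delta
			}
		}(i)
	}
	// Route every request to the worker that owns its key.
	for _, r := range reqs {
		inboxes[shardFor(r.key, shards)] <- r
	}
	for _, ch := range inboxes {
		close(ch)
	}
	wg.Wait() // after this, the shard maps are safe to read
	return state
}

func main() {
	shards := runtime.NumCPU() // one worker per logical CPU
	reqs := []request{{"alice", 1}, {"bob", 2}, {"alice", 3}}
	state := runSharded(reqs, shards)
	fmt.Println("alice:", state[shardFor("alice", shards)]["alice"])
}
```

The routing step is where the "consistent subset" comes from: a key never crosses workers, so its state stays with a single owner.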

fiokoden · on Oct 1, 2017

And now you're implementing SMP in the application. Zero chance of doing that as well as the Linux or BSD kernel.

yazaddaruvala · on Oct 1, 2017

Why? You're assuming there is no sharable code.

Instead of sharing the code at runtime, i.e. what an OS does, you could easily share code at compile time, i.e. statically link a library.

Because of sharable code, "implementing * in application" should always be at least as performant as the best generic implementation (i.e. the implementation you find in a general-purpose OS). However, when appropriate, customizing the implementation for the application would allow it to become even more performant.