An addendum to tie this back to the original discussion: the reason kernel devs ...

bluejekyll · on Oct 25, 2021

> panicking = crashing the computer

That isn't very accurate. In Rust, when programming in no_std, you can (must?) define your own panic handler:

https://doc.rust-lang.org/nomicon/panic-handler.html

Which you would define in the kernel. While I'm not going to speculate on exactly what the implementation would look like, you definitely do not need to "crash" the computer. I haven't done any kernel programming, but I'm guessing that at that point the kernel could use memory that was allocated in advance for exactly this situation and try to recover in some way.

Edit: for example, I just found this in the kerla project: https://github.com/nuta/kerla/blob/88fd40823852a63bd639e602b...

That currently halts, but it probably doesn't need to, or it could halt conditionally based on the contents of PanicInfo.
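A real kernel would put this decision in its `#[panic_handler]` function, which can't run inside an ordinary hosted program. As a rough userspace analogue only, the sketch below uses std's `panic::set_hook` and `catch_unwind` to show a handler choosing what to do from the panic's message. The `Action` enum, the `decide` policy, and the "retriable" keyword are hypothetical, made up for illustration.

```rust
use std::panic;
use std::sync::Mutex;

/// What a (hypothetical) panic policy might choose to do.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    Halt,
    Recover,
}

/// Hypothetical policy: only panics whose message mentions "retriable"
/// are treated as recoverable; everything else halts.
fn decide(message: &str) -> Action {
    if message.contains("retriable") {
        Action::Recover
    } else {
        Action::Halt
    }
}

/// The last decision made by the hook, so the caller can act on it.
static LAST_DECISION: Mutex<Option<Action>> = Mutex::new(None);

/// Installs a hook that inspects the panic payload and records a decision
/// instead of printing the default panic message.
fn install_policy_hook() {
    panic::set_hook(Box::new(|info| {
        let payload = info.payload();
        // `panic!("literal")` carries a &'static str; formatted panics carry a String.
        let message: &str = if let Some(s) = payload.downcast_ref::<&str>() {
            *s
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.as_str()
        } else {
            "<non-string payload>"
        };
        *LAST_DECISION.lock().unwrap() = Some(decide(message));
    }));
}

/// Runs `f`; if it panics, returns the action the hook chose, otherwise None.
fn run_guarded<F: FnOnce() + panic::UnwindSafe>(f: F) -> Option<Action> {
    *LAST_DECISION.lock().unwrap() = None;
    match panic::catch_unwind(f) {
        Ok(()) => None,
        Err(_) => *LAST_DECISION.lock().unwrap(),
    }
}
```

Call `install_policy_hook()` once, then wrap work in `run_guarded`. Note that this leans on unwinding to get control back, which is a separate (and contested) question from what the handler decides.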

cormacrelf · on Oct 26, 2021

Mm no, it's pretty accurate. For a start, notice that the Linux community has been very clear that panicking is unacceptable. The reason is that they cannot realistically do anything to recover.

> panic handler [...] Which you would define in the kernel. While I'm not going to speculate on exactly what the implementation would look like, you definitely do not need to "crash" the computer.

The panic handler loses so much of the context that crashing the computer is the only thing you can practically achieve. You can't retry an operation generically from within a panic handler; it doesn't know anything about the operation you were attempting. The OOM handler only gets a Layout struct. You could try unwinding or something, but within a syscall handler, I don't see how anything good can come from that. Unwinding in the kernel is simply a terrible idea. What else are you going to do?
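To make the contrast concrete: a handler shaped like `std::alloc::handle_alloc_error(layout: Layout) -> !` sees only the size and alignment of the failed request, whereas a caller using a fallible API such as `Vec::try_reserve` still knows what it was doing and can pick its own fallback. A minimal sketch; `oom_report`, `buffer_for_request`, and the "settle for a smaller buffer" policy are hypothetical.

```rust
use std::alloc::Layout;
use std::collections::TryReserveError;

/// Roughly everything a global OOM handler has to work with: the size and
/// alignment of the request, with no idea which operation made it.
fn oom_report(layout: Layout) -> String {
    format!("failed to allocate {} bytes (align {})", layout.size(), layout.align())
}

/// At the call site, a fallible reservation keeps the context: this function
/// knows it is building a buffer for a request and can degrade gracefully.
fn buffer_for_request(wanted: usize, minimum: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    match buf.try_reserve(wanted) {
        Ok(()) => Ok(buf),
        // Hypothetical fallback policy: try for the minimum instead,
        // and only report failure if even that can't be reserved.
        Err(_) => {
            buf.try_reserve(minimum)?;
            Ok(buf)
        }
    }
}
```

The error comes back as an ordinary `Result`, so nothing has to panic or unwind for the caller to react.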

bluejekyll · on Oct 26, 2021

I disagree that PanicInfo loses so much context. PanicInfo carries an arbitrary payload of &(dyn Any + Send).

Now there is a lot that the allocator could do. If you wanted something to be retriable, it could be interesting if the thing that failed was an async task. If so, that panic info could carry enough information to say: the failure was an OOM, here’s the task that failed, and it is marked as retriable. Yes, this would require a store of tasks somewhere in the kernel. Then, since the failure was an OOM, the kernel could check whether any memory can be reclaimed before retrying, or wait to retry until it can.
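A userspace sketch of that retry flow, assuming unwinding is available (it uses `panic::panic_any` and `catch_unwind`, which live in `std`, not `core`): a task panics with a hypothetical `TaskFailure` payload, the caller downcasts it, and if it is a retriable OOM, reclaims memory and tries again. `TaskFailure`, `Memory`, and the reclaim policy are made up for illustration; a real kernel would also need the task store mentioned above.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical payload a failing task panics with, carrying the context
/// a generic handler would otherwise lack.
#[derive(Debug, Clone, PartialEq)]
struct TaskFailure {
    task_id: u32,
    out_of_memory: bool,
    retriable: bool,
}

/// Hypothetical memory state: pages free now, plus pages that could be
/// reclaimed (e.g. from caches) on request.
struct Memory {
    free: usize,
    reclaimable: usize,
}

impl Memory {
    /// Moves all reclaimable pages into the free pool; returns how many were freed.
    fn reclaim(&mut self) -> usize {
        let freed = self.reclaimable;
        self.free += freed;
        self.reclaimable = 0;
        freed
    }
}

/// A task body: panics with a `TaskFailure` payload if memory is short.
/// It checks before mutating, so a panic leaves `mem` consistent.
fn task(task_id: u32, pages: usize, mem: &mut Memory) {
    if mem.free < pages {
        panic::panic_any(TaskFailure { task_id, out_of_memory: true, retriable: true });
    }
    mem.free -= pages;
}

/// Runs a task; on a retriable OOM, reclaims memory and retries, giving up
/// once reclaiming frees nothing. Returns the number of attempts on success.
fn run_with_retry(task_id: u32, pages: usize, mem: &mut Memory) -> Result<u32, TaskFailure> {
    let mut attempts = 0;
    loop {
        attempts += 1;
        // AssertUnwindSafe is justified because `task` never leaves `mem` half-updated.
        let result = panic::catch_unwind(AssertUnwindSafe(|| task(task_id, pages, mem)));
        match result {
            Ok(()) => return Ok(attempts),
            Err(payload) => {
                // Anything that isn't our payload is not ours to handle: re-raise it.
                let failure = match payload.downcast::<TaskFailure>() {
                    Ok(f) => *f,
                    Err(other) => panic::resume_unwind(other),
                };
                if failure.out_of_memory && failure.retriable && mem.reclaim() > 0 {
                    continue;
                }
                return Err(failure);
            }
        }
    }
}
```

The default panic hook will still print a message to stderr for each caught panic; a real design would install its own hook, as the handler discussion above suggests.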

This is where, theoretically, a new async-based Rust kernel, especially a micro-kernel, could be interesting. Is stack unwinding in the kernel a bad idea? Maybe. Can it be done in Linux? Maybe not; maybe it’s too much work to track all this information, but I disagree with the conviction with which you write it off.