Can you elaborate as to why "this isn't how you'd do things in userspace, but, this isn't userspace so fine" holds?
Naive me - not a kernel dev at all - would argue that returning Result<Memory, AllocationError> is always better, even for userspace, because it would allow me to additionally log something or deal with the failure gracefully.
Even if I don't want to deal with it, I could just `.unwrap()` or `.expect("my error message")` it.
Note: I am not trying to be snarky here, I genuinely don't know and would like to.
If answering this is too complex, maybe you can point me in the right direction so I can ask the right questions to find answers myself? Thanks in any case!
If you don't have any memory your allocations are all failing. When you assemble the log message, the allocation needed to do that fails. Bang, double fault.
Now, often people don't really mean they want allocations to be able to fail generally, they're just thinking about that code they wrote that reads an entire file into RAM. If it was a 100GB file that would be a bad idea. But the best answer is: Guard the allocation you're actually worried about, don't ladle this into the fast path everybody has to deal with on every allocation.
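To make the "guard the allocation you're actually worried about" point concrete, here is a rough userspace sketch (the function name is made up): `Vec::try_reserve` (stable since Rust 1.57) turns that one big allocation into a fallible one, while every other allocation stays on the normal infallible path.

```rust
use std::fs::File;
use std::io::Read;

// Minimal sketch: guard only the allocation you're worried about.
// `try_reserve` reports failure as a Result instead of aborting.
fn read_whole_file(path: &str) -> Result<Vec<u8>, Box<dyn std::error::Error>> {
    let mut file = File::open(path)?;
    let len = file.metadata()?.len() as usize;

    let mut buf = Vec::new();
    // Fails with a TryReserveError we can log or map, rather than aborting.
    buf.try_reserve(len)?;
    file.read_to_end(&mut buf)?;
    Ok(buf)
}
```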
People say "well, if allocations fail, all bets are off", but can't you pre-allocate memory for error handling?
Like, sit down, figure out all the things you'll want to do on an allocation failure, and once you have determined that, slice off a little chunk of memory when you start your app (and maybe _that_ fails and you can't do anything). When you hit a failure you do your thing, then tear stuff down.
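A rough userspace sketch of that "rainy-day fund" idea (all names made up; it only helps if the failure path is careful not to allocate more than the reserve it just released):

```rust
use std::collections::TryReserveError;
use std::sync::Mutex;

// A reserve buffer grabbed at startup and released on allocation failure,
// so the cleanup path has some headroom to work with.
static RESERVE: Mutex<Option<Vec<u8>>> = Mutex::new(None);

fn init_reserve() -> Result<(), TryReserveError> {
    let mut buf = Vec::new();
    buf.try_reserve(1 << 20)?; // 1 MiB; this can itself fail at startup
    *RESERVE.lock().unwrap() = Some(buf);
    Ok(())
}

fn on_allocation_failure() {
    // Drop the reserve first, so logging/teardown below has memory to use.
    RESERVE.lock().unwrap().take();
    // ... log, unwind, tear things down, exit ...
}
```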
It's what we used to do in the days when 4MB was a lot of memory. Batch programs would just abort but interactive programs had to have enough reserve to fail gracefully, possibly unwinding and releasing things until they could operate better.
Now that I see interactive programs taking a gigabyte and the system being ok, I guess we're in a different regime.
It never occurred to me (being in non-embedded land) that returning an enum or a &'static str as the error, instead of a heap structure like String, could also fail.
Seeing that Result isn't part of core, but of std, this makes sense.
Just to tickle my nerve though: theoretically speaking, with your example, it would work, right?
I couldn't allocate 100GB (because of OOM, or not even enough RAM to begin with), but it could be that the system can allocate the memory needed for the error message just fine.
Result is part of core [0]. Result data and/or errors can be stack-only data. The parent was just saying that many people that say they want to guard against out-of-memory issues aren't cognizant of just how difficult that is.
Add to that the fact that several operating systems will lie about whether you're out of memory, so the failure will often not show up in the Result value but arrive as a SIGKILL instead; it's just adding complexity.
People who are actually worried about it and know how to deal with it will be coding in a different style and can use the alloc library where/when they need to (at least once it gets stabilized in Rust).
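To make the first point concrete, a Result with a stack-only error compiles fine without std at all (AllocationError and reserve here are just made-up illustration names):

```rust
#![no_std]

// Result comes from core; the error can be a plain enum discriminant on the
// stack, so returning it never touches the heap.
#[derive(Debug, Clone, Copy)]
enum AllocationError {
    OutOfMemory,
}

fn reserve(bytes: usize) -> Result<usize, AllocationError> {
    if bytes > 4096 {
        Err(AllocationError::OutOfMemory)
    } else {
        Ok(bytes)
    }
}
```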
Tialaramex answered this in their post already, and you almost answered the question yourself:
> I could just .unwrap() or .expect("my error message") it.
Panicking can allocate. Allocating can fail. Failing can panic. Panicking can allocate. Allocating can fail. You can bite yourself in the ass like a real Ouroboros.
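A userspace illustration of the allocation hiding inside a panic: the payload the panic machinery carries may be a heap-allocated String (when format arguments are involved) or a &'static str (a plain literal), and catching the panic shows which one you got.

```rust
use std::panic;

// Sketch: unwrap/expect go through the panic machinery; the payload it
// carries may or may not have required a heap allocation.
fn main() {
    let caught = panic::catch_unwind(|| {
        let missing: Option<u32> = None;
        missing.expect("no value present") // panics via the usual machinery
    });

    if let Err(payload) = caught {
        if let Some(s) = payload.downcast_ref::<String>() {
            println!("payload was a String (heap-allocated): {s}");
        } else if let Some(s) = payload.downcast_ref::<&'static str>() {
            println!("payload was a &'static str (no allocation): {s}");
        }
    }
}
```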
IMO, a prerequisite to using fallible allocation APIs should be attempting to write your own allocator, handling the weird and wacky problem of initialising a data structure (for the heap) in such a way that if it fails, it fails without allocating but leaves some hint as to what went wrong.
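For a feel of what that exercise looks like, here is a toy sketch (not remotely production quality, just the shape of the problem): a fixed-arena bump allocator that returns null when the arena runs out instead of panicking.

```rust
use std::alloc::{GlobalAlloc, Layout};
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};

const ARENA_SIZE: usize = 64 * 1024;

// Hands out memory from a fixed arena by bumping an offset; never reuses
// freed memory, and reports exhaustion by returning null, not by panicking.
struct Bump {
    arena: UnsafeCell<[u8; ARENA_SIZE]>,
    next: AtomicUsize,
}

// SAFETY: the offset is only advanced atomically, so two threads never
// receive overlapping slices of the arena.
unsafe impl Sync for Bump {}

unsafe impl GlobalAlloc for Bump {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let base = self.arena.get() as *mut u8 as usize;
        let mut cur = self.next.load(Ordering::Relaxed);
        loop {
            // Round up to the requested alignment (always a power of two).
            let start = (base + cur + layout.align() - 1) & !(layout.align() - 1);
            let new_cur = start - base + layout.size();
            if new_cur > ARENA_SIZE {
                return core::ptr::null_mut(); // arena exhausted: signal failure
            }
            match self.next.compare_exchange_weak(cur, new_cur, Ordering::Relaxed, Ordering::Relaxed) {
                Ok(_) => return start as *mut u8,
                Err(seen) => cur = seen, // lost the race, retry from the new offset
            }
        }
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // A bump allocator never frees individual allocations.
    }
}

#[global_allocator]
static ALLOC: Bump = Bump {
    arena: UnsafeCell::new([0; ARENA_SIZE]),
    next: AtomicUsize::new(0),
};

fn main() {
    // Small allocations come out of the arena; once the 64 KiB runs out,
    // `alloc` returns null and std aborts via handle_alloc_error.
    let v: Vec<u8> = Vec::with_capacity(1024);
    println!("got {} bytes from the arena", v.capacity());
}
```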
Oh, wow, I was under the impression that the error message would be stack only, no heap involved, but as Result is part of the std library and not of core, this totally makes sense.
So for `Rust for Linux` they also need to implement a `Result-like` type that is stack only based to solve this issue, right?
If so, cool, thanks, you just made my day by tickling my learning nerves! :)
It has nothing to do with Result, whatsoever. Result does not allocate. If you used a Result that way, you could certainly try to "gracefully" handle the allocation failure, but if you think it would be easy, you would be wrong. As Tialaramex said, you are probably just going to make the problem worse because it is very difficult to ensure you do not attempt to allocate during allocation-failure-recovery. Rustc doesn't and can't really check this for you.
It actually has to do with `panic!(...)`. When you use `unwrap()`/`expect("...")`, you use the panic macro under the hood; parts of the panicking infrastructure use a boxed trait object which could contain a static string or formatted String or anything else really. The box can allocate if it is not a ZST. I believe the alloc crate's default handler tries to avoid this kind of thing, so that it can't fail to allocate AGAIN in the failure-handling routine. It will likely do a better job than you could.
This is a live issue at the moment, so to go into any more detail I'd have to read a bunch of recent Rust issues/PRs.
An addendum to tie this back to the original discussion: the reason kernel devs want these APIs more than userland is that (a) in a kernel, panicking = crashing the computer, which would be bad, and (b) they have a much bigger toolbox for handling OOM.
They can kill entire misbehaving processes. What are you going to do in your little program, clear a cache whose objects are sprinkled evenly across 150 different pages? You would need more control than you get from blindly using malloc/free/rust_alloc globally. Something like memcached would be able to use these APIs, because it uses its own allocator, and knows enough about its layout to predictably free entire pages at once.
Which you would define in the kernel. While I'm not going to speculate on exactly what the implementation would look like, you definitely do not need to "crash" the computer. I haven't done any kernel programming, but I'm guessing the kernel could, at that point, use memory that is already allocated to deal with the situation and try to recover in some way.
Mm no, it's pretty accurate. For a start, notice that the Linux community has been very clear that panicking is unacceptable. The reason is that they cannot realistically do anything to recover.
> panic handler [...] Which you would define in the kernel. While I'm not going to speculate on exactly what the implementation would look like, you definitely do not need to "crash" the computer.
The panic handler loses so much of the context that crashing the computer is the only thing you can practically achieve. You can't retry an operation generically from within a panic handler; it doesn't know anything about the operation you were attempting. The OOM handler gets a Layout struct only. You could try unwinding or something, but within a syscall handler, I don't see how anything good can come from that. Unwinding in the kernel is simply a terrible idea. What else are you going to do?
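To illustrate how little the OOM hook gets, here is a no_std sketch using the `alloc_error_handler` attribute, which is unstable (nightly-only) at the time of writing:

```rust
#![feature(alloc_error_handler)]
#![no_std]

use core::alloc::Layout;
use core::panic::PanicInfo;

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

// The allocation-error hook receives only the Layout that failed: the size
// and alignment of the request, nothing about the operation that needed it.
#[alloc_error_handler]
fn on_oom(layout: Layout) -> ! {
    let _ = (layout.size(), layout.align());
    loop {}
}
```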
I disagree that PanicInfo loses so much context. PanicInfo carries an arbitrary payload of &(dyn Any + Send).
Now there is a lot that the allocator could do. If you wanted something to be retriable, it could be interesting if the thing that failed was an async task. If so, that panic info could carry enough information to say: the failure was an OOM, here's the task that failed, and it is marked as retriable. Yes, this would require a store of tasks somewhere in the kernel. Then, based on it being an OOM, see if any memory can be reclaimed before retrying, or wait to retry until it can be.
This is where, theoretically, a new async-based Rust kernel, especially a micro-kernel, could be interesting. Is stack unwinding in the kernel a bad idea? Maybe. Can it be done in Linux? Maybe not, maybe it's too much work to track all this information, but I disagree with the conviction with which you write it off.
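Purely as a userspace sketch of the retriable-task idea above (every name here is made up, and note the irony that std's panic_any boxes its payload, i.e. it allocates; a real kernel design would need a pre-reserved slot for this instead):

```rust
use std::panic;

// Hypothetical structured OOM report carried as the panic payload, so a
// supervising layer can decide whether to retry the failed task.
#[derive(Debug)]
struct OomReport {
    task_id: u64,    // index into some hypothetical task store
    retriable: bool, // whether the supervisor may re-queue the task
}

fn run_task(task_id: u64) {
    // ... imagine an allocation inside the task fails here ...
    panic::panic_any(OomReport { task_id, retriable: true });
}

fn main() {
    if let Err(payload) = panic::catch_unwind(|| run_task(7)) {
        if let Some(report) = payload.downcast_ref::<OomReport>() {
            if report.retriable {
                // Wait for memory pressure to ease, then re-queue the task.
                println!("task {} failed with OOM, will retry", report.task_id);
            }
        }
    }
}
```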
AFAIK one other thing to note is that in Linux userspace, malloc (or fork) might succeed, but accessing the memory later can fail because of memory overcommit.