My experience with out-of-memory conditions is that in every language and environment I've worked in, once an application hits that condition there is very little hope of continuing reliably.
So if your aim is to build a reliable system, it is much easier to plan to never get there in the first place.
Alternatively, make your application restart after OOM.
I would actually prefer the application to just stop immediately without unwinding anything. It makes it much clearer as to what possible states the application could have gotten itself to.
Hopefully you have already designed the application to move atomically between known states and have a mechanism to handle any operation getting interrupted.
If you did it right, handling OOM by having the application drop dead should be viewed as just exercising these mechanisms.
You can increase the probability of being able to survive an OOM condition long enough to emit diagnostic info in most memory-managed languages (Java, Python, etc.) fairly easily. Allocate (and write data to) a global fixed-size chunk of ballast memory at program startup, and whenever an allocator failure is detected, immediately free that block and emit whatever diagnostics you need. simcop2387 pointed out that Perl even supports this as an interpreter setting; in other languages I've done it manually.
This is more useful than OOM'd programs silently disappearing, but isn't foolproof (that's why I said "increase the probability" rather than "guarantee"): if the OOM killer gets invoked before your telemetry, if you've forked, or if whatever your language does in order to even detect an allocator error is itself memory-costly (looking at you, Python--why on earth would you allocate in order to construct a MemoryError object?!), you will still go down hard.
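To illustrate the shape of it, a minimal sketch in Rust (the names and the size are made up; unlike Java/Python you'd also need a fallible allocation path or an alloc-error hook to actually observe the failure):

    use std::sync::Mutex;

    // Ballast we can release to make room for diagnostics on the way down.
    static BALLAST: Mutex<Option<Vec<u8>>> = Mutex::new(None);

    fn init_ballast(bytes: usize) {
        // Write non-zero data so the pages are actually committed, not just reserved.
        *BALLAST.lock().unwrap() = Some(vec![1u8; bytes]);
    }

    fn on_alloc_failure(context: &str) {
        // Free the ballast first, then emit whatever diagnostics we still can.
        BALLAST.lock().unwrap().take();
        eprintln!("allocation failure in {context}; dumping diagnostics before exit");
    }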
Other than going back in time and reversing Linux's original sin (allocation success is a lie), swap is the "solution" to these situations, but for many people that cure is worse than the disease.
I often wish there was a generally-available way to map memory to files (rather than mmaping files to memory) selectively in my programs. For cases like this--handling OOM conditions and doing cleanup/reporting before turning out the lights, when the OOM condition was caused either by code in my program outside of my control or by other programs on the system outside of my control--having an "all allocations after this instant should occur in a file on disk, not memory" bit to flip would be nice. I don't need my cleanup to be fast; if this is happening I'm already on a sad enough path that I'm happy to trade shutdown/crash performance for fidelity of diagnostic data and increased probability of successful orderly cleanup.
Of course, not being an OS developer, I assume this is impossible for reasons outside of my understanding; the few complicating factors I can think of (copy-on-write pages, for example) are hard enough, and I'm sure there are others.
Strongly disagree. First of all, we're talking about minuscule (usually KB or less) amounts of memory here. Secondly, I'm not talking about over-allocating so programs can run in perpetuity; rather, I'm talking about keeping some small ballast around so cleanup and reporting can happen right before my program crashes/halts (think: the kind of code you'd put in atexit hooks). That's a better experience for operators and product maintainers than "your app disappeared without a trace because the OOM killer came for it" or "your app disappeared without a trace because its ENOMEM behavior was abort()".
This is absolutely not why Java had/has a bad rap memory-wise--that has more to do with a combination of code that allocates irresponsibly on the happy (not memory-error-anticipating) path, and the JVM's preference to preallocate as much as possible for every purpose, not just OOM handling.
> Just make use of proper libraries that will invalidate caches if needed
What if I've dumped every cache I can and I'm still getting allocation (or spawn, or whatever) errors because the system is out of memory? It's useful to have a contingency in place to turn out the lights room-by-room rather than cutting power to the whole building, as it were.
Nope, just crash ballast. I understand what you're condemning though: it's super regrettable when ordinary utilities, daemons, or desktop apps preallocate like they're database servers on dedicated hardware.
I have the same experience in C applications, but not in Rust. Rust's problems are IMHO purely self-imposed from unnecessarily nihilistic assumptions.
• There's no problem of "untested error paths". Rust has automatic Drop. Cleanup on function exit is written by the compiler, and thus quite dependable. Drop is regularly exercised on normal function exits too.
• The overwhelming majority of Drop implementations only free data and don't need to allocate anything. Rust is explicit about allocations, so it's quite feasible to avoid them where necessary (e.g. if you use a fat error type that collects backtraces, that can bite on OOM. But you can use enums for errors, and they are plain integers).
• The probability of hitting OOM is proportional to allocation size, so your program is most likely to hit OOM when allocating its largest buffers. That leaves a lot of RAM free to recover with. It's basically impossible to exhaust memory down to the last byte.
Crashing may be fine for tiny embedded software or small single-threaded utilities. However, crashing servers is expensive, and may not even work. When you crash, you kill all the requests that were in progress. This wastes work already done by other threads, and you're not making progress. When clients retry, you're likely to get the same set of requests that led to OOM in the first place, so you may end up crashing and restarting in a loop forever. OTOH if you detect OOM and reject the one offending request, you can keep making progress with other, smaller requests.
The compiler can only do a limited number of things, and I am not talking about it.
Think in terms of trying to save the file you are working on before the application exits, or notifying the cluster map that the node is not going to be available to process requests.
You don't exit. You work with fallible functions, and keep trying to handle errors for as long as possible.
Each function can fail, and when it fails, you propagate the error up to a point where the whole failing task can be gracefully cancelled.
e.g. if the user invokes a "Print" command, and it runs out of memory, then instead of immediately crashing and burning, you can try to report "Sorry, print failed". That too will be handled fallibly, so if `message("Sorry…")?` fails, then you proceed to plan B, which may be log, then save and quit. If these fail, then finally crash and burn. But chances are that maybe print preview needed to allocate lots of memory, and other functions don't need as much, so your program will survive just by aborting that one operation.
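Concretely, the shape of it in Rust (just a sketch; the render/report names are made up, and on Linux with overcommit the reservation may of course "succeed" anyway):

    use std::collections::TryReserveError;

    // Hypothetical render step: asks for its big buffer fallibly instead of
    // going through the aborting allocation path.
    fn render_pages(len: usize) -> Result<Vec<u8>, TryReserveError> {
        let mut buf: Vec<u8> = Vec::new();
        buf.try_reserve_exact(len)?; // Err on allocation failure, no abort
        buf.resize(len, 0);
        Ok(buf)
    }

    fn on_print_command() {
        match render_pages(2_usize * 1024 * 1024 * 1024) {
            Ok(pages) => println!("printing {} bytes", pages.len()),
            // The huge buffer was never committed, so this small message will
            // almost certainly succeed; if even that fails, plan B is log,
            // then save and quit.
            Err(e) => eprintln!("Sorry, print failed: {e}"),
        }
    }

    fn main() {
        on_print_command();
    }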
Build perl with -DUSEMYMALLOC and -DPERL_EMERGENCY_SBRK, then you can preallocate a buffer by doing $^M="0" x 65536; then you can trap the out-of-memory condition with the normal facilities in the language and handle it appropriately (mostly letting the big data get deallocated, or exiting). Then you can continue on just like normal. It's a weird setup and I don't think I've run into any other language with that built in.
The interesting thing is that the OOM killer doesn’t always go for the program that triggered the OOM. It may also decide to kill another memory-hungry process (cough database) on the machine unless you explicitly tweaked it.
I’ve had the OOM killer kill my work queue because someone wrote a shell script that leaked small amounts of memory. The entire purpose of that machine was to run the queue, so most of the memory was allocated to that process - making it the prime target for the OOM killer. Yes, please, kill everything else on that machine before touching that process. (Indeed, the OOM killer has a setting for that.)
Sounds bad, but it doesn't justify the proposed heuristic to kill every new process that fails to alloc without looking at the whale that's responsible for the OOM condition. Actually, it sounds like you didn't properly configure that machine.
You might be happy to learn that the OOM killer can be configured[1] to specifically protect certain processes. If the entire point of a machine is to run a single process, then you should definitely use that feature.
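For reference, the knob is /proc/<pid>/oom_score_adj; a minimal sketch of a process opting itself out (lowering the score needs CAP_SYS_RESOURCE or root):

    use std::fs;

    fn main() -> std::io::Result<()> {
        // -1000 effectively exempts this process from the OOM killer;
        // 0 is the default and +1000 makes it the preferred victim.
        fs::write("/proc/self/oom_score_adj", "-1000")?;
        Ok(())
    }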
I mean, the OOM killer's heuristics are byzantine to be sure. However, if your program is not likely to be the "true" culprit of memory exhaustion, there are better tools at your disposal than ballast pages--cgroups and tunable oomkillers like earlyoom (https://github.com/rfjakob/earlyoom).
On the other hand, if you are likely to be identified as the culprit, I think the best you can hope for is getting some cleanup/reporting in before you're kill-9'd.
I think you've identified the primary exception to the parent comment's rule. Obviously we would not want to kill PID 1, no matter the reason for the OOM.
In this case (a cybersecurity application), there is a complicating factor: you cannot trust that the environment is trustworthy. The out-of-memory condition may have been caused by malware to disable or degrade the antivirus sensor. For example, the malware allocates just enough memory that further OS allocations fail, but it has enough memory to accomplish its task, then terminates to allow normal system processing to resume.
Under that scenario, the antivirus sensor should be able to take some action (log, likely) that a malloc failed, and possibly even try to recover memory and identify the risk.
I did it at least once for a credit card terminal.
The application saved state to internal database, and even if you cycled power it would just come back to the same screen and same state it was before power cycle.
This wasn't to deal with memory problems (in fact, it had no dynamic memory allocation at all) but rather to deal with some crappy external devices and users that would frequently power cycle things if it didn't progress for more than a couple of seconds.
Why do so many developers assume that they are the perfect ones who never make mistakes? They think that their programs have no bugs despite the overwhelming evidence that all programs have bugs. But no. Not theirs. Literally all other programmers are idiots and only they write perfect code.
I am not sure how you can make a user-friendly application like that.
How about an image viewer that tries to open too large an image: should it just crash when it OOMs? I would much prefer an error dialog and for the program to continue running.
For big infrequent allocations (like your example of loading a huge image) it is easy to use a non-default allocator that returns an error rather than aborting.
As the article notes, this is all moot if you are running on Linux though. The allocation will always succeed but if it's too big the kernel will start killing processes, quite possibly including your image viewer.
On Linux allocations can fail in some situations, for example when a memory rlimit is set for the process, or when the cgroup that contains the process runs out of memory (so if you run the program in a container with constrained memory).
You can look at the SQLite codebase for good examples of this. Essentially, you will have to introduce a fixed upper bound for the file size you are willing to handle, given your minimum system requirements. This way, you can actually test that your program gracefully handles OOM conditions.
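As a trivial sketch of such a bound (the constant and the error handling are just placeholders):

    use std::io::{Error, ErrorKind, Result};
    use std::path::Path;

    // A bound derived from the minimum system requirements, so the rejection
    // path can actually be exercised in tests regardless of the host's RAM.
    const MAX_INPUT_BYTES: u64 = 256 * 1024 * 1024;

    fn check_input_size(path: &Path) -> Result<()> {
        let len = std::fs::metadata(path)?.len();
        if len > MAX_INPUT_BYTES {
            return Err(Error::new(
                ErrorKind::InvalidInput,
                format!("{} is {len} bytes, limit is {MAX_INPUT_BYTES}", path.display()),
            ));
        }
        Ok(())
    }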
Fixed upper limits are helpful but do not prevent OOM. Perhaps the system has very little free memory due to other programs, limited hardware, or other reasons, but your program is intended to load large files on systems that do have the memory, so the upper limits are pretty high.
It takes very little memory to show the user an error message; it will almost always succeed even if the operation that triggered it failed due to OOM.
Of course, it's not a guarantee. But if your program is under heavy pressure from other programs, it won't continue to run in any reasonable sense either, even without Linux's OOM-killer. Yes, you can show an error message, but then the only safe thing to do is quit. You won't even be able to clean up any on-disk swap files properly, let alone handle user interactions.
Handling this is hard in desktop applications. For servers, you can have known workloads with better limits on each process.
That is what Linux's OOM killer does, yes. In Linux you can handle the problem by spawning off a subprocess and watching to see why it dies. Not an acceptable result for your antilock braking system firmware.
Why would users be aware of what happened? Did you tell them?
This is very reliable and easy for the user and other developers to understand:

    try
    {
        image = LoadImage(path);
    }
    catch (OutOfMemoryException e)
    {
        Msgbox("Cannot load image: it is too large for available memory");
    }
That is way more friendly than a program crash, and it allows the user to try again, perhaps with a smaller version of the image because they accidentally picked the high-res version or something.
Either way you may get 100% CPU: either the program crashes and memory gets reclaimed, or the program continues to run and the garbage collector reclaims it.
In a GC'd language, which is what I am used to, the GC will pause execution and reclaim memory on the next allocation if there isn't any available, cleaning up whatever was allocated during the attempted image load.
After that, if there is still not enough memory for the simple message box, then you have an uncaught exception, the program crashes, and you're just back to the no-catch approach; nothing lost. Most likely, though, there will be enough memory for the message and you get a much more friendly result.
I have used this approach before and it is way, way more friendly than a crash, with users potentially losing work because they accidentally picked the wrong file.
> After that if there is still not enough memory for the simple message box (...)
That's exactly the point. Once memory is exhausted, you can't take any action reliably.
You don't build reliable applications with mechanisms like: "OK, if memory runs out, let's design it to show a box to the user; we have a 50% chance this succeeds."
It would be better to wrap your application in a script that detects when the application quit and only then shows a message to the user.
People designing things like you drive me crazy. They come up with a huge number of contingencies that just don't work in practice when push comes to shove.
Stuck in a loop trying to show a widget, using 100% of the CPU and preventing me from doing anything?
>That's exactly the point. Once memory is exhausted, you can't take any action reliably.
Nothing is 100% reliable, that's not realistic, and in this kind of situation it is not a 50/50 shot; it's more like 10,000 to 1 that you will be able to show a message, which should be obvious.
>It would be better to wrap your application in a script that detects when the application quit and only then shows a message to the user.
That's simpler than wrapping the specific function in a "script" that shows a message to the user but allows the program to continue to function in almost all situations?
>People designing things like you drive me crazy. They come up with a huge number of contingencies that just don't work in practice when push comes to shove.
People like you who do not value the user experience over pure code drive me crazy. These things actually do work in practice; I guarantee you any sufficiently complex GUI program will have code like this to try and gracefully handle as many contingencies as possible before simply crashing. Do you think your browser should just crash, losing other tabs, if a web site loads too much data? Does Photoshop just crash if it runs out of memory on an operation, losing all your work?
>Stuck in a loop trying to show a widget, using 100% of CPU and preventing me to do any action?
It's no more stuck than your process crashing while the kernel reclaims memory; they both take similar CPU time. One, however, results in a message that informs you of what happened and leaves you with a running program; the other tells you nothing and your program is now gone.
A program that can handle an OOM and continue to function normally is more reliable to the user than one that crashes and must be restarted, potentially losing work.
Memory allocation can fail, network connections are not reliable, opening a file may fail, writing a file may fail, and so on. If your program simply crashed because some operation failed, it would only be reliable at crashing.
If you try an operation that causes an OOM by, say, allocating a large amount of memory as in the example given above, then once that memory is freed you are functioning normally again.
If you write a large file to disk, filling it to capacity and failing, then delete the file, you have free space again and everything is normal.
This is the kind of advice that I put in a category of "easier said than done".
The trouble is that you may not know beforehand what exactly is going to be needed. You might need a library call, and the library does dynamic allocation internally, and you either have no idea about it (until you find out the hard way) or no way to help it.
So in the end maybe you can take some extremely simple action like writing something to log or show a widget, but that's about it.
Just code as normal and when you run into a problem, reassign the default allocator for the struct/object? Not possible for many PLs, hard for some but basically trivial (1 LOC per struct/object) for others.
Some PLs come with test suites that give you a failing allocator, so you can even easily test to make sure this looping condition resolves sanely.
Preallocate that? If it's modal, you should never have more than one, and the allocator associated with the msgbox should know to use that scratch space instead of fetching new memory.
It depends on the application. On Android, for example, I don't want my app to crash with an OOM while trying to load an image. I want to instead clear my in-memory cache and try again with more memory available. So I wrapped the code that loads images into a try-catch that catches OutOfMemoryError and it worked wonderfully.
The ability to "handle" OOM in userspace exists for a particular class of software — software that is usually configured to 1. use unbounded and unpredictable amounts of memory, and where 2. requests are entirely sandboxed in their resource usage, but within the software itself rather than at the OS level.
Basically there's only one kind of software in this class: DBMS software. DBMSes want to be able to try to process ridiculous queries if users ask them to, and then fail in a way that only affects the processing of that query rather than the stability of the DBMS as a whole. And they also mostly can't afford the overhead of pre-calculating just how ridiculous a query will be before attempting it, because that calculation often requires effectively 90% of the work involved in actually running the query.
For every other type of software, letting the OS handle the OOM (by killing your process) — and setting up your higher-level inter-process / inter-node architecture to be resilient to that — is the sensible approach.
Why are you ignoring the obvious example of operating systems? An operating system doesn't and shouldn't crash if you run out of memory.
Additionally I have personal experience with network appliances (firewalls/deep packet inspection/etc). If such a device runs out of memory, it degrades gracefully (unless there's a bug) and starts shedding connections rapidly to avoid running out of memory. Such devices can't just restart if they run out of memory as that would be a network outage and can often be exploited to cause a more serious denial of service attack.
Rust is a systems language. Handling out-of-memory conditions is par for the course for systems programming.
FWIW, this is a little outdated now. If you're willing to bite off nightly, there's cfg(no_global_oom_handling) which will just remove access to the aborting allocation API. It came out of the Linux kernel work and their very similar concerns IIRC.
If you cfg(no_global_oom_handling) lots of nice things go away. Which is both appropriate - those nice things did in fact depend on global OOM handling - and hopefully likely to keep more people from erroneously believing they can't afford global OOM handling.
As an example, ordinarily in Rust this does what it looks like it does:

    let mut lyrics = "Never was".to_owned();
    lyrics += " a cornflake girl";
With cfg(no_global_oom_handling) this can't work, most importantly because the second line is an arbitrary concatenation and therefore allocates, but there is no possible way to signal if it fails.
As an embedded C guy this makes me so happy. Yes, that second line allocates and the fact that it is hidden makes me nervous. I want a programming environment that exposes all of the complexity and makes me deal with the consequences.
You can't push_str() under cfg(no_global_oom_handling) either
I mean, they did warn you: No Global OOM Handling. So if something mustn't fail, yet it might not succeed, it has to be eliminated: a String under cfg(no_global_oom_handling) can't push_str(), can't reserve(), and can't do a lot of other things. Their definitions are conditionally removed from the type.
I didn't check, but it's possible "something".to_owned() is similarly forbidden; it seems like that would allocate memory.
Basically if you live in a world where allocating memory for strings, vectors, and other growable types feels extravagant, cfg(no_global_oom_handling) is for you, and if not then maybe you should re-evaluate why you are worried about allocation failures when you are wasting precious heap memory on such data structures.
rust-analyzer could probably do it. That information isn't exposed in type signatures, though, so it'd have to have full access to the source of everything and a good understanding of the standard library (and would fail if you added your own “might allocate” by going around the back of the global allocator).
If you're allowed to have false positives, it's not so hard. Like "this function allocates if A or B is true" just becomes "this function can allocate". It's not so different from finding functions that can panic.
I’m fairly certain that this does not require solving the halting problem. Fundamentally, every line of code can be transformed to the underlying code constructs, and those could be checked for allocations. Doing this in an efficient manner is probably hard, but certainly not impossible.
I think if we wanted a concrete answer you’d be right.
But I feel like a static analyzer could clearly identify “this line can allocate” just by knowing what language features can allocate. And for third party library methods “does this function have anything inside of it that could allocate?”
The zig approach seems really well suited to low-level libraries where you really need to handle these errors (and custom allocators, etc.) but it's kind of annoying for general purpose programs.
"where the C++ pieces can all recover from OOM and the Rust component cannot"
I would like to read about their successful test of their C++ pieces using the same (randomly failing small allocations) strategy. My impression from Herb Sutter is that on the popular C++ standard library implementations (for MSVC, Clang, GCC) this actually doesn't work, but of course Herb has a reason to say that - so it'd be interesting to hear from somebody who has a reason to believe the opposite.
Why not budget how much memory your Rust component needs, and allocate a pool ahead of time? This is how it's sometimes done in embedded systems. It depends on what your Rust code does of course, but if its runtime memory use depends on untrusted data it is handling, that is a concern in itself.
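Roughly what I mean, as a deliberately naive sketch in Rust (a bump pool with no alignment handling; the budget figure would come from your provisioning):

    // Carve the whole budget out of the heap at startup, then hand out slices
    // until it runs dry instead of touching the global allocator per request.
    struct Pool {
        buf: Vec<u8>,
        used: usize,
    }

    impl Pool {
        fn with_budget(bytes: usize) -> Pool {
            Pool { buf: vec![0u8; bytes], used: 0 }
        }

        // Returns None when the budget is spent, so the caller can reject just
        // that request instead of taking the whole process down.
        fn alloc(&mut self, len: usize) -> Option<&mut [u8]> {
            let start = self.used;
            let end = start.checked_add(len)?;
            if end > self.buf.len() {
                return None;
            }
            self.used = end;
            Some(&mut self.buf[start..end])
        }
    }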
I'm not sure if this can apply to the most commonly used software. A browser can't know in advance how much memory it would need. Same thing applies for audio, image or video editing software. Probably office suites too.
This might be feasible for apps with a limited feature set, like an audio/video player, an IM client, etc...
It won’t. I was thinking more about backend components where you allocate a fixed per-request memory budget. It would work best if you could apply the budget per in-flight request. If a request goes over budget, only the affected (presumably nefarious) request gets terminated; other requests and other processes run uninterrupted.
Most commonly used software such as browsers can abort on OOM, this isn't about those cases at all.
In situations where you do not want to abort, you usually can, and would want to, allocate a pool beforehand.
All this is moot for me since I only use systems with overcommit enabled.
Is there a way to disable it for a single program or cgroup, to enable it to deal with out-of-memory conditions? Maybe changing/hooking the standard library?
mlockall(2) seems overkill, since it will also force all code and mapped files to be resident.
It's bonkers to allow programs to destabilize the whole machine to the point that the kernel has to start killing off processes just to survive.
I use https://lib.rs/cap which self-imposes a memory limit on the process. Setting that limit below the cgroup limit allows programs to actually handle OOM before they get killed by the OS (if only Rust's libstd wasn't so eager to self-abort anyway).
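If anyone's curious, the setup is just a global-allocator swap; roughly this, going from memory of the crate's README (the 512 MiB figure is arbitrary):

    use std::alloc;
    use cap::Cap;

    // Wrap the system allocator so every allocation is counted against a limit.
    #[global_allocator]
    static ALLOCATOR: Cap<alloc::System> = Cap::new(alloc::System, usize::MAX);

    fn main() {
        // Keep the self-imposed limit below the cgroup/container limit so
        // allocation failures surface in-process before the OOM killer fires.
        ALLOCATOR.set_limit(512 * 1024 * 1024).unwrap();
        // ... rest of the program; fallible APIs like try_reserve now see this limit.
    }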
In the case of servers/VMs/containers you probably know how much RAM you've provisioned, so set it to that minus whatever other processes need to live.
For desktop applications it's tough. It may be just an arbitrarily high amount you don't expect to hit during normal operation. If you need to work with variable-size data, then it could be `size_of_file_being_opened * x` if you can predict the `x`.
I suppose I could use a custom allocator that calls mlock() on each page. I will look at whether the ones you mention have Rust integrations and whether they expose a setting like that.
I understood wyldfire's post to mean counting & imposing your own artificial memory limit, no mlock involved. But I wouldn't know how to pick that limit :/.
Oh I see. Well, I'm not trying to add a limit to my RSS, I just want to be notified when allocations fail in my process, on my machine where overcommit is enabled.
Maybe what I'm asking for makes no sense though, because even if my process handles out of memory errors gracefully, it might still get OOM-killed when another process allocates some more.
Yes that is what I meant. And it would avoid the latency hit of a system call to satisfy an allocation. However doing an mlock would avoid the latency hit of paging, so that's good too.
In a way, it's counterproductive to limit VSZ. It's perfectly fine to mmap() gigabytes of files to virtual addresses, but you neither expect nor want to have that counted against your "memory usage" limit.
The entire problem is, in a way, unsolvable with current OS APIs. AFAIK there is no preexisting, good, actually usable, and universal memory usage meter. Some things work for a lot of cases, but I don't think there's anything that would work universally.
Coincidentally, in my eyes the best way to handle memory pressure in applications would be proactively rather than reactively. Sometimes you can unload more things, like pages in a document that aren't being edited right now. And if you need to fail, you can fail on good boundaries (e.g. refusing entire requests in server-like things) rather than in the middle of something that you need tons of work to unwind correctly.