This is one of those "let's ignore malloc failing" bugs. It's pretty common in C code and is, IMHO, a classic "worse is better"[1] anti-pattern, predicated on the "worse is better" principle that correctness can be sacrificed if it makes things too complicated. Many programmers, unfortunately, don't give a crap about things that far off the happy path, and when a language implements checked exceptions or Option types they get upset that it forces them to think about those distracting corner cases. Such is the state of software engineering.
Yes, as Hoare stated in 1981, regarding Algol design:
"The first principle was security: The principle that every syntactically incorrect program should be
rejected by the compiler and that every syntactically correct program should give a result or an
error message that was predictable and comprehensible in terms of the source language program
itself. Thus no core dumps should ever be necessary. It was logically impossible for any source
language program to cause the computer to run wild, either at compile time or at run time. A
consequence of this principle is that every occurrence of every subscript of every subscripted
variable was on every occasion checked at run time against both the upper and the lower declared
bounds of the array. Many years later we asked our customers whether they wished us to provide
an option to switch off these checks in the interests of efficiency on production runs.
Unanimously, they urged us not to - they already knew how frequently subscript errors occur on
production runs where failure to detect them could be disastrous. I note with fear and horror that
even in 1980, language designers and users have not learned this lesson. In any respectable branch
of engineering, failure to observe such elementary precautions would have long been against the
law."
"Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to - they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. "
I've always liked that part. Once they saw the problems, they'd put up with whatever sacrifices it took to avoid them.
Except for C, Objective-C(++) and C++, all other programming languages, even systems programming ones, do bounds checking.
And at least in C++, thanks to the STL or similar libraries, there is the option to always enable bounds checking.
So even if the market never got hold of Algol, except for C and the languages that offer copy-paste compatibility with it, all the others do offer sane defaults in terms of security.
Ada, Eiffel, Java, C#, and Go are doing fine. All making money. :P Three are even used in operating systems, with two in embedded systems. So they can be fairly efficient, too.
Note: Not to mention that Unisys still makes tons of money off their ALGOL machines from Burroughs, and IBM & Fujitsu still service PL/1. So the old ones do, too.
The project I work on, Mesa3D, routinely ignores malloc failures for a couple of reasons: no one knows how to reasonably test the additional code, and often no one has any idea what sort of graceful failure is even possible when malloc fails.
We've had some cargo culted patches to call _mesa_error_no_memory() when malloc fails, but it was recently noted that this can't possibly work because it internally calls fprintf, and there's no guarantee that fprintf doesn't call malloc! In fact, glibc's implementation does.
The proper way to deal with a memory allocation failure, when a graceful return isn't possible, is to call abort() or similar, instead of trying to continue with a potentially inconsistent state.
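For illustration, a minimal sketch of that approach; the xmalloc name and the message wording are just placeholders, not anything from a real project:

    #include <stdlib.h>
    #include <unistd.h>

    /* Allocate or die: never returns NULL, so callers need no failure path. */
    static void *xmalloc(size_t size)
    {
        void *p = malloc(size);
        if (p == NULL) {
            /* write(2) rather than fprintf(3): stdio may itself malloc. */
            static const char msg[] = "out of memory, aborting\n";
            (void)write(STDERR_FILENO, msg, sizeof(msg) - 1);
            abort();
        }
        return p;
    }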
Of course, it would be better to avoid getting into a situation where either a graceful return isn't possible, or poorly-tested recovery code has to be run to deal with the failure. I read somewhere that seL4's strategy is to divide the work into two phases. The first phase allocates all the resources it will need and checks all preconditions, but changes no state; the second phase has no allocations, does all the work, and will never fail. That way, any error recovery is a simple release of the resources it had already allocated.
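Roughly, the shape of that pattern in plain C (the names and the resources struct are made up for illustration; this is not how seL4 itself is implemented):

    #include <stdlib.h>
    #include <string.h>

    struct resources {
        char *buf_a;
        char *buf_b;
    };

    /* Phase 1: acquire everything that can fail; touch no shared state.
       On failure, undoing is trivial because nothing else has happened yet. */
    static int acquire(struct resources *r, size_t n)
    {
        r->buf_a = malloc(n);
        r->buf_b = malloc(n);
        if (r->buf_a == NULL || r->buf_b == NULL) {
            free(r->buf_a);              /* free(NULL) is a no-op */
            free(r->buf_b);
            return -1;
        }
        return 0;
    }

    /* Phase 2: no allocation, no error paths; just do the work. */
    static void commit(struct resources *r, size_t n)
    {
        memset(r->buf_a, 0, n);
        memset(r->buf_b, 0xff, n);
    }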
Check out the concept of fault injection -- you can wrap all of the malloc() calls and then systematically cause each one to fail. You might want to combine that with valgrind to make sure you don't end up doing undefined things after a malloc failure.
You can also use that wrapping technique to test _mesa_error_no_memory(), by setting the state to "all mallocs fail" and then calling it. Hint: Don't call fprintf, don't use stdio, call write() directly.
Fault injection can't cover all of the combinations of all inputs and all places to fail, but it's a lot better than nothing.
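One low-tech way to get that kind of control, sketched below, is a wrapper that fails the Nth allocation, driven by an environment variable. test_malloc and FAIL_MALLOC_AT are hypothetical names; real setups often use `ld --wrap=malloc` or an LD_PRELOAD shim instead:

    #include <stdlib.h>

    /* Route allocations through this wrapper in test builds; it fails the
       Nth call, where N comes from the FAIL_MALLOC_AT environment variable. */
    void *test_malloc(size_t size)
    {
        static long fail_at = -2;        /* -2 = uninitialised, -1 = never fail */
        static long call_count = 0;

        if (fail_at == -2) {
            const char *s = getenv("FAIL_MALLOC_AT");
            fail_at = s ? atol(s) : -1;
        }
        if (++call_count == fail_at)
            return NULL;                 /* simulated allocation failure */
        return malloc(size);
    }

Then run the suite in a loop, e.g. `for i in $(seq 1 500); do FAIL_MALLOC_AT=$i ./test; done`, ideally under valgrind so any use of the failed allocation gets caught.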
If you don't have anything better to do, just abort().
One (naughty) option is to malloc a decent-sized buffer at startup (say, 50 MB), and then, when malloc fails, free it and immediately display a warning box.
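A sketch of that reserve-buffer trick, with arbitrary names and size; note that under overcommit the reserve also has to be written to, or the pages may never actually be backed:

    #include <stdlib.h>
    #include <string.h>

    #define RESERVE_SIZE (50u * 1024 * 1024)   /* ~50 MB rainy-day fund */
    static void *reserve;

    void reserve_init(void)
    {
        reserve = malloc(RESERVE_SIZE);
        if (reserve != NULL)
            memset(reserve, 0, RESERVE_SIZE);  /* touch it so it is really backed */
    }

    /* Call when some malloc returns NULL: hand the memory back so the
       warning/cleanup path has room to run, then tell the user. */
    void on_out_of_memory(void)
    {
        free(reserve);
        reserve = NULL;
        /* ... display the warning box, flush state, exit gracefully ... */
    }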
On most modern 64-bit systems, malloc will never fail anyway; you'll get killed due to memory overcommit.
The alternative, if you have no good way of testing the codepaths, or if you're running on a system with overcommitted VM, is basically segfaulting. And you're running on a system with overcommitted VM.
Overcommit is not actually relevant. An overcommit OOM-kill is a manifestation of the administrator killing the process for being a memory hog (it's just that the administrator has delegated this decision to some less-than-scrutable kernel code).
Library code should in general punt out-of-memory conditions back to the caller.
It is if you ever want to exercise the code paths. Overcommit means you never get an out of memory error. Your process just dies. Kind of like it would if you just aborted.
When you're running tests in development you can easily set up a test environment with and without overcommit enabled (and in fact there are other ways to limit allocated memory on a per-process basis that work even with overcommit, such as with rlimits).
...but that doesn't even really matter, because in order to fully exercise those code paths you need precise control over exactly which call to malloc() returns NULL, which in practice means you need to interpose malloc() with a debugging framework to force a NULL return.
(these kinds of) Crashes can be manipulated, though. If you can't restore a stable state, you have created instability. Failing gracefully in this context means failing without creating instability or insecurity.
The failure mode is the one where you terminate in the middle of an operation that was expected to be atomic or ordered and then the program restarts and does step two without finishing step one.
And operating system crashes, and being killed by the OOM handler (which is probably likely if you're out of heap), and all of the other things that can kill a process without warning.
Sadly it doesn't on Linux, due to overcommit. You can actually get the failure much later, when you try to use the memory. The exception is when you try to allocate absolutely huge buffers. So malloc returning a valid pointer doesn't indicate success (though it returning NULL does indicate failure).
Yes, but it becomes harder to tell where in the code that happened. If you run abort() or the like, you've got more information as to the circumstances of the crash.
It is quite possible to write to files, stdout, etc. without using any heap memory; it can be achieved with the bare-bones read/write syscalls. I'm not sure I would consider this a "best practice", but it's worth noting. Boehm GC actively uses read when it needs to read files without allocating any heap memory.
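For example, a minimal allocation-free message helper, assuming a POSIX system (write_str is a made-up name):

    #include <string.h>
    #include <unistd.h>

    /* Emit a message without going through stdio (which may allocate). */
    static void write_str(int fd, const char *s)
    {
        size_t len = strlen(s);
        while (len > 0) {
            ssize_t n = write(fd, s, len);
            if (n <= 0)
                break;                   /* not much sensible to do here */
            s += n;
            len -= (size_t)n;
        }
    }

    /* e.g. write_str(STDERR_FILENO, "out of memory\n"); */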
> ... no one knows how to reasonably test the additional code ...
Wouldn't a simple 'ulimit -m' set appropriate limits on memory for the process? You could then use that to simulate failures, hopefully see what breaks, fix it, and rinse-lather-repeat.
edit-1: this is of course predicated on the fact that you are using a Unix-like system.
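Something along those lines can work, though on Linux the resident-set limit behind 'ulimit -m' is generally not enforced; it's the address-space limit (RLIMIT_AS, i.e. 'ulimit -v') that actually makes malloc start failing. A sketch of setting it from inside a test harness (limit_address_space is a made-up helper):

    #include <sys/resource.h>

    /* Cap the process address space so malloc returns NULL instead of the
       machine swapping or the OOM killer stepping in. Roughly equivalent to
       running `ulimit -v 65536` (units are KB) in the shell first. */
    static int limit_address_space(rlim_t bytes)
    {
        struct rlimit rl = { .rlim_cur = bytes, .rlim_max = bytes };
        return setrlimit(RLIMIT_AS, &rl);
    }

    /* e.g. limit_address_space(64UL * 1024 * 1024);   // 64 MB */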
> the "worse is better" principle that correctness can be sacrified if it makes things too complicated
To be clear, in systems designed with the "worse is better" approach, responsibility for correctness can be pushed up the stack, from e.g. kernel programmers to user-space programmers, if doing the Right Thing lower in the stack is hard. The example given in the paper is a syscall being interrupted -- in unix, it just fails with EINTR, and it's the application author's responsibility to retry. (I agree that malloc returning null is similar.)
But worse is better isn't saying that programs shouldn't be correct if it's complicated; it's just a question of where the responsibility for correctly handling API edge cases should lie.
Moreover, if you read the "worse is better" paper carefully, you'll notice the author doesn't say that "worse is better" actually is better, overall. The argument is only that it has better "survival characteristics". Meaning, as I read it, that such systems can gain and maintain momentum against competitors because they can move faster.
Which, tangentially, is all very interesting to me, because the three modern systems with the best survival characteristics are unix, C, and the web, the latter of which is definitely not worse is better. (However bad you may think the web is as an API, "worse is better" is a design philosophy about giving users simple, low-level APIs, whereas the web is all about high-level APIs where the browser implements lots of logic to prevent you from shooting yourself in the foot.) So maybe we need to reconsider the conclusions of the worse is better paper in light of the giant counterexample that is the web.
I'd also add Java, C#, Python, ARM (especially ARM64) and Windows NT as examples of success of better-is-better systems. (Windows NT is somewhat less better-is-better though due to Win32 compatibility.)
Maybe Python and ARM. Java was a Worse is Better compromise trying to bring in features from Right Thing languages. It took lots of marketing to make it. C# was similar, with adoption mainly because Microsoft pushed it with tooling. They're both Worse is Better candidates. Just not as bad as some others.
Modula-3 or maybe Eiffel would be closer to Right Thing examples in that category.
You say that, but then, there are examples of terrible security bugs resulting from broken attempts at handling malloc failures, too.
The right answer is, I think:
* Don't add checking code at individual malloc call-sites.
* If you have an allocation regime where you need to do something better than abort in response to failure, don't use malloc directly for those allocations.
* Run your program with malloc rigged to blow up if it fails.
Actually, I'd say that if you're trying to protect against lazy programmers, checked exceptions are worse than unchecked, since they'll just add an empty try/catch around the call, whereas crashing the process would usually be a safer option.
they'll just add an empty try/catch around the call
In an ideal world, someone who did that should be taken out back and shot. In this real world, someone who did that should be immediately walked out the door.
It's astonishing to me just how lazy programmers can be.
However, they're scheduled for release on Monday the 1st of March - so we might be seeing a few more bugs in the next couple of days.
EDIT: I've spent all morning teasing a friend whose birthday is today about "what if it were tomorrow" (I'm sure they get it every year from every single person they meet), only to then come here and forget all about it in my tally. Go figure. Yes, the 1st of March is Tuesday. Sorry!
That it has its own printf implementation, along with many other not-exactly-crypto-related utilities, is feature creep that I always found distasteful about OpenSSL.
Am I the only one who doesn't understand why it's still considered "okay" to write C for security-dependent code these days? Processors have gotten orders of magnitude faster, speed is no longer the constraining resource, and certainly not one I would trade my security for...
Don't even need anything that drastic. Just switch the compiler to C++ mode and start using std::unique_ptr instead of malloc/free. Remember to turn off exceptions and RTTI to keep the spirit of C alive in your C++ code!
From reading the description, I get the feeling that this is nowhere near as severe as e.g. Heartbleed, right?
IMHO dynamic allocation is something that is often overused, and the fact that it could fail leads to another error path that might be exploitable, like this one. I don't think it's necessary for a printf-like function to ever use dynamic allocation. (If there are cases where it's necessary, I would love to know more... and perhaps suggest how it could be avoided.)
Another day, another memory violation... the endless supply of ammo dispenses another box of shells for Rust. I don't know if Rust is the future, but I am certain the dangling pointers, use after frees, buffer overflows, double frees, smashed stacks, ignored failures, uninitialized data and all the other horrors of contemporary C/C++ will one day be understood as intolerable.
These things can be reduced to nuisance level by discipline. The cost of obviating them by language design is not zero (code size; execution speed; random latencies), so it's arguable: different languages, different purposes.
> These things can be reduced to nuisance level by discipline.
Obviously not. We've been doing this for decades and are still making the same mistakes with costly effects. Humans are as much a part of technological systems as the language or platform; continuing to ignore that even educated, disciplined, skilled developers are prone to certain classes of errors -- errors that could be obviated by other components of the system -- is hubris. It is much more feasible to create tools that exclude certain classes of error than to create less fallible developers.
> (code size; execution speed; random latencies)
Rust is not GC'ed, has no inherent reason to be slower than C (as the toolchain matures), and I doubt is more verbose than C code that provides the same safety.
> A power saw is dangerous; but I'm not going back thanks.
And now we have power saws that immediately stop when they encounter flesh. Obviously, disciplined users never had to worry about that system, otherwise we would have had a bunch of people missing fingers... like my wood shop teacher in high school. :/
> And nothing to say about the performance cost of these infallible tools? They are not free.
People have brought this up a few times, but I've never gotten an explanation that makes sense. My understanding is that Rust does most of its protections in the compile stage, so while not free, they are a small up-front cost at the compile stage, plus an indeterminate cost at the development stage in possible more complex reasoning about the code, which then pays off over the lifetime of the application. While not free, that's definitely a trade-off that seems to make sense to me.
Or are you talking about some other cost? Can you explain where I'm not understanding your point?
"only works half the time" is a bit of a stretch. what's your OS kernel written in?
I agree we could do better than C. Everyone here talks about it every time this happens, but does nothing. I'll take battle-hardened C over written-last-week Rust, too. Especially crypto code.
My teams have written hundreds of thousands of lines of C++, with only occasional (once a year?) trouble with use-after-delete or double-delete. And there are tools to deal with that.
We've enjoyed orders of magnitude performance and latency improvement over scripted or GC'd languages.
It's certainly possible to write competent C++ with a professional team, and there are clear reasons for doing so.
C++ does provide the necessary features to write safe code, as long as one stays away from "C with C++ compiler" mentality and adopts those language and library features.
As for the tools, they are largely ignored by the majority of enterprise developers, assuming Herb Sutter's presentation at CppCon 2015 is in any way representative (1% of the audience said they were using them).
Which goes in line with my (now) outdated C++ professional experience up to 2006.
Only one company cared to pay for Insure++ (in 2000) and I was probably the only one that cared to have it installed.
Also except for my time at CERN, I never met many top elite C++ developers.
Most were using it as "C with C++ compiler" and in teams with high attrition, leading to situations where no one was 100% sure what the entire codebase was doing.
Aren't Cedar and Oberon the only languages in that list that are memory-safe (assuming no use of unsafe code)? They both achieved that distinction by adding a garbage collector, so you could generalize and say that nobody has learned about writing secure code since Lisp in 1958.
Also, I'm pretty sure that Oberon was quite a bit earlier than that.
To be fair, the needed feature to mitigate this specific vulnerability is "throws exception on malloc failure", which at least Ada does (Storage_Error).
Isn't it in practice a "this code wouldn't exist in the first place" situation? (I mean in the context of Rust and many others.) Either you'd have to go into an unsafe{} block, or the buffer would simply be a Vec<>, which oom()'s on allocation failure.
I mean theoretically the issue still exists, because you can run allocation yourself and get back null. But without unsafe this shouldn't be possible.
That's what I meant - if you go through actual allocators, then of course such a bug can exist. As you say, <<the needed feature to mitigate this specific vulnerability is "throws exception on malloc failure">>. But it seems to me like any language providing resizable buffers with bounds checking in its standard library is unlikely to get code like this in the first place. And that's a massive scope reduction.
The OpenSSL project committed the fix and description to their public repo 3 days ago (no clue if intentionally or not, but they didn't remove them). Not much to hide anymore.
1. https://en.m.wikipedia.org/wiki/Worse_is_better