With that in mind, it'd be handy to know which exploit techniques these steps break, and whether those steps are in the current "meta" game for exploit developers.
(The specific mitigation here: the kernel formerly locked system call invocation down to the libc.so area of program text in memory; libc.so is big, so now OpenBSD locks specific system calls down to their specified libc stubs; further, in static binaries, the same mechanism locks programs down to only those system calls used in the binary, which effectively disables all the system calls not explicitly invoked by the program text of a static binary).
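A conceptual sketch of what that pinning amounts to (illustrative only, not OpenBSD's actual kernel code; the names and table size here are made up):

#include <stdbool.h>
#include <stdint.h>

#define NPINS 512   /* made-up table size */

/* pin[n] == 0: syscall n was never referenced by the binary, so it is
 * forbidden outright; otherwise it is the only program-counter value
 * (the libc stub) from which syscall n may be entered. */
static uintptr_t pin[NPINS];

static bool
syscall_permitted(unsigned int sysno, uintptr_t caller_pc)
{
    if (sysno >= NPINS || pin[sysno] == 0)
        return false;                /* unpinned: deny */
    return pin[sysno] == caller_pc;  /* must come from its stub */
}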
Indeed, in CCC's "systematic evaluation of OpenBSD's mitigations"[0] the presenter explicitly calls out OpenBSD's tendency to present mitigations without specific examples of CVEs it defeats or exploit techniques the mitigations are known to defend against:
> Proper mitigations I think stem from proper design and threat modeling. Strong, reality-based statements like "this kills these vulnerabilities," or "this kills this CVE; it delays production of an exploit by one week." And also thorough testing by seasoned exploit writers. Anything else is relying on pure luck, superstition, and wishful thinking.
Some of OpenBSD's mitigations are excellent and robustly defensive; others are amorphous and not particularly useful.
> Proper mitigations I think stem from proper design and threat modeling. Strong, reality-based statements like "this kills these vulnerabilities," or "this kills this CVE; it delays production of an exploit by one week." And also thorough testing by seasoned exploit writers. Anything else is relying on pure luck, superstition, and wishful thinking.
The comment seems to imply that "proper design and threat modeling" must stem from real-world CVEs and proofs of concept. That seems to me like "if nobody heard it, the tree didn't fall" kind of thinking.
I'm sure OpenBSD developers have very good intuition on what could be used in a vulnerability, without having to write one themselves. And fortunately, they don't have a manager above them to whom they need to justify their billing hours.
>I'm sure OpenBSD developers have very good intuition on what could be used in a vulnerability, without having to write one themselves
Why? On average, programmers are not very good security engineers. And the opposite holds too: security engineers are often not good programmers. If your mitigation doesn't stop any CVE that's being exploited right now in the wild, it's an academic exercise and not particularly useful IMO.
>And fortunately, they don't have a manager above them to whom they need to justify their billing hours.
The point of the thread is that the mitigation cost right now may be low (the "billing hours"), but it's paid in perpetuity by everyone else downstream - in complexity, performance, unexpected bugs, etc. So having a manager or BDFL to evaluate the tradeoffs may be beneficial.
> If your mitigation doesn't stop any CVE that's being exploited right now in the wild, it's an academic exercise and not particularly useful IMO.
If your only metric of security is "fixed CVEs", then you're rewarding mistakes that were rectified later, and punishing a proactive approach to security that actually makes fewer CVEs appear in the first place.
And Theo's reputation and influence on security are evidence that what he does is more than just an "academic exercise".
E.g. he created OpenSSH.
> The point of the thread is that the mitigation cost right now may be low (the "billing hours"), but it's paid in perpetuity by everyone else downstream - in complexity, performance, unexpected bugs, etc.
While that may or may not be the pattern in general, it is not a rule, and especially doesn't apply in OpenBSD development. OpenBSD is widely regarded as one of the cleanest and most robust (free software) codebases ever.
You're mischaracterizing their logic. They're saying it's a necessary but not sufficient metric. You can't then shoot it down for being not-sufficient; we all agree about that.
It's not my recollection that Theo created OpenSSH, for what it's worth. My memory of this is that it was mostly Niels and Markus who did the lifting.
You might do some digging on Theo's reputation among exploit developers. It's complicated.
> They're saying it's a necessary but not sufficient metric.
Okay, then I'm saying it shouldn't be necessary either, for the sole reason that preventing a future CVE is not measurable, while fixing a CVE is. If you so much as pay attention to fixing existing real-world CVEs, you're implicitly focusing on that measurement, as you cannot predict the future. I argue that we would be better off not paying attention to them at all.
If anything, we should take the wide array of CVEs that were discovered in other systems and not applicable to OpenBSD as evidence that their intuition and proactive approach work well. The only real metric of the security of a system is the absolute number of CVEs over a long period of time, a metric on which OpenBSD shines.
>> I'm sure OpenBSD developers have very good intuition on what could be used in a vulnerability, without having to write one themselves
> Why?
Exactly, POCOGTFO! :)
But wouldn't providing such a proof-of-concept implementation immediately paint a bull's eye on all pre-current (and/or not appropriately syspatched) boxes in the wild?
They famously do not. That's OK, it's a trait shared by a lot of hardening developers on other platforms, too --- all of them are better at this than I'll ever be. But the gulf of practical know-how between OS developers and exploit developers has been for something like 2 decades now a continuing source of comedy. Search Twitter for "trapsled", or "RETGUARD", for instance.
> But the gulf of practical know-how between OS developers and exploit developers has been for something like 2 decades now a continuing source of comedy
Are you implying that OS developers are 2 decades behind exploit developers? If so, is there any proof of that claim, e.g. OpenBSD exploits?
Or are you implying that OS developers are 2 decades ahead of exploit developers? If so, how is that a bad thing?
Neither, I'm saying that for the past 2 decades, the conventional wisdom in the space has been that OS hardening efforts were some significant quantum of time behind exploit developers, but certainly not "2 decades" worth.
It's an aggregate sentiment, right? There are some mitigations that I think legitimately did set back exploit development, but on the whole I think the sentiment has been that OS hardening mitigations have been not just reactive, but reactive to exploit development that is some significant quantum of time behind the current state of the art.
By way of example, I think people made fun of the original OpenBSD system call mitigation stuff described at the beginning of this post. I have no idea what the consensus would be on this new iteration of the idea.
I'll bet the NSA is very happy about this situation and is doing everything they can to keep the gravy train rolling.
I thought the entire point of being a good security person was that you're able to anticipate and defend against attacks before they become known... Isn't that what "security mindset" is supposed to entail?
NSA doesn't care about your emailed vulnerability report. They're not spending their own money when they buy zero-day bug chains in platforms people actually use, and even if they were, those bug chains are so ludicrously cheap relative to their utility that any sigint (or law enforcement, for that matter) organization in the world, from Canada to El Salvador, can cheerfully afford them.
Even if your emailed report was a complete bug chain and not, like, an X-Frame-Options redressing issue, it would be harder, and probably more expensive, for NSA to pick the bug up from email than it would be for them to simply fill out a purchase order from one of their private partners.
As always it is helpful to remember as well that NSA's mission is to secure budget for NSA, full stop.
>As always it is helpful to remember as well that NSA's mission is to secure budget for NSA, full stop.
Sure, let's focus on an intelligence agency with budget constraints, Russia's GRU perhaps.
You claim that bug chains are "ludicrously cheap". Is cheap the same thing as abundant? If you had to guess, how many distinct zero-click exploit chains does the GRU have for e.g. an iPhone in lockdown mode? Order of magnitude: do they have 1? 10? 100? 1000?
Zerodium pays up to 2M for "Full Chain with Persistence" for iOS: https://www.zerodium.com/program.html I don't think a low price relative to utility lets us conclude that such exploits are abundant. There's asymmetrical information in this market: buyers don't know the quality/novelty of what sellers have discovered, and sellers don't know how badly buyers need what they have to sell. It seems plausible to me that a savvy seller could negotiate a significantly higher price, similar to how tech workers are often able to negotiate significantly higher compensation -- especially if they were somehow able to prove that they weren't just replicating an exploit the broker already had in their inventory. I also suspect there is significant buying power on the buyer side which keeps acquisition prices low (hard to play buyers against each other, given low number of buyers who coordinate with each other).
In any case, I think this is the wrong question in a certain sense. The right question is about the relative cost of buying exploits vs developing in-house. I don't see why picking up the bug from email is hard or expensive. If the GRU is already running a program like XKEYSCORE, which seems likely, it could just be a matter of adding a few filtering rules for emails that go to select security@ email addresses. Have a GRU engineer monitor those emails, and see if any proof-of-concept work in the email can be quickly integrated into existing malware, in order to attack a target considered too low-value for the GRU's crown jewel exploits.
The real question is about the salary of that GRU engineer vs the cost of purchasing exploits. If the GRU engineer gets paid $100K, and a fresh exploit costs $500K, employing the GRU engineer to harvest a few temporary, expendable exploits a year looks quite favorable. I don't think the price/utility ratio of exploits from brokers affects the decision, since that price/utility ratio argument also works for exploits harvested+developed in-house.
Neither of us really knows what's going on in intelligence agencies, but my story seems about as plausible as yours. Given that simply using a Google Form for bug disclosures would be an easy and dramatic improvement on the status quo, I'm left with the sense that there is a lot of dysfunctional cargo-culting going on in the security world.
OpenBSD doesn't even have hyperthreading? Why does anyone use this OS? The Linux developers put in a lot of effort to make hyperthreading actually work for their kernel rather than ignoring it.
There have been cases where OpenBSD's hypothetical mitigations have worked out well for the project. I recall a relatively recent DNS cache poisoning attack that OpenBSD was novel in pre-emptively mitigating because something (I think it was the port?) was "needlessly" random.
If a mitigation has negligible performance impact, and doesn't introduce a new attack vector, I can't imagine why it would be seen as a bad thing.
That's from four years ago and does not address these technical issues. Are you going to pull it out every time OpenBSD is mentioned? I think people understand that you don't like their approach, etc., and the flaws you see, and that OpenBSD isn't designed for your interests.
Is there a current meta for OpenBSD exploit developers?
What's the right way to go about hardening the system if there's no meta to observe?
My very naive take would be something like: A successful exploit depends on jumping through a number of different hoops. Each of those hoops has an estimated success probability associated with it. We can multiply all the individual probabilities together to get an estimated probability of successful exploit -- assuming that hoop probabilities are independent, which seems reasonable? The most efficient way to harden against exploits is to try and shrink whichever hoop possesses the greatest partial derivative of overall exploit success probability with respect to developer time.
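A toy version of that arithmetic (probabilities made up purely for illustration; comparing hoops per unit of developer time would additionally require estimating how much each hoop shrinks per hour of work):

#include <stdio.h>

int main(void) {
    /* Made-up hoop success probabilities, just to illustrate the arithmetic. */
    double p[] = { 0.25, 0.50, 0.90 };
    int n = sizeof(p) / sizeof(p[0]);

    double total = 1.0;
    for (int i = 0; i < n; i++)
        total *= p[i];
    printf("overall exploit success: %.4f\n", total);   /* 0.1125 */

    /* dP/dp_i = P / p_i: the hoop with the largest value is the most
     * leveraged one to shrink, all else (developer cost) being equal. */
    for (int i = 0; i < n; i++)
        printf("dP/dp_%d = %.4f\n", i, total / p[i]);
    return 0;
}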
The meta doesn’t exist because nobody targets OpenBSD, as it’s not used. People’s analysis of it is mostly just their educated guess as to how work for other platforms would carry over.
> The most efficient way to harden against exploits is to try and shrink whichever hoop possesses the greatest partial derivative of overall exploit success probability with respect to developer time.
Depending on your definition of efficient, adding more hoops should work exponentially better.
Suppose your hoop probabilities are 25% and that you have two hoops so that the probability of jumping through both is
25% * 25% = 6.25%.
You can reduce the size of one of the hoops in half, changing the probability to
25% * 25%/2 = 3.125%
You can also add a third hoop, in which case the probability is
25% * 25% * 25% = 1.5625%
1.5625% < 3.125%, so adding a third hoop is better than shrinking one of the two existing hoops. Of course, this argument makes important assumptions about the hoop probabilities.
The probabilities aren't independent. The person jumping through the first hoop is probably more able than average. Therefore, any additional hoop - if it doesn't require a completely orthogonal skill - is less selective.
I think it depends on what the "probability" is meant to indicate. You're correct if it's meant to indicate whether a particular attacker can get through a particular hoop. But probabilities could also refer to e.g. the chance that it's possible to get through a particular hoop, period. Or the fraction of some input space which corresponds to an exploitation.
Makes sense. Other key questions would be: complexity cost of added hoop (including, possibly, increased attack surface -- the sequence of hoops is just an abstraction that reality may not obey) and also creation difficulty (it could be that improving an existing hoop is significantly quicker than creating a new one).
Without a pre-formed opinion: does anybody have an intuition for the security benefits this provides? My first thought is that it’s primarily mitigating cases of attacker-introduced shellcode, which should already be pretty well covered by techniques like W^X. Code reuse techniques (ROP, JOP, etc.) aren’t impacted, right?
I would also think this would cause problems for JITed code, although maybe syscalls in JITed code aren’t common enough for this to be an issue (or the JIT gets around it by calling a syscall thunk, similar to how Go handled OpenBSD’s earlier syscall changes).
Unless I'm mistaken, this should restrict what you can do with ROP gadgets that contain syscalls. You will only be able to use the gadget with its intended arguments, since other syscall types will be disallowed.
> I would also think this would cause problems for JITed code
They can probably just jump into precompiled code that performs the needed syscall. Also, making syscalls directly from something like JITed JavaScript is generally avoided anyway. AFAIK browsers don't even let the processes that run JavaScript touch much of the system at all; instead they have to use an IPC mechanism to ask a slightly more privileged process to perform specific tasks.
> You will only be able to use the gadget with its intended arguments, since other syscall types will be disallowed.
That makes sense, although "intended" arguments here means still being able to invoke `execve(2)`, etc., right? The gadget will still be able to mangle whatever it likes into the arguments for that syscall; it just won't be able to mangle a `wait(2)` into an `execve(2)`, I think.
The other comment on this thread mentions that it also does something else:
>disables all the system calls not explicitly invoked by the program text of a static binary
This means that if the original library didn't have an execve call in it, you wouldn't be able to use it even with ROP. In short, this seems useful for blocking attackers from using syscalls that were not originally used by the program, and nothing else.
Sure, assuming your programs don't execute other programs. I don't know much about OpenBSD specifically, but spawning all over the place is the "norm" in terms of "Unix philosophy" program design.
(I agree with the point in the adjacent thread: it's hard to know what to make of security mitigations that aren't accompanied by a threat model and attacker profile!)
> assuming your programs don't execute other programs.
What about language runtimes? They don't execute other programs in the sense of ELF executables (although the programs they interpret might), but they have to support every syscall that's included in the language. So, for example, the Python interpreter would have to include the appropriate code for every syscall that Python byte code could call (in addition to whatever internal syscalls are used by the interpreter itself). That would be a pretty complete set of syscalls.
Yep, language runtimes are an (inevitably?) large attack surface. My understanding is that OpenBSD userspace processes can voluntarily limit their own syscall behavior with pledge[1], so a Python program (or the interpreter itself) could limit the scope of a particular process. But I have no idea how common that is.
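For reference, the shape of a pledge(2) call (OpenBSD-specific; the promise string and file path here are just illustrative):

#include <err.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* From here on, only stdio-ish and read-only filesystem syscalls are
     * allowed; the kernel kills the process if it strays outside them. */
    if (pledge("stdio rpath", NULL) == -1)
        err(1, "pledge");

    FILE *f = fopen("/etc/motd", "r");   /* still fine under "rpath" */
    if (f != NULL)
        fclose(f);

    /* An execve(2) or socket(2) attempted now would be fatal. */
    return 0;
}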
The syscall number goes in a register, but it does not have to appear literally right next to the `syscall` instruction in the binary. As TFA explains in the introduction, a syscall stub generally looks like
mov eax,0x5
syscall
However, it doesn’t have to: `syscall` will work as long as `eax` is set, no matter where or how it was set. You could load it from an array or a computation for all `syscall` cares.
So as an attacker, if you can get `eax` (and probably a few other registers) to values you control and then jump to the `syscall` instruction directly, you have arbitrary syscall capabilities.
The point of this change is that the loader now records exact syscall stubs as “address X performs syscall S”, then on context switch the kernel validates whether the syscall being performed matches what was recorded by the loader, and if not it aborts (I assume; I didn’t actually check).
This means as long as your go binary uses a normal syscall stub it’ll be recognised by the loader and whitelisted, but if say a JIT constructs syscalls dynamically (instead of bouncing through libc or whatever) that will be rejected because the loader won’t have that (address, number) recorded.
One thing to note is that system calls can no longer be made from the program's .text section; only from within libc. This is highly important because of ASLR: in order to ROP into a syscall, an attacker must now know where libc is located in the virtual address space. Before this mitigation, an attacker that only knew the address of the program binary could search for a sequence of bytes within the .text section that happened to decode to a syscall instruction, and use that for ROP (code reuse techniques can often access a lot of unexpected instructions by jumping into the middle of a multibyte instruction, due to x86's complex and variable-length instruction encoding).
ld.so can do so while initially linking the application, pre-main. Then, per the post:
> 4) in libc.so text, and ld.so tells the kernel where that is
> The first 3 cases were configured entirely by the kernel, the 4th case used a new msyscall(2) system call to identify the correct libc.so region.
ld.so passes its ability to make syscalls to libc.so. The application has to call into libc.so in order to perform any IO.
Ah, makes sense, thanks. Libc can sanitize all the inputs, and as long as ld.so has a hardwired path to libc all is well. This way you don’t even need a facility to tell the kernel “this binary is allowed to make system calls”.
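Roughly the registration being described, as a sketch (not ld.so's actual code; the msyscall(2) prototype here is my assumption from the manual's description, and the helper name is hypothetical):

#include <stddef.h>

/* Assumed prototype; msyscall(2) may only be called once per process,
 * by ld.so itself. */
int msyscall(void *addr, size_t len);

/* Hypothetical helper: after mapping libc.so, ld.so tells the kernel
 * which text region is allowed to issue system calls. */
static void
register_libc_syscall_region(void *libc_text, size_t libc_text_len)
{
    if (msyscall(libc_text, libc_text_len) == -1) {
        /* registration failed; ld.so would bail out here */
    }
}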
Have you managed to trigger this? You never ended up explaining how the heap overflow occurs, and I cannot determine whether the other person who was guessing how it might happen is right, because I am not very familiar with OpenBSD's code.
One of the more under-appreciated features of Rust is that it traps on integer overflow/underflow by default (in debug builds; release builds wrap unless overflow checks are enabled).
It’s too bad OpenBSD doesn’t have a good Rust story for the core system. (I understand their reasoning, but it’s still too bad.)
I wonder how hard it would be to backport the integer behavior to C. (C++ would be easy: Use templates.) Perhaps they could add a compiler directive that causes vanilla integer overflow/wraparound to trap, then add annotations or a special library call that supports modulo arithmetic as expected.
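Something in that direction already exists as compiler extensions. A sketch of what the "special library call" half could look like, using the GCC/Clang checked-arithmetic builtins (the trapping half would be a flag such as -ftrapv or -fsanitize=signed-integer-overflow, which only covers signed overflow):

#include <limits.h>
#include <stdio.h>

int main(void) {
    int a = INT_MAX, b = 1, sum;

    /* Explicit checked addition: returns true on overflow instead of
     * silently wrapping or invoking undefined behavior. */
    if (__builtin_add_overflow(a, b, &sum))
        puts("overflow caught, handle it explicitly");
    else
        printf("sum = %d\n", sum);

    /* With -ftrapv (or UBSan), an unannotated `a + b` here would trap
     * at runtime rather than wrap. */
    return 0;
}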
Look, I get as much as the next guy that Rust brings a lot of niceties. But what we are talking about here is a project that clocks in at just over 19,000,000 lines of C (`wc -l $(find src -name '*.c' -or -name '*.h')`), code heritage going back more than 40 years, an explicit commitment to a very rich set of platforms [1], and a very limited amount of manpower compared to projects such as Linux with their insane level of corporate backing. "Just rewrite it in Rust and/or integrate Rust" is neither easy nor safe in that it overturns everything that is already there and tested.
As for improving upon C: I cannot speak for OpenBSD as a project, but I am sure that there would be ample excitement about producing a minimal, solid C compiler with experimental security features to then serve as the default compiler for the project (heck, OpenBSD already ships with a number of less common security-related compiler flags, from what I recall). Sadly, I doubt there is either the funding or the hands to make that a reality.
Implying that OpenBSD has not considered Rust? There is plenty of discussion on misc@ already (for example [1]) and while it certainly can be "ranty", I am sure if one reads it in good faith you get a fairly nuanced picture.
Look, I think a lot of programmers and managers fail to understand the number of factors one needs to consider and how it scales with the complexity of your codebase and what it interacts with. If you want to rewrite your video processing service which you wrote together with five or so contributors and is say 20,000 lines of C++ into Rust, that is one thing. Take a step back and consider the number of users and what you interact with, it all seems rather manageable and you can probably be backwards compatible with respect to your users. In terms of time, maybe a few months? Maybe even six? For a single programmer that now needs to learn proper Rust.
Now, instead consider what an operating system is and the surface with which it interacts. The absolute metric ton of hardware, the heap of standards, the massive load of hacks that are documented and undocumented, the large number of users, the number of contributors and their experience, the platforms that you support, all the software that is written for your operating system to make it useful, etc. OpenBSD is famous (infamous?) for being willing to break things to do "the right thing". But they are also famous (infamous, again?) for being very conservative, which I think is understandable given their security focus and (relatively) low amount of manpower. Rust has certainly been considered, but it is far from the only consideration. This is akin to walking into a multi-million dollar company that has a fully functioning service or piece of software that they have been selling for decades and suggesting to management that maybe we should start moving it all from C# to Rust next week, while being blissfully unaware of everything else the company is beholden to. Spoiler: it will not work out that way.
Furthermore, it is somewhat tiring that Rust keeps being touted as the final revelation when it comes to writing safer code. Guess what, there were plenty of projects before Rust that introduced safety in various forms (and at various costs), and there will be plenty of projects after Rust that will do the same. Yes, it is an amazing piece of technology, but it is equally plausible that its influence may not end up in it eating the world; rather, it may give birth to something else or bring some of its thinking into other languages. Only time will tell, and worse may end up being better yet again.
So to me, what any operating system (security focused or not) should do is consider its (limited) options given its own goals and situation. And I have seen plenty of evidence that a project of OpenBSD's age is doing exactly that, successfully.
Personally, I am keeping an eye on Redox and look forward to seeing what lessons will be learned from their take on what an operating system can be. But for my servers and desktop I continue (for now?) to run a healthy mix of Linux and BSD while getting work done.
> I am sure if one reads it in good faith you get a fairly nuanced picture.
I don't see a lot of nuance. The primary argument in that thread boils down to: any programmer who isn't using C isn't a serious programmer. The dominant technical analyses are a) adding more compiler toolchains makes builds take a lot longer and b) memory safety doesn't protect against everything, so why bother? That's not nuance.
Nuance would be pointing out that Rust provides benefits beyond just memory safety: that Rust tends to push you towards a parse-don't-validate mentality, or maybe Rust's greater emphasis on handling errors correctly [1]. But then again, Rust isn't necessary to do that, so you could also analyze whether existing software development practices are sufficient to bring those into play.
You can also discuss Rust's failures. Rust, after all, didn't get uninitialized memory right. There's also some uneasiness about the details of the borrow checker to point to, or you can throw in some remarks about &mut-is-noalias and how difficult it is to actually ensure that you never create two &mut to the same location in unsafe code.
Nuance might also discuss how safety features do and don't percolate in mixed-programming-language environments, or how a rewrite may or may not improve security. There's definitely items on both sides of the cost-benefit ledger there!
But that's not what we got. The thread starts with an (incorrect [2]) gatekeeping moment of "it's not serious if you don't rewrite coreutils in it," and mostly devolves into a general theme of "anyone who needs their programming language to provide safety wheels is a terrible programmer who shouldn't be allowed anywhere near systems programming". The irony of that viewpoint coming from an OS well known for its love of just-in-case security precautions is not lost on me.
[1] I was recently writing something parsing diffs, and as part of that, I was making sure that the arithmetic on line numbers didn't overflow. Integer overflow (whether signed or unsigned) is frequently ignored as an error source in most programming languages!
[2] At the time the post was written, people were in fact working on doing a rewrite of this stuff in Rust.
Thank you, that is an interesting response to read, but what I was trying to communicate with that sentence was that if you read that thread and others in good faith you can get a fairly nuanced picture, not that the thread is nuanced in and of itself. I am certainly not implying that you get a nuanced picture from simply reading a mailing list which, in the very same sentence that you quoted parts of, I described as being littered with rants.
As for the linked thread itself, I just ignored the "Real Programmers Don't Use Pascal" parts (it gets old really fast), and what I arrived at is largely reflected in what I already wrote in the parent and ancestors.
> You can also discuss Rust's failures. Rust, after all, didn't get uninitialized memory right.
I think that's a success story, because while Rust 1.0 shipped with the hopelessly broken std::mem::uninitialized, Rust 1.36 shipped with std::mem::MaybeUninit, which fixed the situation. (It's true that they can't definitively remove the broken API, but it can be aggressively linted against nonetheless.)
That is, it's a flaw present in the first version of the language, but a success of the development process and language evolution, achieved without breaking backwards compatibility.
IMHO the biggest blocker for Rust inside OpenBSD is that the project already has killed C developers and moving (some stuff) to Rust will need time and effort that could be put into other problems.
If they had to stabilize some parts, probably the answer would be different.
> IMHO the biggest blocker for Rust inside OpenBSD is that the project already has killed C developers and moving (some stuff) to Rust will need time and effort that could be put into other problems.
I'm all for dropping C in favor of Rust for new projects, but killing C developers is going a bit too far, don't you think?
There are compiler flags that OpenBSD could be using but aren't, which would have caught this bug without needing to convert the codebase to Rust. Using -Wconversion would have warned on the mismatched signedness of the MAX macro argument (the unsigned integer, sysno) and its result (being assigned to npins, a signed integer). Alternatively, adding -fsanitize=implicit-integer-sign-change, or a UBSan flag that includes this, would detect this at runtime for the actual range of values that end up causing a change of sign.
Though, these would also be triggered by statements like:
pins[SYS_kbind] = -1;
because the pins array is of unsigned int; all code of this sort would need to be fixed too.
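A minimal repro of the diagnostics being described (the exact wording and flag support vary by compiler; something like `cc -Wconversion -fsanitize=implicit-integer-sign-change -c demo.c`, where the sanitizer flag is Clang's):

/* demo.c */
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define SYS_kbind 86

unsigned int pins[SYS_kbind + 1];

int demo(int npins, unsigned int sysno)
{
    /* -Wconversion: the MAX result is unsigned and may not fit back
     * into the signed npins. */
    npins = MAX(npins, sysno);

    /* Also flagged: assigning -1 to an element of an unsigned array. */
    pins[SYS_kbind] = -1;

    return npins;
}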
In this case more important than any runtime overflow/underflow checks is the fact that the compiler will check that comparison operands are of the same type instead of inserting an implicit cast. Instead the programmer is forced to insert an explicit conversion like .try_into().unwrap(), which clearly suggests the possibility of an error. And if the error isn't handled, it will panic.
Then report the bug like an adult instead of acting like you found some huge published vulnerability in code that was posted for peer review. That's why the code is there, so others can identify issues; congrats, you did.
I don't know how they define `MAX`, but I'm guessing it's a typical "a>b?a:b". In function `elf_read_pintable` the `npins` is defined as signed int and `sysno` as unsigned int.
So this comparison will be unsigned and will allow `npins` to be set to any value, even a negative one:
npins = MAX(npins, syscalls[i].sysno)
Then `SYS_kbind` seems to be a signed int. So this comparison will be signed and "fix" the negative `npins` to `SYS_kbind`:
npins = MAX(npins, SYS_kbind)
And finally the `sysno` index might be out of bounds here:
pins[syscalls[i].sysno] = syscalls[i].offset
But maybe I'm completely wrong, I'm not interested in researching it too much.
I believe your whole analysis is correct, that running an elf file with an openbsd.syscalls entry with .sysno > INT_MAX will allow an out-of-bounds write.
Pure decimal integer literals (like 86) are typed as "int" in C, rather than being typeless and triggering type inference. This is a pain when you accidentally write something like this:
uint64_t n = 1 << 32;
On modern desktop platforms, an int is 32 bits, so 1 << 32 overflows (it's undefined behavior rather than 2^32), even though a 64-bit integer is wide enough to hold that value.
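The usual fix is to force the shift to happen at 64 bits, e.g.:

#include <stdint.h>

uint64_t shift_demo(void) {
    /* uint64_t n = 1 << 32; */          /* int-width shift: undefined, not 2^32 */
    uint64_t n = UINT64_C(1) << 32;      /* 64-bit shift: 4294967296 */
    return n;
}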
Regardless, it's not relevant here, because if an integer and an unsigned integer of the same size are compared the integer is implicitly cast to unsigned integer, and 86 is fine for both signed and unsigned integers (so "MAX(npins, SYS_kbind)" is safe).
> Then `SYS_kbind` seems to be a signed int. So this comparison will be signed and "fix" the negative `npins` to `SYS_kbind`:
npins = MAX(npins, SYS_kbind)
No, the comparison is unsigned here. They're integers of the same "conversion rank", so the unsigned type wins and the signed integer is interpreted as unsigned.
This isn't correct, as npins is a signed int and SYS_kbind is just a macro for the integer 86 (as defined in sys/syscall.h). So it will be a signed comparison between the value of npins and 86.
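A quick check of both comparisons, under the MAX definition guessed above (assuming a 32-bit int):

#include <stdio.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define SYS_kbind 86

int main(void) {
    int npins = 0;
    unsigned int sysno = 0xffffffffu;   /* attacker-supplied .sysno */

    /* Unsigned comparison: npins comes back as 0xffffffff, i.e. -1. */
    npins = MAX(npins, sysno);
    printf("after first MAX:  %d\n", npins);   /* -1 */

    /* Signed comparison: the negative npins gets "fixed" up to 86. */
    npins = MAX(npins, SYS_kbind);
    printf("after second MAX: %d\n", npins);   /* 86 */

    /* ...while the later pins[sysno] indexing still uses the huge sysno. */
    return 0;
}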
So first of all we calculate the number of syscalls in the pin section [1], allocate some memory for it [2] and read it in [3].
At [4], we want to figure out how big to make our pin array, so we loop over all of the syscall entries and record the largest we've seen so far [5]. (Note: the use of `MAX` here is fine since `sysno` is unsigned -- see near the top of the function).
With the maximum `sysno` found, we then crucially go on to clamp the value to `SYS_kbind` [6] and +1 at [7].
This clamped maximum value is used for the array allocation at [8].
We now loop through the syscall list again, but now take the unclamped `sysno` as the index into the array to read at [9] and write at [10] and [11]. This is essentially the vulnerability right here.
Through heap grooming, there's a good chance you could arrange for a useful structure to be placed within range of the write at [11] -- and `offset` is essentially an arbitrary value you can write. So it looks like it would be relatively easy to exploit.
Re-reading this, my analysis is slightly incorrect: the `MAX` at [5] with an unsigned arg means we can make `npins` an arbitrary `int` using the loop at [4].
Choosing to make `npins` negative using that loop means we'll end up allocating an array of 87 (`SYS_kbind + 1`) `int`s at [8] and continue with the OOB accesses described.
You'd set up your `pinsyscall` entries like this:
struct pinsyscall entries[] = {
    { .sysno = 0x1111, .offset = 0xdeadbeef }, /* first oob write */
    { .sysno = 0x2222, .offset = 0xf000f000 }, /* second oob write */
    { .sysno = 0xffffffff }                    /* sets npins to 0xffffffff so we under-allocate */
};
`npins` would be `0xffffffff` after the loop and then the `MAX` at [6] would then return `86`, since `MAX(-1, 86) == 86`.
Just to handle the case where the same syscall number is specified twice by the ELF header: in that case, the entry is set to -1 (presumably meaning it’s invalid).
Now when we come to 3, we'll find `pin[syscalls[2].sysno] != 0` since `syscalls[2].sysno == syscalls[0].sysno` - so we set `pin[1] = -1` instead of `0x9abc`.
Oh, thanks, now I understand why there is an if in the for loop! But I still can't see how pin[] could be accessed out of bounds, since the array is allocated to be large enough to hold the largest value of .sysno occurring in the entries[] array.
Right, so this is the crux of the vulnerability. Firstly, note that the `MAX` macro is defined as:
#define MAX(a, b) ((a) > (b) ? (a) : (b))
This is important because it doesn't cast either arg to any particular type. You can use it with `float`s, or `int`s, or `u_int`s... or a combination.
Referring back to the implementation, the first use of `MAX` (inside the `for` loop) is this:
...
for (i = 0; i < nsyscalls; i++)
    npins = MAX(npins, syscalls[i].sysno);
...
`npins` is an `int`, but `sysno` is a `u_int`. C integer promotion rules mean that we'll actually be implicitly converting `npins` to a `u_int` here; it's as if we did this:
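In rough terms (my paraphrase of the promoted expression, not the actual kernel source):

...
for (i = 0; i < nsyscalls; i++)
    npins = (int)(((u_int)npins > syscalls[i].sysno)
        ? (u_int)npins : syscalls[i].sysno);
...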
This means that `npins` can end up as any value we like -- even up to `0xffffffff`. But remember that `npins` is _actually_ a signed int, so once it comes out of `MAX`, it'll be signed again. Thus we can use this to make `npins` negative.
Once we're out of the loop, `MAX` is used again here:
...
npins = MAX(npins, SYS_kbind);
...
Where `SYS_kbind` is just:
...
#define SYS_kbind 86
...
Integer literals in C are signed, so now this use of `MAX` is actually dealing with two signed integers. If we used the loop to make `npins` negative (as described just before) then this line will now take 86 as the maximum of the two values.
With `npins = 86`, an array of 86+1 will be allocated, but the `syscalls[i].sysno` in the next loop could of course easily be greater than 86 -- thus leading to out-of-bounds array access.
So then it depends on whether whatever code loaded the ELF section does any validation of the data it reads. I can't help thinking the whole thing could use some structs and/or "getter setters", to talk dirty object-oriented speak. It needn't be that, though; it could be as low level as some macros that help you do the right thing with signedness and such. But my main impression is that a lot of the data structure semantics is kept in the heads of programmers instead of being formalised in the code.
In C there is always the opportunity to run with scissors in the middle of the road, but you can do a lot to protect yourself too, without losing much, if any, performance.
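For instance, something as small as a bounds-checked setter would formalise those rules in one place (hypothetical names, just to sketch the idea):

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrapper: the only code path that writes the pin table,
 * so the bounds and signedness rules live here rather than in heads. */
struct pintable {
    unsigned int *pins;
    size_t npins;
};

static bool
pintable_set(struct pintable *t, unsigned int sysno, unsigned int offset)
{
    if (sysno >= t->npins)
        return false;   /* reject out-of-range entries up front */
    t->pins[sysno] = offset;
    return true;
}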
> Two remote public holes in the default install, as in “unauthenticated remote code execution”. How many local privilege escalation? How many remote/local denial of service? How many remote code execution in software not present in the non-default install? How many private ones that were never disclosed? Other operating systems didn’t have many unauthenticated RCE either
https://twitter.com/halvarflake/status/1156815950873804800