With that in mind, it'd be handy to know which exploit techniques these steps break, and whether those steps are in the current "meta" game for exploit developers.
(The specific mitigation here: the kernel formerly locked system call invocation down to the libc.so area of program text in memory; libc.so is big, so now OpenBSD locks specific system calls down to their specified libc stubs; further, in static binaries, the same mechanism locks programs down to only those system calls used in the binary, which effectively disables all the system calls not explicitly invoked by the program text of a static binary).
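A conceptual sketch of what that pinning amounts to (illustrative only, not OpenBSD's actual kernel code; the names and table size here are made up):

#include <stdbool.h>
#include <stdint.h>

#define NPINS 512   /* made-up table size */

/* pin[n] == 0: syscall n was never referenced by the binary, so it is
 * forbidden outright; otherwise it is the only program-counter value
 * (the libc stub) from which syscall n may be entered. */
static uintptr_t pin[NPINS];

static bool
syscall_permitted(unsigned int sysno, uintptr_t caller_pc)
{
    if (sysno >= NPINS || pin[sysno] == 0)
        return false;                /* unpinned: deny */
    return pin[sysno] == caller_pc;  /* must come from its stub */
}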
Indeed, in CCC's "systematic evaluation of OpenBSD's mitigations"[0] the presenter explicitly calls out OpenBSD's tendency to present mitigations without specific examples of CVEs it defeats or exploit techniques the mitigations are known to defend against:
> Proper mitigations I think stem from proper design and threat modeling. Strong, reality-based statements like "this kills these vulnerabilities," or "this kills this CVE; it delays production of an exploit by one week." And also thorough testing by seasoned exploit writers. Anything else is relying on pure luck, superstition, and wishful thinking.
Some of OpenBSD's mitigations are excellent and robustly defensive; others are amorphous and not particularly useful.
> Proper mitigations I think stem from proper design and threat modeling. Strong, reality-based statements like "this kills these vulnerabilities," or "this kills this CVE; it delays production of an exploit by one week." And also thorough testing by seasoned exploit writers. Anything else is relying on pure luck, superstition, and wishful thinking.
The comment seems to imply that "proper design and threat modeling" must stem from real-world CVEs and proofs of concept. That seems to me like "if nobody heard it, the tree didn't fall" kind of thinking.
I'm sure OpenBSD developers have very good intuition on what could be used in a vulnerability, without having to write one themselves. And fortunately, they don't have a manager above them to whom they need to justify their billing hours.
>I'm sure OpenBSD developers have very good intuition on what could be used in a vulnerability, without having to write one themselves
Why? On average, programmers are not very good security engineers. And the opposite holds too: security engineers are often not good programmers. If your mitigation doesn't stop any CVE that's being exploited right now in the wild, it's an academic exercise and not particularly useful IMO.
>And fortunately, they don't have a manager above them to whom they need to justify their billing hours.
The point of the thread is that the mitigation cost right now may be low (the "billing hours"), but it's paid in perpetuity by everyone else downstream - in complexity, performance, unexpected bugs, etc. So having a manager or BDFL to evaluate the tradeoffs may be beneficial.
> If your mitigation doesn't stop any CVE that's being exploited right now in the wild, it's an academic exercise and not particularly useful IMO.
If your only metric of security is "fixed CVEs", then you're rewarding mistakes that were rectified later, and punishing a proactive approach to security that actually makes fewer CVEs appear in the first place.
And Theo's reputation and influence on security are evidence that what he does is more than just an "academic exercise".
E.g. he created OpenSSH.
> The point of the thread is that the mitigation cost right now may be low (the "billing hours"), but it's paid in perpetuity by everyone else downstream - in complexity, performance, unexpected bugs, etc.
While that may or may not be the pattern in general, it is not a rule, and especially doesn't apply in OpenBSD development. OpenBSD is widely regarded as one of the cleanest and most robust (free software) codebases ever.
You're mischaracterizing their logic. They're saying it's a necessary but not sufficient metric. You can't then shoot it down for being not-sufficient; we all agree about that.
It's not my recollection that Theo created OpenSSH, for what it's worth. My memory of this is that it was mostly Niels and Markus who did the lifting.
You might do some digging on Theo's reputation among exploit developers. It's complicated.
> They're saying it's a necessary but not sufficient metric.
Okay, then I'm saying it shouldn't be necessary either, for the sole reason that preventing a future CVE is not measurable, while fixing a CVE is. If you so much as pay attention to fixing existing real-world CVEs, you're implicitly focusing on that measurement, as you cannot predict the future. I argue that we would be better off not paying attention to them at all.
If anything, we should take the wide array of CVEs that were discovered in other systems and not applicable to OpenBSD as evidence that their intuition and proactive approach work well. The only real metric of the security of a system is the absolute number of CVEs over a long period of time, a metric on which OpenBSD shines.
>> I'm sure OpenBSD developers have very good intuition on what could be used in a vulnerability, without having to write one themselves
> Why?
Exactly, POCOGTFO! :)
But wouldn't providing such a proof-of-concept implementation immediately paint a bull's eye on all pre-current (and/or not appropriately syspatched) boxes in the wild?
They famously do not. That's OK, it's a trait shared by a lot of hardening developers on other platforms, too --- all of them are better at this than I'll ever be. But the gulf of practical know-how between OS developers and exploit developers has been for something like 2 decades now a continuing source of comedy. Search Twitter for "trapsled", or "RETGUARD", for instance.
> But the gulf of practical know-how between OS developers and exploit developers has been for something like 2 decades now a continuing source of comedy
Are you implying that OS developers are 2 decades behind exploit developers? If so, is there any proof of that claim, e.g. OpenBSD exploits?
Or are you implying that OS developers are 2 decades ahead of exploit developers? If so, how is that a bad thing?
Neither, I'm saying that for the past 2 decades, the conventional wisdom in the space has been that OS hardening efforts were some significant quantum of time behind exploit developers, but certainly not "2 decades" worth.
It's an aggregate sentiment, right? There are some mitigations that I think legitimately did set back exploit development, but on the whole I think the sentiment has been that OS hardening mitigations have been not just reactive, but reactive to exploit development that is some significant quantum of time behind the current state of the art.
By way of example, I think people made fun of the original OpenBSD system call mitigation stuff described at the beginning of this post. I have no idea what the consensus would be on this new iteration of the idea.
I'll bet the NSA is very happy about this situation and is doing everything they can to keep the gravy train rolling.
I thought the entire point of being a good security person was that you're able to anticipate and defend against attacks before they become known... Isn't that what "security mindset" is supposed to entail?
NSA doesn't care about your emailed vulnerability report. They're not spending their own money when they buy zero-day bug chains in platforms people actually use, and even if they were, those bug chains are so ludicrously cheap relative to their utility that any sigint (or law enforcement, for that matter) organization in the world, from Canada to El Salvador, can cheerfully afford them.
Even if your emailed report was a complete bug chain and not, like, an X-Frame-Options redressing issue, it would be harder, and probably more expensive, for NSA to pick the bug up from email than it would be for them to simply fill out a purchase order from one of their private partners.
As always it is helpful to remember as well that NSA's mission is to secure budget for NSA, full stop.
>As always it is helpful to remember as well that NSA's mission is to secure budget for NSA, full stop.
Sure, let's focus on an intelligence agency with budget constraints, Russia's GRU perhaps.
You claim that bug chains are "ludicrously cheap". Is cheap the same thing as abundant? If you had to guess, how many distinct zero-click exploit chains does the GRU have for e.g. an iPhone in lockdown mode? Order of magnitude: do they have 1? 10? 100? 1000?
Zerodium pays up to 2M for "Full Chain with Persistence" for iOS: https://www.zerodium.com/program.html I don't think a low price relative to utility lets us conclude that such exploits are abundant. There's asymmetrical information in this market: buyers don't know the quality/novelty of what sellers have discovered, and sellers don't know how badly buyers need what they have to sell. It seems plausible to me that a savvy seller could negotiate a significantly higher price, similar to how tech workers are often able to negotiate significantly higher compensation -- especially if they were somehow able to prove that they weren't just replicating an exploit the broker already had in their inventory. I also suspect there is significant buying power on the buyer side which keeps acquisition prices low (hard to play buyers against each other, given low number of buyers who coordinate with each other).
In any case, I think this is the wrong question in a certain sense. The right question is about the relative cost of buying exploits vs developing in-house. I don't see why picking up the bug from email is hard or expensive. If the GRU is already running a program like XKEYSCORE, which seems likely, it could just be a matter of adding a few filtering rules for emails that go to select security@ email addresses. Have a GRU engineer monitor those emails, and see if any proof-of-concept work in the email can be quickly integrated into existing malware, in order to attack a target considered too low-value for the GRU's crown jewel exploits.
The real question is about the salary of that GRU engineer vs the cost of purchasing exploits. If the GRU engineer gets paid $100K, and a fresh exploit costs $500K, employing the GRU engineer to harvest a few temporary, expendable exploits a year looks quite favorable. I don't think the price/utility ratio of exploits from brokers affects the decision, since that price/utility ratio argument also works for exploits harvested+developed in-house.
Neither of us really knows what's going on in intelligence agencies, but my story seems about as plausible as yours. Given that simply using a Google Form for bug disclosures would be an easy and dramatic improvement on the status quo, I'm left with the sense that there is a lot of dysfunctional cargo-culting going on in the security world.
OpenBSD doesn't even have hyperthreading? Why does anyone use this OS? The Linux developers put in a lot of effort to make hyperthreading actually work for their kernel rather than ignoring it.
There have been cases where OpenBSD's hypothetical mitigations have worked out well for the project. I recall a relatively recent DNS cache poisoning attack that OpenBSD was novel in pre-emptively mitigating because something (I think it was the port?) was "needlessly" random.
If a mitigation has negligible performance impact, and doesn't introduce a new attack vector, I can't imagine why it would be seen as a bad thing.
That's from four years ago and does not address these technical issues. Are you going to pull it out every time OpenBSD is mentioned? I think people understand that you don't like their approach, etc., and the flaws you see, and that OpenBSD isn't designed for your interests.
Is there a current meta for OpenBSD exploit developers?
What's the right way to go about hardening the system if there's no meta to observe?
My very naive take would be something like: A successful exploit depends on jumping through a number of different hoops. Each of those hoops has an estimated success probability associated with it. We can multiply all the individual probabilities together to get an estimated probability of successful exploit -- assuming that hoop probabilities are independent, which seems reasonable? The most efficient way to harden against exploits is to try and shrink whichever hoop possesses the greatest partial derivative of overall exploit success probability with respect to developer time.
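A toy version of that arithmetic (probabilities made up purely for illustration; comparing hoops per unit of developer time would additionally require estimating how much each hoop shrinks per hour of work):

#include <stdio.h>

int main(void) {
    /* Made-up hoop success probabilities, just to illustrate the arithmetic. */
    double p[] = { 0.25, 0.50, 0.90 };
    int n = sizeof(p) / sizeof(p[0]);

    double total = 1.0;
    for (int i = 0; i < n; i++)
        total *= p[i];
    printf("overall exploit success: %.4f\n", total);   /* 0.1125 */

    /* dP/dp_i = P / p_i: the hoop with the largest value is the most
     * leveraged one to shrink, all else (developer cost) being equal. */
    for (int i = 0; i < n; i++)
        printf("dP/dp_%d = %.4f\n", i, total / p[i]);
    return 0;
}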
The meta doesn’t exist because nobody targets OpenBSD, as it’s not used. People’s analysis of it is mostly just their educated guess as to how work for other platforms would carry over.
> The most efficient way to harden against exploits is to try and shrink whichever hoop possesses the greatest partial derivative of overall exploit success probability with respect to developer time.
Depending on your definition of efficient, adding more hoops should work exponentially better.
Suppose your hoop probabilities are 25% and that you have two hoops so that the probability of jumping through both is
25% * 25% = 6.25%.
You can reduce the size of one of the hoops in half, changing the probability to
25% * 25%/2 = 3.125%
You can also add a third hoop, in which case the probability is
25% * 25% * 25% = 1.5625%
1.5625% < 3.125%, so adding a third hoop is better than shrinking one of the two existing hoops. Of course, this argument makes important assumptions about the hoop probabilities.
The probabilities aren't independent. The person jumping through the first hoop is probably more able than average. Therefore, any additional hoop - if it doesn't require a completely orthogonal skill - is less selective.
I think it depends on what the "probability" is meant to indicate. You're correct if it's meant to indicate whether a particular attacker can get through a particular hoop. But probabilities could also refer to e.g. the chance that it's possible to get through a particular hoop, period. Or the fraction of some input space which corresponds to an exploitation.
Makes sense. Other key questions would be: complexity cost of added hoop (including, possibly, increased attack surface -- the sequence of hoops is just an abstraction that reality may not obey) and also creation difficulty (it could be that improving an existing hoop is significantly quicker than creating a new one).
Without a pre-formed opinion: does anybody have an intuition for the security benefits this provides? My first thought is that it’s primarily mitigating cases of attacker-introduced shellcode, which should already be pretty well covered by techniques like W^X. Code reuse techniques (ROP, JOP, etc.) aren’t impacted, right?
I would also think this would cause problems for JITed code, although maybe syscalls in JITed code aren’t common enough for this to be an issue (or the JIT gets around it by calling a syscall thunk, similar to how Go handled OpenBSD’s earlier syscall changes).
Unless I'm mistaken, this should restrict what you can do with ROP gadgets that contain syscalls. You will only be able to use the gadget with its intended arguments, since other syscall types will be disallowed.
> I would also think this would cause problems for JITed code
They can probably just jump into precompiled code that performs the needed syscall. Also, making syscalls directly from something like JITed JavaScript is generally avoided anyway. AFAIK browsers don't even let the processes that run JavaScript touch much of the system at all; instead they have to use an IPC mechanism to ask a slightly more privileged process to perform specific tasks.
> You will only be able to use the gadget with its intended arguments, since other syscall types will be disallowed.
That makes sense, although "intended" arguments here means still being able to invoke `execve(2)`, etc., right? The gadget will still be able to mangle whatever it likes into the arguments for that syscall; it just won't be able to mangle a `wait(2)` into an `execve(2)`, I think.
The other comment on this thread mentions that it also does something else:
>disables all the system calls not explicitly invoked by the program text of a static binary
This means that if the original library didn't have an execve call in it, you wouldn't be able to use it even with ROP. In short, this seems useful for blocking attackers from using syscalls that were not originally used by the program, and nothing else.
Sure, assuming your programs don't execute other programs. I don't know much about OpenBSD specifically, but spawning all over the place is the "norm" in terms of "Unix philosophy" program design.
(I agree with the point in the adjacent thread: it's hard to know what to make of security mitigations that aren't accompanied by a threat model and attacker profile!)
> assuming your programs don't execute other programs.
What about language runtimes? They don't execute other programs in the sense of ELF executables (although the programs they interpret might), but they have to support every syscall that's included in the language. So, for example, the Python interpreter would have to include the appropriate code for every syscall that Python byte code could call (in addition to whatever internal syscalls are used by the interpreter itself). That would be a pretty complete set of syscalls.
Yep, language runtimes are an (inevitably?) large attack surface. My understanding is that OpenBSD userspace processes can voluntarily limit their own syscall behavior with pledge[1], so a Python program (or the interpreter itself) could limit the scope of a particular process. But I have no idea how common that is.
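For reference, the shape of a pledge(2) call (OpenBSD-specific; the promise string and file path here are just illustrative):

#include <err.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* From here on, only stdio-ish and read-only filesystem syscalls are
     * allowed; the kernel kills the process if it strays outside them. */
    if (pledge("stdio rpath", NULL) == -1)
        err(1, "pledge");

    FILE *f = fopen("/etc/motd", "r");   /* still fine under "rpath" */
    if (f != NULL)
        fclose(f);

    /* An execve(2) or socket(2) attempted now would be fatal. */
    return 0;
}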
The syscall number goes in a register, but it does not have to appear literally right next to the `syscall` instruction in the binary. As TFA explains in the introduction, a syscall stub generally looks like
mov eax,0x5
syscall
However, it doesn’t have to: `syscall` will work as long as `eax` is set, no matter where or how it was set. You could load it from an array or a computation for all `syscall` cares.
So as an attacker, if you can get `eax` (and probably a few other registers) to values you control and then jump to the `syscall` instruction directly, you have arbitrary syscall capabilities.
The point of this change is that the loader now records exact syscall stubs as “address X performs syscall S”, then on context switch the kernel validates whether the syscall being performed matches what was recorded by the loader, and if not it aborts (I assume; I didn’t actually check).
This means as long as your go binary uses a normal syscall stub it’ll be recognised by the loader and whitelisted, but if say a JIT constructs syscalls dynamically (instead of bouncing through libc or whatever) that will be rejected because the loader won’t have that (address, number) recorded.
One thing to note is that system calls can no longer be made from the program's .text section; only from within libc. This is highly important because of ASLR: in order to ROP into a syscall, an attacker must now know where libc is located in the virtual address space. Before this mitigation, an attacker that only knew the address of the program binary could search for a sequence of bytes within the .text section that happened to decode to a syscall instruction, and use that for ROP (code reuse techniques can often access a lot of unexpected instructions by jumping into the middle of a multibyte instruction, due to x86's complex and variable-length instruction encoding).
ld.so can do so while initially linking the application, pre-main. Then, per the post:
> 4) in libc.so text, and ld.so tells the kernel where that is
> The first 3 cases were configured entirely by the kernel, the 4th case used a new msyscall(2) system call to identify the correct libc.so region.
ld.so passes its ability to make syscalls to libc.so. The application has to call into libc.so in order to perform any IO.
Ah, makes sense, thanks. Libc can sanitize all the inputs, and as long as ld.so has a hardwired path to libc all is well. This way you don’t even need a facility to tell the kernel “this binary is allowed to make system calls”.
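Roughly the registration being described, as a sketch (not ld.so's actual code; the msyscall(2) prototype here is my assumption from the manual's description, and the helper name is hypothetical):

#include <stddef.h>

/* Assumed prototype; msyscall(2) may only be called once per process,
 * by ld.so itself. */
int msyscall(void *addr, size_t len);

/* Hypothetical helper: after mapping libc.so, ld.so tells the kernel
 * which text region is allowed to issue system calls. */
static void
register_libc_syscall_region(void *libc_text, size_t libc_text_len)
{
    if (msyscall(libc_text, libc_text_len) == -1) {
        /* registration failed; ld.so would bail out here */
    }
}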
Have you managed to trigger this? You never ended up explaining how the heap overflow occurs, and I cannot determine whether the other person who was guessing how it might happen is right, because I am not very familiar with OpenBSD's code.
One of the more under-appreciated features of Rust is that it traps on integer overflow/underflow by default (in debug builds; release builds wrap unless overflow checks are enabled).
It’s too bad OpenBSD doesn’t have a good Rust story for the core system. (I understand their reasoning, but it’s still too bad.)
I wonder how hard it would be to backport the integer behavior to C. (C++ would be easy: Use templates.) Perhaps they could add a compiler directive that causes vanilla integer overflow/wraparound to trap, then add annotations or a special library call that supports modulo arithmetic as expected.
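Something in that direction already exists as compiler extensions. A sketch of what the "special library call" half could look like, using the GCC/Clang checked-arithmetic builtins (the trapping half would be a flag such as -ftrapv or -fsanitize=signed-integer-overflow, which only covers signed overflow):

#include <limits.h>
#include <stdio.h>

int main(void) {
    int a = INT_MAX, b = 1, sum;

    /* Explicit checked addition: returns true on overflow instead of
     * silently wrapping or invoking undefined behavior. */
    if (__builtin_add_overflow(a, b, &sum))
        puts("overflow caught, handle it explicitly");
    else
        printf("sum = %d\n", sum);

    /* With -ftrapv (or UBSan), an unannotated `a + b` here would trap
     * at runtime rather than wrap. */
    return 0;
}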
Look, I get as much as the next guy that Rust brings a lot of niceties. But what we are talking about here is a project that clocks in at just over 19,000,000 lines of C (`wc -l $(find src -name '*.c' -or -name '*.h')`), code heritage going back more than 40 years, an explicit commitment to a very rich set of platforms [1], and a very limited amount of manpower compared to projects such as Linux with their insane level of corporate backing. "Just rewrite it in Rust and/or integrate Rust" is neither easy nor safe in that it overturns everything that is already there and tested.
As for improving upon C: I cannot speak for OpenBSD as a project, but I am sure that there would be ample excitement about producing a minimal, solid C compiler with experimental security features to then serve as the default compiler for the project (heck, OpenBSD already ships with a number of less common security-related compiler flags, from what I recall). Sadly, I doubt there is either the funding or the hands to make that a reality.
Implying that OpenBSD has not considered Rust? There is plenty of discussion on misc@ already (for example [1]) and while it certainly can be "ranty", I am sure if one reads it in good faith you get a fairly nuanced picture.
Look, I think a lot of programmers and managers fail to understand the number of factors one needs to consider and how it scales with the complexity of your codebase and what it interacts with. If you want to rewrite your video processing service which you wrote together with five or so contributors and is say 20,000 lines of C++ into Rust, that is one thing. Take a step back and consider the number of users and what you interact with, it all seems rather manageable and you can probably be backwards compatible with respect to your users. In terms of time, maybe a few months? Maybe even six? For a single programmer that now needs to learn proper Rust.
Now, instead consider what an operating system is and the surface with which it interacts. The absolute metric ton of hardware, the heap of standards, the massive load of hacks that are documented and undocumented, the large number of users, the number of contributors and their experience, the platforms that you support, all the software that is written for your operating system to make it useful, etc. OpenBSD is famous (infamous?) for being willing to break things to do "the right thing". But they are also famous (infamous, again?) for being very conservative, which I think is understandable given their security focus and (relatively) low amount of manpower. Rust has certainly been considered, but it is far from the only consideration. This is akin to walking into a multi-million dollar company that has a fully functioning service or piece of software that they have been selling for decades and suggesting to management that maybe we should start moving it all from C# to Rust next week, while being blissfully unaware of everything else the company is beholden to. Spoiler: it will not work out that way.
Furthermore, it is somewhat tiring that Rust keeps being touted as the final revelation when it comes to writing safer code. Guess what, there were plenty of projects before Rust that introduced safety in various forms (and at various costs), and there will be plenty of projects after Rust that will do the same. Yes, it is an amazing piece of technology, but it is equally plausible that its influence may not end up in it eating the world; rather, it may give birth to something else or bring some of its thinking into other languages. Only time will tell, and worse may end up being better yet again.
So to me, what any operating system (security focused or not) should do is consider its (limited) options given its own goals and situation. And I have seen plenty of evidence that a project of OpenBSD's age is doing exactly that, successfully.
Personally, I am keeping an eye on Redox and look forward to seeing what lessons will be learned from their take on what an operating system can be. But for my servers and desktop I continue (for now?) to run a healthy mix of Linux and BSD while getting work done.
> I am sure if one reads it in good faith you get a fairly nuanced picture.
I don't see a lot of nuance. The primary argument in that thread boils down to: any programmer who isn't using C isn't a serious programmer. The dominant technical analyses are a) adding more compiler toolchains makes builds take a lot longer and b) memory safety doesn't protect against everything, so why bother? That's not nuance.
Nuance would be pointing out that Rust provides benefits beyond just memory safety: that Rust tends to push you towards a parse-don't-validate mentality, or maybe Rust's greater emphasis on handling errors correctly [1]. But then again, Rust isn't necessary to do that, so you could also analyze whether existing software development practices are sufficient to bring those into play.
You can also discuss Rust's failures. Rust, after all, didn't get uninitialized memory right. There's also some uneasiness about the details of the borrow checker to point to, or you can throw in some remarks about &mut-is-noalias and how difficult it is to actually ensure that you never create two &mut to the same location in unsafe code.
Nuance might also discuss how safety features do and don't percolate in mixed-programming-language environments, or how a rewrite may or may not improve security. There's definitely items on both sides of the cost-benefit ledger there!
But that's not what we got. The thread starts with an (incorrect [2]) gatekeeping moment of "it's not serious if you don't rewrite coreutils in it," and mostly devolves into a general theme of "anyone who needs their programming language to provide safety wheels is a terrible programmer who shouldn't be allowed anywhere near systems programming". The irony of that viewpoint coming from an OS well known for its love of just-in-case security precautions is not lost on me.
[1] I was recently writing something parsing diffs, and as part of that, I was making sure that the arithmetic on line numbers didn't overflow. Integer overflow (whether signed or unsigned) is frequently ignored as an error source in most programming languages!
[2] At the time the post was written, people were in fact working on doing a rewrite of this stuff in Rust.
Thank you, that is an interesting response to read, but what I was trying to communicate with that sentence was that if you read that thread and others in good faith you can get a fairly nuanced picture, not that the thread is nuanced in and of itself. I am certainly not implying that you get a nuanced picture from simply reading a mailing list which, in the very same sentence that you quoted parts of, I described as being littered with rants.
As for the linked thread itself, I just ignored the "Real Programmers Don't Use Pascal" parts (it gets old really fast), and what I arrived at is largely reflected in what I already wrote in the parent and ancestors.
> You can also discuss Rust's failures. Rust, after all, didn't get uninitialized memory right.
I think that's a success story, because while Rust 1.0 shipped with the hopelessly broken std::mem::uninitialized, Rust 1.36 shipped with std::mem::MaybeUninit, which fixed the situation. (It's true that they can't definitively remove the broken API, but it can be aggressively linted against nonetheless.)
That is, it's a flaw present in the first version of the language, but a success of the development process and language evolution, achieved without breaking backwards compatibility.
IMHO the biggest blocker for Rust inside OpenBSD is that the project already has killed C developers and moving (some stuff) to Rust will need time and effort that could be put into other problems.
If they had to stabilize some parts, probably the answer would be different.
> IMHO the biggest blocker for Rust inside OpenBSD is that the project already has killed C developers and moving (some stuff) to Rust will need time and effort that could be put into other problems.
I'm all for dropping C in favor of Rust for new projects, but killing C developers is going a bit too far, don't you think?
There are compiler flags that OpenBSD could be using but aren't, which would have caught this bug without needing to convert the codebase to Rust. Using -Wconversion would have warned on the mismatched signedness of the MAX macro argument (the unsigned integer, sysno) and its result (being assigned to npins, a signed integer). Alternatively, adding -fsanitize=implicit-integer-sign-change, or a UBSan flag that includes this, would detect this at runtime for the actual range of values that end up causing a change of sign.
Though, these would also be triggered by statements like:
pins[SYS_kbind] = -1;
because the pins array is of unsigned int; all code of this sort would need to be fixed too.
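A minimal repro of the diagnostics being described (the exact wording and flag support vary by compiler; something like `cc -Wconversion -fsanitize=implicit-integer-sign-change -c demo.c`, where the sanitizer flag is Clang's):

/* demo.c */
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define SYS_kbind 86

unsigned int pins[SYS_kbind + 1];

int demo(int npins, unsigned int sysno)
{
    /* -Wconversion: the MAX result is unsigned and may not fit back
     * into the signed npins. */
    npins = MAX(npins, sysno);

    /* Also flagged: assigning -1 to an element of an unsigned array. */
    pins[SYS_kbind] = -1;

    return npins;
}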
In this case more important than any runtime overflow/underflow checks is the fact that the compiler will check that comparison operands are of the same type instead of inserting an implicit cast. Instead the programmer is forced to insert an explicit conversion like .try_into().unwrap(), which clearly suggests the possibility of an error. And if the error isn't handled, it will panic.
Then report the bug like an adult instead of acting like you found some huge published vulnerability in code that was posted for peer review. That's why the code is there, so others can identify issues; congrats, you did.
I don't know how they define `MAX`, but I'm guessing it's a typical "a>b?a:b". In function `elf_read_pintable` the `npins` is defined as signed int and `sysno` as unsigned int.
So this comparison will be unsigned and will allow `npins` to be set to any value, even a negative one:
npins = MAX(npins, syscalls[i].sysno)
Then `SYS_kbind` seems to be a signed int. So this comparison will be signed and "fix" the negative `npins` to `SYS_kbind`:
npins = MAX(npins, SYS_kbind)
And finally the `sysno` index might be out of bounds here:
pins[syscalls[i].sysno] = syscalls[i].offset
But maybe I'm completely wrong, I'm not interested in researching it too much.
I believe your whole analysis is correct, that running an elf file with an openbsd.syscalls entry with .sysno > INT_MAX will allow an out-of-bounds write.
Pure decimal integer literals (like 86) are typed as "int" in C, rather than being typeless and triggering type inference. This is a pain when you accidentally write something like this:
uint64_t n = 1 << 32;
On modern desktop platforms, an int is 32 bits, so 1 << 32 overflows (it's undefined behavior rather than 2^32), even though a 64-bit integer is wide enough to hold that value.
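The usual fix is to force the shift to happen at 64 bits, e.g.:

#include <stdint.h>

uint64_t shift_demo(void) {
    /* uint64_t n = 1 << 32; */          /* int-width shift: undefined, not 2^32 */
    uint64_t n = UINT64_C(1) << 32;      /* 64-bit shift: 4294967296 */
    return n;
}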
Regardless, it's not relevant here, because if an integer and an unsigned integer of the same size are compared the integer is implicitly cast to unsigned integer, and 86 is fine for both signed and unsigned integers (so "MAX(npins, SYS_kbind)" is safe).
> Then `SYS_kbind` seems to be a signed int. So this comparison will be signed and "fix" the negative `npins` to `SYS_kbind`:
npins = MAX(npins, SYS_kbind)
No, the comparison is unsigned here. They're integers of the same "conversion rank", so the unsigned type wins and the signed integer is interpreted as unsigned.
This isn't correct, as npins is a signed int and SYS_kbind is just a macro for the integer 86 (as defined in sys/syscall.h). So it will be a signed comparison between the value of npins and 86.
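A quick check of both comparisons, under the MAX definition guessed above (assuming a 32-bit int):

#include <stdio.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define SYS_kbind 86

int main(void) {
    int npins = 0;
    unsigned int sysno = 0xffffffffu;   /* attacker-supplied .sysno */

    /* Unsigned comparison: npins comes back as 0xffffffff, i.e. -1. */
    npins = MAX(npins, sysno);
    printf("after first MAX:  %d\n", npins);   /* -1 */

    /* Signed comparison: the negative npins gets "fixed" up to 86. */
    npins = MAX(npins, SYS_kbind);
    printf("after second MAX: %d\n", npins);   /* 86 */

    /* ...while the later pins[sysno] indexing still uses the huge sysno. */
    return 0;
}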
So first of all we calculate the number of syscalls in the pin section [1], allocate some memory for it [2] and read it in [3].
At [4], we want to figure out how big to make our pin array, so we loop over all of the syscall entries and record the largest we've seen so far [5]. (Note: the use of `MAX` here is fine since `sysno` is unsigned -- see near the top of the function).
With the maximum `sysno` found, we then crucially go on to clamp the value to `SYS_kbind` [6] and +1 at [7].
This clamped maximum value is used for the array allocation at [8].
We now loop through the syscall list again, but now take the unclamped `sysno` as the index into the array to read at [9] and write at [10] and [11]. This is essentially the vulnerability right here.
Through heap grooming, there's a good chance you could arrange for a useful structure to be placed within range of the write at [11] -- and `offset` is essentially an arbitrary value you can write. So it looks like it would be relatively easy to exploit.
Re-reading this, my analysis is slightly incorrect: the `MAX` at [5] with an unsigned arg means we can make `npins` an arbitrary `int` using the loop at [4].
Choosing to make `npins` negative using that loop means we'll end up allocating an array of 87 (`SYS_kbind + 1`) `int`s at [8] and continue with the OOB accesses described.
You'd set up your `pinsyscall` entries like this:
struct pinsyscall entries[] = {
    { .sysno = 0x1111, .offset = 0xdeadbeef }, /* first oob write */
    { .sysno = 0x2222, .offset = 0xf000f000 }, /* second oob write */
    { .sysno = 0xffffffff }                    /* sets npins to 0xffffffff so we under-allocate */
};
`npins` would be `0xffffffff` after the loop and then the `MAX` at [6] would then return `86`, since `MAX(-1, 86) == 86`.
Just to handle the case where the same syscall number is specified twice by the ELF header: in that case, the entry is set to -1 (presumably meaning it’s invalid).
Now when we come to 3, we'll find `pin[syscalls[2].sysno] != 0` since `syscalls[2].sysno == syscalls[0].sysno` - so we set `pin[1] = -1` instead of `0x9abc`.
Oh, thanks, now I understand why there is an if in the for loop! But I still can't see how pin[] could be accessed out of bounds, since the array is allocated to be large enough to hold the largest value of .sysno occurring in the entries[] array.
Right, so this is the crux of the vulnerability. Firstly, note that the `MAX` macro is defined as:
#define MAX(a, b) ((a) > (b) ? (a) : (b))
This is important because it doesn't cast either arg to any particular type. You can use it with `float`s, or `int`s, or `u_int`s... or a combination.
Referring back to the implementation, the first use of `MAX` (inside the `for` loop) is this:
...
for (i = 0; i < nsyscalls; i++)
    npins = MAX(npins, syscalls[i].sysno);
...
`npins` is an `int`, but `sysno` is a `u_int`. C integer promotion rules mean that we'll actually be implicitly converting `npins` to a `u_int` here; it's as if we did this:
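In rough terms (my paraphrase of the promoted expression, not the actual kernel source):

...
for (i = 0; i < nsyscalls; i++)
    npins = (int)(((u_int)npins > syscalls[i].sysno)
        ? (u_int)npins : syscalls[i].sysno);
...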
This means that `npins` can end up as any value we like -- even up to `0xffffffff`. But remember that `npins` is _actually_ a signed int, so once it comes out of `MAX`, it'll be signed again. Thus we can use this to make `npins` negative.
Once we're out of the loop, `MAX` is used again here:
...
npins = MAX(npins, SYS_kbind);
...
Where `SYS_kbind` is just:
...
#define SYS_kbind 86
...
Integer literals in C are signed, so now this use of `MAX` is actually dealing with two signed integers. If we used the loop to make `npins` negative (as described just before) then this line will now take 86 as the maximum of the two values.
With `npins = 86`, an array of 86+1 will be allocated, but the `syscalls[i].sysno` in the next loop could of course easily be greater than 86 -- thus leading to out-of-bounds array access.
So then it depends on whether whatever code loaded the ELF section does any validation of the data it reads. I can't help thinking the whole thing could use some structs and/or "getter setters", to talk dirty object-oriented speak. It needn't be that, though; it could be as low level as some macros that help you do the right thing with signedness and such. But my main impression is that a lot of the data structure semantics is kept in the heads of programmers instead of being formalised in the code.
In C there is always the opportunity to run with scissors in the middle of the road, but you can do a lot to protect yourself too, without losing much, if any, performance.
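For instance, something as small as a bounds-checked setter would formalise those rules in one place (hypothetical names, just to sketch the idea):

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrapper: the only code path that writes the pin table,
 * so the bounds and signedness rules live here rather than in heads. */
struct pintable {
    unsigned int *pins;
    size_t npins;
};

static bool
pintable_set(struct pintable *t, unsigned int sysno, unsigned int offset)
{
    if (sysno >= t->npins)
        return false;   /* reject out-of-range entries up front */
    t->pins[sysno] = offset;
    return true;
}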
> Two remote public holes in the default install, as in “unauthenticated remote code execution”. How many local privilege escalation? How many remote/local denial of service? How many remote code execution in software not present in the non-default install? How many private ones that were never disclosed? Other operating systems didn’t have many unauthenticated RCE either
https://twitter.com/halvarflake/status/1156815950873804800