RIP ROP: CET Internals in Windows 20H1 (windows-internals.com)
72 points by aw1621107 on July 21, 2020 | hide | past | favorite | 41 comments



We've been through several generations of exploit mitigations starting with non-executable stacks, and, impressively, exploit developers found workarounds for each of them (although often the particular workarounds have requirements that might not be met in a particular vulnerability environment). In many cases I had the impression that the workarounds were surprising to the mitigation developers because the latter had expressed a lot of confidence that software security was about to make a huge leap and memory safety violations would rarely be exploitable anymore.

What are the prospects for finding workarounds to CET too?

(I don't mean to argue that there's no benefit to these mitigations or that some of them might not eventually finally stop whole classes of vulnerabilities. But I feel like their track record is not nearly as awesome as their inventors anticipated, so I wonder what informed opinion is on the eventual relevance or irrelevance of this one. Notably, the "RIP ROP" seems like a somewhat ambitious claim to mitigate a large amount of attack potential; how well-justified is it?)


It's not totally true that attackers have found ways to bypass mitigations. In many cases the attackers require entirely new capabilities, or separate vulnerabilities, to get around a mitigation. And in some cases a mitigation makes an attack statistically unlikely to succeed, even if it can succeed in principle.

Lots of mitigations do suck, and are a huge waste of time and effort, but quite a few are very significant. It's very rare that a bypass is a surprise, except when the mitigation is poorly thought out. Good mitigations start with a threat model.

An example is ASLR. Once in a while ASLR is pronounced 'dead' because an attacker with capabilities that ASLR does not try to defend against can bypass ASLR. For example, the attacker has arbitrary compute on the system with ASLR. No one who built ASLR was surprised by this.


It's kind of easy to say "ASLR is not resistant to infoleaks" but it's really quite another thing if the infoleak ends up coming from something like a microarchitectural sidechannel or even an undocumented proprietary processor extension, rather than "hurr durr here is a slid pointer, we hand these out like candy".


My point is that ASLR makes assumptions about the attacker's capabilities. All good mitigations do. If the attacker has "I can run nearly arbitrary computations within the process's memory space", that is outside of the model that ASLR attempts to deal with.

For example, ASLR for a process that executes Javascript is probably not going to be as useful as ASLR for a process that receives network requests.


How so? As far as I understand, being able to leak an ASLR slide from JavaScript is considered to be a security bug in every browser engine, because they do not intentionally provide access to that information.


Whether the browser intends to provide that information or not, ASLR was not designed as a control against an attacker who can run near arbitrary code in the process.


There will probably be defeats and accidental gadgets for the first couple releases, but this hardware technology has the potential to be better than any software based mitigation with the same goals. Here's a paper with a more detailed security analysis: https://sci-hub.tw/10.1145/3337167.3337175


Incredible how C and C++ have managed to keep the security industry busy.

Not that other languages don't have logical errors that might lead to security exploits, but where memory corruption exploits are concerned, the mitigation list just keeps piling up.

Looking forward to how long ARM hardware memory tagging will hold on, after Intel's MPX failure.

At least so far Solaris SPARC ADI seems to hold on.


MPX was totally slow and cumbersome; ARM tagging should be much nicer, as will CHERI when it actually comes out.


Sure, it doesn't change the fact that Intel borked it and is yet to provide a better alternative.


grsec seems to think CET won't be effective:

https://grsecurity.net/effectiveness_of_intel_cet_against_co...


Most of the negative commentary is around the protection of the forward edges, while this blog is about the backward edges. The grsec post notes that implementing full support for the backward edges involves handling a number of special cases, but doesn't criticise its effectiveness if that work is done.


Fedora has been implementing this for a while. It will finally be enabled in Fedora 33 later this year: https://bugzilla.redhat.com/show_bug.cgi?id=1802674#c3

Of course you'll need a Tiger Lake chip for it to do anything. Are those even released yet?


TLDR? Why does this matter?

"As a reminder, Intel CET is a hardware-based mitigation that addresses the two types of control-flow integrity violations commonly used by exploits: forward-edge violations (indirect CALL and JMP instructions) and backward-edge violations (RET instructions). "

Why are these important?


It's a mitigation for a software exploitation technique called Return Oriented Programming (ROP). The mitigation is referred to as 'Control Flow Integrity' (CFI).

https://software.intel.com/content/www/us/en/develop/article...

Essentially an attacker who has the ability to exploit the first stage of a vulnerability will be able to stitch together "gadgets" from the program to build up a second stage of the exploit.

Control flow integrity, to my understanding, applies a validation or restriction of the program's call graph. This limits the attacker's ability to just stitch up their own arbitrary call graph. There are 'forward edge' protections (calling a function) and 'reverse edge' protections (ret). But of course there are more ways to control the flow of a program, as this document discusses - like longjmp.

I won't try to get more detailed as I'm not an expert. Hopefully this will help you find more information.


I'll add on since this is the most informative post so far (and I've written a static binary re-writer to add shadow stack protection to an existing binary).

A shadow stack is a limited subset of the call stack that only stores return addresses. In normal operation, every time your compiled program makes a function call, it stores the return address on the main call stack (modulo certain compiler optimizations) so that when the called function returns, your program can resume executing directly after the point at which it called the function.

With a shadow stack, when a function is called, the return address is copied to a separate "shadow" stack as well as the call stack. When the called function returns, the return addresses on the two stacks are compared and the program fails if they differ.

In new Intel microprocessors, the shadow stack is implemented in hardware. The numerous corner cases require software support that the article describes.


Can you provide some details on your binary rewriter to add shadow stack support? Was this a pure software approach, or was it designed to take advantage of the support in new intel microprocessors? Do you have a write up of or can you give a quick overview of your methodology? Is the source code published somewhere?


No, it was proprietary code, and it wasn't for an Intel processor. It was a pure software approach, but the particular (embedded) environment made it harder to attack the shadow stack itself.

I had a pretty cool optimization that I don't think anyone's figured out yet. Oh well. That's the downside of software-as-trade-secrets.


Can you name it, so we can find it ourselves? ;)


Oh yeah, I completely glossed over the whole shadow/safestack mitigation. Thanks.


IDK, but it's because security.

Judging by the title, it helps avoiding ROP: "Return-oriented programming is a computer security exploit technique that allows an attacker to execute code in the presence of security defenses such as executable space protection and code signing." (Wikipedia)


Using ROP techniques in a binary bypasses a lot of stuff such as ASLR, canaries and even DEP (I think...).

I’ve seen ROP exploitation in binaries and it's pretty handy when there is no other way to get a setuid binary to give you a shell as root.

Watch Rope from ippsec on YT (on my phone atm).


ROP does not bypass ASLR or canaries. It does bypass DEP/NX in the sense that it executes code that already exists in executable memory.


Agree re: canaries, but when I learned about ROP I was told that ASLR typically is not employed on the text segment (due to lack of position independence), which is why ROP effectively acts as a bypass for ASLR on the stack / heap and why we need things like control flow enforcement. Is this not the case or no longer the case?


Gcc these days compiles with -pie (Position Independent Executable) by default. This makes the text section position independent and able to be relocated, like a shared library.

You are correct that the main TEXT section used to typically not be position independent.


Not only is the text segment relocatable, the entire binary is generally compiled as ET_DYN so it is a shared library in a sense.


Windows uses relocations, not PIC, to enable different load addresses. That means the image in memory has its self references patched by adding the difference between compiled in load address and runtime load address. System DLLs can still share code with one another as long as they share the same load address in different processes for that reboot of the operating system.

Historically EXEs were either linked without relocations or had relocations stripped. They were always loaded first so ended up where they wanted, no relocation necessary. But /dynamicbase flag to linker opts in to setting a bit in the PE header and retaining relocations, so the EXE can be loaded elsewhere.

TL;DR: Windows supports ASLR on both executables and dynamic libraries.


ASLRing text segments is optional, but possible. There are negative performance tradeoffs.


There are limitations in sharing text segments in different processes, and extra memory usage mapping in text segments so relocations can be, uh, relocated, but once running there is no additional performance overhead. The runtime image is effectively "self-modified code" by the OS loader, patched to final addresses. No PIC register, no indirect references.


Right. You've described the performance impact: a ton of relocations hurts startup time, and the modified memory reduces sharing, costing additional memory. If startup time does not matter and memory is free, sure, it has no costs.


Yeah; a linear scan over what, possibly as many as a couple of megabytes. I bet that takes a while.

(Seriously? :-)


TLDR ROP is common technique for making programs do bad things so this prevents a whole bunch of bad things from happening.


And how about performance impact? The mitigations that have been implemented in software recently came with an ugly performance cost (just not as ugly as the vulnerability). Is there any speculation about what this one is going to cost?


I haven't read the whole paper, but this has a section on performance and a security analysis of the hardware feature: https://sci-hub.tw/10.1145/3337167.3337175

CET has two parts, a forwards edge protection (indirect jumps and calls like those necessary to execute a C++ virtual function, a Go interface function, or a Rust trait function), and a backwards edge protection to protect against Return Oriented Programming (overwriting the return address to attacker chosen code).

If I recall correctly, Windows will only use the backwards edge protection, since they already have a superior technology for forwards edge protection (CFG and XFG). The backwards edge protection has an impact of 1.65% according to that paper. Forwards edge protection had no impact.

I have to say, this is well worth it.


An aside that I was curious about: "20H1"?

I found https://blogs.windows.com/windowsexperience/2020/06/16/whats...:

> Windows 10, version 20H2 is, therefore, “20H2” because it will be released in the second half of the 2020 calendar year.

So 20H1 is 2020 1st half then. And Windows now has biannual rolling release? Nice.


Exploit devs love a good challenge. Each mitigation is another lessons learned :)


I would argue that the learning is circular, as exploit development learns from mitigations just as much as mitigation development is informed by exploits ;)


Does anyone know when AMD will have equivalent support?


Does AMD have equivalent vulns?


This isn’t an Intel or AMD thing, or even an x86_64 thing, it’s an intrinsic part of any system that mixes return addresses and data on a stack.


If it runs code compiled from C or C++, most likely, given that ROP is usually used alongside memory corruption bugs.



