An introduction to virtual memory

chc4 · on May 7, 2020

A few months ago one of my friends sent a picture of some disassembled x86 they didn't understand: they guessed it was something like `alloca` because it was modifying the stack pointer based on the function parameters, but before it did that it had a loop over the size in 0x1000 chunks. The body of the loop, however, was just `test [ecx], eax; cmp eax, 0x1000; jae ...`, which seems trivially like a no-op! `test` is purely to set flags, but `cmp` would overwrite the flags immediately after, making the conditional jump independent of whatever the `test` does.

But in reality, `test` has side effects since it's dereferencing ecx, which causes a memory page access, and potentially trap. Compilers insert what look like pointless loops ("stack probes") for `alloca` because if you try to stack allocate multiple pages worth of memory at once and then read only the later bytes, you can skip over stack guard pages that the kernel uses to page in memory on-demand or crash the program if it could cause the stack to clash into the heap.

It's a really nice example of the "there's always something further down" type thinking you need for low-level stuff imo. This was from reading x86 disassembly, which is an advanced topic that 90%(?) of programmers never have to care about because it's so "low level"...until you have to start caring about virtual memory, or your kernel implementation details.

CountSessine · on May 7, 2020

Holy-moley - you know I’d never thought very hard about how alloca would work if it skipped a stack guard. Here’s a question though - wouldn’t you have the same problem if you just had a really really big stack frame in your function? If you enter the function and then call another function before touching any locals, you could jump past stack guards pushing arguments on to the stack, no? Presumably the compiler needs to anticipate this and grab those pages like in the alloca case?

chc4 · on May 8, 2020

Yes, that will happen if you have a multiple page long stack frame. In practice that doesn't happen very often, and the compiler can only insert the probe it needs since the stack frame size is known statically without dynamic `alloca`.

Go had a horrible, horrible bug due to an interaction with something related to this (stack clash mitigation), signal stacks, and the kernel vDSO shared object. https://marcan.st/2017/12/debugging-an-evil-go-runtime-bug/

saagarjha · on May 8, 2020

Depends on the language! In C, yes, you can write right past the guard page: https://godbolt.org/z/HnnNPU. Rust will insert a stack probe when necessary: https://godbolt.org/z/N8mGrz

monocasa · on May 8, 2020

In C that compiler is allowed to stack probe. The compilers just make you opt into it with -fstack-check and the like to make sure they didn't break some of your terribly written code.

saagarjha · on May 8, 2020

That code is standards-compliant, because the standard has no knowledge of the stack. I actually had those flags enabled when I was messing around with it and they did nothing useful so I chose to leave them out. (Yes, the compiler will probe the stack–but only if you alloca, it seems. Nobody seems to have extended this protection to stack frames bloated by automatic variables.)

monocasa · on May 8, 2020

I couldn't get clang to, and gcc seems to be omitting the checks in some cases with your recursive call to foo (maybe it realizes that 'a' isn't actually accessed, but still emits the increase and decrease?). But gcc given a program that passes the address of 'a' into another function outside the compilation unit does emit stack probes with -fstack-check and -fstack-protector-all.

https://godbolt.org/z/jHwZiG

And I don't care if they didn't come out and say it in the spec. If your program depends on overflowing the stack into other regions of memory you need to be taken out into the street and publicly dealt with as an example to others. That being said, I'm sure there's something about implementation defined behavior around exceeding the platform's max for automatic storage duration objects. If not I may try to sneak that into C2X, lol.

OnlyOneCannolo · on May 8, 2020

Can you explain what's going on here?

saagarjha · on May 8, 2020

There's two pieces of code that's fairly similar in semantics: both allocate a bunch of memory on the stack by declaring a large array. In the C version I make an array that easily spans multiple pages and as such writing to the end of it would skip over any guard page. In the Rust version you can see that a __rust_probestack function appears right as I approach the page size, which internally does stack probing to make sure that the guard page isn't skipped.

woadwarrior01 · on May 7, 2020

No, because you subtract the space required for all the locals from the stack pointer (esp) in the function in the function prologue.

CountSessine · on May 8, 2020

That’s the whole point. If esp is now say 100k or so below bsp, but the stack expansion detection pages maybe don’t extend very far below, say 8k, and you call another function that requires pushing args at esp, presumably you’ll miss the stack expansion. Unless the compiler inserts intentional dead reads to force a stack expansion.

CountSessine · on May 8, 2020

s/bsp/ebp/

Silly autocorrect.

vincent-manis · on May 8, 2020

This was a good article. For my money, one of the best conceptual introductions to virtual memory was an article by Jeff Berryman, over 40 years ago. It's been reprinted many times, including at https://en.m.wikisource.org/wiki/The_Paging_Game.

Koshkin · on May 8, 2020

Curiously, you do not need memory virtualization if all you want is memory protection and process isolation: this can be achieved by a simpler piece of hardware that just ensures that a range of the upper bits of the memory address used by the process matches a certain pattern, or tag. This effectively partitions the physical RAM.

fuklief · on May 8, 2020

That sounds like a capability machine e.g., CHERI[1] It seems those might become relatively mainstream in a few years, as ARM seems to be jumping on board [2]

[1]: https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/ [2]: https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/cheri...

monocasa · on May 8, 2020

If anyone wants to play with this idea, you can use the base and bounds registers in the ARM-M and ARM-R specs to do this for one example.

t0350 · on May 8, 2020

You could also implement something likes software isolated process to have memory safety. However, the idea of virtual memory really fascinates me. Such an elegant layer and it brings you many other merits besides memory safety.