The problem is, VMs aren't really "Virtual Machines" anymore. You're not parsing opcodes in a big switch statement; you're running instructions on the actual CPU, with a few hardware flags that the CPU says will guarantee no data or instruction overlap. It promises! But that's a hard promise to keep in reality.
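For reference, the "big switch statement" kind of VM looks roughly like this -- a toy sketch in C (the opcodes are made up), but it's the shape of every classic bytecode interpreter: every single guest instruction goes through software fetch and decode.

    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    int main(void) {
        int code[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
        int stack[16], sp = 0, pc = 0;

        for (;;) {
            switch (code[pc++]) {              /* fetch + decode, in software */
            case OP_PUSH:  stack[sp++] = code[pc++]; break;
            case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
            case OP_PRINT: printf("%d\n", stack[sp - 1]); break;
            case OP_HALT:  return 0;
            }
        }
    }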
Looking at IBM's tech from the sixties is somehow weirdly depressing: it's unbelievable how much of this architectural stuff they had already invented by 1970.
Not depressing, but inspiring. So many great architectural ideas can be made accessible to millions of consumers, not limited to a few thousand megacorps.
In the early days of virtualization on PCs (things like OS/2's DOS box), the VM was 100% a weird special case that wasn't even running in the same mode (virtual 8086 vs. 286/386 protected mode), and that second-class functionality continued through the early iterations of "modern" systems (VMware / KVM / Xen).
"PC" virtualization's getting closer to big iron virtualization, but likely will never quite get there.
Also -- I was running virtual machines on a 5150 PC back when it was a big, fast machine: the UCSD p-System ran a p-code virtual machine to execute p-code binaries, which would run equally well on an Apple II. In theory.
IMO, it’s only a special case for commercial support reasons. Almost every engineer, QE, consultant, solution architect I know runs or has run nested virtualization for one reason or another.
Just how many times per second is the average operating system workload (with or without a virtual machine also running a second average operating system workload) context switching?
Like... unless I'm wrong... the kernel is the main process, and then it slices up processes/threads, and each time those run, they have their own EAX/EBX/ECX/ESP/EBP/EIP/etc. (I know it's RAX, etc. for 64-bit now)
How many cycles is a thread/process given before it context switches to the next one? How is it managing all of the pushfd/popfd, etc. between them? Is this not how modern operating systems work, am I misunderstanding?
> How many cycles is a thread/process given before it context switches to the next one?
Depends on a lot of things. If it's a compute-heavy task and there are no I/O interrupts, the task gets one "timeslice"; timeslices vary, but typical values are somewhere in the neighborhood of 1 ms to 100 ms. If it's an I/O-heavy task, chances are the task returns from a syscall with new data to read (or because a write finished), does a little bit of work, then makes another syscall with I/O. Lots of context switches in network-heavy code (io_uring seems promising here).
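If you want actual numbers for your own system, Linux keeps per-task counters; here's a quick sketch that prints our own process's totals (Linux-only; the field names below are the ones that really appear in /proc/<pid>/status):

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        FILE *f = fopen("/proc/self/status", "r");
        char line[256];
        if (!f) return 1;
        while (fgets(line, sizeof line, f)) {
            /* voluntary = blocked on I/O etc.; nonvoluntary = preempted */
            if (strncmp(line, "voluntary_ctxt_switches", 23) == 0 ||
                strncmp(line, "nonvoluntary_ctxt_switches", 26) == 0)
                fputs(line, stdout);
        }
        fclose(f);
        return 0;
    }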
> How is it managing all of the pushfd/popfd, etc. between them?
The basic plan: when the kernel takes an interrupt (or gets a syscall, which is an interrupt on some systems and a different mechanism on others), the kernel (or the CPU) loads the kernel stack pointer for the current thread and pushes all the (relevant) CPU registers onto that stack. Then the kernel business is taken care of, and the scheduler decides which userspace thread to return to (which might or might not be the one that was interrupted). The destination thread's kernel stack is switched to, registers are popped, the thread's userspace stack is switched to, and userspace execution resumes.
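You can play with the same save-registers/switch-stacks/restore-registers dance from userspace. A minimal sketch using the POSIX ucontext API -- an analogy, not kernel code, but swapcontext() does exactly that: dump the current registers into one structure and load another:

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, task_ctx;

    static void task(void) {
        puts("task: running (registers restored from task_ctx)");
        /* "Yield": save our registers into task_ctx, load main_ctx. */
        swapcontext(&task_ctx, &main_ctx);
        puts("task: resumed exactly where it left off");
    }

    int main(void) {
        char stack[64 * 1024];   /* the task's private stack, like a kernel stack */

        getcontext(&task_ctx);
        task_ctx.uc_stack.ss_sp = stack;
        task_ctx.uc_stack.ss_size = sizeof stack;
        task_ctx.uc_link = &main_ctx;          /* where to go when task() returns */
        makecontext(&task_ctx, task, 0);

        puts("main: switching to task");
        swapcontext(&main_ctx, &task_ctx);     /* save main's state, run task */
        puts("main: back in main, switching to task again");
        swapcontext(&main_ctx, &task_ctx);     /* resume task after its yield */
        puts("main: done");
        return 0;
    }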
Why do comments like this make a bold claim and then wander off as if the claim speaks for itself? No explanation, no insight. I mean, why should we just take your word for it?
I'd like to be educated here: why wouldn't a big switch statement necessarily protect us from these CPU vulnerabilities? Anyone willing to help?
The question should rather be: why would it protect you? The switch statement also runs on a CPU, and that CPU is still vulnerable; it still speculatively executes the switch statement. No amount of software will make the hardware irrelevant.
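To make it concrete: the classic Spectre v1 gadget is ordinary C that any interpreter's bounds check could contain. A sketch (the array names follow the illustrative ones from the Spectre paper):

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    uint8_t array1[16];
    uint8_t array2[256 * 512];
    size_t array1_size = 16;

    uint8_t victim(size_t idx) {
        if (idx < array1_size)                  /* architecturally safe... */
            return array2[array1[idx] * 512];   /* ...but the CPU may speculate
                                                   past the check, load out of
                                                   bounds, and leave a cache
                                                   footprint an attacker times */
        return 0;
    }

    int main(void) {
        printf("%u\n", victim(0));  /* harmless call; the attack is in the timing */
        return 0;
    }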
Hence my choice of phrasing: 'wouldn't necessarily protect you'.
So, yes, the switch statement might be safe, but you would need to prove that your switch statement doesn't use those instructions. You don't get to claim that for free just because you are using a switch statement.
Conversely, even if you execute bare-metal instructions for the user of the VM, you could still deny those instructions to the user, e.g. by not allowing self-modifying code and statically verifying that the relevant code doesn't contain those instructions.
So the switch statement by itself does not do anything for your security.
Tangent: to deny those bare-metal instructions with static analysis, you might also have to flat-out deny certain byte sequences that, when jumped into "unaligned", would themselves form the forbidden instruction. That might break innocent programs, no?
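Concretely, here's the classic x86 example of that (bytes chosen for illustration; this compiles as-is since the bytes are just data here):

    unsigned char benign[] = { 0xB8, 0xCD, 0x80, 0x90, 0x90 };
    /* decoded at offset 0: B8 CD 80 90 90 = mov $0x909080CD, %eax  (harmless) */
    /* decoded at offset 1: CD 80          = int $0x80              (a syscall!) */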
Simple: don't allow unaligned jumps. Google's NaCl figured out how to do that ages ago. (E.g. you could only allow jumps after a bit-masking operation; the details depend on the architecture.)
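A sketch of the masking idea, assuming NaCl-style 32-byte bundles (the target address is made up; in the real sandbox the AND is emitted immediately before every indirect jmp/call, and the verifier rejects code that omits it):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uintptr_t target = 0x401337;                 /* hypothetical jump target */
        uintptr_t masked = target & ~(uintptr_t)31;  /* clear low 5 bits: forced
                                                        onto a 32-byte bundle, so
                                                        you can't land mid-sequence */
        printf("%#lx -> %#lx\n", (unsigned long)target, (unsigned long)masked);
        return 0;
    }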
But yes, unless you solve the halting problem, anything that bans all bad programs will also have false positives. It's the same with type systems in programming languages.
Even if we pretend Docker is a VM, building an image can happen on as many cores as you like in this hypothetical; it's the running of it that should be restricted.
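E.g. with stock Docker flags (image name hypothetical):

    docker build -t myimage .        # build: free to use every host core
    docker run --cpus=2 myimage      # run: capped at 2 CPUs' worth of time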