
The problem is, VMs aren't really "Virtual Machines" anymore. You're not parsing opcodes in a big switch statement, you're running instructions on the actual CPU, with a few hardware flags that the CPU says will guarantee no data or instruction overlap. It promises! But that's a hard promise to make in reality.


This is because VM means two different things and has for a long time:

IBM's VM was and is a hypervisor. It dates to the mid 1960s, in the form of CP-40, and it didn't run opcodes in software, but in hardware.

https://en.wikipedia.org/wiki/IBM_CP-40

P-code machines, which interpret bytecode, date back almost as far; the O-code machine for BCPL is an early example.

https://en.wikipedia.org/wiki/BCPL

Getting people to distinguish between these concepts is probably a lost cause.


Looking at IBM's tech from the sixties is somehow weirdly depressing: it's unbelievable how much of the architectural stuff they had already invented by 1970.


Not depressing, but inspiring. So many great architectural ideas can be made accessible to millions of consumers, not limited to a few thousand megacorps.


I remember seeing VMware for the first time and thinking that the PC world had finally entered the 1970s.


Close, but not quite -- you can't nest the VMs the way you can on "big iron".


I know nested virtualisation is a thing on both KVM and Hyper-V; what is different about what you could do on "big iron"?


In the early days of virtualization on PCs (things like OS/2's DOS box) the VM was 100% a weird special case that wasn't even running in the same mode (virtual 8086 vs. 286/386 protected mode), and that second-class functionality continued through the early iterations of "modern" systems (VMware / KVM / Xen).

"PC" virtualization's getting closer to big iron virtualization, but likely will never quite get there.

Also -- I was running virtual machines on a 5150 PC when it was a big fast machine -- the UCSD p-System ran a p-code virtual machine to run p-code binaries, which would run equally well on an Apple II. In theory.


A VM nest in "big iron" isn't a special case. It's a context push with comparatively exhaustively defined costs, side effects, and implications.


IMO, it’s only a special case for commercial support reasons. Almost every engineer, QE, consultant, solution architect I know runs or has run nested virtualization for one reason or another.


And licensing - DB2 and Oracle.


So what might you say hasn't been brought in from the 80s yet?


> Getting people to distinguish between these concepts is probably a lost cause.

I think people here of all places should distinguish between these concepts.

There are big performance and security implications of the two approaches.


I don't think anyone has ever been confused because of the conflation of these two terms. The context typically makes it very clear.


> you're running instructions on the actual CPU

Just how many times is the average operating system workload (with or without a virtual machine also running a second average operating system workload) context switching a second?

Like... unless I'm wrong... the kernel is the main process, and then it slices up processes/threads, and each time those run, they have their own EAX/EBX/ECX/ESP/EBP/EIP/etc. (I know it's RAX, etc. for 64-bit now)

How many cycles is a thread/process given before it context switches to the next one? How is it managing all of the pushfd/popfd, etc. between them? Is this not how modern operating systems work, am I misunderstanding?


> How many cycles is a thread/process given before it context switches to the next one?

Depends on a lot of things. If it's a compute-heavy task and there are no I/O interrupts, the task gets one "timeslice"; timeslices vary, but typical values are somewhere in the neighborhood of 1 ms to 100 ms. If it's an I/O-heavy task, chances are the task returns from a syscall with new data to read (or because a write finished), does a little bit of work, then makes another I/O syscall. Lots of context switches in network-heavy code (io_uring seems promising).

> How is it managing all of the pushfd/popfd, etc. between them?

The basic plan: when the kernel takes an interrupt (or gets a syscall, which is an interrupt on some systems and other mechanisms on others), the kernel (or the CPU) loads the kernel stack pointer for the current thread, then pushes all the (relevant) CPU registers onto that stack. Once the kernel business is taken care of, the scheduler decides which userspace thread to return to (which might be the same one that was interrupted or not), the destination thread's kernel stack is switched to, registers are popped, the thread's userspace stack is switched to, and userspace execution resumes.
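
If it helps to see the save/switch/restore dance in code, here's a rough userspace analogy using POSIX ucontext. It's cooperative and happens entirely in userspace; a real kernel does the equivalent on an interrupt with per-thread kernel stacks, and all the names here are just illustrative:

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, task_ctx;

    static void task(void) {
        puts("task: running on its own stack with its own register set");
        swapcontext(&task_ctx, &main_ctx);   /* save our registers, restore main's */
        puts("task: resumed exactly where it left off");
    }

    int main(void) {
        static char stack[64 * 1024];

        getcontext(&task_ctx);               /* capture a register snapshot */
        task_ctx.uc_stack.ss_sp = stack;     /* give the task its own stack */
        task_ctx.uc_stack.ss_size = sizeof stack;
        task_ctx.uc_link = &main_ctx;        /* where to go when task() returns */
        makecontext(&task_ctx, task, 0);

        swapcontext(&main_ctx, &task_ctx);   /* "context switch" into the task */
        puts("main: back in the 'scheduler'");
        swapcontext(&main_ctx, &task_ctx);   /* resume the task once more */
        puts("main: task finished");
        return 0;
    }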


> Like... unless I'm wrong... the kernel is the main process,

A nice way of thinking about it is that the kernel virtualizes the CPU among multiple programs.

Great reading material on all this OS stuff: https://pages.cs.wisc.edu/~remzi/OSTEP/


Usually a few hundred to a few thousand times a second.


I've seen it in the neighborhood of tens of thousands of times per second on the high end.

For anyone interested in seeing this on the nearest Linux box:

    vmstat -S M 1
Watch the 'cs' column go wild


What you're describing (switch statement) is emulation, not virtualization.


The big switch statement wouldn't necessarily protect you either.


Why do comments like this just make a bold claim and then wander off as if the claim stands for itself? No explanation. No insight. I mean why should we just take your word for it?

I'd like to be educated here why a big switch statement wouldn't necessarily protect us from these CPU vulnerabilities? Anyone willing to help?


The question should rather be: why would it protect you? This switch statement also runs on a CPU, which is still vulnerable. This CPU still speculates the execution of the switch statement. No amount of software will make hardware irrelevant.


You need certain instructions to exploit the vulnerability; if the switch statement doesn't use them, then it is safe.


Hence my choice of phrasing: 'wouldn't necessarily protect you'.

So, yes, the switch statement might be safe, but you would need to prove that your switch statement doesn't use those instructions. You don't get to claim that for free just because you are using a switch statement.

Conversely, even if you execute bare metal instructions for the user of the VM, you could also deny those instructions to the user. Eg by not allowing self-modifying code, and statically making sure that the relevant code doesn't contain those instructions.

So the switch statement by itself does not do anything for your security.
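
To make the "switch statement also runs on a vulnerable CPU" point concrete, here's a minimal C sketch (all names made up, not from any real VM) of how a classic Spectre-v1 pattern can live in the interpreter's own opcode handler, regardless of what the guest bytecode is allowed to do:

    #include <stddef.h>
    #include <stdint.h>

    static uint8_t guest_mem[4096];   /* memory the guest is allowed to see */
    static uint8_t probe[256 * 64];   /* host array that becomes a cache side channel */

    /* Handler for a hypothetical LOAD opcode inside the big switch. */
    uint8_t op_load(size_t guest_addr) {
        uint8_t v = 0;
        if (guest_addr < sizeof guest_mem) {               /* architectural bounds check */
            v = guest_mem[guest_addr];                     /* can still run speculatively
                                                              with an out-of-bounds address */
            volatile uint8_t sink = probe[(size_t)v * 64]; /* dependent load leaves a
                                                              cache footprint */
            (void)sink;
        }
        return v;
    }

The guest never executes a single instruction on the bare metal here, yet a mispredicted branch in the host's own handler can speculatively read out of bounds and leak the value through the cache.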


Tangent: To deny those bare-metal instructions with static analysis, you might also have to flat out deny certain sequences of instructions that, when jumped to "unaligned" would also form the forbidden instruction. That might break innocent programs, no?


Simple: don't allow unaligned jumps. Google's NaCl already figured out how to do that ages ago. (Eg you could only allow jumps after a bit-masking operation. Details depend on the architecture.)

But yes, unless you solve the halting problem, anything that bans all bad programs will also have false positives. It's the same with type systems in programming languages.
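
For the bit-masking idea above, a rough sketch of what NaCl-style "mask before jump" looks like (the bundle size and the function name are assumptions for illustration, not NaCl's actual constants):

    #include <stdint.h>

    #define BUNDLE_SIZE 32u   /* assumed instruction-bundle alignment */

    /* A verifier only accepts indirect jumps whose target register was
       masked like this immediately beforehand, so no jump can land
       in the middle of an instruction sequence. */
    static inline uintptr_t mask_jump_target(uintptr_t target) {
        return target & ~((uintptr_t)BUNDLE_SIZE - 1);
    }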


Isn’t the typical solution here to pin each VM to certain CPUs / cores?
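
(At the OS level, pinning boils down to setting a CPU affinity mask; a minimal Linux sketch with arbitrary core numbers, which is roughly what tools like taskset or libvirt's vcpupin do under the hood:)

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(0, &set);   /* allow core 0 */
        CPU_SET(1, &set);   /* allow core 1 */

        if (sched_setaffinity(0, sizeof set, &set) != 0) {  /* pid 0 = this process */
            perror("sched_setaffinity");
            return 1;
        }
        puts("now restricted to cores 0 and 1");
        return 0;
    }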


That's not very convenient. I want docker to be able to use all my cores when I build an image.


Even if we pretend Docker is a VM, building an image can happen on as many cores as you like in this hypothetical; it's the running of it that should be restricted.


Docker is not a VM. It uses the same kernel as the host.


Only on Linux. On other systems it's a VM.


Thanks, that's what I meant.


That depends on the docker runtime.



