The problem is, VMs aren't really "Virtual Machines" anymore. You're not parsing opcodes in a big switch statement; you're running instructions on the actual CPU, with a few hardware flags that the CPU says will guarantee no data or instruction overlap. It promises! But that's a hard promise to keep in reality.
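For reference, the "big switch statement" kind of VM looks roughly like this -- a toy sketch in C (the opcodes are made up), but it's the shape of every classic bytecode interpreter: every single guest instruction goes through software fetch and decode.

    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    int main(void) {
        int code[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
        int stack[16], sp = 0, pc = 0;

        for (;;) {
            switch (code[pc++]) {              /* fetch + decode, in software */
            case OP_PUSH:  stack[sp++] = code[pc++]; break;
            case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
            case OP_PRINT: printf("%d\n", stack[sp - 1]); break;
            case OP_HALT:  return 0;
            }
        }
    }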
Looking at IBM's tech from the sixties is somehow weirdly depressing: it's unbelievable how much of this architectural stuff they had already invented by 1970.
Not depressing, but inspiring. So many great architectural ideas can be made accessible to millions of consumers, not limited to a few thousand megacorps.
In the early days of virtualization on PCs (things like OS/2's DOS box), the VM was 100% a weird special case that wasn't even running in the same mode (virtual 8086 vs. 286/386 protected mode), and that second-class functionality continued through the early iterations of "modern" systems (VMware / KVM / Xen).
"PC" virtualization's getting closer to big iron virtualization, but likely will never quite get there.
Also -- I was running virtual machines on a 5150 PC back when it was a big, fast machine: the UCSD p-System ran a p-code virtual machine to execute p-code binaries, which would run equally well on an Apple II. In theory.
IMO, it’s only a special case for commercial support reasons. Almost every engineer, QE, consultant, solution architect I know runs or has run nested virtualization for one reason or another.
Just how many times per second is the average operating system workload (with or without a virtual machine also running a second average operating system workload) context switching?
Like... unless I'm wrong... the kernel is the main process, and then it slices up processes/threads, and each time those run, they have their own EAX/EBX/ECX/ESP/EBP/EIP/etc. (I know it's RAX, etc. for 64-bit now)
How many cycles is a thread/process given before it context switches to the next one? How is it managing all of the pushfd/popfd, etc. between them? Is this not how modern operating systems work, am I misunderstanding?
> How many cycles is a thread/process given before it context switches to the next one?
Depends on a lot of things. If it's a compute-heavy task and there are no I/O interrupts, the task gets one "timeslice"; timeslices vary, but typical values are somewhere in the neighborhood of 1 ms to 100 ms. If it's an I/O-heavy task, chances are the task returns from a syscall with new data to read (or because a write finished), does a little bit of work, then makes another syscall with I/O. Lots of context switches in network-heavy code (io_uring seems promising here).
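If you want actual numbers for your own system, Linux keeps per-task counters; here's a quick sketch that prints our own process's totals (Linux-only; the field names below are the ones that really appear in /proc/<pid>/status):

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        FILE *f = fopen("/proc/self/status", "r");
        char line[256];
        if (!f) return 1;
        while (fgets(line, sizeof line, f)) {
            /* voluntary = blocked on I/O etc.; nonvoluntary = preempted */
            if (strncmp(line, "voluntary_ctxt_switches", 23) == 0 ||
                strncmp(line, "nonvoluntary_ctxt_switches", 26) == 0)
                fputs(line, stdout);
        }
        fclose(f);
        return 0;
    }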
> How is it managing all of the pushfd/popfd, etc. between them?
The basic plan: when the kernel takes an interrupt (or gets a syscall, which is an interrupt on some systems and a different mechanism on others), the kernel (or the CPU) loads the kernel stack pointer for the current thread and pushes all the (relevant) CPU registers onto that stack. Then the kernel business is taken care of, and the scheduler decides which userspace thread to return to (which might or might not be the one that was interrupted). The destination thread's kernel stack is switched to, registers are popped, the thread's userspace stack is switched to, and userspace execution resumes.
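You can play with the same save-registers/switch-stacks/restore-registers dance from userspace. A minimal sketch using the POSIX ucontext API -- an analogy, not kernel code, but swapcontext() does exactly that: dump the current registers into one structure and load another:

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, task_ctx;

    static void task(void) {
        puts("task: running (registers restored from task_ctx)");
        /* "Yield": save our registers into task_ctx, load main_ctx. */
        swapcontext(&task_ctx, &main_ctx);
        puts("task: resumed exactly where it left off");
    }

    int main(void) {
        char stack[64 * 1024];   /* the task's private stack, like a kernel stack */

        getcontext(&task_ctx);
        task_ctx.uc_stack.ss_sp = stack;
        task_ctx.uc_stack.ss_size = sizeof stack;
        task_ctx.uc_link = &main_ctx;          /* where to go when task() returns */
        makecontext(&task_ctx, task, 0);

        puts("main: switching to task");
        swapcontext(&main_ctx, &task_ctx);     /* save main's state, run task */
        puts("main: back in main, switching to task again");
        swapcontext(&main_ctx, &task_ctx);     /* resume task after its yield */
        puts("main: done");
        return 0;
    }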
Why do comments like this make a bold claim and then wander off as if the claim speaks for itself? No explanation, no insight. I mean, why should we just take your word for it?
I'd like to be educated here: why wouldn't a big switch statement necessarily protect us from these CPU vulnerabilities? Anyone willing to help?
The question should rather be: why would it protect you? The switch statement also runs on a CPU, and that CPU is still vulnerable; it still speculatively executes the switch statement. No amount of software will make the hardware irrelevant.
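To make it concrete: the classic Spectre v1 gadget is ordinary C that any interpreter's bounds check could contain. A sketch (the array names follow the illustrative ones from the Spectre paper):

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    uint8_t array1[16];
    uint8_t array2[256 * 512];
    size_t array1_size = 16;

    uint8_t victim(size_t idx) {
        if (idx < array1_size)                  /* architecturally safe... */
            return array2[array1[idx] * 512];   /* ...but the CPU may speculate
                                                   past the check, load out of
                                                   bounds, and leave a cache
                                                   footprint an attacker times */
        return 0;
    }

    int main(void) {
        printf("%u\n", victim(0));  /* harmless call; the attack is in the timing */
        return 0;
    }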
Hence my choice of phrasing: 'wouldn't necessarily protect you'.
So, yes, the switch statement might be safe, but you would need to prove that your switch statement doesn't use those instructions. You don't get to claim that for free just because you are using a switch statement.
Conversely, even if you execute bare-metal instructions for the user of the VM, you could still deny those instructions to the user, e.g. by not allowing self-modifying code and statically verifying that the relevant code doesn't contain those instructions.
So the switch statement by itself does not do anything for your security.
Tangent: to deny those bare-metal instructions with static analysis, you might also have to flat-out deny certain byte sequences that, when jumped into "unaligned", would themselves form the forbidden instruction. That might break innocent programs, no?
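Concretely, here's the classic x86 example of that (bytes chosen for illustration; this compiles as-is since the bytes are just data here):

    unsigned char benign[] = { 0xB8, 0xCD, 0x80, 0x90, 0x90 };
    /* decoded at offset 0: B8 CD 80 90 90 = mov $0x909080CD, %eax  (harmless) */
    /* decoded at offset 1: CD 80          = int $0x80              (a syscall!) */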
Simple: don't allow unaligned jumps. Google's NaCl figured out how to do that ages ago. (E.g. you could only allow jumps after a bit-masking operation; the details depend on the architecture.)
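A sketch of the masking idea, assuming NaCl-style 32-byte bundles (the target address is made up; in the real sandbox the AND is emitted immediately before every indirect jmp/call, and the verifier rejects code that omits it):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uintptr_t target = 0x401337;                 /* hypothetical jump target */
        uintptr_t masked = target & ~(uintptr_t)31;  /* clear low 5 bits: forced
                                                        onto a 32-byte bundle, so
                                                        you can't land mid-sequence */
        printf("%#lx -> %#lx\n", (unsigned long)target, (unsigned long)masked);
        return 0;
    }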
But yes, unless you solve the halting problem, anything that bans all bad programs will also have false positives. It's the same with type systems in programming languages.
Even if we pretend Docker is a VM, building an image can happen on as many cores as you like in this hypothetical; it's the running of it that should be restricted.
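E.g. with stock Docker flags (image name hypothetical):

    docker build -t myimage .        # build: free to use every host core
    docker run --cpus=2 myimage      # run: capped at 2 CPUs' worth of time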