There's no way an "enterprise grade" cloud vendor like AWS would allow co-tenancy of containers (for ECS, Lambda etc) from different customers within a single VM - it's the reason Firecracker exists.
> There's no way an "enterprise grade" cloud vendor like AWS would allow co-tenancy of containers (for ECS, Lambda etc) from different customers within a single VM - it's the reason Firecracker exists.
I won't speak for AWS, but your assumption about what "enterprise grade" cloud vendors do is dead wrong. I know, because I'm working on maintaining one of these systems.
The "process sandbox" wars are over. Everybody lost, hypervisors won. That's it. It feels incredibly wasteful after all. Hypervisors don't share mm, scheduler, etc. It's a lot of wasted resources. Google came in with gvisor at the last minute to try to say "no, sandboxes aren't dead. Look at our approach with gvisor". They lost too and are now moving away from it.
Really? Has gvisor ever been popped? Has there ever even been a single high-profile compromise caused by a container escape? Shared hosting was a thing and considered "safe enough" for decades and that's all process isolation.
Can't help but feel the security concerns are overblown. To support my claim: Google IS using gvisor as part of their GKE Sandbox security.
I don't know what "popped" means here, but so far as I know there's never been a major incident caused by a flaw in gvisor. But gvisor is a much more intricate and carefully controlled system than standard Linux containers. Obviously, there have been tons of container escape compromises.
It doesn't look like they moved away from gVisor for security reasons.
“We were able to achieve these improvements because the second generation execution environment is based on a micro VM. This means that unlike the first generation execution environment, which uses gVisor, a container running in the second generation execution environment has access to a full Linux kernel.”
The reason you go with process isolation over VM isolation is performance. If you share a kernel, you share memory managers and pages, scheduler, limits, groups, etc. If you get better performance running VMs vs running processes, then what was even your isolation layer for?
But at the end of the day, there is a line in the sand around hypervisors vs proc/kernel isolation models. I challenge you to go to a financial or medical institution and tell their CTO "yeah, we have this super bulletproof shared-kernel, in-process isolation model".
The first question you'd get is "Why is this not just part of upstream linux?" Answer that question and realize why you should just use a hypervisor.
Obviously there might be many reasons for that, but as someone who worked on similar gvisor-style tech for another company, it's dead in the water. No security expert or consultant will ever sign off on a process isolation model, regardless of the architecture, audits, reviews, etc. There is just too much surface area for anyone to feel comfortable signing off on hostile multi-tenants with process isolation, whatever the sandboxing tech.
Not saying that there are no bugs in hypervisors, but the surface area is so, so much smaller.
The first sentence pretty much sums it up: "Cloud Run’s new execution environment provides increased CPU and network performance and lets you mount network file systems." It's not a secret that performance is slower under gvisor and there are compatibility issues: https://gvisor.dev/docs/architecture_guide/performance/
Disclaimer: I work on this product but wasn't involved in this decision.
gvisor isn't simply a process isolation model. Security experts will certainly sign off on gvisor for some multitenant workloads. The reason Google is moving from it, to the extent they are, is that hypervisors are more performant for more common workloads.
I read "we got tired of reimplementing Linux kernel syscalls and functionality" as the reason. Like network file systems. The Cloud Run client base kept asking for more and more features, and they punted to just running the Linux kernel.
I have seen zero evidence of this, but if it's true I would love to learn more. The real action is in side-channel vulnerabilities bypassing all manner of protections.
But this is because the workloads they execute changed, right? HTTP-only before, more general code today. I didn't see anything there that said gvisor was inferior, only that a new requirement was full kernel API access. For latency-sensitive, ephemeral, constrained workloads, gvisor/seccomp can make a lot of sense and, in the case of Google, handle multi-tenancy.
Now, if workloads become less ephemeral and more general purpose, tolerance for startup latency goes up and the probability of bespoke needs goes up, making a VM more palatable.
gVisor uses KVM or ptrace as its sandbox layer, and there are some indications that Google's internal fork uses an unpublished kernel mechanism, perhaps by extending seccomp (EDIT: it seems this has made its way to the outside world since I last looked; `systrap` is now the default: https://gvisor.dev/docs/architecture_guide/platforms/ ). It's a fake kernel in userspace, then sandboxed by seccomp.
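For anyone who hasn't looked at the mechanism, here's roughly what that last part means. This is a minimal sketch in plain C of a seccomp-BPF allowlist, not gVisor's actual filter (that one is generated from Go and is far more elaborate, including architecture checks): the userspace "kernel" installs a filter once, and from then on only the handful of syscalls on the allowlist can reach the host kernel.

```c
/* Minimal sketch of the seccomp-BPF mechanism a gVisor-style sandbox relies on.
 * NOT gVisor's real filter; just enough to show the idea: after installation,
 * only allowlisted syscalls reach the host kernel, everything else gets EPERM. */
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void) {
    struct sock_filter filter[] = {
        /* Load the syscall number. (A real filter also validates the arch.) */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* Allow write and exit_group; everything else returns EPERM. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_exit_group, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    };
    struct sock_fprog prog = {
        .len = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };

    /* NO_NEW_PRIVS is required so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) ||
        prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog))
        return 1;

    /* write() is on the allowlist and still works... */
    write(STDOUT_FILENO, "still alive\n", 12);
    /* ...but getpid() is not, so the raw syscall now fails with EPERM. */
    if (syscall(SYS_getpid) == -1 && errno == EPERM)
        write(STDOUT_FILENO, "getpid blocked\n", 15);
    return 0;
}
```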
Saying gVisor is "ultimately enforced by a normal kernel" is about as misleading, and about as accurate, as saying "KVM is enforced by a normal kernel" -- it is, but it's a very narrow boundary, not the usual syscall ABI.
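To make the "narrow boundary" point concrete, here's a minimal sketch in C (error handling mostly omitted, and it obviously needs access to /dev/kvm) of what a VMM actually asks of the host kernel: the whole interface is one device node plus a fixed set of ioctls, rather than the general-purpose syscall ABI a container uses.

```c
/* Rough illustration of how narrow the KVM boundary is: everything a VMM like
 * Firecracker or QEMU asks of the host kernel goes through /dev/kvm and a
 * fixed set of ioctls. */
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void) {
    int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
    if (kvm < 0) {
        perror("open /dev/kvm");
        return 1;
    }

    /* The API version has been stable at 12 for years. */
    printf("KVM API version: %d\n", ioctl(kvm, KVM_GET_API_VERSION, 0));

    /* One ioctl creates a VM, another creates a vCPU inside it; guest memory
     * is plain mmap'd pages handed over with KVM_SET_USER_MEMORY_REGION. */
    int vmfd = ioctl(kvm, KVM_CREATE_VM, 0);
    int vcpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0);
    printf("vm fd=%d, vcpu fd=%d\n", vmfd, vcpufd);

    /* From here a real VMM would mmap the kvm_run structure and loop on
     * ioctl(vcpufd, KVM_RUN, 0), handling the few exit reasons it cares about. */
    close(vcpufd);
    close(vmfd);
    close(kvm);
    return 0;
}
```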
I think Bryan Cantrill founded a company (Joyent? or Triton?) to do just that several years ago. It may have been based on Solaris/SmartOS zones, which are exactly that use case, with very secure/isolated containers.
Although it came with Linux binary compat (of unknown quality), I think the Solaris thing was just too off-putting for most customers, and the company did not do very well.
Triton is now being developed by MNX Solutions and seems to be doing quite well.
We run Triton and SmartOS in production, and the Linux compatibility via lx-zones works just fine. Only some Linux-locked software, which usually means Docker, needs to go inside a bhyve VM.