There's no way an "enterprise grade" cloud vendor like AWS would allow co-tenancy of containers (for ECS, Lambda etc) from different customers within a single VM - it's the reason Firecracker exists.
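
(For context, Firecracker's model is one minimal microVM per workload, driven over a per-VM unix API socket. A rough Go sketch of that flow, assuming a locally running firecracker process; the socket path, kernel image, sizes and boot args are made up, and this is obviously not how AWS itself wires it up:)

    package main

    import (
        "context"
        "fmt"
        "net"
        "net/http"
        "strings"
    )

    // Assumes a firecracker process was started with something like:
    //   firecracker --api-sock /tmp/firecracker.socket
    // The VMM only listens on that local unix socket, one socket per microVM.
    func putJSON(c *http.Client, path, body string) error {
        req, err := http.NewRequest(http.MethodPut, "http://localhost"+path, strings.NewReader(body))
        if err != nil {
            return err
        }
        req.Header.Set("Content-Type", "application/json")
        resp, err := c.Do(req)
        if err != nil {
            return err
        }
        defer resp.Body.Close()
        if resp.StatusCode >= 300 {
            return fmt.Errorf("%s: %s", path, resp.Status)
        }
        return nil
    }

    func main() {
        client := &http.Client{
            Transport: &http.Transport{
                DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
                    return net.Dial("unix", "/tmp/firecracker.socket")
                },
            },
        }
        // Size the microVM, point it at a guest kernel, then boot it.
        steps := []struct{ path, body string }{
            {"/machine-config", `{"vcpu_count": 1, "mem_size_mib": 256}`},
            {"/boot-source", `{"kernel_image_path": "vmlinux", "boot_args": "console=ttyS0 reboot=k panic=1"}`},
            {"/actions", `{"action_type": "InstanceStart"}`},
        }
        for _, s := range steps {
            if err := putJSON(client, s.path, s.body); err != nil {
                fmt.Println("firecracker API call failed:", err)
                return
            }
        }
        fmt.Println("microVM started")
    }

The point being that each tenant workload gets its own guest kernel and its own tiny VMM, so the shared boundary is the hypervisor interface rather than the full Linux syscall ABI.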



> There's no way an "enterprise grade" cloud vendor like AWS would allow co-tenancy of containers (for ECS, Lambda etc) from different customers within a single VM - it's the reason Firecracker exists.

I won't speak for AWS, but your assumption about what "enterprise grade" cloud vendors do is dead wrong. I know, because I'm working on maintaining one of these systems.


Lots of enterprise grade cloud vendors trust the Linux kernel boundary WAY too much...


“Enterprise grade” deserves scare quotes for those people of course!


I read it like "military grade": it errs on the side of over-provisioned/over-engineered and will not break in obvious ways.


Does Google? I know they use gvisor in production, which is ultimately enforced by a normal kernel (with a ton of sandboxing on top of it).


Google is moving away from gvisor as well.

The "process sandbox" wars are over. Everybody lost, hypervisors won. That's it. It feels incredibly wasteful after all. Hypervisors don't share mm, scheduler, etc. It's a lot of wasted resources. Google came in with gvisor at the last minute to try to say "no, sandboxes aren't dead. Look at our approach with gvisor". They lost too and are now moving away from it.


Really? Has gvisor ever been popped? Has there ever even been a single high-profile compromise caused by a container escape? Shared hosting was a thing and considered "safe enough" for decades and that's all process isolation.

Can't help but feel the security concerns are overblown. To support my claim: Google IS using gvisor as part of their GKE sandboxing security.


I don't know what "popped" means here, but so far as I know there's never been a major incident caused by a flaw in gvisor. But gvisor is a much more intricate and carefully controlled system than standard Linux containers. Obviously, there have been tons of container escape compromises.


Shared this link in another reply, but Google moved away from gvisor to a hypervisor for Cloud Run. It won't be long before they do the same for GKE as well:

https://cloud.google.com/blog/products/serverless/cloud-run-...


It doesn't look like they moved away from gVisor for security reasons. "We were able to achieve these improvements because the second generation execution environment is based on a micro VM. This means that unlike the first generation execution environment, which uses gVisor, a container running in the second generation execution environment has access to a full Linux kernel."


The reason you go with process isolation over VM isolation is performance. If you share a kernel, you share memory managers and pages, scheduler, limits, groups, etc. If you get better performance running VMs vs running processes, then what was even your isolation layer for?

But at the end of the day, there is a line in the sand between hypervisors and proc/kernel isolation models. I challenge you to go to a financial or medical institution and tell their CTO "yeah, we have this super bulletproof shared-kernel, in-process isolation model".

The first question you'd get is "Why is this not just part of upstream linux?" Answer that question and realize why you should just use a hypervisor.


Note that GKE Sandbox allows GKE users to sandbox the workloads running on GKE nodes. The GKE nodes themselves are still GCE VMs.
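
(For anyone who hasn't used it: GKE Sandbox is opt-in per workload through a RuntimeClass. A minimal client-go sketch, assuming a cluster that exposes the "gvisor" RuntimeClass the way GKE Sandbox does; the kubeconfig path, pod name and image are made up:)

    package main

    import (
        "context"
        "fmt"

        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        // Kubeconfig path is illustrative; on GKE you'd normally get this from
        // `gcloud container clusters get-credentials`.
        cfg, err := clientcmd.BuildConfigFromFlags("", "/home/me/.kube/config")
        if err != nil {
            panic(err)
        }
        clientset, err := kubernetes.NewForConfig(cfg)
        if err != nil {
            panic(err)
        }

        // Opting one pod into the sandbox: the only difference from a plain pod
        // is runtimeClassName, which selects the gVisor runtime on the node.
        runtimeClass := "gvisor"
        pod := &corev1.Pod{
            ObjectMeta: metav1.ObjectMeta{Name: "sandboxed-demo"},
            Spec: corev1.PodSpec{
                RuntimeClassName: &runtimeClass,
                Containers: []corev1.Container{
                    {Name: "app", Image: "nginx:1.25"},
                },
            },
        }
        created, err := clientset.CoreV1().Pods("default").Create(context.TODO(), pod, metav1.CreateOptions{})
        if err != nil {
            panic(err)
        }
        fmt.Println("created sandboxed pod:", created.Name)
    }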


Citation needed. gvisor seems to be under active development and just added support for the systrap platform, deprecating ptrace: https://gvisor.dev/blog/2023/04/28/systrap-release/


Cloud Run has abandoned gvisor in its "second generation" execution environment for containers:

https://cloud.google.com/blog/products/serverless/cloud-run-...

Obviously there might be many reasons for that, but as someone who worked on similar gvisor-style tech at another company, it's dead in the water. No security expert or consultant will ever sign off on a process isolation model, regardless of architecture, audits, reviews, etc. There is just too much surface area for anyone to feel comfortable signing off on hostile multi-tenants with process isolation, whatever the sandboxing tech.

Not saying that there are no bugs in hypervisors, but the surface area is so so much smaller.


The first sentence pretty much sums it up: "Cloud Run’s new execution environment provides increased CPU and network performance and lets you mount network file systems." It's not a secret that performance is slower under gvisor and there are compatibility issues: https://gvisor.dev/docs/architecture_guide/performance/

Disclaimer: I work on this product but wasn't involved in this decision.


gvisor isn't simply a process isolation model. Security experts will certainly sign off on gvisor for some multitenant workloads. The reason Google is moving from it, to the extent they are, is that hypervisors are more performant for more common workloads.


I read "we got tired of reimplementing Linux kernel syscalls and functionality" as the reason. Like network file systems. The Cloud Run client base kept asking for more and more features, and they punted to just running the Linux kernel.


> Google is moving away from gvisor as well.

I've been wondering about this - are they really?


I have seen zero evidence of this; but if it's true I would love to learn more. The real action is in side channel vulnerabilities bypassing all manner of protections.



But this is because the workloads they execute changed, right? HTTP-only before, more general code today. I didn't see anything there that said gvisor was inferior, only that a new requirement was full kernel API access. For latency-sensitive, ephemeral, and constrained workloads gvisor/seccomp can make a lot of sense, and in Google's case it handles multi-tenancy.

Now if workloads become less ephemeral and more general-purpose, tolerance for startup latency goes up, and the probability of bespoke needs goes up, making a VM more palatable.


In a way, it feels like a sweet revenge for microkernels.


Tbf gvisor was pretty much DOA by design. Hypervisors are alright, but nowadays security expectations go much lower than ring-1.


Can you expand on this? What do you mean "security expectations go lower than ring-1", and how does that relate to gvisor?


For example what Microsoft is doing at firmware level for Azure.


What is Microsoft doing at the firmware level for azure?



gVisor uses KVM or ptrace as its sandbox layer, and there are some indications that Google's internal fork uses an unpublished kernel mechanism, perhaps by extending seccomp (EDIT: it seems this has made its way to the outside world since I last looked; `systrap` is now the default: https://gvisor.dev/docs/architecture_guide/platforms/ ). It's a fake kernel in userspace, itself sandboxed by seccomp.

Saying gVisor is "ultimately enforced by a normal kernel" is about as misleading & accurate as "KVM is enforced by a normal kernel" -- it is, but it's a very narrow boundary, not the usual syscall ABI.
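
If you want to poke at the platform layer yourself, runsc exposes the choice directly. A small sketch that just shells out to runsc's "do" subcommand; the flags come from the docs linked above, and the command itself is arbitrary:

    package main

    import (
        "fmt"
        "os/exec"
    )

    func main() {
        // Run a one-off command inside a gVisor sandbox, explicitly picking the
        // systrap platform mentioned above (kvm and ptrace are the other options).
        // Some setups also want flags like --rootless and/or --network=none.
        cmd := exec.Command("runsc", "--platform=systrap", "do", "uname", "-a")
        out, err := cmd.CombinedOutput()
        if err != nil {
            fmt.Println("runsc failed:", err)
        }
        fmt.Printf("%s", out)
    }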


I think Bryan Cantrill was at a company (Joyent, with the Triton product) doing just that several years ago. It may have been based on Solaris/SmartOS zones, which is that exact use case with very secure/isolated containers.


Although it came with Linux binary compat (of unknown quality), I think the Solaris thing was just too off-putting for most customers, and the company did not do very well.


Triton is now being developed by MNX Solutions and seems to be doing quite well.

We run Triton and SmartOS in production, and the Linux compatibility works just fine via lx zones. Only some Linux-locked software, which usually means Docker, needs to go inside a bhyve VM.


Aren't Cloudflare Workers multitenant? Although, if you want to be cynical, that could be a reason they aren't 'enterprise grade™'.


They are using v8 isolates, which is maybe easier to do in a sound way than the whole broad space of containers. Previous discussion: https://news.ycombinator.com/item?id=31740885


> There's no way an "enterprise grade" cloud vendor like AWS would allow co-tenancy of containers (...)

I don't think your beliefs are well-founded. AWS's EC2 by default only supports shared tenancy, and dedicated instances are a premium service.


I take them to mean shared kernels.


But the parent specifically called out co-tenancy of /containers/. EC2 instances are not containers.


Both Lambda Firecracker VMs and t2 instances are multi-tenant and oversubscribed.


I take them to mean "multiple tenants sharing a kernel"; I think everyone understands that AWS and GCP have multitenant hypervisor hosts.



