The more concerning security finding here is that Google sat on this for 9 months. Assuming the claims hold, this is a serious problem for any security-conscious GCP customers. What other vulnerabilities are they sitting on? Do they have processes in place to promptly handle new ones? Doesn’t look like it…
This is especially questionable given the much shorter deadline that Project Zero gives other companies to fix bugs before publishing their vulnerabilities (regardless of whether there's been a fix). It only seems fair that Google should hold itself to the same standard.
Companies who use that response are even worse because they know very well there is no winning move for the researcher. The company has all the responsibility no matter what.
Both are Google - from an outside view we shouldn't distinguish. Google should hold itself to a consistent bar.
It highlights how divisions operate in silos at Google, and just because Project Zero causes a lot of positive security marketing for Google, it doesn't seem that the quality bar is consistently high across the company.
Also, please don't forget this is still not fixed.
Funny thing is I agree with you that Google should hold itself to that bar, but I don't agree as to Project Zero being the reason. I think we very much should distinguish Google from P0, and that P0's policy should be irrelevant here; their entire purpose is to be an independent team of security researchers finding vulnerabilities in software, indiscriminately. It seems a number of others here feel similarly (judging by the responses), and ironically their support for the position is probably being lost by dragging P0 into the conversation.
The reason I think Google should hold itself to that bar is something else: Google itself claims to use that bar. From the horse's mouth [1]:
> This is why Google adheres to a 90-day disclosure deadline. We notify vendors of vulnerabilities immediately, with details shared in public with the defensive community after 90 days, or sooner if the vendor releases a fix.
If they're going to do this to others as general company policy, they need to do this to themselves.
Are you suggesting Google should make all unfixed vulnerabilities public after 90 days? Would that be even if the finder does not want them to become public? Or just as an opt-out type of thing?
I'm only suggesting Google needs to fix everything in 90 days (and reveal it afterward, as they consider that standard practice) so they don't have unfixed vulnerabilities past that. I don't really have opinions on what policies they should have for cases where that isn't followed, though I think even having a policy for that case encourages it not to be followed to begin with.
Vulnerability deadlines are disclosure deadlines, not remediation deadlines. There's plenty of vulnerabilities that can't be fixed in that time, and I think it's fair for the public to know about them rather than keeping them secret forever.
"Fair to the public" was neither intended to be nor is the concern. Their stance has always been "better for security" and disclosing an unpatched vulnerability is generally worse for security unless you believe it'll encourage people to fix things by that deadline.
In this case, knowing about this vulnerability allows you to take corrective action. If Google cannot fix the root cause, that doesn't necessarily mean there aren't mitigations that can be applied manually by an end user (yes, it sucks, but still better than getting hacked).
When users can mitigate it I agree with you (I forgot about that case in the second half of my comment), but there have also been cases when users weren't able to do anything but they disclosed anyway, so that doesn't explain the policy.
Insecurity is invisible. Users have no way to know the weaknesses in the software they use until it's too late. Disclosure is meant to make it possible for users to see what weaknesses they might have so they can make informed decisions.
Users still benefit from knowing about issues that can't be fixed (think about Rowhammer, Spectre and similar), so as these attacks become more practical (e.g. https://leaky.page or Half-Double) they can adjust their choices accordingly (switching browsers, devices, etc.) if the risk imposed by them is too high.
Of course (using an analogy for a second), some can say that it would be better for people to never find out that they are at increased risk of some incurable disease, because they can't do anything about it.
But for software, you can't make individual decisions like that. Even if one person doesn't want to know about vulnerabilities in the software they use, others could still benefit from knowing about them, and the benefit of the many trumps the preferences of the few.
That is, unless the argument is that it's actively damaging for all of the public (or the majority) to know about vulnerabilities in the software they use. If the point is to advocate for complete unlimited secrecy, and for researchers to sit on unfixed bugs forever, then that's quite an extreme view of software security and vulnerability disclosure (but that some companies unfortunately still follow).
Disclosure policies like these aim to strike a balance between secrecy and public awareness. They put the onus of disclosure on the finder because it's their finding (and they are the deciders on how it's shared), and finders are more independent than the vendor, but I could imagine a world in which disclosure happens by default, by the company, even for unfixed bugs.
What is the thing being implied? Like as far as I can tell, Google's position seems to be that "it is best if vuln researchers have the freedom to disclose unfixed issues, especially after reporting them".
People criticize P0 for publishing issues despite companies asking for extensions. But we're criticizing Google here for...what? They didn't ask for an extension, they didn't try to prevent this person from disclosing. Where is the hypocritical thing?
The complaint is that Google's stance with Project Zero is "90 days is plenty sufficient; you're a bad vendor if you can't adhere to it", and then Google itself doesn't adhere to it, which implicates themselves here.
I see what they're saying if you lump them together; I just think it makes sense to treat P0 a little independently from Google. But otherwise it's got a point.
That's a common sentiment I just don't buy. People here love to hand-wave about some vague "benefit to the public", and maybe there is some benefit when the vulnerability can be mitigated on the user side, but it literally cannot be the case for the fraction of vulnerabilities that entities other than the vendor can do nothing about. The only "benefit" is that it satisfies people's curiosity, which is a terrible way to do security. Yet P0 applies that policy indiscriminately.
> Can you point out the second part, specifically where "you're a bad vendor if..." is either stated or implied by P0?
As to your question of when this is implied by P0: to me, their actions and the lack of a compelling rationale for their behavior, which I explained above, are already plenty to imply it. But if you won't believe something unless it's in an actual quote from themselves, I guess here's something you can refer to [1]:
- "We were concerned that patches were taking a long time to be developed and released to users"
- "We used this model of disclosure for over a decade, and the results weren't particularly compelling. Many fixes took over six months to be released, while some of our vulnerability reports went unfixed entirely!"
- "We were optimistic that vendors could do better, but we weren't seeing the improvements to internal triage, patch development, testing, and release processes that we knew would provide the most benefit to users."
- "If most bugs are fixed in a reasonable timeframe (i.e. less than 90 days), [...]"
All the "reasonable time frame (i.e. < 90 days)", "your users aren't getting what they need", "your results aren't compelling", "you can do better", etc. are basically semi-diplomatic ways of saying you're a bad vendor when you're not meeting their "reasonable" 90-day timeline.
They literally directly describe it as a benefit to users, the sentiment you don't buy, and don't ever actually call vendors bad, except if you interpret the less benefit to users to be a moral impugnment of the vendors.
> They literally directly describe it as a benefit to users
"It" in that sentence does not refer to their own unpatched disclosures.
> They don't ever actually call vendors bad, except if you interpret the less benefit to users to be a moral impugnment of the vendors.
What you cite proves my point!
They didn't fix it within that timeline. I don't know why everyone is saying "well they didn't stop disclosure in 90 days", but they didn't fix it in the timeline that they have allocated as being reasonable for all vulns they report.
At the limit, what you're saying would mean that vendors should feel obligated to fix issues they don't consider to be vulnerabilities, as long as they're reported as such. That'd clearly be absurd. Is there maybe some additional qualifying factor that's required to trigger this obligation that you've left implicit?
If you're leaving the determination to the vendor, they could just avoid the deadline by claiming it is not a vulnerability. That seems like a bad incentive.
There are things that literally cannot be fixed, or where the risk of the fix is higher than the risk of leaving the vulnerability open. (Even if it is publicly disclosed!)
It seems that we're all better off when these two concerns are not artificially coupled. A company can both admit that something is a vulnerability, and not fix it, if that's the right tradeoff. They're of course paying the PR cost of being seen as having unfixed security bugs, and an even bigger PR cost if the issue ends up being exploited and causes damage. But that's just part of the tradeoff computation.
I don't know what point you're trying to make here. Google acknowledges that this is a vulnerability ("nice catch"), Google pushes every other company to fix vulns in 90 days (or have it publicly disclosed, which is based on the assumption that vulns can be fixed in that time), and Google did not fix it in 90 days.
If you're asking me to create a perfect framework for disclosure, I'm not interested in doing that, and it's completely unnecessary to make a judgment of this single scenario.
> A company can both admit that something is a vulnerability, and not fix it, if that's the right tradeoff.
Google's 90-day policy is designed explicitly to give companies ample time to patch. And yes, this is them paying the PR cost - I am judging them negatively in this discussion because I agree with their 90-day policy.
I am saying that there are things that are technically vulnerabilities that are not worth fixing. Either they are too risky or expensive to fix, or too impractical to exploit, or too limited in damage to actually worry about. Given the line you drew was that there must be a fix in 90 days, if the company agrees it is a vulnerability, the logical conclusion is that the companies would end up claiming "not a vulnerability" when they mean WONTFIX.
If you think this particular issue should have been fixed within a given timeline, it should be on the merits of the issue itself. Not just by following a "everything must be fixed in 90 days" dogma. All that the repeated invocations of PZ have achieved is drown out any discussion on the report itself. How serious/exploitable is it actually, how would it be mitigated/fixed, what might have blocked that being done, etc. Seems like those would have been far more interesting discussions than a silly game of gotcha.
(If you believe there is no such thing as a vulnerability that cannot be fixed, or that's not worth fixing, then I don't know that we'll find common ground.)
> Given the line you drew was that there must be a fix in 90 days, if the company agrees it is a vulnerability, the logical conclusion is that the companies would end up claiming "not a vulnerability" when they mean WONTFIX.
OK, but that doesn't apply here, which is why I don't get why you're bringing up general policy issues in this specific instance. Google did acknowledge the vulnerability, as noted in the disclosure notes in the repo.
So like, let me just clearly list out some facts:
* Project 0 feels that 90 days is a good timeline for the vast majority of vulns to be patched (this is consistent with their data, and appears accurate)
* This issue was acknowledged by Google, though perhaps not explicitly as a vulnerability, all that I can see is that they ack'd it with "Good catch" - I take this as an ack of vulnerability
* This issue is now 3x the 90 day window that P0 considers to be sufficient in the vast majority of cases to fix vulnerabilities
I don't see why other information is supposed to be relevant. Yes, vendors in some hypothetical situation may feel the incentive to say "WONTFIX" - that has nothing to do with this scenario and has no bearing on the facts.
> If you think this particular issue should have been fixed within a given timeline, it should be on the merits of the issue itself.
That's not P0s opinion in the vast majority of cases - only in extreme cases, to my knowledge, do they break from their 90 day disclosure policy.
> Not just by following a "everything must be fixed in 90 days" dogma.
Dogma here is quite helpful. I see no reason to break from it in this instance.
> Seems like those would have been far more interesting discussions than a silly game of gotcha.
I'm not saying "gotcha", I'm saying that:
a) 9 months to fix this feels very high, Google should explain why it took so long to restore confidence
b) The fact that they have an internal culture of 90 days being a good time frame for patching merely makes it ironic - it is primarily the fact that I think this should have been patched much more quickly that would bother me as a customer.
> (If you believe there is no such thing as a vulnerability that cannot be fixed, or that's not worth fixing, then I don't know that we'll find common ground.)
Nope, 100% there are vulns that can't be fixed, vulns that aren't worth fixing, etc. But again, Google didn't say this was a "WONTFIX" though, and they did ack that this is a vuln. If it wasn't possible to fix it they could say so, but that isn't what they said at all, they just said they weren't prioritizing it.
If it's the case that this simply isn't patchable, they should say so. If they think this doesn't matter, why not say so? It certainly seems patchable.
It's not what happened, but the logical outcome of what you propose. Right now the rules are simple: "disclosure in 90 days, up to you whether to fix it". What you're proposing is that it is no longer up to the company to make that tradeoff. They must always fix it.
> That's not P0s opinion in the vast majority of cases - only in extreme cases, to my knowledge, do they break from their 90 day disclosure policy.
Again, that is a disclosure timeline. Not a demand for a fix in that timeline. In general it's in the vendor's best interest to release a fix in that timeline, especially given its immutability. You're trying to convert it to a demand for a fix no matter what. That is not productive.
> a) 9 months to fix this feels very high, Google should explain why it took so long to restore confidence
So why not argue for that explicitly? It seems like a much stronger approach than the "lol PZ hypocrisy" option.
You're trying to talk about consequences of my statement, which I'm trying very hard not to talk about, because I don't care. I'm only talking about this very specific instance.
> Again, that is a disclosure timeline. Not a demand for a fix in that timeline.
Yes and it is based on the expectation of a fix within that timeline being practical.
> You're trying to convert it to a demand for a fix no matter what. That is not productive.
No I'm not, you're trying to say that I am, repeatedly, and I keep telling you I don't care about discussing disclosure policy broadly. I'm only talking about this one instance.
> It seems like a much stronger approach than the "lol PZ hypocrisy" option.
Take that up with the person who posted about P0 initially. I'm only saying that it's ironic and that I support the 90 day window as being a very reasonable time to fix things, and that them going 3x over is a bad look.
> Again, that is a disclosure timeline. Not a demand for a fix in that timeline. In general it's in the vendor's best interest to release a fix in that timeline, especially given its immutability. You're trying to convert it to a demand for a fix no matter what.
I don't see what form it would come in if it were a demand, in your view. We have a disagreement between private entities over a vulnerability; how would one "force" the other to do that except by disclosing it? Hold someone hostage?
> Google pushes every other company to fix vulns in 90 days (or have it publicly disclosed)
I believe you're mistaken about the conditional publishing. The 90-day clock starts when Google reports the bug - they will make it public whether or not the vulnerability is remediated (with very few exceptions). By all appearances, Google is very willing to be on the receiving end of that, on the basis that end users can protect themselves once they have the knowledge - in this case, GCE users are now aware that their servers are exploitable and can make changes - like moving to AWS. I think the 90-day clock is a reasonable stance to take, for the public (but not necessarily for the vendor).
http://g.co/appsecurity has more details, but the TL;DR is that Google is supportive of people disclosing unfixed bugs after 90 days, which is what happened here.
While there are a series of vulnerabilities here, none of them would be exploitable in this way if the metadata server was accessed via an IP instead of the hostname metadata.google.internal.
The metadata server is documented to be at 169.254.169.254, always[1]. But Google software (agents and libraries on VMs) resolves it by looking up metadata.google.internal. If metadata.google.internal isn't in /etc/hosts, as can be the case in containers, this can result in actual DNS lookups over the network to get an address that should already be known.
AWS uses the same address for their metadata server, but accesses it via the IP address and not some hostname[2].
I've seen Google managed DNS servers (in GKE clusters) fall over under the load of Google libraries querying for the metadata address[3]. I'm guessing Google wants to maintain some flexibility, which is why they are using a hostname, but there are tradeoffs.
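For what it's worth, a minimal sketch of the kind of pinning that avoids the DNS fallback (assuming an image where you control /etc/hosts; the exact line stock GCE images ship may differ):

    # Pin the documented metadata IP so agents and libraries never fall back
    # to a network DNS lookup for metadata.google.internal.
    grep -q 'metadata\.google\.internal' /etc/hosts || \
      echo '169.254.169.254 metadata.google.internal metadata' | sudo tee -a /etc/hosts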
It does not even take a lot. I run a production service on Cloud Run; the typical load is around 500 qps, and the DNS queries to resolve the metadata server fail frequently enough to be noticeable.
That is true, I was thinking specifically about the metadata and SSH keys. But DHCP can also set DNS servers, NTP servers, and other things that can either cause disruptions or be used to facilitate a different attack.
There might be a persistence issue: it seems like part of this attack was that the IP was persisted to /etc/hosts even after the real DHCP server took over again. But even just writing to /etc/hosts could open the door to redirecting traffic to an attacker-controlled server.
While it certainly seems like a fairly serious vulnerability I think it's worth highlighting that this attack requires that either you already have access to a machine on the same subnet as the target machine or that the firewall in front of the target machine is very lax. That's a pretty high bar for getting the attack to work in the wild.
-1: consider if you have complex vendor software on a GCE VM inside a larger GCP project... now a vulnerability in that vendor software means the whole GCP project is exposed. Vendor fixes are notoriously slow, so in practice you have to isolate vendor software to separate GCP projects.
Real example: I have a client with precisely this situation, and elsewhere in the GCP project is PII consumer data requiring public disclosure if exposed.
Luckily, GCE limits the Cloud API access available to an instance to nothing by default. Meaning that access to BigQuery, Storage, SQL, user management, etc. is not allowed on that VM, even with root access to the VM, unless configured by the administrator.
This at least mitigates the impact of this exploit to a degree. If a web server has access only to Cloud SQL, an attacker cannot use access to that VM to go digging around in Cloud Storage buckets unless GCS access is granted explicitly to the VM.
From there, IAM security applies. So even if the VM has Cloud API access to, say, BigQuery, the limitations of that service account then apply to any data that is accessed.
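Roughly what that looks like in practice (the instance name, service account, and scope below are placeholders, not from the report):

    # Create a VM whose access token is limited to Cloud SQL only; IAM on the
    # attached service account further restricts what it can actually touch.
    gcloud compute instances create web-1 \
      --service-account=web-sa@my-project.iam.gserviceaccount.com \
      --scopes=https://www.googleapis.com/auth/sqlservice.admin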
Well, so GCP is pretty big on serverless architecture. It is completely possible that there is no GCE instance at all which has Cloud API access to a particular service.
A company might be making heavy use of BigQuery in their project, but have a data processing pipeline that uses tools like Cloud Functions, Scheduled Queries, and BQ Transfer Service to push sanitized data into a Cloud SQL instance for the front end to use.
So no GCE instance will need Cloud API access to BigQuery, so no matter what level of access is obtained by an intruder on any VM, they will never be able to access BigQuery.
I completely agree and didn't mean to suggest it wasn't a serious vulnerability: as I understand it, this attack means that if any VM in the subnet is compromised, that can be leveraged into an attack on any other VM in the subnet, so your attack surface has suddenly gotten a lot bigger and your isolation between VMs substantially poorer.
Most of my GCP clients shove PII into a GCP Service, like BQ. It's not put on a "server" per se, so firewall rules don't really apply here. The appropriate thing to assert is that necessary IAM permissions should be granted explicitly.
This is usually the case, as most of my clients use isolated GCP projects for housing PII data. This forces IAM permissions to be granted to service accounts, which, hopefully, means that administrators are cognizant of the levels of access that they are granting to service accounts.
Not a guarantee, mind you, but hopefully some red flags would be raised if someone requested PII-level access for a service account associated with a public-facing web server.
IIUC this is insufficient (!) - even with a firewall between them, a VM is now vulnerable to attack from another VM on the subnet (in the same GCP project).
I think the real learning here is not to colocate different things in a single GCP project. AFAIK projects don't cost anything so why not create one per service?
The way you're wording this suggests that this was a sensible design prior to this vulnerability, but in fact all sorts of tools, config, etc work within a project but not across projects, including IAM. Yes obviously anything can be duplicated but it's a big pain.
Probably easier to create separate subnets for VMs that don't trust each other?
IAM works across projects easily. The "Organization" concept in Google Cloud is used to collect projects, and manage permissions of groups (or subfolders) of projects very easily.
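For example, granting a service account from one project a role in another is a single binding (project and account names here are placeholders):

    # Let a service account living in project-a read BigQuery data in project-b.
    gcloud projects add-iam-policy-binding project-b \
      --member="serviceAccount:pipeline@project-a.iam.gserviceaccount.com" \
      --role="roles/bigquery.dataViewer"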
Note that "have access to a machine on the same subnet" really just means "can send traffic to the VM local network". In other words, partial compromises (ex: docker container, VM, chroot) that have access to this network are enough, as well as attacks that let you send traffic (ex: TURN server abuse)
This is well worth reading. It describes how, through a series of well meaning steps, you shoot yourself in the face.
It all starts with:
"Note that the last 4 bytes (0a:80:00:02) of the MAC address (42:01:0a:80:00:02) are actually the same as the internal IP address of the box (10.128.0.2). This means, 1 of the 3 components is effectively public."
As someone who knows nothing about networking, can you please explain why they set up the config like this? Does that mean the third byte from the top must be 0a?
BTW I just checked my corporate intranet setup and MAC has nothing to do with the ip address.
Quite. MAC addresses don't need to line up with IP addresses or vice versa. They are completely different things. However in IPv6 there is a standard for link local addresses that does have a correlation between IP and MAC but there you go. It's designed to avoid clashes and is not the finest design decision ever made! That last will probably creep into a CVE one day just like this chain of events.
The config as designed probably looked like a good idea at the time. When you are worrying about millions of things in an address space like IPv4 and MAC, then having them tie up in some way may be useful for lookups in a very large database or two.
However, giving information away about something that was never designed from the outset to do so is not a good idea.
If you follow the chain of reasoning in the github post you can see that you can break the chain at multiple points by not "being clever". If you start by not making your IP addresses follow the MAC address you kill this problem off before it starts.
At the risk of becoming a real bore, I'll spell out why I think this is a really, really dumb thing:
If you have anything to do with fire fighting (and we all do in a way) you soon learn that there are three things required for a fire:
* Something to burn
* Air (strictly speaking: Oxygen)
* Something to start the fire (source of ignition)
Fire prevention is based around avoiding having all three things together at any point in time. So, putting out a fire often involves removing oxygen or removing the burning thing. Prevention can be as simple as making a rule at home that no one puts dish cloths on the hob to dry. That last phrase may need some translating!
So, you put your webby app or VM out on the internets for all to see and play with. Unlike your home front door, every Tom, Dick and Harry in the world can go and take a playful kick at it. So you need to take some care with it. There is no simple set of three factors that you can protect against as there is for fire prevention. Instead you need to follow some good practice and hope for the best.
One good (not best - there is no such thing) practice is to avoid giving away information unnecessarily. Linking IP to MAC is such a thing. Do it at home if you must but don't do it in public.
The automatic host numbering feature in the IPv6 standard (modified EUI-64, RFC 4291) was a big mistake. But I thought that worked the other way around: that the MAC was part of the IP, not the IP part of the MAC.
"that the MAC was part of the IP, not the IP part of the MAC."
The IPv6 link local address is derived from the MAC address. I can't be arsed to look up the current RFCs so let's take a look at my laptop, that I'm using now (yes, I have changed a few digits but only for global addresses):
2: wlp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether e4:70:b8:f1:6b:5c brd ff:ff:ff:ff:ff:ff
inet 10.200.201.164/24 brd 10.200.200.255 scope global dynamic noprefixroute wlp2s0
valid_lft 73049sec preferred_lft 73049sec
inet6 2001:3d49:ad52:ddc8:6ba8:e800:8e96:143b/64 scope global temporary dynamic
valid_lft 86387sec preferred_lft 14387sec
inet6 2001:3d49:ad52:ddc8:5203:e5fe:5ed0:c173/64 scope global dynamic mngtmpaddr noprefixroute
valid_lft 86387sec preferred_lft 14387sec
inet6 fe80::506d:9c2f:8b7b:1d7e/64 scope link noprefixroute
valid_lft forever preferred_lft forever
My link local address is fe80::506d:9c2f:8b7b:1d7e and my MAC address for that interface is e4:70:b8:f1:6b:5c
<sound-effect>scratched-record</sound-effect>
It seems that times have changed. This is a laptop that uses Network Manager on Arch Linux. If I had to guess I suspect that the bloke who does NM has fixed that flaw or he's following a newer standard/RFC than I've (bothered to have) heard of.
All this stuff points out a strange dichotomy: you want to be seen ("Hello look at my website") and yet you don't want to be seen by the baddies. You want to draw attention to your wares but not have a bunch of state sponsored folk poking at your unmentionables and nicking your cash.
As of fairly recently, we are seeing quite a lot of remediation (for want of a better word) by additional state sponsored actors than is considered normal. These are not the usual lot who piss on your prized Begonias. This lot seem to know when to widdle effectively.
Yeah, automatic host discovery puts the MAC address (almost) verbatim at the end of an interface IP, but it's an optional feature.
In the vuln report, it turns out automatic host discovery is used after all, but the IPv4 is also based on this, which confused me.
The MAC address was:
42:01:0a:80:00:02
The IPv6 address was:
fe80::4001:aff:fe80:2
It's not just the last four bytes, it's the whole MAC address, but 0xFFFE is crammed into the middle of it, and the second-to-last bit of the first byte is flipped. We can see that this is exactly the scheme that was used. Now the confusing part is that the IPv4 address is apparently also based on the MAC address, which means both the IPv4 and MAC addresses can be derived from the IPv6 address, and the IPv4 address can be derived from the MAC address.
(The entire host section of the IPv6 address is based on the MAC iirc, so the network section is the only part you need to guess there.)
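If it helps, here is the modified EUI-64 construction spelled out with the report's own numbers - just a bash sketch of the bit-twiddling, nothing GCE-specific:

    mac="42:01:0a:80:00:02"                    # MAC from the report
    IFS=: read -r b1 b2 b3 b4 b5 b6 <<< "$mac"
    b1=$(printf '%02x' $(( 0x$b1 ^ 0x02 )))    # flip the universal/local bit
    printf 'fe80::%x:%xff:fe%x:%x\n' 0x$b1$b2 0x$b3 0x$b4 0x$b5$b6
    # prints fe80::4001:aff:fe80:2, matching the observed link-local address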
Thanks! I don't understand all the technical details, but (combining multiple responses) I think I understand a bit now. I also remember that back when I was at university we used a sort of university intranet (every personal computer had to go through it to reach the Internet, but it was free), and there were a lot of ARP attacks back then, which is how I learned to use arp -a and arp -v (maybe I'm misremembering the flags).
This superficial dismissal doesn't even make sense. Google didn't control the disclosure timeline here. The people who found these vulnerabilities could have published on T+90, or, for that matter, T+1. Meanwhile, the norm that does exist (it's a soft norm) is that you respect the disclosure preferences of the person who reports the vulnerability to the extent you reasonably can.
I'm not sure I even understand the impulse behind writing a comment like this. Assume Google simply refuses to apply the P0 disclosure rule to themselves (this isn't the case, but just stipulate). Do you want them to stop funding Project Zero? Do you wish you knew less about high profile vulnerabilities?
I think you're mixing up the projects a bit. This vulnerability doesn't seem to have gone via Project Zero. Other Google teams are known to sometimes not react in time, and when reports do go through Project Zero they are disclosed at 90 days, regardless of whether the project is internal or not. (I remember an unpatched Chrome vulnerability was published by Project Zero at the end of the deadline.)
So the Zero timeline applies to everyone the same. It doesn't mean fixes actually happen on the same timeline.
Their point is, Project Zero is a very public Google project to hold other companies (and yes, themselves too) accountable with disclosure policies Google (as a company) presumably stands behind. Thus it is quite ironic for an issue to not be fixed in a year.
Yes, sometimes other Google teams miss the deadline set by Project Zero too. That's not the point.
> The researcher here allowed 9 months for them to fix.
The researcher is basically allowed to do whatever they want here. They can wait 0 days and just post to the full-disclosure mailing list. Or they could never disclose it.
Personally, I've done both. I took DJB's "Unix Security Holes" class, where we had to find 10 vulnerabilities in OSS as the class's final project. All of those got 0-day disclosed because that is how DJB rolls. I've also independently found bugs and was satisfied with the resolution after emailing that company's security@ address.
He could have if he wanted to. He could have disclosed immediately if he wanted to.
Google does not have a hard policy to disclose. GPZ does. Vulns in external products found through other groups within Google do not share all the same processes as GPZ.
GPZ is not some independent entity Google just funds; they are as much part of Google as any other team.
If you want to be that precise, it is a bad look for part of your organization to have a hard policy that you expect external companies to follow, while other parts of your organization cannot meet it themselves.
I am not saying Project Zero is wrong. Clearly giving more time did not prod Google into actually fixing this promptly; the researcher was certainly being too polite and gave too much time. I don't know why, perhaps because companies don't pay bounties if you disclose without their consent [2]?
All I am saying is that Google as a company should hold itself to the same hard standard and fix issues in 90 days. This is what Google Project Zero as a team expects other companies [1] to do; they will even reject requests for extensions.
As a company, if they can't do it, they shouldn't expect others to do it either, right? Or they should disclose reported vulnerabilities even if not fixed in 90 days.
[1] Maybe they do it for internal teams as well, but that's not relevant to us; all we should be concerned with is how they behave externally in disclosing and fixing issues.
[2] Perhaps part of the reason GPZ is able to maintain this hard policy is that they don't depend on bug bounties as a source of income the way independent researchers do.
The timeline for disclosure is set by the reporter, no? "We will disclose this bug on YYYY-MM-DD". The other side can ask for an extension, but has no intrinsic right to one. Unless I am missing something, this has nothing to do with PZ, so their default timeline is totally irrelevant.
The post had no indication of a bounty being held hostage. From https://www.google.com/about/appsecurity/reward-program/ it seems like the requirement for a bounty is not for "don't disclose before fix is released", but "don't disclose without reasonable advance notice".
So I just don't see the inconsistency here. Project Zero gives a disclosure deadline. The reporter here chose not to give one. When they said they wanted to disclose, there was no stalling for extra time. Just what is the expectation here?
It's so strange to me that they have a process for adding a root key that involves no authentication at all. These are VMs with their images running their pre-installed software, it's not like this would have been a hard problem.
There's a really simple (albeit hacky) workaround which can be deployed fairly quickly. In /etc/dhcp/dhclient-exit-hooks.d/google_set_hostname replace this line:
if [ -n "$new_host_name" ] && [ -n "$new_ip_address" ]; then
with this:
if [ -n "$new_host_name" -a ! "$new_host_name" =~ metadata.google.internal ] && [ -n "$new_ip_address" ]; then
(Yes, =~ is a bashism, but google_set_hostname is a bash script.)
This prevents /etc/hosts from getting poisoned with a bogus entry for the metadata server. Of course, dhclient should also be fixed to use a better random number generator, and the firewall should by default stop DHCP packets from any IP address other than Google's DHCP server. Belt and suspenders, after all. But fixing the dhclient exit hooks is a simple text edit.
The reason I used a regex match is that the attacker might try to add one or more spaces as a prefix and/or suffix, e.g. " metadata.google.internal ", which wouldn't match "metadata.google.internal" exactly, but the surrounding spaces would be ignored when /etc/hosts is parsed, so the entry would still poison lookups of metadata.google.internal.
The combination of DHCP, magic metadata servers, and cloud-init feels like such an awkward way of managing VM provisioning. I'm wondering: would having a proper virtual device, or maybe something at the UEFI layer, clean things up?
Last time I helped administer a deployment on one of these clouds, one of the first things we did on the startup script for the instances was to install an iptables rule so that only uid 0 (root) could talk to the metadata servers. The need for that kind of firewall rule on every instance shows that these metadata servers are a bad design.
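Something along these lines, as a sketch from memory (the exact chain and match details will vary with your setup):

    # Drop metadata-server traffic from anything on this VM that isn't uid 0.
    iptables -A OUTPUT -d 169.254.169.254 -m owner ! --uid-owner 0 -j DROP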
It would be much better if, instead of the network, these metadata servers were only visible as a PCIe or MMIO device. Of course, that would require a driver to be written, so at least initially, unmodified distributions would not be able to be used (but after a few years, every Linux and BSD distribution, and perhaps even Windows, would have that driver by default). That way, it would (on Linux) appear as files on /sys, readable only by root, without requiring any local firewall.
There are ways for (virtual) firmware to expose data directly into sysfs, e.g. DMI and WMI. There are probably nicer ones, too. A virtio-9p instance exposing metadata would do the trick, too. Or trusted emulated network interface.
Totally agree. A block device with the contents of the metadata service in it would be nice! It becomes trivial to provide at least basic access control to the service.
This is done by some cloud software (e.g., OpenNebula). The downside is that modifying the metadata is now difficult since block devices being hot-plugged can cause issues if in use.
OpenNebula solves this by attaching an ISO image with credentials and metadata to the CD-ROM virtual device, so only root can get the credentials to make calls and also the metadata is there.
It would seem simple enough for Google's metadata server to have a valid HTTPS certificate and be hosted on a non-internal domain. Or use an internal domain, but make pre-built images use a custom CA.
Or Google could make a 'trusted network device', rather like a VPN, which routes traffic for 169.254.169.254 (the metadata server IP address) and add metadata.google.internal to the hosts file as 169.254.169.254.
Not mTLS, but AWS metadata v2 has moved to an authenticated session-based system. Of course, an attacker who can make arbitrary requests can create tokens for limited sessions, but it's certainly an improvement.
Presumably the machines have a mechanism for managing their CAs (the trust store that ships with the OS). If machines aren't being updated frequently enough to get a new CA, they're badly outdated in other ways.
To the metadata servers? They presumably hold keys to access all kinds of backend systems anyway. The certs don't require any additional trust. There must already be infrastructure in place for deploying said keys.
You could also do a hybrid where each machine gets a volume with an X.509 cert and key that only root has access to, which can then be used for mTLS to a network service (which can then manage the certs).
That'd be a hybrid of a cloud-init data volume and a network service.
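On the client side that could be as simple as pointing curl (or the guest agent) at the root-owned cert pair; the paths and hostname below are made up for illustration:

    # Fetch metadata over mTLS using the per-VM credentials from the volume.
    curl --cert /run/vm-identity/client.crt \
         --key /run/vm-identity/client.key \
         --cacert /run/vm-identity/ca.crt \
         https://metadata.internal/computeMetadata/v1/instance/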
You could; the problem with this approach is how you manage these volumes and the infrastructure around them.
How do you get the keys on that volume?
Usually, this trust is established when the machine is built the first time and it gets an identity and a cert assigned to it. You have the same problems (of how you control the infra and ensure that you and only you can do this on the network).
Has it been verified that GCE is still vulnerable?
There's clearly a communication gap between the researcher and Google. But perhaps the techies at Google saw it and fixed it and it just hasn't been communicated, or some other change in GCE has closed it or mitigated it?
The usual configuration is one or many subnets per VPC, and one or many VPCs per project. A Shared VPC setup between projects is also possible but requires prior agreement from both projects.
The problem is that it means that any compromised app, container or chroot can now be escalated to a full fleet takeover unless you have the mentioned mitigations in place.
This attack allows an adversary to move from root permissions on one VM in a GCP project to gaining root permissions on another VM in the same project.
The attack is unlikely to succeed unless the attacker knows the exact time a particular VM was last rebooted, or can cause a reboot.
Overall, this attack alone probably won't be the reason you will have to start filling in a GDPR notification to your users...
I'm super confused by the statement "The unix time component has a more broad domain, but this turns out to be not a practical problem (see later)."
I don't know what exactly he expected us to "see later," except that he knows exactly when the target machine rebooted down to a 15 second window because he already had full control over it before starting the attack...
I can imagine cases where an attacker could have good knowledge of the reboot time of a system. For example, they could ping it for months waiting for a maintenance reboot.
Or they could flood the machine with requests, hoping to cause some monitoring system or human to reboot it.
Well, it can be used in an attack chain to jump from one compromised, external-facing machine to an internal-only one that might hold more sensitive data. Or to run some inside job (although in that case developers should not be able to spin up a VM in production).
If I understand correctly, the attack can be mitigated with the appropriate level of firewall rules. Both ingress and egress traffic should be blocked by default and selectively allowed based on need. In this case, DHCP traffic would only be allowed to 169.254.169.254.
You still have somebody in your network though, so there's that.
That's not how DHCP works. In the context of a machine coming up or renewing a lease, it's basically a broadcast and anyone on the network can reply. The traffic needs to happen on the interface where you get the IP (guessing the main interface is also using DHCP).
How does traffic reach or leave the machine if a network level firewall is restricting access? It seems I have a fundamental misunderstanding of something.
> The firewall/router of GCP blocks broadcast packets sent by VMs, so only the metadata server (169.254.169.254) receives them. However, some phases of the DHCP protocol don't rely on broadcasts, and the packets to be sent can be easily calculated and sent in advance.
> To mount this attack, the attacker needs to craft multiple DHCP packets using a set of precalculated/suspected XIDs and flood the victim's dhclient directly (no broadcasts here). If the XID is correct, the victim machine applies the network configuration. This is a race condition, but since the flood is fast and exhaustive, the metadata server has no real chance to win.
> Google heavily relies on the Metadata server, including the distribution of ssh public keys. The connection is secured at the network/routing layer and the server is not authenticated (no TLS, clear http only). The google_guest_agent process, that is responsible for processing the responses of the Metadata server, establishes the connection via the virtual hostname metadata.google.internal which is an alias in the /etc/hosts file.
He appears to spoof the DHCP packets to get the victim to respond to his rogue metadata server, insert the IP address of his metadata server into the /etc/hosts entry for metadata.google.internal, then is able to have his ssh pubkey installed on the victim so he can ssh to the host.
If you have a network-level firewall blocking DHCP traffic, you will not be able to do DHCP.
The way it works for physical hosts is that all machines in the same rack see the DHCP traffic, and the ToR has a special IP helper configured to which it forwards the DHCP traffic. So it's broadcast within the rack and point-to-point after that, but there is still little to no security when it comes to DHCP traffic.
For VMs, I guess the hypervisor acts as the ToR, with the same limitations.
Confidential computing (Intel SGX, ARM TrustZone, AMD SEV-SNP) handles this by encrypting the virtual machine memory so that even having full root on the host does not expose VM compute or memory.
There are plenty of ways to do zero trust networking, a slick commercial implementation is https://tailscale.com/, which you can totally use in the cloud for secure node to node comms if you're worried about those things.
> Confidential computing (Intel SGX, ARM TrustZone, AMD SEV-SNP) handles this by encrypting the virtual machine memory so that even having full root on the host does not expose VM compute or memory.
Google's current confidential compute offering does not prove at runtime that it's actually confidential. You just get a bit in your cloud console saying 'yep it's confidential' (and some runtime CPU bit too, but that's easily spoofable by a compromised hypervisor), but no cryptographically verifiable proof from AMD that things actually are confidential.
Yes, Google tries to abstract SEV from you, but it is SEV-SNP that we really need for this. Our account manager confirmed they’re not offering SEV-SNP yet.
Most people's threat models do not assume Amazon or Google are threats. Especially when you sign very large contracts with them, the law is enough to keep them in check.
You should absolutely consider your cloud provider a threat. What happens in a black swan event where a provider is completely compromised? Design around zero-trust networks.
By all means, but then are you assuming that your suppliers are a threat? Did you check every chip on the motherboard that comes in, verify the firmware and BIOS on all components, including the firmware of webcams and SSDs? Who inspected the source code of every driver? Did you vet every employee, and what did you do about the Intel Management Engine?
All these measures are not feasible unless you are working in national security or at a megacorp, and insisting on one of them while ignoring the others is daft.
Supply chain is still an issue in sovereign clouds. At some point there's still a trust decision, whether that's to trust the cloud provider, the hardware manufacturer, the chip manufacturer, etc.
For organisations with the resources to deal with an APT, great lengths are gone to in order to verify that the supply chain is trusted all the way down to the chip manufacturer. The hardware used isn't just bought from Best Buy and given a huge dose of trust; instead there are many, many steps to verify that, e.g., the hard drives are using the expected firmware version. You spend as much as you can on the whole process, but if your threat model includes the CIA, China, and the FSB, it's exceedingly expensive.
I wish that were true but it's really not. At least not within the public sector, maybe wealthier private firms can afford to do that level of verification.
Anyway, even then you still need to make trust decisions. How do you verify the ICs in your HDD haven't been tampered with? How do you know the firmware wasn't built with a malicious compiler? Or that a bad actor didn't add a backdoor to the firmware? Realistically there's a lot of components in modern computers that we have no choice but to trust.
It really depends on your threat model. It is not always unreasonable.
Target trusted their HVAC management firm so much that they had full unsegmented access to the LAN in each store. The credit card swipe terminals in the same LAN were totally compromised and millions of users had their credit card credentials stolen.
Defense contractors and places that store / manage large amounts of money are totally within their mandates to trust no one, not even many of their own employees.
Right, I'm familiar with the hack. My point is Target almost certainly didn't decide that the HVAC firm could be trusted to have access to the credit terminals - the fact that they had access was the result of poor security design, not Target's threat model.
It's the everything always part of the argument that's unreasonable. You realise that that's impossible? You can't vet and control the whole stack. And, if you could, it would be prohibitively expensive.
Ok fair. I see the lack of simple things like segmented vlans as a lack of a threat model entirely. They trusted them implicitly, not explicitly, through their clear incompetence. Perhaps that’s better?
Sure you must always put some levels of trust in 3rd parties. What level of trust is the important question. Ideally, you distribute that trust among several actors so a single compromise is not too much of a deal.
That's why you use different hardware vendors for your routers and servers, another vendor for your network connectivity, and yet other vendors for your software. This way, MiTM is mitigated by TLS (or equivalent) and server compromise is mitigated by a good firewall and network inspection stack. Placing all your eggs in a single Google basket is giving a lot of power to a single "don't be evil" corporation, who may get hacked or compelled by law enforcement to spy on you and your clients.
Do it right, and you might mitigate threats, but do it wrong, and you are introducing more points where you could be compromised - a single supplier can be audited, a hundred cannot.
That's the issue: Amazon and Google are dependencies that are overlooked as too big to fail. Anything overlooked because it "can't fail" is the perfect place to attack.
But if you don't, your threat model is a work of fiction and you're wasting your time play-acting.
A threat model has no basis in reality if you do not accurately model threats, and your infra vendor is a glaringly obvious threat. Now, maybe that's a risk worth the tradeoffs, but how do you know that?