The more concerning security finding here is that Google sat on this for 9 months. Assuming the claims hold, this is a serious problem for any security-conscious GCP customers. What other vulnerabilities are they sitting on? Do they have processes in place to promptly handle new ones? Doesn’t look like it…
This is especially questionable given the much shorter deadline that Project Zero gives other companies to fix bugs before publishing their vulnerabilities (regardless of whether there's been a fix). It only seems fair that Google should hold itself to the same standard.
Companies who use that response are even worse because they know very well there is no winning move for the researcher. The company has all the responsibility no matter what.
Both are Google - from an outside view we shouldn't distinguish. Google should hold itself to a consistent bar.
It highlights how divisions operate in silos at Google, and just because Project Zero causes a lot of positive security marketing for Google, it doesn't seem that the quality bar is consistently high across the company.
Also, please don't forget this is still not fixed.
Funny thing is I agree with you that Google should hold itself to that bar, but I don't agree as to Project Zero being the reason. I think we very much should distinguish Google from P0, and that P0's policy should be irrelevant here; their entire purpose is to be an independent team of security researchers finding vulnerabilities in software, indiscriminately. It seems a number of others here feel similarly (judging by the responses), and ironically their support for the position is probably being lost by dragging P0 into the conversation.
The reason I think Google should hold itself to that bar is something else: Google itself claims to use that bar. From the horse's mouth [1]:
> This is why Google adheres to a 90-day disclosure deadline. We notify vendors of vulnerabilities immediately, with details shared in public with the defensive community after 90 days, or sooner if the vendor releases a fix.
If they're going to do this to others as general company policy, they need to do this to themselves.
Are you suggesting Google should make all unfixed vulnerabilities public after 90 days? Would that be even if the finder does not want them to become public? Or just as an opt-out type of thing?
I'm only suggesting Google needs to fix everything in 90 days (and reveal it afterward, as they consider that standard practice) so they don't have unfixed vulnerabilities past that. I don't really have opinions on what policies they should have for cases where that isn't followed, though I think even having a policy for that case encourages it not to be followed to begin with.
Vulnerability deadlines are disclosure deadlines, not remediation deadlines. There's plenty of vulnerabilities that can't be fixed in that time, and I think it's fair for the public to know about them rather than keeping them secret forever.
"Fair to the public" was neither intended to be nor is the concern. Their stance has always been "better for security" and disclosing an unpatched vulnerability is generally worse for security unless you believe it'll encourage people to fix things by that deadline.
In this case, knowing about this vulnerability allows you to take corrective action. If Google cannot fix the root cause, that doesn't necessarily mean there aren't mitigations that can be applied manually by an end user (yes, it sucks, but still better than getting hacked).
When users can mitigate it I agree with you (I forgot about that case in the second half of my comment), but there have also been cases when users weren't able to do anything but they disclosed anyway, so that doesn't explain the policy.
Insecurity is invisible. Users have no way to know the weaknesses in the software they use until it's too late. Disclosure is meant to make it possible for users to see what weaknesses they might have so they can make informed decisions.
Users still benefit from knowing about issues that can't be fixed (think about Rowhammer, Spectre and similar), so as these attacks become more practical (e.g. https://leaky.page or Half-Double) they can adjust their choices accordingly (switching browsers, devices, etc.) if the risk imposed by them is too high.
Of course (using an analogy for a second), some can say that it would be better for people to never find out that they are at increased risk of some incurable disease, because they can't do anything about it.
But for software, you can't make individual decisions like that. Even if one person doesn't want to know about vulnerabilities in the software they use, others could still benefit from knowing about them, and the benefit of the many trumps the preferences of the few.
That is, unless the argument is that it's actively damaging for all of the public (or the majority) to know about vulnerabilities in the software they use. If the point is to advocate for complete unlimited secrecy, and for researchers to sit on unfixed bugs forever, then that's quite an extreme view of software security and vulnerability disclosure (but that some companies unfortunately still follow).
Disclosure policies like these aim to strike a balance between secrecy and public awareness. They put the onus of disclosure on the finder because it's their finding (and they are the deciders on how it's shared), and finders are more independent than the vendor, but I could imagine a world in which disclosure happens by default, by the company, even for unfixed bugs.
What is the thing being implied? Like as far as I can tell, Google's position seems to be that "it is best if vuln researchers have the freedom to disclose unfixed issues, especially after reporting them".
People criticize P0 for publishing issues despite companies asking for extensions. But we're criticizing Google here for...what? They didn't ask for an extension, they didn't try to prevent this person from disclosing. Where is the hypocritical thing?
The complaint is that Google's stance with Project Zero is "90 days is plenty sufficient; you're a bad vendor if you can't adhere to it", and then Google itself doesn't adhere to it, which implicates themselves here.
I see what they're saying if you lump them together; I just think it makes sense to treat P0 a little independently from Google. But otherwise it's got a point.
That's a common sentiment I just don't buy. People here love to hand-wave about some vague "benefit to the public", and maybe there is some benefit when the vulnerability can be mitigated on the user side, but it literally cannot be the case for the fraction of vulnerabilities that entities other than the vendor can do nothing about. The only "benefit" is that it satisfies people's curiosity, which is a terrible way to do security. Yet P0 applies that policy indiscriminately.
> Can you point out the second part, specifically where "you're a bad vendor if..." is either stated or implied by P0?
As to your question of when this is implied by P0: to me, their actions and the lack of a compelling rationale for their behavior, which I explained above, are already plenty to imply it. But if you won't believe something unless it's in an actual quote from themselves, I guess here's something you can refer to [1]:
- "We were concerned that patches were taking a long time to be developed and released to users"
- "We used this model of disclosure for over a decade, and the results weren't particularly compelling. Many fixes took over six months to be released, while some of our vulnerability reports went unfixed entirely!"
- "We were optimistic that vendors could do better, but we weren't seeing the improvements to internal triage, patch development, testing, and release processes that we knew would provide the most benefit to users."
- "If most bugs are fixed in a reasonable timeframe (i.e. less than 90 days), [...]"
All the "reasonable time frame (i.e. < 90 days)", "your users aren't getting what they need", "your results aren't compelling", "you can do better", etc. are basically semi-diplomatic ways of saying you're a bad vendor when you're not meeting their "reasonable" 90-day timeline.
They literally directly describe it as a benefit to users, the sentiment you don't buy, and don't ever actually call vendors bad, except if you interpret the less benefit to users to be a moral impugnment of the vendors.
> They literally directly describe it as a benefit to users
"It" in that sentence does not refer to their own unpatched disclosures.
> They don't ever actually call vendors bad, except if you interpret the less benefit to users to be a moral impugnment of the vendors.
What you cite proves my point!
They didn't fix it within that timeline. I don't know why everyone is saying "well they didn't stop disclosure in 90 days", but they didn't fix it in the timeline that they have allocated as being reasonable for all vulns they report.
At the limit, what you're saying would mean that vendors should feel obligated to fix issues they don't consider to be vulnerabilities, as long as they're reported as such. That'd clearly be absurd. Is there maybe some additional qualifying factor that's required to trigger this obligation that you've left implicit?
If you're leaving the determination to the vendor, they could just avoid the deadline by claiming it is not a vulnerability. That seems like a bad incentive.
There are things that literally cannot be fixed, or where the risk of the fix is higher than the risk of leaving the vulnerability open. (Even if it is publicly disclosed!)
It seems that we're all better off when these two concerns are not artificially coupled. A company can both admit that something is a vulnerability, and not fix it, if that's the right tradeoff. They're of course paying the PR cost of being seen as having unfixed security bugs, and an even bigger PR cost if the issue ends up being exploited and causes damage. But that's just part of the tradeoff computation.
I don't know what point you're trying to make here. Google acknowledges that this is a vulnerability ("nice catch"), Google pushes every other company to fix vulns in 90 days (or have it publicly disclosed, which is based on the assumption that vulns can be fixed in that time), and Google did not fix it in 90 days.
If you're asking me to create a perfect framework for disclosure, I'm not interested in doing that, and it's completely unnecessary to make a judgment of this single scenario.
> A company can both admit that something is a vulnerability, and not fix it, if that's the right tradeoff.
Google's 90-day policy is designed explicitly to give companies ample time to patch. And yes, this is them paying the PR cost - I am judging them negatively in this discussion because I agree with their 90-day policy.
I am saying that there are things that are technically vulnerabilities that are not worth fixing. Either they are too risky or expensive to fix, or too impractical to exploit, or too limited in damage to actually worry about. Given the line you drew was that there must be a fix in 90 days, if the company agrees it is a vulnerability, the logical conclusion is that the companies would end up claiming "not a vulnerability" when they mean WONTFIX.
If you think this particular issue should have been fixed within a given timeline, it should be on the merits of the issue itself. Not just by following a "everything must be fixed in 90 days" dogma. All that the repeated invocations of PZ have achieved is drown out any discussion on the report itself. How serious/exploitable is it actually, how would it be mitigated/fixed, what might have blocked that being done, etc. Seems like those would have been far more interesting discussions than a silly game of gotcha.
(If you believe there is no such thing as a vulnerability that cannot be fixed, or that's not worth fixing, then I don't know that we'll find common ground.)
> Given the line you drew was that there must be a fix in 90 days, if the company agrees it is a vulnerability, the logical conclusion is that the companies would end up claiming "not a vulnerability" when they mean WONTFIX.
OK, but that doesn't apply here, which is why I don't get why you're bringing up general policy issues in this specific instance. Google did acknowledge the vulnerability, as noted in the disclosure notes in the repo.
So like, let me just clearly list out some facts:
* Project 0 feels that 90 days is a good timeline for the vast majority of vulns to be patched (this is consistent with their data, and appears accurate)
* This issue was acknowledged by Google, though perhaps not explicitly as a vulnerability, all that I can see is that they ack'd it with "Good catch" - I take this as an ack of vulnerability
* This issue is now 3x the 90 day window that P0 considers to be sufficient in the vast majority of cases to fix vulnerabilities
I don't see why other information is supposed to be relevant. Yes, vendors in some hypothetical situation may feel the incentive to say "WONTFIX" - that has nothing to do with this scenario and has no bearing on the facts.
> If you think this particular issue should have been fixed within a given timeline, it should be on the merits of the issue itself.
That's not P0s opinion in the vast majority of cases - only in extreme cases, to my knowledge, do they break from their 90 day disclosure policy.
> Not just by following a "everything must be fixed in 90 days" dogma.
Dogma here is quite helpful. I see no reason to break from it in this instance.
> Seems like those would have been far more interesting discussions than a silly game of gotcha.
I'm not saying "gotcha", I'm saying that:
a) 9 months to fix this feels very high, Google should explain why it took so long to restore confidence
b) The fact that they have an internal culture of 90 days being a good time frame for patching merely makes it ironic - it is primarily the fact that I think this should have been patched much more quickly that would bother me as a customer.
> (If you believe there is no such thing as a vulnerability that cannot be fixed, or that's not worth fixing, then I don't know that we'll find common ground.)
Nope, 100% there are vulns that can't be fixed, vulns that aren't worth fixing, etc. But again, Google didn't say this was a "WONTFIX" though, and they did ack that this is a vuln. If it wasn't possible to fix it they could say so, but that isn't what they said at all, they just said they weren't prioritizing it.
If it's the case that this simply isn't patchable, they should say so. If they think this doesn't matter, why not say so? It certainly seems patchable.
It's not what happened, but the logical outcome of what you propose. Right now the rules are simple: "disclosure in 90 days, up to you whether to fix it". What you're proposing is that it is no longer up to the company to make that tradeoff. They must always fix it.
> That's not P0s opinion in the vast majority of cases - only in extreme cases, to my knowledge, do they break from their 90 day disclosure policy.
Again, that is a disclosure timeline. Not a demand for a fix in that timeline. In general it's in the vendor's best interest to release a fix in that timeline, especially given its immutability. You're trying to convert it to a demand for a fix no matter what. That is not productive.
> a) 9 months to fix this feels very high, Google should explain why it took so long to restore confidence
So why not argue for that explicitly? It seems like a much stronger approach than the "lol PZ hypocrisy" option.
You're trying to talk about consequences of my statement, which I'm trying very hard not to talk about, because I don't care. I'm only talking about this very specific instance.
> Again, that is a disclosure timeline. Not a demand for a fix in that timeline.
Yes and it is based on the expectation of a fix within that timeline being practical.
> You're trying to convert it to a demand for a fix no matter what. That is not productive.
No I'm not, you're trying to say that I am, repeatedly, and I keep telling you I don't care about discussing disclosure policy broadly. I'm only talking about this one instance.
> It seems like a much stronger approach than the "lol PZ hypocrisy" option.
Take that up with the person who posted about P0 initially. I'm only saying that it's ironic and that I support the 90 day window as being a very reasonable time to fix things, and that them going 3x over is a bad look.
> Again, that is a disclosure timeline. Not a demand for a fix in that timeline. In general it's in the vendor's best interest to release a fix in that timeline, especially given its immutability. You're trying to convert it to a demand for a fix no matter what.
I don't see what form it would come in if it were a demand, in your view. We have a disagreement between private entities over a vulnerability; how would one "force" the other to do that except by disclosing it? Hold someone hostage?
> Google pushes every other company to fix vulns in 90 days (or have it publicly disclosed)
I believe you're mistaken about the conditional publishing. The 90-day clock starts when Google reports the bug - they will make it public whether or not the vulnerability is remediated (with very few exceptions). By all appearances, Google is very willing to be on the receiving end of that, on the basis that end users can protect themselves once they have the knowledge - in this case, GCE users are now aware that their servers are exploitable and can make changes - like moving to AWS. I think the 90-day clock is a reasonable stance to take, for the public (but not necessarily for the vendor).
http://g.co/appsecurity has more details, but the TL;DR is that Google is supportive of people disclosing unfixed bugs after 90 days, which is what happened here.
While there are a series of vulnerabilities here, none of them would be exploitable in this way if the metadata server was accessed via an IP instead of the hostname metadata.google.internal.
The metadata server is documented to be at 169.254.169.254, always[1]. But Google software (agents and libraries on VMs) resolves it by looking up metadata.google.internal. If metadata.google.internal isn't in /etc/hosts, as can be the case in containers, this can result in actual DNS lookups over the network to get an address that should already be known.
AWS uses the same address for their metadata server, but accesses it via the IP address and not some hostname[2].
I've seen Google managed DNS servers (in GKE clusters) fall over under the load of Google libraries querying for the metadata address[3]. I'm guessing Google wants to maintain some flexibility, which is why they are using a hostname, but there are tradeoffs.
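For what it's worth, a minimal sketch of the kind of pinning that avoids the DNS fallback (assuming an image where you control /etc/hosts; the exact line stock GCE images ship may differ):

    # Pin the documented metadata IP so agents and libraries never fall back
    # to a network DNS lookup for metadata.google.internal.
    grep -q 'metadata\.google\.internal' /etc/hosts || \
      echo '169.254.169.254 metadata.google.internal metadata' | sudo tee -a /etc/hosts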
It does not even take a lot. I run a production service on Cloud Run; the typical load is around 500 qps, and the DNS queries to resolve the metadata server fail frequently enough to be noticeable.
That is true, I was thinking specifically about the metadata and SSH keys. But DHCP can also set DNS servers, NTP servers, and other things that can either cause disruptions or be used to facilitate a different attack.
There might be a persistence issue: it seems like part of this attack was that the IP was persisted to /etc/hosts even after the real DHCP server took over again. But even just writing to /etc/hosts could open the door to redirecting traffic to an attacker-controlled server.
While it certainly seems like a fairly serious vulnerability I think it's worth highlighting that this attack requires that either you already have access to a machine on the same subnet as the target machine or that the firewall in front of the target machine is very lax. That's a pretty high bar for getting the attack to work in the wild.
-1: consider if you have complex vendor software on a GCE VM inside a larger GCP project... now a vulnerability in that vendor software means the whole GCP project is exposed. Vendor fixes are notoriously slow, so in practice you have to isolate vendor software to separate GCP projects.
Real example: I have a client with precisely this situation, and elsewhere in the GCP project is PII consumer data requiring public disclosure if exposed.
Luckily, GCE limits the Cloud API access available to an instance to nothing by default. Meaning that access to BigQuery, Storage, SQL, user management, etc. is not allowed on that VM, even with root access to the VM, unless configured by the administrator.
This at least mitigates the impact of this exploit to a degree. If a web server has access only to Cloud SQL, an attacker cannot use access to that VM to go digging around in Cloud Storage buckets unless GCS access is granted explicitly to the VM.
From there, IAM security applies. So even if the VM has Cloud API access to, say, BigQuery, the limitations of that service account then apply to any data that is accessed.
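Roughly what that looks like in practice (the instance name, service account, and scope below are placeholders, not from the report):

    # Create a VM whose access token is limited to Cloud SQL only; IAM on the
    # attached service account further restricts what it can actually touch.
    gcloud compute instances create web-1 \
      --service-account=web-sa@my-project.iam.gserviceaccount.com \
      --scopes=https://www.googleapis.com/auth/sqlservice.admin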
Well, so GCP is pretty big on serverless architecture. It is completely possible that there is no GCE instance at all which has Cloud API access to a particular service.
A company might be making heavy use of BigQuery in their project, but have a data processing pipeline that uses tools like Cloud Functions, Scheduled Queries, and BQ Transfer Service to push sanitized data into a Cloud SQL instance for the front end to use.
So no GCE instance will need Cloud API access to BigQuery, so no matter what level of access is obtained by an intruder on any VM, they will never be able to access BigQuery.
I completely agree and didn't mean to suggest it wasn't a serious vulnerability: as I understand it, this attack means that if any VM in the subnet is compromised, that can be leveraged into an attack on any other VM in the subnet, so your attack surface has suddenly gotten a lot bigger and your isolation between VMs substantially poorer.
Most of my GCP clients shove PII into a GCP Service, like BQ. It's not put on a "server" per se, so firewall rules don't really apply here. The appropriate thing to assert is that necessary IAM permissions should be granted explicitly.
This is usually the case, as most of my clients use isolated GCP projects for housing PII data. This forces IAM permissions to be granted to service accounts, which, hopefully, means that administrators are cognizant of the levels of access that they are granting to service accounts.
Not a guarantee, mind you, but hopefully some red flags would be raised if someone requested PII-level access for a service account associated with a public-facing web server.
IIUC this is insufficient (!) - even with a firewall between them, a VM is now vulnerable to attack from another VM on the subnet (in the same GCP project).
I think the real learning here is not to colocate different things in a single GCP project. AFAIK projects don't cost anything so why not create one per service?
The way you're wording this suggests that this was a sensible design prior to this vulnerability, but in fact all sorts of tools, config, etc work within a project but not across projects, including IAM. Yes obviously anything can be duplicated but it's a big pain.
Probably easier to create separate subnets for VMs that don't trust each other?
IAM works across projects easily. The "Organization" concept in Google Cloud is used to collect projects, and manage permissions of groups (or subfolders) of projects very easily.
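For example, granting a service account from one project a role in another is a single binding (project and account names here are placeholders):

    # Let a service account living in project-a read BigQuery data in project-b.
    gcloud projects add-iam-policy-binding project-b \
      --member="serviceAccount:pipeline@project-a.iam.gserviceaccount.com" \
      --role="roles/bigquery.dataViewer"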
Note that "have access to a machine on the same subnet" really just means "can send traffic to the VM local network". In other words, partial compromises (ex: docker container, VM, chroot) that have access to this network are enough, as well as attacks that let you send traffic (ex: TURN server abuse)
This is well worth reading. It describes how, through a series of well meaning steps, you shoot yourself in the face.
It all starts with:
"Note that the last 4 bytes (0a:80:00:02) of the MAC address (42:01:0a:80:00:02) are actually the same as the internal IP address of the box (10.128.0.2). This means, 1 of the 3 components is effectively public."
As someone who knows nothing about networking, can you please explain why they set up the config like this? Does that mean the third byte from the top must be 0a?
BTW I just checked my corporate intranet setup and MAC has nothing to do with the ip address.
Quite. MAC addresses don't need to line up with IP addresses or vice versa. They are completely different things. However in IPv6 there is a standard for link local addresses that does have a correlation between IP and MAC but there you go. It's designed to avoid clashes and is not the finest design decision ever made! That last will probably creep into a CVE one day just like this chain of events.
The config as designed probably looked like a good idea at the time. When you are worrying about millions of things in an address space like IPv4 and MAC, then having them tie up in some way may be useful for lookups in a very large database or two.
However, giving information away about something that was never designed from the outset to do so is not a good idea.
If you follow the chain of reasoning in the github post you can see that you can break the chain at multiple points by not "being clever". If you start by not making your IP addresses follow the MAC address you kill this problem off before it starts.
At the risk of becoming a real bore, I'll spell out why I think this is a really, really dumb thing:
If you have anything to do with fire fighting (and we all do in a way) you soon learn that there are three things required for a fire:
* Something to burn
* Air (strictly speaking: Oxygen)
* Something to start the fire (source of ignition)
Fire prevention is based around avoiding having all three things together at any point in time. So, putting out a fire often involves removing oxygen or removing the burning thing. Prevention can be as simple as making a rule at home that no one puts dish cloths on the hob to dry. That last phrase may need some translating!
So, you put your webby app or VM out on the internets for all to see and play with. Unlike your home front door, every Tom, Dick and Harry in the world can go and take a playful kick at it. So you need to take some care with it. There is no simple set of three factors that you can protect against as there is for fire prevention. Instead you need to follow some good practice and hope for the best.
One good (not best - there is no such thing) practice is to avoid giving away information unnecessarily. Linking IP to MAC is such a thing. Do it at home if you must but don't do it in public.
The automatic host numbering feature in the IPv6 standard (modified EUI-64, RFC 4291) was a big mistake. But I thought that worked the other way around: that the MAC was part of the IP, not the IP part of the MAC.
"that the MAC was part of the IP, not the IP part of the MAC."
The IPv6 link local address is derived from the MAC address. I can't be arsed to look up the current RFCs so let's take a look at my laptop, that I'm using now (yes, I have changed a few digits but only for global addresses):
2: wlp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether e4:70:b8:f1:6b:5c brd ff:ff:ff:ff:ff:ff
inet 10.200.201.164/24 brd 10.200.200.255 scope global dynamic noprefixroute wlp2s0
valid_lft 73049sec preferred_lft 73049sec
inet6 2001:3d49:ad52:ddc8:6ba8:e800:8e96:143b/64 scope global temporary dynamic
valid_lft 86387sec preferred_lft 14387sec
inet6 2001:3d49:ad52:ddc8:5203:e5fe:5ed0:c173/64 scope global dynamic mngtmpaddr noprefixroute
valid_lft 86387sec preferred_lft 14387sec
inet6 fe80::506d:9c2f:8b7b:1d7e/64 scope link noprefixroute
valid_lft forever preferred_lft forever
My link local address is fe80::506d:9c2f:8b7b:1d7e and my MAC address for that interface is e4:70:b8:f1:6b:5c
<sound-effect>scratched-record</sound-effect>
It seems that times have changed. This is a laptop that uses Network Manager on Arch Linux. If I had to guess I suspect that the bloke who does NM has fixed that flaw or he's following a newer standard/RFC than I've (bothered to have) heard of.
All this stuff points out a strange dichotomy: you want to be seen ("Hello look at my website") and yet you don't want to be seen by the baddies. You want to draw attention to your wares but not have a bunch of state sponsored folk poking at your unmentionables and nicking your cash.
As of fairly recently, we are seeing quite a lot of remediation (for want of a better word) by additional state sponsored actors than is considered normal. These are not the usual lot who piss on your prized Begonias. This lot seem to know when to widdle effectively.
Yeah, automatic host discovery puts the MAC address (almost) verbatim at the end of an interface IP, but it's an optional feature.
In the vuln report, it turns out automatic host discovery is used after all, but the IPv4 is also based on this, which confused me.
The MAC address was:
42:01:0a:80:00:02
The IPv6 address was:
fe80::4001:aff:fe80:2
It's not just the last four bytes, it's the whole MAC address, but 0xFFFE is crammed into the middle of it, and the second-to-last bit of the first byte is flipped. We can see that this is exactly the scheme that was used. Now the confusing part is that the IPv4 address is apparently also based on the MAC address, which means both the IPv4 and MAC addresses can be derived from the IPv6 address, and the IPv4 address can be derived from the MAC address.
(The entire host section of the IPv6 address is based on the MAC iirc, so the network section is the only part you need to guess there.)
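If it helps, here is the modified EUI-64 construction spelled out with the report's own numbers - just a bash sketch of the bit-twiddling, nothing GCE-specific:

    mac="42:01:0a:80:00:02"                    # MAC from the report
    IFS=: read -r b1 b2 b3 b4 b5 b6 <<< "$mac"
    b1=$(printf '%02x' $(( 0x$b1 ^ 0x02 )))    # flip the universal/local bit
    printf 'fe80::%x:%xff:fe%x:%x\n' 0x$b1$b2 0x$b3 0x$b4 0x$b5$b6
    # prints fe80::4001:aff:fe80:2, matching the observed link-local address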
Thanks! I don't understand all the technical details, but (combining multiple responses) I think I understand a bit now. I also remember that back when I was at university we used a sort of university intranet (every personal computer had to go through it to reach the Internet, but it was free), and there were a lot of ARP attacks back then, which is how I learned to use arp -a and arp -v (maybe I'm misremembering the flags).
This superficial dismissal doesn't even make sense. Google didn't control the disclosure timeline here. The people who found these vulnerabilities could have published on T+90, or, for that matter, T+1. Meanwhile, the norm that does exist (it's a soft norm) is that you respect the disclosure preferences of the person who reports the vulnerability to the extent you reasonably can.
I'm not sure I even understand the impulse behind writing a comment like this. Assume Google simply refuses to apply the P0 disclosure rule to themselves (this isn't the case, but just stipulate). Do you want them to stop funding Project Zero? Do you wish you knew less about high profile vulnerabilities?
I think you're mixing up the projects a bit. This vulnerability doesn't seem to have gone via Project Zero. Other Google teams are known to sometimes not react in time, and when reports do go through Project Zero they are disclosed at 90 days, regardless of whether the project is internal or not. (I remember an unpatched Chrome vulnerability was published by Project Zero at the end of the deadline.)
So the Zero timeline applies to everyone the same. It doesn't mean fixes actually happen on the same timeline.
Their point is, Project Zero is a very public Google project to hold other companies (and yes, themselves too) accountable with disclosure policies Google (as a company) presumably stands behind. Thus it is quite ironic for an issue to not be fixed in a year.
Yes, sometimes other Google teams miss the deadline set by Project Zero too. That's not the point.
> The researcher here allowed 9 months for them to fix.
The researcher is basically allowed to do whatever they want here. They can wait 0 days and just post to the full-disclosure mailing list. Or they could never disclose it.
Personally, I've done both. I took DJB's "Unix Security Holes" class, where we had to find 10 vulnerabilities in OSS as the class's final project. All of those got 0-day disclosed because that is how DJB rolls. I've also independently found bugs and was satisfied with the resolution after emailing that company's security@ address.
He could have if he wanted to. He could have disclosed immediately if he wanted to.
Google does not have a hard policy to disclose. GPZ does. Vulns in external products found through other groups within Google do not share all the same processes as GPZ.
GPZ is not some independent entity Google just funds; they are as much part of Google as any other team.
If you want to be that precise, it is a bad look for part of your organization to have a hard policy that you expect external companies to follow, while other parts of your organization cannot meet it themselves.
I am not saying Project Zero is wrong. Clearly giving more time did not prod Google into actually fixing this promptly; the researcher was certainly being too polite and gave too much time. I don't know why, perhaps because companies don't pay bounties if you disclose without their consent [2]?
All I am saying is that Google as a company should hold itself to the same hard standard and fix issues in 90 days. This is what Google Project Zero as a team expects other companies [1] to do; they will even reject requests for extensions.
As a company, if they can't do it, they shouldn't expect others to do it either, right? Or they should disclose reported vulnerabilities even if not fixed in 90 days.
[1] Maybe they do it for internal teams as well, but that's not relevant to us; all we should be concerned with is how they behave externally in disclosing and fixing issues.
[2] Perhaps part of the reason GPZ is able to maintain this hard policy is that they don't depend on bug bounties as a source of income the way independent researchers do.
The timeline for disclosure is set by the reporter, no? "We will disclose this bug on YYYY-MM-DD". The other side can ask for an extension, but has no intrinsic right to one. Unless I am missing something, this has nothing to do with PZ, so their default timeline is totally irrelevant.
The post had no indication of a bounty being held hostage. From https://www.google.com/about/appsecurity/reward-program/ it seems like the requirement for a bounty is not for "don't disclose before fix is released", but "don't disclose without reasonable advance notice".
So I just don't see the inconsistency here. Project Zero gives a disclosure deadline. The reporter here chose not to give one. When they said they wanted to disclose, there was no stalling for extra time. Just what is the expectation here?
It's so strange to me that they have a process for adding a root key that involves no authentication at all. These are VMs with their images running their pre-installed software, it's not like this would have been a hard problem.
There's a really simple (albeit hacky) workaround which can be deployed fairly quickly. In /etc/dhcp/dhclient-exit-hooks.d/google_set_hostname replace this line:
if [ -n "$new_host_name" ] && [ -n "$new_ip_address" ]; then
with this:
if [ -n "$new_host_name" -a ! "$new_host_name" =~ metadata.google.internal ] && [ -n "$new_ip_address" ]; then
(Yes, =~ is a bashism, but google_set_hostname is a bash script.)
This prevents /etc/hosts from getting poisoned with a bogus entry for the metadata server. Of course, dhclient should also be fixed to use a better random number generator, and the firewall should by default stop DHCP packets from any IP address other than Google's DHCP server. Belt and suspenders, after all. But fixing the dhclient exit hooks is a simple text edit.
The reason I used a regex match is that the attacker might try to add one or more spaces as a prefix and/or suffix, e.g. " metadata.google.internal ", which wouldn't match "metadata.google.internal" exactly, but the surrounding spaces would be ignored when /etc/hosts is parsed, so the entry would still poison lookups of metadata.google.internal.
The combination of DHCP, magic metadata servers, and cloud-init feels like such an awkward way of managing VM provisioning. I'm wondering: would having a proper virtual device, or maybe something at the UEFI layer, clean things up?
Last time I helped administer a deployment on one of these clouds, one of the first things we did on the startup script for the instances was to install an iptables rule so that only uid 0 (root) could talk to the metadata servers. The need for that kind of firewall rule on every instance shows that these metadata servers are a bad design.
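Something along these lines, as a sketch from memory (the exact chain and match details will vary with your setup):

    # Drop metadata-server traffic from anything on this VM that isn't uid 0.
    iptables -A OUTPUT -d 169.254.169.254 -m owner ! --uid-owner 0 -j DROP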
It would be much better if, instead of the network, these metadata servers were only visible as a PCIe or MMIO device. Of course, that would require a driver to be written, so at least initially, unmodified distributions would not be able to be used (but after a few years, every Linux and BSD distribution, and perhaps even Windows, would have that driver by default). That way, it would (on Linux) appear as files on /sys, readable only by root, without requiring any local firewall.
There are ways for (virtual) firmware to expose data directly into sysfs, e.g. DMI and WMI. There are probably nicer ones, too. A virtio-9p instance exposing metadata would do the trick, too. Or trusted emulated network interface.
Totally agree. A block device with the contents of the metadata service in it would be nice! It becomes trivial to provide at least basic access control to the service.
This is done by some cloud software (e.g., OpenNebula). The downside is that modifying the metadata is now difficult since block devices being hot-plugged can cause issues if in use.
OpenNebula solves this by attaching an ISO image with credentials and metadata to the CD-ROM virtual device, so only root can get the credentials to make calls and also the metadata is there.
It would seem simple enough for Google's metadata server to have a valid HTTPS certificate and be hosted on a non-internal domain. Or use an internal domain, but make pre-built images use a custom CA.
Or Google could make a 'trusted network device', rather like a VPN, which routes traffic for 169.254.169.254 (the metadata server IP address) and add metadata.google.internal to the hosts file as 169.254.169.254.
Not mTLS, but AWS metadata v2 has moved to an authenticated session-based system. Of course, an attacker who can make arbitrary requests can create tokens for limited sessions, but it's certainly an improvement.
Presumably the machines have a mechanism for managing their CAs (the trust store that ships with the OS). If machines aren't being updated frequently enough to get a new CA, they're badly outdated in other ways.
To the metadata servers? They presumably hold keys to access all kinds of backend systems anyway. The certs don't require any additional trust. There must already be infrastructure in place for deploying said keys.
You could also do a hybrid where each machine gets a volume with an X.509 cert and key that only root has access to, which can then be used for mTLS to a network service (which can then manage the certs).
That'd be a hybrid of a cloud-init data volume and a network service.
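On the client side that could be as simple as pointing curl (or the guest agent) at the root-owned cert pair; the paths and hostname below are made up for illustration:

    # Fetch metadata over mTLS using the per-VM credentials from the volume.
    curl --cert /run/vm-identity/client.crt \
         --key /run/vm-identity/client.key \
         --cacert /run/vm-identity/ca.crt \
         https://metadata.internal/computeMetadata/v1/instance/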
You could; the problem with this approach is how you manage these volumes and the infrastructure around them.
How do you get the keys on that volume?
Usually, this trust is established when the machine is built the first time and it gets an identity and a cert assigned to it. You have the same problems (of how you control the infra and ensure that you and only you can do this on the network).
Has it been verified that GCE is still vulnerable?
There's clearly a communication gap between the researcher and Google. But perhaps the techies at Google saw it and fixed it and it just hasn't been communicated, or some other change in GCE has closed it or mitigated it?
The usual configuration is one or many subnets per VPC, and one or many VPCs per project. A Shared VPC setup between projects is also possible but requires prior agreement from both projects.
The problem is that it means that any compromised app, container or chroot can now be escalated to a full fleet takeover unless you have the mentioned mitigations in place.
This attack allows an adversary to move from root permissions on one VM in a GCP project to gaining root permissions on another VM in the same project.
The attack is unlikely to succeed unless the attacker knows the exact time a particular VM was last rebooted, or can cause a reboot.
Overall, this attack alone probably won't be the reason you will have to start filling in a GDPR notification to your users...
I'm super confused by the statement "The unix time component has a more broad domain, but this turns out to be not a practical problem (see later)."
I don't know what exactly he expected us to "see later," except that he knows exactly when the target machine rebooted down to a 15 second window because he already had full control over it before starting the attack...
I can imagine cases where an attacker could have good knowledge of the reboot time of a system. For example, they could ping it for months waiting for a maintenance reboot.
Or they could flood the machine with requests, hoping to cause some monitoring system or human to reboot it.
Well, it can be used in an attack chain to jump from one compromised, external-facing machine to an internal-only one that might hold more sensitive data. Or to run some inside job (although in that case developers should not be able to spin up a VM in production).
If I understand correctly, the attack can be mitigated with the appropriate level of firewall rules. Both ingress and egress traffic should be blocked by default and selectively allowed based on need. In this case, DHCP traffic would only be allowed to 169.254.169.254.
You still have somebody in your network though, so there's that.
That's not how DHCP works. In the context of a machine coming up or renewing a lease, it's basically a broadcast and anyone on the network can reply. The traffic needs to happen on the interface where you get the IP (guessing the main interface is also using DHCP).
How does traffic reach or leave the machine if a network level firewall is restricting access? It seems I have a fundamental misunderstanding of something.
> The firewall/router of GCP blocks broadcast packets sent by VMs, so only the metadata server (169.254.169.254) receives them. However, some phases of the DHCP protocol don't rely on broadcasts, and the packets to be sent can be easily calculated and sent in advance.
> To mount this attack, the attacker needs to craft multiple DHCP packets using a set of precalculated/suspected XIDs and flood the victim's dhclient directly (no broadcasts here). If the XID is correct, the victim machine applies the network configuration. This is a race condition, but since the flood is fast and exhaustive, the metadata server has no real chance to win.
> Google heavily relies on the Metadata server, including the distribution of ssh public keys. The connection is secured at the network/routing layer and the server is not authenticated (no TLS, clear http only). The google_guest_agent process, that is responsible for processing the responses of the Metadata server, establishes the connection via the virtual hostname metadata.google.internal which is an alias in the /etc/hosts file.
He appears to spoof the DHCP packets to get the victim to respond to his rogue metadata server, insert the IP address of his metadata server into the /etc/hosts entry for metadata.google.internal, then is able to have his ssh pubkey installed on the victim so he can ssh to the host.
If you have a network-level firewall blocking DHCP traffic, you will not be able to do DHCP.
The way it works for physical hosts is that all machines in the same rack see the DHCP traffic, and the ToR has a special IP helper configured to which it forwards the DHCP traffic. So it's broadcast within the rack and point-to-point after that, but there is still little to no security when it comes to DHCP traffic.
For VMs, I guess the hypervisor acts as the ToR, with the same limitations.
Confidential computing (Intel SGX, ARM TrustZone, AMD SEV-SNP) handles this by encrypting the virtual machine memory so that even having full root on the host does not expose VM compute or memory.
There are plenty of ways to do zero trust networking, a slick commercial implementation is https://tailscale.com/, which you can totally use in the cloud for secure node to node comms if you're worried about those things.
> Confidential computing (Intel SGX, ARM TrustZone, AMD SEV-SNP) handles this by encrypting the virtual machine memory so that even having full root on the host does not expose VM compute or memory.
Google's current confidential compute offering does not prove at runtime that it's actually confidential. You just get a bit in your cloud console saying 'yep it's confidential' (and some runtime CPU bit too, but that's easily spoofable by a compromised hypervisor), but no cryptographically verifiable proof from AMD that things actually are confidential.
Yes, Google tries to abstract SEV from you, but it is SEV-SNP that we really need for this. Our account manager confirmed they’re not offering SEV-SNP yet.
Most people's threat models do not assume Amazon or Google are threats. Especially when you sign very large contracts with them, the law is enough to keep them in check.
You should absolutely consider your cloud provider a threat. What happens in a black swan event where a provider is completely compromised? Design around zero-trust networks.
By all means, but then are you assuming that your suppliers are a threat? Did you check every chip on the motherboard that comes in, verify the firmware and BIOS on all components, including the firmware of webcams and SSDs? Who inspected the source code of every driver? Did you vet every employee, and what did you do about the Intel Management Engine?
All these measures are not feasible unless you are working in national security or at a megacorp, and insisting on one of them while ignoring the others is daft.
Supply chain is still an issue in sovereign clouds. At some point there's still a trust decision, whether that's to trust the cloud provider, the hardware manufacturer, the chip manufacturer, etc.
For organisations with the resources to deal with an APT, great lengths are gone to in order to verify that the supply chain is trusted all the way down to the chip manufacturer. The hardware used isn't just bought from Best Buy and given a huge dose of trust; instead there are many, many steps to verify that, e.g., the hard drives are using the expected firmware version. You spend as much as you can on the whole process, but if your threat model includes the CIA, China, and the FSB, it's exceedingly expensive.
I wish that were true but it's really not. At least not within the public sector, maybe wealthier private firms can afford to do that level of verification.
Anyway, even then you still need to make trust decisions. How do you verify the ICs in your HDD haven't been tampered with? How do you know the firmware wasn't built with a malicious compiler? Or that a bad actor didn't add a backdoor to the firmware? Realistically there's a lot of components in modern computers that we have no choice but to trust.
It really depends on your threat model. It is not always unreasonable.
Target trusted their HVAC management firm so much that they had full unsegmented access to the LAN in each store. The credit card swipe terminals in the same LAN were totally compromised and millions of users had their credit card credentials stolen.
Defense contractors and places that store / manage large amounts of money are totally within their mandates to trust no one, not even many of their own employees.
Right, I'm familiar with the hack. My point is Target almost certainly didn't decide that the HVAC firm could be trusted to have access to the credit terminals - the fact that they had access was the result of poor security design, not Target's threat model.
It's the everything always part of the argument that's unreasonable. You realise that that's impossible? You can't vet and control the whole stack. And, if you could, it would be prohibitively expensive.
Ok fair. I see the lack of simple things like segmented vlans as a lack of a threat model entirely. They trusted them implicitly, not explicitly, through their clear incompetence. Perhaps that’s better?
Sure you must always put some levels of trust in 3rd parties. What level of trust is the important question. Ideally, you distribute that trust among several actors so a single compromise is not too much of a deal.
That's why you use different hardware vendors for your routers and servers, another vendor for your network connectivity, and yet other vendors for your software. This way, MiTM is mitigated by TLS (or equivalent) and server compromise is mitigated by a good firewall and network inspection stack. Placing all your eggs in a single Google basket is giving a lot of power to a single "don't be evil" corporation, who may get hacked or compelled by law enforcement to spy on you and your clients.
Do it right, and you might mitigate threats, but do it wrong, and you are introducing more points where you could be compromised - a single supplier can be audited, a hundred cannot.
That's the issue: Amazon and Google are dependencies that are overlooked as too big to fail. Anything overlooked because it "can't fail" is the perfect place to attack.
But if you don't, your threat model is a work of fiction and you're wasting your time play-acting.
A threat model has no basis in reality if you do not accurately model threats, and your infra vendor is a glaringly obvious threat. Now, maybe that's a risk worth the tradeoffs, but how do you know that?