
Awesome, Google... Now learn what an availability zone is and stop creating them as firewalled partitions within the same data center.

Oh, and make your data centers smaller. Not so big that they can be seen on Google Maps. Otherwise, you will be unable to move those whale-sized workloads to an alternative.

https://youtu.be/mDNHK-SzXEM?t=564

https://news.ycombinator.com/item?id=35713001

"Unmasking Google Cloud: How to Determine if a Region Supports Physical Zone Separation" - https://cagataygurturk.medium.com/unmasking-google-cloud-how...




Making a datacenter not visible on Google Maps, at least in most big cities where Google zones are deployed, would mean making it smaller than a car. Or even smaller than a dishwasher.

If I check London (where europe-west2 is kinda located) on Google Maps right now, I can easily discern manhole covers or people. If I check Jakarta (asia-southeast2), things smaller than a car get blurry, but you can definitely see them.


Your comment does not address the essence of the point I was trying to make: if you have one monstrous data center instead of many smaller ones, you are putting too many eggs in one giant basket.


What if you have dozens of big data centers?


To reinforce your point:

The scale of cloud data centres reflects the scale of their customer base, not the size of the basket for each individual customer.

Larger data centres actually improve availability through several mechanisms: more power components, such as generators, mean the failure of any one costs just a few percent of capacity instead of causing a total blackout. You can also partition core infrastructure like routers and power rails into more fault domains and update domains.

Some large clouds have two update domains and five fault domains on top of three zones that are more than 10 km apart. You can't beat those ~30 individual partitions with your own data centres at a reasonable cost!
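To make the arithmetic above explicit, here is a tiny sketch. The zone/domain counts are the ones quoted in the comment; the generator count is a made-up illustration of the "failure of any one is just a few percent" point:

```python
# Partition count for the configuration quoted above:
# 3 zones, each split into 2 update domains x 5 fault domains.
zones = 3
update_domains = 2
fault_domains = 5

partitions = zones * update_domains * fault_domains
print(partitions)  # 30 individual failure/update partitions

# With N redundant generators, losing one removes roughly 1/N of
# power capacity instead of causing a total blackout.
generators = 20  # hypothetical count for a large facility
capacity_lost = 1 / generators
print(f"{capacity_lost:.0%}")  # a few percent, not 100%
```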


I provided three different references. Despite the massive downvotes on my comment (by Google engineers, I guess, treating it as a troll :-) ), I take comfort in the fact that nobody was able to produce a reference proving me wrong.


An AWS Availability Zone is sort-of-roughly-kinda a GCP region. It sounds like you want multi-region: https://cloud.google.com/compute/docs/regions-zones


> It sounds like you want multi-region

If you use Google Cloud... with the ~100 ms of latency that will add to every interaction...
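A rough sketch of why that figure stings: per-request latency compounds across sequential cross-region round trips. The 100 ms is the comment's own estimate, and the call count here is hypothetical:

```python
CROSS_REGION_RTT_MS = 100  # the comment's estimate; real RTTs vary by region pair
sequential_calls = 5       # hypothetical chatty interaction (auth, read, write, ...)

added_latency_ms = sequential_calls * CROSS_REGION_RTT_MS
print(added_latency_ms)  # 500 ms added to a single user interaction
```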


You haven't actually made an argument.

It is true that the nomenclature "AWS Availability Zone" has a different meaning than "GCP Zone" when discussing the physical separation between zones within the same region.

It's unclear why this is inherently a bad thing, as long as the same overall level of reliability is achieved.


The phrase "as long as the same overall level of reliability is achieved" is logically flawed when discussing physically co-located vs. geographically separated infrastructure.


Justify that claim.

In my experience, the set of issues that would affect two buildings close to each other, but not two buildings a mile apart, is vanishingly small: usually just last-mile fiber cuts or power issues (which are rare and mitigated by having multiple independent providers), as well as building fires (which are exceedingly rare; we know of perhaps two of notable impact in more than a decade across the big three cloud providers).

Everything else is either done at the zone level no matter what (onsite repair work, rollouts, upgrades, control plane changes, etc.) or can impact an entire region (non-last-mile fiber or power cuts, inclement weather, regional power starvation, etc.).

There is a potential gain from physical zone isolation, but it protects against a relatively small set of issues. Is it really better to invest in that, or to invest the resources in other safety improvements?


I think you're underestimating the seriousness of a physical event like a fire. Even if the likelihood of these things is "vanishingly small", the impact is so large that it more than offsets that. Taking the OVH data center fire as an example: multiple companies completely lost their data and are effectively dead now. When you're talking about a company-ending event, many people would consider even two examples per decade a completely unacceptable failure rate. And it's more than just fires: we're also talking about tornadoes, floods, hurricanes, terrorist attacks, etc.

Google even recognizes this and suggests that, for disaster recovery planning, you should use multiple regions. AWS, on the other hand, acknowledges some use cases for multiple regions (mostly performance or data sovereignty) but maintains that if your only concern is DR, a single region should be enough for the vast majority of workloads.

There's more to the story though, of course. GCP makes it easier to use multiple regions, including things like dual-region storage buckets, or just making more regions available for use. For example GCP has ~3 times as many regions in the US as AWS does (although each region is comparatively smaller). I'm not sure if there's consensus on which is the "right" way to do it. They both have pros and cons.


What happened in the GCP Paris region then?


One of the vanishingly small set of issues I mentioned.

It is true, and obvious, that GCP and AWS and Azure use different architectures. It does not obviously follow that any of those architectures are inherently more reliable. And even if it did, it doesn't obviously follow that any of the platforms are inherently more reliable due to a specific architectural decision.

Like, all cloud providers still have regional outages.


I think you should have started this discussion by disclosing you work at Google...

> One of the vanishingly small set of issues

At your scale, this attitude is even more concerning since the rare event at scale is not rare anymore.


I think you're abusing the saying "at scale, rare events aren't rare" (https://longform.asmartbear.com/scale-rare/ etc.) here. It is true that when you are running thousands of machines, events that happen rarely happen often, but that scale usually becomes relevant at thousands, or hundreds of thousands, or millions of things (https://www.backblaze.com/cloud-storage/resources/hard-drive...).

That concept is useful when the scale of things you have is the same order of magnitude as the rate of failure. But we clearly don't have that here, because even at scale, these events aren't common. Like I said, there have been, across all cloud providers, less than a handful over a decade.

Like, you seem to be proclaiming that these kinds of events are common and, well, no, they aren't. That's why they make the top of HN when they do happen.
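The "rare at scale" arithmetic both sides are gesturing at can be sketched as follows. Both numbers here are made-up illustrations, not measured figures:

```python
# P(at least one event in a year) = 1 - (1 - p) ** n
p_per_facility = 1e-4  # hypothetical annual probability of a serious fire
facilities = 100       # hypothetical fleet size

p_any = 1 - (1 - p_per_facility) ** facilities
print(f"{p_any:.2%}")  # roughly 1% per year across the whole fleet: still uncommon
```

The scaling law is real, but with event rates this low it takes a very large fleet before "rare" stops meaning rare.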


To address the availability point of your comment: Google's terminology is slightly different from AWS's.

On GCP it sounds like you want to have a multi region architecture, not multi-zone (if you want firewalls outside the same data center).

> Resources that live in a zone, such as virtual machine instances or zonal persistent disks, are referred to as zonal resources. Other resources, like static external IP addresses, are regional. Regional resources can be used by any resource in that region, regardless of zone, while zonal resources can only be used by other resources in the same zone.

https://cloud.google.com/compute/docs/regions-zones

(No affiliation with Google, just had a similar confusion at one point)
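Not anything from Google's own tooling, just a sketch of the naming convention the docs above rely on: a GCP zone name is its region name plus a single-letter suffix, so you can map zones to regions with a string split.

```python
def zone_to_region(zone: str) -> str:
    """A GCP zone like 'europe-west2-a' is its region name plus a letter suffix."""
    return zone.rsplit("-", 1)[0]

# Group some example zones by the region they belong to:
for z in ["europe-west2-a", "europe-west2-b", "asia-southeast2-a"]:
    print(z, "->", zone_to_region(z))
```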


You also need to go multi-region with AWS. I liked their AZ story, but in practice it hasn't avoided multi-zone outages (maybe deploys?).


[flagged]


This isn't even close to true. You can just go on Google Maps and visually see the literally *hundreds* of wholly-owned and custom-built data centers from AWS, MS, and Google. Edge locations (like Cloud CDN) are often in colos, but the main regions' compute/storage are not. Most of them are even labeled on Google Maps.

Here are a couple of search terms you can type into Google Maps to see a small fraction of what I mean:

- "Google Data Center Berkeley County"

- "Microsoft Data Center Boydton"

- "GXO council bluffs" (two locations will appear, both are GCP data centers)

- "Google Data Center - Henderson"

- "Microsoft - DB5 Datacentre" (this one is in Dublin, and is huuuuuge)

- "Meta Datacenter Clonee"

- "Google Data Center (New Albany)" (just to the east of this one is a massive Meta data center campus, and to the immediate east of it is a Microsoft data center campus under construction)

And that's just a small sample. There are hundreds of these sites across the US. You're somewhat right that a lot of international locations are colocated in places like Equinix data centers, but even then it's not all of them, and it varies by country (for example, in Dublin they mostly all have their own buildings, not colo). If you know where to look and what the buildings look like, the custom-built and self-owned data centers from the big cloud providers are easy to spot, since they all have their own custom design.


While the OP is more wrong than right, they aren't completely incorrect.

I'm in Australia.

GCP has 2 regions in Australia, Sydney and Melbourne. The Sydney region is in an Equinix DC. Not sure where the Melbourne one is, but it isn't a Google-owned facility.

You can see this by comparing Google's Data Center list: https://www.google.com/about/datacenters/locations/ vs their Cloud Location list https://cloud.google.com/about/locations#asia-pacific

Note that the Cloud Locations aren't just "edge": they offer hosting, GPUs, etc. at these locations.


Thank you for your words of support @nl.

I agree with you that @anewplace is clearly taking a very US/North America-centric view of the world, arrogantly claiming they know everything, and telling me I'm some idiot.

It's very telling that @anewplace has gone quiet and isn't lecturing you in a condescending manner about how you must have "misunderstood".


[flagged]


Yeah, you're not the only "insider" here. And you're 100% wrong. Just because you completely misunderstand what those Amazon/MS employees are doing in those buildings doesn't mean that you know what you're talking about.

The big cloud players have the vast majority of their compute and storage hosted out of their own custom-built and self-owned data centers. The stuff you see in colos is just the edge locations like CloudFront and Cloud CDN, or the new-ish offerings like AWS Local Zones (which are a mix between self-owned and colo, depending on how large the Local Zone is).

Most of this is publicly available by just reading sites like datacenterdynamics.com regularly, btw. No insider knowledge needed.


This doesn't seem right for GCP.

Compare https://cloud.google.com/about/locations vs https://www.google.com/about/datacenters/locations/

The Cloud locations aren't just edge locations (scroll down on that page and note most have all APIs supported) and there are a lot more of them than there are Google-owned DCs.


[flagged]


Well those people lied to you then, or more likely there was a misunderstanding, because you can literally just look up the sites I mentioned above and see that you're entirely incorrect.

You don't need to be under NDA to see the hundreds of billions of dollars worth of custom built and self-owned data centers that the big players have.

Hell, you can literally just look at their public websites: https://www.google.com/about/datacenters/locations/

I am one of those "pay grades many layers higher", and I can personally confirm that each of the locations above is wholly owned and used by Google, and only Google, which already invalidates your claim that "you can count the wholly-owned sites on one hand". Again, this isn't secret info, so I have no issue sharing it.


I assume they are referring to PoPs, or other locations where large providers house frontends and other smaller resources.


Ignore him.

He's got a tinfoil hat on and won't be persuaded..

> Because I'm not relying on what one person or one company told me, my facts have been diligently and discretely cross-checked.

"Discretely cross-checked" already tells me he chooses to live in his own reality.


[flagged]


I'm not trying to make you divulge anything. I don't particularly care who you talk to, or who you are, nor do I care if you take it as a "personal insult" that you might be wrong.

You are right that it would be nuts that multiple senior people would collude to lie to you, which is why it's almost certainly more likely that you are just misunderstanding the information that was provided to you. It's possible to prove that you are incorrect based on publicly available data from multiple different sources. You can keep being stubborn if you want, but that won't make any of your statements correct.

You didn't ask for my advice, but I'll give it anyway: try to be more open to the possibility that you're wrong, especially when evidence that you're wrong is right in front of you. End of story.


You know, it is very much region-dependent.

You are correct that many facilities are owned by the hyperscalers, but they also extensively use colos for hosting entire regions (not only PoPs), especially outside the US. More recently I'd also include Ireland.

I have worked at two cloud providers very close to the netops teams because of my customers, but I have signed NDAs, so I won't go further into it, especially since one of my ex-employers is very touchy about this subject.


I see anewplace hasn't come down on you (@rescbr) like a ton of bricks like he did for me.

You are basically saying the same thing I did, but with different words.

So anewplace owes me a big-time apology.


> try to be more open to the possibility that you're wrong,

Same to you chum, same to you.

Just because it doesn't look that way from your view of the world, doesn't mean you are right either.

Perhaps just accept we are both right and that you are missing aspects of my context that I cannot talk about due to the sensitive nature of it.


It can be true that all the big clouds/cdns/websites are in all the big colos and that big tech also has many owned and operated sites elsewhere.

As one of these big companies, you've got to be in the big colos, because that's where you interconnect and peer. You don't want a full datacenter installation at one of these places if you can avoid it, because costs are high; but building your own has a long timetable, so it makes sense to put things into colos from time to time, and of course things get entrenched.

I've seen datacenter lists when I worked at Yahoo and Facebook, and it was a mix of small installations at PoPs, larger installations at commercial colo facilities, and owned-and-operated data centers. Usually new large installations were owned and operated, but it took a long time to move out of commercial colos too. And then there are also whole-building leases, from companies that specialize in that. Outside the US, there was more likelihood of being in commercial colo, I think because of logistics, but at large system counts the dollar efficiency of running it yourself becomes more appealing (assuming land, electricity, and fiber are available).


It is true that every cloud provider uses some edge/colo infra, but it is not true that most (or really any relevant amount of) processing happens in those colo/edge locations.

Google lists their dc locations publicly: https://www.google.com/about/datacenters/locations/

AWS doesn't list the campuses as publicly, but https://aws.amazon.com/about-aws/global-infrastructure/regio... shows the AZ vs. edge deployments, and any situation with multiple AZs is going to have whole buildings operated by Amazon, not just floors.

And limiting it to just outside the US, both AWS and Google have more than ten wholly owned campuses each, and then on top of that there is edge/colo space.



