Something I appreciate very much with AWS is how much dogfooding Amazon does. In contrast, Google has said in the past that GCP is not used by Google engineers (and it shows, occasionally GCP goes down but actual Google products do not!). There is also no clear indication that Microsoft is actively using the public deployment of Azure.
Part of it is because GCP became a product much later in Google's history. As a result, most of the products in use were tied to internal Google infra and were hard to put on GCP.
I think this will gradually change. Kubernetes came out of all that internal learning, after all. I would be surprised if they aren't putting new greenfield stuff on GCP and other Google Cloud products.
I would guess at least a number of backend services for Xbox One titles. I remember seeing an advert for Titanfall being run on Azure. That was a few years ago so my memory may be flawed.
I suspect 343 Industries (Microsoft Studio behind the latest Halo games) also uses it. They are one of the big users of Project Orleans https://dotnet.github.io/orleans/, a C# distributed actor framework. Most of their documentation and samples are shown using the framework in Azure.
> In contrast, Google has said in the past that GCP is not used by Google engineers (and it shows, occasionally GCP goes down but actual Google products do not!)
Amazon (the shopping website) is generally online when AWS goes down, so that doesn't really indicate anything.
Also, GCP is directly modeled after the infrastructure that Google uses to develop internally. I don't know if it's actually the same exact infrastructure that they sell as GCP, but it's not like they're developing GCP fresh and selling it.
Netflix is also generally online when (parts of) AWS goes down, because they've architected their services to be resilient to underlying failures (as Amazon and Google have too!). Netflix has blogged quite heavily about this, for example in a relatively recent blog post about multi-region redundancy[0].
This is on point, GCP is modeled after Google's infrastructure not how Google itself is run. Whereas Amazon runs on the same technology just not in the same datacenter.
(My impression is that Amazon operates out of a different region than the public AWS, much like GovCloud isn't the same as the public regions. Hence an AWS region being impacted doesn't necessarily impact Amazon itself, since they're separately operated.)
AWS never went down. What goes down occasionally is a subset of its services in limited regions. And sometimes that does impact the amazon.com website too, just not every aspect of it.
The term "dogfooding" is a tech slang for the use of one's own products. In some uses, it implies that developers or companies are using their own products to work out bugs, as in beta testing. One benefit of dogfooding is that it shows that a company is confident about its products.
I didn't know what the term meant so I had to look it up. Yeah, I agree.
I’m curious, how do you prep for things like Prime Day? I imagine almost all parts of Amazon.com and AWS operate at their limits, especially resource-wise. What precautions do you take?
Jeff covered this in the post (GameDays, excessive amounts of auditing, etc.).
Regarding the resource limits you suggest, they mention metrics of 50+ PB of data movement and 3.34 trillion DynamoDB queries in 30 hours, all of which they elastically scale down after the event... so I'd say resource limitations are more in terms of humans on deck and crisis management than physical limitations of hardware.
(edited to correct size - was 52 PB not 520 PB...)
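To put those totals in perspective, here is a rough back-of-the-envelope sketch of the average rates they imply. It assumes both figures cover exactly the 30-hour event and that "PB" means decimal petabytes; if the totals actually span a longer period, the averages would be lower. The names are purely illustrative.

```python
# Back-of-the-envelope averages from the figures quoted above.
# Assumption: both totals cover the same 30-hour window.
WINDOW_HOURS = 30
DYNAMODB_REQUESTS = 3.34e12  # 3.34 trillion requests
DATA_MOVED_BYTES = 52e15     # 52 PB, assuming decimal petabytes


def average_per_second(total, hours=WINDOW_HOURS):
    """Return the average per-second rate for a total spread over `hours`."""
    return total / (hours * 3600)


avg_requests = average_per_second(DYNAMODB_REQUESTS)
avg_bytes = average_per_second(DATA_MOVED_BYTES)

print(f"{avg_requests / 1e6:.1f} million requests/s on average")
print(f"{avg_bytes / 1e9:.0f} GB/s on average")
```

Under those assumptions this works out to roughly 30.9 million requests per second and about 481 GB/s, averaged over the whole window.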
I can't speak for the whole company, but I've done 5 peaks (holiday seasons) as an SDE in the Warehouse/Delivery orgs.
The key thing to remember about Amazon DevOps is that developers are also DevOps. My team would usually share a dedicated DevOps team with 3 to 6 other teams, usually rotating front-line on-call duty between 2 or 3 people in the US and 2-3 in India. That DevOps person has the job of asking: What is the problem? Do my teams own this problem (if not, redirect to the right place)? Do I know how to immediately fix this? If not, for which of my teams do I page the on-call SDE?
When you have a great DevOps team, SDE oncall duty is a walk in the park. But DevOps people take time to become great- and many of them are also applying for transfers to SDE roles.
Overall, each org decides how on-call/DevOps works. If time and effort are spent investing in stable software, it's easy. If other priorities get in the way, things can get bad.
Sure, and for the second year in a row, the Amazon website told me a price for a deal, told me the deal was still available, and then wouldn't let me purchase it.
Overall things went well for Amazon, but flawless? I think not.
If the AWS cost estimator is not broken, it wouldn't even be _crazy_ expensive at ~8mio/month. Snap with 1 billion committed to AWS over the next 5 years could definitely come close to the same order of magnitude on a consistent basis.
This is a good example of the stupidity of the normal rules that Hacker News enforces about article titles. I am glad they made an exception in this case. I never would have read this if the headline text had been "Prime Day 2017 – Powered by AWS". I did read the article, because in this case, Hacker News allowed the title to highlight what I would find interesting: the incredible performance of DynamoDB.
I don't think the rules actually say you can't edit titles. At best, they ask that you please use the original title unless misleading or linkbait.
You could argue that the original title is misleading, because you would've thought that it was some advertisement (though isn't anything touting large numbers a form of advertising?).
I miss SABLE. I like it better than DDB, but I'm probably in the minority there. It made building high-throughput services easy. I'm somewhat surprised it was mentioned publicly.
I guess it explains why I am not able to find any information related to SABLE. Are you able to release anything related to it? It seems really interesting.
> SABLE is at the core of Amazon.com retail infrastructure. It [handles] more than 400 billion requests per day from Amazon.com websites, Web Services, and internal Amazon systems.
If they operate like lots of other big corps, I would think the retail arm actually would have to pay AWS for the service. Obviously zero sum, but you can't know your true profit and loss for individual business units if you don't.
It's worth clarifying that question. What does "Go Down" mean? One region is down? Is it a bad software bug?
The database itself is fault tolerant. Nodes can go down all day without any real consequence. Each region is isolated, so a failure in one region won't affect other regions.
I guess I'm at a loss for the type of failure with DDB that would make it a SPOF, especially more than other datastores.
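To illustrate the "nodes can go down without real consequence" point, here is a toy sketch of majority-quorum replication. This is an assumption for illustration only, not DynamoDB's actual design: `Node`, `ReplicatedStore`, and the versioning scheme are hypothetical names and choices made up for this example.

```python
# Toy model, NOT DynamoDB's real implementation: each key is replicated on
# every node, and an operation succeeds while a majority of nodes is up.
from dataclasses import dataclass, field


@dataclass
class Node:
    up: bool = True
    data: dict = field(default_factory=dict)  # key -> (version, value)


class ReplicatedStore:
    def __init__(self, replicas=3):
        self.nodes = [Node() for _ in range(replicas)]
        self.quorum = replicas // 2 + 1  # strict majority

    def _live(self):
        return [n for n in self.nodes if n.up]

    def put(self, key, value):
        live = self._live()
        if len(live) < self.quorum:
            return False  # too many nodes down, reject the write
        # Next version is one past the newest version any live node holds.
        version = 1 + max((n.data[key][0] for n in live if key in n.data), default=0)
        for n in live:
            n.data[key] = (version, value)
        return True

    def get(self, key):
        live = self._live()
        if len(live) < self.quorum:
            raise RuntimeError("quorum lost")
        found = [n.data[key] for n in live if key in n.data]
        # Any two majorities overlap, so the newest version is always visible.
        return max(found, key=lambda vv: vv[0])[1] if found else None


store = ReplicatedStore(replicas=3)
store.put("order-1", "placed")
store.nodes[0].up = False               # one node fails
store.put("order-1", "shipped")         # still succeeds with 2 of 3 nodes
store.nodes[0].up = True                # stale node comes back...
store.nodes[1].up = False               # ...and a different node fails
print(store.get("order-1"))             # newest version still wins
```

In this model a single node failure is invisible to callers; only losing a majority at once rejects writes, which is the kind of correlated failure that would be needed to make such a store a SPOF.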
They usually introduce changes one region at a time. DDB has had outages many times, and they brought half of AWS operations down. But it's a good question how they kept Amazon.com alive if some part of their systems relies on DDB.
So what? I still got a 500 error while trying to process an order and when I refreshed - sold out. "60% of the time - it works every time" is only good for Panther scented cologne.
12,900,000 / second - WOW. Too bad the request rate was probably closer to 15,000,000 / second and you just abandoned everyone else.
"Prime Day" is a rip-off, and the pricing schemes border on fraudulent. Give yourselves a pat on the back for tricking people into spending money they'd be better off saving.
If you're going to say stuff like this, at least provide a modicum of evidence. What pricing scheme, in particular, is borderline fraudulent? Does it differ fundamentally from events like Black Friday? Why should people save money instead of spending it?
I'm not going to say you're lying but I think you're being eh misleading. This investigation is happening due to increased scrutiny of the Whole Foods merger and isn't a formal investigation by the FTC.
Effectively the FTC is required to do its due diligence and the FTC has not accused Amazon of anything. It's some third-party group making the accusation.
> A study conducted by Consumer Watchdog in March found that for 61% of Amazon products, the pre-discounted price or “reference price” used for comparison with the new sale price was higher than what the products had been sold for in the past 90 days. This means the amount Amazon advertises under “you save” could be inaccurate. Amazon said the prices reflect averages of prices listed by competitors and other sources.
Isn't this what brick and mortar retailers do all the time? The "save" amount is (in my experience) often based on an inflated list price, which makes it look like you're "saving" more.
Doing so is still dodgy - but what makes it different when Amazon does it vs anyone else?
This sounds pretty tame to me. Hardly the Deep Throat moment you're making it out to be. In fact, I'd wager that most retailers probably do this for big sales events.