My jaw hit the floor when it said that Pinterest runs on AWS. AWS is fantastic, but I would've expected the performance-per-dollar ratio to be way below that of your own colo'd hardware at Pinterest scale -- and that it would leave you spending talent on trying to optimize your software, instead of just throwing more hardware at your operation.
Is this common now, that Pinterest-sized businesses are running on AWS?
Zynga bet on being able to build out their own datacenter cheaper than what it cost them on AWS. They took this experiment very far and ultimately couldn't beat AWS. There are a lot of hidden costs in AWS-level datacenter management that don't show up in the simple dedicated-hardware-vs.-AWS comparisons you might be used to looking at.
Zynga's case was more nuanced than that story lets on. Zynga built out its datacenter capacity assuming a baseline of users which, in hindsight, was optimistic and left them with significant excess capacity and costs.
It actually can make more sense for larger websites than for smaller ones, as Pinterest can scale with demand and thus run 50% or fewer VMs during quiet periods (e.g. the middle of the night).
I'm not exactly sure how you jumped from them using AWS to the assumption that they're "throwing more hardware at [their] problems" rather than optimizing their software. AWS or other cloud services don't really indicate a particular priority for internal development, just like first-party hardware doesn't (and there are companies that throw additional hardware at problems, just like there are companies that throw extra virtual capacity at them).
> It actually can make more sense for larger websites than for smaller ones, as Pinterest can scale with demand and thus run 50% or fewer VMs during quiet periods (e.g. the middle of the night).
You do understand we can buy things at 33% of the price AWS charges, right? And have 24/7 access?
Oh, did you know you can do the same thing at other providers for cheaper? :|
Is it not possible that the peak load is more than 3x the average load? With your own hardware you have to be provisioned for peak load 100% of the time.
Even if you take your premise at face value, AWS is literally the most expensive option.
Personally, I've never seen a real-world use case where you dropped below 33% capacity. In practice, you can get hourly billing cheaper than AWS anyway, from providers large enough to meet most customers' needs.
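To make the arithmetic in this sub-thread concrete, here's a toy back-of-envelope sketch in Python. Everything in it is a made-up placeholder except the rough "cloud costs ~3x per hour" premise and the 33% threshold mentioned above; it just shows where the break-even sits if those numbers held.

```python
# Toy break-even sketch: owned/colo'd gear is provisioned for peak 24/7,
# while a cloud fleet scales down to average demand but pays ~3x per hour.
# All figures below are hypothetical placeholders, not real pricing.
PEAK_SERVERS = 90                   # servers needed at peak (made up)
OWNED_COST_PER_SERVER_HOUR = 0.10   # all-in $/hr for owned gear (made up)
CLOUD_PRICE_RATIO = 3.0             # assumed cloud premium over owned gear
HOURS_PER_MONTH = 730

def monthly_costs(avg_utilization_of_peak):
    """Return (owned, cloud) monthly cost for a given average utilization of peak."""
    owned = PEAK_SERVERS * OWNED_COST_PER_SERVER_HOUR * HOURS_PER_MONTH
    cloud = (PEAK_SERVERS * avg_utilization_of_peak
             * OWNED_COST_PER_SERVER_HOUR * CLOUD_PRICE_RATIO * HOURS_PER_MONTH)
    return owned, cloud

for util in (0.25, 1 / 3, 0.50, 0.75):
    owned, cloud = monthly_costs(util)
    print(f"avg utilization {util:5.1%}: owned ${owned:8,.2f}/mo vs cloud ${cloud:8,.2f}/mo")
```

Under those assumptions the two columns match at exactly 1/3 average utilization; below it the elastic fleet wins, above it the flat-priced gear does, which is the disagreement in the comments above.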
Linode, and most alternatives, don't have S3/Blob storage as a service... also missing are distributed key-value storage as a service and SQL as a service.
Generally, with any given service, as you scale it makes sense to run your own instances... but realistically, someone who knows insert-tech-here better than AWS/Azure's admins is a pretty rare breed, and having to in-source those specific skills costs time and money; redundancy even more so.
If you have a team of 3-5 people, you can do far more with AWS or Azure than you can with most alternatives... Employees aren't free.
Really? Could you elaborate on these other providers? I would expect, at least: VPS, remote files (S3/Azure Blob), a distributed key-value store as a service, some SQL variant as a service, and load balancing as a service. Most of the alternatives I've seen don't offer those services, and most only do VPS.
AWS's margin is so much higher than the cost of your own physical hardware that it's cheaper to throw your own hardware at the problem instead of AWS resources (and their built-in margin) before finally contemplating spending expensive engineer time on it.
AWS's margin is only relevant if you can be as efficient as Amazon. If you don't need 100% of your servers 24/7, then Amazon is likely to be more efficient internally. It's also highly negotiable at scale: do $10k per month and you get standard rates; do $200k and you get a little leverage; do $2M per month and it can be a level playing field.
Cost savings from moving to your own hardware start at around $50K/month of AWS spend. I'd imagine the savings are enormous at $2MM/month, although I'd need to dig my spreadsheet out to know for sure.
I doubt you can get accurate numbers for 2MM/month accounts without actually having a 2MM/month account to shop around with. Though if you have real numbers I would love to see them.
It's not.
Most people forget maintenance costs. They rise much higher in non-cloud environments.
Instead of just tracking your load and buying servers, you need to track EVERYTHING.
Hardware fails, you replace it, etc.
Also networking: it's really, really hard to run a big network. AWS VPC is easy, AWS VPC is predictable. A normal network is not predictable.
People forget about a lot of things, especially that it isn't that easy to buy hardware and put it into a datacenter.
Are you renting the datacenter or buying it? What about the climate? Do you need special cooling, etc.? What about rising electricity costs?
Or higher network costs to your Tier 2 or Tier 1 provider?
On AWS a box costs X cents per hour, and those prices will most likely keep falling. That's all you need to pay.
We run on AWS and, especially due to the networking, we pay less than we would in a normal datacenter.
Mostly because they want money for everything. Even at small scale, just buying 3-4 AWS instances upfront is really, really cheap.
I do DevOps/infrastructure for a living. I've shown the math to both my day gig and numerous consulting clients: AWS is 30-40% more expensive, even when taking reserved-instance pricing into account. External networking especially is super expensive compared to traditional transit pricing.
They're absolutely wonderful for bursty or async workloads, but if you know your load, you're better off on your own gear.
I didn't mean external networking.
Connect two datacenters and see what costs come up.
Connect machines inside your datacenter, etc.
Your maintenance costs will rise.
AWS is cheap. Whatever you did, you operated at a really small scale where money didn't matter, or you completely forgot a lot in your calculation.
> AWS is cheap. Whatever you did, you operated at a really small scale where money didn't matter, or you completely forgot a lot in your calculation.
I've managed thousands of instances in AWS, and I've managed thousands of servers in colos, either owned (DOE facility, web hosting startup, consulting firm) or with the space leased (Equinix, Savvis, Cogent, IBM).
You could most definitely run elastic workloads in your own datacenter now using Docker containers, and one of the many orchestration frameworks that exist for it.
As I mentioned above, if you don't have a predictable workload, AWS is great. Exceptional, even. But if you know your workload and can hire 2-3 people to manage it (which you should be doing at scale), it's cheaper to get your own gear and colo it.
> You could most definitely run elastic workloads in your own datacenter now using Docker containers
Have you actually done this, though? Because I have extensive experience with both Docker and OpenStack and I am sorry but neither comes close to AWS for stability and reliability. They are both really buggy.
Also, you seem to be completely missing the other side of AWS, which is the software. Using ELB, SNS, SQS, RDS, etc. can save a lot of money.
> and I am sorry but neither comes close to AWS for stability and reliability.
All of my instances in us-east-1 lost outbound connectivity Sunday morning for half an hour, and an hour last night. Tell me about this stellar stability and reliability.
As long as you design your application properly, it'll be reliable whether you run it in AWS, on OpenStack, or even on bare metal. StackOverflow does it, so you can do it too (they colo all of their equipment in NYC). I have not done this with Docker or OpenStack, but I have with Xen and KVM virtualization (Docker would be about the same as using LXC).
ELB? HAProxy. RDS? MySQL/PostgreSQL/MS SQL (in that order), unless you absolutely have zero management experience with them. I concede SQS and SNS have no open source counterpart; you'd have to roll your own.
I am not talking about the individual nodes. I am talking about the overall platform.
This idea that you can design your application properly and it will be reliable anywhere is ridiculous. If the underlying platform is fundamentally flawed (i.e. more than just nodes coming/going) then your options are very limited. Docker and OpenStack are examples of platforms that are really buggy when used in Production settings.
AWS as a platform is very stable and of a generally high quality. And components like ELB, RDS, etc. are managed. Trying to manage the hardware AND all the core infrastructure components on top of it is more than just one person's job, like you mentioned before.
> This idea that you can design your application properly and it will be reliable anywhere is ridiculous.
That's not true at all. If you've designed your application so that any instance could fail at any time, it doesn't matter where your instances are hosted: AWS, DigitalOcean, OpenStack, or your own bare-metal servers. As long as you've architected the environment to pass traffic only to healthy nodes (you're doing service checks, aren't you?) and healthy microservices (again, service checks), you can promote slaves to masters for your datastores, and you have sufficient capacity for failures, then yes, you can design your application to be fault-tolerant on any infrastructure.
TL;DR If your app is designed for high availability and is already structured to scale out horizontally, it doesn't matter what the underlying infrastructure is.
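For what it's worth, the "service checks" piece of this is small. Below is a minimal, hypothetical sketch in Python of the kind of health endpoint a load balancer (ELB, HAProxy, whatever) could poll to pull an unhealthy node out of rotation. The datastore host/port are placeholder assumptions, and a real check would run an actual cheap query rather than just opening a TCP connection.

```python
# Minimal health-check endpoint sketch (hypothetical names throughout).
# A load balancer polls GET /healthz and stops routing traffic to this
# node whenever the endpoint returns a non-200 status.
from http.server import BaseHTTPRequestHandler, HTTPServer
import socket

DB_HOST, DB_PORT = "db.internal.example", 3306  # placeholder datastore address

def datastore_reachable(timeout=0.5):
    """Cheap TCP reachability check; a real check would run a trivial query."""
    try:
        with socket.create_connection((DB_HOST, DB_PORT), timeout=timeout):
            return True
    except OSError:
        return False

class Health(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":
            self.send_response(404)
            self.end_headers()
            return
        healthy = datastore_reachable()
        self.send_response(200 if healthy else 503)
        self.end_headers()
        self.wfile.write(b"ok" if healthy else b"unhealthy")

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), Health).serve_forever()
```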
managing != operating them.
You could manage a thousand servers, but ONE PERSON just can't operate thousands of servers.
Also, how did you manage file access in a scalable manner? Did you set up HDFS? And if so, how did you provide your developers with that space? Did you create some kind of infrastructure for that (libraries, etc.)?
There are so many things that just don't work out.
As I already said, you just managed it. Other people did the work to make that possible, and they needed a lot of time to get things done.
Take a look at Google: they are always trying to bring costs down through things like Borg, Kubernetes, etc., but they throw thousands of people at those problems. The same goes for networking: they have a huge team of PROGRAMMERS doing everything they can to make managing their network easier.
Also, if you don't need things all the time, why should you buy them?
(The below was ~5 years ago; it would be even easier today)
> You could manage a thousand servers, but ONE PERSON just can't operate thousands of servers.
Yes, I literally managed AND operated ~5500 Linux servers in a facility. Remote power management, serial console access, PXE booting. I would replace any failed hardware when convenient, usually after my morning coffee, or in the evening if I wasn't in a rush to get home.
> Also, how did you manage file access in a scalable manner?
Shared NFS with work distributed to servers on-demand or scheduled, based on workload.
> And if so, how did you provide your developers with that space?
Limited SSH access to their environment; disk space was provisioned in minutes. Scheduled jobs could be executed using an open source scheduling framework.
> Did you create some kind of infrastructure for that (libraries, etc.)?
No, the majority of the tooling was a few thousand lines of bash scripting.
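(Purely as an illustration of what that kind of tooling does, and not the parent's actual scripts, which were bash: a rough Python sketch of a sweep-and-report pass over a fleet. The hosts file path is a made-up placeholder.)

```python
# Hypothetical fleet sweep: walk a host list, flag anything that doesn't
# answer an SSH no-op, and print the repair queue to deal with later.
import subprocess
from concurrent.futures import ThreadPoolExecutor

HOSTS_FILE = "/etc/fleet/hosts.txt"  # placeholder: one hostname per line

def ssh_alive(host):
    """True if the host answers a passwordless SSH no-op within 5 seconds."""
    result = subprocess.run(
        ["ssh", "-o", "ConnectTimeout=5", "-o", "BatchMode=yes", host, "true"],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def sweep():
    with open(HOSTS_FILE) as f:
        hosts = [line.strip() for line in f if line.strip()]
    with ThreadPoolExecutor(max_workers=64) as pool:
        results = dict(zip(hosts, pool.map(ssh_alive, hosts)))
    return [host for host, ok in results.items() if not ok]

if __name__ == "__main__":
    dead = sweep()
    print(f"{len(dead)} host(s) need attention:")
    for host in dead:
        print(f"  {host}")
```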
> There are so many things that just don't work out. As I already said, you just managed it. Other people did the work to make that possible, and they needed a lot of time to get things done. Take a look at Google: they are always trying to bring costs down through things like Borg, Kubernetes, etc., but they throw thousands of people at those problems. The same goes for networking: they have a huge team of PROGRAMMERS doing everything they can to make managing their network easier. Also, if you don't need things all the time, why should you buy them?
Not knowing how to do something does not make it impossible or hard; it just makes it an unknown to you.
Are you saying that if something major broke at 2AM on a Sunday, you were the only person who would be available to show up and fix it? If so, you're adding a lot of risk of major downtime (e.g. you're hit by a bus). If not, then you did not manage 5,500 servers by yourself.
PS: I could see a team of 10 people handling 55,000 servers though. But that's serious scale and fairly lean.
I never said I was the only person; we were talking infrastructure cost savings, not human resources. Using AWS doesn't magically mean everything manages itself.
To directly answer your question, people from other teams were available if for some reason I was not.
Human resources are a major component of infrastructure costs. Sure, for small teams you can have someone doing everything, but at scale you lose the fudge factor.
I would also be suspicious of 1:10,000+ numbers, as they often don't include people who are part of the server team but not standing in the data center: e.g. Facebook's 3-person dev team, a manager, whoever designs their HW, the team spun up to do mass HW installations, etc.
In the end, AWS covers a lot of separate costs which only really become obvious as you keep scaling up.
Most applications/companies don't need to scale to hundreds of servers, let alone thousands... Having enough employees to set up and manage that level of infrastructure has a not-insignificant cost. If your dev+ops team doesn't need as much specialized knowledge and can get more done with AWS, Azure, or similar, then there are definitely costs to be saved. Of course, the more of the AWS/Azure infrastructure you leverage, the more locked in you are...
I'd like to see a similar analysis for, say, 50-100 servers, to see if AWS really costs more vs. colocation when you add in the resources to set up and maintain the infrastructure.
> As mentioned, Etsy and others have gone against the cloud tide and seem to be making it work. Etsy CTO Kellan Elliott-McCrea told me that the online marketplace has discovered "very real cost savings" and higher utilization by running in its own data centers.
Even your own article contradicts your claim. :P
Yes, for people who can't capacity-plan effectively... paying someone else to have idle capacity makes sense. For the rest of the world, it doesn't.
Most of the world doesn't need hundreds or thousands of servers... most businesses need fewer than fifty or a hundred. And growing is pretty easy... Once you hit a certain scale it absolutely makes sense to consider migrating, but that's the point where you have teams of dedicated employees.
And even then the costs of moving off of the cloud solution may never be worth it.
Umm. Look, I really don't think you know what you are talking about, and you seem really eager to defend your whole "USE AWS BECAUSE IT'S SO EASY" mantra, as you replied to almost all of the posts I made in this thread.
So I'm going to distill this into one post:
> Most of the world doesn't need hundreds or thousands of servers... most businesses need fewer than fifty or a hundred. And growing is pretty easy... Once you hit a certain scale it absolutely makes sense to consider migrating, but that's the point where you have teams of dedicated employees.
Fewer nodes actually means the cloud makes less sense, not more... often you can get away with 10 1Us in different DCs. At that scale, "the cloud" makes even less sense than you think it does. Leasing dedicated servers is cheap compared to the cost of automated deployments of 50 VMs, especially in labor costs. At that point, we are just talking active/passive deployments across 2 datacenters and such.
Amazon really isn't giving you anything, at this scale, that you can't do for yourself. The only time AWS seems to make sense is as a 1-2 person shop where you literally don't have enough time in the day, or with extremely volatile loads where you can't capacity-plan. Most businesses are neither of those.
> And even then the costs of moving off of the cloud solution may never be worth it.
Yes, that is called vendor lock-in, and that is what AWS, etc. rely on. If you don't develop the in-house talent, you can never move off without problems. Waking up one day and going "LOL I WANTS TO MOVE" is a bad idea.
> Really? Could you elaborate on these other providers? I would expect, at least: VPS, remote files (S3/Azure Blob), a distributed key-value store as a service, some SQL variant as a service, and load balancing as a service. Most of the alternatives I've seen don't offer those services, and most only do VPS.
Honestly, if you can't figure this out on your own without relying on AWS at the scale of 3 people... you have no business discussing this, frankly. This is a case of the blind assuming everyone else is blind too.
I have hobby projects with 99.5% uptime that have essentially all of those functions covered for cheaper than AWS.
> Linode, and most alternatives, don't have S3/Blob storage as a service... also missing are distributed key-value storage as a service and SQL as a service.
...this is one of those times where I'm tempted to just run a site that includes all these things as a hobby just to point out how stupid this comment is. I don't really need more hobbies but this YC fanboy behavior is ridiculous. Well, except for the "kv/sql as a service" because that is just silly and can be done just fine with traditional clusters that have existed for years.
> Generally, with any given service, as you scale it makes sense to run your own instances... but realistically, someone who knows insert-tech-here better than AWS/Azure's admins is a pretty rare breed, and having to in-source those specific skills costs time and money; redundancy even more so.
> If you have a team of 3-5 people, you can do far more with AWS or Azure than you can with most alternatives... Employees aren't free.
It seems rare to you because it isn't something you are good at. AWS/Azure simply aren't cost-effective. The fully loaded cost of a competent sysadmin is something like $30/hr. Setting up an internal cluster for any of those things is much, much cheaper than what AWS costs after a year, for literally every business I've ever dealt with IRL.
If I'm running all my own services... now I need to hire people to manage the OSes for the app, as well as systems for databases, key-value stores, backup, mail, load balancing, etc., etc. Few people are good at more than one or two of those things, which means I'd have to hire 2-3 people to cover those skills. 2-3 people at even $100k each ($30/hr plus employer taxes, etc.) is quite a bit more expensive than the extra cost of AWS's services. Those are people you wouldn't necessarily need if you leveraged those services from AWS.
And when you say you can just hire people to set up and run maintenance via contracted services... I've tried hiring a part-time PostgreSQL DBA, and it was pretty much impossible to do... the services I did reach out to mostly just plain didn't respond, even with a quote.
> as well as systems for databases, key-value stores, backup, mail, load balancing, etc., etc. Few people are good at more than one or two of those things, which means I'd have to hire 2-3 people to cover those skills
That just isn't true. If those people were such unicorns, everyone would beg to hire me because I'm worth 2-3 people. :/ I'd be able to get a job anywhere with no trouble.
Given that isn't the case and most people find me unimpressive...yeah. I just think the problem is you, honestly.
I think often you'd rather throw money at the problem and have your engineering talent working on new features. This is one of the nice things about AWS: oftentimes, for the cost of a few hours of engineering time (what you might spend on a meeting to discuss optimisations), you could run faster machine(s) for a few weeks.
Yes. Pinterest is big, but it's not Amazon, Google, or Facebook big. With things like reserved-instance pricing, and not having to spend as much money on a physical ops team that can manage real hardware, there's room left to optimize software and cloud architecture.
Without knowing the specifics of their workload, I would bet that it's probably a draw at their scale. They may be big enough to start considering this sort of transition... but it also takes a significant amount of time, energy, and people to make that transition, so they probably won't do it until it's a no-brainer.
Curious whether the numbers in the article, 20-30k QPS, are their entire DB workload. If so, they have a while to go before they are ready to graduate from the cloud, in my opinion (having no idea of the real meaty specifics, but imagining it's similar in function to many that I have worked on).
GCP seems much less expensive. Compared to Azure, for instance, I've found VMs to cost anywhere from 50% down to 10% of the price. (Not to mention the SSD options aren't idiotically lame.) The GCP portal and its speed also totally obliterate Azure's.
AWS had more complicated pricing via that weird reservation system and "reselling" marketplace. But GCP, as far as I could tell, was still a bit less expensive.
Netflix runs on AWS (at least a lot of it does), which is a bigger business than Pinterest and, more interestingly to me, in direct competition with Amazon.
> 3. We’re not very good at predicting customer growth or device engagement.
That really is a large part of why Netflix runs on AWS. Capacity prediction isn't something they can do reliably due to the nature of their service.
That makes sense for a video service.
For 99% of websites, you are not going to have trouble with capacity prediction on the scale of a 72-hour turnaround for additional capacity. Almost any DC can sell you capacity in under 48 hours, and if it takes you longer than 24 hours to deploy... yeah... that is on you.
Netflix is a bit unusual in that they just need partners with big pipes. Their servers can probably fill up pretty much any pipe with only a relatively modest amount of CPU and memory.
This is as opposed to a big database-driven site like Pinterest that has to custom-build pages for every view, thousands of times every second. They are far more likely to be memory- and CPU-bound than Netflix.
Sure, but I didn't mean to speak to the technical ramifications of the decision to use AWS, rather I was trying to speak to the business ramifications.
That is, AWS has always been a high premium service with the standard story being that you make up for that premium with lower operational costs and more operational sophistication.
But when large (and technically sophisticated) companies like Pinterest and Netflix continue to use it, you have to question that story (and to me, especially in Netflix's case, as they have the added incentive of being a direct competitor to Amazon in their main business).
My current working conspiracy theory is that implicitly or explicitly, the growth of AWS is tied to some sort of regulatory arbitrage and has virtually nothing to do with hardware performance, software resources, or operational sophistication.
That is, AWS allows you to move costs that would have been capital expenditures into other buckets, and that is, in some way that is obscure to me, very valuable from a business point of view.
There are not a lot of Pinterest-sized businesses. :) Amazon is documented as giving these guys some serious discounts. Even a cursory analysis of the costs of running this at scale on AWS shows it's bad value at 'recommended retail' prices.
Would be interesting to see how this compares to RDS: i.e. perhaps all these optimizations are already in place, or perhaps it makes sense not to use RDS and instead optimize MySQL for your own workload.
A question: with such high load, why not run it on dedicated hardware with PCI Express SSDs? You are paying around $1000-2500 per month for that instance type; for that you can get a dedicated machine and reduce the total number of servers.
Awesome. As much as I dislike Google as a business, I love their engineering, and despite coming onto GCP totally biased, they won me over entirely in a day. We're so, so very much happier dealing with GCP. No RAID needed to get performance: just ask for more. No complexity. Simple, damn fast, and cheap.
I cannot imagine how Azure or AWS compete when it comes to IaaS (especially Azure, even without making a cheap comment about their terrible new portal). GCP is just flat-out better.
It's amazing to me that anyone still tries to run MySQL with the glibc allocator. That allocator is not suitable for any kind of multithreaded workload. Use tcmalloc or jemalloc for everything.
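If you want to check whether a running mysqld actually picked up tcmalloc or jemalloc (switching is usually done by preloading the library, e.g. via LD_PRELOAD or mysqld_safe's malloc-lib option), here is a quick hedged sketch in Python that inspects its memory maps; Linux-only, and the helper names are my own:

```python
# Quick allocator check for a running mysqld: scan /proc/<pid>/maps for a
# jemalloc or tcmalloc shared object. Linux-only; helper names are arbitrary.
import subprocess
import sys

def mysqld_pid():
    """Return the first PID reported by `pidof mysqld`."""
    out = subprocess.check_output(["pidof", "mysqld"], text=True)
    return int(out.split()[0])

def loaded_allocator(pid):
    """Report which malloc implementation appears in the process's mappings."""
    with open(f"/proc/{pid}/maps") as maps:
        mapped = maps.read()
    if "jemalloc" in mapped:
        return "jemalloc"
    if "tcmalloc" in mapped:
        return "tcmalloc"
    return "glibc malloc (default)"

if __name__ == "__main__":
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else mysqld_pid()
    print(f"mysqld (pid {pid}) is using: {loaded_allocator(pid)}")
```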