I sometimes feel like I'm the only one using EC2 as it was intended.
S3stat's nightly job takes about 60 machine-hours of compute, and it needs to start and finish between 3am and 6am every morning. Amazon kindly keeps 20 machines ready to do that for me and only charges me for the time I'm actually using them. That's pretty amazing, and well worth the price in my mind.
So yeah, if you need a box to run your webserver 24/7/365, you can find a better deal elsewhere. But that's really never been what EC2 is for. To continue the example, S3stat.com lives in a cage at a colo since, as the author points out, that's a much better deal than running it on EC2.
> So yeah, if you need a box to run your webserver 24/7/365, you can find a better deal elsewhere. But that's really never been what EC2 is for.
What Amazon offering is? This is AWS we're talking about. They have a service solution for nearly everything. You're saying in 2013 they still don't have a service for 24/7/365 website hosting?
I know you were asking for effect, but of course they do and it's EC2. That's why they have a heavy reservation pricing tier, which only makes sense for 24/7/365 (you pay for hours even if you don't use them).
But, it's fun to point at "elastic" and tell people they're "doing it wrong" because they don't take a name chosen 7 years ago literally. As if somehow the service (called EC2 virtually everywhere -- not Elastic Compute Cloud) could never evolve beyond that initial use case. Incidentally, the "elastic" in EBS must have a different meaning, because one of its primary selling points is that it's persistent storage.
In other words, they took a system targeted especially at people who needed on-demand computing and, as it got popular, adapted it to the needs of 24/7/365 web hosting by offering an alternate pricing model, point-and-click user interfaces, and additional features and services like EBS and CloudWatch.
The point is absolutely not that EC2 never evolved beyond its initial use case and isn't good at other things.
The point is that while they have 24/7/365 hosting services, there's never been any reason to expect that they would be better at it than anyone else. So why do we continue to see blog posts about not liking EC2 with vague complaints about the horrible price-to-performance ratio getting lots of upvotes?
Because EC2 still seems to be a lot of people's default option for 24/365 servers, even though it isn't particularly good for it. Why is that so? Evidently there haven't been enough blog posts on the subject yet!
On top of costing 8x what I paid at Softlayer (both before I moved to AWS and after I moved back), AWS services just didn't work right. EBS failed often. RDS, which runs on top of EBS, would fail often too. When an entire AZ failed, despite paying double the hourly rate for "Multi-AZ" instances that were supposed to automatically fail over to another zone, nothing failed over, it just failed.
If I need on-demand instances these days, I'll do it at Linode. They bill to the day, so you don't need to commit to a month for temporary instances. It's not as scriptable, but it works and doesn't cost a fortune over renting hardware either. More often, I just get servers with more CPU cores and more RAM than I need so there's plenty of room to absorb spikes.
We use Softlayer too for the same reasons. Have been with them for over 10 years now - since they were called The Planet. Just had one hardware failure during the entire time. They also now offer Cloud computing instances - which can be deployed either hourly or monthly. And you can mix and match your Cloud and dedicated servers - giving you all the flexibility to scale that you get with AWS / EC2.
Our own experience with their Cloud offering is more recent and a bit different. We have been using cloud computing instances running Haproxy as load balancers to route millions of requests to multiple physical web servers for over a year now. Can't generalize - but we have had no problems so far.
We use ELB (Elastic Load Balancer) with an auto-scaling group behind it. The instances get added or removed based on the latency reported by CloudWatch/ELB. During the course of a day, the number of EC2 instances running can vary from 8 to 25.
Usually 8 is enough, but when spikes happen or when the capacity of instances to process stuff drops (which does happen in cloud-computing, based on what neighbours you have and what they are doing), then new instances are started in a matter of minutes.
And this billing by the hour does save us a lot of money, because 8 instances are enough, until they aren't and the traffic is so huge that it would choke and freeze all 8.
Incidentally, this ability is one reason we moved off Heroku.
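To make that setup concrete, here is a minimal sketch of scaling on ELB latency, using boto3 (which postdates this thread; boto 2 was the SDK of the day). The group name, load balancer name, and 500 ms threshold are invented for illustration, not taken from the commenter's actual configuration:

    import boto3

    autoscaling = boto3.client('autoscaling')
    cloudwatch = boto3.client('cloudwatch')

    # Scaling policy: add two instances each time the alarm fires.
    policy = autoscaling.put_scaling_policy(
        AutoScalingGroupName='web-asg',          # hypothetical group name
        PolicyName='scale-out-on-latency',
        AdjustmentType='ChangeInCapacity',
        ScalingAdjustment=2,
        Cooldown=300,
    )

    # Fire the policy when average ELB latency stays above 0.5 s for 3 minutes.
    cloudwatch.put_metric_alarm(
        AlarmName='elb-high-latency',
        Namespace='AWS/ELB',
        MetricName='Latency',
        Dimensions=[{'Name': 'LoadBalancerName', 'Value': 'my-elb'}],
        Statistic='Average',
        Period=60,
        EvaluationPeriods=3,
        Threshold=0.5,
        ComparisonOperator='GreaterThanThreshold',
        AlarmActions=[policy['PolicyARN']],
    )

A mirror-image alarm (latency below some floor) attached to a negative ScalingAdjustment handles the scale-in side.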
Scaling up/down is usually for CPU-bound jobs. EC2 has a bad cost/CPU workload ratio when compared to a provider like Linode. You really have to bank on the hourly billing (as the article touched on) to make the cost worthwhile.
I would imagine a lot of people don't need that type of granularity when spinning up and shutting down a server instance. Unless you have some automation to spin up new instances as load increases for short bursts, I could see people working on a schedule that spans a few days.
It's in the AWS docs -- basically, EBS (and RDS built on top of it) is more reliable than a standard hard drive, but nothing like S3 -- you need to expect RDS to fail eventually, just like you would with a server of your own.
But, that's why AWS gives you RDS backups, AZ failover, etc. -- the backups won't fail (they're stored on S3), and with AZ failover, you can make downtime vanishingly small. But you need to actually do that.
A lot of people seem to confuse S3's 99.9999...% durability with the EBS-backed stuff, where you must plan for eventual failure.
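A minimal sketch of what "actually doing that" looks like, using boto3 (a newer SDK than this thread's era) with hypothetical identifiers and credentials: turn on Multi-AZ and automated backups at creation time, and restore from those backups if the primary is lost anyway:

    import boto3

    rds = boto3.client('rds')

    # Hypothetical names/credentials, for illustration only.
    rds.create_db_instance(
        DBInstanceIdentifier='app-db',
        DBInstanceClass='db.m1.large',
        Engine='mysql',
        AllocatedStorage=100,
        MasterUsername='admin',
        MasterUserPassword='change-me',
        MultiAZ=True,                # synchronous standby in another AZ, automatic failover
        BackupRetentionPeriod=7,     # automated backups (stored on S3) kept for 7 days
    )

    # If the primary is lost anyway, restore from the automated backups.
    rds.restore_db_instance_to_point_in_time(
        SourceDBInstanceIdentifier='app-db',
        TargetDBInstanceIdentifier='app-db-restored',
        UseLatestRestorableTime=True,
    )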
How does Rackspace work? Do they work just like Linode or are they an AWS competitor? I heard some good reviews about them, but not sure...Anyone here on Rackspace who could give us some pluses and minuses of the service?
We had a 2/3 ec2 1/3 rackspace split, and ended up moving largely off rackspace for operational reasons. I found the interface much more usable at AWS, whereas the one at Rackspace was a 'five clicks to do anything' webif, though apparently they've improved in recent months.
I'm not heavily experienced in cloud offerings, but it's far easier to manage the AWS stuff than the Rackspace stuff - DNS management with AWS's 'Route 53' is quite flexible yet couldn't be simpler, whereas with the part of Rackspace we were using it wasn't 'all in one place', which made it hard to peruse or alter.
I found support at both places to be upbeat and knowledgeable, though I don't know about timeliness since I've only really lodged low-priority tickets. AWS does need more domain knowledge in order to understand its flexibility, and I've gotten a good workout from my $50/mo support add-on.
We were on Rackspace before moving to AWS. Their sales person flat out told us on our contract renewal call that the only thing they could compete with AWS on was customer service.
I appreciated their honesty, but they were lacking a few required services for us at the time.
1. AWS beats all others when it comes to security [1].
2. EC2 is just one item in the package called AWS. Hence, if you build something more than just a "web-app with *db" at the back-end, say a full blown platform, then I know of no other option that gives you a fully integrated API for data warehousing, DNS, load balancing, auto-scaling, billing, etc.
3. Speed is sometimes over-rated. You should be speedy where it matters more. That is, how fast you can redeploy your entire cloud from scratch in case of a disaster should be more interesting to you than whether a webpage takes 20 more ms to get to the browser. In our case, at AWS, it is a matter of < 20 minutes.
1. Agreed. IAM is really, really, nice, and I find myself missing it greatly when I'm on other platforms.
2. We use quite a few more things than EC2 for our systems.
- SQS eliminates the need to build / manage a queuing system (a minimal usage sketch follows this comment).
- DynamoDB/SimpleDB eliminate the need to build / manage a distributed data store.
- OpsWorks eliminates the need for a DevOps team (mostly).
- ELB eliminates the need to build / manage a load balancer.
- SES seamlessly takes care of out-bound mail.
- Direct Connect gives us a way to extend our DC tools into the "cloud".
- And to top it all off, I can bring up any/all of these services at a moments notice, run some experiments, and then shut them down when I'm done.
I don't think there are very many vendors that can help us do these things with this much flexibility. Yea, AWS can be expensive, but we feel like it's worth it.
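For the SQS item above, a minimal usage sketch with boto3 (a newer SDK than existed when this was written); the queue name and message body are hypothetical, not from the commenter's system:

    import boto3

    sqs = boto3.client('sqs')

    # Producer side: create (or look up) the queue and enqueue a job.
    queue_url = sqs.create_queue(QueueName='work-items')['QueueUrl']
    sqs.send_message(QueueUrl=queue_url, MessageBody='resize image 42')

    # Worker side: long-poll for work, process it, then delete it.
    resp = sqs.receive_message(QueueUrl=queue_url,
                               MaxNumberOfMessages=1,
                               WaitTimeSeconds=20)
    for msg in resp.get('Messages', []):
        print('processing:', msg['Body'])
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg['ReceiptHandle'])

Messages that are received but never deleted reappear after the visibility timeout, which is what saves you from building retry logic yourself.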
> 3. Speed is sometimes over-rated. You should be be speedy where it matters more. That is, how fast can you redeploy your entire cloud from scratch in case of a disaster should be more interesting to you than if a webpage takes 20 more ms to get to the browser. In our case, at AWS, it is a matter of < 20 minutes.
Personally, I'd rather have 20ms shaved off my users' time than a 20 minute disaster recovery time. Disasters happen perhaps once a year (on AWS, possibly less on dedicated servers); people are loading pages every day.
That perhaps depends on the type of your customers/users, and their priorities.
If you have "users", then you might be right, as no harm will be done if, once every few years, their free service is shut down for 18 hours.
However, if your customers are running core and critical parts of their business on your system, this part becomes a significant factor in the equation.
My nontechnical boss doesn't know about the 20ms difference (she just thinks her computer is slow?), but an outage is as visible as the difference between day and night.
At 2s difference it's fair to say that a user is going to notice. Maybe even at 200ms. But at 20ms? I'm not sure that counts.
I know it's just a fabricated number, but the point is that the server you're running on won't make a difference to the user experience. And for the kind of differences we're talking about between these machines, a user would probably never notice.
20ms could mean the difference between a single digit rank in the App Store and a three digit rank. I see it every day - if my systems drop by 50ms average, I see a drop in rank.
And 20ms? Try more like a 300ms+ difference. Hell, sometimes a full second or more for some sites. Anyone who says total disaster recovery time is more important than total latency isn't running anything remotely at scale.
Yes and yes, we're using a CDN. Neither of those change that EC2 instances are slow and have unreliable performance, and the network is abstracted to the point where you can't effectively control packet flow.
EC2 will not become faster. However, at least, for web traffic, CPU and IO are not the only factors.
Network is, in fact, a major player in the latency, and by being globally distributed, configuring Route53 appropriately, and integrating CloudFront CDN, a given web-app gets a boost that I doubt a faster computer can beat.
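As a rough illustration of the Route53 part: latency-based routing just means publishing one record per region and letting DNS answer with the endpoint closest to the user. A boto3 sketch (newer than this thread) with a hypothetical hosted zone, hostname, and addresses:

    import boto3

    route53 = boto3.client('route53')

    def latency_record(region, ip):
        # One record set per region; Route 53 answers with the lowest-latency one.
        return {
            'Action': 'UPSERT',
            'ResourceRecordSet': {
                'Name': 'www.example.com.',
                'Type': 'A',
                'SetIdentifier': region + '-endpoint',
                'Region': region,
                'TTL': 60,
                'ResourceRecords': [{'Value': ip}],
            },
        }

    route53.change_resource_record_sets(
        HostedZoneId='Z123EXAMPLE',          # placeholder zone id
        ChangeBatch={'Changes': [
            latency_record('us-east-1', '203.0.113.10'),
            latency_record('eu-west-1', '203.0.113.20'),
        ]},
    )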
We experimented with EC2 in the early stages to use as a possible load failover, and no matter what we did, we could never get better than 150ms ping times even between internal zones.
The EC2 network also has mysterious packet filtering on it that prevented IPSec tunnels from working correctly.
I've managed to leverage CDN and globally distributed servers for a fraction of the cost of Amazon services just fine, and I have the added benefit of 100% full control of all aspects of it, including the network.
A good analogy is that EC2 is like an interpreted language versus compiled - you can get a lot done and it's easier to get started, but if you're really serious about performance, you need to program in C.
This is not entirely correct. AWS does offer AWS GovCloud, which provides an environment that enables agencies to comply with HIPAA regulations [1]. You have to be a US government organization to use it though.
Updated: AWS also has a whitepaper on Creating HIPAA-Compliant Medical Data Applications with AWS [2]. Looks like this is supported on the standard, non-GovCloud stack.
The trouble with HIPAA requirements is that they're not clearly defined and are open to a variety of interpretations.
Our experts advise a safe, CYA approach and mandate that a BAA is in place with every partner touching sensitive patient data, even if it's encrypted and protected on multiple levels. Thus far Amazon has not been accommodating to such a request.
Others have their own opinions and, in the end, we all weigh the risks vs rewards (including Amazon itself - I'm sure they have plenty of reasons for operating in their present gray area).
I worked for a major hospital once and they were all about the CYA agreements. The funny thing was that HIPAA is more a state of mind than a 100-point punch list. So you're really just practicing CYA more than anything else.
I don't believe you need to be a US government organization to use the GovCloud region. I think you just have to be a US corporation or person and pay through the nose. It's only available directly via signing an actual contract, not a la carte like normal AWS services.
As of March 2013 (two years past those publish dates), Amazon has still not agreed to the legal "Business Associate Agreement" provisions of HIPAA that would permit you to use their services to store Protected Health Information. They said they are considering it, but this has been the status for quite some time. Rackspace, on the other hand, has agreed (for a surcharge).
According to Amazon, their employees are not allowed to access your data, so you don't need to sign a business associate agreement with them to be HIPAA compliant. I imagine this is similar to how sending patient information through the post office is not considered a disclosure to the post office.
Actually the HMO I worked for did. Every vendor such as ISP's, Colo's, and some API suppliers had to sign the CYA agreement. Most of them are aghast when you ask them to sign. Basically they have to take on all of the liabilities. I've never seen it have to be exercised however.
Do you sign business associate agreements with your colo facility, ISP, and landlord? They also are physically capable of accessing your data, even though they are legally or contractually forbidden from doing so.
The orgs that I have worked with draw the line somewhere between colo and ISP: anyone with potential access to unencrypted network traffic, or who is operating equipment containing affected data. Usually the lawyers can agree to contractual terms for the landlord without a BAA.
I'm not arguing that it makes sense, just that it happens.
And there are plenty of health care companies that evaluate AWS and decide they don't need a BAA, due to the way the system is constructed. This is a 'your legal team' issue, not a global issue (ie: it's an issue, but not a blanket problem for everybody).
It's not just a question of speed. If your machines are slow, that means you need more machines to handle your throughput, which means you are paying for that 20 ms slowdown in actual dollars.
Sure, but the whole thing is predicated on the 20ms slowdown coming from a slow machine, not network latency. And that's a pretty good assumption. Due to RAM limitations and abysmal performance, I could maybe push 15 concurrent requests on a c1.medium running a Rails app in Passenger with a non-CoW Ruby. Forking is terribly slow on EC2. An m1.small was out of the question.
I'm working on a web service (built on top of Scala and the JVM) that's handling between 1500 and 3000 reqs per second per c1.medium instance, with an average time per request of under 15ms. This is real traffic, with the web service receiving between 16,000 and 30,000 total requests per second during the day. A c1.xlarge can do 7000 reqs per second or even more, but for the moment I felt like the difference in pricing is too big and it's cheaper and safer just starting more c1.medium instances (with auto-scaling based on latency), but in case we'll need more RAM, then we'll probably switch to c1.xlarge.
If scalability matters, you should have picked a better platform. Ruby/Rails/Passenger is a terrible platform for scalability / performance. And even if AWS is slower than other solutions, the first problem you have is your own heavy-weight app and the platform you've chosen. 15 concurrent requests makes me chuckle.
I just wanted to add -- since you're not the first to point out the Rails part -- that I've also run a 42 node Cassandra cluster on m1.xlarges and did a fair bit of CPU-bound operations (encryption and compression) on hundreds of TB of data on cc2.8xlarge. I just used the Rails one as an example.
In the case of Cassandra, disk I/O was a constant issue. So, we grew the cluster much larger than would be necessary on another provider. We also lost instances pretty regularly. If we were lucky, Amazon would notify us about degraded hardware, but usually the instance would stay up but do things like drop 20% of its packets. Replacing a node in Cassandra is easy enough, but you quickly learn how much their I/O levels impact network performance as well. Nowadays Cassandra has the ability to compress data to reduce network load, but you then run into EC2's fairly low CPU performance.
The CPU-bound application I mentioned wasn't so bad, but we paid heftily for that ($2.40 / hour - some volume discount). At the high end the hardware tends not to be over-subscribed.
Performance, price, and reliability were all issues in all cases. Those are not EC2's strong suits and haven't been for a while.
I don't entirely disagree. All I can say is REE and Rails 2.3 were far lighter weight and faster than Ruby 1.9 and Rails 3.2. Given it's a 3.5 year old app, the landscape was pretty different back then. I looked at Lift and didn't like it. Django was still in a weird place. And ultimately Rails looked like the best option for a variety of reasons.
Things evolve and whole hog rewrites are difficult. Nowadays we run in JRuby and things are quite a bit better. But we can't run on anything smaller than an m1.large. The low I/O and meager RAM in a c1.medium preclude its use. (BTW, that's where a lot of the original 15 came from -- with a process using 100 MB RAM and only 1.7 GB available, it's hard to squeeze much more out of that).
But the larger point is with virtually any other provider you can pick a configuration that matches the needs of your app (rather than the other way around), don't have to fight with CPU steal, don't have to fight with over-subscribed hardware, and don't have to deal with machine configurations from 2006. Yeah, Rails is never going to outperform your Scala web service. But if the app would run just fine on the other N - 1 providers, then it's disingenuous to gloss over the execution environment as well.
Run it on top of JDK 7 and use the CMS garbage collector, as JRuby (and Scala) tend to generate a lot of short-term garbage, and experiment with the new-generation proportion (something like -XX:+UseConcMarkSweepGC -XX:NewRatio=1 -XX:MaxGCPauseMillis=850). You can also profile memory usage (make sure you're not stressing the GC, as that can steal CPU resources), and for that you can use Java profilers (YourKit is pretty good).
Also, try to do more stuff async, like in another thread, process or server. Use caching where it's easy, but don't overdo it, as dealing with complex cache invalidation policies is a PITA.
That's one way to look at it. Another is when this app started 3.5 years ago, Rails & the app had a drastically different performance profile and Amazon didn't have super-over-subscribed hardware. Not that it matters much, but there's nothing convenient about having to engineer around EC2. And doubling your capacity or constantly upgrading instance sizes is not cheap, nor a scalable solution in any practical sense.
Pick your language though. With terrible forking performance, any process-based execution environment is going to have similar issues. And I found running a servlet container on anything smaller than an m1.large to be an utter waste. 1.7 GB RAM isn't enough for many JVM-based apps and threading could easily overwhelm the system. Anything less than high I/O capacity just can't keep up.
In regards to speed, I think one place EC2/EBS fails (I'm speaking based on other people's experiences) is the consistency. 200ms is better than 100ms if you get 200ms every time, but from what I've read, a lot of AWS services are all over the board. This makes infrastructure very hard to provision and predict.
Any argument involving price seems to completely ignore TCO. We are using Heroku for most of our needs and suggest it to all our clients, and we couldn't be happier to pay the premium over "expensive" AWS. We save a lot on IT and maintenance. Unless you are paying thousands of dollars per month, or your time is very cheap, doing your own servers will cost you more.
If AWS were so expensive and "not worth it" what are the guys from Netflix smoking? ;)
"Combined cost for each datacenter is about $1.3M per year." ... "If we spent the $1.3M per year on a complete EC2 site instead, we could afford the following architecture, provided that we used one-year reserved instances." ... "This means that we could add more than 60% capacity our current configuration"
I use AWS extensively, but to be completely fair, Netflix doesn't really care about operations costs. Yes, seriously. Their media licensing is so much more expensive, they do not optimize bang-for-buck, going as far as using larger instances (to avoid noisy neighbor problems) that are much less cost effective.
So while AWS might be completely worth it for some people (it is for us), Netflix isn't the best argument :).
"Netflix doesn't really care about operations costs."
Considering that they're a publicly traded company, they have a fiduciary duty to watch all costs. Though server costs don't compare in relation to media licensing, I'm sure they pay some attention.
That is not how fiduciary duty works. The company's duties require it to make good-faith decisions regarding spending, which can include a good-faith decision that it is not worth their time or energy to chase nickels and dimes.
I stated that a responsible fiduciary monitors where they spend their money. This then allows them to make good-faith decisions. I never said that they're trying to chase nickels and dimes on the ops front.
We do spend thousands a month on hosting. We have servers in 5 geographies... If you include our CDN costs, the amount we spend is quite large. The money we save is enough to get a full time devops and then some.
Netflix probably doesn't pay what you and I pay. Also, for every Netflix, you can find 100 examples that use dedicated or collocated.
> Unless you are paying thousands of dollars per month, or your time is very cheap, doing your own servers will cost you more.
This is down to familiarity of tooling, not some intrinsic advantage that EC2 has over managed dedicated hosting. The simple matter is that there just isn't the maturity of tools around automated provisioning of dedicated hosts, which makes them seem higher overhead.
> If AWS were so expensive and "not worth it" what are the guys from Netflix smoking? ;)
Simple: Netflix have very bursty load, so it costs them less to pay the EC2 and virtualisation premium than it would to keep an equivalent amount of dedicated hardware on warm standby. Is your load bursty? Then EC2 might make sense.
EC2 (or any cloud virtualisation platform) will always lose in a shootout with managed dedicated hardware, unless the shootout parameters are provisioning time and tooling, simply because EC2 is managed dedicated hardware plus an extra layer of stuff on top that has to be paid for.
I agree with the author, EC2 seems really expensive compared to other clouds. One of the comments suggested DigitalOcean, which has much better pricing, flexible by-the-hour billing, and a REST API to bring servers up and down.
I've had a rather spotty experience with DigitalOcean unfortunately. Sometimes you can't create instances; it just hangs there waiting forever. You can't replicate snapshots between regions. I click the button but nothing happens... I've also had an order of magnitude worse network bandwidth sending backups to S3, both in Amsterdam and NY.
It's ok for a dev box or playing around, and it's definitely cheap, but I wouldn't trust any production servers on it. _Yet_. I do see they improve at a very impressive pace, with new features becoming available regularly, so things hopefully change for the better... It's still nowhere near being a match to AWS, or even Linode.
Whenever someone says $CLOUD costs less than EC2 it's always a very limited cloud with a small number of customers with a tiny fraction of a fraction of the features of AWS. It's never an Azure or a Rackspace.
This article pretty much reflects my experience with EC2. EC2 makes it very easy to get started when you're figuring stuff out and don't have a great deal of experience managing servers and machines. You don't have to deal with RAID, power outages, etc. and can literally treat hardware like software.
But once you get up and running, you will realize that this comes at a great cost. EC2 absolutely sucks in terms of raw performance. I also notice variation in performance between machines and at different times of the day.
Amazon has done a great job at selling "the cloud", and I can see many CXOs buying into that. But the fact is, renting dedicated machines from a good provider isn't exactly that hard. They take care of a lot of things for you.
EC2 is good if you have wildly varying amounts of traffic such that you need to really have that kind of elasticity in your infrastructure. However, for most businesses, especially web apps, that's not the case.
I was shocked when I learned that many EC2 users with highly variable traffic don't spin down during off-peak because sometimes Amazon has problems that prevent them from spinning their instances back up.
I love the idea of their cloud services, but the reality is a decidedly less attractive beast.
Hmm, I would like to hear more about these problems that prevent people from spinning up instances. Is it a frequently occurring problem, or does it only happen rarely (e.g. when APIs are down)? Also, are they managing the instances themselves or using EC2's AutoScaling? I run a dynamically scaling cluster on EC2 and have not run into the problems you mentioned, so I would like to hear more about them if possible. Maybe they are spinning up and down too rapidly and exceeding the API rate limit? You'd have to have a bugged/bad provisioning system to accomplish that though...
I'm aware of the API rate limit when an AZ goes down, but the original comment was about problems that people run into for highly-variable traffic sites that scale up and down on a regular basis (e.g. spin up 20 instances during the day with high traffic, at night shutdown 15 of them to save cost, on a daily basis). This of course is not related to any AZ/API downtime, and I am not aware of any problems that could interfere with the normal usage of APIs during normal service operations, which is why I wanted to hear more about the details of those problems.
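For what it's worth, the usual way to keep an aggressive provisioning system from tripping the API rate limit is to back off and retry on throttling errors. A rough sketch in Python with boto3 (which postdates this thread; the AMI and instance type are placeholders, not anyone's real setup):

    import time
    import boto3
    from botocore.exceptions import ClientError

    ec2 = boto3.client('ec2')

    def launch_with_backoff(ami, instance_type, count, max_retries=6):
        """Retry instance launches with exponential backoff if EC2 throttles us."""
        delay = 1
        for _ in range(max_retries):
            try:
                return ec2.run_instances(ImageId=ami,
                                         InstanceType=instance_type,
                                         MinCount=count,
                                         MaxCount=count)
            except ClientError as err:
                if err.response['Error']['Code'] != 'RequestLimitExceeded':
                    raise  # some other failure; don't mask it
                time.sleep(delay)
                delay *= 2  # back off: 1s, 2s, 4s, ...
        raise RuntimeError('still throttled after %d attempts' % max_retries)

    # Hypothetical usage:
    # launch_with_backoff('ami-12345678', 'm1.small', 5)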
It's due to rarely occurring problems such as APIs being down/unresponsive, insufficient or incorrect instance types being available, and a third major class of problem that's escaping my memory at the moment. In addition to not scaling up and down (automatically or manually), many of these apps are architected to require as little from AWS as possible to reduce their exposure. For example, they'll refrain from using ELB because ELB depends on EBS.
These decisions were made by companies that started off believing fully in the promise of elasticity, and gradually shifted to less elastic architectures as they experienced issues. That said, it's worth noting that these are firms with very high costs of downtime, so the magnitude of failures was very high.
While clearly Amazon is working on bringing down the cost of EC2, as it is today you're paying a premium to get at the AWS ecosystem (and for some companies it's well worth it). There's no other great reason to choose EC2 for servers given the pricing. I understand why people keep writing about the cost, but if you have use for the myriad of AWS services, then good luck replicating what they offer on dedicated servers (without spending an equally huge sum, either in dollars or time or both).
Firstly AWS sets the benchmark for security, transparency and compliance which we know you won't get from Linode for example. Secondly AWS has a damn good network. Thirdly lots of software 'understands' EC2 e.g. Cassandra, Hazelcast.
AWS' popularity comes not just from the wide array of decent, well integrated and cohesive services e.g. S3, SQS, ELB, ElastiCache but also all of the third party services that are hosted within the AWS network e.g. MongoHQ, IronMQ/IronIO. AWS is very much an ecosystem.
If you are building a new app from scratch, there are a lot of benefits to having others manage the commodity parts of your infrastructure.
Well I think my point was exactly that: if you're building a fairly standard site / service / app, you absolutely do not have any practical reason to use AWS, particularly if you have a modest budget.
The radical majority (99.999%?) of all sites on the web can easily be run by a cheap dedicated server. And I'm not talking Softlayer, which itself is expensive in the world of dedicated hosts; there are several better priced providers that are nearly as good as Softlayer.
EC2 is never going to appeal to the bottom 99% of the web until their prices come way down (and Amazon may never care about that). Dedicated hosts will keep offering more and more oomph per dollar. The demands on a typical site in the US market (prime AWS customers currently) are not going up much per year at this point, as web usage is no longer growing much in the first world. Meanwhile hardware and bandwidth for an average dedicated server just keep getting better.
Services that used to cost me $300 to $500 / month to run four or five years ago, I can now operate for 1/3 that price on even more powerful dedicated servers.
Programmatic access is a big thing, and is what makes EC2, GCE, Rackspace and friends different from dedicated hosting. (In the case of AWS / GCE, global availability is another.)
Using the Cloud isn't only about instant scaling and per hour billing. Those are useful features, but mostly for specific use cases.
--
The Cloud is about the ability to have software that controls the hardware.
It's about being able to orchestrate automated failovers that provision new servers.
It's about being able to have services that grow and shrink depending on usage.
It's about being able to replace your app servers with data crunching servers at night and relaunch new app servers at day - all transparently.
Of course, those are just a few examples, but the general idea is that programmatic access (APIs) is the big thing. The rest is secondary.
---
It is, however, true that this is not useful to everyone.
On the topic of succeeding using the Cloud, it's indeed difficult. That's why companies are building cloud management tools to help users do this.
You can orchestrate automated failovers that provision new servers even if you have dedicated servers to handle your base load. And the price difference with EC2 is so huge that you need really massive traffic spikes for it not to be cheaper to have a bunch of extra dedicated servers on standby.
Nothing stops you from spinning up data crunching EC2 instances at night either, if you want to, and if it really is more cost-effective for you than having VMs on your dedicated hardware.
There are plenty of APIs available if you want to run your own "private cloud" on those dedicated servers too. I never deploy outside VMs anymore, even though I also mainly use dedicated servers.
EC2 is cost effective if you truly have really short term (< 4-6 hours per day) batch processing needs. It continues to shock me how many people take the pain and cost of dealing with EC2 for more typical web app usage.
Switching load or transferring data between a remote DC (your dedicated servers) and AWS is not that easy when you start having large-scale infrastructure though.
This is a great idea on paper - but running your baseline infrastructure at your dedicated provider and the rest on EC2 is definitely a challenging task.
Not that it's impossible, but that's probably going to be extra work at the app level.
---
Running OpenStack / CloudStack, or similar software on dedicated servers is indeed a relevant solution too, but this is an extra maintenance cost to bear in mind.
As someone who loves EC2 (and happens to work for Amazon, but in a different domain), I can't figure out why people/companies haven't moved towards Hybrid Clouds. There are several strengths and weaknesses of the various providers, and the reward-to-risk ratio of locking yourself into one platform is just too small to matter. There are several scenarios where I could see myself using a combination of Rackspace + EC2 + Linode + colocation. A little puppet/chef knowledge goes a long way.
The crazy part is that so far at the Amazon sales events I've attended the AWS staff frequently repeated the recommendation that you buy dedicated iron for your 24x7 workload and use EC2 for bursty jobs, usually pushing VPC as the way to tie everything together.
I can't quite parse what you're saying about the reward-to-risk level being too low for lock-in, but having worked with a number of startups that were 100% AWS and paying dearly for it, the lock-in was their major problem - EC2's ancillary, private-branded or unique services all seem to be designed with lock-in as the number one goal. From what I've seen, the first acknowledgment of lock-in seems to open the floodgate: kind of a "well, if we're already stuck we might as well use this proprietary feature that will save us 10% in build time". Not the smartest approach, but no one accuses most web startups of being in the long game.
I know of a few people that say that there are benefits to sticking to one platform...that there is less maintenance and lower friction from the "one stop shop" setups that you get on App Engine or Heroku or AWS. I'm just saying those benefits are tiny in comparison to the risks of lock in.
Most organizations, when they try to use AWS or Rackspace as a mere extension of their datacenter, discover that it is quite hard and that their apps require extensive redesign to fit the new environment. This in turn leads to lock-in and lack of flexibility.
These are appropriate considerations to have; and the concerns about performance are very well-founded.
Nonetheless, the values I see in EC2 and AWS come in the API, role management, and the centralization of services. Being able to automate everything is outstanding (even if many of the APIs feel very young). Allowing anyone in the organization access to the management console with appropriate privileges is helpful, and something that I haven't seen available from other providers - at least not with the level of customization that AWS provides. Also, knowing that just about everything is handled in one place makes things easier logistically - it's small overhead, but managing DNS and CDN and Hosting all in the same interface is convenient.
Hosting decisions must be made on a case by case basis; but it's wrong to avoid EC2 (and AWS) simply because you can get better performing servers elsewhere at a lower cost per time period.
Considering Amazon offers an entire pricing tier around 24/7 utilization (heavy reserved instances), it's probably safe to say that EC2 is no longer purely elastic.
And of course, the second letter stands for compute. If you find yourself using words like traffic instead of computation and discussing the finer points of the per-hour billing feature instead of how many CPU-hours of jobs are enqueued, you should definitely consider that you're trying to use a tool that wasn't designed with your needs in mind.
That said, AWS in general has a lot of useful tools for web developers, and certainly one use-case of an elastic compute cloud is temporary development instances.
At Scribd we initially used EC2 for systems like converting uploaded documents to HTML5 (the capacity we needed varied a lot over short time spans due to things like variable API usage). Meanwhile, the app and web servers (and many others) lived at Softlayer (still the "best-looking horse at the glue factory" for managed hosting, IMHO).
First, people seem to think EC2 saves you time and hassle. Compared to colocated servers, this is true. But compared to dedicated servers, managed or unmanaged, it simply isn't. Hardware fails? They'll replace it (at no cost).
I totally disagree. OK, maybe it takes a bit more time to set everything up, but when you have it all up and running (AMIs, Autoscaling, etc.) you have a robust setup that can scale when you scale (up or down). Adding a new instance literally takes only a few minutes. Try that with a dedicated setup. It's more expensive for sure, but comes with a lot of flexibility.
What I see as an EC2 (or AWS in general) issue / challenge is that it's hard to switch providers. When you use S3, SQS, Cloudwatch, Elasticache, Cloudfront, DynamoDB, SES, Route53 your code is totally integrated / adapted to AWS. You can't take your code and deploy it on a different setup elsewhere.
In that case I would say you have architected an app for AWS, then, and we should be careful not to do that, for portability's sake. http://www.12factor.net/backing-services
Plus, you get very unpredictable server resources, which is why disk and network throughput can vary wildly.
These issues are why we started Uptano, really just to scratch this itch for ourselves. Plug: https://uptano.com
The thing is, EC2 really was neat when it launched, but there are so many things that can be improved upon. Amazon has moved surprisingly slowly in improving EC2 itself.
I ran the numbers for our PCI-compliant setup about 1.5 years ago, and Amazon was about 3-4x more expensive than the average bid for my RFP from several smaller hosting providers for a mix of dedicated and virtual servers. Another thing I discovered with AWS (again, it was 1.5 years ago) is horrible network latency, even inside the same data center on the same VPC: on a "normal" gigabit switch in a dedicated setup I would expect a stable ping < 0.5 msec with spikes to 1-2 msec on a loaded network. With Amazon, pings were rarely below 10-20 msec inside the same data center in the same VPC. It is easy to explain, since Amazon probably doesn't allocate virtual servers "together" on the same router, so you have extra delays from going through multiple busy routers. At the same time it is pretty bad for many things (e.g. the MySQL Galera Cluster that we use).
So, my take on Amazon EC2: you want to use it when you are small (i.e. don't really care about performance) or when you are really big (when you can get "special" deal from Amazon and dedicate a lot of internal resources to make it work). In other cases a better option might be a mix of dedicated hardware (servers and networking stuff) and virtual servers at a managed hosting provider that would allow you to create the setup that works best for your project.
Network latency is very inconsistent on EC2. This is a huge issue if you use a piece of software that assumes you are in a data centre with good networking, as you'd expect in a regular data centre.
It's not about "assuming" things. It's about severe performance degradation beyond any expected levels. Yes, there are solutions to workaround these things. But it will cost way more than a dedicated hardware setup if you account all the costs (software licenses, OPS salaries, etc).
Assumptions are certainly part of it. Software is often designed to consider certain round-trips cheap and others expensive. If this assumption does not apply, it will perform terribly.
Sure, EC2 may not provide the best price/performance, but some of their new services are killer: Redshift makes Map/Reduce obsolete, and DynamoDB makes other NoSQL DBs weep with its scale/performance/flexibility.
These services make it an easy decision to stick on AWS, and sadly ec2 as well
That's not entirely accurate either. If you buy a heavy reserved instance you have to pay for every hour, whether or not you use it, for the length of the reservation. So, it's technically still per-hour billing at a discounted rate, but you gain absolutely nothing by shutting down a machine.
What you described is not true, however, for light or medium reservations where shutting down a machine means you have no cost.
The other key difference is that reservations apply to hours used, not specific instances, which means you can do things like buy heavy instances for your 100% utilization level and medium for the amount beyond that which you commonly but not always use.
I made no claim otherwise about light or medium instances. I was merely pointing out that, for many people, Amazon is no longer a purely pay-per-hour or pay-for-what-you-use service. Many people overlook the wording change made for heavy reservations. The parent's refutation of the article isn't as cut and dried as stated.
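A quick back-of-the-envelope illustration of why heavy reservations only pay off at high utilization. The rates below are invented, not Amazon's actual pricing; the shape of the math is the point:

    # Illustrative only: these rates are made up, not Amazon's real 2013 prices.
    HOURS_PER_YEAR = 365 * 24  # 8760

    on_demand_rate = 0.10    # $/hour, paid only for hours the instance runs
    heavy_upfront  = 300.00  # $ one-time fee for a 1-year heavy reservation
    heavy_hourly   = 0.03    # $/hour, billed for EVERY hour of the term, running or not

    def on_demand_annual(utilization):
        return on_demand_rate * HOURS_PER_YEAR * utilization

    def heavy_reserved_annual():
        return heavy_upfront + heavy_hourly * HOURS_PER_YEAR

    for u in (0.25, 0.50, 0.75, 1.00):
        print('%3d%% utilization: on-demand $%7.2f   heavy reserved $%7.2f'
              % (u * 100, on_demand_annual(u), heavy_reserved_annual()))

With these made-up numbers the heavy reservation only wins above roughly 64% utilization, which is the point being argued: it's priced for machines that never turn off.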
People use EC2 not because of price but flexibility. A good example: when we launch a product and expect lots of traffic, I can quadruple my server count in a few minutes. Once the event is over with, I can spin them down. I can't do this with traditional hosting.
OP. I tried to address that in the post. I'm not sure how to explain it differently. The only reason you need to spin up more instances to handle the load is because of how slow the machines are in the first place. That $750 you spend on 2x m3-xlarges could get you machines that can handle 4x the load (or more). Under normal conditions, you spend $750/m. Under peak, you'll spend $3000/month with EC2 but still $750/m with dedicated (plus it's less to manage).
Also in the post is that for the scenario you're describing, having dedicated servers handle the base traffic and relying on spot instances for the spikes is much more cost effective.
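For the "spot instances for the spikes" part, the request itself is a single call against the EC2 API. A boto3 sketch (newer SDK than this thread) with placeholder AMI, bid price, and security group; keep in mind spot capacity can be reclaimed when you're outbid, so it suits a stateless web tier sitting in front of the dedicated base:

    import boto3

    ec2 = boto3.client('ec2')

    # Hypothetical AMI, bid, and instance type; adjust for your own stack.
    ec2.request_spot_instances(
        SpotPrice='0.10',            # maximum hourly bid
        InstanceCount=10,
        Type='one-time',
        LaunchSpecification={
            'ImageId': 'ami-12345678',
            'InstanceType': 'c1.medium',
            'SecurityGroups': ['web'],
        },
    )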
The main issue with the hybrid approach is devops cost. You will basically be dealing with two very different infrastructures, and building the ops for such a hybrid system could be costly (in money and time). Taking this into account, the money you save shrinks. In addition, a more complex system most likely has more parts that can fail, so your total reliability might suffer somewhat as well.
Another thing to take into account is inter-datacenter network reliability. I have personally run a hybrid system (dedicated + EC2) before and suffered a routing issue. The upstream provider for the data center that my dedicated machines were in suffered a routing outage to EC2 US-East region (or at least to the AZs that my instances were in), and my EC2 web app servers could not connect to my DB for around 10 hours (during the day time too). If you have a hybrid system, during peak hours (where you have lots of EC2 spot instances serving traffic), networking issues could result in unexpected downtime. This is of course, in addition to the regular SLA downtime you get from either the dedicated provider or AWS, which is not an issue if you have everything hosted together, but could become an unavoidable problem if not.
My experience is that the devops cost drops: Most of the time you're dealing with a much simpler and more predictable dedicated environment - your challenges happen when you have to handle (rare) abnormal spikes.
In terms of inter-datacenter network reliability: You have to deal with this if you want reliable hosting anyway. If you spread your database and app servers across data centers, yes, you are begging for problems and the problem is that your app is not designed for resilience.
But you can easily enough do "hybrid" within the same datacenter, if you opt for any of the number of EC2 alternatives from companies that also do dedicated hosting.
If you are using a hybrid approach (assuming dedicated + EC2), you have to have the ops system in place to deal with the dedicated environment AND the EC2 environment, which is why devops cost will be higher. Of course if you only deal with dedicated hardware your devops cost could drop.
Inter-datacenter network issues are less of a problem when your datacenters are from the same provider, because they are responsible for making sure the connection is good. Plus, when problems do occur, you can troubleshoot fairly easily (when I had the routing problem with dedicated provider + EC2, I had to bounce back and forth with the network support for both a few times before one of them admitted the routing issue with the upstream network provider) as you are dealing with a single company.
I understand the hybrid solutions that other providers such as Softlayer offer, but I was mostly addressing the suggestion of using dedicated + EC2 in the original article.
> you have to have the ops system in place to deal with the dedicated environment AND the EC2 environment, which is why devops cost will be higher.
Only if you choose your systems so that they can't be used across both platforms. I don't see why anyone would do that if they want to run a hybrid setup.
> Inter-datacenter network issues are less of a problem when your datacenters are from the same provider
If your data centers are from the same provider, your added degree of resilience is much lower.
> because they are responsible for making sure the connection is good.
That doesn't help you when one of the data centers goes out entirely. Such as when the power needs to be cut for fire brigade safety due to a fire alarm (yes, I've experienced that), or the supposedly redundant UPSes trigger failsafes and take the entire site down (experienced that too), or one of the sites sees cascading failures take out their entire network (seen that happen too).
Assuming you will have live, working network connections between your locations, and/or that all your locations will stay online is pretty much guaranteed to cut your availability.
Basically, if your systems can't operate independently, adding an extra data center means adding more failure points.
Rackspace and Softlayer both have dedi/cloud hybrid offerings. You don't need to support multiple companies or multiple private networks to get the benefits. They still offer multiple datacenters for reliability.
I am aware of that, but thanks for pointing it out explicitly for others. I was addressing more the original article's suggestion of running a hybrid system of dedicated + EC2.
You can get that flexibility without putting your base load on EC2: use EC2 (or your "normal" host's cloud services) to spin up extra servers when needed, and host your base load elsewhere. Extra bonus: it allows you to save even more money by flying closer to the wire with your long-term provisioned servers.
These days there are so many hosts that offer a combination of colo + dedicated servers + cloud services that you can seamlessly do both even if you don't want to deal with more than one provider.
I agree with that, but things are getting better. Providers like Internap now exist with "instantly provisioned" dedicated hardware and a hosting API (http://www.internap.com/agile/). I personally have no experience with them, but from what I can see it seems good.
The hybrid technology you mentioned came from Voxel, which was looking really good before the company was acquired by Internap. Now, it seems like the service has gone downhill: http://www.webhostingtalk.com/showthread.php?t=1213599
I'm trying to figure out where this niche is where EC2 is more expensive than traditional hosting on a price/performance basis, and I'm drawing a blank. Yes, they're vastly more expensive than the sticker price on a server (assuming you can size out your server needs years in advance), but most of the cost of running a realistic server is elsewhere: housing/power/cooling, NAS, networking, licences, manpower, spares, security, monitoring... The actual per-hour fees aren't really the top consideration for choosing among the alternatives, and AWS is at least competitive on a TCO basis.
The only things I get a big price/performance advantage over AWS for are things I might as well host on my desk, and even there it's dicey.
I think a lot of cloud developers have lost touch with what a modern dedicated server is capable of.
In February our website was top 3,000 on Alexa. 150 requests a second on average looking over that month. Those are dynamically generated pages, since all the static assets are served by CDN.
What handles all that?
3 servers running a PHP application. Oh, and the Postgres database is sitting on one of those servers and replicates to one of the others.
We can do it with 2, but we have 3 so that if one of them dies we don't have to worry.
These are the cheapest servers that Softlayer offers and they run at a low load average.
EC2 is more expensive than renting dedicated hardware. Housing/power/cooling/networking/etc are all built-in to that price, not added costs. Colocating your own hardware is not the only option. You can provision someone else's dedicated hardware in under an hour, all costs included, with 24/7 on-site staff to handle failed components or other issues (which again are included in the price).
I'm speccing out some things I host on AWS on softlayer's dedicated hosting, and I'm not coming up with vastly cheaper prices there either. I can get big (compared to ec2) servers cheap but storage costs eat up most of the advantage.
Softlayer, along with Rackspace, are expensive providers. The price will be similar to EC2 (but the raw performance of the CPUs and hard disks will be much better).
Hetzner and OVH provide the starkest contrast when it comes to price. They are also pretty well respected.
There's a million other choices in between. webhostingtalk.com is the best place to get more info..but I can give you names that have been around for a long time and tend to be liked by customers: webnx, singlehop, 100tb (softlayer reseller), reliablesite.net, hivelocity, netdepot. The list goes on and on, but if you want to see price differences, check them out.
Just last week I ran the numbers on my current DC costs - hardware, cooling, space, network - versus EC2 and it was 60% more expensive per month for EC2.
Quite a premium. Even adding in the salaries for my admin team only drops it a few more percentage points.
You are making it sound like it's EC2 versus dedicated hosting. There are other cloud hosting providers, such as Joyent, who offer much better performance and reliability at a lower price. In Joyent's case, they even have experts on the whole stack (Illumos, ZFS, DTrace, KVM) working internally.
You pay for convenience. Saying EC2 is slower is like noticing that you have to wait in line at the bank: yea it's slower than just stuffing cash under your mattress, but you don't have to worry about it being stolen, either.
Except that every time you want access to your money you have to go to the bank and line up.
Then randomly the bank won't work, all banks in the region will shut down randomly and you're stuck without money, and you need to pay rent but you can't.
You are invested in certain banks in the region, but sometimes it feels like the bank has noisy neighbours, and it takes a while to withdraw money.
So what you're saying is the mattress is better or that the mattress is a false analogy?
I'm happy to pay a premium to not have to worry about the countless things that AWS deals with. Whenever people complain about AWS having downtime I have to wonder how sure they are that their uptime track record would be better.
While you're waiting in line at the slow bank because that's the bank you've used for years, you're ignoring the bank across the street with no line, cheaper fees, and 80% of the same services you need and will use the most.
I don't want 80% of the services and have to build or manage the other 20%. I want 100% of the services.
An app I am running uses ec2, s3, sqs, ses, rds, swf, elb, elasticache, cloudfront, cloudwatch, route 53, and opsworks. Take any of those out and you give me a potential headache, time sink, or out-of-scope responsibility. It all just works and while I am sure I could save as much as 50% by going cheap and rolling it all myself, my time is valuable, I'd have worse uptime, and with proper reservation of resources my aws bills are, in relative terms, marginal.
I have done both building your own data center and running the whole enterprise on AWS. I like AWS better. Some folks have hinted at it here but the ability to spin up a node rapidly is a huge game changer. Ya, you could accomplish some of this with open source and your own hardware but the flexibility will pale in comparison to what you get from Amazon. A year ago, I was all on board the OpenStack train. I think it is great by way of keeping Amazon honest and therefore keeping prices in check. I'd still pay a premium for Amazon over a public OpenStack or a roll-your-own solution.
>> At our most expensive location, we get 64GB ECC RAM, 4x1TB RAID 10 (hardware), Dual E5-2620 and 4 1Gbps adapters (bonded, 2x public, 2x private) for around 60% the price of an M3-2XLarge.
M3-2XLarge = $720 per month, 60% = $432
As a Softlayer customer, I would be very surprised if you can get 2620/64GB/4TB disks in this price range. We also consider ourselves not a small customer, but for the above config, I guess we'd need to pay $800+ (after discount).
We get access to Softlayer's vpn, their control panel..pretty much everything.
Years ago, at a different company, we used Softlayer proper...you can negotiate prices down by huge amounts..even so, they are overpriced when you take into account their reseller. (Note, I know the deal says 48GB, but we have 10 or so of those servers, and they all came with 64...ssshhhh).
I wish that the author had actually included a benchmark with nice pretty graphs. The list of example hardware is nice, but doesn't tell me nearly as much as a couple of benchmarks would. Blanket statements such as "EC2's price-to-performance ratio is horrible" don't really do much to convince me. Yeah, the author is probably correct. But to what extent is he correct?
I really like the concept of Amazon EC2 but it's incredibly expensive and the performance is horrible.
Give me a full 1 gigabit connection, a few IPv4 addresses, all ports open, and I'm all good. 90% of all businesses are okay with this.
The server business is getting too centralized and it's bad for the internet.
I've been using Singlehop.com for over three years, and I couldn't be any happier. I rented a dedicated server in which I can create and destroy as many VMs as I want. I didn't have to setup anything. Once I got the dynamic server everything was available for me. No sweat, no headache, same price every month.
The amount of FUD in this article, and in this thread, is amazing.
The folks who claim it is more expensive to run AWS over on-premises infrastructure are either not counting their total costs (power, staff, off-site data storage, co-location) or not architecting things properly on AWS.
We use SoftLayer to run a few hundred nodes and they have issues as well. For example, their network sucks and they have a high rate of hardware failure (sometimes hard drive failures are off the charts).
I definitely agree with you on the disk failures, and that's something I need to bring up with our account manager.
The above said, the network has been solid for us at SoftLayer for the last 1 1/2 - 2 years (things were rocky for a little while before that as the client base outgrew their capacity). Of course, your systems might be in different data centers to us so mileage may differ.
Do the small EC2 instances have the same performance problems as the micros? I just told my boss the other day I wasn't comfortable using micros in production. Surely the smalls perform better?
Micros are really a completely different class from smalls and larges, because they offer "burst" performance. You shouldn't run anything in production on a micro, unless it's non-critical and not public-facing. Perhaps a 100% static web site, although even that I'm not sure about.
They can be appropriate for production workloads. You just need to understand the consequences of that burst performance. You also need to understand that you'll hit the invisible CPU usage limit without any notice.
You can actually run a static web site from S3. It's really easy to set up: you can point your domain at a public S3 bucket. I put together this small website (http://www.de-encode.com/) to play around with it, and it seems to work well.
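A minimal sketch of that setup with boto3 (a newer SDK than this thread), using a placeholder bucket/domain name; the bucket name has to match the hostname you point at it:

    import boto3

    s3 = boto3.client('s3')
    bucket = 'www.example.com'   # placeholder; must match the domain you point at it

    s3.create_bucket(Bucket=bucket)   # us-east-1; other regions need CreateBucketConfiguration

    # Turn on static website hosting for the bucket.
    s3.put_bucket_website(
        Bucket=bucket,
        WebsiteConfiguration={
            'IndexDocument': {'Suffix': 'index.html'},
            'ErrorDocument': {'Key': 'error.html'},
        },
    )

    # Upload a page and make it publicly readable.
    s3.put_object(
        Bucket=bucket,
        Key='index.html',
        Body=b'<html><body>Hello from S3</body></html>',
        ContentType='text/html',
        ACL='public-read',
    )

    # Then point a CNAME (or a Route 53 alias) for www.example.com at the bucket's
    # website endpoint, e.g. www.example.com.s3-website-us-east-1.amazonaws.com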
Yes they perform better. Benchmark it. But, if you plan on using a few instances that'll be on all the time, and none of the more advanced features....you'll get much better performance, for less, by using someone else. For small setups, Linode and DigitalOcean tend to be much loved (and Hetzner if latency to Europe isn't an issue).
Yeah I have some Linode, Rackspace and DigitalOcean servers, and yes they all seem to be a lot faster than the tiny EC2 instances. Never tried a small EC2 though.
Route53 is _nice_. ELBs are _nice_. Cloudwatch is _nice_. Security groups are _nice_. You can get similar stuff with most providers, but AWS is a better holistic package for my money.
Let's take the m2.4xlarge reserved instance type: it costs somewhere around $2k/month on AWS, but with a "generic hosting provider" it costs about 10-15% of that. Don't forget the 5-20% virtualization performance penalty either. The tradeoff is that for $2k you can get it pretty quickly and shut it down whenever you want, but if you spend more than 80hrs a month with it in use you should really get a dedicated resource somewhere else...
OVH has a datacenter in Quebec. We use OVH in France (OP, btw). It started off because 1 system needed SSDs, and Hetzner isn't great about SSD. Once we had 1 system there, it was easier to put them all there.
Doing it again, I'd pick Hetzner over OVH. The OVH machines are great (weird disk partition though) and they've worked reliably..but the management console is horrible and while the support is fast and helpful, it tends to take a couple back and forth to get to a meaningful conclusion.
Our interactive server-based app (TeamSpace) absolutely cannot run in AWS/EC2. The latency is the killer. Even when throughput is high, latency per message can be 100s of milliseconds in/out of a datacenter, which is on top of our app latency and our client latency, leading to a miserable experience.
It's almost funny to see how many people got so brainwashed by "the cloud" hype - especially the AWS hype - that they completely ignore the alternatives and end up paying more for less.
"But hey, we're elastic. Gigity!" :-)
There are many use cases for which EC2 is exactly perfect and not overpriced.
As a consultant I was recently tasked with building a service (I'd rather not say what); I had good data on requests per day and total data storage, factored in ELB, etc.
The cost was less than 2K a year. Now consider that the company refused to buy hardware or pay anyone to support it because they had a full staff of sysops maintaining their (rather not say) at great cost in their own data center.
2K is absolutely nothing to a company. Most of us are not Netflix or Dropbox. Don't pretend you have those kinds of problems if you don't. It could have cost 3x that per year and they still wouldn't have noticed it.
AWS for manageable workloads is dirt cheap at scale. I think you'd have to be nuts not to use it.
I see these complaints a lot, and they are certainly accurate, but they miss the point. I use AWS because when an app server explodes, a new one is automatically spun up and added to the load balancer pool. The OS image is loaded from s3, which I don't have to worry about backing up. The app is loaded from s3, the templates are loaded from s3. I don't have to do anything. I would absolutely love to use dedicated hardware for the consistent performance. But where can I get dedicated hardware that lets me provision new servers automatically and have them running with our custom OS image in minutes?