Google App Engine Broken For 4 Hours And Counting (techcrunch.com)
61 points by peter123 on July 2, 2009 | 46 comments



This is a bit like flying vs. driving. If you're in the driver's seat you have control (you hope) and your destiny is in your own hands; if you're in a plane, someone else is driving (unless you are an airline pilot). The accident rate per mile is lower for planes, but when something goes wrong it usually does so in ways that make the headlines. Still, more people die driving than flying.

When the 'cloud' goes down (or at least some part of it), you'll notice immediately because of the large number of sites going down all at once. But compare that with the accumulated downtime all those users would have had if they had hosted on their own kit instead, and it is quite possible that the balance is still in favour of hosting in the cloud.


Nice analogy but like all analogies, it's too simplistic and flawed. You left out one important and critical part: the hypothetical passenger in your example is tied to a specific plane/airline. If you don't like your pilot or plane type, you cannot move to a different airline or request a different plane or a different pilot since you're chained to the specific plane.

Due to Google App Engine's API lock-in, you're stuck with them as a provider... quite possibly forever due to heavy BigTable dependency.

Even though I'm a huge fan of cloud computing, I'd rather use a strategy that uses platforms/planes that are built from reusable parts and allow you to switch your plane/airline provider as you please. Don't like Delta? Just go to AA counter and you don't have to change your luggage, clothing etc.

Until there's a second, independent provider that offers full compatibility with GAE, I'd avoid GAE like the plague.


I'm sorry sir, but if my pilot is stuck in a thunderstorm and I don't think he knows what he's doing, I can't "request a different plane."


It's a metaphor, not a description or some sort of iron law of physics. If I was on Google App Engine right now and there was a competitor that I could switch to, then I damn well could be up, especially if I took the opportunity to keep both options actively available for myself. No matter how hard "switching planes in midair" might be, it's just a metaphor.


Yes but if you survive the flight you can switch after you land.


That's not a fair comparison. You can't switch mid flight or mid cloud crash. But you can evaluate safety records every time you feel inclined and swap your airlines at some point.

If GAE fails to live up to the better-than-DIY-on-average promise, you can't leave.


or at least carry a parachute. business continuity plans should really be part of the spec for everyone who's making money from their app.


Does that account for 6 hours of downtime with minimal information as to what's going on? Good luck with that!


Actually, if this thing works then there is no longer an API lock-in with Google App Engine: http://code.google.com/p/appscale/

Assuming you can still dump your data out of Googlage?


All understanding of the real world is simplistic and flawed. Even your analogy, because there is no good cloud-as-a-service provider that offers anything as good as BigTable for distributed storage plus MapReduce.


I think a better analogy for cloud computing is electricity. For mission critical applications investing the time and effort into a power / app backup is probably a good idea, but the onus is on us.


Actually, as django abstracts the GAE-api, there's still a way to escape.


Great analogy. I would rather have all the big Brains at Google working to solve the problem instead of my puny brain working trying to restart my server


More specifically:

We have no real network administrators. Within our engineering team we collectively have the skill to be effective at systems administration, but the hardware side is really a complete mystery.

Co-location mostly solves that, but the cloud takes it a step further. By running in a virtualized environment we can handle what we're good at and let others build complex data centers to scale our traffic.

When you're bootstrapping a start-up, that's just huge.


exactly, shit happens, if you host on your own, you are just as vulnerable to power outages etc. + at least when a cloud goes down, they have hundreds of pros trying to fix it


hope not. too many cooks?


I do my own hosting, but I agree with this completely. Downtime at any scale is inevitable, but with Google's massive infrastructure you're far better off.


A lot of sites get 99.99% uptime easily. It's not difficult when you know what you're doing and have plenty of redundancy. When you depend on any cloud stuff you can't do any kind of graceful degradation. Google goes down, you go down. Google goes bankrupt, you go bankrupt.

Considering how cheap dedicated servers are, moving a service to the cloud makes little sense (exceptions notwithstanding).


99.99% is only 52 minutes of downtime a year... I don't think any of S3, Rackspace, Google AppEngine, or even www.amazon.com have uptime that good this year.

Getting that kind of uptime is much harder than it sounds and, for a lot of websites, not worth the extra cost. How much money would you pay to go from 99.9% uptime (~9 hours/year) to 99.99% (52 minutes)?
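
For anyone who wants to check those numbers, here is a rough sketch of the arithmetic. The function name is made up for illustration, and it assumes a plain 365-day year with downtime counted around the clock:

```python
# Assumption: a 365-day year, ignoring leap years.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Downtime budget per year, in minutes, for a given uptime percentage."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(pct)
        print(f"{pct}% uptime allows {minutes:.2f} min/year ({minutes / 60:.2f} h)")
```

Three nines works out to a bit under 9 hours a year and four nines to just under 53 minutes, which matches the figures above.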


I disagree. For a small to medium size site, getting 99.99% uptime should basically be the default. If you have a competent staff and a decent provider, about the only thing that will take you down is a power/equipment failure at the DC - which does happen, from time to time, admittedly. Yeah, Rackspace had some problems this year, but most of the other tier 1 hosts have been rock solid.

Obviously it's more complex for a large site but for the vast majority I would say that level of uptime is the rule rather than the exception.

Three nines is pretty unacceptable in this day and age. Providers might only guarantee that level of uptime but if they really were down that much I'd run a mile. Netcraft is your friend!

update: edited post to better reflect reality


How are you defining a site and uptime? Are you talking about dynamically switching systems to allow for upgrades without downtime?

I'm mostly wondering how it compares with the setup at my current location.

I have worked on systems that were designed for this, but I'm not sure if it's cost effective for most web applications. Most things can wait on occasion.

Thank you.


And what happens when an engineering fluke hits your KV store and locks it up, and it takes you 10 hours to fix? :)

You see, you're not actually paying for the 3 9's or the 5 9's. That is just corporate bullshit. You're paying for the promise that, whatever the problem, someone will be able to fix it in a few hours at most.


The hardware may be inexpensive, but "knowing what you're doing" and being able to pull 5 nines is definitely not cheap.


You don't normally get 5 nines on paper that is worth the ink. SLAs in the civil sector generally top out at 3 nines. Very, very rarely you talk about 4, and those are dealt out by the insurance company, not by the service provider.

I'm talking about real SLAs with compensation here, mind you, not the toilet paper you get from every cheapo ISP.


The cloud is just an enormous SPOF. I like having my own kit somewhere in the loop if only to throw a proper error message.


The bad thing is we can do nothing but wait. The good thing is we don't have to do anything but wait ;-)


Now fixed again, as per my own app and as per http://groups.google.com/group/google-appengine-downtime-not...


Not sure why I was downvoted: It just got fixed a few minutes ago, as stated on the link and as per my own app.


I guess anything I post in this thread is going to be downvoted :)



I disagree. The GAE status page was down for hours: http://code.google.com/status/appengine

What's the point of having a status page if it's only up as long as your service? We shouldn't have to hunt around a Google Group for information about what's going on.


What's the point of hosting in a cloud if there's still a single point of failure like this? I realize that it's currently free, but I thought one of the main advantages of moving to the cloud was redundancy and fault tolerance.


Cloud outages may not be frequent, but they sure are noticeable.


I'm so tired of reading about web servers/services going down.


You are supposed to say something clever like, "Too bad TechCrunch isn't hosted on App Engine..."


This is worse than the 8-hr outage of S3 sometime ago... most apps could still respond without S3 static assets. If your entire app is hosted on AppEngine, you're screwed for 4 hrs and counting...


> This is worse than the 8-hr outage of S3 sometime ago

I don't think so. JavaScript files hosted on S3 would hang the page loading, and without CSS/images the app would be useless too.


But you could quickly recover by hosting those JS files yourself and relinking them. If your app is coupled tightly with AppEngine APIs, then there is nowhere else you can host your app.


Theoretically you could fire up AppScale on EC2: http://code.google.com/p/appscale/


What about your data? Most web applications have persistent data of some sort that is vital to the user experience -- without it, you don't have much of a site.

[Edit: And keeping a hot copy of that data is a lot harder than it sounds]


GAE still had read-only access to data, so it would be possible to back up and move elsewhere.


At least we got a detailed explanation from AWS as to what happened and what was put in place to prevent it from happening again.


multi-tenant architectures are the geocities of cloud computing. This is the main problem with something like gae, if they have an internal problem it takes down their entire cloud and all the apps with it.


which means that a lot more people are mad and it gets fixed sooner. Have you ever had a problem with cable TV or phone service that just affects your home? It can take weeks. If it affects a whole city, it is fixed within hours.


hmm... My appengine site is running just fine

Update: Actually, my app is in the "read only mode" they described... the moment I tried to update anything it went to hell :)


I concur, my blog on App Engine is down.



