Hacker News
Google’s Gmail Outage Is a Sign of Things to Come (businessweek.com)
30 points by Libertatea on Dec 18, 2012 | 10 comments



This article is FUD. Complete nonsense, really. Continuous integration had nothing to do with the Gmail outage (which, mind you, was just for the web app; no emails were lost, and since my phone/desktop clients kept working, I only knew Gmail was down because of Twitter).

> The search giant reported that it conducted an update of its load-balancing software from 8:45 a.m. to 9:13 a.m. U.S. West Coast time, and after the problems were detected it managed to quickly roll back the buggy code. But this didn’t stop some people from questioning why Google would roll out a software update during peak e-mail hours on the West Coast.

Perhaps because the update wasn't to Gmail; it was to the load-balancing system? It's not uncommon for load-balancing problems to surface while under load. I'd say the engineers did a good job dealing with the problem, all things considered; load-balancing issues are notoriously tricky.
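For readers wondering how a bad load-balancer push gets caught and rolled back quickly, one common pattern is a canary: send a small slice of traffic to the new version, compare its error rate with the old one, and roll back if it is clearly worse. This is a hypothetical sketch (`run_canary`, the 5% fraction and the 2x threshold are assumptions, not anything Google has described), and as the comment notes, a small canary may not reproduce a bug that only shows up at full load.

```python
import random
from dataclasses import dataclass


@dataclass
class CanaryResult:
    promoted: bool              # False means "roll back"
    canary_error_rate: float
    baseline_error_rate: float


def run_canary(handle_old, handle_new, requests, canary_fraction=0.05,
               max_error_ratio=2.0, seed=0):
    """Hypothetical canary: route ~canary_fraction of requests to the new
    handler, count failures per arm, and promote only if the new error rate
    is not much worse than the old one. Handlers return True on success."""
    rng = random.Random(seed)
    counts = {"old": [0, 0], "new": [0, 0]}  # arm -> [requests, errors]
    for req in requests:
        arm = "new" if rng.random() < canary_fraction else "old"
        handler = handle_new if arm == "new" else handle_old
        counts[arm][0] += 1
        if not handler(req):
            counts[arm][1] += 1

    def rate(arm):
        n, errors = counts[arm]
        return errors / n if n else 0.0

    old_rate, new_rate = rate("old"), rate("new")
    # Refuse to promote if the canary saw no traffic at all; the 0.001 floor
    # keeps a zero baseline from rejecting a canary that is also error-free.
    promoted = counts["new"][0] > 0 and new_rate <= max(old_rate * max_error_ratio, 0.001)
    return CanaryResult(promoted, new_rate, old_rate)


# A new version that fails every request never gets promoted.
print(run_canary(lambda r: True, lambda r: False, range(10_000)))
```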


When exactly is a good time for Google to release code? Facebook? Twitter? These are global markets. Smartphones alone number over 1 billion worldwide, and fewer than a tenth are in the US; I'd wager most are used to interact with one of these or similar big players that do "continuous deployment."


The best time to deploy is the time with the best engineers at the ready. If you're updating component X, you should do so when the stakeholders of X are online. It sounds like Google was able to fix things quickly; I don't think deploying at another time would have been any better.


I'd even argue that the best time to deploy is during the best engineers' working hours. At least you guarantee they're around when/if things go bad.

And by "best", I don't mean all-stars (à la Fitzpatrick, Norvig, Pike or other Google celebs), but the engineer who actually followed the whole project/patch from the dev environment to production.


Absolutely. I started editing in a quip about how unscalable it is, even for Google, to keep engineers who both want to work on hard problems and are willing to stay up until 4am to push/support them. Best time to push: the first hours after I arrive, once I've had a cup of joe and a good night's sleep to work out any further mental bugs.


> The best time to deploy is the time with the best engineers at the ready.

Or to put it simply: not at 1700 on Friday.
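The "deploy while the owners are at their desks, never late Friday" rule is easy to encode as a gate in a deploy script. A minimal sketch, assuming a team on a fixed UTC-8 offset (daylight saving is ignored), a 09:30 to 16:00 window and no pushes on Friday afternoon; the function name and hours are illustrative, not something anyone in the thread proposed.

```python
from datetime import datetime, time, timedelta, timezone

# Assumed team time zone: a fixed UTC-8 offset, so daylight saving is ignored.
PST = timezone(timedelta(hours=-8), "PST")


def in_deploy_window(now: datetime, team_tz=PST,
                     start: time = time(9, 30), end: time = time(16, 0)) -> bool:
    """Return True if the timezone-aware `now` falls inside the owning team's
    working hours on a weekday, excluding Friday afternoons."""
    local = now.astimezone(team_tz)
    day = local.weekday()  # Monday=0 ... Sunday=6
    if day >= 5:  # weekend
        return False
    if day == 4 and local.time() >= time(12, 0):  # Friday afternoon
        return False
    return start <= local.time() < end


# 18:00 UTC on Tuesday, Dec 18, 2012 is 10:00 in the team's zone.
print(in_deploy_window(datetime(2012, 12, 18, 18, 0, tzinfo=timezone.utc)))
```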


> When exactly is a good time for Google to release code? Facebook? Twitter? These are global markets

The overwhelming majority of the world's population lives between approximately 120° west and 120° east longitude. That's just 2/3 of the Earth. If you take a globe and orient it so you are staring at the middle of the Pacific Ocean, you won't see very much land.

This means that even sites that are truly global should still have a strong 24 hour cyclic variation in their traffic.
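A toy model makes the longitude argument concrete: give every 15° of longitude the same daily traffic curve peaking at 14:00 local time, sum the curves per UTC hour, and compare the busiest and quietest hours. The equal weights, cosine shape and 0.8 amplitude are assumptions for illustration, not measured traffic. Regions spread evenly around the entire globe cancel each other out, while regions confined to the 120°W to 120°E band leave a clear 24-hour cycle.

```python
import math


def hourly_traffic(longitudes, amplitude=0.8, local_peak_hour=14.0):
    """Toy model: each region contributes 1 + amplitude * cos(...), peaking at
    local_peak_hour in its own local time. Returns total traffic for each
    UTC hour 0..23."""
    totals = []
    for utc_hour in range(24):
        total = 0.0
        for lon in longitudes:
            local_hour = (utc_hour + lon / 15.0) % 24  # 15 degrees per hour
            phase = 2 * math.pi * (local_hour - local_peak_hour) / 24
            total += 1 + amplitude * math.cos(phase)
        totals.append(total)
    return totals


def peak_to_trough(totals):
    return max(totals) / min(totals)


populated_band = range(-120, 121, 15)  # 120°W to 120°E, one region per 15°
whole_globe = range(-180, 180, 15)     # regions evenly spread around the globe

print(round(peak_to_trough(hourly_traffic(populated_band)), 2))
print(round(peak_to_trough(hourly_traffic(whole_globe)), 2))
```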


That's true, but anecdotal data suggests to me that at least half of your peak traffic is still online, and that share is definitely higher for sites like Google. Is avoiding a deploy in front of a quarter of your base really worth the other risks of "non-continuous deployment"? Say your trough really is half your peak; what if your deploy only goes south at some load in between? Now your engineer not only stayed up until 2am PST, but is also getting woken up at 6am PST to fix a hellstorm.


In a nutshell: Gmail went down for 18 minutes because of continuous deployment [citation needed], and the article explains a bit of the rationale behind it. Then it comes back to the headline, basically saying that as more companies adopt continuous deployment, we might see more breakages, though probably briefer ones.

Not the most focused article, but kind of cool that they wrote about such a thing. Though the headline feels like pure linkbait.

They also missed my personal favorite attribute of continuous deployment: isolated problems. If you deploy 10 features and something breaks, then unless they're completely orthogonal, you now have more than one place to look, person to ask, or team to deal with. If you deploy one and it breaks, you know where the problem lies.
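The isolation point is essentially a search-cost argument. If a batch of changes breaks and the breakage is monotonic (once the bad change is in, every larger prefix of the batch is also broken), finding the culprit takes a binary search over prefixes, the same idea behind `git bisect`. With one change per deploy there is nothing to search. In this sketch `find_first_bad` and `is_broken` are hypothetical; `is_broken` stands in for deploying a prefix to staging and running checks.

```python
def find_first_bad(changes, is_broken):
    """Binary-search for the change that broke a batch.

    Assumes the full batch is broken, the empty prefix works, and breakage is
    monotonic across prefixes. Returns (culprit, number_of_test_deploys)."""
    lo, hi = 0, len(changes)  # invariant: changes[:lo] works, changes[:hi] is broken
    deploys = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        deploys += 1
        if is_broken(changes[:mid]):
            hi = mid
        else:
            lo = mid
    return changes[hi - 1], deploys


batch = [f"feature-{i}" for i in range(10)]
print(find_first_bad(batch, lambda prefix: "feature-6" in prefix))
print(find_first_bad(["feature-0"], lambda prefix: "feature-0" in prefix))
```

For the ten-feature batch that costs three extra test deploys; for a single-feature deploy it costs none.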


> Not the most focused article, but kind of cool that they wrote about such a thing

Why? GigaOm writes about that sort of thing all the time. (BusinessWeek is just syndicating here; it's a GigaOm article: http://gigaom.com/cloud/why-you-should-expect-more-online-ou... )



