Outages like these don't really resolve instantly.

Any given production system that works will have capacity needed for normal demand, plus some safety margin. Unused capacity is expensive, so you won't see a very high safety margin. And, in fact, as you pool more and more workloads, it becomes possible to run with smaller safety margins without running into shortages.
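One way to see the pooling effect, under the strong and purely illustrative assumption that workload demands are independent and identically distributed, is that the standard deviation of pooled demand grows only as the square root of the number of workloads, so the safety margin per workload shrinks as you pool more of them. A toy sketch (all numbers made up):

    import math

    # Illustrative only: each workload's demand has std dev 30 (arbitrary
    # units), and we provision a margin of 3 standard deviations.
    sigma, k = 30, 3

    for n in (1, 10, 100, 1000):
        isolated = n * k * sigma            # margin if each workload is provisioned alone
        pooled = k * sigma * math.sqrt(n)   # std of pooled demand grows only as sqrt(n)
        print(f"{n:5d} workloads: isolated margin {isolated:7.0f}, "
              f"pooled margin {pooled:7.0f} ({pooled / isolated:.0%} of isolated)")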

These systems will have some capacity to onboard new workloads, let us call it X. They have the sum of all onboarded workloads, let us call that Y. Then there is the demand for the services of Y, call that Z.

As you may imagine, Y is bigger than X, by a lot. And when Y falls away, you can only rebuild it at rate X, so the capacity to handle Z lags far behind.
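A toy back-of-the-envelope calculation, with entirely made-up numbers, shows why that matters: once Y is lost, the recovery time is governed by Y / X, not by how quickly the root cause is fixed.

    # Toy numbers, purely illustrative: once the onboarded capacity Y is gone,
    # rebuilding it is bounded by the onboarding throughput X, so recovery
    # takes roughly Y / X regardless of how fast the root cause was fixed.
    Y = 100_000   # hypothetical units of serving capacity normally onboarded
    X = 500       # hypothetical units that can be (re)onboarded per minute

    minutes = Y / X
    print(f"Rough time to rebuild capacity: {minutes:.0f} minutes (~{minutes / 60:.1f} hours)")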

So in a disaster recovery scenario, you start with:

* the same demand Z, possibly increased by retry logic and people mashing F5,

* zero available capacity, Y, and

* only X capacity-increase-throughput.

As it recovers you get thundering herds, slow warmups, systems struggling to find each other and become correctly configured, and so on.
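That retry traffic is exactly why recovering services get hit by thundering herds. A minimal, hypothetical sketch of the jittered exponential backoff clients commonly use to soften it (names and parameters are illustrative, not from the incident report):

    import random
    import time

    def call_with_backoff(request, max_attempts=6, base_delay=0.5, cap=30.0):
        """Retry a flaky call with capped exponential backoff plus full jitter,
        spreading retries out instead of hammering a recovering service."""
        for attempt in range(max_attempts):
            try:
                return request()
            except Exception:
                if attempt == max_attempts - 1:
                    raise
                # Full jitter: sleep a random amount up to the exponential cap.
                time.sleep(random.uniform(0, min(cap, base_delay * 2 ** attempt)))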

Show me a system that can "instantly" recover from an outage of this magnitude and I will show you a system that's squandering gigabucks and gigawatts on idle capacity.




Unless I’m misunderstanding Google’s blog post, they are reporting ~4+ hours of serious issues. We experienced about two days.

If it were possible to have fixed this sooner, I’m sure they would have. That’s not the point of my comment though.


The root cause apparently lasted for ~4.5 hours, but residual effects were observed for days:

> From Sunday 2 June, 2019 12:00 until Tuesday 4 June, 2019 11:30, 50% of service configuration push workflows failed ... Since Tuesday 4 June, 2019 11:30, service configuration pushes have been successful, but may take up to one hour to take effect. As a result, requests to new Endpoints services may return 500 errors for up to 1 hour after the configuration push. We expect to return to the expected sub-minute configuration propagation by Friday 7 June 2019.

Though they report most systems returning to normal by ~17:00 PT, I expect that there will still be residual noise and that a lot of customers will have their own local recovery issues.

Edit: I probably sound dismissive, which is not fair of me. I would definitely ask Google to investigate and ideally give you credits to cover the full span of impact on your systems, not just the core outage.


That’s OK, I didn’t think your comment was dismissive. Those facts are buried in the report; their opening sentence makes the incident sound less serious than it really was.



