Cloudflare: Your status page showed "all systems operational" for over 20 minutes while your primary domain was returning a 502 error. Please change this to update automatically, many other engineering teams depend on you.
Sadly this reminds me of AWS outages too where the same applies. How is it that hundreds of developers know there's an issue before AWS do, or Cloudflare in this instance. See my blog post on similar AWS uptime reporting issues at https://www.ably.io/blog/honest-status-reporting-aws-service.
At Ably, our status site had an incident update about Cloudflare issues being worked on (by routing away from CF) before Cloudflare did: https://status.ably.io/incidents/647
We have machine generated incidents created automatically when error rates increase beyond a certain point stating "Our automated systems have detected a fault, we've been alerted and looking at it". See https://status.ably.io/incidents/569 for example. I think much larger companies like Cloudflare and Amazon could certainly invest a bit in similar systems to make it easier for their customers to know where the problem likely lies.
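For anyone curious what that kind of automation looks like, here is a minimal sketch (class name, window size, and threshold are all invented for illustration, not Ably's actual system) of opening a machine-generated incident once a rolling error rate crosses a threshold:

```python
from collections import deque

class ErrorRateMonitor:
    """Open an automated status-page incident when the rolling
    error rate over the last `window` requests crosses `threshold`."""

    def __init__(self, window=100, threshold=0.05):
        self.samples = deque(maxlen=window)  # 1 = error, 0 = success
        self.threshold = threshold
        self.incident_open = False

    def record(self, is_error):
        self.samples.append(1 if is_error else 0)

    def error_rate(self):
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def check(self):
        """Return an auto-generated incident message the first time
        the threshold is crossed; None otherwise."""
        if not self.incident_open and self.error_rate() > self.threshold:
            self.incident_open = True
            return ("Our automated systems have detected a fault; "
                    "we've been alerted and are looking at it.")
        return None
```

The real value is that the incident opens in seconds, before any human has triaged anything.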
Heh, I am reminded of when the control plane at AWS went down... and we had a custom autoscaling config that would query for the number of instances running and scale appropriately... but when the AWS API died... we kept getting zero running instances...
So our system thought none were running and so it kept launching instances....
These were SPOT instances and thus only cost like $0.10 per hour...
But we launched like 2500 instances which all needed to slurp down their DB and config - so it overloaded all other control plane systems...
We had to reboot the entire system. Which took forever.
The only good thing was that this happened at 11am - so all team members were online and available... and then AWS refunded all costs.
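The defensive fix here is to treat an implausible answer from the control plane as "no data" rather than "scale up". A hypothetical sketch of that guard (function and parameter names are invented for illustration):

```python
def desired_launch_count(reported_running, target, last_known_running):
    """Decide how many instances to launch, distrusting suspicious API answers.

    reported_running is None when the describe-instances call failed outright.
    A sudden report of zero when we recently saw many instances running is
    treated as a control-plane fault, not a real scale-to-zero event -
    exactly the failure mode in the story above.
    """
    if reported_running is None:
        return 0  # API down: freeze scaling rather than act on no data
    if reported_running == 0 and last_known_running > 0:
        return 0  # implausible drop to zero: assume the API is lying
    return max(0, target - reported_running)
```

With this guard the outage would have frozen the fleet instead of launching 2500 instances.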
---
The other fun time was when a newbie dev checked in AWS creds to git - but he created the 201st repo (we had only paid for 200) -- and since it was the next repo, which wasn't paid for, it was public by default - thus slurped up by bots ASAP - which then used the AWS creds to launch bitcoin-mining bots in every single region around the globe. Like 1700 instances.
The thing that sucked about that was it happened at like 3am and we had to rally on that one pretty fast. AWS still refunded all costs...
At least in the case with AWS, unfortunately there's business involved - because of their uptime guarantee, incidents that would be called downtime by a purely technical team are left as "operational" or "partly degraded". Otherwise, they might have to shell out millions or tens of millions.
You have to provide "your request logs that document the errors and corroborate your claimed outage" for the AWS Compute SLA https://aws.amazon.com/compute/sla/
It might have been cached on the server but was not cached on my clients: I did a forced refresh and used a private browser window but it only showed the “almost everything is good” view throughout the duration of the incident.
They should derive whether something is down from how much traffic the status page gets, and have alerts tied to that as well. I think it would be pretty accurate.
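Tongue in cheek, but the detector itself really is trivial - something like this hypothetical sketch:

```python
def status_page_alarm(recent_hits, baseline, factor=5.0):
    """Fire when status-page traffic in the latest interval exceeds
    `factor` times the historical baseline. A crowd of customers
    suddenly checking the status page is itself a decent outage signal.

    recent_hits: per-interval request counts, newest last.
    baseline: typical per-interval request count (e.g. a trailing average).
    """
    if not recent_hits:
        return False
    return recent_hits[-1] > factor * baseline
```

The hard part is organizational (who acknowledges the alarm), not technical.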
Interesting, I was command+shift+R refreshing and also tried from a VPN in another region. Perhaps our CF-hosted sites returned 502s in my region sooner than yours, causing me to check the status page sooner.
Seems unlikely. I don't use Cloudflare and have never visited the status page before. I noticed several 502 pages this morning, searched for "cloudflare status page", and saw the "all systems operational".
It isn't in their best interest to have a reliable status page. They want the status page to contain information about a failure only after people already know about the issue through other means. For the same reason, cloud providers don't provide rebates for outages unless you ask for them explicitly.
I think the only way is to have status page operated by an independent 3rd party, but I don't think there's a viable business model for someone to provide such service. Perhaps there might be even a risk of lawsuits against you.
Once cloudflare.com came back I decided to check out their business SLA, and it's not very encouraging:
> For any and each Outage Period during a monthly billing period the Company will provide as a Service Credit an amount calculated as follows: Service Credit = (Outage Period minutes * Affected Customer Ratio) ÷ Scheduled Availability minutes
So assuming an outage affects 100% of your users (this one seems like it did, but that's not clear), they only refund the time the service was offline? According to pingdom this outage lasted ~25 minutes, so that's 25/(31 * 24 * 60) = .056% of our bill, roughly 11 cents.
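For the record, the arithmetic above as code (the $200/month plan price is just an example, not the commenter's actual bill):

```python
def cloudflare_service_credit(outage_minutes, affected_customer_ratio,
                              scheduled_minutes, monthly_bill):
    """Service credit per the quoted Business SLA formula:
    (Outage Period minutes * Affected Customer Ratio)
        / Scheduled Availability minutes,
    applied as a fraction of the monthly bill."""
    fraction = (outage_minutes * affected_customer_ratio) / scheduled_minutes
    return monthly_bill * fraction

# 25-minute outage, 100% of customers affected, 31-day month, $200/month:
# 25 / 44640 ~= 0.056% of the bill, about 11 cents.
```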
It sounds like you just don't pay for the time the service didn't work, which isn't much of a guarantee, that's just expected (of course you shouldn't pay for services not provided). Most SLAs for critical services have something like under 99.99% uptime you get 10% of your bill back, under 99.5% you get 20% back, under 99% you get 50% back. (*Numbers completely made up to demonstrate the concept.)
Am I misreading this? Morning coffee hasn't kicked in yet so maybe I am.
This doesn't surprise me at all - SLAs are widely overrated. No SLA will cover damages incurred by lost business due to an outage. What you likely want is some kind of third-party insurance for downtime caused by outages out of your control - but I'm not even sure this exists.
I'm definitely not suggesting CF should cover losses. Sorry if I gave that impression. That would effectively require them to be an insurance company since they'd have to investigate claims, and possibly charge customers differently based on risk. (i.e. you don't want to bill a customer $200 per month if 10 minutes of downtime could lose $20 million in sales.)
I mean something like Amazon EC2's SLA (https://aws.amazon.com/compute/sla/) where credits are proportional to downtime, but not 1:1. i.e. they credit 100% for >= 5% downtime. With Cloudflare's SLA, 5% downtime (1.5 days in a month) would only give you a 5% credit.
There are different types of contingent business interruption insurance available. If you're large enough (like Fortune 500), you can negotiate the terms of your policy.
> In the rare event of downtime, Enterprise customers receive a 25x credit against the monthly fee, in proportion to the respective disruption and affected customer ratio.
That's the business SLA. The Enterprise one is 100% uptime with a 25x payback so those are the ones they are focusing on keeping up. We were with Verizon before CloudFlare and their SLA was a similar pay back for outage. I think this is pretty typical, what service did you have that had a different setup for the SLA?
* 99.0% <= uptime < 99.9%: 10% service credit
* 95.0% <= uptime < 99.0%: 25% service credit
* 0.0% <= uptime < 95.0%: 100% service credit
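To make the comparison concrete, here's a sketch of both schedules - the tiered one quoted above and a Cloudflare-Enterprise-style 25x proportional credit. Both are illustrations of the structure, not the actual contract terms:

```python
def tiered_service_credit(uptime_pct):
    """Credit as a fraction of the monthly bill under the
    tiered schedule quoted above."""
    if uptime_pct >= 99.9:
        return 0.0
    if uptime_pct >= 99.0:
        return 0.10
    if uptime_pct >= 95.0:
        return 0.25
    return 1.00

def proportional_25x_credit(uptime_pct):
    """Enterprise-style: 25x the downtime fraction, capped at 100%."""
    downtime_fraction = (100.0 - uptime_pct) / 100.0
    return min(1.0, 25 * downtime_fraction)
```

Note the 25x schedule hits a 100% credit at 96% uptime, versus 95% for the tiered one - which is the point made below.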
But CloudFlare's Enterprise SLA (25x credit) is similar or maybe even a little bit better (because you get to 100% at 96% instead of 95%). Of course when you are doing an Enterprise deal you can negotiate for whatever terms are mutually acceptable as long as you're willing to pay.
In any case, the function of the credit policy is to ensure there is enough pain for the provider to put in place the quality/reliability practices, processes, and code to protect themselves from losses. IMO most sustainable businesses pull in much more revenue per hour than 25x their CDN cost.
It would be interesting to know how CloudFlare's infra and processes differentiate free, Business and Enterprise customers.
I left Cloudflare for AWS a long time ago despite CF's affordability since they didn't seem to care that much about uptime or quality. Their frontend was corrupting response bodies + caching response bodies (in retrospect, this was probably pre-discovery cloudbleed) and there was no way to get a response or help with it.
Throughout this outage, https://www.cloudflarestatus.com/ continued to show green, with almost all services marked 'operational' and only a vague, cryptic message about users in "this region" being affected:
Investigating - Cloudflare is observing network performance issues. Customers may be experiencing 502 errors while accessing sites on Cloudflare. We are working to mitigate impact to Internet users in this region.
It seemed very much like a global outage affecting all services. Is this status page not automatically updated with service status, or is it just manually updated by humans? Even if manually updated, surely when posting that status message, the status of all the services should be set to degraded?
This is not my experience. I received many updates, both through email and on the Cloudflare status page, throughout the incident - except for possibly the first 10 minutes.
You're probably talking about the yellow note at the top of the page (which is still there, with a little more detail now). That was updated and is fine.
I'm talking about the service indicators for each service lower down the page, which remained green throughout and appear to be just decoration, not an actual indication of service status: they all said 'operational' throughout the incident (I reloaded a few times).
In particular I'm thinking of the Cloudflare Sites and Services section.
A great example for why you shouldn't transfer your domain to Cloudflare Registrar if you're also using their CDN. Those who have transferred their domains cannot change DNS servers to mitigate the outage.
One thing is not every free or pro plan on cloudflare is personal use.
I'm running the web servers, official wiki, and external game resource portal for the most active open-source video game on GitHub through Cloudflare, and we might not want our 60-million-requests-a-month website to go down when Cloudflare does.
Because I can tell you right now our $300-a-month budget (which, mind you, covers 7 game servers that can each handle 100 connected players) can't take the $80 hit just to make Cloudflare not a single point of failure.
If you don’t count this as a personal or hobby project it’s a community project. They don’t have a pricing plan for that so you either have to go pro or go to some other provider who gives you this much for free. Why should a company give you even more pro features for free if you are already getting a lot of things for free?
Ah, used to be an Enterprise only feature. We have access to that, but didn't realise it's now available to Business. Perhaps time for them to consider offering it more widely!
Changing your NS records at the registry could help, but keep in mind most TLDs are serving NS records with 1-2 day TTLs, so you'll still see a lot of traffic going to the old server.
If this is something you want to be able to mitigate, you really need to run separate DNS infrastructure from your hosting/CDN and use short-TTL CNAMEs to delegate hostnames to the CDN. This becomes a big challenge if you host on an apex domain (e.g. example.org instead of www.example.org), so don't do that.
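A rough sketch of what that short-TTL failover could look like, assuming your DNS provider exposes some API you can script against. All hostnames here are placeholders, and `choose_cname_target` stands in for whatever update call your provider actually offers:

```python
import http.client

def cdn_is_healthy(host, path="/", timeout=5.0):
    """Probe the edge over HTTPS. A 502/503 from the CDN counts as
    unhealthy, not just a dead TCP connection - which matters for
    exactly the kind of outage in this thread."""
    try:
        conn = http.client.HTTPSConnection(host, timeout=timeout)
        conn.request("HEAD", path)
        status = conn.getresponse().status
        conn.close()
        return status < 500
    except OSError:
        return False

def choose_cname_target(cdn_healthy):
    """Pick where the short-TTL www CNAME should point.
    Both names are placeholders for illustration."""
    if cdn_healthy:
        return "www.example.org.cdn.example.net"  # via the CDN
    return "origin.example.org"                    # straight to origin
```

With a 60-second TTL on the CNAME, most clients follow the switch within a minute or two - versus hours for a registrar-level nameserver change.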
Problem is not just about using Cloudflare as your DNS registrar really. Even if you have a different registrar, the Cloudflare model is to have the NS (nameserver) records set up to point to Cloudflare, and then they in turn resolve the DNS. You cannot really use Cloudflare without that set up. Changes to nameservers at a registrar level are rarely quick, at least quick enough to mitigate a disaster like this. It's why we've used two completely different domains at Ably (ably.io and ably-realtime.com) for all services we provide.
We wrote about a strategy to circumvent this sort of thing a little while back https://www.ably.io/blog/routing-around-single-point-of-fail.... Given two incidents in a matter of weeks, I think a revisit of that article in light of most businesses who operate on a single domain would be useful :)
For about 30 minutes today, visitors to Cloudflare sites received 502 errors caused by a massive spike in CPU utilization on our network. This CPU spike was caused by a bad software deploy that was rolled back. Once rolled back the service returned to normal operation and all domains using Cloudflare returned to normal traffic levels.
This was not an attack (as some have speculated) and we are incredibly sorry that this incident occurred. Internal teams are meeting as I write performing a full post-mortem to understand how this occurred and how we prevent this from ever occurring again.
I agree that the centralization of the internet is troubling, but Cloudflare is solving a systemic problem that nobody else is tackling. The DDoS problem was not being solved for anyone except for enterprise customers until Cloudflare came along and to this day there is very little competition in this space. Your post makes it sound like making a "mesh based DDoS" system is somehow trivial. Who is going to pay for this? How does it work? How do you ensure latency is not atrocious? Why hasn't someone made this already? Cloudflare at least has a financial model that can be sustained, and it doesn't include harvesting all of our personal data.
Without CF, many websites would not stay on-line during an attack. And they would cease to exist because many of those places would never be able to afford DDoS protection. I know so many sites, including ones I run, that I would not be able to keep on the public internet without CF DDoS protection. There really is no real competition in this space.
I think we need to consider the fact that while this outage does take a lot of sites off-line at once, it is temporary, and it is still extremely rare. And the alternative is potentially that many websites would cease to exist at all, period, without something like Cloudflare existing.
> Your post makes it sound like making a "mesh based DDoS" system is somehow trivial.
It does not. "There has to be" ends in "!" and is the expression of a wish.
> Who is going to pay for this?
All of us. Everyone. I keep circling back to the idea that we should all join a protection ring, so we'd all share "cost" on this.
> How do you ensure latency is not atrocious?
> Why hasn't someone made this already?
First one is very technical, and the answer is most probably by localisation and by only actually turning it on when needed.
For the second: because it's a very hard technical problem and there is basically no money in it. Business value maybe, but it would need people and companies to collaborate, it would probably need committee level decisions, and so far, nobody wanted to deal with this.
Or at least that's my theory.
EDIT
Maybe dat:// will eventually become a viable option, and with that, due to the distributed nature, this kind of DDOS protection is sort of built in.
This seems like a case for putting more of the internet through a single gateway. Having my downtime correlated with everyone else's means users will be more forgiving, because they'll perceive it as "the Internet's down" rather than "lkbm's site is broken". (We saw this with Cloudflare: some users were pissed, and others jumped in with "it's not their fault; AWS is down". That doesn't happen when our stuff specifically goes down.)
We need to decentralize the Internet, but this occurrence is not the reason. It's an argument to keep consolidating.
It's a horribly short-sighted, irresponsible and dangerous attitude. When your service goes down on its own its users can switch to some backup process temporarily. If half the internet goes down, they are screwed. How much they're screwed depends on how they're using your services at the moment, which you most likely can't even know.
The argument being made above only holds water if Cloudflare has worse uptime than smaller providers, and that it's because they're big.
This is often the case when monopolies realize they're in a position where they can get away with sucking, but I don't believe that to be the case with Cloudflare yet.
I get your point. But I'm not sure I can ever grok the mindset of someone who thinks, "good, that'll teach 'em". Not sure I'd ever want to work with that individual.
What if that behavior isn't self-rewarding and is actually expensive? Not using Cloudflare, Google Maps, Analytics et al. means you need to use something else, spend attention points somewhere, and pay for the services. Very few people will do that because "it's the right thing to do".
I don't think there is another cheap and easy-to-administer way, or more people would be doing that. Also, CF is nice for the slight protection it provides against obvious bots and mass hacking attempts, and for decreasing requests for static page resources. And as long as your website doesn't leak its own IP (i.e. not sending emails except through Received-path-scrubbing services), it hides your IP somewhat, meaning if you're careful you can run a website from your home on a Raspberry Pi without any issues.
I agree on principle, but as an end user, I must selfishly disagree. I find life after Cloudflare to be better than before it.
There's a Cloudflare location right next to me, and that improves latency across many of the websites I use, and makes their public DNS service the fastest.
I just wish they had active competition in the same space. CDNs have network locations nearby but they don't offer an easy UX for relatively inexperienced website owners, and DDoS protection services usually have fewer network locations than content CDNs.
With a lot of these things, if people can't agree to make a standard system to do all of it just as well then there's going to be a big company that does so. We have had this with e-mail, signed exchanges, DDOS protection and will probably have it with many other things unless people pull themselves together at least after these proprietary solutions are created and create better alternatives.
Their free CDN is my biggest draw to their service: I can handle millions of requests per day with a $5 VM, sane caching headers, unoptimized code, and Cloudflare free tier.
More importantly, their admin dashboard is down. It's impossible to bypass their "orange cloud" proxies and send traffic directly to our hosting. That they can't flip a switch and have their nameservers send dash.cloudflare.com to a separate piece of redundant infrastructure is mind-boggling.
Seems like even other registrars might rely on Cloudflare (e.g. Namecheap) so now people have to continuously ensure there’s no cross-pollination between their infra providers...
I think the only option here would be to change our name servers at registrar level to point to AWS and recreate all DNS records there, but then you have to deal with name server propagation.
That number matches what I am seeing on StatusGator: Of the 438 status pages we monitor, 52 of them are showing some kind of warn or down notice right now. That's almost 12%.
Though some of them might not be because of Cloudflare, the ones I spot checked all do appear related. Medium, DigitalOcean, Shopify, CodeShip, Pingdom, and many more. The impact is staggering.
Update - Cloudflare has implemented a fix for this issue and is currently monitoring the results.
Description:
Major outage impacted all Cloudflare services globally. We saw a massive spike in CPU that caused primary and secondary systems to fall over. We shut down the process that was causing the CPU spike. Service restored to normal within ~30 minutes. We’re now investigating the root cause of what happened.
Don't want to go off topic, but if I want to prevent my website going down because of stuff like this in the future, will having back up DNS entries solve the problem?
I know DNS will fall back if it can't reach a service, but would a 502 trigger that?
Yeah, you'll want to keep your domain registered through a different registrar, and if CF goes down you can update your DNS name servers to point from CF to something like AWS Route 53.
This has a few drawbacks like making sure your Route53 configuration is identical to your CF config, ensuring your origin servers can cope with the additional load if CF caching isn't available and the DNS propagation time required for the Name Servers to update.
During the last outage, we were able to get into the CF dashboard and simply disable the proxy which allowed our clients to access our origin servers directly but this time we can't even get into the Dashboard.
Yeah, if I had access to the DNS records this would be easy, but like you said, even the dashboard is down.
Ideally, I'd want something where if Cloudflare goes down, I don't have to change anything, but... 502 isn't going to trigger that without some work on my part.
If you received an HTTP 502 then DNS must've already resolved. Browsers typically do a DNS lookup and then try establishing a TCP connection to one of the returned hosts. It's only if it can't establish a TCP connection to a host that it will (sometimes) try another host from the DNS response.
Since I'm mostly thinking about static sites, I could also have something local that pings the site, and if it goes down, it could update my nameservers to point elsewhere on its own.
I'm not sure if the browser receives all A/AAAA records from the syscall, or just one. I guess that if the browser has the whole list, and the error is in the 500 range, it could retry a different IP but I'm not sure if browsers do this.
DNS isn’t handled by the kernel; it’s handled by the network library runtime, and that does return a list of addresses (I think it actually has you iterate through them yourself in C anyway).
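Right - in C that's getaddrinfo, and the same iterate-until-connected loop is easy to see in Python, which wraps the same call. A minimal sketch of what browsers (sometimes) do:

```python
import socket

def connect_first_working(host, port, timeout=3.0):
    """Resolve `host` to a list of addresses and try a TCP connection
    to each in turn, returning the first socket that connects -
    mimicking the browser fallback behavior described above."""
    last_err = None
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        try:
            s = socket.socket(family, socktype, proto)
            s.settimeout(timeout)
            s.connect(sockaddr)
            return s  # first address that accepts a connection wins
        except OSError as e:
            last_err = e
    raise last_err or OSError("no addresses returned")
```

Note this only helps when a host is down hard (connection refused / timeout); it does nothing for a host that cheerfully answers with a 502, which is the whole problem in this thread.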
What would solve it is having your DNS have a low lifetime and then changing the DNS to point to not-Cloudflare. It would still be down for some users as long as the old (Cloudflare) DNS is cached, though.
A problem with this though is that some registrars take hours to propagate, by the time you have it switched it will have already likely been resolved. If you spread that across hundreds of customers, you'd have a bad time.
Browsers would deal with it just fine (assuming the site is down hard and not responding with errors). It's some of the API tools and old libraries that may not; they would need retry logic that mimics the browser cycling through multiple A records. OTOH, API tools that have retry logic would just keep trying until the errors clear up. A browser will stop retrying once something responds, unless there was JavaScript running in memory that had retry logic.
If there are errors, the site would need to be modified to not respond at all when it's broken and unable to proxy to a working origin. Perhaps CF have not coded their proxies in this manner.
The lookup is essentially random. If you point DNS at two IPs and one of them goes down, then (without going into detail) half of your requests will fail.
On one hand I think "Maybe I should diversify my infrastructure."
And on the other I think "But one of the biggest upsells was convinience."
And it's fortunate I don't have a third hand, because I'd be thinking "Oh crap oh crap I just migrated a client website to LightSail + CloudFlare saying how super awesome and robust it would be."
But it's okay now because it looks like everything is back up!
That didn't seem like trolling – just a public call for Verizon to follow internet best practices. Given that most large ISPs treat failures as a PR exercise, that's probably necessary.
They reached out to Verizon privately, a Tier 1 carrier with expectations and responsibilities as a good netizen, and got no response.
They attempted to reach out through Verizon's public forms of communication and got a bullshit irrelevant CS response despite requesting escalation.
They then called out Verizon before the community as a whole.
They don't have the luxury of waiting for a well prepared letter from some Verizon lawyers. Modern day customer expectations don't allow for it. You may call it trolling, but all I saw was a company asking another company to stop pissing in the public pool.
So what should the ideal redundancy plan be here? If you can't log into the CDN provider and they are down, do you just have a second one ready (and paid for) and then log into your registrar, ready to switch to that secondary CDN provider in this scenario? Or is there some sort of load balancing / routing solution between CDNs that I don't know about / understand?
If you use Cloudflare nameservers, you have to change to new nameservers, wait for that to propagate, and then wait for clients' cached record TTLs to expire. So it will be a major disruption no matter what you do.
Unless you need EV, you can just pull some wildcards from Let's Encrypt (as long as you don't use pubkey pinning). No need to automate, as it's just a one-off.
The Cloudflare DNS-over-HTTPS resolver was serving up 502 errors as well, though the standard port 53 UDP resolver was working. This event definitely made me regret choosing Cloudflare as my sole DoH server.
Seems fitting that Cloudflare spoke so aggressively against Verizon[0][1] last week and then this incident happens to them. I will be interested to read the postmortem on this situation. I really like Cloudflare but you should be careful not to jinx yourself with blogs posts like that.
Any workarounds or solutions? I'm an on-call engineer with lots of questions coming in. I'm not sure what I can do apart from moving the domain off Cloudflare, but DNS propagation would take a few hours and by then Cloudflare might be up again.
Outages can always happen, when they do with companies like this, at least you'll know that some of the best people out there are working on the issue and that it will be resolved asap.
Cloudflare has proven in the past to be a very capable party; I don't think panicking now and trying to move everything away is a smart move. Also, a few people have been saying that even if you wanted to, the site to do so is not reachable, so that would be a challenge as well.
My suggestion is wait. I wouldn't even consider flipping my sites over to another DNS unless the outage begins lasting over four hours or so. A lot of top sites use Cloudflare and this sort of outage is extremely rare for them (I can't remember a time when Cloudflare's own site and dashboard were taken offline).
Then again, it depends on the priority of your site. But there are tons of top sites on Cloudflare and I bet a lot of those places don't have plans for emergency switching over to another DNS provider / CDN on short notice as it's often a fairly disruptive change, especially now that more frontend logic for a site is implemented alongside the CDN/LB.
Holy crap, Cloudflare is down, and seemingly everything it covers. Major sites such as DigitalOcean are down, and there's no way to easily disable Cloudflare since their own site is down.
Any CDN is a single point of failure and can limit your availability to as low as three nines. Anycast-based CDNs like Cloudflare are much less reliable than DNS-based CDNs, which can do orders of magnitude better.
They rely on a single network infrastructure, as opposed to many independent networks with independent edge nodes, where isolating faults is comparatively trivial.
A possible downstream effect of this: Pingdom appears to be alerting VERY late, at least for us. I'm guessing with 8% of the web affected, their alerting systems aren't prepared for this many simultaneous alerts.
This is frustrating - can't access CF to turn off CF to make sites accessible. There should be an emergency admin/dash access to turn off protection for cases like this.
Also getting this in the UK. Not completely down, bits and pieces of medium comes through but very slowly and incomplete. I’m also unable to access npm.
It’s better that this happens now than later. I am confident CF will put protections in place to prevent this from happening again, but also put a switch in place to provide an instant fix the next time something like this happens.
It does suck to have a service down for a bit, but what CF offers, at the price point is pretty incredible.
Good luck to CF, and I wish you the best with coming up with a robust future-proof solution.
>We are working to mitigate impact to Internet users in this region.
>This incident affects: North America (Ashburn, VA, United States - (IAD), Atlanta, GA, United States - (ATL), Boston, MA, United States - (BOS), Buffalo, NY, United States - (BUF), Calgary, AB, Canada - (YYC), Charlotte, NC, United States - (CLT), Chicago, IL, United States - (ORD), Columbus, OH, United States - (CMH), Dallas, TX, United States - (DFW), Denver, CO, United States - (DEN), Detroit, MI, United States - (DTW), Houston, TX, United States - (IAH), Indianapolis, IN, United States - (IND), Jacksonville, FL, United States - (JAX), Kansas City, MO, United States - (MCI), Las Vegas, NV, United States - (LAS), Los Angeles, CA, United States - (LAX), McAllen, TX, United States - (MFE), Memphis, TN, United States - (MEM), Miami, FL, United States - (MIA), Minneapolis, MN, United States - (MSP), Montgomery, AL, United States - (MGM), Montréal, QC, Canada - (YUL), Nashville, TN, United States - (BNA), Newark, NJ, United States - (EWR), Norfolk, VA, United States - (ORF), Omaha, NE, United States - (OMA), Phoenix, AZ, United States - (PHX), Pittsburgh, PA, United States - (PIT), Portland, OR, United States - (PDX), Queretaro, MX, Mexico - (QRO), Richmond, Virginia - (RIC), Sacramento, CA, United States - (SMF), Salt Lake City, UT, United States - (SLC), San Diego, CA, United States - (SAN), San Jose, CA, United States - (SJC), Saskatoon, SK, Canada - (YXE), Seattle, WA, United States - (SEA), St. 
Louis, MO, United States - (STL), Tampa, FL, United States - (TPA), Toronto, ON, Canada - (YYZ), Vancouver, BC, Canada - (YVR), Tallahassee, FL, United States - (TLH), Winnipeg, MB, Canada - (YWG)), Middle East (Amman, Jordan - (AMM), Baghdad, Iraq - (BGW), Baku, Azerbaijan - (GYD), Beirut, Lebanon - (BEY), Doha, Qatar - (DOH), Dubai, United Arab Emirates - (DXB), Kuwait City, Kuwait - (KWI), Manama, Bahrain - (BAH), Muscat, Oman - (MCT), Ramallah - (ZDM), Riyadh, Saudi Arabia - (RUH), Tel Aviv, Israel - (TLV)), Asia (Bangkok, Thailand - (BKK), Cebu, Philippines - (CEB), Chengdu, China - (CTU), Chennai, India - (MAA), Colombo, Sri Lanka - (CMB), Dongguan, China - (SZX), Foshan, China - (FUO), Fuzhou, China - (FOC), Guangzhou, China - (CAN), Hangzhou, China - (HGH), Hanoi, Vietnam - (HAN), Hengyang, China - (HNY), Ho Chi Minh City, Vietnam - (SGN), Hong Kong - (HKG), Hyderabad, India - (HYD), Islamabad, Pakistan - (ISB), Jinan, China - (TNA), Karachi, Pakistan - (KHI), Kathmandu, Nepal - (KTM), Kuala Lumpur, Malaysia - (KUL), Lahore, Pakistan - (LHE), Langfang, China - (NAY), Luoyang, China - (LYA), Macau - (MFM), Manila, Philippines - (MNL), Mumbai, India - (BOM), Nanning, China - (NNG), New Delhi, India - (DEL), Osaka, Japan - (KIX), Phnom Penh, Cambodia - (PNH), Qingdao, China - (TAO), Seoul, South Korea - (ICN), Shanghai, China - (SHA), Shenyang, China - (SHE), Shijiazhuang, China - (SJW), Singapore, Singapore - (SIN), Suzhou, China - (SZV), Taipei - (TPE), Tianjin, China - (TSN), Tokyo, Japan - (NRT), Ulaanbaatar, Mongolia - (ULN), Wuhan, China - (WUH), Wuxi, China - (WUX), Xi'an, China - (XIY), Yerevan, Armenia - (EVN), Zhengzhou, China - (CGO), Zuzhou, China - (CSX)), Africa (Cairo, Egypt - (CAI), Casablanca, Morocco - (CMN), Cape Town, South Africa - (CPT), Dar Es Salaam, Tanzania - (DAR), Djibouti City, Djibouti - (JIB), Durban, South Africa - (DUR), Johannesburg, South Africa - (JNB), Lagos, Nigeria - (LOS), Luanda, Angola - (LAD), Maputo, MZ - (MPM), 
Mombasa, Kenya - (MBA), Port Louis, Mauritius - (MRU), Réunion, France - (RUN), Kigali, Rwanda - (KGL)), Latin America & the Caribbean (Asunción, Paraguay - (ASU), Bogotá, Colombia - (BOG), Buenos Aires, Argentina - (EZE), Curitiba, Brazil - (CWB), Fortaleza, Brazil - (FOR), Lima, Peru - (LIM), Medellín, Colombia - (MDE), Mexico City, Mexico - (MEX), Panama City, Panama - (PTY), Porto Alegre, Brazil - (POA), Quito, Ecuador - (UIO), Rio de Janeiro, Brazil - (GIG), São Paulo, Brazil - (GRU), Santiago, Chile - (SCL), Willemstad, Curaçao - (CUR)), Oceania (Auckland, New Zealand - (AKL), Brisbane, QLD, Australia - (BNE), Melbourne, VIC, Australia - (MEL), Perth, WA, Australia - (PER), Sydney, NSW, Australia - (SYD)), and Europe (Amsterdam, Netherlands - (AMS), Athens, Greece - (ATH), Barcelona, Spain - (BCN), Belgrade, Serbia - (BEG), Berlin, Germany - (TXL), Brussels, Belgium - (BRU), Bucharest, Romania - (OTP), Budapest, Hungary - (BUD), Chișinău, Moldova - (KIV), Copenhagen, Denmark - (CPH), Dublin, Ireland - (DUB), Düsseldorf, Germany - (DUS), Edinburgh, United Kingdom - (EDI), Frankfurt, Germany - (FRA), Geneva, Switzerland - (GVA), Gothenburg, Sweden - (GOT), Hamburg, Germany - (HAM), Helsinki, Finland - (HEL), Istanbul, Turkey - (IST), Kyiv, Ukraine - (KBP), Lisbon, Portugal - (LIS), London, United Kingdom - (LHR), Luxembourg City, Luxembourg - (LUX), Madrid, Spain - (MAD), Manchester, United Kingdom - (MAN), Marseille, France - (MRS), Milan, Italy - (MXP), Moscow, Russia - (DME), Munich, Germany - (MUC), Nicosia, Cyprus - (LCA), Oslo, Norway - (OSL), Paris, France - (CDG), Prague, Czech Republic - (PRG), Reykjavík, Iceland - (KEF), Riga, Latvia - (RIX), Rome, Italy - (FCO), Saint Petersburg, Russia - (LED), Sofia, Bulgaria - (SOF), Stockholm, Sweden - (ARN), Tallinn, Estonia - (TLL), Thessaloniki, Greece - (SKG), Vienna, Austria - (VIE), Vilnius, Lithuania - (VNO), Warsaw, Poland - (WAW), Zagreb, Croatia - (ZAG), Zürich, Switzerland - (ZRH)).
Wonder how we can use Cloudflare and have a fallback plan in place for situations like this. What would be a good architecture for this?
So far I've read that it would be good to keep the registrar outside Cloudflare and use them as a CDN only.
What else?
"Unfortunately, one of these rules contained a regular expression that caused CPU to spike to 100% on our machines worldwide. This 100% CPU spike caused the 502 errors that our customers saw."
What's ironic is that my experience of Cloudflare Sales is that they take advantage of any downtime from their competitor to try and get people to migrate to their services...
It's very convenient for me to be able to tell my clients their site is down because half the internet is down, and there is nothing I can do about it.
This is just my personal guess, but it's likely China flexing again after the protest escalation yesterday in Hong Kong.
A few weeks ago Telegram was attacked by China [0], and Hong Kong protesters used Telegram to communicate.
This time when CloudFlare was down, the most popular local forum among protesters, lihkg.com, was brought down as well.
https://i.imgur.com/qHBM2JW.png