I share falcolas's take on it. On the other end, Cloudflare says "Pay us $200-5,000 (avg) a month, we'll handle the details, and don't worry about a data bill." The AWS method sounds like a step backwards in cloud DDOS protection. Or a step forward in their next annual report. Whichever. ;)
I'm still a believer in the value of dial-up, leased lines, satellite, or radio for aiding security. You still have to apply protection to them, but you don't have the whole Internet coming after you with protocols that aid attackers more than defenders. My method is typically to obfuscate identifiers for Internet services and use authentication at the packet level (e.g. port-knocking or VPN). The configuration details are sent over the non-Internet medium. Even dial-up can move basic credentials and some IP addresses quickly. You don't need to do it often, either. If you hide it (e.g. SILENTKNOCK), attackers start getting pretty pissed and desperate wondering why not a single packet gets through.
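For what it's worth, the client side of a knock sequence is tiny. Here's a minimal sketch in Python, assuming a knockd/iptables-style listener on the other end (the host and port sequence are made-up placeholders, not a real protocol):

```python
import socket

def knock(host, ports, timeout=0.3):
    """Send the secret knock: one TCP SYN attempt to each port in sequence.

    The server side (e.g. knockd, or iptables with the 'recent' module)
    watches for this exact sequence before opening the real service port.
    The connection attempts are expected to be refused or dropped -- the
    SYN itself is the knock.
    """
    for port in ports:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(timeout)
        try:
            s.connect((host, port))
        except (socket.timeout, ConnectionRefusedError, OSError):
            pass  # refusal/drop is expected
        finally:
            s.close()

# Usage: knock("203.0.113.10", [7000, 8000, 9000]), then SSH in normally.
```

A scheme like SILENTKNOCK goes further by hiding the knock inside otherwise-normal-looking packets, so an observer can't even tell knocking is in use.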
This method is primarily for intranet sites, though. Web sites or apps facing the public naturally are at high risk. Best to just use Cloudflare or a similar service along with hiring good security folks.
Having been on the wrong side of this equation, I can say this is exactly what they expect you to do, and it's a fool's errand. I have a client on AWS who faces routine DDoS-scale traffic spikes at certain points in the year.
The client's app almost always goes down in the midst of the fury. The point of failure typically comes down to Amazon's load balancers or an auto-scaling failure. In the end, our friends at Amazon tell us to add more, bigger servers to 'outscale' the traffic, and put the blame on us when everything blows up. sigh
Yeah, the "be ready to scale to absorb the impact" seems to be in their best interest. Maybe not their network ops teams, but their billing teams must like it.
So, basically, pull out the pocketbook and we'll hook up our vacuum to it.
Attempting to outscale a DDOS (the primary mitigation method presented by Amazon) is going to DDOS your bank account. Personally, I'd rather see some more recommendations along the lines of the "VPC can minimize potential attack surfaces".
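For the "minimize the attack surface" route, a lot of it is just keeping security group ingress down to the bare minimum. A hedged sketch of the IpPermissions structure that boto3's authorize_security_group_ingress expects (the CIDR and description are placeholders; everything not listed stays closed by the group's default-deny):

```python
def minimal_ingress(allowed_cidr="203.0.113.0/24"):
    """Build an IpPermissions list that exposes only HTTPS to one CIDR.

    Matches the shape boto3's authorize_security_group_ingress takes.
    The CIDR is an illustrative placeholder (e.g. your front-end
    proxy's address range); no other port or source is opened.
    """
    return [{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{
            "CidrIp": allowed_cidr,
            "Description": "front-end proxy range only (placeholder)",
        }],
    }]
```

The idea being: if only your edge layer can reach the origin, volumetric junk aimed directly at your instances never even gets a socket.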
I'm dubious about the number of people who actually read this: trying to outspend a bunch of distributed attackers is 1) not a novel, AWS-specific solution and 2) not an effective or cheap strategy.
If you use BPF, you should be able to filter out the bad traffic effectively. Although, if you're getting flooded with more traffic than the network card can consume, then there's nothing you can do at that layer.
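For reference, a classic-BPF filter can be hand-assembled and attached from userspace. A Linux-only sketch (attaching needs CAP_NET_RAW; the accept-only-IPv4-TCP program below is illustrative — real filters would usually come from something like `tcpdump -dd`):

```python
import ctypes
import socket
import struct

SO_ATTACH_FILTER = 26  # from <asm/socket.h>; not exported by Python's socket module

# Classic BPF, hand-assembled for an Ethernet-level socket:
# accept only IPv4 TCP frames, drop everything else.
# Each instruction is (opcode, jump-if-true, jump-if-false, constant).
FILTER = [
    (0x28, 0, 0, 12),      # ldh [12]       ; load EtherType
    (0x15, 0, 3, 0x0800),  # jeq #0x0800    ; IPv4? no -> drop
    (0x30, 0, 0, 23),      # ldb [23]       ; load IP protocol byte
    (0x15, 0, 1, 6),       # jeq #6         ; TCP? no -> drop
    (0x06, 0, 0, 0xFFFF),  # ret #0xffff    ; accept packet
    (0x06, 0, 0, 0),       # ret #0         ; drop packet
]

# Pack into the kernel's sock_filter layout (u16 code, u8 jt, u8 jf, u32 k).
prog = b"".join(struct.pack("HBBI", *insn) for insn in FILTER)

def attach(sock):
    """Attach the filter to a socket via SO_ATTACH_FILTER (Linux, root)."""
    buf = ctypes.create_string_buffer(prog)
    fprog = struct.pack("HL", len(FILTER), ctypes.addressof(buf))
    sock.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, fprog)
```

Because the kernel runs this before the packet ever reaches userspace, it's a lot cheaper than dropping the same junk in your application.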
Another thing just occurred to me: Amazon might just be guilty of recommending to others what worked for them without thinking twice about context, alternatives, etc. Long ago, Anonymous tried a massive DDOS attack on all kinds of sites from Mastercard to Amazon. Of them all, Amazon didn't take a scratch [1]. This was due to their then-new EC2 architecture for handling spikes and a ridiculous amount of spare capacity saved for holidays. Article has the details.
So, maybe it's what's worked for them, their thinking hasn't really changed, and now they're just offering others the same thing? And upselling them in the process? Thoughts?
Quick question: Would this include global load-balancing across different regions? We use Cloudflare for a few different services, but balancing across regions requires a complicated setup with yet another DNS/traffic layer in between. It would be great if this is something CF could offer (or maybe it already does at the Enterprise level)?
The simplest option I'm aware of is to point CloudFlare to your load balancer, and have it handle removal of failed backend nodes.
The only big downside is that on AWS you can't associate an Elastic IP with an Elastic Load Balancer, so to have a single IP to point CloudFlare at, you have to run your own HA haproxy/nginx/whatever cluster in EC2.
If you can live with a subdomain, you can point that CNAME at an ELB.
Alternatively, CloudFlare's API is pretty reasonable, so you could home-brew health checks that de-register dead nodes from CloudFlare. Even a simple nagios check handler could do that.
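A sketch of what the CloudFlare half of such a handler might look like, using the v4 API's DNS-record delete endpoint (the zone/record IDs and credentials here are placeholders; your health check would map a failed node to its record ID however you track that):

```python
import urllib.request

API = "https://api.cloudflare.com/client/v4"

def deregister_record(zone_id, record_id, email, api_key):
    """Build (not send) a DELETE for one DNS record via CloudFlare's v4 API.

    A Nagios event handler could call this for a backend node that
    failed its health check, removing it from the round-robin pool.
    All arguments are placeholders for your own zone/record/creds.
    """
    return urllib.request.Request(
        f"{API}/zones/{zone_id}/dns_records/{record_id}",
        method="DELETE",
        headers={"X-Auth-Email": email, "X-Auth-Key": api_key},
    )

# Usage: urllib.request.urlopen(deregister_record(zone, rec, email, key))
```

The symmetric PUT/POST to re-add the record when the node recovers is left as an exercise; the point is the API is simple enough that you don't need a vendor tool for this.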
CloudFlare has CNAME flattening, so you can still point the apex at a CNAME; CF will automatically keep the answer up to date using the target's TTLs, and serve a standards-compliant A record.
Do you know if CloudFlare's apex CNAME support works coupled to Route53's health-check-based RRDNS? I know that AWS's own DNS reflects the health-check-based changes to the round-robin pools instantaneously, but I have no idea what sort of TTLs they emit.
Haven't used Route53 recently, but those TTLs should be configurable by you; obviously the lower the better, to keep propagation time low for changes. CloudFlare has more info in the blog post [1]: they respect the TTLs given for records and cache the value until expiration, so I'm assuming they also pass along the TTL from the original lookup.
There's no way to ensure the rest of the internet handles it correctly, though, with all the proxies and DNS caches in the middle; and low TTLs can also add latency for end users, who might have to do a fresh DNS lookup on new connections.
If you're using CloudFlare's full service (instead of just DNS), then it'll be seamless because their IPs don't change.
I'm really surprised that AWS doesn't offer a DDoS mitigation service. They have the capacity in terms of compute and network (N/S and E/W). Why not offer a filtering product that uses BGP offramping? Not saying they need to run out and buy a bunch of Arbor gear, but I'd bet they could write their own rudimentary filter product given their resources.
Edit: Considering that Jeff from BlackLotus is now PM of DDoS at AWS, I'm sure they are working on something.
This seems to ignore the fact that they will null-route traffic at levels high enough to cause degradation for other customers on the same physical equipment. How is this even a solution in a world where a 20Gbps DDOS attack can be bought for a few hundred dollars? Real DDOS solutions are still big money only because AWS does not invest in solving this problem on the core network. It's becoming common practice to hire a 3rd party to direct the traffic off their network (Cloudflare, Akamai, etc.) and do the filtering there.
I'd bet that 'one client' getting the ~200gbps DDoS has a massive monthly bill (likely hundreds/thousands of servers) so they're happy to mitigate short term DDoS.
Most dedicated server providers won't go that far if you have a handful of servers.
I know from personal experience that DigitalOcean, the largest VPS provider, null-routes your VPS IP for a minimum of 3 hours for even the tiniest of DDoSes.
I doubt most of the smaller VPS providers can afford to absorb DDoS's even if they don't have overly restrictive policies like DO.
Best way to defend against an L7 DDoS is to have the origin hidden, and to cache everything at a large number of geographically distributed PoPs.
This helps in 99% of cases; where it doesn't, it's because there's a resource that cannot be cached and that the edge must revisit the origin for. That's especially painful when the resource is expensive for the origin to produce (database lookups that can't be cached: shopping carts, login pages, search results); those are the ones that require you to rethink your application design.
If you're an application developer wondering how to design your application to withstand a DDoS attack, shift your thinking to: how can I make everything this application does cacheable by an edge server?
Even when you're not under attack, using CloudFlare makes sense and saves you money anyway. At least... it does for me. On one of my web applications I use Amazon S3 for user attachment storage within a forum CMS, and my bill used to be upwards of $200 per month for just one of the sites I run. I changed the application to proxy the S3 request/response, set a CloudFlare Page Rule in front of that path, and configured it to "Cache Everything". The effect was to cut my AWS S3 bill for that site to $20 per month. After that I did it for every site.
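Roughly what the proxy route looks like, sketched in Python (the bucket name is a placeholder, and a real app would forward the object's actual Content-Type rather than hardcode one):

```python
import urllib.request

S3_BASE = "https://my-forum-attachments.s3.amazonaws.com"  # placeholder bucket

def edge_cache_headers(max_age=31536000):
    """Headers that tell CloudFlare's edge it may cache the response."""
    return {"Cache-Control": f"public, max-age={max_age}"}

def proxy_attachment(path):
    """Fetch an object from S3 and hand it back under our own domain.

    The path this serves sits under a "Cache Everything" Page Rule, so
    after the first fetch the edge serves the copy and S3 never sees
    the request again -- that's where the bill savings come from.
    """
    with urllib.request.urlopen(f"{S3_BASE}/{path}") as resp:
        return resp.read(), edge_cache_headers()
```

The long max-age works because attachments are immutable; for anything that can change, you'd pick a TTL you can live with or purge via the CloudFlare API on update.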
There's a hell of a lot of benefit to using CloudFlare in conjunction with AWS, and not just when you're facing an L7 DDoS.
Disclosure: I work for CloudFlare (last 9 months) and have been a CloudFlare customer for 3 years and I was offered a job by AWS and also been an AWS customer for 3 years.
Any company using the massive amount of bandwidth you are thinking of is probably not getting all the features required for their business on the $20/month plan. Hell, you can't even obtain access logs without the enterprise pricing. Your limit won't come from CloudFlare restricting you; a lack of basic necessities in the product will have you crawling to pay whatever enterprise amount they want.
Rule #1 to surviving a DDoS: keep your DNS TTLs low on those A records. Not super low, but low; 10 minutes (600 seconds) is enough. There's no point in a TTL larger than that if you're on AWS and need to start mitigating an attack. Even if you sign up with a provider like Cloudflare or F5 Silverline, you'll need to send them the traffic. If you're on AWS, you can't have them announce your routes, so the only way around it is to change DNS. If your DNS TTL is a day, no one who has the lookup cached is going to reach your new "clean pipe".
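If you're on Route53, pinning the TTL is a single UPSERT. A sketch of the ChangeBatch structure that boto3's change_resource_record_sets takes (the record name and IP are placeholders):

```python
def ttl_change_batch(name, ip, ttl=600):
    """Build a Route53 ChangeBatch pinning an A record's TTL at 10 minutes.

    Pass the result as ChangeBatch= to boto3's
    change_resource_record_sets. Name and IP are placeholders; a low
    TTL means a later cutover to a scrubbing provider propagates fast.
    """
    return {
        "Comment": "keep TTL low so a DDoS cutover propagates quickly",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "A",
                "TTL": ttl,
                "ResourceRecords": [{"Value": ip}],
            },
        }],
    }

# Usage: ttl_change_batch("www.example.com.", "198.51.100.7")
```

Do this before you're attacked; lowering a day-long TTL mid-attack still leaves you waiting out the old TTL on every cache that already has the record.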
I can't imagine trying to survive a volumetric attack in AWS. Must be a nightmare. Luckily, volumetric attacks are on the way out and layer 7 attacks are all the rage these days. Those are easier to handle in AWS with a WAF or filter.
tl;dr keep the TTLs on your DNS A records to a maximum of 10 minutes.
I think this could be better summed up as a guide to maintaining uninterrupted service during a DDoS. Which is only something you do in the first place if you run the numbers and find that being down would cost you more in lost revenue than staying up would cost you in temporary overprovisioning. It's another of those "a solution to a problem that only really exists at Amazon/Google/Facebook scale" whitepapers, except in this case the relevant "scale" is economic, not technical.
One thing missing from the whitepaper is operational guidance on how to make this all work smoothly. A big caveat to the "all AWS" model (which, as most have pointed out, isn't the only or necessarily the best option) is that scaling isn't instant.
You need to add some additional logic to smooth out the rate of scaling. Most deployments fall down when the rate of scaling can't keep up with the demand.
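One way to express that smoothing: cap how fast the target capacity can grow per scaling interval, so a burst of demand translates into a steady ramp instead of a thundering herd of launches. A toy sketch (the knobs are illustrative, not AWS defaults):

```python
import math

def smoothed_target(current, demand, max_step_pct=0.25, headroom=1.2):
    """Pick the next instance count for one scaling interval.

    Chase demand with some spare headroom, but never grow by more than
    max_step_pct per interval (always allowing at least +1), so
    provisioning and load-balancer registration can keep up. Scale-down
    is deliberately omitted -- that should be a separate, slower policy.
    All knobs are made-up examples, not AWS defaults.
    """
    desired = math.ceil(demand * headroom)           # where we'd like to be
    ceiling = int(current * (1 + max_step_pct)) + 1  # how far we'll go this step
    return max(current, min(desired, ceiling))
```

Run each interval: a fleet of 10 facing demand for 100 steps to 13, then 17, and so on, rather than trying (and failing) to launch 110 instances at once.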
In general, the whitepaper provides some solid AWS-specific and AWS-centric guidance on how to buy yourself some time. It's not the be-all, end-all, but it's a good start.
AWS really needs to step up and provide DDOS mitigation if they want to be a challenger to CloudFlare and other DDOS-protected VPS services. While some companies can certainly just scale up to 1,000 instances to "out-compute" a DDOS, that isn't feasible for the majority of AWS users. That said, it's unlikely they'll provide such a service, since doing so would mean selling fewer AWS resources and hence making less money.
This would be more convincing if it at least mentioned CloudFlare, if only to tell some lie about how AWS's similarly-named service is just as good.
Even if AWS CloudFront isn't as good, I wouldn't expect AWS to go around actively encouraging people not to use CloudFront and to go to a competitor instead.