AWS Best Practices for DDoS Resiliency [pdf] (awsstatic.com)
92 points by chrisacky on July 4, 2015 | 42 comments



I share falcolas's take on it. On the other end, CloudFlare says "Pay us $200-5,000 (avg) a month, we'll handle the details, and don't worry about a data bill." AWS's method sounds like a step backwards in cloud DDoS protection. Or a step forward in their next annual report. Whichever. ;)

I'm still a believer in the value of dial-up, leased lines, satellite, or radio for aiding security. You still have to apply protection to them, but you don't have the whole Internet coming after you with protocols that aid attackers more than defenders. My method is typically to obfuscate identifiers for Internet services and use methods like authentication at the packet level (e.g. port-knocking or VPN). The configuration details are sent over the non-Internet medium. Even dial-up can move basic credentials and some IP addresses quickly. You don't need to do it often, either. If you hide it (e.g. SILENTKNOCK), attackers start getting pretty pissed and desperate wondering why not a single packet gets through.
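
For the curious, the client side of a plain knock sequence is just a few timed connection attempts before the real connect. A minimal sketch; the host, the port sequence, and the service port are made-up placeholders, and something like SILENTKNOCK hides the sequence far better than this:

    import socket
    import time

    HOST = "203.0.113.5"                  # placeholder target
    KNOCK_SEQUENCE = [7000, 8000, 9000]   # placeholder secret sequence

    for port in KNOCK_SEQUENCE:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(0.5)
        try:
            s.connect((HOST, port))       # ports are closed; the SYN itself is the signal
        except OSError:
            pass
        finally:
            s.close()
        time.sleep(0.2)

    # After the right sequence, the knock daemon opens the real service port briefly.
    conn = socket.create_connection((HOST, 22), timeout=5)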

This method is primarily for intranet sites, though. Web sites or apps facing the public are naturally at high risk. Best to just use CloudFlare or a similar service, along with hiring good security folks.


This whole thing reads like a "please buy everything we make" guide, not a guide to DDOS resiliency.

Trying to outscale a large DDoS doesn't often work. Don't worry though, Amazon's happy to let you pay to try!


Having been on the wrong side of this equation, I can say this is exactly what they expect you to do and it's a fool's errand. I've a client on AWS faced with routine DDoS-scale spikes of traffic at certain points in the year.

The client's app almost always goes down in the midst of the fury. The point of failure typically comes down to Amazon's load balancers or an auto-scaling failure. In the end, our friends at Amazon tell us to add more, bigger servers to 'outscale' the traffic, and put the blame on us when everything blows up. sigh


Yeah, the "be ready to scale to absorb the impact" seems to be in their best interest. Maybe not their network ops teams, but their billing teams must like it.


So, basically, pull out the pocketbook and we'll hook up our vacuum to it.

Attempting to outscale a DDoS (the primary mitigation method presented by Amazon) is going to DDoS your bank account. Personally, I'd rather see more recommendations along the lines of "VPC can minimize potential attack surfaces".


I'm dubious about the number of people who actually read this: trying to outspend a bunch of distributed attackers is 1) not a novel, AWS-specific solution, and 2) not an effective or cheap strategy.


Sounds like there could be awesome features here.

Remotely triggered black holes for VPC? Elastic Firewall?

Not crazy about firewalls in general, but they would help in the case that you are paying for data-out.


Firewalls are useless in a DDOS attack.


If you use BPF, you should be able to filter out the bad traffic effectively. Although if you're getting flooded with an absurd amount of traffic (more than the network card can take in), there's nothing you can do.
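
For what it's worth, the same classic BPF bytecode you'd hand to iptables' xt_bpf (or a scrubbing box) can be prototyped from userspace by attaching it to a packet socket on Linux. A rough sketch, assuming root and a filter generated with tcpdump -dd; note this only filters what this one socket sees, it doesn't drop traffic for the host:

    import ctypes
    import socket
    import struct

    # Bytecode from `tcpdump -dd 'ip'` (accept IPv4, drop everything else) as a
    # stand-in; generate your real filter expression the same way.
    # Each tuple is one classic BPF instruction: (code, jt, jf, k).
    FILTER = [
        (0x28, 0, 0, 0x0000000c),   # ldh [12]      ; load EtherType
        (0x15, 0, 1, 0x00000800),   # jeq #0x800    ; IPv4?
        (0x06, 0, 0, 0x00040000),   # ret #0x40000  ; accept
        (0x06, 0, 0, 0x00000000),   # ret #0        ; drop
    ]

    insns = b"".join(struct.pack("HBBI", *i) for i in FILTER)
    buf = ctypes.create_string_buffer(insns)
    # struct sock_fprog { unsigned short len; struct sock_filter *filter; }
    fprog = struct.pack("HL", len(FILTER), ctypes.addressof(buf))

    SO_ATTACH_FILTER = 26           # Linux-specific constant
    ETH_P_ALL = 0x0003
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    sock.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, fprog)

    while True:
        pkt = sock.recv(65535)      # only packets the filter accepted arrive here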


Another thing just occurred to me: Amazon might just be guilty of recommending to others what worked for them, without thinking twice about context, alternatives, etc. Long ago, Anonymous tried a massive DDoS attack on all kinds of sites from Mastercard to Amazon. Of them all, Amazon didn't take a scratch [1]. This was due to their then-new EC2 architecture for handling spikes and a ridiculous amount of spare capacity saved for the holidays. The article has the details.

So, maybe it's what's worked for them, their thinking hasn't really changed, and now they're just offering others the same thing? And upselling them in the process? Thoughts?

[1] http://money.cnn.com/2010/12/09/technology/amazon_wikileaks_...


Well, the guide seems to be explicitly for AWS, so that would make sense.


Kind of agree with all those who say that Cloudflare is still a better option. But how do you tackle their lack of automatic failover?

https://support.cloudflare.com/hc/en-us/articles/200168916-C...

"the system currently does not have the functionality to automatically select the next available server if one of the servers in the group goes down"


We're fixing that.


Quick question: Would this include global load-balancing across different regions? We use CloudFlare for a few different services, but balancing across regions requires a complicated setup with yet another DNS/traffic layer in between. It would be great if this were something CF could offer (or maybe it already does at the Enterprise level)?


The simplest option I'm aware of is to point CloudFlare to your load balancer, and have it handle removal of failed backend nodes.

The only big downside is that on AWS you can't have an Elastic IP associated with an Elastic Load Balancer, so you have to run your own HA haproxy/nginx/whatever cluster in EC2 in order to have a single IP to point CloudFlare to.

If you can live with a subdomain, you can point that CNAME at an ELB instead.

Alternatively, CloudFlare's API is pretty reasonable, so you could home-brew health checks that de-register dead nodes from CloudFlare. Even a simple nagios check handler could do that.
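
A sketch of what that could look like against CloudFlare's v4 DNS records API; the zone ID, record name, and credentials are placeholders:

    import requests

    API = "https://api.cloudflare.com/client/v4"
    HEADERS = {"X-Auth-Email": "ops@example.com", "X-Auth-Key": "REDACTED"}
    ZONE_ID = "hypothetical-zone-id"

    def deregister(dead_ip, name="www.example.com"):
        """Remove the A record pointing at a backend that failed its health check."""
        records = requests.get(
            f"{API}/zones/{ZONE_ID}/dns_records",
            headers=HEADERS,
            params={"type": "A", "name": name},
        ).json()["result"]
        for rec in records:
            if rec["content"] == dead_ip:
                requests.delete(f"{API}/zones/{ZONE_ID}/dns_records/{rec['id']}",
                                headers=HEADERS)

    # e.g. called from a Nagios event handler when a backend goes CRITICAL:
    # deregister("198.51.100.7")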


CloudFlare has CNAME flattening, so you can still have the apex point to a CNAME; CF will automatically keep the correct IP up to date using the TTLs and serve an A record that's correct per the RFC standards.

https://support.cloudflare.com/hc/en-us/articles/200169056-C...


Do you know if CloudFlare's apex CNAME support works coupled to Route53's health-check-based RRDNS? I know that AWS's own DNS reflects the health-check-based changes to the round-robin pools instantaneously, but I have no idea what sort of TTLs they emit.


Haven't used Route53 recently, but those TTLs should be configurable by you. Obviously the lower the better, to keep propagation time for changes low. CloudFlare has more info in the blog post [1]: they do respect the TTLs given for the records and cache the value until expiration, so I'm assuming they also send along the same TTL value as the original lookup.

There's no way to ensure the rest of the internet will handle it correctly, though, with all the proxies and DNS caches in the middle, and low TTLs can also add latency for end users, who might have to do a DNS lookup on every new connection.

If you're using CloudFlare's full service (instead of just DNS), then it'll be seamless because their IPs don't change.

[1] https://blog.cloudflare.com/introducing-cname-flattening-rfc...
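
For reference, the Route53 side of the question (a health check tied to a failover record, with a short TTL) looks roughly like this in boto3; the zone ID, names, and addresses are placeholders:

    import boto3

    r53 = boto3.client("route53")

    # Hypothetical health check against the primary origin.
    hc_id = r53.create_health_check(
        CallerReference="primary-web-check-1",
        HealthCheckConfig={
            "IPAddress": "203.0.113.10",
            "Port": 443,
            "Type": "HTTPS",
            "ResourcePath": "/healthz",
            "RequestInterval": 10,
            "FailureThreshold": 3,
        },
    )["HealthCheck"]["Id"]

    # Primary failover record with a short TTL; a matching SECONDARY record
    # (not shown) takes over when the health check fails.
    r53.change_resource_record_sets(
        HostedZoneId="Z123EXAMPLE",
        ChangeBatch={"Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "origin.example.com.",
                "Type": "A",
                "SetIdentifier": "primary",
                "Failover": "PRIMARY",
                "TTL": 60,
                "HealthCheckId": hc_id,
                "ResourceRecords": [{"Value": "203.0.113.10"}],
            },
        }]},
    )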


Use runbook.io and load balancers.....done.


I'm really surprised that AWS doesn't offer a DDoS mitigation service. They have the capacity in terms of compute and network (N/S and E/W). Why not offer a filtering product that uses BGP offramping? Not saying they need to run out and buy a bunch of Arbor gear, but I'd bet they could write their own rudimentary filtering product given their resources.

Edit: Considering that Jeff from BlackLotus is now PM of DDoS at AWS, I'm sure they are working on something.


This seems to ignore the fact that they will null route traffic at levels high enough to cause degradation for other customers on the same physical equipment. How is this even a solution in a world where a 20Gbps DDoS attack can be bought for a few hundred dollars? Real DDoS solutions are still big money only because AWS does not invest in solving this problem on the core network. It's becoming common practice to hire a 3rd party to direct the traffic off their network (CloudFlare, Akamai, etc.) and do the filtering there.



AWS's competitors, like OVH and many quality VPS providers, offer _free_ (or very cheap) DDoS protection.


If the attack is over 10Gbps good luck with that... This stuff is best effort.


I don't think so; two years ago they mitigated 180Gbps for one client.

https://twitter.com/olesovhcom/status/386563685805617152/pho...

https://www.ovh.com/ca/en/anti-ddos/ddos-attack-management.x...

Also, everyone here talks about L7, CloudFlare, etc., but a lot of applications are pure TCP/UDP based, so you can't cache anything.


I'd bet that the 'one client' getting the ~200Gbps DDoS has a massive monthly bill (likely hundreds or thousands of servers), so they're happy to mitigate a short-term DDoS.

Most dedicated server providers won't go that far if you have a handful of servers.


Which VPS providers are you referring to?

I know from personal experience that Digital Ocean, the largest VPS provider, null routes your VPS IP for a minimum of 3 hours for even the tiniest of DDoSes.

I doubt most of the smaller VPS providers can afford to absorb DDoS's even if they don't have overly restrictive policies like DO.


I was thinking about these providers:

buyvm.net for $3 per month (100Gbit apparently)

iwstack.com, 8Gbit protection for free

ramnode.com, 20Gbit


Alternative guide: use CloudFlare and hide the origin address.

Most of AWS's advice (like autoscaling) will help only a bit, but can cost a lot (lots of EC2 machines serving bogus requests).


The best way to defend against an L7 DDoS is to keep the origin hidden and to cache everything at a large number of geographically distributed PoPs.

This helps in 99% of cases, and where it doesn't, it is simply because there is a resource that cannot be cached and that the edge must revisit the origin for. This is especially true whenever that resource is expensive for the origin to provide (involves database lookups and cannot be cached: shopping carts, login pages, search results); these are the ones that require you to rethink your application design.

If you're an application developer wondering how to design your application to withstand a DDoS attack, shift instead to thinking: how can I make everything this application does cacheable by an edge server?

When you're not under attack, using CloudFlare makes sense and saves you money anyway. At least... it does for me. On one of my web applications I use Amazon S3 for user attachment storage within a forum CMS, and my bill used to be upwards of $200 per month for just one of the sites I run. I changed the application so that it proxies the S3 request/response, then set a CloudFlare Page Rule to sit in front of that path and configured it to "Cache Everything". The effect was to reduce my AWS S3 bill to $20 per month. After that I did it for every site.
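
Roughly, the proxy route looks like this; a minimal Flask/boto3 sketch with a placeholder bucket and path, not the real app. The Page Rule then matches /attachments/* with "Cache Everything":

    import boto3
    from flask import Flask, Response

    app = Flask(__name__)
    s3 = boto3.client("s3")
    BUCKET = "forum-attachments"   # placeholder bucket name

    @app.route("/attachments/<path:key>")
    def attachment(key):
        # Fetch the object from S3 and re-serve it with long cache headers, so the
        # edge keeps a copy and only the first request per PoP ever hits S3.
        obj = s3.get_object(Bucket=BUCKET, Key=key)
        return Response(
            obj["Body"].read(),
            content_type=obj.get("ContentType", "application/octet-stream"),
            headers={"Cache-Control": "public, max-age=31536000"},
        )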

There's a hell of a lot of benefit to using CloudFlare in conjunction with AWS, and not just when you're facing an L7 DDoS.

Disclosure: I've worked for CloudFlare for the last 9 months and have been a CloudFlare customer for 3 years; I was also offered a job by AWS and have been an AWS customer for 3 years.


What's the limit at which CloudFlare will start billing you at an "enterprise rate" instead of $20/month? That bandwidth can't be free forever...


Any company using the massive amount of bandwidth you are thinking of is probably not getting all the features required for their business on the $20/month plan. Hell, you can't even get access logs without enterprise pricing. Your limit won't come from CloudFlare restricting you; a lack of basic necessities in the product will have you crawling to pay whatever enterprise amount they want.


Rule #1 for surviving a DDoS: keep your DNS TTLs low on those A records. Not super low, but low. 10 minutes (600 seconds) is enough. There's no point in a TTL larger than that if you're on AWS and need to start mitigating an attack. Even if you sign up with a provider like Cloudflare or F5 Silverline, you'll need to send them the traffic. If you're on AWS, you can't have them announce your routes, so the only way around it is to change DNS. If your DNS TTL is a day, no one who has the lookup cached is going to reach your new "clean pipe".

I can't imagine trying to survive a volumetric attack in AWS. Must be a nightmare. Luckily, volumetric attacks are on the way out and layer 7 attacks are all the rage these days. They're easier to handle in AWS with a WAF or filter.

tl;dr: keep the TTLs on your DNS A records at a maximum of 10 minutes.
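
With Route53 that's a single API call to set (and later repoint) the record; a minimal boto3 sketch where the zone ID, record name, and the mitigation provider's address are placeholders:

    import boto3

    r53 = boto3.client("route53")

    # UPSERT the A record with a 600-second TTL; when an attack starts, re-run it
    # with the mitigation provider's "clean pipe" address as the new value.
    r53.change_resource_record_sets(
        HostedZoneId="Z123EXAMPLE",
        ChangeBatch={"Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "www.example.com.",
                "Type": "A",
                "TTL": 600,
                "ResourceRecords": [{"Value": "203.0.113.10"}],
            },
        }]},
    )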


Just offer DDoS mitigation using specialized hardware and don't force people to buy N+1 instances.


The AWS best practice for DDoS, TLDR: Use anything except AWS unless you like going bankrupt in a day.


I think this could be better summed up as a guide to maintaining uninterrupted service during a DDoS. Which is only something you do in the first place if you run the numbers and find that being down would cost you more in lost revenue than staying up would cost you in temporary overprovisioning. It's another of those "a solution to a problem that only really exists at Amazon/Google/Facebook scale" whitepapers, except in this case the relevant "scale" is economic, not technical.


> Be Ready to Scale and Absorb the Attack

What if you can't absorb the cost that comes with scaling?


One thing missing from the guide is operational guidance on how to make this all work smoothly. A big caveat of an "all AWS" model (which, as most have pointed out, isn't the only or necessarily the best option) is that scaling isn't instant.

You need to add some additional logic to smooth out the rate of scaling. Most deployments fall down when the rate of scaling can't keep up with the demand.
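
One way to front-load capacity instead of trickling it in is a step scaling policy that adds bigger chunks the further the alarm metric is past its threshold; a boto3 sketch with made-up names and thresholds (the CloudWatch alarm that fires it is configured separately):

    import boto3

    asg = boto3.client("autoscaling")

    # Hypothetical ASG: add capacity in progressively larger steps rather than
    # one instance at a time, so scaling has a chance to outrun the ramp-up.
    asg.put_scaling_policy(
        AutoScalingGroupName="web-asg",
        PolicyName="absorb-spike",
        PolicyType="StepScaling",
        AdjustmentType="ChangeInCapacity",
        EstimatedInstanceWarmup=120,
        StepAdjustments=[
            {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 20, "ScalingAdjustment": 2},
            {"MetricIntervalLowerBound": 20, "MetricIntervalUpperBound": 50, "ScalingAdjustment": 6},
            {"MetricIntervalLowerBound": 50, "ScalingAdjustment": 15},
        ],
    )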

In general, the white paper provides some solid AWS-specific and AWS-centric guidance on how to buy yourself some time. It's not the be-all and end-all, but it's a good start.


AWS really needs to step up and provide DDoS mitigation if they want to be a challenger to CloudFlare and other DDoS-protected VPS services. While some companies can certainly just scale up to 1,000 instances to "out-compute" a DDoS, that is not feasible for the majority of AWS users. It's unlikely they'll provide such a service, though, since doing so would mean fewer AWS resources consumed and hence less money for them.


> Scaling buys you time to analyze the DDoS attack and respond with countermeasures.

They're not saying "scale up and just pay for it"; they're saying use autoscaling as a tool to give you time to respond, without first going down.


This would be more convincing if it at least mentioned CloudFlare for the purpose of telling some lie about how AWS's similarly-named service is just as good.


Even if AWS CloudFront isn't as good, I wouldn't expect AWS to go around actively encouraging people not to use CloudFront and to go to a competitor instead.



