The DDoS that almost broke the Internet (cloudflare.com)
830 points by jgrahamc on March 27, 2013 | 171 comments



In one of the earlier attacks I discovered I was running an open resolver on my home network. I fixed it, and now just get a bunch of 'recursive lookup denied' messages in my logs.

But the key here is the source. And this from the article:

"The attackers were able to generate more than 300Gbps of traffic likely with a network of their own that only had access 1/100th of that amount of traffic themselves."

And this is key, so we could hunt them at their source if there was a way of deducing their launch points (it may be a botnet but it may also just be some random server farm)

I've got log records of the form:

   Mar 27 09:19:21 www named[295]: denied recursion for query from [212.199.180.105].61604 for incap-dns-server.anycast-any2.incapsula.us IN
Which suggests that 212.199.180.105 is somehow being used, and according to my latest GeoIP database that is an IP address in Tel Aviv.

   $VAR1 = {
          'longitude' => '34.7667',
          'city' => 'Tel Aviv',
          'latitude' => '32.0667',
          'country_code' => 'IL',
          'region' => '05',
          'isp_org' => 'Golden Lines Cable'
        };
So can we create a service where resolvers that see these recursion requests report the requesting IP to a central service, which then maps out the botnet/privatenet? Every time this level of co-ordination is undertaken it potentially shines a bright light on the part of the Internet that is compromised/bad.
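
As a rough illustration of the first step of such a service, the sketch below pulls the claimed source address out of BIND "denied recursion" lines like the one quoted above and counts how often each address appears. The regular expression and the function name are assumptions made for this example, and the addresses it collects are only what the packets claimed as their source.

```python
import re
from collections import Counter

# Matches lines of the form quoted above:
#   ... denied recursion for query from [IP].PORT for QNAME IN
DENIED = re.compile(
    r"denied recursion for query from \[(?P<src>[^\]]+)\]\.(?P<port>\d+) "
    r"for (?P<qname>\S+)"
)


def count_claimed_sources(lines):
    """Count how often each claimed source IP appears in denied-recursion lines."""
    counts = Counter()
    for line in lines:
        match = DENIED.search(line)
        if match:
            counts[match.group("src")] += 1
    return counts


sample = [
    "Mar 27 09:19:21 www named[295]: denied recursion for query from "
    "[212.199.180.105].61604 for incap-dns-server.anycast-any2.incapsula.us IN",
]
print(count_claimed_sources(sample).most_common(5))
```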


DNS requests are UDP, so you can just spoof the source address (actually that's what makes the attack work; you spoof the target as the source so replies 10x bigger than the query go to the target). Nobody can really know where they came from except at border routers of the source.
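
The arithmetic behind that amplification can be sketched in a few lines. The 300 Gbps figure and the "1/100th" ratio come from the article quote above and the 10x comes from this comment; the packet sizes in the last line are made-up values for illustration only.

```python
def amplification_factor(query_bytes, response_bytes):
    """Bytes delivered to the target per byte the attacker sends."""
    return response_bytes / query_bytes


def attacker_bandwidth_gbps(target_gbps, factor):
    """Spoofed query traffic needed to land target_gbps on the victim."""
    return target_gbps / factor


# Compare the 10x mentioned here with the 100x implied by the article quote.
for factor in (10, 100):
    print(factor, attacker_bandwidth_gbps(300, factor))

# Hypothetical packet sizes, only to show how a factor is derived.
print(amplification_factor(query_bytes=64, response_bytes=640))
```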


Does this mean that proper egress filtering would solve this? Because that seems like a more feasible solution. Having a bunch of large ISPs and datacenter operators block outgoing packets with sources outside their own IP ranges sounds much easier than having many more home users reconfigure their routers.


Yes, egress filtering is the only solution.

I personally think that the focus on open recursive resolvers is misplaced. All authoritative nameservers have to be "open" to queries for the domains they serve. So instead of amplifying by doing:

   $ host -t any random-site.com. ns1.example.org.
you can just as well do:

   $ host -t any example.org. ns1.example.org.
Even if all nameservers supported good per-IP throttling (far from the case), there are still enough valid nameservers on the internet to stage a decent amplification I think. So once all of the open resolvers are shut down, the DDoS pricks will just target more important infrastructure to accomplish the same goal.

It might be that we'll have to switch all authoritative DNS requests to be TCP-only but I can't imagine what a pain that transition will be.

The worse news is that egress filtering has been something we've clearly needed for 15+ years and it doesn't seem we've gotten very far. Part of the problem is that these amplification attacks usually don't cause much pain to the real source of the attack. Plus since it's very hard to tell where the true source is, they don't even get publicly shamed for it. It's so much easier to point a finger at the middle man in this case.

For egress filtering to be effective protection, it needs to cover nearly all of the network. As long as a botnet can get its hands on a decent amount of unfiltered bandwidth to amplify, it's game on.

I'm not optimistic.


> For egress filtering to be effective protection, it needs to cover nearly all of the network. As long as a botnet can get their hands on a decent amount of unfiltered bandwidth to amplify it's game on.

The question is can it be done without the ISP's co-operation. For example, I've got transit service from XO (a Tier-2 provider) and they could conceivably port-mirror the inbound side from my connection, do a source IP check on packets, (even statistical would be fine) and then raise a flag if anything came out smelly.

Since we're comparing a 32 bit number, and we can construct a logic gate which defines 'legal' values for that number, even a modest FPGA implementation could just sink (equivalent to /dev/null) all traffic and generate a pulse when the match failed. It's been a while since I was in a company building DSLAMs and edge gear, but up to a 10Gbit pipe that isn't a killer problem.
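
In software, the same 32-bit comparison might look like the sketch below: each allowed prefix is reduced to a (base, mask) pair, and a source address is legal if it matches any pair. The prefixes are documentation ranges standing in for a customer's real allocation, and the function names are invented for the example.

```python
import ipaddress


def compile_prefixes(cidrs):
    """Turn CIDR strings into (base, mask) integer pairs for fast matching."""
    compiled = []
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr)
        compiled.append((int(net.network_address), int(net.netmask)))
    return compiled


def is_legal_source(src, compiled):
    """True if the 32-bit source address falls inside any allowed prefix."""
    value = int(ipaddress.ip_address(src))
    return any((value & mask) == base for base, mask in compiled)


# Hypothetical customer allocation (documentation prefixes).
allowed = compile_prefixes(["198.51.100.0/24", "203.0.113.0/25"])

for src in ("198.51.100.7", "203.0.113.200", "212.199.180.105"):
    flag = "ok" if is_legal_source(src, allowed) else "RAISE FLAG"
    print(src, flag)
```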


Near the edge, it's just a matter of using the same silicon you have for routing and asking the question "would I have routed a packet to this claimed source IP down the pipe I'm receiving it from?" In Cisco-speak this is "ip verify unicast source reachable-via rx".
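
That question can be modelled as a longest-prefix lookup on the claimed source address followed by a comparison with the arrival interface. This is a toy illustration of the idea, not how a router implements it; the routes and interface names are hypothetical.

```python
import ipaddress


class RoutingTable:
    """Tiny longest-prefix-match table mapping prefixes to interfaces."""

    def __init__(self):
        self.routes = []  # list of (network, interface)

    def add(self, cidr, interface):
        self.routes.append((ipaddress.ip_network(cidr), interface))

    def lookup(self, address):
        """Return the interface of the most specific matching route, or None."""
        addr = ipaddress.ip_address(address)
        matches = [(net, ifc) for net, ifc in self.routes if addr in net]
        if not matches:
            return None
        return max(matches, key=lambda m: m[0].prefixlen)[1]


def passes_strict_rpf(table, src, arrival_interface):
    """Accept only if we would route traffic *to* src out the arrival interface."""
    return table.lookup(src) == arrival_interface


table = RoutingTable()
table.add("0.0.0.0/0", "uplink")         # default route
table.add("198.51.100.0/24", "cust0")    # hypothetical customer prefix

print(passes_strict_rpf(table, "198.51.100.9", "cust0"))     # genuine customer source
print(passes_strict_rpf(table, "212.199.180.105", "cust0"))  # claimed source routes via uplink
```

The asymmetric-routing problem further from the edge shows up here directly: a legitimate packet arriving on an interface other than the one the table would use for the return path fails the check.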

Once you move further towards the core, the problem becomes exponentially more complex. Asymmetric routes are common, so it's not weird at all to get a packet handed to you from a different ISP than you would reply to it. Any filtering there would break a large percentage of valid traffic.

The problem is that there are a hell of a lot of "edges", you need nearly all of them to be fixed, and they don't have the motivation to do much about the problem.


If anycast is a valid use of IP, why isn't source spoofing at the edge? Spoofing can be used for load balancing too.


It's not a solution at all because it will never happen. You'd need every network on the internet to spend money and time to fix a problem that nobody is holding them responsible for and they have no monetary incentive to fix.

The solutions are clear. They happen to be doing the laziest of them which is close the open resolvers.


Open resolvers are like open WiFi, and open TOR nodes. They can all be misused, but I would still like to see them around in the future. A strict hierarchy can be abused too, and it's not at all clear which is the lesser evil. I think you can solve a temporary problem created by an abuse of trust with another temporary use of trust... Ideally one could, if they were the target of a flood, control the flow towards them (egress filters) up the stream (the farther the better, which itself assumes a trust relationship) before it has a chance to concentrate.


Egress filtering is not a solution.

I've had attacks come from Asia and Russia and none of the hosts respond. I've also tried contacting their upstream providers.

The DNS system needs to be changed.


> The DNS system needs to be changed.

It needs to change, but I highly doubt it will for at least a few decades. Too many things rely on it, and it would require way too many resources and a miserably long time to change everything to the new thing.


Egress filtering would happen within a few days if someone DDoSed every open resolver with other open resolvers. In fact, saturation of the outbound links would effectively be egress filtering.


Haha this would be kind of awesome. Obviously this would be bad news for the open resolvers' home networks, but could their upstreams handle it (i.e. maybe they would have to drop resolver traffic but not anything else)? If so, maybe Cloudflare should prepare such a response for the next time someone pulls this.


Yes, they do have to be open. Properly configured authoritative only name servers will allow full zone transfers (AXFR) only to their slaves.

The amplification part of the attack comes from the fact that they ask for zone transfers, and zone transfer replies are much larger than the requests.

If they were just doing normal DNS lookups for records you wouldn't have an amplification factor.


Home users reconfiguring routers would not really resolve the issue, but yes, egress filtering would be effective in theory. Unfortunately that's a maintenance nightmare and costly and there's no incentive to do it.

Besides closing down open resolvers, the real solution to this particular DDoS lies in the DNS protocol. The attack's practicality comes from RFC 2671. If that were amended to return the maximum size of a UDP packet to 512 bytes (while allowing TCP packets to be any size), there would be no benefit at all to this attack. For existing resolvers this would require either DNS services being upgraded or firewall rules being put in place to limit UDP packets on port 53 to 512 bytes.
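
A sketch of what that 512-byte cap means on the wire: if a UDP response would exceed the limit, the server sends back only the header and question with the TC (truncated) bit set, and a well-behaved client retries over TCP. The helper names are invented for this example, and it assumes a single, uncompressed question.

```python
import struct

TC_BIT = 0x0200   # "truncated" flag in the DNS header
UDP_LIMIT = 512   # classic pre-EDNS0 UDP payload limit


def question_end(message):
    """Offset just past the first question (name, QTYPE, QCLASS)."""
    pos = 12  # the fixed header is 12 bytes
    while message[pos] != 0:
        pos += 1 + message[pos]  # skip a length-prefixed label
    return pos + 1 + 4           # root label + QTYPE + QCLASS


def fit_udp(response, limit=UDP_LIMIT):
    """Return the response unchanged if it fits, else a truncated stub."""
    if len(response) <= limit:
        return response
    ident, flags = struct.unpack("!HH", response[:4])
    # Keep the question, drop all records, and set TC so the client uses TCP.
    header = struct.pack("!HHHHHH", ident, flags | TC_BIT, 1, 0, 0, 0)
    return header + response[12:question_end(response)]


# Fake oversized response: header + question for example.org ANY + padding.
question = b"\x07example\x03org\x00" + struct.pack("!HH", 255, 1)
big = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 50, 0, 0) + question + b"\x00" * 3000
stub = fit_udp(big)
print(len(big), len(stub))
```

With this rule in place the reply to a spoofed query is about the same size as the query itself, which is the property the comment is after.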

Shutting down open resolvers is about as much work, but unfortunately still leaves the attack possible because it's inherent to the protocol.


If this was part of a DNS Amplification DDoS, the source address is spoofed, and is in fact the target of the DDoS attack.


Fair point, so to make this successful you would have to do UDP packet inspection on egress from your network to localize to the connection into your network where someone is spoofing source IPs. Clearly you can't do that on peering points (who's to say it is or isn't legit) but certainly from the ISP's connections into the Tier2 that could highlight bad ISPs and if those ISPs don't play ball you can cut them off from network access.


Cloudflare always does an excellent job of optimizing their writeups for large, diverse audiences. The prose of this article reminds me of an equally accessible discussion of BGP from a few months ago [1].

[1] http://blog.cloudflare.com/why-google-went-offline-today-and...


Whether intentional or not, this is some of the best PR marketing copy I've ever seen. Large, diverse audiences who are afraid of their blog/app/site getting DDOSed can understand enough to know that Cloudflare has a credible solution to that which they fear.

I really can't figure out from their writing whether this is altruism accidentally leading to great PR or phenomenal PR leading to the appearance of altruism.


Good marketing is honestly communicating about a great product. Cloudflare has good marketing because they have a great product and their communications are honest. Marketing is only slimy when the product can't back up the boasts (which, unfortunately, is way too often).


Mmm, I think that honestly communicating about a great product can be great marketing. I don't think that all great marketing is honest; in fact some of the best marketing makes people doubt their own intelligence. See the De Beers campaign to make a diamond engagement ring an American necessity by equating it with a love that lasts forever (which is silly). Yet, I know for a fact I will buy an engagement ring when I propose.


Yeah, I would call that effective marketing. Incredibly effective, no doubt. Of course, if I'm making the definitions, I can make them say whatever I want. :)

But your point certainly holds: while there is a category of marketing that utilizes a great product and honest communication, it is not the only way to successfully market. In fact, it may not even be the most effective way to market (though I would argue it is the most effective way with a technical audience).


Why would you buy a diamond ring even knowing that? Personally I opted for a silver ring with an inlaid golden pattern, it was beautiful and unique and was very well received by my wife. My sister also received a non-traditional ring and loves it. Sure it can be a conversation starter but often that's a plus not a minus. All in all I find something beautiful and unique is a wonderful token of your love, if it comes from the heart I can't see any girl you'd want to marry rejecting it.


Exactly. Personally, I had to let my wife buy her own ring because I was skint. If it's the right person the ring thing is kind of just a tradition that can be observed as you wish.

In contrast, a friend of mine was told by his fiancée that she would only accept a ring costing thousands of pounds. It didn't work out. Lucky escape in my opinion (especially as she stopped him drinking beer too).


Heh, certainly sounds like it. Ideally imho a wife should be a partner in all the trials and tribulations as well as happiness that life brings. Once she starts giving orders it seems more like she wants to be a boss, that was usually the point in the relationship I would try to gracefully bow out...


Their great product?

Is this their product [which they charge for and you are championing] the one that can mitigate DDoS attacks but failed to do so?

Cloudflare got caught out offering something they cannot guarantee and failed to deliver - they are now currently blaming the 'Internet' - very sad...sorry it is late but the 'Internet' did not slow down today or almost break [as the title of this post would suggest] because Cloudflare and Spamhaus had a bad day...


Did you read the story? It was more about how this was a notable attack (because of the traffic volume and how that was attained), not because it was particularly bad for CloudFlare:

> This allowed us to mitigate the attack without it affecting Spamhaus or any of our other customers. The attackers ceased their attack against the Spamhaus website four hours after it started.


> Did you read the story? It was more about how this was a notable attack (because of the traffic volume and how that was attained), not because it was particularly bad for CloudFlare:

Yes I did - and still Cloudflare can claim to be able to defend against these 'attacks'?

If they were being 'honest' they would come out and say they cannot defend themselves against the 'Internet'

Edit: This allowed us to mitigate the attack without it affecting Spamhaus or any of our other customers

So did the Internet slow down or break? Is this news?


CloudFlare customers are all anycasted and better able to survive DDOS attacks. If you were the unlucky router to receive dozens of gigabits per second of fake traffic you probably would have been boned.


It's interesting that you're using the same exact talking points as Sven Olaf Kamphuis uses in his rt.com interview, mainly that CloudFlare failed to mitigate the DDOS, when in reality they successfully mitigated the biggest DDOS attack in known history.

"The attacks have already stopped because CloudFlare worked themselves into the middle of an attack and tried to turn it into a PR stunt for themselves which kind of like backfired because CloudFlare couldn’t handle the attack." http://rt.com/news/spamhaus-threat-cyberbunker-ddos-attack-9...

Sucks to be CyberBunker though, they got spamhaused and couldn't even retaliate by launching the biggest DDOS attack ever. And they're still getting spamhaused.


If this is really the biggest DDOS attack in known history, it's kind of a low bar. The traffic used to initiate it (given that it's an amplification attack) could easily come from as few as 50-500 machines with 30-3Mbit/s connections. As described in an earlier article about ratters (http://arstechnica.com/tech-policy/2013/03/rat-breeders-meet...), 500 machines are easily achieved daily by an infected torrent, and torrent users tend to have good connections.
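
The numbers in that estimate can be checked with a few lines of arithmetic. The machine counts and per-machine speeds are the ones given in the comment; pairing 50 machines with 30 Mbit/s and 500 machines with 3 Mbit/s is an assumption about how the ranges were meant to be read.

```python
def fleet_bandwidth_gbps(machines, mbit_per_machine):
    """Aggregate upstream bandwidth of a fleet, in Gbps."""
    return machines * mbit_per_machine / 1000


def required_amplification(target_gbps, source_gbps):
    """Amplification factor needed to turn source_gbps into target_gbps."""
    return target_gbps / source_gbps


small_fast = fleet_bandwidth_gbps(50, 30)   # few machines, fast links
large_slow = fleet_bandwidth_gbps(500, 3)   # many machines, slow links
print(small_fast, large_slow)
print(required_amplification(300, small_fast))
```

Under that reading both ends of the range give 1.5 Gbps of source traffic, which would need about 200x amplification to reach 300 Gbps, so it sits within a factor of two of the "1/100th" figure quoted from the article.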

Or they could be bought from a botnet for a handful of cash. 500 machines, even if it would mean 500 new machines each hour, is still pocket cash. Either the Internet is very weak and will break because of 500 machines, or this attack is not worthy of the title "The DDoS that almost broke the Internet".


If others didn't take direct action [on their own back] this could well have been the 'biggest DDoS attack in known [and unknown] history'. Cloudflare would have melted. Yes 300Gbps is a lot of traffic, especially if it is all aimed at you and you do not have the capacity.

To say 300Gbps can slow the Internet down and almost break it is laughable, sensationalist FUD and spun by Cloudflare.

Cloudflare have zero control over what comes out of ISPs networks, in the end it was down to the ISPs to mitigate these attacks.


I'd definitely say it's the latter. The non-PR version of the blog post is that they buy transit and peer at public exchanges out of many locations. Just like every other CDN.

I cannot find a single action mentioned in the entire post of something that CloudFlare actually did to help mitigate the attack. Basically, it seems that the Internet network community at large scrambled to deal with the problems, and CloudFlare is taking credit for it.


> Basically, it seems that the Internet network community at large scrambled to deal with the problems, and CloudFlare is taking credit for it.

This is true - if the Internet network community didn't identify this issue and see how they could mitigate it Cloudflare would have been toast. And now this blog post is number 1 on Hacker News...


They mitigated the attack against the site itself, forcing the attackers to move up the chain and try to target networks that connect to CloudFlare. They also worked with these exchanges to mitigate the attack at that level.

What more could they do?


> I really can't figure out from their writing whether this is altruism accidentally leading to great PR or phenomenal PR leading to the appearance of altruism.

I think it's a little of both, they also have great write ups when they screw up (rare, but it happens).


They only write a blog post, however, if they screw up so badly that the Internet takes notice; if they manage to get by because no one was certain what happened, or if it only affected some of their customers so that the blame largely got deflected to others, they are careful to remain quiet about the situation. For an example of such an incident, late last year I had to spend a bunch of time debugging why CloudFlare's optimization service (whose code they apparently do not actually test) was causing WebKit browsers to lock up entirely. http://www.saurik.com/id/14


It beats the blog posts by anyone else with a large network. By a mile. Google will say they had an incident, Facebook will say nothing, CloudFlare will show you a graph of their fail.


I take it you haven't seen the blog posts by Amazon? They are always very interesting, very detailed, and they write them even if the effects only hit some features of a single availability zone. I have written about Amazon's postmortems in the past, with detailed links to examples. http://news.ycombinator.com/item?id=4705110

They also, of course, don't write blog posts for every single incident (as I point out in that previous post), but they do acknowledge incidents on their status pages, and they don't seem to filter their postmortems based on "can I turn this into a PR event" as CloudFlare does; in the example situation I provided that CloudFlare carefully ignored, Amazon would have posted something, even though it would not have let them come off sounding like heroes.

CloudFlare simply should not be getting happy hacker credit for these exaggerated tales of valor... this is just marketing copy, not a postmortem, and it isn't even clear that we should be believing any of the statements they have made. I mean, can you even imagine an Amazon postmortem requiring a rebuttal by Gizmodo, as we see for this CloudFlare article? http://gizmodo.com/5992652


I think it's a virtuous cycle. Their selfish motives (attract top-notch employees, sell cloudflare to security-conscious techies) lead them to do impactful, altruistic things. Of course they are motivated by the PR, but what they do really helps and we are happy to reward them.


I wouldn't really call it PR, it seems to me they just genuinely enjoy writing about this stuff.


Yup. At first we didn't think anyone else would like our geeky ramblings. Always surprised the more technical detail we go into the more people are engaged.


More technical detail I think is always appreciated, it is a great learning experience reading those posts.


Yeah, a writeup where I can go do some research to understand is much better than one that glosses over the details with bad analogies and simplistic explanations.


It's definitely PR, also look at who submitted it to HN

That said it's pretty interesting if you ignore the bits that sound like hype.


Thank you for saying this. Perhaps both? :) Then it's an easy win.


I still found it a bit too high level for me, but on the other hand the topic doesn't interest me much.


Yes, my DNS server is listed on openresolvers.org. Here's why:

I've a smallish home network, 5 machines, one of them running handful of VMs, some devices (printer, scanner).

I wanted to have a local DNS server to name all these things, but mostly to learn about DNS and how to set up Bind.

So I installed bind on a Debian machine and set up a local domain, promptly named .fia.intra. As an added benefit, I now had a local DNS caching server too, and since my machines use this as their primary DNS server, it needs to be recursive and not just respond to queries for my internal fia.intra. network.

Now, all this is running on an internal 192.168.1.0 network, and bind is set up to only respond to queries from 192.168.1.0/24, and I'm behind an ADSL NAT gateway, so no one from outside should be able to query my internal DNS server.

I ignorantly assumed that the ADSL modem wasn't completely broken and having a moronic way of operating.

Now, I've not set a port forwarding rule in the modem that forwards port 53 to my internal DNS server, the only port forwarding rule I have is one for SSH.

However, I have this setting on the ADSL modem: http://i.imgur.com/dlL9LKV.png

The ADSL modem as shipped from the ISP acts as DHCP server on the LAN side, as most modems would do, and by default the DHCP server hands out a DNS server that is my ISPs DNS server. I changed that to my internal DNS server, 192.168.1.20.

In the image you will see the DHCP server isn't even enabled, I moved that to my same Debian machine and turned it off on the ADSL modem, but didn't erase the DNS settings.

As it turns out, because of that setting the ADSL modem listens on port 53 on the WAN interface (which has a routable IP address), and forwards/reverse-NATs queries to my DNS server at 192.168.1.20. I'd never have guessed it would do that.

I did a "dig google.com @<my.public.ip>" from an EC2 instance I have, and indeed it responded nicely..
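
For anyone without dig to hand, roughly the same probe can be sketched with the standard library: build a minimal DNS query with the RD (recursion desired) bit set, send it over UDP, and see whether anything comes back. The function names are invented for this example, and the server address in the usage note is a placeholder.

```python
import socket
import struct


def build_query(name, qtype=1, ident=0x1234):
    """Build a DNS query for `name` (QTYPE 1 = A) with the RD bit set."""
    header = struct.pack("!HHHHHH", ident, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def probe(server, name="google.com", timeout=3.0):
    """Send one UDP query to `server`; return the raw reply or None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # timeout, unreachable, refused
            return None
    return reply


# Usage (run from a host outside your own network):
#   reply = probe("<my.public.ip>")
#   print("no answer" if reply is None else "got %d bytes" % len(reply))
```

A reply on its own only shows that something is listening on port 53; the flags and answer count in the reply decide whether it actually recursed.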

I've now changed the setting to read "Primary DNS Server= 0.0.0.0" and have verified I no longer respond to DNS queries from the WAN side.

Stuff sucks.


Worth pointing out: I'm presuming that the 300Gbps of reported traffic† was not generated by DNSSEC resolvers, because DNSSEC isn't that widely deployed.

Which is bad, because DNSSEC dramatically increases the amplification effect you get from bouncing queries off open resolvers (the DNSSEC RRs are Big).

Adam Langley notes on Twitter that Cloudflare reports 3k DNS responses, apparently containing the zone contents of RIPE.NET; I guess these were EDNS0 UDP AXFR requests? That's worse than DNSSEC.

This has been one of Daniel Bernstein's big critiques of DNSSEC. It's not one of mine, but I'm still happy to see his argument validated.

† (at a tier 1 Cloudflare doesn't have a business relationship with, which makes this kind of a "my cousin's best friend told me" number, but still)


How does this validate DJB's criticism? He's saying we shouldn't adopt DNSSEC because of traffic amplification. But we're already in an untenable situation with the amplification caused by bog standard DNS -- according to your quote, it's even worse than DNSSEC -- so we need to solve the problem anyway.


I am batting .000 on analysis for this situation and debated removing the comment and just replacing it with a link to DJB's talk. We don't have all the details but there's a notion that the attack already implicates DNSSEC because of RIPE's RRSIG records. I don't know, either way.


Correct. DNSSEC just makes amplification worse.

It's like: Do you want to be shot with 50 bullets or 100 bullets. DNSSEC is 100 bullets, regular DNS might be 50 bullets. Either way, you're going to die.

I don't like DNSSEC, but amplification isn't my argument against it. The fact that it doesn't provide encryption, puts keys in the wrong hands, and is bizarrely complex for reasons that don't fit with the model of DNS is why I don't like DNSSEC.


DNSSEC amplification is still pretty high. When I ran dnssecamp[1] last year, I got similar numbers to the example run cited (2000+ servers providing 30x amplification, scaling up to 95x amplification for the worst offenders).

[1] http://dnscurve.org/dnssecamp.html


I think what it means is that if we're going to deploy DNSSEC, we might as well give up this fruitless war on "open resolvers*" and focus our energies on other solutions.

* Substitute 'routers' for 'resolvers' to see how this only plays into the hands of those who don't like the openness of the internet.


OK. I was present at djb's CCC talk and also heard Dan Kaminsky's rebuttal, hence my interest. A link to DNSSEC would be interesting if nothing else.


News is that the Dutch hoster CyberBunker[1] is responsible[2] for the attack.

[1]: http://en.wikipedia.org/wiki/CyberBunker

[2]: http://translate.google.com/translate?sl=nl&tl=en&js...


I'd still keep an open mind -- it could just as easily have been someone seeing the opportunity to deflect blame away from themselves.


Well, it's their own spokesman admitting it. Although they later rescinded it, probably after realising that there could very well be legal implications to it.


As a programmer with little knowledge of internet-scale networking, this was a very interesting read. Thanks !


So, given that there is already a list of open resolvers and the problem is that they can be used to DDoS a server - why doesn't someone just make them attack each other? From what I have read one could easily forge packets appearing to come from DNS A and send them to DNS B-Z. Rinse, repeat and take down the servers one by one.

Obviously this is probably illegal, but there would definitely be a beautiful irony to it. :)


These open resolvers are largely run by ISPs and resolve addresses for their customers. Taking them down would cripple the Internet access of their peers.

They do not need to be taken down; they need to be reconfigured. An open DNS resolver is (arguably) misconfigured, not malicious.


By being made public they can be used maliciously by third parties. If such ISPs are slow to respond and reconfigure, they are a liability to the whole internet and the fault is theirs. A suggestion such as GP's, while probably illegal, sounds like an interesting way to hasten them a bit. Better to have their customers on their heels urging them to get a grip, than the whole internet in jeopardy.


This is why I pay CloudFlare each month. They repeatedly publicly show that they know exactly what they're doing - and they do it without any sense of smugness.


I'm a bit confused about the 'open resolvers' bit. I searched for the static IP range assigned by my ISP, and a number of results came up:

http://openresolverproject.org/search.cgi?mode=search4&s...

This range has a description of "Static IP Pool for xDSL End Users", so is it also home users who have open resolvers?


Yep, my ISP has three open resolvers in my assigned range, and another 6 in the alternate range. Is it grounds to give them a slight prod?


Your ISP doesn't have 3 open resolvers. Your neighbors have.

Your ISP could terminate the contract with those 3 customers, but they won't - they're customers. They could block inbound traffic on port 25 to those 3 customers (if they knew about it), but they have no incentive to. Or they could block all inbound traffic on port 25, which will likely break DNS for a lot of customers.


you meant port 53, i presume?


I don't believe he/she did, as my ISP blocked ports 80 and 443, making me resort to port 25 to send emails~


Oops. Of course.


I meant that they exist within the ISPs ranges, not that the ISP itself owns them.


I was confused too. It seems the openresolverproject link is reporting more than just open resolvers. Those are active nameservers. I checked two nameservers I had set up at my previous dayjob and was surprised they were on the list. I then checked them manually with dig and found they were in fact not open. Confirmed with this tool: http://dns.measurement-factory.com/cgi-bin/openresolvercheck...

I think the RCODE is important. I also checked 8.8.8.8 but got a result I wasn't expecting.
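
To illustrate why the RCODE matters, the sketch below decodes the header of a raw DNS reply and separates "answered and recursed" from "answered but refused". The header layout and RCODE values follow the standard DNS message format; the `looks_open` rule is one plausible heuristic and not necessarily what either scanning project uses.

```python
import struct

RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def describe_reply(reply):
    """Decode the fixed 12-byte DNS header of a raw reply."""
    _ident, flags, _qd, an, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    code = flags & 0x000F
    return {
        "rcode": RCODES.get(code, str(code)),
        "recursion_available": bool(flags & 0x0080),  # RA bit
        "authoritative": bool(flags & 0x0400),        # AA bit
        "answers": an,
    }


def looks_open(reply):
    """Heuristic: the server recursed and returned answers for a foreign name."""
    info = describe_reply(reply)
    return (info["rcode"] == "NOERROR"
            and info["recursion_available"]
            and info["answers"] > 0)


# Two hand-built headers: a REFUSED reply and a recursive answer.
refused = struct.pack("!HHHHHH", 1, 0x8105, 1, 0, 0, 0)
answered = struct.pack("!HHHHHH", 1, 0x8180, 1, 1, 0, 0)
print(describe_reply(refused)["rcode"], looks_open(refused))
print(describe_reply(answered)["rcode"], looks_open(answered))
```

A scanner that counts any reply at all, including REFUSED, would list servers like the two described above even though they are not open.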


Very possible that the router provided by the ISP is mis-configured.


Doing a DDoS attack in the cause (however questionable the commitment to that cause is let's put it to one side for now) of Internet freedom is a ridiculous strategy. The more this sort of thing becomes inevitable the more TPTB will clamp down on such things and eventually we'll find ourselves on an Internet with far fewer freedoms and it'll all be far more locked down.

Whether you like it or not society tends to react like high-school - when enough people abuse a privilege eventually that privilege gets taken away. You can argue that a free Internet is a right (as some do) but you won't win that argument in the public sphere if that right is used to stop everyone else from getting done what they want to do online.


I suspect the people DDoSing spamhaus aren't doing it for internet freedom.


It wouldn't matter anyway. DDoS is not "freedom of speech".


Or perhaps they are?

We really need places like cyberbunker to keep internet free and open.

The day the last piece of w4r3z, pr0n and other 1337 stuff is taken away from the internet the infrastructure would have to be in place to pretty much remove anything you want at will.

I wonder what will be next thing to get removed after that?

"First they came..."


Cyberbunker hosts spammers. They're free to make that choice. Spamhaus runs an IP blacklist of known spammers and put Cyberbunker on that list. They're free to make that choice. Companies who don't like spammers subscribe to the blacklist and use it to block offenders. They are free to make that choice.

I don't see where the internet is becoming less free and/or open.


I'm all for internet freedom, and I have no objection to whatever information cyberbunker may be hosting. I agree that they provide a valuable service.

I dread the sterile showroom that some wish the internet to become. But at the same time, there's a difference between objectionable content being available on the internet to those looking for it, and people flooding communication channels with unsolicited and possibly objectionable content, in the hope of making a quick buck.

I support pull-freedom, not push-freedom. The recipient should be the one deciding what they do and don't want to read/be exposed to.

Would you support someone DDoSing the maintainers of the EasyList for AdBlockPlus because their ad-serving domain was added to it?

If SpamHaus had the power to remove CyberBunker from the internet, this would be a different story, and I'd support CyberBunker. But SpamHaus only says "There's lots of spam originating from there". Which is a fact. I don't think they should be punished for stating the truth. A list of suspected spammers has just as much of a place on the free internet as the w4r3z and pr0n we all love so much.


I don't think anyone is DDoSing cyberbunker, so why are you concerned about them?


Spamhaus adding their IP address to a list is not 'removing' them from the internet. And what about Spamhaus' freedom to create whatever lists THEY want?



I don't really know about these things, but I know enough to trust cloudflare over gawker.


Thanks for a really cool lesson about the nature of the internet :)

I was, however, interested to see no mention whatsoever of cloudflare in other reports of this[1]. Is this something that bothers you?

[1] e.g http://www.bbc.co.uk/news/technology-21954636


No. When we're doing our job right, no one should know that CloudFlare even exists.


"..but first a bit about how the Internet works"

My favorite part of a Cloudflare post.


What are the incentives for the maintainers of open DNS recursors? How can we alter their incentives so that they can no longer be used in DNS amplification attacks?


There are some benefits (incentives) to running your own DNS server with the ability to perform recursive queries, but best practices dictate that these servers should only accept queries from "trusted clients". So, it's not so much that there are incentives to run an open recursor as it is that there are very few negative incentives to running one. I keep highlighting the word "open", because the open state of a DNS server is often not an intentional decision.

I'd speculate the most common reason for using DNS recursion is to allow a non-authoritative name server to return results for any query. This non-authoritative name server usually sits on a network that serves clients with low latency and high bandwidth, like a LAN. The network is typically private, but private is not always the same as secure/closed. Some benefits of running your own DNS are:

* The ability to cache DNS lookup results (speed increase)

* Placing the name server closer to clients (speed increase)

* The ability to blacklist certain zones (security)

And many more, most of which relate to control, speed, and security.

The thing is, none of these advantages are related to running an "open" DNS with recursive queries enabled. I think the core problem is twofold:

1) Many amateur sysadmins don't recognize that running a name server with recursive queries enabled is a security issue.

2) Enabling recursion doesn't automatically require any configuration to secure the client-trust relationship.

Unfortunately, I'm not really smart enough to propose any changes that would help the situation, but I think this represents a high-level overview of the most common problem scenario.

EDIT: It's also worth noting that some DNS servers enable recursive queries by default. Anyone running their own DNS for their zone who doesn't know about the issues related to recursive DNS will likely be running an open DNS recursor as well. These are commonly servers at web hosts, which have much faster internet connections as well. So it's a matter of getting everyone on board for changing defaults.
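
A minimal sketch of the "trusted clients" practice in Python, assuming a hypothetical resolver that runs a check like this before it agrees to recurse. The network list and the name `recursion_allowed` are made up for illustration; real DNS servers expose this as configuration rather than code.

```python
import ipaddress

# Hypothetical list of networks whose clients may ask for recursion.
TRUSTED_NETWORKS = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def recursion_allowed(client_ip: str) -> bool:
    """Return True only if the client address is inside a trusted network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in TRUSTED_NETWORKS)

print(recursion_allowed("192.168.1.20"))  # True: LAN client
print(recursion_allowed("203.0.113.7"))   # False: outside address
```

The point is only that the default answer for an unknown source is "no"; the server can still answer authoritatively for its own zones.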


why not just use opendns.com? you get all of these `benefits` and more without you having to configure and maintain a DNS server.


Does OpenDNS still hijack addresses that don't resolve for ads?


Yes, they fuck with NXDOMAIN.

I don't understand why anyone puts up with them. I suspect because they have the word "Open" in their name.


One reason I don't use OpenDNS.com or Google DNS is that it means I might not hit the optimal CDN for me. Maybe they have improved things, but last time I used them, I often ended up hitting CDN end points that were not optimal. For example, downloading MSDN content was slow. If I switched to my internet service provider's DNS servers, I would hit a more optimal CDN end point that was faster. I had a similar experience with Netflix.

Note my example with MSDN took place a while ago, so it might not be replicable today.


It might be worth trying again; at least in theory, you should get correct CDN endpoints whatever happens. I suppose there might be an exception to this if a CDN has edge nodes within your ISP though. There's a bit more detail at https://developers.google.com/speed/public-dns/faq#cdn


My solution is to use my provider's DNS servers first and Google DNS after those.


Using OpenDNS does not provide all the benefits of running your own DNS server where "benefits" includes fine grained control.

Additionally OpenDNS has some behaviors that network admins aren't crazy about. OpenDNS will return an answer for queries with no authoritative match. For example, if you query `dig noexist.example.com @208.67.222.222` (that's an OpenDNS resolver IP), you'll get answer: 1 and an IP address. This is considered a "feature" by OpenDNS, but from a purist's perspective, it is a breakage.

We do not run our own DNS server for client name-resolution, because we don't have any need for the control it provides. I acknowledge that some people do, however.


They set up a caching server for their organization so the DNS lookups are faster, but they forget to restrict the recursive queries only to their organization so anyone else can use them.

That makes them work like Smurf amplifiers (http://en.wikipedia.org/wiki/Smurf_attack) did in the past.


I did a search on the /24 block my DSL connection is part of, on openresolve.

There were 14 open resolvers. Prodding at them a bit, many of them are just Linux machines that people in my area put on the internet and installed a DNS server on, likely for caching purposes, without setting it up properly.

Of course these are DSL connections, so the upload rate is likely just 512kbit, but all you need is enough of them.


And there are many times more similarly configured *nix boxes in data centers (with data center bandwidth). Google's gigabit service could be an interesting block to search...


There isn't much incentive save for the fact that it is easier to configure. For some versions of DNS servers you need to consciously disable recursive lookups and then configure the list of servers that are accepted.

If a DNS server doesn't perform lookups for any outside server it isn't very useful.

If you think about a web host, with X number of servers that host websites, these are considered the Master DNS servers when the physical domains being hosted reside on those servers.

The public listed DNS servers on your domain WHOIS records are actually the Secondary DNS servers, and these perform the DNS lookups when someone accesses the hosted website on your Master server.

If the Secondary server accepts requests from anyone, even domains it isn't explicitly responsible for, then it is performing recursive lookups.

A more secure configuration is for the Secondary servers to only accept lookups for its own Master servers.


For some it's a matter of knowing that they shouldn't be running a recursive server. We recently contacted a number of customers asking them to disable their recursive server, and most didn't really know they were running it, or that it was a problem.


In my a-few-guys-in-one-office location we have a Windows domain server which also runs a DNS server that our computers use. I don't have the slightest idea if that DNS server is recursive or not, or what would stop working if that setting were changed. Any clever link where something really useful can be found? Wikipedia doesn't help me:

http://en.wikipedia.org/wiki/Name_server#Recursive_query

Thanks in advance!


It is recursive, and I honestly have no idea how to fix it. We're in a similar situation, and the only answers I can find for this involve disabling the DNS server (which would break AD).

Honestly, your best bet is to firewall off UDP port 53 to all hosts except ones that are using it as a DNS server.
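
One way to find out from the outside whether a server offers recursion is to send it a query with the "recursion desired" bit set and look at the "recursion available" bit in the reply. Below is a rough Python sketch using only the standard library. The function names are made up for illustration, the check is a heuristic rather than a definitive test, and it should only be pointed at servers you operate.

```python
import random
import socket
import struct

RD_FLAG = 0x0100      # "recursion desired" bit, set by the client
RA_FLAG = 0x0080      # "recursion available" bit, set by the server
RCODE_MASK = 0x000F   # low four bits of the flags carry the response code

def build_query(name: str, query_id: int, qtype: int = 1) -> bytes:
    """Build a minimal DNS query (default type A, class IN) with RD set."""
    header = struct.pack("!HHHHHH", query_id, RD_FLAG, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)

def offers_recursion(response: bytes) -> bool:
    """Heuristic: RA is set and the response code is NOERROR (0)."""
    _, flags = struct.unpack("!HH", response[:4])
    return bool(flags & RA_FLAG) and (flags & RCODE_MASK) == 0

def probe(server: str, name: str = "example.com", timeout: float = 2.0) -> bool:
    """Send one UDP query to port 53 and report whether recursion is offered."""
    query = build_query(name, random.randint(0, 0xFFFF))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        response, _ = sock.recvfrom(4096)
    return offers_recursion(response)
```

Running `probe()` against the server's public address from a machine outside the office network shows what the rest of the internet sees; a timeout there usually means the firewall or NAT is already blocking port 53.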


It seems my server shouldn't be a problem: it's behind the NAT and there's no port forwarding of port 53. If I had to do the resolution for the public nodes of my domain, I'd have a separate Linux or BSD node just for that anyway, replying only to queries about my nodes. Does anybody know if I'm missing something in such a solution?


This was an immensely interesting read even though only high-level details were discussed. Always impressed with CloudFlare's architecture. Thanks for a great read!


Did I miss something in your article (I skimmed), who are the gentlemen in the photo at the top?


I didn't write TFA, but it appears that that is an image of the group Massive Attack.


After seeing the headline and the photos, I kept expecting to read that it was some sort of composite "sketch" of suspects in the DDoS attack compiled from real photos. I am not familiar with what the members of Massive Attack look like and did not make that connection. Seems like poor photo usage.


They were trying to be clever. "Largest DDoS attack"..."Massive Attack". I love Massive Attack, but yes, most people wouldn't recognize the musicians and would assume that they were the suspects behind the attacks.


I recognized them; nonetheless, it would be nice to see the credits/licenses of the images.


It's a composite image of the members of Massive Attack.


I'm not a networking expert, but how would turning off recursive DNS queries mitigate this kind of attack? A nameserver must still answer queries for the domains it is authoritative for, so what prevents the bad guys from using only authoritative queries for their attack? Wouldn't it be much better to just add some rate limiting to every DNS (recursive or not)?

On a side note, I think that especially in times where messing with DNS is used as a censorship tool by a lot of governments and regulators, there is some value in being able to ask someone else's DNS for any domain, but that's a different issue.


I'm not a networking expert either, but I do believe the recursive nature turns this amplification DDoS to just a "normal" DDoS where the only target would be the nameservers of the target service. At the end of the day, if there wasn't a single nameserver that was turning around and asking 2, 10, or 100 other nameservers the same fraudulent question they didn't have the answer to, the only thing you could do with the same exact attack would be to take out someone's authoritative servers. No collateral damage by slowing down the internet connections of home users in California when the attack is being targeted at some server in Germany.


I don't think the nameserver (or the upstream NS) is the target of the attack, but the (spoofed) IP which appears to be sending the queries:

* Attacker sends (small) query to a lot of resolvers, spoofing the source address to be the IP of the target

* Each NS replies with a (large) response, thus flooding the target with a lot of data

As long as there is no rate limiting in the nameservers used for the attack, this would work regardless of whether the answer is authoritative or not.
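
A rough sketch of what per-source rate limiting could look like, written as a token bucket in Python. The class name, rates, and burst size are assumptions for illustration, not how any particular nameserver implements it. Note that the bucket is keyed on the claimed source address, which in a reflection attack is the victim, so this caps how much any one server can throw at a single target.

```python
import time

class ResponseRateLimiter:
    """Token bucket per claimed source IP (hypothetical helper)."""

    def __init__(self, rate=5.0, burst=10.0, clock=time.monotonic):
        self.rate = rate      # tokens refilled per second
        self.burst = burst    # maximum tokens a bucket can hold
        self.clock = clock    # injectable clock, handy for testing
        self.buckets = {}     # source IP -> (tokens, last_seen)

    def allow(self, source_ip: str) -> bool:
        """Return True if a response to this source may be sent now."""
        now = self.clock()
        tokens, last = self.buckets.get(source_ip, (self.burst, now))
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[source_ip] = (tokens - 1.0, now)
            return True
        self.buckets[source_ip] = (tokens, now)
        return False
```

A server would call `allow()` with the query's source address and silently drop the query when it returns False. A real deployment would also need to expire idle buckets so the table does not grow without bound.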


Thanks for the post, CloudFlare. It's way more interesting to read from your perspective than from a random journalist.


I don't understand something...

How is 300Gbps a lot? If we take London, which has 8 million people, and say roughly half of them are on the internet (4 million), wouldn't that mean that if everyone was using 78kbps we would reach 300Gbps (roughly)?

I just don't understand how a tier1 or internet exchange router can only handle 100gbps. That seems extremely low to me considering I have like 1mbps for just my house?


You have 1mbps, but you very, very rarely actually use all of that. It doesn't actually really matter if everyone had 1Gbps internet connections, because most everyone is requesting data so infrequently that the points that must bear all the load can handle it.

Pretty much, 100Gbps is about 100 thousand times your internet connection. So, a single 100Gbps line could handle 100 thousand people like you requesting a 1mb file at the EXACT same moment. Spread that out by even a few seconds and that router can handle way more people than that 100 thousand.

This is a GROSS oversimplification, but the idea stands.
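
The back-of-the-envelope numbers in this exchange, as a few lines of Python. This assumes decimal units (1 Gbps = 10^9 bits per second), which gives 75 kbps per user for the London example, close to the 78kbps quoted in the question. The helper names are just for illustration.

```python
GBPS = 10**9
MBPS = 10**6

def per_user_share_bps(link_bps: int, users: int) -> float:
    """Bandwidth each user gets if a link is split evenly."""
    return link_bps / users

def simultaneous_users(link_bps: int, user_bps: int) -> int:
    """How many users can saturate their own line at the same instant."""
    return link_bps // user_bps

print(per_user_share_bps(300 * GBPS, 4_000_000))  # 75000.0 bits per second
print(simultaneous_users(100 * GBPS, 1 * MBPS))   # 100000 users
```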


100Gbps is the fastest ethernet connection that you can buy (for six figures). So that is always going to be a limitation on the router. They can be aggregated into larger links but I think that is pretty uncommon. The internet is made up of lots of connections between ISPs not a few giant ones, and you can attack each one of those links separately.


Your calculation is assuming that all of the traffic from those 4 million London-based users is travelling through LINX. While LINX is the IX for London, not all traffic originating from London would have to travel through it; Tier 2 and Tier 3 networks would route some portion of the traffic as well.

This diagram from wikipedia shows some of the sorts of alternate routes that might be used: http://en.wikipedia.org/wiki/File:Internet_Connectivity_Dist...

As an aside: does anyone know of a good resource that gives an example of the rough percentage of traffic that would be handled in each of the various ways?


I may have misunderstood this, but I think this is saying that a single IP (or endpoint) is receiving this bandwidth. So there is this much going through the peer network AND regular internet use.


I feel like there's a shocking amount of laziness and incompetence rife in the industry. How else would so many open resolvers exist? It's like the thousands of nodes with default passwords that were used for the IPv4 census.

How exactly do we combat this?


Laziness? Possibly.

Incompetence? More likely.

Incompetence due to having 45 #1 priorities and developing a deep understanding of secure configurations is a #3 priority? Most likely.


I think ISPs should block things like mail servers and DNS servers on their clients by default, unless explicitly enabled and verified to be properly configured.


I wonder what the political implications of this type of "collateral damage" might be.

Governments have recognized the need to defend against direct attacks on their networks and develop their own offensive attack capabilities from a national security perspective, but I haven't seen the same level of response to these sorts of events.

When the damage spills over to impacting services that tens of millions of people rely on and cause economic damage, should we treat this as the same as an attack on our critical infrastructure (electricity, water, etc.) like an act of terrorism? If that becomes the case, would it warrant the use of deadly force against the attackers?


All interesting questions, and now quite pressing ones.


Today a guy on my local news (Monterrey, Mexico) talked about it. That "tech guy" said that Netflix and other services were intermittent or failed to load, and also that this "war" was between CyberBunker and Spamhaus. I think this is not completely true, but he talked about whether this kind of thing would be the next kind of war in the coming years, and recommended a good antivirus and a firewall to people.

I really admire what CloudFlare did to help not-too-technical people understand how these things work. If CloudFlare was promoting itself with these posts, well, they need to eat and educate their children; their behavior is normal.


I thought this was an excellent writeup and I would like to learn more.

Are there any recommended books on learning about the Internet/DNS on a global scale?


http://www.amazon.com/Cisco-Essentials-Press-Networking-Tech... (Networking) http://www.amazon.com/DNS-BIND-5th-Cricket-Liu/dp/0596100574 (DNS)

There are always the standard texts, Computer Networks by Tanenbaum and Interconnections by Radia Perlman for the theory.


I understand the issues with DNS reflection, but why are open resolvers the issue? Isn't the point of DNS to respond to requests with correct information?

Surely if random people can't connect to DNS resolvers and get information, they can't surf the net either? Someone has to resolve DNS for people for the internet to function, don't they?


DNS runs over UDP, which means the source of requests for information can be spoofed. Also, the amount of data of a response is significantly larger than the request, so you can use DNS resolvers to send significantly more data to a victim than you yourself need to generate by sending DNS queries with your victim as the source IP.
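
To put rough numbers on that, here is a small Python sketch of the amplification arithmetic. The packet sizes are illustrative assumptions, not measurements; the 300Gbps and "1/100th" figures are the ones quoted from the article at the top of the thread.

```python
def amplification_factor(query_bytes: int, response_bytes: int) -> float:
    """How many bytes reach the victim per byte the attacker sends."""
    return response_bytes / query_bytes

def attacker_bandwidth_bps(target_bps: float, factor: float) -> float:
    """Bandwidth the attacker must source to produce target_bps at the victim."""
    return target_bps / factor

# Illustrative sizes only: a 50-byte query that triggers a 5000-byte response.
factor = amplification_factor(50, 5000)
print(factor)                                    # 100.0
print(attacker_bandwidth_bps(300e9, factor))     # 3000000000.0, i.e. 3 Gbps
```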


Yes, I understand the attack. My point is that somewhere, there have to be DNS servers that respond to public requests ... otherwise the internet will not work.

Hence, some DNS servers have to be open. By saying it's openness that's the problem, we're blaming the victims, rather than the issue, which is that DNS is flawed. Simply moving to TCP would be better, surely?


UDP doesn't require a handshake, hence it is easy to spoof, unlike TCP, where a full-duplex connection must be established before any data is exchanged.


Yes, that's my point. If we move to TCP we fix the issue. At the moment I can't see how closing open servers is a real fix.


Moving from UDP back to TCP on large packets is a mixed bag. TCP is slow, very slow. At one time DNS packets were limited to 512 bytes and had to use TCP for more data, but over time the number of UDP packets over 512 bytes increased greatly. Going back to the smaller packet size would impact a large number of users with longer load times, especially on wireless devices.

Closing open DNS servers isn't a real fix. The people who need to fix it are the least likely to have a clue there is a problem in the first place.


The open resolvers are the issue because they respond to anyone, which makes it possible to use them in an attack. Most DNS servers of ISPs will only respond to requests coming from their clients. Google (8.8.8.8) and OpenDNS are exceptions, but they will have measures to mitigate against these attacks.


My point is that I think there's no way round some DNS servers having to respond to public requests.

Do you have any details on how google or opendns solve the problem? I'm particularly interested in why their servers can't be used for reflection, because if there's an easy way, shouldn't we be asking the owners of open servers to do that rather than limiting IP access?

In case this comes across as argumentative, please let me say that I am actually interested in this and am asking for information, not trying to make a point here. I really would like to understand the issue.


Well as I said, the majority of DNS servers are meant for a specific group of users, of a particular ISP or company, which can be filtered by IP. Google & OpenDNS are exceptions; I wouldn't know about their defense mechanisms and I'd imagine they prefer not to disclose them, but I suppose it might be similar to how credit card companies try to detect fraud by looking for unusual transactions. There's also non-recursive servers answering queries about a particular domain, they also answer to anyone but they answer only about their domains (authoritative).

I think your point about trying to find a way for the open DNS resolvers to be protected but still open is moot, because there's really no reason for servers to be open to everyone (except for Google or OpenDNS who know what they're doing), these are just misconfigured servers of people who have no idea they can be abused. In my opinion the solution is for ISPs to actively block such open resolvers among their clients.


I take the point that a lot of these servers are just doing it by mistake, but do you not think that the more we allow only a few large conglomerate DNS servers, the more we fall prey to other issues with central directory services?

Eg https://en.wikipedia.org/wiki/DNS_blocking#Criticism

http://torrentfreak.com/how-to-unblock-the-pirate-bay-111004...

One of the big advantages of DNS at the moment, surely, is its ability to be run by anyone?


That's about blocking specific domains using DNS. I'm talking about blocking DNS resolvers by default. I'm not talking about only allowing "a few large conglomerate DNS servers", but merely about protecting people from unwittingly allowing their computers to be used as attack vectors.


This page has some details about how Google's public DNS mitigates some of the problems with open resolvers.


So, that's why HN has been doing 503s for the last week? I am from Spain and it has been failing all the time.


HN isn't hosted on CloudFlare, and a slowdown wouldn't cause 503s like that. HN's likely just having server trouble.


From the article: "If the Internet felt a bit more sluggish for you over the last few days in Europe, this may be part of the reason why."

Hm, as opposed to normal days, when our Internet is just normal sluggish. Not that fond of that phrasing. And I must be bored to even comment on that.


I don't think the attack has stopped; it will come back even bigger and could take down many networks.

Hope that kind of public outcry and media visibility will get the networks and governments to take notice and fix the known vulnerabilities in the core Internet infrastructure.


http://m.cnet.com/news/egypts-military-arrests-divers-cuttin... now it feels like a conspiracy to take down the Internet


What I don't understand is why someone doesn't finally write a piece of malware that destroys botnets? These unmaintained machines cause the entire world so much grief.

It doesn't have to be mean and destroy data, just incapacitate the machines and force the users to upgrade.


Existing botnet malware already protects itself against removal or interference from other malware and anti-virus software. And it usually has auto-update functionality. It would just be extended to defend against this new "anti-malware" as well.

Besides, if this hypothetical "anti-malware" actually incapacitated machines, it would be highly illegal itself.


And here's another interesting one from the past (7 / 13 root servers were shut down or blocked): http://c.root-servers.org/october21.txt



I'm happy to see CloudFlare actually include some technical details in this post, though as always, I'd be happy to see more. It's always more interesting when we can follow along technically.


"Tier 1 networks don't buy bandwidth from anyone, so the majority of the weight of the attack ended up being carried by them."

"We're proud of how our network held up under such a massive attack."

wat?!


This site http://www.internetpulse.net/ is useful to check if you suspect a global slowdown.


I know very little on this subject, so please excuse my ignorance, but is that a world list or US list - just reading the names makes it seem US centric. Thanks


You are right--just US. Sorry for misdirection.


Pretty sure those are just primary US ISPs


Great post, and thanks for introducing me to the Open DNS Resolver Project.

http://openresolverproject.org/


No wonder I had difficulty accessing Google a few times during that time period.


Hold on, isn't it time to force DNS to be TCP only?


No. Currently a significant amount of latency in opening a web page is the DNS resolution (for half a dozen domains). TCP adds a lot of overhead compared to UDP.

But a case can be made that large requests should be TCP only.
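
A minimal sketch of that middle ground in Python: answer small queries over UDP, but when a response exceeds a size limit, send back only the header and question with the TC (truncated) bit set so the client retries over TCP. The function name and the 512-byte default are assumptions for illustration, and the parsing assumes the question name is not compressed.

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the DNS header flags

def truncate_for_udp(response: bytes, udp_limit: int = 512) -> bytes:
    """Return the response unchanged if it fits, else a TC-flagged stub."""
    if len(response) <= udp_limit:
        return response
    msg_id, flags, qdcount = struct.unpack("!HHH", response[:6])
    # Walk the question section to find where it ends.
    offset = 12
    for _ in range(qdcount):
        while response[offset] != 0:
            offset += 1 + response[offset]   # skip one length-prefixed label
        offset += 1 + 4                      # root label, QTYPE, QCLASS
    # Keep the question, drop all answer/authority/additional records.
    header = struct.pack("!HHHHHH", msg_id, flags | TC_FLAG, qdcount, 0, 0, 0)
    return header + response[12:offset]
```

A spoofed query then reflects only a reply about the size of the query itself, and a genuine client pays the TCP round trips only for the rare large answer.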


> TCP adds a lot of overhead compared to UDP.

Not if you simply keep a TCP connection open and use SYN cookies on the servers. Many DNS servers already support TCP (though I'm pretty sure you need a new connection for every request).


That'd likely result in increased traffic and slowdowns even in normal cases.


nic.fr is down so most .fr domains are down due to unresolved hostname.


Mine was OK. In India


Unfortunately, we in India are severely affected by the Mediterranean cable cut, which coincidentally also happened at roughly the same time


Gizmodo is calling out Cloudflare's claims that this affected anything more than Dutch networks: http://gizmodo.com/5992652

They point out, for example, that the IXs in question routinely see 2.0+Tbps peaks, so a 0.3Tbps attack would not be likely to shake a single IX, let alone "the internet" itself.

Interesting rebuttal, although certainly not comprehensive.


Cloudflare is crying: "the sky is falling". Gizmodo is taking the other extreme: "nothing to see here; move along".

The truth is in the middle.

There were repercussions in Denver/Colorado with a Tier 1 network provider, mainly in the form of greatly increased latency. This impacted downstream providers, as well as a couple of my clients in Colorado. I informed those clients yesterday my opinion was the issue was either a network engineering/backhoe issue, or there was a major concerted cyber attack targeting the Denver area.

The reality? There was an attack. It was measurable. It did have a noticeable impact. But it was far more annoying and irritating than the headline "Internet killed. Film at 11." would lead one to believe.


While the IXs may routinely process 2.0+Tbps through their routing ASICs, they are unlikely to be prepared for receiving 0.3+Tbps to their router control planes. By saturating the control planes, you could destabilize routing protocols and cause link flaps, even if the underlying ASIC switching fabric is unharmed.


Akamai has more bandwidth at AMS-IX than this entire attack.


Most of the IXPs around the globe have public statistics and graphs, so you can check the impact for yourself.

Here is a nice collection for the interested readers: https://www.euro-ix.net/resources-list-of-ixps


Indeed they should, Cloudflare offers to protect you [and charge you for the privilege] from DDoS attacks like this.

If they really know how the Internet works they shouldn't make claims like this.....


That's called marketing.

They are crafting the stories methodically (and successfully I'd say, given the number of times we all talk about CloudFlare on HN) to drive more exposure to their brand. It's too optimistic and hyperbolic sometimes, but not really different from any product announcement from Apple or Facebook.

In all fairness to them, a positive side effect is that we're all discussing now how to solve the root cause of this decades-old - but still threatening - problem (open networks and DNS attacks). I guess it will really take an apocalyptic event to convince millions of operators to configure their own networks better.

This is not the first time CloudFlare uses hyperbolic headlines. Case in point: "Why Google Went Offline Today and a Bit about How the Internet Works" https://news.ycombinator.com/item?id=4747910


How is that at all hyperbolic?


Headline: "Why Google Went Offline Today"

Article: "Looking at peering maps, I'd estimate the outage impacted around 3–5% of the Internet's population."



