Weaponizing Middleboxes for TCP Reflected Amplification (umd.edu)
176 points by dredmorbius on Sept 16, 2021 | 29 comments


> The second cause we found for infinite amplification was victim sustained loops. When a victim receives an unexpected TCP packet, the correct client response is to respond with a RST packet. We discovered a small number of amplifiers that will resend their block pages when they process any additional packet from the victim - including the RST. This creates an infinite packet storm: the attacker elicits a single block page to a victim, which causes a RST from the victim, which causes a new block page from the amplifier, which causes a RST from the victim, etc.

Wow. Just wow. “Rigorous QA” and “spec compliant” aren’t the first things to come to mind when I think of middlebox vendors, but this is just next level negligence.
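
For anyone trying to picture the failure mode, here's a toy simulation of the loop described in the quote. The logic is hypothetical (nobody has the vendors' code); it just contrasts an amplifier that treats a RST as "flow gone" with one that answers every packet on a flagged flow:

    # Toy model of the reported behaviour -- not any vendor's code.

    def victim(pkt):
        # Correct endpoint behaviour: answer an unexpected packet with a RST.
        return "RST" if pkt == "BLOCK_PAGE" else None

    def buggy_amplifier(pkt, flagged):
        # Bug: replies with the block page to *any* packet on a flagged flow,
        # including the victim's RSTs.
        return "BLOCK_PAGE" if flagged else None

    def correct_amplifier(pkt, flagged):
        # Correct: a RST tears the flow down; send nothing more.
        if pkt == "RST":
            return None
        return "BLOCK_PAGE" if flagged else None

    def simulate(amplifier, max_rounds=10):
        pkt, rounds = "TRIGGER", 0              # attacker's single spoofed trigger
        while pkt is not None and rounds < max_rounds:
            pkt = amplifier(pkt, flagged=True)  # amplifier -> victim
            pkt = victim(pkt) if pkt else None  # victim -> amplifier
            rounds += 1
        return rounds

    print(simulate(buggy_amplifier))    # 10 -- hits max_rounds; the storm never ends
    print(simulate(correct_amplifier))  # 2  -- block page, RST, then silence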


The attack is initiated with packets that are outside the bounds of normal expected operation. While an ideal engineering setup would provide a thorough suite of negative tests to ensure those edge cases are covered, it’s also fair to say that these kinds of bugs are the easiest ones to miss. So I think describing it as “next level negligence” is rather uncharitable. From experience writing software and running software development teams, bugs like these are inevitable and always more obvious in hindsight. The problem with hardware is you can’t always just patch faults.


To me your position would be defensible if you were talking about, say, a crashing bug in a phone app. For a security product with privileged network placement, it’s like getting salmonella from a restaurant and then being told that food safety is easy to forget when you’re busy.


> The problem with hardware is you can’t always just patch faults.

That is the part that is "next level negligence". If you're making "security products" and you are not able to cover basic security - i.e. have a working update process and communicate to your customers that they should actually use that working update process and not actively disable it - then you failed at your job.


They're not making security products. They're making "security" products.

Imagine you're working on products for airport security. Should you focus on stuff that might actually be useful to improve the security of the airport and planes? Or, should you focus on useless security theatre for the well-funded TSA and similar entities?

You won't find these "security" products in environments where actual security is crucial and the operators understand what they're doing. What you see there is Zero Trust, low friction but effective authentication, and occasionally actual real air-gapping, because somebody says "It would be easier if this wasn't air-gapped but that's inherently unsafe and the risk is unacceptable, so we're doing it the hard way".

But lots of places want to play pretend and for them a "security" product is perfect, it checks off the box for the relevant CxO role and causes some level of irritation while making no practical difference to actual security. Perfect.

Reminds me of the first question you ask about a "Secure" site (e.g. a data centre) to understand if they really mean "Secure" or not. "Who cleans this place?"

If the answer is "Nobody, it's a mess" or "sigh we all have to take turns" then maybe it's actually secure. But if there's some bullshit about "vetting" of staff paid minimum wage to go wherever they want, unremarked, carrying large black bags of unknown objects then the facility is not, in fact, secure.


Serious DPI vendors should really implement a proper state machine, so that they can't be fooled that easily. But middleboxes are not "security products", they can't be.

"Insertion, Evasion, and Denial of Service: Eluding Network Intrusion Detection" was published in 1998 [https://apps.dtic.mil/dtic/tr/fulltext/u2/a391565.pdf]. We should know that DPI is not reliable.

In fact Geneva is a research project that expands and extends the concepts behind fragroute, applying a genetic algorithm to automatically find flaws in censoring middleboxes [https://raw.githubusercontent.com/Kkevsterrr/geneva/master/e...].

It is expected that a research project of this type exposes these kinds of bugs. And the reason such flaws can be found at all is that "there isn't enough information on the wire".

Bugs of this type are egregious for their danger and simplicity, but even once these are patched, there will always be more.
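
For what it's worth, here is a rough sketch of the kind of per-flow state machine the first paragraph is asking for. Names and structure are illustrative, not taken from any product: only react to payloads on flows whose 3-way handshake was actually observed, and drop state on RST/FIN.

    SYN, SYNACK, ESTABLISHED = "SYN", "SYNACK", "ESTABLISHED"
    flows = {}   # (client, cport, server, sport) -> state

    def rev(k):
        a, ap, b, bp = k
        return (b, bp, a, ap)

    def observe(src, sport, dst, dport, flags):
        key = (src, sport, dst, dport)           # direction of this packet
        if flags == {"SYN"}:
            flows[key] = SYN
        elif flags == {"SYN", "ACK"} and flows.get(rev(key)) == SYN:
            flows[rev(key)] = SYNACK             # server answered; track on client key
        elif "ACK" in flags and flows.get(key) == SYNACK:
            flows[key] = ESTABLISHED
        elif "RST" in flags or "FIN" in flags:
            flows.pop(key, None)
            flows.pop(rev(key), None)

    def should_inspect(src, sport, dst, dport):
        # Never inject a block page for a bare PSH+ACK with no handshake behind it
        # (the Geneva-style trigger).
        key = (src, sport, dst, dport)
        return ESTABLISHED in (flows.get(key), flows.get(rev(key)))

It obviously doesn't fix the fundamental problems from the 1998 paper (desync, ambiguity between what the box sees and what the endpoint does), but it would at least keep a single spoofed packet from eliciting a block page.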


I would characterize that differently: they are security products — that’s how they’re marketed – but they aren’t perfect and are only effective against certain behavior. That means you can’t rely on them alone for everything but it doesn’t mean they don’t have a security function.


My issue is exactly with how they're marketed and sold.

Information theory proves there's an infinite number of ways in which you can codify something. The subset of encodings that meets the rules imposed by any middlebox is in turn infinite.

> they aren’t perfect and are only effective against certain behavior

This means that they are only effective against default behavior.

Anything else is out of scope for these products, which I think is what @laumars was referring to with "outside the bounds of normal expected operation".

Marketing sophisms can be fun, but defining something as a "security product" when it is mathematically proven that there are infinite ways to bypass the provided "security guarantees" is ... simply something I refuse to do.


> the easiest ones to miss. So I think describing it as “next level negligence” is rather uncharitable

On the contrary, mitigating amplification attacks is security 101.

And the middleboxes are sold as security products.
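
Sketched out, the "security 101" mitigations look something like this. Thresholds are made up, not from the paper: never send substantially more bytes than you received from an unvalidated source, and rate-limit responses per destination so a spoofed source can't be firehosed.

    import time
    from collections import defaultdict

    MAX_FACTOR = 1.0          # bytes out / bytes in, before the source is validated
    MAX_PER_SECOND = 10       # unsolicited responses per destination per second

    windows = defaultdict(lambda: [0.0, 0])   # dst -> [window_start, count]

    def may_respond(dst, bytes_in, bytes_out):
        if bytes_out > MAX_FACTOR * bytes_in:
            return False                       # would amplify; truncate or drop instead
        start, count = windows[dst]
        now = time.time()
        if now - start >= 1.0:
            windows[dst] = [now, 1]
            return True
        if count >= MAX_PER_SECOND:
            return False                       # this destination is being hosed
        windows[dst][1] += 1
        return True

It's the same idea as DNS response rate limiting, just applied to injected block pages.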


We are talking about following well understood and published standards, such as TCP and IP. The people implementing those stacks were either negligent, or were consciously cutting corners and safeguards that those open standards already had in place. The result: lots of network pipes can be subverted by crackheads into flooding innocent netizens.

No, I got no sympathy for the people who built and sold those devices.


Have you actually gone through the TCP/IP specification and implemented everything securely?

It’s not nearly as simple a specification to get right as your post suggests.


Getting the 3-way TCP handshake right and decrementing TTL correctly is quite easy. Those are very much foundational properties of the respective protocols. We are not talking about obscure edge cases.

Also, building a bridge is not simple either, but it's a well known and well solved problem. When a bridge collapses we don't just shrug and wave at the construction company with "It's OK, bugs happen".


It’s also very easy to get wrong.

How many bridges are built to survive abnormal conditions like earthquakes, tidal waves, or even just a lorry driver smashing into the roof of a low bridge? Some of my closest mates are structural engineers specialising in bridges, so coincidentally I happen to know a lot about this topic, and the answer is that, outside of a surprisingly small number of countries, most bridges aren't designed to carry any more than the expected load. Some bridges aren't even strong enough to carry present-day heavy goods vehicles. That's why so many bridges have signs on the approach telling drivers about safe and correct usage.

However you can’t really compare software to bridge building. There’s thousands of reasons why the two aren’t the same.


You can’t really compare this to bridge building. There’s thousands of reasons why the two aren’t the same.


I listened to the Heavy Networking Packet Pushers episode on this (https://packetpushers.net/podcast/heavy-networking-596-weapo...) and it was fascinating! Shout out to this podcast, really great quality content!


Given that this is now published, it seems like it’s just a matter of time before bad actors discover the vulnerable middleboxes (trivial since they operate on TCP port 80 on public IPv4 addresses) and we start seeing some really nasty DoS attacks. The infinite loop variants seem particularly bad - a nation state may not even notice that one of its censorship middleboxes is spamming a hapless victim server into oblivion.

This doesn’t look pretty. Kudos to the researchers for discovering it, but I’m definitely afraid that it’ll be hard or impossible to mitigate for anyone except the biggest CDNs. I certainly would not expect the middleboxes to remediate this.


> Completely fixing this problem will require countries to invest money in changes that could weaken their censorship infrastructure, something we believe is unlikely to occur.

...


> Why don’t you have a logo or fancy name for your attack? Drop us a line if you have any suggestions!

I suggest the name Midway.


> We discovered a small number of amplifiers that will resend their block pages when they process any additional packet from the victim - including the RST

Facepalm


Whoever thought of putting TTL on the packets (decrementing on each hop) saved the world.
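
Roughly, the per-hop rule that keeps a looping packet from circulating forever (simplified from RFC 791; the helper names here are illustrative):

    def forward(pkt, route, report_time_exceeded):
        pkt["ttl"] -= 1
        if pkt["ttl"] <= 0:
            report_time_exceeded(pkt["src"])   # ICMP Time Exceeded back to the source
            return None                        # drop it; the loop dies here
        return route(pkt)                      # otherwise hand it to the next hop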


Originally time-to-live was to be measured in seconds, but it was modified to mean 'maximum number of hops to transit'.

This is in https://datatracker.ietf.org/doc/html/rfc760#page-11 from Jan 1980, a document by Jon Postel.

Maybe Don Hopkins can find out more about this, I'll page him.

edit2:

So, after some more searching I found this article:

https://www.alertlogic.com/blog/where-is-ipv1-2-3-and-5/

Which claims that the split happened in 1978:

"At this point, TCP and IP were split, with both being versioned number 3 in the spring of 1978."

But I can't find any protocol spec for IPV3.

edit3:

https://www.rfc-editor.org/ien/ien41.pdf

has the TTL field, that's June of 1978, and it's IPV4.

IPV3 does not seem to have had a TTL field:

https://wander.science/articles/ip-version/


Here is a thread about TTL on the Internet History mailing list:

https://elists.isoc.org/pipermail/internet-history/2020-Sept...

A good summary from Jack Haverty:

https://elists.isoc.org/pipermail/internet-history/2020-Sept...

>My recollection, from the discussions when the IP4 header was being defined, was that TTL was included for two reasons:

>1/ we couldn't be sure that packets would never loop, so TTL was a last resort that would get rid of old packets. The name "Time To Live" captured the desire to actually limit packet lifetime, e.g., in milliseconds, but hops were the best metric that could actually be implemented. So TTL was a placeholder for some future when it was possible to measure time. TTL did not prevent loops, but it did make them less likely to cause problems.

>2/ TCP connections could be "confused" if an old packet arrived that looked like it belonged to a current connection, with unpredictable behavior. The TTL maximum at least set some limits on how long such a window of vulnerability could be open. Computers had an annoying tendency to crash a lot in those days, but took rather long times to reboot. So the TTL made it less likely that an old packet would cause problems. (Note that this did not address the scenario when a gateway crashed, was subsequently restarted possibly hours later, and sent out all the packets it still had in its queues.)

/Jack

https://elists.isoc.org/pipermail/internet-history/2020-Sept...

>I just realized that I should have also said:

>3/ Although TTL did not prevent loops, it was a mechanism that detected loops. When a packet TTL dropped to zero, an ICMP message (something like "TTL Time Exceeded") was supposed to be generated to tell the source of the failure to deliver. Gateways could also report such events to some monitoring/control center by use of SNMP, where a human operator could be alerted.

>4/ TTL was also intended for use with different TOS values, by the systems sending voice over the Internet (Steve Casner may remember more). The idea was that a packet containing voice data was useless if it didn't get to its destination in time, so TTL with a "fastest delivery" TOS enabled the sender to say "if you can't deliver this in 200 milliseconds, just throw it away and don't waste any more bandwidth". That of course wouldn't work with "time" measured in hops, but we hoped to upgrade soon to time-based measurements. Dave Mills was working hard on that, developing techniques for synchronizing clocks across a set of gateways (NTP was intended for more than just setting your PC's clock).

>I've noticed that researchers, especially if they're not involved in actually operating a network, often don't think about the need for tools to be used when things are not going well - both to detect problems and to take actions to fix the problem.

>Without that operational perspective, some of the protocol functions and header fields may not seem to be necessary for the basic operation of carrying user traffic. TTL is one such mechanism. Source route was another, in that it might provide a way to get control messages delivered to a misbehaving host/router/whatever that was causing routing failure. This involved using source routing as a somewhat "out of band" signal mechanism that might not be affected by whatever was going on at the time.

>All of this was considered part of the "Internet Experiment", providing instrumentation to monitor the experiment to see what worked and what didn't, and evolve into tools for use in dealing with problems in operational networks.

>At one point I remember writing, sometime in the mid 80s, a rather large document called something like "Managing Large Network Systems" for some government contract, where these kinds of tools were described. But I haven't been able to find it. Probably in a dusty warehouse somewhere....

>/Jack


> We found amplifiers that, once triggered by a single packet sequence from the attacker, will send an endless stream of packets to the victim. In our testing, some of these packet streams lasted for days, often at the full bandwidth the amplifier’s link could supply.

That would certainly make you feel like you were a victim under direct attack. And getting state actors on-board to fix this seems like it is never going to happen.


Most states aren't monoliths. When the right lizard can't watch his Netflix when he wants, the shitboxes will be fixed.


Sadly the cheap solution is to exempt the Lizard from the shitbox.

One of the things MPs asked for in the UK when it passed metadata retention laws was an exemption... for MPs. They were told it was impossible, although not with much detail. In reality it is impossible because, although it's often framed as a law about retaining specific metadata about specific individuals, the only order signed says retain Everything about Everybody: much simpler, but just a little harder to sell as proportional to the supposed threat.


If the shitbox is sending garbage packets at line rate, it doesn't really matter whether it's sending to you or to someone else: links in those directions are now at least a lot more congested, and that's likely to impact everyone else's experience.

From past experience with endpoints that got into 'infinite' packet states: if you've got any monitoring at all, you notice when you're sending line-rate garbage. Even if you don't monitor packet sending directly, it'll spike your CPU and reduce your throughput / increase your latency for other tasks. In our case that was, thankfully, 1:1 packet generation, but it sometimes got triggered by hosts (or middleboxes) with low latency and a 10G pipe; also thankfully it was a bug on our own machines, so fixing it was not too bad (we had actually first seen the problem on localhost, so when it showed up again on public interfaces, having seen it before made diagnosis pretty quick).
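
Even a quick-and-dirty watchdog in that spirit catches it. Something like this (psutil is a real third-party library; the threshold is made up, tune it to your links):

    import time
    import psutil

    THRESHOLD_PPS = 500_000   # arbitrary "this can't be normal" outbound packet rate
    INTERVAL = 5              # seconds between samples

    prev = psutil.net_io_counters().packets_sent
    while True:
        time.sleep(INTERVAL)
        cur = psutil.net_io_counters().packets_sent
        pps = (cur - prev) / INTERVAL
        if pps > THRESHOLD_PPS:
            print(f"WARNING: sending {pps:.0f} packets/sec -- investigate")
        prev = cur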


Dan Kaminsky would have loved this.


Damn, now I'm stuck playing out how that discussion would go.


Good, perhaps this will drive another nail into the middlebox coffin. A middlebox is just a computer networking device that transforms, inspects, filters, and manipulates traffic for purposes other than packet forwarding.

middleboxes are cancer used to spy on traffic.



