> The second cause we found for infinite amplification was victim sustained loops. When a victim receives an unexpected TCP packet, the correct client response is to respond with a RST packet. We discovered a small number of amplifiers that will resend their block pages when they process any additional packet from the victim - including the RST. This creates an infinite packet storm: the attacker elicits a single block page to a victim, which causes a RST from the victim, which causes a new block page from the amplifier, which causes a RST from the victim, etc.
Wow. Just wow. “Rigorous QA” and “spec compliant” aren’t the first things to come to mind when I think of middlebox vendors, but this is just next level negligence.
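For concreteness, the loop described in the quote amounts to an amplifier that treats every inbound packet on a flagged flow, including the victim's RST, as a reason to resend its block page. A toy sketch of the difference in Python (purely illustrative pseudologic with made-up packet dicts, not any vendor's actual code):

    # Toy model of the bug in the quote -- not any vendor's real code.
    # A "packet" here is just a dict: {"src": ..., "dst": ..., "flags": set()}.

    def block_page_for(pkt):
        return {"src": pkt["dst"], "dst": pkt["src"],
                "flags": {"PSH", "ACK"}, "payload": "<html>blocked</html>"}

    def buggy_handler(pkt, send):
        # Resends the block page on *any* packet from the victim -- including
        # the RST the victim correctly sends back, which sustains the loop.
        send(block_page_for(pkt))

    def sane_handler(pkt, state, send):
        # A RST should tear down flow state and end the exchange,
        # not trigger another reply.
        key = (pkt["src"], pkt["dst"])
        if "RST" in pkt["flags"]:
            state.discard(key)
            return
        if key not in state:                 # answer each offending flow once
            state.add(key)
            send(block_page_for(pkt))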
The attack is initiated with packets that are outside the bounds of normal expected operation. While an ideal engineering setup would include a thorough suite of negative tests to ensure those edge cases are covered, it’s also fair to say that these kinds of bugs are the easiest ones to miss. So I think describing it as “next level negligence” is rather uncharitable. From experience writing software and running software development teams, bugs like these are inevitable and always more obvious in hindsight. The problem with hardware is you can’t always just patch the fault.
To me your position would be defensible if you were talking about, say, a crashing bug in a phone app. For a security product with privileged network placement, it’s like getting salmonella from a restaurant and then being told that food safety is easy to forget when you’re busy.
> The problem with hardware is you can’t always just patch the fault.
That is the part that is "next level negligence". If you're making "security products" and you are not able to cover basic security - i.e. have a working update process and communicate to your customers that they should actually use that working update process and not actively disable it - then you failed at your job.
They're not making security products. They're making "security" products.
Imagine you're working on products for airport security. Should you focus on stuff that might actually be useful to improve the security of the airport and planes? Or, should you focus on useless security theatre for the well-funded TSA and similar entities?
You won't find these "security" products in environments where actual security was crucial and the operators understood what they were doing. What you see there is Zero Trust, low friction but effective authentication, and occasionally actual real air-gapping, because somebody says "It would be easier if this wasn't air-gapped but that's inherently unsafe and the risk is unacceptable, so we're doing it the hard way".
But lots of places want to play pretend and for them a "security" product is perfect, it checks off the box for the relevant CxO role and causes some level of irritation while making no practical difference to actual security. Perfect.
Reminds me of the first question you ask about a "Secure" site (e.g. a data centre) to understand if they really mean "Secure" or not. "Who cleans this place?"
If the answer is "Nobody, it's a mess" or "sigh we all have to take turns" then maybe it's actually secure. But if there's some bullshit about "vetting" of staff paid minimum wage to go wherever they want, unremarked, carrying large black bags of unknown objects then the facility is not, in fact, secure.
Serious DPI vendors should really implement a proper state machine, so that they can't be fooled that easily. But middleboxes are not "security products", they can't be.
It is expected that a research project of this type exposes these kinds of bugs. And the reason why they can research these things is because "there isn't enough information on the wire".
Bugs of this type are egregious for their danger and simplicity, but even once these are patched, there will always be more.
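To make the "proper state machine" point above concrete, here is a minimal sketch of per-flow TCP tracking where a payload only reaches inspection after a genuine 3-way handshake, and a RST drops the flow (plain Python, illustrative only; a real implementation would also need sequence-number validation, timeouts, and state eviction, and this tracks the client-to-server direction only):

    # Minimal per-flow tracking: a payload is only inspected after a real
    # 3-way handshake, and a RST forgets the flow. Illustrative, not real DPI.
    S_SYN, S_SYNACK, S_EST = "syn-seen", "synack-seen", "established"
    flows = {}   # (client_addr, server_addr) -> state

    def observe(src, dst, flags, payload, inspect):
        fwd, rev = (src, dst), (dst, src)
        if "RST" in flags:                        # either side resets: drop state
            flows.pop(fwd, None); flows.pop(rev, None)
            return
        if "SYN" in flags and "ACK" not in flags:
            flows[fwd] = S_SYN                    # client -> server SYN
        elif "SYN" in flags and "ACK" in flags and flows.get(rev) == S_SYN:
            flows[rev] = S_SYNACK                 # server -> client SYN/ACK
        elif "ACK" in flags and flows.get(fwd) == S_SYNACK:
            flows[fwd] = S_EST                    # client's final ACK completes it
        if payload and flows.get(fwd) == S_EST:
            inspect(src, dst, payload)            # only handshaked flows reach DPI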
I would characterize that differently: they are security products (that’s how they’re marketed), but they aren’t perfect and are only effective against certain behavior. That means you can’t rely on them alone for everything, but it doesn’t mean they don’t have a security function.
My issue is exactly with how they're marketed and sold.
Information theory proves there are infinitely many ways in which you can encode something. The subset of encodings that meets the rules imposed by any middlebox is in turn infinite.
> they aren’t perfect and are only effective against certain behavior
This means that they are only effective against default behavior.
Anything else is out of scope for these products, which I think is what @laumars was referring to with "outside the bounds of normal expected operation".
Marketing sophisms can be fun, but defining something as a "security product" when it is mathematically proven that there are infinite ways to bypass the provided "security guarantees" is ... simply something I refuse to do.
We are talking about following well understood and published standards, such as TCP and IP. The people implementing those stacks were either negligent, or were consciously cutting the corners and safeguards that those open standards already had in place. The result: lots of network pipes can be subverted by crackheads into flooding innocent netizens.
No, I got no sympathy for the people who built and sold those devices.
Getting the 3-way TCP handshake and TTL decrementing right is quite easy. Those are very much foundational properties of the respective protocols. We are not talking about obscure edge cases.
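For what it's worth, the TTL half of that really is a handful of lines in the forwarding path. A minimal, illustrative sketch of the rule (Python pseudologic, not real router code; the send callbacks are placeholders):

    # The forwarding rule some middleboxes reportedly skip: decrement TTL,
    # drop at zero, and tell the sender. Illustrative sketch only.

    def forward(pkt, send, send_icmp_time_exceeded):
        if pkt["ttl"] <= 1:
            # Packet has run out of hops: drop it and report back to the
            # source. This is what breaks routing loops (and what traceroute
            # relies on).
            send_icmp_time_exceeded(to=pkt["src"], original=pkt)
            return
        pkt["ttl"] -= 1          # every hop that forwards must do this
        send(pkt)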
Building a bridge is not simple either, but it's a well known and well solved problem. When a bridge collapses we don't just shrug and wave at the construction company with "It's OK, bugs happen".
How many bridges are built to survive abnormal conditions like earthquakes, tidal waves, or even just a lorry smashing into the underside of a low bridge? Some of my closest mates are actually structural engineers specialising in bridges, so coincidentally I happen to know a lot about this topic, and the answer is that, outside of a surprisingly small number of countries, most bridges aren’t designed to carry any more than the expected load. Some bridges aren’t even strong enough to carry present-day heavy goods vehicles. Hence why so many bridges have instructions upon approach telling drivers about safe and correct usage.
However you can’t really compare software to bridge building. There’s thousands of reasons why the two aren’t the same.
Given that this is now published, it seems like it’s just a matter of time before bad actors discover the vulnerable middleboxes (trivial since they operate on TCP port 80 on public IPv4 addresses) and we start seeing some really nasty DoS attacks. The infinite loop variants seem particularly bad - a nation state may not even notice that one of their censorship middleboxes is spamming a hapless victim server into oblivion.
This doesn’t look pretty. Kudos to the researchers for discovering it, but I’m definitely afraid that it’ll be hard or impossible to mitigate for anyone except the biggest CDNs. I certainly would not expect the middleboxes to remediate this.
> Completely fixing this problem will require countries to invest money in changes that could weaken their censorship infrastructure, something we believe is unlikely to occur.
> We discovered a small number of amplifiers that will resend their block pages when they process any additional packet from the victim - including the RST
> My recollection, from the discussions when the IP4 header was being defined, was that TTL was included for two reasons:
> 1/ we couldn't be sure that packets would never loop, so TTL was a last resort that would get rid of old packets. The name "Time To Live" captured the desire to actually limit packet lifetime, e.g., in milliseconds, but hops were the best metric that could actually be implemented. So TTL was a placeholder for some future when it was possible to measure time. TTL did not prevent loops, but it did make them less likely to cause problems.
> 2/ TCP connections could be "confused" if an old packet arrived that looked like it belonged to a current connection, with unpredictable behavior. The TTL maximum at least set some limits on how long such a window of vulnerability could be open. Computers had an annoying tendency to crash a lot in those days, but took rather long times to reboot. So the TTL made it less likely that an old packet would cause problems. (Note that this did not address the scenario when a gateway crashed, was subsequently restarted possibly hours later, and sent out all the packets it still had in its queues.)
> 3/ Although TTL did not prevent loops, it was a mechanism that detected loops. When a packet TTL dropped to zero, an ICMP message (something like "TTL Time Exceeded") was supposed to be generated to tell the source of the failure to deliver. Gateways could also report such events to some monitoring/control center by use of SNMP, where a human operator could be alerted.
> 4/ TTL was also intended for use with different TOS values, by the systems sending voice over the Internet (Steve Casner may remember more). The idea was that a packet containing voice data was useless if it didn't get to its destination in time, so TTL with a "fastest delivery" TOS enabled the sender to say "if you can't deliver this in 200 milliseconds, just throw it away and don't waste any more bandwidth". That of course wouldn't work with "time" measured in hops, but we hoped to upgrade soon to time-based measurements. Dave Mills was working hard on that, developing techniques for synchronizing clocks across a set of gateways (NTP was intended for more than just setting your PC's clock).
> I've noticed that researchers, especially if they're not involved in actually operating a network, often don't think about the need for tools to be used when things are not going well - both to detect problems and to take actions to fix the problem.
> Without that operational perspective, some of the protocol functions and header fields may not seem to be necessary for the basic operation of carrying user traffic. TTL is one such mechanism. Source route was another, in that it might provide a way to get control messages delivered to a misbehaving host/router/whatever that was causing routing failure. This involved using source routing as a somewhat "out of band" signal mechanism that might not be affected by whatever was going on at the time.
> All of this was considered part of the "Internet Experiment", providing instrumentation to monitor the experiment to see what worked and what didn't, and evolve into tools for use in dealing with problems in operational networks.
> At one point I remember writing, sometime in the mid 80s, a rather large document called something like "Managing Large Network Systems" for some government contract, where these kinds of tools were described. But I haven't been able to find it. Probably in a dusty warehouse somewhere....
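Point 3/ above is exactly the mechanism traceroute-style tools still rely on: send probes with increasing TTLs and collect the ICMP Time Exceeded messages each expiring hop returns. A minimal sketch, assuming scapy is installed and the script runs with raw-socket privileges (the destination address is a placeholder):

    # Traceroute in miniature: each hop that expires the TTL should answer
    # with ICMP Time Exceeded (type 11), revealing itself. Needs scapy + root.
    from scapy.all import IP, ICMP, sr1

    def trace(dst, max_hops=20):
        for ttl in range(1, max_hops + 1):
            reply = sr1(IP(dst=dst, ttl=ttl) / ICMP(), timeout=2, verbose=0)
            if reply is None:
                print(ttl, "*")                    # hop didn't answer
            elif reply.type == 11:                 # ICMP Time Exceeded from that hop
                print(ttl, reply.src)
            else:
                print(ttl, reply.src, "(destination reached)")
                break

    trace("192.0.2.1")   # placeholder address (TEST-NET-1)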
> We found amplifiers that, once triggered by a single packet sequence from the attacker, will send an endless stream of packets to the victim. In our testing, some of these packet streams lasted for days, often at the full bandwidth the amplifier’s link could supply.
That would certainly make you feel like you were a victim under direct attack. And getting state actors on-board to fix this seems like it is never going to happen.
Sadly the cheap solution is to exempt the Lizard from the shitbox.
One of the things MPs asked for in the UK when it passed metadata retention laws was an exemption... for MPs. They were told it was impossible, although not with much detail. In reality it is impossible because (although it's often framed as a law about retaining specific metadata about specific individuals) the reality is the only order signed says retain Everything about Everybody, much simpler but just a little harder to sell as proportional to the supposed threat.
If the shitbox is sending garbage packets at line rate, it doesn't really matter if it's sending to you or someone else, now links in those directions are at least a lot more congested and that's likely to impact everyone else's experience.
From past experience with endpoints that got into 'infinite' packet states: if you’ve got any monitoring at all, you notice when you’re sending line-rate garbage. Even if you don’t monitor packet sending directly, it’ll spike your CPU and reduce your throughput / increase your latency for other tasks. That was, thankfully, 1:1 packet generation, but it sometimes got triggered by hosts (or middleboxes) with low latency and a 10G pipe; also thankfully it was a bug on our machines, so fixing it was not too bad (we had actually first seen the problem on localhost, so when it showed up again on public interfaces, having seen it before made diagnosis pretty quick).
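That kind of monitoring is cheap to bolt on. As a rough, Linux-only illustration (it reads the kernel's per-interface counters from /proc/net/dev; the alert threshold is made up and would need tuning for your links):

    # Crude egress-rate watchdog: sample /proc/net/dev and flag interfaces
    # whose transmit packet rate stays suspiciously high. Linux-only sketch.
    import time

    def tx_packets():
        counts = {}
        with open("/proc/net/dev") as f:
            for line in f.readlines()[2:]:          # skip the two header lines
                name, data = line.split(":", 1)
                fields = data.split()
                counts[name.strip()] = int(fields[9])   # 10th field = tx packets
        return counts

    def watch(threshold_pps=100_000, interval=5):
        prev = tx_packets()
        while True:                                  # runs until interrupted
            time.sleep(interval)
            cur = tx_packets()
            for iface, pkts in cur.items():
                rate = (pkts - prev.get(iface, pkts)) / interval
                if rate > threshold_pps:
                    print(f"ALERT: {iface} transmitting {rate:.0f} pps")
            prev = cur

    watch()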
Good, perhaps this will drive another nail into the middlebox coffin. A middlebox is just a computer networking device that transforms, inspects, filters, and manipulates traffic for purposes other than packet forwarding.