I'm starting to think that TCP is past its prime and UDP based protocols (like QUIC) are the way forward.
The problem is exacerbated by TLS encryption. It requires six packets, or three round trip times, to establish a TLS over TCP connection. That is hundreds of milliseconds before a single byte of payload can be transferred.
The QUIC protocol combines the three-way handshake and the three-way key exchange to halve the time it takes to set up a connection, and provides a fast start (0-RTT) for resuming a previously configured connection.
Additionally, TCP connections do not survive a changed IP address, such as when switching from wifi to mobile data.
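For a rough sense of the arithmetic, here is a back-of-the-envelope sketch in Python. It assumes TLS 1.2 needs two round trips after the TCP handshake and TLS 1.3 one, and ignores server processing time; the numbers are illustrative, not measurements.

    # Rough time-to-first-byte comparison at a given network RTT (milliseconds).
    def time_to_first_byte(rtt_ms: float) -> dict:
        return {
            "tcp+tls1.2": 3 * rtt_ms,   # TCP handshake, then two TLS round trips
            "tcp+tls1.3": 2 * rtt_ms,   # TLS 1.3 needs one round trip after TCP
            "quic":       1 * rtt_ms,   # transport handshake and key exchange combined
            "quic 0-rtt": 0 * rtt_ms,   # resumed connection, data in the first flight
        }

    print(time_to_first_byte(100))      # e.g. {'tcp+tls1.2': 300, 'quic': 100, ...}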
The stream model of TCP with head of the line blocking is conceptually easy but ultimately the wrong model for transmitting blobs of data. Rejecting a valid packet because the order got shuffled is wasteful, when we could just as easily reassemble the packets on the receiver side (even outside the sliding window).
There are a lot of times when the stream model is a good choice (dynamically generated data) but I would argue that sending sized blobs (like images) is a larger volume of traffic in terms of bytes transferred.
But the OP is right, writing a custom UDP protocol is fraught with peril. DDoS amplification is just one way to screw up.
For multihoming, MPTCP is an interesting alternative and it's full of interesting features. Once it's more widespread it'd be interesting to know which features are really used.
I haven't seen SCTP mentioned in here so I'll just say that learning and using it, first while trying to find 'a better UDP' and then in anger to solve a multihoming/high-availability problem (which ended up in TCP + checkpoint/restore anyway...), was an eye-opening experience on 'what could have been', especially having it work over UDP.
Now QUIC is there, gathering a big chunk of the SCTP featureset and adding proper encryption support (where IIRC SCTP only had TLS per stream).
WebRTC uses SCTP over UDP, so there are Free implementations out there you can grab if you want to give it a try. It's a shame that the Internet has ossified to the point that we'll never have any protocols riding directly on top of IP besides TCP, UDP, and ICMP, but tunneling SCTP over UDP doesn't add ridiculous overhead.
Protocols that aren't HTTPS (like SMTP) are still alive. The reason for no new protocols is that some network infrastructure will consider non-TCP/UDP/ICMP packets badly formed and reject them.
Lol. Good one. These protocols are completely independent of the content that you can send over them.
For the most part they reuse TLS so the encryption is the same (and uses the same certificate authorities), so any man-in-the-middle proxies, like those used by corporations, are technically able to intercept them to monitor content just like they can for https. (Though personally those sorts of things make me feel ill.)
Good luck banning these protocols as well. They're perfectly valid IP packets so identifying them is going to be hard.
I don't think skipping the handshake is much of a benefit to QUIC. TCP was designed to have long-lived connections. Mobile networks are moving toward keeping your external IP consistent between handoffs, handling the routing internally. Fixed networks will connect to servers once every couple of minutes at most. QUIC mostly affects time-to-first-paint, which is mostly of interest for the slowest connections, which is nice, but becoming less and less relevant over time. It's not like speeding up the connectivity time is going to matter when all gains in connectivity time will be used for additional advertising and tracking against the end-user.
The main value propositions of QUIC for me are in the encryption earlier in the connection and in the reduced infrastructure usage on the server side.
> QUIC mostly affects time-to-first-paint, which is mostly of interest for the slowest connections, which is nice, but becoming less and less relevant over time.
You give no reasoning for it being less and less relevant. It is very relevant to me, where every connection to the US has 300ms RTT.
And assuming that's because you're on another continent, that problem is intractable because it's a consequence of the speed of light. You're never going to have a 10ms round trip between Los Angeles and Australia.
You really didn't understand the point they were making. 300ms RTT is much less bad when QUIC merges the handshake and key exchange, requiring fewer round trips even before 0-RTT session resumption.
Of course it is, that's the point. You need to reduce the number of round trips because physics doesn't allow you to reduce the round trip time no matter how sophisticated your transistors.
I mean, yes, it is but many (not all) protocols built on top of it are generally still “connection-oriented”. QUIC, for example, still results in a “connection” it’s just implemented at the application layer instead of as part of the UDP protocol itself.
That still isn't even comparable. The routers along the path of a UDP packet may not even be tracking the flow (unlike in TCP), and keeping it up. A "connection" in TCP is a very big beast, with routers and computers tracking them (conntrack and friends) to keep them alive (or not) and ensure everything runs smoothly. There is none of that with UDP. So yeah, there might be an application-level concept of a connection, but that isn't guaranteed by a single bit of infrastructure on the internet.
If the equipment does connection tracking for TCP, then it most certainly does for UDP. Raw UDP does not have the notion of a stream, but it most certainly does have flows. If the equipment does it for TCP, it does it for UDP.
However most routers don't do connection tracking at all. That is way too expensive at scale for zero benefit. Take your transit provider: it has tens of terabits of data per second, and those can be handled by a single router. That's a buttload of different streams for a single device, and connection tracking is expensive to do. I can say from experience that connection tracking tables for a single hypervisor with VMs can reach gigabytes. For a single node. Now extrapolate that to a backbone router. It's not happening.
Internet routers at large are not doing connection tracking.
If a device is dropping the ball on connection tracking, it's almost certainly in your own network.
Your ISP may have carrier-grade NAT. The content provider network probably has stateful load balancers. Those devices need to keep state, but they would need to do so with QUIC and UDP quasi-connections too.
Most (if not all) ISP routers (outside your business/house) employ connection tracking. This is used to monitor bandwidth usage, implement firewalls (aka, port 25/80/443), among other things.
Yeah, 100% agree. The main place that connection tracking is happening is whenever there's NAT involved and that's going to need to happen for both TCP and UDP.
> UDP based protocols (like QUIC) are the way forward.
Sure, but QUIC was written by experts who probably had been burned in the past by thinking they could just throw away TCP and profit. Things like slow start and congestion control are there for really good reasons and you can't just casually toss them away like anachronisms. Modern networks are really big, but ultimately things like switch buffers are still finite.
Rolling your own UDP based protocol is kind of like rolling your own cryptography.
Amazon in 2001 was using lots of UDP on top of Tibco Rendezvous when I got there and it absolutely melted the fuck down in Christmas of 2003. All of the FAANG-class companies have probably rediscovered this kind of lesson independently, maybe multiple times. I certainly remember chuckling at some Facebook press release in ~2007 when they were bragging about switching to UDP and wonder how that turned out.
TCP is overkill for the vast bulk of modern data transmission by volume; "short hops" within and between data centres on backbones that have near zero error.
TCP was designed for higher error rates on less reliable paths; the path from the clustered data centres out to the more remote home computers today is likely better than the 1980s internet.
There are good proposals for low error high bandwidth TCP-lite replacements that are gaining ground.
I'm not sure I'd want to switch to a pure UDP world just because there's lots of bandwidth in the data center. Any situation where Moore's Law works against you will run into problems unless there's back-pressure built into the protocol -- IE whenever you've got 100 clients talking to one aggregation point you'll get kicked in the teeth, regardless of if the endpoints are 100mb with the aggregation point being bonded 1gb links or the common media is 100gb and the aggregation point is ECMPed 400gb -- there are more of them than there is of you...
And -- building back-pressure into protocols is not trivial.... Getting it "for free" with TCP has been a net gain.
Error rates are still high. You quickly find that if you have latency sensitive information that you need to ship over the public internet, like live video.
Also, you can trade off bandwidth for error rate at a lower level of the networking stack, so if you know your applications will use error-correcting protocols like TCP, you can make your switches and routers talk to each other a little faster.
TLS does have session IDs / session tickets, which I believe can speed up session resumption.
There is also TCP fast open, but I don't think it is used in practice. I don't think I've ever noticed it used when looking at a TCP capture.
Though I'm sure either TLS session IDs or tickets are.
I can't remember if one is more common than the other; it's been a while since I worked on that stuff directly.
>The QUIC protocol combines the three-way handshake and the three-way key exchange to halve the time it takes to set up a connection and provides a fast start (0-RTT) for resuming a previously configured connection.
UDP is okay and has some problems. QUIC is not the universal solution for them.
The lifetime of that QUIC setup will only be ~90 days. Because in every QUIC implementation in existence, use of self-signed certs is not enabled by default and TLS-less connections are not allowed in the standard. This holds especially true for QUIC-based HTTP/3 libs. So as soon as something in the TLS breaks or drops you, your QUIC-based service is inaccessible to 99% of people on Earth. That's fine, good even, for Microsoft or Google. But that kind of built-in fragility, transience, and corporate centralization/dependence is really bad for hosting a personal website or service. QUIC is for corporate-person usage and its design reflects that. It's terrible for human persons and makes no concession for human-person use cases.
Because human person traffic is a tiny percentage of actual internet traffic.
Where is most internet traffic? To places like Netflix or other streaming services.
Then you have the top X sites that serve most of the rest of the traffic, a significant percentage of that to cell phones, and all of it needing to be encrypted.
You'll just have to accept no realistically secure protocol will be universal. This last tiny wedge of traffic is either going to disappear in the noise of bots in automated attacks, or we'll have to find some reasonably secure way of letting it co-exist.
It's really not that hard to set up a domain with letsencrypt. If you use DNS verification it's even almost trivially easy to use it for internal hosts as well.
It is until it isn't. When ACME2 is sunset and ACME3 comes along. The next time LE's root cert changes. The next time any part of the incredible complexity hidden by and within ACME clients breaks because of an update on your system. The time you host your abortion clinic site and get sued in Texas. Or your scientific journal article mirror gets attacked by big publishers; they won't go after you when they can just put pressure on LE to drop you and you become unvisitable.
Putting all your chips in one basket, even such a benevolent and great one as LE, is a bad idea. Even dot .org was almost lost to private equity and went nasty. LE as it becomes more important becomes a juicier target for all forms of pressure and less useful to human persons.
Protocols like QUIC and SRT (used for video) are great; forward error/erasure correction is something I would also mention as a large part of the rise of UDP over TCP based transfers/protocols.
> Rejecting a valid packet because the order got shuffled is wasteful,
This is not what happens. The data isn't dropped as long as there are only a few reordered segments. And with SACK the receiver can also acknowledge them. It's just not provided to the application layer because something in-between is still missing. This adds some latency if you're multiplexing multiple things over a single stream but it's not a loss in throughput.
> The stream model of TCP with head of the line blocking is conceptually easy but ultimately the wrong model for transmitting blobs of data.
A problem of google's own creation.
> The problem is exacerbated by TLS encryption. It requires six packets, or three round trip times, to establish a TLS over TCP connection. That is hundreds of milliseconds before a single byte of payload can be transferred.
Arguably the problem got exacerbated by the NSA and user-hostile ISPs. If they didn't push the entire internet into a low-trust equilibrium we could stick to simple protocols instead of having to expend more resources and increase complexity to ultimately get to the same state of not-being-intercepted as we had been before.
A better response to a red queen's race is to dethrone the queen, not to build better running shoes.
But yes, yes... these sure are nice shoes.
A couple commenters have disagreed with the author saying “That’s one reason for the TCP three-way handshake”. But I’ve been implementing uTP, the UDP protocol that torrents use (https://www.bittorrent.org/beps/bep_0029.html) and it seems to avoid amplification attacks thanks to a three-way handshake.
So the author seems correct to say it’s one (good) reason for the three-way handshake.
(It turns out when you want to spider libgen for all epubs for training data, the most straightforward way to do this, apart from downloading >40TB, is to write your own torrent client that selectively recognizes epub files and fetches only those.)
The uTP spec (BEP 29, linked above) also gives a good overview of why UDP is sometimes the correct choice over TCP in modern times. uTP automatically throttles itself to yield excess bandwidth to any TCP connection that wants it. Imagine trying to write an app that uses all the bandwidth on your network, without impacting anyone else on the network. You’d find it quite hard. uTP does it by embedding time deltas in the packets and throttling itself whenever the timestamp goes over ~100ms, which indicates either a connection dropout or bandwidth saturation.
I.e. if your ping suddenly spikes, it’s because someone is hogging all the bandwidth. Normally you have to track down who’s doing it, like a detective hunting a murderer. But uTP knows that it’s the murderer, so it throttles itself back. Presto, perfect bandwidth utilization.
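A minimal sketch of that idea in Python, not the actual uTP implementation (real uTP carries microsecond timestamps in every header and uses the LEDBAT rules): keep the smallest one-way delay ever seen as the "empty queue" baseline, and back off whenever the current delay exceeds it by the ~100ms target.

    TARGET_DELAY_MS = 100.0

    class DelayThrottle:
        def __init__(self, rate_bps: float):
            self.base_delay_ms = float("inf")  # smallest delay seen = uncongested path
            self.rate_bps = rate_bps

        def on_ack(self, one_way_delay_ms: float):
            # Track the minimum delay as an estimate of the empty-queue baseline.
            self.base_delay_ms = min(self.base_delay_ms, one_way_delay_ms)
            queuing_delay = one_way_delay_ms - self.base_delay_ms
            if queuing_delay > TARGET_DELAY_MS:
                self.rate_bps *= 0.5    # we're the one filling the queue: back off
            else:
                self.rate_bps *= 1.01   # queue is short: probe for more bandwidth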
But why bother with UDP? Why can’t you do this with a TCP connection? Just measure the time deltas and throttle yourself if it spikes, right? Good question, and I don’t have a good answer. Perhaps one of you can give a persuasive one, lest you agree that ISPs should just throttle UDP by design.
It’s certainly simpler to solve this at a protocol level, but one could imagine a BEP that adds time deltas to the torrent spec and prevents sending pieces too quickly (“if the deltas spike, send fewer blocks per second"). It might even be simpler than bothering to reimplement TCP over UDP. But perhaps there’s a good reason.
One idea that comes to mind is that the goal is to throttle sends and receives. You control your sends, but you’re at the mercy of the counterparty for receives. You’d need to keep throttle info for every peer, and notice when their pings spike, not just yours. Then you’d send fewer packets. But that’s what uTP does, and that doesn’t answer “Why do it in UDP instead of TCP?”
The reason for uTP using UDP is clear. When you are sending a complete file from one computer to another, you can send it in any order. With TCP the packets need to arrive in order (at least within the sliding window) or they get rejected which is wasteful.
Another reason is the custom flow control. The uTP protocol uses a scheme called LEDBAT (RFC 6817) which is a high bandwidth, high latency "scavenger" protocol that uses all the available bandwidth but tries to take a lower priority than TCP connections. This allows your web browsing to be smooth while your OS updates or Torrents are chugging along in the background. Microsoft and Apple both use a LEDBAT based UDP protocol for updates.
On the other side of the spectrum there are video games and audio/video live streams. They use UDP protocols but with a flow control that minimizes latency at the expense of bandwidth. The polar opposite of what uTP/LEDBAT does.
With TCP you get the flow control algorithm of the protocol. While there is a little configurability in the kernel, you can't adjust it from the application.
UDP protocols also need a handshake not unlike TCP three way, but you can include payload bytes or crypto key exchange in those packets to avoid TCP+TLS "six way handshake".
The LEDBAT flow control scheme is absolutely brilliant, using just some simple timestamps to evaluate network congestion. If you're looking for a nice Saturday geek reading, RFC 6817 is a good candidate.
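For the curious, the core update in RFC 6817 fits in a few lines. A simplified sketch (the RFC also specifies a base-delay history, clamping of ramp-up, and loss handling):

    TARGET = 0.100   # target queuing delay, seconds
    GAIN = 1.0       # limits cwnd change per RTT

    def ledbat_update(cwnd: float, mss: float, bytes_acked: int,
                      queuing_delay: float) -> float:
        # off_target is +1 with an empty queue, negative once delay exceeds TARGET.
        off_target = (TARGET - queuing_delay) / TARGET
        cwnd += GAIN * off_target * bytes_acked * mss / cwnd
        return max(cwnd, mss)  # never shrink below one segment

So the window grows while measured queuing delay is below 100ms and shrinks once the sender itself starts building a queue, which is why it yields to TCP flows sharing the link.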
Latency due to head-of-line blocking is a big issue with TCP. Any packet loss causes a backup of all other data while waiting for retransmit. Window negotiation would make your control spotty because you’d be layering two different/incompatible flow mechanisms
> But why bother with UDP? Why can’t you do this with a TCP connection? Just measure the time deltas and throttle yourself if it spikes, right? Good question, and I don’t have a good answer.
Well, if you're not using no-delay you are not really in control of the actual TCP sending rate; the kernel might assemble or split the buffer you're send()ing.
Head-of-line blocking has nothing to do with Nagle (the default delay behavior, which is disabled by every single TCP application I have ever seen). Head-of-line blocking is a stochastic pile-up of data that happens during packet loss.
I was answering the specific question of GP: "Why can’t you do this with a TCP connection? Just measure the time deltas and throttle yourself if it spikes, right?", which wasn't limited to head-of-line blocking but was more about precise handling of the transfer rate of packets. Still, TCP is controlled by the receiver, so if you have control of both sides you might be able to do what GP suggests, but you'd still pay the jitter of the sender buffer and the noise of fragmentation of your messages.
Which might be a good thing if you're sending a bunch of tiny little packets to a consumer on wifi (cough lots of golang projects) instead of assembling/splitting the buffer yourself.
It uses UDP instead of TCP because it does not care about in-order delivery of the data, only that it eventually has all of the data.
Fragmenting and coalescing the data happens at a higher layer, so there is no reason to duplicate that effort at the transport layer (which is needed to provide an efficient in-order stream abstraction).
TCP (and other protocols) presenting an in-order stream abstraction is beneficial in a lot of ways, but it impacts the design and the guarantees you can make in a way that is harmful to an application that only cares about eventual completion.
> But why bother with UDP? Why can’t you do this with a TCP connection? Just measure the time deltas and throttle yourself if it spikes, right? Good question, and I don’t have a good answer. Perhaps one of you can give a persuasive one, lest you agree that ISPs should just throttle UDP by design.
The issue is microsoft windows. It doesn't have pluggable congestion controllers. On linux you could insmod a ledbat.ko and have userspace applications choose that as congestion controller for their TCP connections.
Another reason for uTP is that NAT traversal is easier with UDP than it is with TCP. Especially if the NAT gateways use endpoint-independent mappings for UDP but address-dependent mappings for TCP.
But that's more an argument for IPv6 than for UDP.
> One idea that comes to mind is that the goal is to throttle sends and receives. You control your sends, but you’re at the mercy of the counterparty for receives. You’d need to keep throttle info for every peer, and notice when their pings spike, not just yours. Then you’d send fewer packets. But that’s what uTP does, and that doesn’t answer “Why do it in UDP instead of TCP?”
You can do this in bittorrent actually by throttling your request messages. But request pacing is tricky because userspace applications will have a difficult time estimating the receive-bottleneck-buffering across multiple TCP streams. Though maybe it could be done on linux via socket timestamping, I haven't looked into that.
BiglyBT implements a crude version of request throttling, but it's not adaptive, you have to set a download rate limit. But when it does kick in it manages to converge its requests on a few high-bandwidth/low-latency peers which should result in less long-distance traffic and shallower queues.
I suspect video games use UDP for the same reasons bidirectional real-time communications protocols (WebRTC, VoIP, etc) use them.
Generally speaking when you have an interactive session with > 1 parties (especially humans) when/if packet loss or out of order packets occur there's no reason to attempt re-transmission because the clock, people, and application continually move forward in time. Consider a VoIP call - by the time a packet is retransmitted the conversation has moved on. Most audio codecs and implementations have a default packetization interval of 20ms of audio per packet so this is pretty granular.
This has been mitigated somewhat with audio codecs like Opus that have packet loss concealment and forward error correction techniques that provide relatively high robustness to this issue. Variable latency (jitter) and out-of-order packets are typically handled by dynamic jitter buffers on each receiving side that can detect when the finely-clocked streams are wobbling. They then buffer incoming frames, potentially with re-ordering, and play them out smoothly with the side-effect being slightly higher latency to perform this (that's why they're dynamic).
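As a toy illustration of the receive side, here is a minimal fixed-depth jitter buffer in Python, assuming 20ms frames and RTP-style sequence numbers. Real jitter buffers adapt their depth and hand lost frames to the codec's concealment; this just reorders and skips.

    import heapq

    class JitterBuffer:
        def __init__(self, depth: int = 3):
            self.depth = depth          # frames to hold before playout starts
            self.heap = []              # (seq, payload) pairs, ordered by seq
            self.next_seq = None

        def push(self, seq: int, payload: bytes):
            heapq.heappush(self.heap, (seq, payload))

        def pop_frame(self):
            """Called by the playout clock every 20 ms."""
            if self.next_seq is None:
                if len(self.heap) < self.depth:
                    return None                      # still buffering
                self.next_seq = self.heap[0][0]
            # Drop anything older than what we already played out.
            while self.heap and self.heap[0][0] < self.next_seq:
                heapq.heappop(self.heap)
            if self.heap and self.heap[0][0] == self.next_seq:
                self.next_seq += 1
                return heapq.heappop(self.heap)[1]   # in-order frame
            self.next_seq += 1
            return None                              # lost frame: codec conceals it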
This is the right observation but not quite the right reasoning.
Losing a few pixels or missing some object updates won't be noticeable in high-fps games, but where correctness and completeness of data are needed, UDP-based applications end up implementing TCP features in the application layer.
These application-layer implementations can be more performant (lower latency and/or higher throughput) over UDP than TCP, by exploiting their special cases, whereas TCP needs to provide generic facilities to every application running on the same host. And then it becomes a question of available resources vs benefits.
Example:
DNS started as UDP and later used TCP when expectations on correctness grew.
That said, we have high-capacity and high-quality backbones, so UDP looks like the right choice for many cases, and should be used where it makes sense.
DNS supports TCP largely because of packet fragmentation issues with UDP. I don't have a lot of experience with it but from what I understand DNS attempts to switch to TCP (with varying degrees of success by implementation) when the size of the response exceeds or is expected to exceed the MTU.
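Roughly, the resolver checks the TC (truncated) bit in the UDP response header and repeats the same query over TCP, where each DNS message is prefixed with a 2-byte length. A bare-bones sketch of that fallback in Python; the query construction is omitted, `query` is assumed to already be a complete wire-format DNS message, and the server address is a placeholder.

    import socket, struct

    def resolve(query: bytes, server: str = "198.51.100.53") -> bytes:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
            udp.settimeout(2.0)
            udp.sendto(query, (server, 53))
            resp, _ = udp.recvfrom(4096)
        tc_bit_set = bool(resp[2] & 0x02)   # flags byte: QR OPCODE(4) AA TC RD
        if not tc_bit_set:
            return resp
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tcp:
            tcp.connect((server, 53))
            tcp.sendall(struct.pack("!H", len(query)) + query)  # 2-byte length prefix
            length = struct.unpack("!H", tcp.recv(2))[0]
            return tcp.recv(length)         # a real client would loop on recv here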
Video games have a very good reason to use UDP: latency and lack of head of the line blocking. This also applies to video and audio live streams.
But it's not the only reason to use UDP. It can open connections faster, accept out of order packets and generally offers a low overhead datagrams for building custom protocols.
TCP's throttling mechanisms are quite coarse compared to what is described above, and they don't measure packet latency, only packet loss (after suitable timeout).
Yes, TCP has flow control and congestion avoidance.
But it's a one size fits all model designed for unreliable connections of yesteryear. It's a trade off between latency and throughput and not particularly good at either when network conditions get bad.
UDP requires the application protocol to implement a flow control mechanism, but offers an opportunity to emphasize latency or throughput at the expense of the other.
You can avoid being used for amplification attacks by simply requiring the request packet be at least as large as the response. Burns some extra inbound bandwidth and doesn't protect you from DoS attacks on your server, but at least avoids being used as an amplifier. Or you can implement a three-way handshake with a cookie or hash of a secret and the requestor's IP address, but just requiring large request packets is simpler if you're designing a really simple protocol.
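A sketch of that rule on the server side, assuming the protocol has a known upper bound on reply size; the port, sizes, and `handle` function are made up for illustration.

    import socket

    MAX_REPLY_SIZE = 512   # assumed upper bound on any reply this protocol sends

    def handle(request: bytes) -> bytes:
        return b"OK"       # stand-in for the real application logic

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 9999))
    while True:
        request, addr = sock.recvfrom(2048)
        if len(request) < MAX_REPLY_SIZE:
            continue       # undersized request: drop silently, no amplification possible
        sock.sendto(handle(request)[:MAX_REPLY_SIZE], addr)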
Being used for amplification attacks isn't bad if the amplification factor is low (ie. <5), and there is no way to chain amplifiers.
You can also protect further against it by limiting the amount of un-acked data per second to each AS. Well behaved connections generally will ack nearly all data sent to them, whereas attacks will ACK none. That way, users have to wait one extra roundtrip if an attack on them is in progress - doesn't sound so bad.
Amplification was only a really bad problem when amplification factors over 100 were possible.
TCP is like automated driving or a taxi. you get in your car, tell it the destination, and it takes you there.
UDP is driving yourself. you have to adjust the speed and direction as you go. if you go too fast accidents can happen.
so obviously, when using UDP you have to add safeguards against attacks, but, you get more flexibility to adjust things along the way. you are not locked into the way TCP does it, and that is the benefit of creating a UDP based protocol instead of just using TCP.
and it is a good idea to not just use UDP blindly, but implement the necessary safeguards on top of it.
but claiming that all UDP based protocols are bad because of UDP is like saying, cars are bad because of bad drivers. (well, they are, but that's a different topic)
so UDP based protocols aren't bad. only using UDP without safeguards is bad.
This is a bit similar to roll your own crypto conversation. It is fun, and a great way to learn. And once done, you should do well to never deploy your own crypto in any real life application.
Friends don't let (or shouldn't at least) friends roll their own protocol over UDP either (or IP; I mean why not?). There are very few apps with such explicit requirements that they can't do away with TLS over any number (==2) of existing and well known transport protocols (e.g. TCP, maybe QUIC). For the presentation layer, instead of implementing their own spec, most should probably stick to Protobuf[0] (or some of its derivatives[1]), and encode the application data into JSON.
i don't quite agree because there are legitimate uses for UDP, but there are no legitimate uses for hand rolled crypto (except learning). also the issues with UDP mentioned in the article can easily be mitigated, which is not at all true for issues with hand rolled crypto.
and there is one important use for UDP that is not well served with TCP or QUIC, which is tunneling. tunneling TCP over TCP does not work very well.
also, the feature of mosh to keep a connection alive over network changes may not be easy to implement over TCP, i am not sure.
"any developer building protocols on top of UDP has the responsibility to appropriately handle the various attack vectors which come from udp being "connectionless""
This also means you have to add functionality to avoid various amplification attacks against you (e.g. packet forging/cloning/reply variations), but also, much more importantly, against others caused by you (e.g. you mustn't provide any form of unauthenticated UDP reply server which could be used for amplification attacks; this includes protocols which through bad design can be used as such reply servers even if they are not meant to be such servers).
Note that this always stays in the territory of "appropriately".
Still, this means that developing acceptable protocols on top of UDP, if they are meant to be used on the internet (i.e. not just inside a VPN), is kinda way harder than doing so on top of TCP, rather than "simpler" like some of the sources the article refers to argue.
We once had to pronounce a company product dead-on-arrival due to a naive UDP based cellular product, and regrettably had to pass on the mission difficult. The problem wasn't the lousy data link (Iodine tunnels exist), but rather the ridiculous scaling-cost server side. Think about a large fleet of vehicles all trying to report at the same time. Sure one could rate-limit the bandwidth... but then the link throughput is even more probabilistic. =)
Sometimes it is better to run... rather than walk from Doomed projects. The contract-manufacturer ended up with a warehouse of products no one could fix. All thanks to some under-powered-module manufacturer's naive understanding of networks. It's a shame really, the people were good folks with a viable business plan.
I have a bag of popcorn ready for the idealists rolling out HTTP/3 with QUIC into their data-centers and NOC. I remain skeptical most infrastructure can survive the traffic asymmetry reliably at scale. Hard pass for now...
There's an even simpler solution I've used ever since I became aware of the first amplification attacks during the 00s: either implement your own simple 3wh, or if you really want a simple request/reply protocol with no state associated, pad the request so it matches the reply in size. Sometimes your reply is large though and needs a couple dozen packets so then you're usually forced to use the first solution.
I suggested the certificate solution but it's far from being lightweight. I found a simpler solution that may fit some use cases.
The client sends a request, the server returns a response that may be big that also contains a random value. The client must return this random value in a thank you message.
If the server doesn't receive the thank you message, it slows down responses to that ip address and eventually blacklist it if it's repeated.
From the client perspective, the answer is obtained in one round trip time. The price to pay on the server is the need to keep track of the expected thank you messages, and the throttled or blacklisted addresses.
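A rough sketch of that bookkeeping in Python; the names and thresholds are made up, and a production version would need to bound the table sizes and expire blacklist entries too.

    import os, time
    from collections import defaultdict

    pending = {}                  # nonce -> (ip, deadline)
    strikes = defaultdict(int)    # ip -> count of responses never acknowledged

    def on_request(ip, send_response):
        if strikes[ip] > 10:
            return                # effectively blacklisted: stay silent
        nonce = os.urandom(8)
        pending[nonce] = (ip, time.monotonic() + 5.0)
        send_response(nonce)      # big response goes out in one round trip

    def on_thank_you(ip, nonce):
        if pending.get(nonce, (None,))[0] == ip:
            del pending[nonce]    # well-behaved client, clear the entry

    def expire():
        now = time.monotonic()
        for nonce, (ip, deadline) in list(pending.items()):
            if deadline < now:
                strikes[ip] += 1  # no thank-you arrived: count it against the IP
                del pending[nonce]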
You’d have to keep track of which proofs you’ve already seen, to prevent re-use. You’d have to make sure the (source, source port, destination and destination port)-tuple is a cryptographic part of the algorithm, to prevent me using the same proof of work for like a million machines.
Then there’s the issue of low power and/or battery-powered devices. You want people to be able to use your protocol without draining the battery. But by lowering the computational cost to accommodate that, you’re enabling the bad guys that can afford a fast GPU to go back to abusing the protocol cause the cost isn’t sufficiently high to deter them anymore.
And I’m sure there are plenty of other things I haven’t thought of.
Yes, you can for example require the first datagram in a connection to have a cryptographically signed blob of data (also useful for key exchange). But then you need some mitigation against a simple replay attack (client sends same packet twice).
The server's first response should be much smaller in size than the client's request.
The basic idea is to make the client (attacker) use more bandwidth and CPU time than the server, at least during the connection handshake phase. This makes the server unappealing as a target for attack, even though it does not outright prevent the attack.
Most protocols already employ such schemes in connection handshake and crypto key exchange.
The problem results from the ability to forge a fake origin IP address. This can be avoided by adding a certificate for the IP address. It adds a processing and size overhead, but it also preserves the single round trip transaction.
It doesn't really work. You need to use fairly strong cryptography, and this means a lot of CPU power from your side to validate it. You also need mechanisms to notify users about certificate invalidation and expiration, you need ways to synchronize "time", etc.
At which point, it makes sense to just give up and use a normal TLS/QUIC connection. QUIC also has 0-RTT resumption, which is functionally similar to what you want.
Really, raw UDP makes very little sense in today's Internet. It might have been marginally more useful if BCP38/RFC2827 were more widely adopted.
> Really, raw UDP makes very little sense in today's Internet. It might have been marginally more useful if BCP38/RFC2827 were more widely adopted.
I might agree if the only purpose of UDP was to avoid the handshake. But this issue alone only affects some usecases.
Naive workaround/thought: require the client to pad the first packet to the point where you can't use it for amplification attacks (not an absurd amount, just 1k or something; of course it depends on the context).
And possibly embed the source IP in the first response so that the indirection isn't as effective either.
I guess, I should have specified that I meant raw UDP for simple request-response protocols.
The other major use-case for UDP is for protocols where loss is preferable to retransmission delay, it's still very much valid. But in this case, UDP is used within a stateful context, with multi-stage handshakes and everything.
I didn't mean to generalize the use of certificate. It would be for a specific protocol for a specific application. I just wanted to justify that we are not required to use three way handshake.
Revocation is indeed a weak point of this solution as it would take time, probably a transaction, to check. This problem might be mitigated by shortening the certificate validity duration.
I don't see why time synchronization would be critical if the validity periods are slightly overlapping.
We probably don't need certificates for IP addresses; we could just ensure that edge routers sending packets to the internet only forward packets whose source address falls within their own defined CIDRs.
There have been several plans for deploying IPsec everywhere as an evolution of IP with different key management strategies. There needs to be a way to look up a public key for an IP address; there's more than one way to do it.
In the prior configuration, the UDP client must get a certificate, which uses a three-way handshake to verify the IP address. Once a client has its certificate, it can perform transactions with simple two-way exchanges.
When NAT gets involved things get very complicated very quickly for that. For many networks and ISPs this would need to happen at the IP egress level and couldn’t happen on the end device, since the end device doesn’t even know its own IP and neither does the on-prem router.
Thank you. It's the best argument against the certificate suggestion I have read so far. It's a problem I overlooked.
Edit: If the server creates the certificate with a three-way handshake, it will use the remote IP address. So the client doesn't have to know its own IP address.
Clients could use the same certificate for every server, so there is only a one-time setup. Analogous to how clients need to be "configured" with an IP address, the certificate could be given to them by their internet gateway if desired.
QUIC = TCP inside UDP, and it's great for e.g. streaming: instead of sending a bunch of ACKs all the time, it tells the other side which packet to resend in case of loss. Great for a buffered video stream.
Streaming is pretty much, by definition, not latency sensitive. Getting a video stream to display in a browser or on a TV requires transferring a lot more data than saving 1 or 2 round trips can achieve.
Personally, the bigger impact on user experience is all the JavaScript crap. My banking website changed to an infinitely scrolling page for transactions and the user experience was ruined as a result. Static content scrolls way faster.
> you will be exploited for amplification attacks!
Is that the gripe? Am I missing something?
Your bellybutton is not subject to these amplification attacks, it is strangers on the internet and this is an altruistic appeal: you will be exploited for amplification attacks directed at others.
What is the root cause of this vulnerability? The root cause is that anonymous[0] packets on the internet contain a destination address the packet is to be delivered to, and a source address to which responses should be directed. That might be a different construction than the average programmer internalizes. In the absence of countermeasures either address can be rewritten and this is commonly done in firewalls with e.g. NAT.
The attack is to specify a source address "on behalf of" some other party, causing replies to be delivered there. Are there other ways to validate or render packets as not anonymous? Yes, to varying degrees. One is to establish a dialogue with the source address before sending payloads; in fact that's pretty much the good which abnegates perfection.
There has been a BCP for many years that ASNs should validate that source addresses are valid.[1] Amplification is possible both in velocity and volume. Volume is when a reply packet is larger than the source packet. Velocity occurs when multiple replies are sent. Even when no amplification occurs, attacks are mounted with e.g. ICMP. UDP amplification typically occurs when the reply packet is larger than the initiating packet. TCP amplification occurs when a SYN can elicit multiple SYN/ACKs.[3] (Both TCP and UDP are capable of eliciting ICMP backscatter.)
The simple way to avoid amplification with UDP is not to send a reply. This is done a lot. For instance streaming data over UDP with a TCP pipe for control information. A bonus for this approach is that UDP is conducive to multicast[4], for which specific address ranges are reserved.
Changing tuning parameters is an effective way to mitigate the TCP SYN/ACK problem, barring the appearance of flying monkeys.[5]
DNS may be illustrative in a number of ways. First off clients aggressively retry queries due to "happy eyeballs", so the servers constantly face an identify friend or foe operating environment. DNS only ever sends one response to a particular query. A common server (if not protocol) extension is the implementation of response rate limiting, which drops a certain portion of UDP replies and returns the rest with TC=1.[6]
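A crude sketch of that rate limiting in Python; real implementations like BIND's RRL group by response content as well as client prefix, and "slip" a fraction of answers as truncated rather than dropping everything, so the rate and grouping here are only illustrative.

    import time
    from collections import defaultdict

    RATE = 5.0          # responses per second allowed per /24
    buckets = defaultdict(lambda: [RATE, time.monotonic()])  # prefix -> [tokens, last]

    def allow(src_ip: str) -> str:
        prefix = ".".join(src_ip.split(".")[:3])
        tokens, last = buckets[prefix]
        now = time.monotonic()
        tokens = min(RATE, tokens + (now - last) * RATE)   # refill the bucket
        if tokens >= 1.0:
            buckets[prefix] = [tokens - 1.0, now]
            return "answer"
        buckets[prefix] = [tokens, now]
        return "truncate"   # send a minimal reply with TC=1, pushing the client to TCP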
Congratulations on reading this far. Now go get yourself a job as a cybersecurity analyst.[7]
[0] If you trust the source in some other fashion and mark or validate packets in some way based on that, they're not anonymous.
WebRTC generally goes through a probing process to verify two-way connectivity before sending media.
I'm not familiar with WebTransport.
My understanding of QUIC was that the handshake was designed so the client would pad their initial datagram such that the response was of similar size, avoiding a small request/large response pattern. And that typical connection establishment did a key exchange on the first round trip; primarily for privacy, but also to confirm two way connectivity. I could be wrong though; especially around resumption and/or connection agility, where a client moves addresses and the session moves with the client.
> WebRTC generally goes through a probing process to verify two-way connectivity before sending media.
To expand on this WebRTC is extremely complex but it certainly has one thing in common with more traditional VoIP systems - the UDP flows for media are negotiated dynamically.
You establish a signaling channel using whatever (WebRTC famously doesn't mandate a signaling protocol). This signaling channel, with any authentication, verification, etc. you want, transports SDP (Session Description Protocol) bodies between each party (typically via a web server and websocket/datachannel/HTTP). What you end up with is a set of dynamically negotiated send/receive ports on each side, and the application establishes the socket and flow between them, with the receiving sockets on each side typically making sure the source IP:port pair matches what was negotiated. The listening socket on each side discards any junk that may show up from an unverified source IP:port pair.
The process you're referring to is called ICE (Interactive Connectivity Establishment). ICE is very complex but basically what ends up happening is a series of candidates of all potential IPs, source ports, etc are evaluated for bidirectional connectivity between the parties. Once a reliable session candidate is verified the rest of the media session is established using the same source and destination IP:port pairs.
At the risk of getting too into the weeds WebRTC also does UDP port multiplexing so everything related to the session from ICE to the media, datachannel, etc are sent and received on the same socket. This is how you can have relatively high certainty you can establish a bi-directional flow between parties - because you only get this far when ICE has confirmed the IP:port flow on each side can reach the other.
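The last part is easy to picture as a plain socket loop: once ICE has settled on a candidate pair, anything arriving from a different source tuple is simply dropped. A simplified Python sketch, not the libwebrtc code; the demuxer is a stand-in.

    import socket

    def handle_packet(data: bytes) -> None:
        pass   # stand-in demuxer for STUN/DTLS/RTP/RTCP on the multiplexed socket

    def media_loop(local_port: int, verified_remote: tuple) -> None:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("0.0.0.0", local_port))
        while True:
            data, src = sock.recvfrom(2048)
            if src != verified_remote:
                continue          # not the ICE-verified (ip, port) pair: discard it
            handle_packet(data)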
WebTransport requires an established HTTP3 (and therefore QUIC) connection or an established HTTP2 (and therefore TCP) connection, so its vulnerability to reflection attacks corresponds to the vulnerability of the underlying handshakes.
QUIC requires the initial handshake packets to be at least 1200 bytes, and sets the anti-amplification limit of 3x [0]. This means that the server can typically send up to 3600 bytes in response (unless the client's handshake message exceeds one packet, which usually only happens if there is a post-quantum key share in it). 3600 bytes is usually enough, unless your certificate chain is too large, in which case you'd need to compress it. [1] is a nice overview of the problem.
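Server-side, that limit boils down to simple accounting per unvalidated address. A sketch of the idea, not a QUIC stack:

    # Before the client's address is validated, never send more than 3x the
    # number of bytes received from it (QUIC's anti-amplification limit).
    class AddressState:
        def __init__(self):
            self.bytes_received = 0
            self.bytes_sent = 0
            self.validated = False   # set once the handshake confirms the address

        def can_send(self, n: int) -> bool:
            if self.validated:
                return True
            return self.bytes_sent + n <= 3 * self.bytes_received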
QUIC explicitly mentions that it is vulnerable to amplification attacks in the RFC. It suggests sending an extra packet to mitigate, at which point I believe the advantage of QUIC is lost as far as establishing connections is concerned.
I believe you're referring to the stateless retry mechanism? It's designed to be used only when the server is actually under attack (either as an amplification vector or exhaustion of its own connection capacity). The idea is that the server establishes some threshold for pending QUIC connections, and only if that is exceeded does it start requiring clients to complete the extra stateless, SYN-cookie style roundtrip to validate their source address. So it maintains the advantage of one less roundtrip than TCP+TLS unless the server is receiving large amounts of connections that are not progressing in a timely manner.
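Conceptually the retry token works like a SYN cookie: the server encodes the client's claimed address in a MAC it can verify later, so it keeps no per-connection state until the client proves it can receive packets at that address. A toy sketch; real QUIC Retry tokens also bind the original connection ID and a timestamp.

    import hmac, hashlib, os

    SECRET = os.urandom(32)   # rotated periodically in a real deployment

    def make_token(client_ip: str, client_port: int) -> bytes:
        msg = f"{client_ip}:{client_port}".encode()
        return hmac.new(SECRET, msg, hashlib.sha256).digest()

    def check_token(client_ip: str, client_port: int, token: bytes) -> bool:
        return hmac.compare_digest(make_token(client_ip, client_port), token)

    # Under load: reply to an initial packet only with a Retry carrying the token;
    # allocate connection state only when a later packet echoes a valid token.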
Couldn’t most of the UDP packet spoofing be mitigated if we mandated ISPs and networking equipment to drop packets with addresses that couldn’t have possibly originated from downstream?
Most firewalls and gateway routers already do. The issue is more for internet connected services, where backbone routing can’t do that for scale/redundancy routing reasons.
I just checked Beej's Guide and there's no mention of amplification. I wouldn't be surprised if TCP/IP Illustrated doesn't teach it either. This means most people won't know about it and they won't know that they don't know.
The mitigation of requiring a "SYN" style packet to be MTU-sized sounds pretty good to me. It obviously uses a little more bandwidth but the network may be underutilized on the upstream path anyway.
> I just checked Beej's Guide and there's no mention of amplification. I wouldn't be surprised if TCP/IP Illustrated doesn't teach it either. This means most people won't know about it and they won't know that they don't know.
The issue isn't a lack of understanding of networking, it's a lack of understanding of the threat model.
> Seriously folks, if you don't already know this you shouldn't be designing any protocols. Datagram or stream-based.
I find this kind of gatekeeping distasteful. Knowing about various ways network protocols can be exploited is important, but this can be communicated without trampling curiosity.
None of this would be an issue if ISPs actually dropped UDP packets. They should run at a lower QoS by design. After all, the UDP spec explicitly calls out that it's unreliable. Yet IRL nobody drops them, ever, except during mitigation.
> None of this would be an issue if ISPs actually dropped UDP packets. They should run at a lower QoS by design. After all, the UDP spec explicitly calls out that it's unreliable. Yet IRL nobody drops them, ever, except during mitigation.
There's a difference between unreliable best-effort and "just throw out packets indiscriminately".
> Just drop them already. It's been decades.
... Decades since what? UDP is still very much in use
The "U" doesn't stand for Unreliable. It's basically the simplest possible packetization that doesn't offer much of anything that the underlying IPv4/IPv6 layer doesn't, besides filtering out packets based on port numbers.
It's very commonly used for games and video services, where applications can recover from the odd dropped or out-of-order packet. As the sibling comment mentions, what is "It's been decades" supposed to imply? It's also been decades since TCP was invented; are you calling for its discontinuation too?
That is not a good idea at all. For example: the new HTTP/3 over QUIC runs on top of UDP. There's a good chance most modern websites you use regularly use UDP, and most games and other real-time programs rely solely on UDP.
UDP is unreliable because the raw transport mechanisms on the internet are unreliable. TCP helps control the transport of these packets, but if you need more speed or want to implement it in a better way, then UDP is necessary.
Amplification attacks would still be a problem even if ISPs were to de-prioritize UDP traffic (for which there is exactly 0 reason, but that's another matter). They would still impact anyone talking over UDP, effectively saturating the "UDP link" even if they leave TCP unaffected. And there are some pretty important protocols people actually care about that work over UDP, DNS and RTP in particular.
Finally, in an ideal internet, ISPs would not have actually cared whether they are transporting UDP or TCP or SCTP or whatever else. The fact that it's basically impossible to send an IP packet that is not one of the blessed protocols over the Internet is already a major corruption of the stack model. Further gimping UDP to make TCP the only viable option would be crazy.
Quote from Wikipedia: "WireGuard uses only UDP, due to the potential disadvantages of TCP-over-TCP. Tunneling TCP over a TCP-based connection is known as "TCP-over-TCP", and doing so can induce a dramatic loss in transmission performance (a problem known as "TCP meltdown")."
Wireguard also has a mechanism where it may choose (under high load) to respond with a cookie, which is the MAC of the requesting IP address. The sender then needs to resend his request with that cookie.
If you could put yourself in the shoes any position/s in any organisation/s necessary to implement your grand vision, what would you do? How would you do it?
This just sounds like the sort of absurd proposition that can only be made by people that don’t have a seat at the table.