Why We Love QUIC and HTTP/3

drewg123 · on March 24, 2019

QUIC costs something like 2x to 4x as much CPU time to serve large files or streams per byte as compared to TCP. This is because the anti-middlebox protections also mean that modern network hardware and software offloads that greatly reduce CPU time cannot work with QUIC. When combined with the fact that QUIC is userspace, that's just deadly for performance. I'm talking about TSO, LRO (aka GRO), kTLS, and kTLS + hw encryption.

Let's compare a 100MB file served via TCP to the same file served via QUIC.

  TCP:
  - web server sends 2MB at a time, 50x times, via async sendfile (50 syscalls & kqueue notifications)
  - kernel reads data from disk, and encrypts.  The data is read once and written once by KTLS in the kernel.
  - TCP sends data to the NIC in large-sh chunks 1.5k to 64k at a time, lets say an average of 16k.  So the network stack runs 6250 times to transmit.
  - The client acks every other frame, so that's 33333 acks.  Let's say they are collapsed 2:1 by LRO, so the TCP stack runs 16,666 times to process acks

  QUIC:
  - web server mmaps or read()'s the file and encrypts it in userspace and sends it 1500b at a time (1 extra memory copy & 66,666 system calls)
  - UDP stack runs 66,666 to send data
  - UDP stack runs 33,333 number of times to receive QUIC acks (no idea what the aggregation is, lets say 2:1) 
  - kernel wakes up web server to process QUIC acks 33,333 times.

So for QUIC we have:

  - 4x as many network stack traversals due to the lack of TSO/LRO.
  - 1000x as many system calls, due to doing all the packet handing in userspace
  - at least one more data copy (kernel -> user) due to data handling in userspace.

Some of these can be solved, by either moving QUIC into the kernel, or by using a DPDK-like userspace networking solution. However, the lack of TSO/LRO even by itself is a killer for performance.

Disclaimer: I work on CDN performance. We've served 90Gb/s with a 12-core Xeon-D. To serve the same amount of traffic with QUIC, you'd probably need multiple Xeon Gold CPUS. I guess that Google can afford this.

toast0 · on March 24, 2019

In addition to those downsides, the QUIC spec points out middleboxes tend to time out UDP streams pretty agressively, so it recommends a ping timer of 10 seconds.

Additionally, since QUIC streams allow for client IP mobility, that creates an additional challenge for IP level load balancing as well as handling at the host level. In a well configured host, TCP packets for a given stream will always arrive at the same nic queue, on the same CPU, allowing the TCP data structure to be local to that CPU and avoid cross-cpu locks. In QUIC, the next packet can come from a new IP, which could be ECMP routed to a different host, or arrive on a different NIC queue and a different CPU. Perhaps, your ECMP router and NIC can be taught to look for the QUIC connection IDs, but that doesn't seem at all certain.

Misdicorl · on March 24, 2019

That's not really a fair comparison. In the case that the IP changes for quic, tcp would have to completely re-establish the connection. A cross core memory access is tiny in comparison.

geuszb · on March 24, 2019

Thanks for sharing the insight... This being HN, it doesn't necessarily read to me like a disadvantage for QUIC the protocol as much as an opportunity for someone to come up with a way to do hardware-assisted QUIC in the networking interface...

codetrotter · on March 24, 2019

My first thought as well. So much so that I fully expect that we are going to see multiple companies pop up in the coming next few years that will take a shot a making said hardware.

the8472 · on March 24, 2019

> and sends it 1500b at a time

sendmmsg (or the upcoming io_uring) let you send multiple UDP packets with a single syscall.

codys · on March 24, 2019

While this is useful, I don't think it would completely resolve the noted "tons of send syscalls" issue. QUIC performs it's own flow control and I don't think it can just send all the packers composing a file at once (all the time, at least)

the8472 · on March 24, 2019

If your server handles many connections simultanously you can still bundle a lot of packets in a single sendmmsg syscall, it can dispatch to a different destination address for each packet.

drewg123 · on March 24, 2019

But I think each of these UDP packets will still travel separately from the syscall layer to the NIC (eg, no TSO). So you're still a factor of 40 or so behind TCP + TSO

Matthias247 · on March 24, 2019

I think in general I agree. However the overhead numbers are exaggerated, and we should be fair with that. E.g. it was already mentioned that multiple UDP packets can be transmitted via a single syscall, and reasonable implementations can make use of it. I haven't read the Quic spec (yet), so I don't know how much data can be aggregated without waiting for ACKs or interleaving other data - but if it's anything comparable to HTTP/2 then it should be configurable and support >= 64kB chunks.

I also don't think a QUIC server would read the whole file into user-space at once - that's just a giant memory waste. Rather it would be streamed and chunks would get encrypted. That process requires of course an extra copy (likely even two for the unencrypted and encrypted version), but that's the same for all user-space file serving and encryption options and nothing new due to QUIC. For KTLS it would need to get investigated whether the kernel solution doesn't also perform a copy somewhere (I honestly don't know).

drewg123 · on March 24, 2019

Of course it is not going to read the entire file at once.

Having written the FreeBSD kernel TLS, I can assure you that there is no copy. Data is brought into the kernel via DMA from storage into a page in the VM page cache. When the IO is done, it is then encrypted into an connection-private page. That page is then sent and DMA'ed on the network adapter. So we have in the kernel tls case:

  - memory DMA to kernel mem from storage.
  - memory READ from kernel mem to read plaintext for crypto
  - memory write to another chunk of kernel mem to write encrypted data
  - memory DMA from kernel mem to NIC

In the case where the NIC supports inline TLS offload, the middle 2 steps are skipped, and it devolves to essentially the unencrypted case.

For QUIC you have:

  - memory DMA to kernel mem from storage
  - memory read from kernel mem via mmap
  - memory write to userspace mem to write encrypted data
  - memory read from userspace mem to copy to kernel
  - memory write to kernel mem
  - memory DMA from kernel mem to NIC

So you go from 3 "copies" to 4 "copies", which increases memory bandwidth demands by 33%.

Right now, we can just barely serve 100g from a Xeon-D because Intel limited the memory bandwidth to DDR4-2400. At an effective bandwidth limit of 60GB/sec, that's on the edge of being able to handle the kernel TLS data path. So even if everything else about QUIC was free, this extra memory copy from userspace would cut bandwidth by a third.

Matthias247 · on March 25, 2019

Good to know. Thanks for the explanation and all the insights!

lclarkmichalek · on March 24, 2019

What does the TLS vs QUIC look like though?

mjoras · on March 25, 2019

There's no reason the offloads can't work with QUIC. Linux already has UDP GSO (https://lwn.net/Articles/752184/). There's no technical reason I can think of that kTLS cannot be implemented for UDP on Linux, it's just not there today.

There are also more general efforts underway on Linux to reduce the system call and copying overhead of processing packets in userspace. TPACKET_V3 is an easy way to vastly increase the scalability of UDP recv processing with minimal application changes. AF_XDP is much more extreme, but it is going to be more implementable than the older DPDK-style semantics. It effectively will put packet buffer management into userspace with the transport. But once you're doing that have recaptured much of the advantages that TCP has by running in the kernel.

irq-1 · on March 24, 2019

Two questions: can't large files continue to be served on HTTP2? and won't https://www.dpdk.org/ allow user-space network stacks to do segmentation, etc...? (Maybe it's too immature?)

Matthias247 · on March 24, 2019

Regarding the first question:

Even HTTP/2 involves some of the issues the parent mentions. HTTP/2 is not really helpful for large files, and might likely perform worse than HTTP/1.1 due to the additional insertion and parsing of control flow headers. HTTP/2 helps small files most, by avoiding the overhead of connection establishment for those.

vfaronov · on March 24, 2019

How much of this applies to 1~100KB responses?

drewg123 · on March 24, 2019

1k, not so much since there is no aggregation that can happen there anyway.

100k is not that much different than 100mb, except the TCP window will not be as far open, so TSO will not be as effective.

Note that I work on a CDN that serves large media files, so I'm biased towards that workload.

teacpde · on March 24, 2019

Awesome analysis. This is first time that I read about the downsides of QUIC, curious that whether implementing it in userspace was a conscious trade off knowing the performance downside or Google/IETF wasn’t aware of the problem at all?

mcguire · on March 24, 2019

Implementing new transport protocols in kernelspace has significant downsides for adoption. In fact, it has been a long time since anyone tried it.

simfoo · on March 24, 2019

Wouldn't pretty much all of that overhead compared to TCP vanish if QUIC was implemented in the kernel?

drewg123 · on March 24, 2019

No, due to the lack of TSO/LRO. Its my understanding that QUIC is designed to encrypt packet medata so that middle boxes cannot re-segment traffic. This same feature prevents NICs from doing TSO.

simfoo · on March 24, 2019

Ok thanks, that makes sense. For anyone else wondering what TSO is, see https://en.wikipedia.org/wiki/Large_send_offload

But again, couldn't there be NICs with offloading QUIC capabilities? Maybe this could even be done with firmware updates (I don't know how much of the TCP offloading is done in real hardware)

londons_explore · on March 24, 2019

If the NIC is given the key for the connection, it can do the segmenting, encrypting and retransmissions.

manigandham · on March 24, 2019

This is just normal technological progress. CPU time is cheap and scalable, and the protocol will keep getting more optimized with better software and hardware. Similar issues were brought up with HTTP2 using TLS everywhere and messing with proxies but that's no longer a problem.

QUIC/HTTP3 as a protocol is a great improvement to actual internet performance for users which is what really matters.

kev009 · on March 24, 2019

Picking your comment as the newest instance but this is one of the dumbest memes I see in this thread.

Things don't automatically get better. It is hard work, it sucks, and it's not for everyone. It will take years to undo the damage of this transition. We will still be working on it in a decade. There are some very subtle gains like HOL-blocking. I'm not convinced that outweighs current actualized improvements in TCP congestion control (BBR), and for any application I can think of the places that really need something message-oriented seem better covered by WebRTC.

What you are really talking about is Full Employment Theorem.

manigandham · on March 24, 2019

Yes, progress obviously takes effort. What part is a "meme"? Leave that nonsense out of HN.

What "damage" are you talking about? The only issues are compatibility and increased resource utilization on the server-side, both of which will get better as usage increases. It's not a problem. We go through these cycles all the time with all kinds of technology and there's nothing special here.

scurvy · on March 25, 2019

It's thinking like that which leads to web page bloat. CPU resources aren't free, especially in an environmental capacity.

manigandham · on March 25, 2019

No it's not. Webpage bloat is a developer issue, not a technical problem.

QUIC is a new protocol is to make user experiences better. There's a tradeoff in more server CPU but that's cheaper, more scalable, and will only be short-term as things quickly improve. The actual comparison would be rendering engines and Javascript runtimes that have become more complicated to build and run but are faster and more functional in return.

Nobody would return back to the 2010 tech days just because some people decided to make fat websites.

seanwilson · on March 24, 2019

> To serve the same amount of traffic with QUIC, you'd probably need multiple Xeon Gold CPUS. I guess that Google can afford this.

Can you explain more about how the negatives you mention weigh up against the positives? There isn't a net benefit somewhere? If not, can something be changed to give a better balance like a hybrid solution?

windexh8er · on March 24, 2019

Personally I believe that the majority of positive caters to privacy. That being said there are other positive things about IETF QUIC that will, likely, play into new functionality over time.

A good document outlining considerations can be found here: https://http3-explained.haxx.se/en/

galadran · on March 24, 2019

tl;dr - QUIC doesn't have kernel or hardware support (yet).

These aren't intrinsic problems with QUIC, they're common to all new protocols.

move-on-by · on March 24, 2019

As far as I’ve been able to determine, QUIC suffers from the same SNI data leak that existing TLS versions with TCP has. I understand that ESNI is being (or is already?) included in the TLS 1.3 spec, but it’s obviously optional at this point.

Anyways, since QUIC is being touted everywhere as being very secure:

> [QUIC] protects both the data and the transport protocol itself

It seems like missing ESNI as a required feature is a bit of a glaring omission. Does anyone have a better understanding? To me, it seems like a great opportunity to make ESNI required for HTTP/3. Much like how browsers made TLS required for HTTP/2. I would love further insights if anyone has any.

toast0 · on March 24, 2019

The ESNI spec I've seen has clients request DNS TXT record to get the public key for the encryption. My pessimistic assumption is that a majority of clients are configured to use recursive DNS servers that will be unable to serve TXT results because or network issues.

In any case, it's hard for a client to determine if a TXT record is not present, or unavailable because the authoritative server has no such record, or because something in the middle has blocked it (due to network incompetence or active malice), so if you want that to work, you're going to need to specify dns over https to a trusted third party, and de-decentralize DNS.

That said, from a brief look at the spec, the ESNI extension includes a digest of the key record, so while an observer can't directly read the SNI, given a sufficient effort to find the keys, they could correlate the digests with matching hostnames.

Reaching key agreement to exchange identity information without disclosing the identities in the clear is somewhere between really hard and impossible.

tialaramex · on March 24, 2019

> That said, from a brief look at the spec, the ESNI extension includes a digest of the key record, so while an observer can't directly read the SNI, given a sufficient effort to find the keys, they could correlate the digests with matching hostnames.

I think you've probably misunderstood what's going on here

ESNIKeys, the data structure you're talking about, isn't a key for a specific name, it's the key for the frontend server that's agreed to do ESNI and can be used for ALL names offered on that frontend server.

Whatever name you're asking about, you get the same ESNIKeys values, whether you wanted cat-photos.example.org or nazi-death-squad.example.net or boring-corporate.example.com if they are all hosted on 10.20.30.40 (or that's a TLS load balancer for perhaps different backends that don't face the outside world) they all have the same ESNIKeys.

The fact they're shared is why there's a length field. We can only safely protect names by padding them. Otherwise it doesn't take a genius to spot that this-very-long-name.subdomains-matter-too.example.com encrypt to a far larger structure than short.example. The length field says I promise all the names I'm protecting with ESNI will fit in a name structure this long, just pad the shorter ones.

The digest is in the ESNI setup because this way a server can go "Oh, you've got last week's keys somehow. No, those won't work" or equally "Those are my Cloudflare keys! We only use Cloudflare in North America, this is a European server, we do AWS here, why have you got those?". Without a digest you have no clue why this idiot client is sending you gibberish and you can't do diagnostics.

patrickmcmanus · on March 24, 2019

These are essentially seperable features - and given that QUIC is at a later stage than ESNI there is not a compelling reason to create a blocker to getting an open QUIC standardized.

QUIC uses the TLS 1.3 client hello (and its extension mechanism) for the handshake, so evolutions to TLS 1.3 like ESNI will be automatically valid in QUIC too as they come down the pike.

londons_explore · on March 24, 2019

A few people have been very vocal about not encrypting the SNI. Mostly firewall makers who obviously want to sniff it...

vbezhenar · on March 24, 2019

I have no idea how one could deploy website with encrypted SNI. A lot of companies and countries block websites. If they can't determine the website, they will block IP addresses and that will cause a lot of other websites to break. It might work for simple websites which don't share IP address with other websites (but why encrypt SNI then), but it won't work for CDNs.

bitt · on March 24, 2019

Blocking IP addresses can be very problematic though.

If I understand it correctly, I think that countries might start to block ESNI altogether. If it is not widely implemented, websites/apps using it will standout which sadly could limit its adoption. For instance, if Signal decided to use ESNI, it will probably get blocked in those countries, but this can change if big companies wanted to use it. However, I still don't know how it will work exactly.

izacus · on March 24, 2019

That sounds more like a feature than a bug.

vbezhenar · on March 24, 2019

Broken website is a bug.

felixhandte · on March 24, 2019

This is a timely post, since IETF 104 is happening this week in Prague[1]. The QUIC working group will be meeting on Tuesday and Wednesday to make progress on standardization[2].

[1] https://datatracker.ietf.org/meeting/104/agenda.html

[2] https://datatracker.ietf.org/doc/draft-ietf-quic-transport/

jabl · on March 24, 2019

What about QUIC and L4S / TCP Prague? Are people working on something equivalent for QUIC as well, or are they reimplementing TCP Reno in QUIC?

patrickmcmanus · on March 24, 2019

tl;dr; congestion control is basically pluggable.

Much like in TCP, congestion control really isnt something required for interoperation between peers. Given the userspace nature of QUIC I would expect to see a lot of iteration on this front - for good and bad. (but hopefully the bad iterates quickly).

The current drafts describe newreno is detail, but also explicitly call out the ability to run other things. I've seen reno, cubic, and bbr all run with quic and anticipate others to happen as well. That's one of the exciting things here.

scurvy · on March 24, 2019

QUIC will also usher in a new era of volumetric DDoS attacks. No longer can content providers use upstream ACLs to block udp garbage and fragments. The only option will be to use Fastly, AWS, or Cloudflare to ride out attacks.

QUIC is the tool to bring about the next phase of Internet centralization by the mega players.

lclarkmichalek · on March 24, 2019

QUIC actually requires that request packets must be larger than the responses, until a handshake has been performed, in order to prevent reflection attacks.

mcpherrinm · on March 25, 2019

The previous post's point is not that QUIC can be used for reflection attacks.

It is that it uses UDP, which means UDP cannot be blocked by simple ACLs. Blocking all UDP is a simple technique for avoiding other protocols that allow for reflection DDoS attacks.

I haven't got experience with how effective blocking UDP is as a mechanism for avoiding DDoS, but it does sound pretty simple and easy to deploy.

skybrian · on March 24, 2019

Do you mean larger or smaller? I thought the problem was amplification.

lclarkmichalek · on March 24, 2019

Woops, fixed, thanks :)

scurvy · on March 24, 2019

Does QUIC allow for UDP fragments? If so, we're screwed since most traffic is still IPv4.

londons_explore · on March 24, 2019

It's mostly DNS amplification attacks your provider will be filtering out on UDP. They can still filter UDP port 53 to do that.

scurvy · on March 24, 2019

Fragments are still fragments. If QUIC allows for IPv4 router based fragmentation, you're still susceptible to attack. V4 frags don't carry port information (there's no header).

IPv6 is better since there is no fragmentation. Maybe QUIC/http3 should be IPv6 only?

collinf · on March 24, 2019

> These interposing network elements, called middleboxes, often unwittingly disallow changes to TCP headers and behavior, even if the server and the client are both willing.

There is nothing worse than finding out that someone not even at the company anymore decided years before to deploy some crap like this. Drives me absolutely crazy to impose stuff like this where silos in companies means transitioning involves on the order of 4-5 different "components" need to change.

marcosdumay · on March 24, 2019

Oh, modern networks are basically just a single huge middlebox with servers on one side and intra|internet on the other side.

There isn't much opportunity for people to plug random stuff between your server and the middlebox (the main middlebox would disallow it, like anything else), but there is still plenty of crappy rules everywhere and nobody knows why they exist or what they are. And you can't even call your ex-coworker and ask for help, because it's an ex-employee of the middlebox company, not yours.

mcguire · on March 24, 2019

"TCP Fast Open is a stellar example of one such modification to TCP: eight years after it was first proposed, it is still not widely deployed, largely due to middleboxes."

Anyone remember TTCP?

londons_explore · on March 24, 2019

Fast Open is a bad idea for a bunch of other reasons, mainly the client spoofing their address yet still being able to use a lot of resources on the server.

tialaramex · on March 24, 2019

Where would the client get a valid cookie from if they are "spoofing their address" ?

If they don't have a valid cookie Fast Open costs the same as regular TCP in the face of adversaries trying to DOS you. You examine the packet, it doesn't have a valid cookie, you discard it. No further work, just like ordinary TCP.

ioquatix · on March 24, 2019

aka let's just put everything in the application layer because solving it at the protocol layer is too difficult.

patrickmcmanus · on March 24, 2019

The way I look at it a lot of what we logically think of as the network layer often exists in userspace anyhow. That's the point of DPDK/snabb/netmap and other kinds of driver bypass.

The important design distinction about the layers is what element has access to what data. (e.g. routers need to see IP addresses to do their job, port numbers help kernels segment permission models, etc..) The rest of it is just about logical models, real-world workarounds, and luck...

HTTP/3 will be able to get high bandwidth, buttery responsive restarts to connections far too long idle to keep "open" because it integrates security, application, and transport. That's thoughtful design, not a workaround.

I'm going to paraphrase (and maybe bungle, because I don't have it at hand) from my favorite networking book of all time - the underappreciated _Network Algorithmics_ by George Varghese. He describes layers as a lovely way to model and think about a protocol or design, but often a terrible way to build one. I've spent a lot of time thinking about that, and I think QUIC gets it right - the layers are clear in how they inter-relate but they do so without being independent.

cdmckay · on March 24, 2019

The article mentions gives an example of why it’s difficult to improve TCP further.

TCP Fast Open was standardized 8 years ago and is barely used. This is because updating TCP requires kernel updates, which just isn’t going to happen on most mobile devices.

Thus, moving the protocol to userspace makes a lot of sense.

zamadatix · on March 24, 2019

> Thus, moving the protocol to userspace makes a lot of sense.

Raw IP sockets are accessible from the same userspace facing APIs as e.g. UDP sockets and don't require climbing up the stack. Unfortunately operating systems started to consider custom protocol implementations security risks but rather than reverse that thinking we've just continued to abstract up past it.

In reality I think "where it is implemented in code" was a small portion of QUICs design choices compared to "IPv4 NAT & external firewalling has ossified protocols" which is a similar story of "just abstact up to avoid the issues". Unfortunately in that case I don't think abstracting up isn't as permanent a solution as it was on the OS side.

bdonlan · on March 24, 2019

Raw sockets don't really allow for multiple applications to use the same custom protocol. If, for example, chrome and firefox were both running, which gets packets destined for the QUIC transport protocol? The kernel wouldn't know; without the UDP header it can't distinguish flows.

Likewise NAT devices typically support UDP flows today due to their prevalence in games, but if you introduce a new transport protocol at the IP layer, they wouldn't be able to identify which flow (and therefore which NATed endpoint) the packet is destined for.

zamadatix · on March 28, 2019

> Raw sockets don't really allow for multiple applications to use the same custom protocol. If, for example, chrome and firefox were both running, which gets packets destined for the QUIC transport protocol? The kernel wouldn't know; without the UDP header it can't distinguish flows.

In reality raw sockets work in a way that the question is the reverse of what you describe. The kernel will check 2 things: - Which raw sockets are bound to the protocol number seen in the packet - Which raw sockets have issued "connect" to the sending IP

Any and all raw sockets that match these will receive the packets. In such a sense the protocol (QUIC) needs to have some way to identify streams so that if e.g. both Chrome and Firefox browse to the same server they don't interfere with each other. QUIC innately has this functionality due to the way it implements encryption. Ideally the OS would allow a raw socket to register something akin to a BPF filter though as that would make it equally as efficient as UDP socket tracking even in the edge cases.

> Likewise NAT devices typically support UDP flows today due to their prevalence in games, but if you introduce a new transport protocol at the IP layer, they wouldn't be able to identify which flow (and therefore which NATed endpoint) the packet is destined for.

This is actually what I was referring to when I said:

> "IPv4 NAT & external firewalling has ossified protocols" which is a similar story of "just abstact up to avoid the issues"

We continue to make non choices to build up the stack rather than implement systems that are interchangeable.

vbezhenar · on March 24, 2019

Chrome and Firefox could develop standardized system service which will deliver package to a proper application. NAT is not needed in a bright IPv6 world of the future.

Though I don't know what's wrong with UDP. 8 bytes of overhead for 1450 bytes IP payload is 0.5% bandwidth. Checksum overhead should be negligible.

brozaman · on March 25, 2019

> system service which will deliver package to a proper application

That's expensive. Even if you avoid copying the packages by sharing memory between processes, there are still a lot of context switches...

daurnimator · on March 24, 2019

I think TCP fast open is a bad example for this. None of the common socket libraries that I know (never mind http libraries) have gained support for TCP fast open yet.

Avamander · on March 24, 2019

It's a bad example because it's badly adopted?

daurnimator · on March 24, 2019

It's a bad example of middle-boxes causing ossification, as the adoption has been more limited by library/framework support than middle-boxes blocking it.

jyounker · on March 24, 2019

Chicken and egg problem. It doesn't tend to work reliably because of middle boxes so there is no push to implement it widely. It is not widely implemented so there is little pressure to update middle boxes.

kijin · on March 24, 2019

That doesn't sound like a particularly convincing reason. In order for mobile devices to benefit from HTTP/3, commonly used HTTP client libraries will have to be updated. Which usually happens on a similar timescale as kernel updates anyway.

lclarkmichalek · on March 24, 2019

Much easier to update an app and its HTTP library than mobile device kernels. Particularly given how a large proportion of mobile devices are unsupported and won't get kernel upgrades any more.

michaelt · on March 24, 2019

The thing is, there are two possible extremes:

1. Our design is complete, error-free and designed to stand the tests of time. It will be in common use, largely unchanged, in 40 years time. The TCP of the 2010s. There is widespread industry support. We want to move to the application layer to ease the initial roll-out.

2. Our design will need to change every few years, even we authors don't think it's finished. This is a Google-only project and most vendors are refusing to support it as they think it's badly designed crap. The ActiveX of the 2010s. We want to move to the application layer so we can force it through without anyone else's support.

Where are we on the spectrum between those two options? I don't know.

chrisweekly · on March 24, 2019

Great question!

cdmckay · on March 24, 2019

An app can control which HTTP library it’s using and even bypass the built in one on the mobile device. That’s not possible in the case of TCP.

So no, it’s not the same.

Another issue they mention in the article is middleboxes, which basically will never be upgraded, and will never support new TCP features.

josteink · on March 24, 2019

Nor will they support new HTTP-features.

vbezhenar · on March 24, 2019

Some people implement network stack in user-space. Abstractions are good, but they incur performance penalty or other restrictions and sometimes you need to remove those abstractions. I guess, web is too important today, so optimizing it might worth it, even if that requires unconventional measures.

ralphm · on March 24, 2019

Is this not a valid approach then? The issues of ossification and strict allowance for just known protocols appear to be big enough to cause things like SCTP to not have a viable, widespread use in their future.

yardstick · on March 24, 2019

I believe that we will see ossification of QUIC eventually too. TCP has been around for decades, anything around that long is going to have issues rolling out new changes in a backwards compatible way. TLS 1.3 and the lengths it had to go to with backwards compatibility with middleboxes is another good example.

I hope that QUIC has used these lessons from TCP and TLS to make changes in the future as easy and effective as possible, but I’m sure it’ll still have its limits.

agrover · on March 24, 2019

The IETF QUIC working is well-aware of this, and is attempting to save design room for future QUIC versions as much as possible. The "QUIC invariants" spec documents everything that is guaranteed not to change, but other than than everything in a future QUICv2 could be updated (e.g. tls version, features, large parts of packet header layout).

yardstick · on March 30, 2019

I fear that unless they are already actively using different values for those updatable values that middleboxes will implement a de facto version of QUIC that “just works” with what is out now, but no regard to gracefully handling forwards-compatibility. Example is the SSL version field which has become static because many implementations didn’t handle an unknown value gracefully.

blackflame7000 · on March 24, 2019

It runs over UDP so they did go back to the protocol layer when designing a solution. Thats the whole point is to fix the tcp halt and retransmit delay when a packet gets lost.

josteink · on March 24, 2019

So it will be firewalled by most corporate firewalls then?

Hardly sounds useful.

blackflame7000 · on March 24, 2019

Why would it get firewalled? DNS runs over UDP on port 53

carnagii · on March 24, 2019

It is not necessarily bad. Moving complex stuff to user space can be good sometimes. There are things this would break like splice but also allows much tighter integration with applications because you can get and set state without system calls. As you layer on more complex app level adaptations to connection failures having it in user space means the client has the same capabilities irregardless of platform which is important so that both client and server can make mutual complimentary adaptations to the same conditions. This stuff is not really needed for web, it is mostly video conferencing type stuff that gets the biggest benefit. Googs new game service for instance.

tyingq · on March 24, 2019

Hopefully vendors like Forcepoint are trying to keep up. The first rollout of QUIC worked terribly in a lot of corporate environments because these MITM content filtering solutions didn't pay attention.

zzzcpan · on March 24, 2019

What reason MITM solutions have to support more client protocols that terminate on a local network?

tyingq · on March 24, 2019

If it just drops QUIC requests, then the browser has to do 2 parallel requests, one HTTP, one QUIC, and pick the winner. I believe that's what Chrome does now.

zzzcpan · on March 24, 2019

So, no reason then. That overhead is like microseconds and you can disable it if it bothers you (you have to configure web browsers for MITM anyway).

I actually disable QUIC myself, because I've noticed it slows everything down too much on some home routers.

windexh8er · on March 24, 2019

> I actually disable QUIC myself, because I've noticed it slows everything down too much on some home routers.

I'm curious how you measured and came to this conclusion since the design and most all metrics claim the opposite? And are you sure it isn't a bufferbloat issue rather than QUIC?

zzzcpan · on March 24, 2019

I was just browsing websites that were loading unusually slowly, so did the usual ping/mtr to investigate, which pointed to the router. From there and a bit of tcpdumping the cause turned out to be UDP traffic to Google from another person, who was watching videos I think.

the8472 · on March 24, 2019

your router shouldn't introduce noticable latency. If it does it either has a weak CPU or network queues that are too large. In the former case you need to upgrade hardware, in the latter you need a firmware that supports CAKE.

londons_explore · on March 24, 2019

Some routers have hardware NAT for TCP, but use the CPU for UDP.

Also, some routers prioritize all UDP packets because they treat them as VoIP. Then all TCP suffers.

Some service providers traffic shape UDP because uTorrent uses it (UTP) on random port numbers.

the8472 · on March 24, 2019

If the router can't handle a scenario where you replace all TCP with UDP then it's a cheap plastic toy in my eyes. Don't blame the protocol, blame the router. Sending MTU-sized UDP flows to a dozen targets at most is not even the most extreme, non-malicious scenario you can encounter on networks.

Similar arguments go for the prioritization.

londons_explore · on March 24, 2019

Quic measures delay so won't fill big buffers.

tyingq · on March 24, 2019

No reason for an MITM to try and support HTTP/3 on a reasonable timeline, and avoid making their customers figure it out on their own? Er, okay.

Edit: Ahh...read some of your other posts. You aren't a fan of HTTP/3, which puts your comment in context for me.

windexh8er · on March 24, 2019

It will be interesting because it's a problem for all middleboxes that do any sort of deep packet inspection. Most of the devices that fall into this category today leverage many performance gains made by the assumption that 1) the majority of network layer traffic is TCP and 2) they have access to certain levels of metadata for free.

Things are changing and getting a lot more difficult with HTTP/3 (IETF QUIC) and TLS1.3. Many vendors are claiming TLS1.3 support today, but the interesting thing is nobody is talking about the dismal performance implications it has on packet processing. With TLS1.3 and without HTTP/3 all sessions must use PFS for transform selection. And on top of it with 0RTT if a client gets to the server before the middlebox does - then, I believe, it becomes a failure scenario at the end user experience. Security vendors like Fortinet, Forcepoint, Palo Alto Networks and Cisco are all up against the wall long term. Consider they sell these devices for millions of dollars per device in larger variations. Now we're moving back to taking a device that claims tens to hundreds of gigabits of deep packet processing to, hundreds? They won't share the performance impact with customers - because that will impact financials, which will flow down to stock price, etc, etc. I feel as though companies that bank on the middlebox (ie NGFW) know of the impending apocalypse but are choosing, collectively, to stay quiet. Cisco did have an article that indirectly admitted this but only in context of TLS1.3 and not HTTP/3 [0].

What is the general consensus of others as we see HTTP/3 gain popularity? None of the aforementioned vendors do MitM decrypt with Google properties riding Google QUIC today, as ultimately they can't. The "security" coverage then moves to software / endpoints to pick up the pieces (where plaintext traffic is still available). But in the meantime I feel like the consumer of these products is being told nothing for the sole upside of financials. I used to be a huge proponent of NGFW and the visibility they brought. However I feel as though those devices now give a very high false sense of security as they are only able to catch very low hanging fruit and are simple to bypass [1]. I'm curious what the collective here thinks about the futures of hardware network security, and with that even SaaS based (eg ZScaler).

TL;DR If you're a CISO/CSO is it now a fools errand to continue to invest money in middleboxes with the strong stronger crypto enforcement on the horizon?

[0] https://blogs.cisco.com/security/tls-1-3-and-forward-secrecy... [1] https://http-evader.semantic-gap.de/

tyingq · on March 24, 2019

"you're a CISO/CSO is it now a fools errand to continue to invest money in middleboxes with the strong stronger crypto enforcement on the horizon?"

I wouldn't mind a progression in http transports that made corporate MITM unworkable. In the past, though, some kind of crappy loophole always makes it possible.

tialaramex · on March 24, 2019

In the cases I've noticed the middlebox vendor claims TLS 1.3 only meaning that now their product isn't critically insecure in the face of TLS 1.3. It can't actually speak TLS 1.3 it just knows to say "Sorry, TLS 1.2 only" without breaking everything.

In my country we had many televisions labelled HD Ready when HD television first became available. Were these actually ready to play HD television? Er, no. They could however tolerate existing in a world with HD while not being HD themselves and this was what they marketed as "HD Ready".

Do you have examples where they actually do TLS 1.3?

tyingq · on March 24, 2019

Good article on that topic, "TLS 1.3 and Proxies": https://www.imperialviolet.org/2018/03/10/tls13.html

HN discussion here: https://news.ycombinator.com/item?id=16564935

tialaramex · on March 24, 2019

Months after that post, at least two famous brand middleboxes were found to be incompatible with the finished TLS 1.3 because somebody cut corners as follows:

The specification says: YOU must choose RANDOM numbers otherwise bad things could happen.

[ TLS 1.3 final hides a downgrade signal in those random numbers if you appear to only speak TLS 1.2. The TLS 1.2 specification says nothing about a downgrade signal, so if you recognise the signal that means you wanted TLS 1.3 but the server has been told you wanted TLS 1.2, a downgrade attack is being attempted. Abort! ]

These famous brand middleboxes were too lazy to make random numbers, they'd just take the exact numbers the real server picked and use those. Those are random right? What could go wrong?

The result was that the TLS 1.2 Downgrade signal would get copied into supposedly "fresh" TLS 1.2 connections and trip the abort mechanism.

Just an incompatibility right? Nope. For the years that this idiocy was in those products they weren't actually delivering security, the requirement that you pick RANDOM numbers is there for a good reason - if sophisticated bad guys knew this "bug" was present in the famous brand middleboxes they could definitely have exploited this to snoop connections.

windexh8er · on March 24, 2019

I haven't found any real implementations to test (I'd like to). But it seems Fortinet is making bold claims that, on the surface, feel like lip service thus far:

https://www.fortinet.com/blog/business-and-technology/tls-is...

Keep in mind this is just TLS1.3 and they make no claims around IETF QUIC / HTTP/3.

tialaramex · on March 24, 2019

They say

> The good news for Fortinet customers is FortiOS 6.2 fully supports TLS 1.3 for effective and high-performance MITM inspection.

and in contrast

> The latest version of FortiOS 6.0 not only fully supports TLS 1.2 MITM, but it also does not break TLS 1.3 when it has to negotiate down to TLS 1.2.

[ The "break TLS 1.3" they're talking about is the phenomenon I described in a cousin post in this thread, several of their competitors screwed up here ]

So that suggests that in Fortinet's case products running their 6.2 release (it's unclear to me if this is merely in Beta or actually a finished product) will actually do TLS 1.3. It's sad that they feel they can boast about the earlier 6.0 product actually working correctly (compatibility with TLS 1.3 by downgrading to TLS 1.2 is literally how everything would work if you just correctly implemented the specification, yet) when it was released in 2018, many years after the TLS 1.2 specification was finalised and in wide use.

Still you know "Better late than never".

windexh8er · on March 25, 2019

I read the article as they were fully supporting it through downgrade. Then again, as you noted, 6.2 doesn't appear to be out... So more lip service by these network security vendors. I think the most interesting marketing aspect of this is they never claim the methods they often use to meet performance claims is by weakening overall security. Hypocrisy at it's best.

Improvotter · on March 24, 2019

I'm currently working with HTTP/2 (more specifically HAS with HTTP/2 Server Push) and it's just a huge pain to find a high-level library that can help with this. I fear that it'll take even longer for HTTP/3 to be adopted or HTTP/2 might just be skipped altogether. Why are there so many server-side implementations available for a variety of languages though many still lack some features or a client-side implementation altogether?

vbezhenar · on March 24, 2019

You can implement HTTP with a few hundreds LoC. It's an extremely simple protocol. TLS is not simple, but it's independent of HTTP, so you can use a separate implementation. HTTP/2 seems much harder.

Matthias247 · on March 25, 2019

The short answer: It's hard!

I implemented a HTTP/2 library for .NET (https://github.com/Matthias247/http2dotnet). It took quite a lot of time and dedication to get it spec conformant. I doubt that most employers (apart from some CDN) would have allowed me spending the time to get it to that level. And yet it still has lots of potential for improvement.

HTTP/3 might be even harder (I haven't read the spec yet, but the whole UDP assembly and inclusion of encryption sounds more complicated).

Compared to that building a small HTTP/1.1 library or a framework around it is much more approachable and might be also more rewarding.

ignoramous · on March 24, 2019

/offtopic

Good folks at fastly, I hope you're reading... I've been waiting very patiently for part 3 of this series for a good part of 2yrs now: https://www.fastly.com/blog/building-and-scaling-fastly-netw...

ex3ndr · on March 24, 2019

Are there ready to use mobile libraries for QUIC?

charleslmunger · on March 24, 2019

https://developer.android.com/guide/topics/connectivity/cron...

cagenut · on March 24, 2019

so the arista's are gonna be able to ecmp on it?

wmf · on March 24, 2019

It still has UDP headers with port numbers that can be used for ECMP. https://tools.ietf.org/id/draft-ietf-quic-manageability-00.h...

http333 · on March 24, 2019

It seems that QUIC is a new transfer protocol created to replace TCP. QUIC uses UDP and LTS/3 and solves the head of line problem addressed in HTTP2. Furthermore, sending the data encrypted allows QUIC to begin transfering data earlier. An experiment by google shows that in connections with high latency or loss QUIC gives a 15% reduction in highest latency.