


Here's a third option: the five-month-old anonymous HN account claiming to know what the H2 designers were thinking is wrong. How would you compare the likelihood of that to your two options?

The main problem you're talking about is head-of-line blocking due to packet loss. But packet loss as a congestion signal is nowhere near as common as people think, and that was already the case during the original SPDY design window. End-user networks had mostly been set up to buffer rather than drop. (I know this for an absolute fact, because I spent that same time window building and deploying TCP optimization middleboxes for mobile networks, and we had to pivot hard on the technical direction due to how rare packet loss turned out to be on the typical mobile network.) The real problem with networks of the time was high and variable latency, which was a major problem for H1 (with no usable pipelining) even with the typical use of six concurrent connections to the same site.
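To put a rough number on the latency point, here's a back-of-the-envelope sketch; the RTT, request count, and connection limit are assumed illustrative values, not measurements from this thread:

  # Crude worked example (made-up numbers): cost of request serialization on a
  # high-latency link, ignoring bandwidth, TLS setup, and server think time.
  rtt_s = 0.300        # assumed mobile round-trip time
  requests = 30        # assumed number of subresources on a page

  # H1 without pipelining: one request in flight per connection, six connections.
  h1_connections = 6
  h1_waves = -(-requests // h1_connections)        # ceil division: requests go out in waves
  print(f"H1, 6 connections, no pipelining: ~{h1_waves * rtt_s:.1f} s of pure RTT")

  # H2-style multiplexing: all requests issued at once on a single connection.
  print(f"H2, single multiplexed connection: ~{1 * rtt_s:.1f} s of pure RTT")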

Second, what you're missing is that even a marginal improvement would have been worth massive amounts of money to the company doing the protocol design and implementation. (Google knows exactly how much revenue every millisecond of extra search latency costs). So your "marginally better" cut isn't anywhere near as incisive as you think. It also cuts the other way: if SPDY really had been making those metrics worse like one would expect from your initial claims about H2 performing worse than H1, it would not have launched. It would not have mattered one bit whether the team designing the protocol wanted it deployed for selfish reasons, they would not have gotten launch approval for something costing tens or hundreds of millions in lost revenue due to worse service.

Third, you're only concerned with the downside of H2. In particular, HPACK compression of request headers was a really big deal given the asymmetrically slow uplink connections of the time, and it fundamentally depends on multiplexing all the requests over the same connection. So then it's a tradeoff: when deciding whether to multiplex all the traffic over a single connection, is the higher vulnerability to packet loss (in terms of head-of-line blocking and impact on the TCP congestion window) worth the benefits (HPACK, only needing to do one TLS handshake)?
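To illustrate why the shared connection matters for header compression, here's a toy sketch of HPACK's core idea; it is not the real wire format, just a hypothetical per-connection dynamic table:

  # Toy illustration of HPACK's core idea (not the real HPACK wire encoding):
  # both ends of a connection keep a shared dynamic table, so a header that
  # repeats across multiplexed requests is sent literally once and then
  # referenced by a small index. Split the requests across six separate
  # connections and each one starts over with an empty table.
  class ToyHeaderCompressor:
      def __init__(self):
          self.table = []                                  # per-connection dynamic table

      def encode(self, headers):
          symbols = []
          for entry in headers:
              if entry in self.table:
                  symbols.append(("idx", self.table.index(entry)))   # a byte or two on the wire
              else:
                  symbols.append(("lit", *entry))                    # full literal, sent once
                  self.table.append(entry)
          return symbols

  enc = ToyHeaderCompressor()
  request = [(":authority", "example.com"),
             ("user-agent", "Mozilla/5.0 (long UA string)"),
             ("cookie", "session=abcdef0123456789")]
  print(enc.encode(request))   # first request: all literals
  print(enc.encode(request))   # later requests on the same connection: tiny index references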


> The main problem you're talking of is head of line blocking due to packet loss. But packet loss as a congestion signal is nowhere near as common as people think

Surely packet loss due to poor signal quality is rather common over mobile networks and that packet loss still affects TCP's congestion window.

Admittedly anecdotal, but I just connected to a 5G network with low signal strength and it certainly seems to be the case.


> Surely packet loss due to poor signal quality is rather common over mobile networks and that packet loss still affects TCP's congestion window.

Two points:

1. It's not commonly realised that TCP is terrible on lossy networks, where terrible means it gets less than 10% of the potential throughput. It only becomes apparent when you try to use TCP over a lossy network, of course, and most real networks we use aren't lossy. Engineers who try to use TCP over lossy networks end up replacing it with something else. FWIW, the problem is that TCP uses packet loss as a congestion signal. It handles congestion pretty well by backing off. But packet loss can also mean the packet was corrupted or lost on the link rather than dropped by a queue. The right responses in that case are to reduce the packet size and/or increase error correction, but _not_ to decrease your transmission rate. Thus the two correct responses to the same signal conflict.
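For a rough sense of scale, here's a back-of-the-envelope sketch using the standard Mathis approximation for loss-limited Reno throughput; the MSS, RTT, and link rate are assumed values chosen purely for illustration:

  import math

  def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate):
      """Approximate steady-state TCP Reno throughput (Mathis et al.):
      rate ~= (MSS / RTT) * (C / sqrt(p)), with C ~= 1.22."""
      C = 1.22
      return (mss_bytes * 8 / rtt_s) * (C / math.sqrt(loss_rate))

  LINK_BPS = 50e6                       # assume a 50 Mbit/s radio link, 60 ms RTT
  for p in (1e-5, 1e-3, 1e-2):          # every loss is read as "congestion, back off"
      rate = min(mathis_throughput_bps(1460, 0.060, p), LINK_BPS)
      print(f"loss rate {p:g}: ~{rate / 1e6:5.1f} Mbit/s ({100 * rate / LINK_BPS:3.0f}% of the link)")

At a 1% loss rate the connection is down to a few percent of the link, which is the "less than 10%" collapse described above.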

2. Because of that, the layer-two networks the internet runs over have evolved to have really low error rates, which is why most people don't experience TCP's problems in that area. As it happens, just about any sort of wireless link has really high error rates, so the link layer has to mask them. And it does, by applying lots of ECC and doing its own ACK/NAKs. This can create lots of fluctuation in available bandwidth, but that is what TCP is good at handling.

By the by, there's another reason we have come to depend on really low error rates at layer 2: TCP's error detection is poor. It lets roughly one corrupted packet in every 10,000 through undetected (the 16-bit Internet checksum is weak, particularly against real-world error patterns). You can be sending 100,000 packets a second at 1 Gb/s, so you need to keep the underlying error rate very low to ensure the backup you are sending to Backblaze isn't mysteriously corrupted a few times a year. <rant>IMO, we should have switched to 64-bit CRCs decades ago.</rant>
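Here's the rough arithmetic behind that worry, with an assumed residual link-layer corruption rate (my own made-up figure, not from the comment above):

  # Back-of-the-envelope: how often does a corrupted segment slip past a 16-bit
  # checksum versus a 64-bit CRC? The corruption rate is an assumed figure.
  packets_per_sec = 100_000          # ~1 Gb/s of ~1250-byte packets
  corrupted_fraction = 1e-7          # assumed residual corruption rate reaching TCP
  seconds_per_year = 365 * 24 * 3600

  undetected_16 = 2 ** -16           # chance a random corruption passes a 16-bit check
  undetected_64 = 2 ** -64

  corrupted_per_year = packets_per_sec * corrupted_fraction * seconds_per_year
  print(f"corrupted segments reaching TCP per year: {corrupted_per_year:,.0f}")
  print(f"slipping past a 16-bit checksum per year: {corrupted_per_year * undetected_16:,.1f}")
  print(f"slipping past a 64-bit CRC per year:      {corrupted_per_year * undetected_64:.1e}")

With those assumptions, a handful of corrupted segments a year get past the 16-bit checksum, while a 64-bit CRC would make undetected corruption effectively impossible.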


It isn't. Or at least wasn't back in the UMTS / early LTE era that's being discussed; I got out of that game before 5G.

The base stations and terminals are constantly monitoring the signal quality and adjusting the error correction rate. A bad signal will mean that there's more error correction overhead, and that's why the connection is slower overall.

Second, the radio protocol doesn't treat data transmissions as a one-and-done deal. The protocols are built on the assumption of a high rate of unrecoverable transmission errors. Those error rates would be way too high for TCP to be able to function, so retransmissions are instead baked in at the physical protocol level. The cellular base station will basically buffer the data until it's been reliably delivered or until the client moves to a different cell.

And crucially, not only is the physical protocol reliable, it's also in-sequence. A packet that wasn't received successfully shows up only as a latency blip of one radio round trip during which no data arrives at the client, not as packet loss or reordering visible to the TCP stack on either side.
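A toy sketch of that behaviour (not any real cellular stack; the loss rate and air-interface timing are invented for illustration):

  import random

  # Toy model: a link layer that retransmits lost frames and delivers them in
  # order. The endpoint above it never sees loss or reordering, only the
  # occasional latency spike while a frame is being retried over the air.
  random.seed(1)
  FRAME_LOSS = 0.30      # assumed per-attempt loss rate over the air
  AIR_RTT_MS = 10        # assumed radio-level retransmission round trip

  def deliver(frame_id):
      attempts = 1
      while random.random() < FRAME_LOSS:    # lost over the air: the link layer retries
          attempts += 1
      return frame_id, attempts * AIR_RTT_MS # retries surface only as added delay

  for frame_id, delay_ms in map(deliver, range(8)):
      print(f"frame {frame_id}: delivered in order after {delay_ms} ms")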

Other error cases you'd get are:

- Massive amounts of queueing, with the queues being per-user rather than per-base station (we measured up to a minute of queueing in testing). The queue times would translate directly to latency, completely dominating any other component.

- A catastrophic loss of all in-flight packets at once, which we believed was generally caused by handover from one cell to another.


Everyone who uses a phone knows that what you are saying is not true. Otherwise we would not experience dropped calls, connection resets, and mobile data being unavailable. Mobile networks are unreliable, and you can't paper that over with some magic at the TCP or HTTP/2/3 level. EDIT: better yet, anyone can use network tools on their smartphone to see for themselves that mobile networks drop TCP, UDP, and ICMP packets very freely. Just check for yourself!


Huh? I'm not talking about papering over it on the TCP or HTTP/2 level. I'm talking about the actual physical radio protocols, and my message could not be more explicit about it.

If you don't understand something, it'd be more productive to ask questions than just make shit up like that.


You made a claim that packet loss in mobile networks is not a common occurrence. This claim is patently wrong and anyone with a smartphone can see for themselves.


In reality it's quite hard for somebody to observe that themselves using just their smartphone. The only way they can do it is by getting a packet trace, which they won't have the permissions to capture on the phone, nor the skill to interpret. (Ideally they'd want to get packet traces from multiple points in the network to understand where the anomalies are happening, but that's even harder.)

In principle you could observe it from some kind of OS level counters that aren't ACLd, but in practice the counters are not reliable enough for that.

Now, things like "calls dropping" or "connections getting reset" that you're calling out have nothing to do with packet loss. It's pretty obvious that you're not very technical and think that all error conditions are just the same thing that you can mix together. But what comes out is just technobabble.


Modern mobile networks use exactly the same protocol to carry voice and data, because voice is just data. When your call is fading or cutting out, packets are being dropped. In that situation the packets of your mobile data, for instance a web page being loaded by a browser, are also being dropped. Mobile networks drop packets left and right when reception deteriorates or there are too many subscribers trying to use the shared radio channel. And HTTP/2 or 3 can't do much about it, because it's not magic: if you lose data you need to retransmit it, which TCP and HTTP/1.1 can do just as well. BTW, UMTS, which you claim you were so professionally involved in, also uses a converged backbone and carries both data and voice the same way, so you should have known that already lol :)


But I am not saying that HTTP/2 or HTTP/3 are magic that fix packet loss in mobile networks.

I'm saying that from the point of view of either endpoint, there is very little packet loss in mobile networks, because of error correction and retransmissions being handled at the physical layer. This is the third time I've written it. Both previous times you've not answered that, and instead made up a strawman about HTTP/2 and magic. Why do you keep doing that?

Do you not believe that cellular radio protocols do error correction? Or that they do retransmissions at that level, rather than just try transmitting each packet once and then give up?


The parent is just moving the goalposts. The whole idea behind multiplexing data streams inside a single TCP connection was that in case of packet loss you don't lose all your streams. But it doesn't work in practice, which is not really surprising when you think about it. When you have multiple TCP connections, it's less likely that all of them will get reset due to connectivity issues. Whereas with multiplexing, when your single TCP connection is reset, all your data flow stops and needs to be restarted.


A problem with the radio signal or with a wire would affect all TCP connections at the same time. It does not matter whether there is one connection or many; the outcome will be the same. I believe in real life this covers the majority of cases. A problem affecting just one TCP connection out of many on the same link must be related to the software on the other side, not the network itself.


So again, why is HTTP/3 being pushed if HTTP/2 was meant to be the holy grail? It seems that even Google doesn't consider HTTP/2 to be that great.


Umm... Like, pretty clearly H2 wasn't meant to be the Holy Grail? Not sure where you're getting that from. (Though as an aside, it feels like you've now backtracked from "H2 is a failure that's worse than H1" through "H2 was a marginal improvement" to "H2 wasn't the holy grail".)

It didn't need to be the Holy Grail to be worth creating. It just needed to be better than H1 was, or better than what H1 could be evolved to with the same amount of effort. And likewise, it's totally possible for H2 to be better than H1 while also H3 is better than H2.

You appear to be confused by the idea that somebody would ever create something other than the absolute best possible thing. Why create H2 first, rather than just jump straight to H3?

One obvious reason is that H2 is far less complex than H3 thanks to piggybacking on TCP as the transport mechanism. The downside is that you then have to deal with all the problems of TCP as well. At the time, there would have been a lot of optimism about eliminating those problems by evolving TCP. An extension for connection migration, an extension for 0-RTT connection setup, an extension for useful out of order data delivery, etc.

It was only a bit later that it became clear just how ossified TCP was. Up until then, one could have told a story about how Microsoft controlled the primary client operating system and wasn't really motivated to implement the RFCs in a timely manner, and that's why the feature rollouts were taking a decade. In the 2010s, it became clear that evolution was impossible even when all the coordination and motivation were there. See TCP Fast Open for a practical example.
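For context on how small the endpoint-side change was compared to how badly the rollout went, here's a minimal sketch of opting a server socket into TCP Fast Open, assuming a Linux host with the relevant sysctl enabled; it's illustrative, not anyone's production configuration:

  import socket

  # Minimal sketch: a listening socket that accepts TCP Fast Open cookies on
  # Linux (requires net.ipv4.tcp_fastopen to permit server-side TFO). The
  # option value is the queue length for pending TFO connection requests.
  srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
  srv.setsockopt(socket.IPPROTO_TCP, socket.TCP_FASTOPEN, 16)
  srv.bind(("0.0.0.0", 8443))
  srv.listen()
  # Despite being this simple to enable on the endpoints, middleboxes that
  # dropped or mangled unfamiliar SYN options made TFO unreliable to deploy.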

So around 2015-ish you see big tech companies switch to UDP-based protocols just so that they can actually do protocol evolution.

The other plausible reason is that it's going to be far easier to get the initial work for making H2 funded, since the scope is more limited. And once you show the real-world gains (which, again, would have been there since H2 is better than H1), you then have the credibility to get funding for a more ambitious project.


> It was only a bit later that it became clear just how ossified TCP was.

It has almost _always_ been tied to the OS, and moreover to the OS of every node between you and the web server. That was the most frustrating thing: there were, and are, solutions for making TCP more latency resistant while also getting better throughput and dealing with "bufferbloat", which was a big thing at the time.

I was working in transcontinental bulk transfer, which used to mean that things like Aspera/FASP were the de facto standard for fast, latency- and loss-resistant transport. So I had seen this first hand. I suspect that's probably why I was dismissed: I wasn't a dyed-in-the-wool webdev.


That's quite the history lesson, thx for the info.

I agree that H2 is de facto better than H1, and easier to implement compared to H3. However, I'll call out the two biggest time sinks of the RFC: stream prioritisation and server push. Both had narrow application and an incomplete/inefficient specification, and H3 seems to have ditched both. My question is: how did these ever end up in the final RFC? Both seem like the kind of thing that could have been easily disproved in SPDY deployments, or just by asking people who do HTTP for a living.


Oh, such typical goalpost-moving. Once your little project failed to deliver, you claim it wasn't really meant to provide a revolutionary improvement.


> -highly accomplished and highly experienced engineers were actually too stupid

> -these highly experienced guys actually knew what they were doing

> What seems more likely?

Well, when you put it that way... the former. By a large margin.


> You have two options: -highly accomplished and highly experienced engineers were actually too stupid (...)

You're making it quite clear that you are the type of person who is extremely quick to accuse anyone and everyone of being incompetent, in the absence of evidence or in spite of it.

You do not need to Google too hard to find tons of open-source benchmarks of real world servers showing off performance gains from switching to HTTP/2 and HTTP/3.

But here you are, claiming everyone is incompetent and that their work was bad. In spite of all the evidence.

It's clear that you have nothing relevant to say about the topic and no evidence to even suggest your beliefs have a leg to stand on.



