Discovering Issues in HTTP/2 with Chaos Testing (twilio.com)
110 points by gregorymichael on Feb 13, 2018 | 10 comments



I'm not sure if the behaviour they encountered in testing generalises to actual production environments.

Specifically:

> Penalty #2: TCP window size will drop dramatically, and all streams will be simultaneously throttled down.

This implies that, for both HTTP/1 and HTTP/2, a connection's throughput is capped by the lowest rate it gets throttled down to. Since an HTTP/2 connection carries multiple streams, and therefore more packets, it is more likely to be hit by a dropped packet.

That scenario doesn't seem quite realistic, because it assumes that dropped packets are independent (3% in the test). The rate of injected errors was apparently also not adjusted for any of the factors that would affect errors in the real world, such as packet size and (attempted) speed.

But reducing sizes and throttling are direct countermeasures intended to provide maximum throughput even on flaky connections. Since any connection problems are bound to be highly correlated per client, it would seem that HTTP/2 may actually perform better than HTTP/1, because the "information" about connection troubles, and the countermeasures, are applied to a larger number of streams.
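
To put rough numbers on the "more packets, more likely to be hit" point: a minimal sketch, assuming independent drops at the 3% rate from the test. The packet and stream counts are made up for illustration, and the function name is mine.

```python
def p_at_least_one_loss(loss_rate: float, packets: int) -> float:
    """Probability that at least one of `packets` packets is dropped,
    assuming every drop is independent (the assumption questioned above)."""
    return 1.0 - (1.0 - loss_rate) ** packets


LOSS_RATE = 0.03         # 3%, as in the test
PACKETS_PER_STREAM = 10  # hypothetical
STREAMS = 6              # hypothetical

# One HTTP/1 connection carrying a single stream's packets.
p_http1_conn = p_at_least_one_loss(LOSS_RATE, PACKETS_PER_STREAM)

# One HTTP/2 connection carrying the packets of all streams.
p_http2_conn = p_at_least_one_loss(LOSS_RATE, PACKETS_PER_STREAM * STREAMS)

print(f"HTTP/1 connection hit: {p_http1_conn:.3f}")
print(f"HTTP/2 connection hit: {p_http2_conn:.3f}")
```

Under independence the single HTTP/2 connection is hit far more often than any one HTTP/1 connection. If drops are correlated per client, this calculation no longer applies.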


hmm... interesting thought. Any idea how one would actually test that?


I think making each connection amenable to throttling and the other countermeasures clients typically use is most important, although I don't know enough about networks to be more specific. If it's easier, making every connection error exactly once may give a first idea of possible changes to the results.

A client-specific, randomised error rate would be the second step. Without this, the test is still meaningful. But it measures the system's response to internal failures, not to a typical production environment where most clients see 0 dropped packets, and a small number encounter most of the errors. I'm not sure how dropped packets are distributed, but I would guess 99.9%+ of clients have none, yet those that see any might actually get an error rate far higher than 3%.
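
A rough sketch of what a client-specific, randomised error rate could look like in a test harness. All numbers (0.1% of clients affected, 30% loss for those clients, 100 packets each) are guesses for illustration, not measurements, and the function names are hypothetical.

```python
import random


def assign_loss_rates(n_clients, affected_fraction, affected_loss_rate, rng):
    """Give each client its own loss rate: most get 0.0, a few get a high rate."""
    return [
        affected_loss_rate if rng.random() < affected_fraction else 0.0
        for _ in range(n_clients)
    ]


def count_drops(loss_rate, packets, rng):
    """Number of dropped packets for one client sending `packets` packets."""
    if loss_rate == 0.0:
        return 0
    return sum(1 for _ in range(packets) if rng.random() < loss_rate)


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical parameters: 0.1% of clients are on a bad link with 30% loss.
rates = assign_loss_rates(10_000, 0.001, 0.30, rng)
drops = [count_drops(rate, 100, rng) for rate in rates]

affected = sum(1 for rate in rates if rate > 0.0)
print(f"clients with a lossy link: {affected} of {len(rates)}")
print(f"total dropped packets:     {sum(drops)}")
```

Feeding per-client rates like these into the fault injector, instead of one global 3%, would concentrate the drops on a few connections, which is closer to the distribution guessed at above.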


Looks like they are encountering the issue of head-of-line blocking. Since HTTP/2 multiplexes streams on a single TCP connection, all streams suffer when packet loss occurs. This is a well-known flaw in HTTP/2. HTTP/1 clients typically open multiple TCP connections; if one connection needs to wait for retransmissions, the others are not impeded.
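
A deliberately crude toy model of that difference, assuming each lost packet stalls its TCP connection for one retransmission round trip and that stalls never overlap. It ignores congestion window changes entirely; the function name and loss counts are made up.

```python
def stalls_per_stream(losses_per_stream, shared_connection):
    """Return how many retransmission stalls each stream sits through.

    losses_per_stream: lost-packet count for each stream.
    shared_connection: True models all streams on one TCP connection
    (HTTP/2 style), False models one connection per stream (HTTP/1 style).
    """
    if shared_connection:
        # Every loss blocks delivery for all streams on the connection.
        total = sum(losses_per_stream)
        return [total] * len(losses_per_stream)
    # Each stream only waits for losses on its own connection.
    return list(losses_per_stream)


losses = [0, 2, 0, 1]  # hypothetical losses for four streams

print("shared connection:  ", stalls_per_stream(losses, shared_connection=True))
print("separate connections:", stalls_per_stream(losses, shared_connection=False))
```

In this model the streams that lost nothing still wait on the shared connection, while on separate connections they are unaffected.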


I suspect that QUIC would address some of these issues; however, I also get the feeling that QUIC is just a reimplementation of TCP/IP in userland. Sure, QUIC has some nice features that TCP/IP either doesn't have or implements poorly: priority, server-initiated stream push. But others, like different streams having different packet loss recovery (normally done with TCP, or with UDP when packet loss isn't critical), are already achieved with the layer 3 / layer 4 split.

The one benefit I can see with QUIC is that, because it is a user-level protocol, it is more likely to be updated. Many embedded or IoT devices never get kernel updates, although they sometimes get user-level software updates.


Interesting find. On the other hand, if anyone would like to test the H2O web server as well, I'm curious how it handles this.

https://github.com/h2o/h2o

There is even an issue requesting that HTTP/2 features be implemented in HTTP/1 for better performance.


Sorry, I left out the link:

https://github.com/h2o/h2o/issues/1601


I wonder if BBR congestion control could mitigate the effect of packet loss here.


Do we think the likelihood of packet loss is the same for HTTP/1 and HTTP/2?


Packet loss happens at a lower layer (Ethernet, Wi-Fi, mobile, etc.). It's a property of the physical medium, not of whether HTTP/1.1 or HTTP/2 is running at the higher layer.

The point of this blog post is that the design of HTTP/2 (specifically, multiplexing multiple HTTP transfers over a single TCP connection) behaves badly under packet loss.



