This isn't remotely a bug; it's not even really anything to do with browsers. The browser will be doing (roughly) the following:
    while pending_requests():
        send_request()
        read_response()
But what send_request and read_response are doing is putting data on the OS's outbound queue and then attempting to get data from the inbound queue. If the data is already in the inbound queue before the request is put on the outbound queue, it doesn't matter - the browser is not aware of this fact. So long as the "responses" don't come in faster than the browser is sending requests (overfilling the queue), and so long as the responses arrive in the same order as the browser sends its requests, this technique will work. In general this is just an "optimistic strategy".
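Concretely, the loop might look something like this with real sockets (a minimal sketch; the function name is my own invention). sendall() just hands bytes to the OS's outbound queue, and recv() returns whatever the OS has already buffered inbound - including a response that arrived "early":

    import socket

    def fetch(host, path):
        s = socket.create_connection((host, 80))
        # send_request(): the bytes just go onto the OS's outbound queue
        s.sendall(f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode())
        # read_response(): drain the inbound queue until the server closes
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
        s.close()
        return b"".join(chunks)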
I sure would have been happy to have you on our team in '95; it wasn't obvious to me at all. Afterwards, of course, it was obvious. But I'm always a little bit suspicious of myself when I think, after being shown how something works, that it is obvious.
Glad to hear you say that. The part of RFC 1945 you quoted seems pretty "clear" to me that the server can't send a response prior to receiving the request. It's pretty "obvious" to me that if the client validates responses and can't find an associated request you're going to have problems, and relying on undocumented behaviour is a bad idea.
In my mind the fact it works seems like dumb luck, and I would never have thought to try it -- all of which is pretty depressing seeing as how evidently it made you lots of money, which is certainly something I could do with :) In hindsight, it makes perfect sense to exploit a simple solution with low implementation costs even if it has an unknown lifespan or potential risks. If it breaks, you're no worse off than you were before (well, maybe not if your clients have come to depend on it and you don't have a backup); if it doesn't, great!
I can't tell you how many nights of sleep I lost whenever a new browser release by one of the larger browser manufacturers was announced. Every time I was sure our house of cards would come tumbling down but it never did!
Yes, I think this is a good example of the fact that "obvious" and "obvious in retrospect" are two very different things. In other words, hindsight is 20/20.
How would it even validate that the request was sent first? It'd have to keep looking at the receive buffers and make sure they stay empty until it knows it has actually sent the last packet. Right?
It sounds like it'd be hard to implement and for what benefit?
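If you did want that check, the closest thing would probably be a zero-timeout poll just before sending (a sketch only; the function name is invented):

    import select, socket

    def data_arrived_early(sock: socket.socket) -> bool:
        # Zero-timeout select() returns immediately, reporting whether any
        # bytes are already sitting in this socket's receive buffer.
        readable, _, _ = select.select([sock], [], [], 0)
        return bool(readable)

And even that only samples the buffer at one instant; data could land the moment after you looked.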
That was the million dollar question and I gambled that I understood enough of the implementation details that it would be impossible to close the hole. That didn't stop me from living in fear of just that :)
Well, coming up with the strategy (and actually implementing it -- edge cases abound) is the hard part. However, the mechanism by which it works isn't a bug.
So once the initial request is made you can push anything you like to the browser using this pipelining method? What is stopping the responses from coming in too quickly and "overfilling" the queue? Making it work doesn't seem too hard, but aren't there possibilities for exploits if you're loading unrequested data into memory?
It doesn't work that way. All this buffering happens at the OS level, which is 100% unaware of HTTP. The OS just sees a TCP connection, which is simply a pair of unidirectional streams. It buffers data in both directions to decouple the application from the network; this is necessary to keep data flowing at a reasonable rate.
"Early" responses just sit in this buffer until the application (the browser) gets around to reading them, presumably after it's finished sending the request. The size of this buffer is advertised by the client's OS to the server's OS. A well-behaved server will stop sending data when the client's buffer is full. If the server is not well-behaved, the client's OS just drops further packets (which the server will re-send later). The browser is not aware of any of this (the toy below makes it concrete).
If, on the other hand, the server sends a wholly unsolicited response, well, it will still sit in the buffer as long as the client only reads data after sending a request. But say the client is designed to process incoming responses regardless of whether it sent a request: sure, there could be an exploit there, but that's no different from any other buggy network code.
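Here's a self-contained toy (everything here is invented for illustration) showing the benign case: the server writes its response before it ever reads the request, and the client still sees a perfectly ordinary exchange because the OS buffered the early bytes:

    import socket, threading, time

    def eager_server(srv):
        conn, _ = srv.accept()
        # Response goes out *before* the request is read...
        conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nhello")
        conn.recv(4096)   # ...and the request is drained afterwards
        conn.close()

    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    threading.Thread(target=eager_server, args=(srv,), daemon=True).start()

    c = socket.create_connection(srv.getsockname())
    time.sleep(0.5)                       # the response arrives during this pause
    c.sendall(b"GET / HTTP/1.0\r\n\r\n")  # the request goes out "late"
    print(c.recv(4096).decode())          # the buffered early response reads fine
    c.close()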
Potentially, you could imagine a poorly written web browser that could be fooled by an extra HTTP response, confusing it with the response to a later request to a different web site.
So you could follow up an HTTP response for http://empty.website.with.no.other.files.to.request.com/ with an HTTP response containing malicious JavaScript (sketched below). When the user then tries to view a different website (say, their Facebook page), they get 'served' your JavaScript, which is now running in the context of the new site. Cookie stealing and other attacks could be run.
In practice it would be unlikely to happen. If the client doesn't read the second response, it is likely to sit in a network buffer assigned to the first connection, which will probably be thrown away when it comes time to open a connection to the new site.
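Concretely, the attack would amount to something like this on the server side (a sketch with a deliberately harmless payload; all names, ports, and sizes are invented):

    import socket

    REAL = b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok"
    # A second, unsolicited response that a confused client might
    # attribute to its *next* request:
    EVIL = (b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n"
            b"Content-Length: 28\r\n\r\n<script>alert('x')</script>\n")

    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", 8080))
    srv.listen(1)
    conn, _ = srv.accept()
    conn.recv(4096)            # read the one genuine request
    conn.sendall(REAL + EVIL)  # answer it, then smuggle the extra response
    conn.close()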
That's not how it would work. The later request would be made on a fresh TCP connection (since it's a different site). Your previous unsolicited response is sitting in the buffers for the old TCP connection. They would not mingle.
I think it's only checking the incoming buffer for responses that match the request it just sent out. Assuming you know what the request is going to be, you can craft the response packet beforehand and bet that the request will be made before the response actually arrives. I'm not super familiar with the nitty-gritty details, but at the very least I think you'd need to hijack the TCP connection to inject malicious responses.
What happens if you request something and get something different in response? The client has no idea about the buffering so it's just a case of whether it's smart enough to handle a misbehaving server.
I'm not sure which implementation you're referring to; in stevejones's example, incoming data remains buffered by the OS until the client specifically requests one response's worth in read_response(). If the OS's buffer ever gets full, it will signal the server's OS to stop sending data; if it continues to get packets it will simply drop them (thus minimizing resource usage).
It's of course totally possible to make a client that reads responses regardless of whether it sent a request, but that's rather silly, as giving up flow control like that immediately opens your application up to a DoS attack. (Of course, just because it's silly doesn't mean no-one does it!)
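For what it's worth, you can watch that flow control kick in by shrinking your receive buffer and never reading (a sketch; the address and resource are placeholders, and the OS may round the buffer size up):

    import socket

    c = socket.socket()
    # Must be set before connect(); this caps the TCP window we advertise.
    c.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4096)
    c.connect(("localhost", 8080))
    c.sendall(b"GET /huge-file HTTP/1.0\r\n\r\n")
    # By never calling c.recv(), the buffer fills, the advertised window
    # drops to zero, and the server's OS stops sending -- the server
    # application ends up blocked in its write, with no effort on our part.
    input("sender is now stalled; press Enter to exit\n")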
Ah, I see, you're right, this assumes 'blocking' code.
I wonder if that's really how browsers work or if they employ an array of open connections that are periodically polled for responses to outstanding requests.
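Something along those lines might look like this (a sketch only; the host and paths are made up):

    import select, socket

    # Several requests in flight at once, each on its own connection,
    # with responses read whenever they become available.
    conns = []
    for path in ("/a.png", "/b.png", "/c.png"):
        s = socket.create_connection(("example.com", 80))
        s.sendall(f"GET {path} HTTP/1.0\r\nHost: example.com\r\n\r\n".encode())
        conns.append(s)

    while conns:
        readable, _, _ = select.select(conns, [], [])
        for s in readable:
            data = s.recv(4096)
            if not data:          # server closed the connection: response done
                s.close()
                conns.remove(s)
            # (a real browser would parse/accumulate `data` here)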
Obviously, responding to something that wasn't requested is a bad idea.
Flow control only works for large responses (which is good, because that is at least one resource that you can protect). It makes you wonder what you could do with multiple answers small enough to fit in the same window, and whether that would let you identify HTTP implementations that have taken 'asynchronous' one step too far.
Surely reading the responses opens the browser up to active exploits, whilst simply buffering unrequested "responses" allows the possibility of denial of service: filling the buffer causes actively requested packets to be refused.
So, presumably, if someone requests anything from your site you can keep bombarding their browser with unrequested content that will get queued. As jacquesm indicates, an array of connections, each with its own queue, would keep this from blocking requested content.
In the OS, the buffers are per TCP connection. Only packets destined for a full buffer are dropped. TCP connections can't be opened by a remote attacker unless the browser is actively listening for them.
So effectively all that can happen is a website can DoS the connection to itself.
(Yes, an attacker can try to initiate many many TCP connections. This uses much fewer resources than an actual HTTP session would but can still be an effective DoS attack. This is known as a SYN flood.)