This isn't remotely a bug; it isn't even really anything to do with browsers. The browser will be doing (roughly) the following:
    while pending_requests():
        send_request()    # put data on the OS's outbound queue
        read_response()   # pull data from the OS's inbound queue
But what send_request and read_response are doing is putting data on the OS's outbound queue and then attempting to get data from the inbound queue. If the data is already in the inbound queue before the request is put on the outbound queue, it doesn't matter: the browser is not aware of this fact. So long as the "responses" don't come in faster than the browser is sending requests (and so overfill the queue), and so long as the responses arrive in the order the browser is sending requests, this technique will work. In general this is just an optimistic strategy.
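To make that concrete, here's a minimal sketch in Python (loopback, blocking sockets; the payload and URL are made up) where the "server" writes its response before it ever reads the request. The client's send-then-read loop is none the wiser:

    import socket
    import threading

    # the "server": respond first, read the request afterwards
    def early_server(srv):
        conn, _ = srv.accept()
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi")
        conn.recv(65536)   # only now bother to read the request
        conn.close()

    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    threading.Thread(target=early_server, args=(srv,)).start()

    # the "browser": send_request(), then read_response()
    cli = socket.create_connection(srv.getsockname())
    cli.sendall(b"GET / HTTP/1.1\r\nHost: x\r\n\r\n")
    print(cli.recv(65536))  # the response was already waiting in the OS buffer

The client prints the response just as if the timing had been normal; nothing at the socket API level reveals that the bytes arrived early.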
I sure would have been happy to have you on our team in '95; it wasn't obvious to me at all. Afterwards, of course, it was obvious. But I'm always a little bit suspicious of myself when, after I'm presented with how something works, I think it's obvious.
Glad to hear you say that. The part of RFC 1945 you quoted seems pretty "clear" to me that the server can't send a response prior to receiving the request. It's pretty "obvious" to me that if the client validates responses and can't find an associated request you're going to have problems, and relying on undocumented behaviour is a bad idea.
In my mind the fact that it works seems like dumb luck and I would never have thought to try it -- all of which is pretty depressing, seeing as how it evidently made you lots of money, which is certainly something I could do with :) In hindsight, it makes perfect sense to exploit a simple solution with low implementation costs, even if it has an unknown lifespan or potential risks. If it breaks, you're no worse off than you were before (well, maybe not if your clients have come to depend on it and you don't have a backup); if it doesn't, great!
I can't tell you how many nights of sleep I lost whenever a new browser release by one of the larger browser manufacturers was announced. Every time I was sure our house of cards would come tumbling down but it never did!
Yes, I think this is a good example of the fact that "obvious" and "obvious in retrospect" are two very different things. In other words, hindsight is 20/20.
How would it even validate the request was sent first? It'd have to keep looking at the receive buffers and make sure they're always zero until it knows it actually sent the last packet. Right?
It sounds like it'd be hard to implement and for what benefit?
That was the million dollar question and I gambled that I understood enough of the implementation details that it would be impossible to close the hole. That didn't stop me from living in fear of just that :)
Well, coming up with the strategy (and actually implementing it; edge cases abound) is the hard part. However, the mechanism by which it works isn't a bug.
So once the initial request is made you can push anything you like to the browser using this pipelining method? What is stopping the responses from coming in too quickly and "overfilling" the queue? Making it work doesn't seem too hard, but aren't there possibilities for exploits if you're loading unrequested data into memory?
It doesn't work that way. All this buffering happens at the OS level, which is 100% unaware of HTTP. The OS just sees a TCP connection, which is simply a pair of unidirectional streams. It buffers data in both directions to decouple the application from the network; this is necessary to keep data flowing at a reasonable rate.
"Early" responses just sit in this buffer until the application (the browser) gets around to reading them; presumably after it's finished sending the request. The size of this buffer is advertised by the client's OS to the server's OS. A well-behaved server will stop sending data when the client's buffer is full. If the server is not well-behaved, the client just drops future packets (which the server will re-send later). The client is not aware of any of this.
If, on the other hand, the server sends a wholly unsolicited response, it will still sit in the buffer if the client only reads data after sending a request. But if the client is designed to process incoming responses regardless of whether it sent a request, then sure, there could be an exploit there, but that's no different from any other buggy network code.
Potentially, you could imagine a poorly-written web browser that could be fooled by an extra HTTP response, confusing it with a later request to a different web site.
So you could follow up an HTTP response for http://empty.website.with.no.other.files.to.request.com/ with an HTTP response containing malicious javascript. When the user then tries to view a different website (say their Facebook page), they get 'served' your javascript, which now runs in the context of the new site. Cookie stealing and other attacks could be run.
In practice it would be unlikely to happen. If the client doesn't read the 2nd response, it is likely to be sitting in a network buffer assigned to the first connection, which will probably be thrown away when it comes time to open a connection to the new site.
That's not how it would work. The later request would be made on a fresh TCP connection (since it's a different URL). Your previous unsolicited response is sitting in the buffers for the old TCP connection. They would not mingle.
I think it's only checking the incoming buffer for responses that match the request they just sent out. Assuming you know what the request is going to be, you can craft the response packet beforehand and assume that the request will be made before the response actually arrives. I'm not super familiar with the nitty-gritty details, but at the very least I think you'd need to hijack the TCP connection to inject malicious responses.
What happens if you request something and get something different in response? The client has no idea about the buffering so it's just a case of whether it's smart enough to handle a misbehaving server.
I'm not sure which implementation you're referring to; in stevejones's example, incoming data remains buffered by the OS until the client specifically requests one response's worth in read_response(). If the OS's buffer ever gets full, it will signal the server's OS to stop sending data; if it continues to get packets it will simply drop them (thus minimizing resource usage).
It's of course totally possible to make a client that reads responses regardless of whether it sent a request, but that's rather silly, as giving up flow control like that immediately opens your application up to a DoS attack. (Of course, just because it's silly doesn't mean no-one does it!)
Ah, I see, you're right, this assumes 'blocking' code.
I wonder if that's really how browsers work, or if they employ an array of open connections that are periodically polled for responses to outstanding requests.
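The polled variant is easy enough to sketch in Python with select(); whether any given browser actually worked this way is another matter (handle_response_data and the set of in-flight connections are made up here):

    import select

    # conns: open sockets that each have a request in flight (assumption)
    def poll_responses(conns, timeout=0.1):
        readable, _, _ = select.select(conns, [], [], timeout)
        for s in readable:
            data = s.recv(65536)  # whatever the OS has buffered so far
            if data:
                handle_response_data(s, data)  # hypothetical callback

Note that this changes nothing for the trick: early bytes still sit in the OS buffer until select() reports them, so a polling browser is just as oblivious as a blocking one.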
Obviously, responding to something that wasn't requested is a bad idea.
Flow control only works for large responses (which is good, because that is at least one resource you can protect). It makes you wonder what you could do with multiple answers small enough to fit in the same window, and whether that would let you identify HTTP implementations that have taken 'asynchronous' one step too far.
Surely reading the responses opens the browser up to active exploits, whilst simply buffering unrequested "responses" allows the possibility of denial of service by filling the buffer and causing actively requested packets to be refused.
So, presumably, if someone requests anything from your site you can keep bombarding their browser with unrequested content that will get queued. As jacquesm indicates, having an array of connections with reserved queues would avoid this blocking requested content.
In the OS, the buffers are per TCP connection. Only packets destined for the full buffer are dropped. TCP connections can't be opened by a remote attacker unless the browser is actively listening for them.
So effectively all that can happen is a website can DoS the connection to itself.
(Yes, an attacker can try to initiate many, many TCP connections. This uses far fewer resources than an actual HTTP session would, but can still be an effective DoS attack. This is known as a SYN flood.)
An alternative way to pump lots of webcam frames was to use multipart-MIME responses. That way, there was only one HTTP request and the response just streamed JPEG images, one after the other. No need to break any specifications to get full network usage.
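For reference, such a response looks something like the sketch below (using Python's standard http.server purely for illustration; grab_frame is a hypothetical function returning one JPEG as bytes):

    import http.server
    import time

    class MJPEGHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            # one request, one never-ending multipart response
            self.send_response(200)
            self.send_header("Content-Type",
                             "multipart/x-mixed-replace; boundary=frame")
            self.end_headers()
            while True:
                jpeg = grab_frame()  # hypothetical webcam capture
                self.wfile.write(b"--frame\r\n")
                self.wfile.write(b"Content-Type: image/jpeg\r\n\r\n")
                self.wfile.write(jpeg + b"\r\n")
                time.sleep(0.1)  # ~10 frames per second

The browser replaces each part as the next one arrives, which is why this MIME type was a popular way to stream webcam frames.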
Yes, but we did everything we could to stay away from things like java, downloads and plug-ins. Our mantra was 'it just has to work with whatever the user already has'.
And there always was a way, even if sometimes it required some - for want of a better word ;) - unorthodox methods.
For anyone who was, like me, confused by who "we" is and why I was supposed to know that, this is from the author's "About" page:
>My main occupations are being owner/operator of ww.com, which pioneered streaming webcam technology, and working as a consultant to do technical due diligence.
Reading the article, at first I assumed he was someone who worked on an early browser or something, then maybe a hardware webcam manufacturer. I assume that he's just not used to people showing up at his blog with no context about who he is, so there you go.
I was a bit undecided about that, though I'm leaning towards adding them now. There was a small discussion about that when I first submitted this: https://news.ycombinator.com/item?id=6957005
Some IPSes (Intrusion Prevention Systems) that perform deep-packet inspection won't pass such traffic.
But this isn't really a "bug" per se; the TCP model is a stream is a stream is a stream. There's no notion of time, packets, or correlation between streams. So browsers (and the OS) are acting the only way they can: by treating a TCP connection as two independent streams.
(Though, how could it be otherwise? Assume HTTP over SCTP (sequenced packets). We can't require, or even allow, HTTP clients to ignore response packets that arrive "too early", since it's possible that observers of the client (e.g. Wireshark) may not observe the exact same timing, which would lead to divergent interpretations of the conversation.)
Amazon does this too. Upload APIs will return 4xx errors well before the body is uploaded in the event there's an issue with the headers. Not that (a) most HTTP clients pay attention to this, or that (b) they could do anything about it without closing and reopening the connection.
It is fine to respond early to an HTTP request, e.g. returning a permissions error when a user tries to upload a file. Beware, however, that some clients will go badly wrong if they don't get to send all their data.
What can happen is:
* Client starts to send HTTP request (e.g. POSTing a file). The file is large, so it will take a while to upload.
* Server spots that the user isn't allowed to upload the file and immediately returns a 4xx error of some kind.
* Server then thinks all is finished, closes connection.
* Client, still sending the file, gets an error as the write() fails. Complains about the broken connection to the user but never notices the actual HTTP response.
Many HTTP clients are written in a simple 'send my request, then (and only then) read the response' style. They don't react well to getting an early error message. Often you have to work around this on the server by not closing the connection to the client, and continuing to slurp up any further data received.
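A server-side sketch of that workaround (hypothetical handler; the timeout and buffer size are arbitrary): send the error, then drain whatever the client is still writing, so its write() calls succeed and it eventually gets around to reading the response:

    import socket

    def reject_upload(conn):
        conn.sendall(b"HTTP/1.1 403 Forbidden\r\n"
                     b"Content-Length: 0\r\n"
                     b"Connection: close\r\n\r\n")
        conn.settimeout(5.0)  # don't slurp forever
        try:
            while conn.recv(65536):  # discard the rest of the upload
                pass
        except socket.timeout:
            pass
        conn.close()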
>> Some IPSes (Intrusion Prevention Systems) that perform deep-packet inspection won't pass such traffic.
We had a vendor whose product did this. Everything worked except one feature (the main feature), and a quick glance at the firewall logs showed 'malformed tcp packet' flooding the console.
It was a simple thing to disable (from just their appliance, not the whole network), but I still found it odd that they did that.
IPS (Intrusion Prevention System), not ISP. And yes, they'd drop the connection on such an error response if configured to do so. (Which is likely what you want to do anyway in this case. Thankfully, Amazon's engineers had the foresight not to respond early in the case of a successful upload.)
Ah! Complete reading fail on my end. (thanks for the edit, it is much clearer now, I apparently substituted ISP for IPS).
I don't think it is possible to respond early in case of a successful upload, after all, that means the upload can still fail for a variety of reasons. Success indicates that you can move to the next state, and an 'early success' might still turn into a late failure.
The difference, if anyone's still playing along, is what action the device takes. An IDS (detection system) is a monitoring and alerting device; traffic still gets through. An IPS (prevention system) drops the flagged traffic.
As you alluded to, the distinction between IDS and IPS is largely configuration and mode of operation.
Years ago, IDS and IPS were separate products, where the IDS was the earlier, more primitive version of the two. Nowadays you are buying an IPS, which runs either in alerting mode (operating like an IDS) or in "shunning" mode, where the device takes some defensive action (such as dropping traffic, throttling bandwidth, blacklisting the IP for a fixed period of time, etc.).
"Shunning" mode can be dangerous, since you are essentially building in a feature to "Deny service to X for Y amount of time" into your network.
Attackers can spoof attacks to deliberately trigger the shunning of legitimate users. Because of this, it is less common to see an IDS/IPS with shunning enabled in production. It depends on where "Access to service for legitimate users" and "stopping and possibly hurting attackers" fall on the priorities list.
Agreed with some of the other posters that this isn't a bug. It would be pretty hard for a browser to make this not work. To make it not work, the browser would have to check whether there's data available in the local socket buffer before issuing an HTTP request. On Unix, you could e.g. put the socket in non-blocking mode, issue a read() for 1 byte, and then see if you get an EWOULDBLOCK. If you get data instead of EWOULDBLOCK, then (supposedly) the server is in violation of the RFC and the browser might decide to close the connection (what should it do otherwise?)
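For what it's worth, the check itself is only a few lines (a sketch; MSG_PEEK leaves the byte in the buffer, so nothing is consumed):

    import errno
    import socket

    def early_data_waiting(sock):
        sock.setblocking(False)
        try:
            # one peeked byte is enough to "convict" the server
            return len(sock.recv(1, socket.MSG_PEEK)) > 0
        except OSError as e:
            if e.errno in (errno.EWOULDBLOCK, errno.EAGAIN):
                return False  # nothing buffered: the "expected" case
            raise
        finally:
            sock.setblocking(True)

But as noted below, a False result proves nothing; the early data may simply still be in flight.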
It just doesn't make a lot of sense doing the above. Especially because there's a fundamental race condition here: there is no way to distinguish between data that's in-flight but not received prior to the browser issuing the request, and data that was generated after the remote peer read the browser's request.
You could encounter this behavior (dropping unsolicited responses) multiple "legitimate" ways (all of which suffer from the race condition you mention): reading & writing in separate threads can do it; so can an asynchronous receive mechanism.
Erlang TCP connections can be configured for asynchronous receive: any incoming data is delivered as a message to a given process, which usually immediately acts on it. Say this process has not yet sent a request; it's not unreasonable to just drop the incoming data.
Of course, I would consider such behavior non-conforming, for the reasons you point out. Time isn't really defined in a TCP stream.
Better is to utilize the flow control Erlang provides for asynchronous receive, but this is extra effort so it's plausible a naive implementation would miss this.
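Roughly the same shape in Python's asyncio, for anyone who doesn't read Erlang (a sketch; handle_response is a hypothetical handler):

    import asyncio

    class NaiveAsyncClient(asyncio.Protocol):
        def __init__(self):
            self.request_sent = False

        def connection_made(self, transport):
            self.transport = transport

        def data_received(self, data):
            if not self.request_sent:
                return  # the questionable part: drop unsolicited early data
            # the flow control a naive implementation skips: stop reading
            # while we're busy, so the peer can't flood us
            self.transport.pause_reading()
            handle_response(data)  # hypothetical handler
            self.transport.resume_reading()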
I remember this sort of thing being called "push" back in the day (1995 or so). Before animated GIF support was added to Netscape this was the only way you could achieve animation of any sort on the Web.
The only concrete example I can recall is that Suck.com used this to have an animated logo at the top of their page. (I think this predated the Java applet version that you see on the Internet Archive...)
One of the elements in SPDY is that responses to requests can be pushed by the server anticipating a request. But that's a relatively new development compared to when I figured out that this 'feature' is supported by just about every browser out there. And it's kind of logical: if you implement HTTP in the most straightforward way, then the network stack will buffer the response until the next read, regardless of what the rest of the program is doing. So when the browser issues that read (either in a separate reader thread, or in the same one if it is programmed in a single-threaded write-then-read style), it immediately finds the answer to the request it just sent out.
Strictly speaking, extra bytes sent past the end of the response to the current request (or before any request has even been sent) are a protocol violation, but I'm really not complaining about this one; after all, that line in the spec does not actually specify the timing. We all just read between the lines to see what we expect to see: ping ... pong.
Combining it with dynamically generated DNS names might be a nice "content accelerator" add-on for CDNs, etc.
i.e. a page uses resources, each of which has a unique URL.
You have custom infrastructure (that sits in front of a normal website) which dynamically generates a new subdomain for each resource, and rewrites the resource URLs to point at those subdomains.
At the top of the page (or ideally on the previous page) you include some zero-length resources with the same MIME-type as the resources you want to serve.
The browser requests these resources, and as soon as you have the connection open you reply with the zero-length resource and then the actual resource you want to serve.
Subsequently the browser requests the actual resource, and finds it already waiting.
The unique hostnames are needed to allow you to predict which resource will be requested.
(This was probably patentable until I wrote it all out, too ;))
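A server-side sketch of the core move (everything here is an assumption for illustration: the decoy scheme, the 64 KB read, the omitted Content-Type headers):

    def serve_prepushed(conn, real_body):
        conn.recv(65536)  # the request for the zero-length decoy resource
        decoy = b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"
        pushed = (b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n"
                  % len(real_body) + real_body)
        conn.sendall(decoy + pushed)
        # the second response now sits in the client's OS buffer until the
        # browser asks for the real resource on this same connection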
This sounds less like a bug, and more like a specific tweak to his logic due to his specialized use case.
In most cases, even if the web server knows that a specific page contains images, it does not know if the browser is actually going to request those images. What if it is a bot? What if the user cancels? What if they have disabled image downloads in that browser? What if they have the images and other secondary files cached?
I do think it is worthwhile to consider such things for your individual needs, but most use cases won't change the standard request/response mechanism.
This was actually one of the easier aspects to solve. The webcam server ran on a different port than your regular web server, so it knew exactly what you were going to request; it existed for one purpose only: to serve up those images. There was no HTML or other stuff to be confused with. Technically it was probably possible for a browser to re-use the same connection and to request, say, an index.html after requesting an image, but in practice this simply never happened. After the first image request, all the subsequent requests on that same socket would be image requests as well.
While true that you cannot know whether the client will request the images, you can use their user agent to make a pretty decent prediction. There will be corner cases where you are wrong, but most of the time your prediction will be true.
For instance, if a bot is pretending to be a Chrome browser, you'd think it was a regular client, but in fact it was not. But that's the bot's fault, not your implementation.
Is it possible that the technique had become widespread and actually known about by the browser makers? i.e. it was a bug but they didn't want to break any applications so didn't fix it...
Am I missing a trick or does this only work when the only thing you're serving at that HTTP server is the JPEG image of the camera? Otherwise the user later refreshes the page thus doing a "GET / HTTP/1.1" and gets /image.jpg instead.
But how will that work if you're sending the response before you parse the request? You don't know the URL the client is after. Were you relying on the browser keeping the same connection alive so you always went index.html->jpegs?
Right, what I meant was that you can't have the camera serve a nice /index.html with the embedded image and other niceties like modern IP cameras do, because you reply with an image to every request.
Well, you can actually. All you need to do is switch modes after the first request, which you handle like every other. Which is in fact what it did... The idea here is that once you've received one request for an image, all subsequent requests on that socket will be for images as well.
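In sketch form (all helpers are hypothetical, and /image.jpg stands in for whatever the real image URL was):

    def handle_connection(conn):
        request = conn.recv(65536)
        if b"/image.jpg" not in request:      # not an image request:
            serve_normally(conn, request)     # behave like any web server
            return
        send_jpeg(conn, grab_frame())         # answer the first image request
        while True:
            send_jpeg(conn, grab_frame())     # push the next frame early...
            if not conn.recv(65536):          # ...then absorb the request it
                return                        # pre-answered; empty = hung up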
Ok, then you are relying on multiple requests on a single socket, which was what I had suggested before. Does that work reliably though if the user reloads the page while it's streaming? Wouldn't the browser reuse the same connection to request the HTML page again and get an image instead?
If you put your email address in your profile (or send me a line) I'll reply with a link to a cam that is still online from way back when using this technology.
I'd rather not post the link in the thread because the poor people sending out the stream would not be able to satisfy even a small portion of the kind of volume that HN can direct to a site in an eyeblink.
Alright, here's one of those dumb questions you seem very open to: Can we use this technique to (for example) reply with all of a page's dependencies upon the initial request? i.e. if a user goes to www.example.com/ and the server immediately replies with /, /favicon.ico, /styles.css, /script.js, /banner.png, etc? I imagine if it were possible, this would result in a massive reduction in latency...
Well, you can and you can't. See, the problem is that you have no idea what the next request will be about! So if the client keeps sending the same kind of request, you can respond with a payload of the MIME type that is expected. But for your use case you could receive a request for /style.css and respond with /favicon.ico if a client decided to make the requests in an order that you did not anticipate.
If you get lucky it will work, but if you're unlucky then you'll be sending out the wrong payloads on all but the first request.
The only reason this trick worked for the webcam is because it knows ahead of time what kind of request will come (the request for the next frame). That's why it can anticipate.
Now that I think about it, one could use a small javascript library embedded in the index page to make a number of additional requests and interpret them as the correct types via data: URLs. That would be a lot of messy hacking to shave off a few hundred ms, but might be an interesting exercise to undertake...
That's a multipart response [0]. In the early days of 3G (and multimedia phones) we used to use those for some models of phone which supported them, to give a better browsing experience - the gamble being that sending extra downstream data would be cheaper than paying another request/response round-trip, if you were pretty confident the phone was going to ask for the data anyway.
But how are you going to use UDP to send images to a browser without using a plug-in or an applet? The whole idea was to remain 'compatible' (for small values of compatible) with HTTP, which more or less guaranteed delivery.
UDP wouldn't make it through most firewalls and would make all kinds of assumptions about port forwarding and so on, besides the fact that browsers simply do not expect content to arrive via UDP.