* I presume the case here is that the proxy sits close to the server, so the handshake is faster, which would be the single benefit of this setup, although that's not at all what's illustrated.
* The illustrations show the request taking longer with the proxy, although maybe the two diagrams aren't to scale
* The originating UDP packet could get lost and the client would never know
The author could improve this by prepping the TCP connection before the request comes in, which would give a significant reduction in latency.
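For what it's worth, a minimal Go sketch of that idea, with a hypothetical upstream address: keep a handful of connections to the upstream pre-established in the background, so the handshake has already happened by the time a request arrives.

```go
package main

import (
	"net"
	"time"
)

// warmPool keeps a few TCP connections to the upstream already handshaken,
// so a request can be written immediately instead of paying for the
// SYN/SYN-ACK/ACK round trip on the request path.
type warmPool struct {
	addr  string
	conns chan net.Conn
}

func newWarmPool(addr string, size int) *warmPool {
	p := &warmPool{addr: addr, conns: make(chan net.Conn, size)}
	go p.refill()
	return p
}

// refill dials in the background; the send blocks once the pool is full and
// resumes as soon as a connection is taken, keeping the pool topped up.
func (p *warmPool) refill() {
	for {
		c, err := net.DialTimeout("tcp", p.addr, 5*time.Second)
		if err != nil {
			time.Sleep(time.Second)
			continue
		}
		p.conns <- c
	}
}

// Get hands out a pre-warmed connection, falling back to a fresh dial.
// A real implementation would also need to detect and replace idle
// connections the upstream has since closed.
func (p *warmPool) Get() (net.Conn, error) {
	select {
	case c := <-p.conns:
		return c, nil
	default:
		return net.DialTimeout("tcp", p.addr, 5*time.Second)
	}
}

func main() {
	pool := newWarmPool("upstream.example:80", 8) // placeholder upstream
	conn, err := pool.Get()
	if err != nil {
		return
	}
	conn.Write([]byte("GET / HTTP/1.1\r\nHost: upstream.example\r\nConnection: close\r\n\r\n"))
	conn.Close()
}
```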
It's about SENDING requests with "zero latency", not about completing HTTP operations with zero latency.
And yes, you get no confirmation, and no reply.
I must admit I'm hard pressed to come up with a use case for this. You could just as easily do a regular HTTP request on a separate thread and throw away the result to get "zero latency" fire-and-forget behavior.
> I must admit I'm hard pressed to come up with a use case for this. You could just as easily do a regular HTTP request on a separate thread and throw away the result to get "zero latency" fire-and-forget behavior.
I do this quite often, and creating another thread might be easy to code, but it's harder on capacity planning and management. It's also hard to gossip utilisation of a shared network link across a cluster, so e.g. if requests are infrequently sourced, I may want to permit a pool of 500k concurrent outgoing HTTP requests across the entire cluster, but I don't want to give every machine a mere 10k outgoing, since busy endpoints would be starved by unbusy ones (and my network link would have plenty of spare capacity). Managing all that could be a full-time job if I went down the "easy" path of just creating another thread.
Using UDP means that if there is network congestion, messages just get dropped, and I don't waste more network bandwidth and CPU by sending retransmits. I have retries further up the chain anyway for other reasons, so it makes sense to me to have less code and reuse what I've already got.
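To make the contrast concrete, here's a rough Go sketch of the "just spawn another thread" approach with a local in-flight budget (the 10k cap and the endpoint URL are only illustrative): anything over the cap is dropped rather than queued, which is roughly the loss behaviour you'd otherwise get from UDP.

```go
package main

import (
	"io"
	"net/http"
	"time"
)

// Per-machine budget of in-flight fire-and-forget requests (illustrative,
// echoing the "10k outgoing" figure above).
var inflight = make(chan struct{}, 10000)

// fireAndForget issues a GET and throws the response away. If the local
// budget is exhausted the request is dropped rather than queued, similar
// to a lost UDP datagram.
func fireAndForget(url string) {
	select {
	case inflight <- struct{}{}:
	default:
		return // over budget: drop it
	}
	go func() {
		defer func() { <-inflight }()
		resp, err := http.Get(url)
		if err != nil {
			return
		}
		io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
		resp.Body.Close()
	}()
}

func main() {
	fireAndForget("http://collector.example/metrics") // placeholder endpoint
	time.Sleep(time.Second)                           // give the goroutine a moment in this demo
}
```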
Do you really want to risk dropping a log that might have useful information in case of a meltdown? I think logging is something you should probably confirm, or you're setting yourself up for nasty bugs.
A lot of logging software allows for UDP connections though. Not all logs are critical, and in times when you have close to 100% network saturation, dropping some log traffic to maintain a functional production environment is not a big deal for most.
The only use case I can think of might be gaming. If you had an HTTP/WebSockets-based game you wanted to cut latency on, you could host the FF proxy near the game server.
I can tell you a real life use case that already exists because I built this exact thing already (though a closed-source version): satellite internet terminals.
With satellite internet, uplink (upload) is generally more complicated than downlink (download). This is because a terminal can just passively listen for packets intended for it, but with uplink you have to make sure you only transmit in your designated time slot.
Because uplink is more complicated than downlink, more often than not uplink will get in a degraded or broken state, while downlink is still operational.
Now say I get a call from a customer service rep saying a customer's terminal is stuck in a degraded state. How can I recover it? TCP (and therefore HTTP) is a no-go because it requires uplink to be fully functional. So I can't visit the terminal's embedded webserver or issue commands to the HTTP API.
However, using a UDP-to-TCP proxy like this, I can issue commands to the terminal's HTTP API "blindly", which it will then execute to (hopefully) recover itself. This can also be done with other "one-way" protocols like SNMP.
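As a rough illustration (the address, path and payload here are made up, and FF proxy defines its own datagram framing, so in practice you'd use one of its client libraries): the operator fires one datagram at the proxy listening on the terminal, which replays it to the local HTTP API over loopback TCP.

```go
package main

import "net"

func main() {
	// One datagram toward the terminal's UDP-to-TCP proxy: no handshake,
	// no acknowledgement, no response. We won't know whether it worked
	// until (hopefully) the terminal's uplink comes back.
	conn, err := net.Dial("udp", "terminal.example:8080") // placeholder address
	if err != nil {
		return
	}
	defer conn.Close()

	conn.Write([]byte("POST /api/recover HTTP/1.1\r\n" +
		"Host: localhost\r\n" +
		"Content-Length: 0\r\n\r\n"))
}
```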
I feel like the real solution is your terminal should ping every hour, and if a ping test fails it should auto-reset itself and file a bug for the developers with whatever state information is needed for them to fix the bug.
Unfortunately satellite internet systems are way more complicated than you probably think. It can be very hard to tell why a terminal is experiencing issues. Sometimes it's because of a nearby cell phone tower and you have to adjust some terminal configurations to shift frequencies slightly. Maybe there's just a lot of bad weather in that particular part of the country and rain fade is degrading the signal and the terminal isn't transmitting at max power like it should. Maybe your neighbor has a police radar detector in his car interfering with the signal. Maybe a horse has bumped the dish and it's slightly mispointed at the satellite now. Maybe it's an issue on the gateway side. (All true stories.)
Unless it's a mass issue across a whole beam, it usually requires some investigation to figure out why an individual terminal state is degraded. And sometimes (more than I'm comfortable with) during the course of your investigation you make the terminal state even worse (because you tweaked some config parameter the wrong way). You can always tell when you done messed up because suddenly you get disconnected from your ssh session or you can no longer reach the embedded web server from your browser. In those cases sometimes the only way to recover is to precisely undo what you did, and an auto-ping-reset will not fix it (unless the system keeps track of most recent config changes or something). Your options are: use udp/tcp proxy or snmp to attempt to undo the config parameter you foobared or send out a technician and fix it on site...
I'm not saying our system is perfect. Surely we could improve fault detection and recovery. I'm just saying one-way communication protocols are real nice to have sometimes because of these scenarios.
We run a bunch of remote paging basestations that utilize satellite, however, they are also backed up with 4G routers which are used instead if the satellite link fails for any reason (rain fade, nuclear missile attack on the satellite hub/network etc). I guess TV over 4G would use a lot more bandwidth and potentially be very expensive.
I don't mean to sound smug, but I'm the author of the Jackbox Games multiplayer servers; I regularly handle hundreds of thousands of simultaneous WebSocket connections from players' phones. I can't for the life of me find a server implementation of WebRTC data channels that I would actually put into production. I've seen a few blog posts with toy examples, but nothing that is ready for production. It sounds nice in theory, but try actually finding a production-ready server implementation of WebRTC data channels. I would LOVE to find a documented example of doing client-server communication with WebRTC data channels (a Go implementation on the server specifically). I've found no such thing. Every few months I search and don't find such a thing, and I don't need the extra few ms of latency improvement badly enough that it would be worth the engineering time to author it myself. If you know of one that's actually ready for primetime, I'd love to hear about it.
edit: after writing this comment, I did some snooping around (since I haven't looked at this topic in a few months), and this looks promising, though it's unclear how widely people are using it for production work: https://github.com/pion/webrtc
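For anyone else looking, a minimal server-side sketch along the lines of pion's own data-channels example, as I understand the pion/webrtc v3 API (signaling is left out of band, so treat this as a starting point rather than anything production-ready):

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"

	"github.com/pion/webrtc/v3"
)

func main() {
	// PeerConnection with a public STUN server (placeholder configuration).
	pc, err := webrtc.NewPeerConnection(webrtc.Configuration{
		ICEServers: []webrtc.ICEServer{{URLs: []string{"stun:stun.l.google.com:19302"}}},
	})
	if err != nil {
		panic(err)
	}

	// Echo every message received on any data channel the browser opens.
	pc.OnDataChannel(func(dc *webrtc.DataChannel) {
		dc.OnMessage(func(msg webrtc.DataChannelMessage) {
			_ = dc.Send(msg.Data)
		})
	})

	// Read the browser's SDP offer as one line of JSON on stdin; how you
	// actually exchange offers/answers (HTTP POST, websocket, ...) is up to you.
	var offer webrtc.SessionDescription
	line, _ := bufio.NewReader(os.Stdin).ReadString('\n')
	if err := json.Unmarshal([]byte(line), &offer); err != nil {
		panic(err)
	}
	if err := pc.SetRemoteDescription(offer); err != nil {
		panic(err)
	}

	answer, err := pc.CreateAnswer(nil)
	if err != nil {
		panic(err)
	}
	// Wait for ICE gathering so the printed answer contains all candidates.
	gatherComplete := webrtc.GatheringCompletePromise(pc)
	if err := pc.SetLocalDescription(answer); err != nil {
		panic(err)
	}
	<-gatherComplete

	out, _ := json.Marshal(pc.LocalDescription())
	fmt.Println(string(out)) // hand this answer back to the browser
	select {}                // keep serving
}
```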
I had to write the SDP handshake and rebuild a subset of SCTP from scratch, since the only SCTP implementation uses a global listener, which doesn't play nice with a clean server implementation (it was also a royal pain to compile in a way that played nice with Rust).
I got it working far enough to handle the WebRTC handshake over SCTP and get some basic data flowing. It's still in pretty rough shape, and I ran out of time to document it to any degree where someone else could take it and run with it.
If there's enough interest I might pick it up again but right now I've got other priorities going on.
Just wanted to comment and say thank you for your work :). The Jackbox games work flawlessly and even when they get into a weird state a refresh usually sorts things out. Thanks for covering whatever weird corner cases you had to cover to get it to work so consistently.
I'm actually a little surprised the Jackbox games aren't monetized differently. They cost so little up front and that CPU time can't be free.
Just today I set up my wife's laptop with Drawful 2 and Jackbox 6 so she can play with her coworkers tomorrow. Thanks!
I have, but I didn't use Go. I wrote a WebRTC server in C, and probably handled something north of a trillion event points per month this way (receiving telemetry from interactive digital advertisements).
My goal wasn't latency but capacity: I can handle maybe 100kqps in HTTP on the same kit that could handle something like 600kqps+ messages with this (never did proper limit testing here, since I knew my supplier was limited on the network links). It took about three weeks to get operational to a point where it could be added to production, which was worth it to me.
When you say HTTP, do you mean a round trip per data point? Curious if you went from one HTTP request per data point to WebRTC, or from something more like one WebSocket message per data point to WebRTC.
It's just -- and this may be mildly out of date -- EXTREMELY hard to find a server-side setup that will let you do all of what the browsers now support with WebRTC.
My recollection from the last time I skipped across this topic was: there are 1 or 2 heavyweight open source projects that mostly focus on the video aspects but also supported data channel stuff, and then one lightweight C and one lightweight Go project that supported everything, and you were mostly out of luck otherwise.
I remember the last time I checked, a year ago or so, pieces of what's required being worked on in recent OTP releases in BEAM-land. Hmmm ... time to check again.
Well, obviously you don't need to maintain a background thread and a connection, so by offloading the TCP workload to the proxy you'd be able to send out much more data using the same resources.
If the service you are sending data to is unavailable, you wouldn't get any backpressure from it.
Think of a monitoring or log system where you might want to send out millions of datapoints but you can afford to lose some, in return you don't have to worry about the system not being reachable.
Until you get some decent amount of load on the system. Then the background thread will still make progress, while the UDP/proxy approach will only have failing requests, because the loss of any single UDP packet makes the whole request fail (there are no retries).
I can only see this being interesting if you have a workload that requires sending small requests that fit into one or a couple of datagrams.
I wonder if the author wrote this with something in mind?
Fire and forget logging to HTTP endpoints would be pretty useful for IoT sensors. You'd have a lot less code and could potentially save significant power. You obviously lose the ability to guarantee data arrived, but that's probably not important for all sensor situations.
For that stuff, protocols such as CoAP might be more useful, as it's a standard with pretty much the same benefits but less custom code.
It's even possible to translate CoAP directly into HTTP using a proxy such as Squid.
There's also MQTT, which was basically designed for IoT sensor reporting. This has even more supported libraries and has been around for ages.
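For comparison, fire-and-forget over MQTT is just a QoS 0 publish. A rough Go sketch using the Eclipse Paho client (the broker address, client ID and topic are made up); note that plain MQTT still runs over a TCP connection, so this is fire-and-forget at the message level rather than the transport level (MQTT-SN is the UDP-friendly variant).

```go
package main

import (
	"log"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

func main() {
	opts := mqtt.NewClientOptions().
		AddBroker("tcp://broker.example:1883"). // placeholder broker
		SetClientID("sensor-42")

	client := mqtt.NewClient(opts)
	if token := client.Connect(); token.Wait() && token.Error() != nil {
		log.Fatal(token.Error())
	}

	// QoS 0 = "at most once": no broker acknowledgement, no retry, so a
	// dropped reading is simply lost, the same trade-off as the UDP approach.
	client.Publish("sensors/greenhouse/temp", 0, false, "21.5")
	client.Disconnect(250)
}
```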
The UDP proxy system has the benefit of not being able to fall victim to classic UDP amplification vulnerabilities (send a packet with a spoofed source and have the response bounce back), but it does allow an unsuspecting proxy server to be turned into an HTTP-based DDoS source. You can send a single packet towards a server and the server automatically does a full TCP handshake and payload delivery for you! That's a lot of extra traffic.
I'd stick with known protocols for IoT stuff instead of this. At least the risks have been analysed and firewalls can (should) be easily configurable to block outgoing DDoS attacks in the worst case. The same is not really true for this.
I'm going to sacrifice my karma for this: the solution is not to make TCP use UDP, but instead to allow TCP to behave like UDP by giving it the option to ignore out-of-order delivery. Every existing implementation of this is flawed, but we need it to be in the TCP RFC.
Wouldn't out-of-order TCP require whole new client libraries as well? All the TCP client code expects well-ordered packets. So if you're doing that, you might as well go all the way to a new protocol. But isn't that then QUIC? I'm not familiar enough with either to know what the practical differences would be between QUIC and out-of-order TCP.
Good stuff. Could use some asserts and checks for NULL here and there, but overall it's a nicely organized C code (and with proper bracing and indentation style :)). Always a pleasure to read through something like this.
Actually it has the wrong indentation style. Should be tabs left spaces right. I run tab width of 3 in my editors. To each their own, if you use tabs left of course.
Neat! FF Proxy reminds me of one of my projects ( https://github.com/fanout/zurl ), but more extreme since the sender uses raw UDP. It seems like it would be great for remote use by lightweight senders. Maybe low power devices emitting stats?
P.S. good to see the comment, "TODO: filter out private IP ranges". I was going to suggest this but it seems you already know. :)
Effectively zero latency from the data being written to the socket until it appears on the wire, by eliminating the latency of the TCP 3-way handshake.
They've basically discovered TCP Performance Enhancing Proxies, used for decades for space communications.
Yes, but only for a specific set of circumstances (when you can live without reliable & timely delivery, and do not need any sort of response) and only from the PoV of the client.
The claim seems correct with those caveats, though it is a bit "click-baity" without them.
I don't see how the claim is technically correct in any way. Unless they've figured out how to circumvent the speed of light, there is no such thing as "zero latency."
The client just jets out a single UDP packet. There is no TCP setup latency, and effectively no latency at all, though only from that client's point of view.
I think it is a valid claim, but only for a rare set of use cases where the significant caveats regarding delivery & response are OK. Though of course those caveats are not as heavy as they might at first seem if the link between client and proxy is strong and reliable, because the proxy will be performing the full handshake and can manage redelivery on error and other such things - but the client doesn't need to worry about any of that.
Anything (hardware or software) in the network stack advertising "zero latency" is a lie. The idea or implementation may have merit, but it is best to avoid opening with a trivial impossibility.
I don't think most people read "zero latency" and think literally 0 ns of latency, because, as you said, it is trivially impossible. If you read any length of the documentation you will see that the latency is orders of magnitude less than sending the request out directly. That's close enough to zero latency for me.
I am reminded of statsd. I like this, though, as it generalizes more. Cool hack!
I wonder if it would be worth making this embeddable into an existing web framework for Python or Go, so that the existing service could receive UDP requests. Then again, that's what 0-RTT is for, so it might be a little wheel-reinventy.
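A rough sketch of what embedding that could look like in Go, assuming each datagram carries a complete HTTP request and the response is simply discarded (the port and handler are illustrative):

```go
package main

import (
	"bufio"
	"bytes"
	"net"
	"net/http"
	"net/http/httptest"
)

// serveUDP parses each incoming datagram as an HTTP request and dispatches
// it to the normal handler; whatever the handler writes goes nowhere.
func serveUDP(addr string, handler http.Handler) error {
	pc, err := net.ListenPacket("udp", addr)
	if err != nil {
		return err
	}
	buf := make([]byte, 64*1024)
	for {
		n, _, err := pc.ReadFrom(buf)
		if err != nil {
			return err
		}
		req, err := http.ReadRequest(bufio.NewReader(bytes.NewReader(buf[:n])))
		if err != nil {
			continue // not a parseable HTTP request; drop it
		}
		handler.ServeHTTP(httptest.NewRecorder(), req) // response is discarded
	}
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/ping", func(w http.ResponseWriter, r *http.Request) {})

	go serveUDP(":8081", mux)         // fire-and-forget path
	http.ListenAndServe(":8080", mux) // normal TCP path
}
```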
WOW. This is such a fascinating project. (near) zero latencies are relatively hard to achieve especially over network connections. Will definitely try this out once I get the time. Oh and thanks for the various pre-built client libraries, will definitely come in useful.
You're also saving the time of network latency and waiting for a response. With the proxy, once the packets are out on the network, you're done and can move on.
The second figure could have shown the client receiving the HTTP response to its request. Also, isn't HTTP/3 (QUIC) already supporting something like this?
I don't know why more web servers aren't using UDP by default, only going TCP when the client or content needs it. For example, sites like YouTube: the videos are already streamed to us, so why not the site itself? Use TCP only when the client must definitely know its data arrived (a banking transaction, an Amazon order); for the vast majority of sites UDP bi-di should be fine.
Can you imagine the amount of bandwidth saved around the world in aggregate?
TCP should be restricted based on content classification.