Hacker News | kwantam's comments

The whole point of this article is that performant Wireguard-over-TCP simply does not work. You're not fighting the prevalence of an idea; you're fighting an inherent behavior of the system as currently constituted.

In more detail, let's imagine we make a Wireguard-over-TCP tunnel. The "outer" TCP connection carrying the Wireguard tunnel is, well, a TCP connection. So Wireguard can't stop the connection from retransmitting. Likewise, any "inner" TCP connections routed through the Wireguard tunnel are plain-vanilla TCP connections; Wireguard cannot stop them from retransmitting, either. The retransmit-in-retransmit behavior is precisely the issue.

So, what could we possibly do about this? Well, Wireguard certainly cannot modify the inner TCP connections (because then it wouldn't be providing a tunnel).

Could it work with a modified outer TCP connection? Maybe---perhaps Wireguard could implement a user-space "TCP" stack that sends syntactically valid TCP segments but never retransmits, then run that on both ends of the connection. In essence, UDP masquerading as TCP. But there's no guarantee that this faux-TCP connection wouldn't break in weird ways because the network (especially, as you've discovered, any cloud provider's network!) isn't just a dumb pipe: middleboxes, for example, expect TCP to behave like TCP.
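
For concreteness, here's a rough sketch of the "UDP masquerading as TCP" idea using scapy. Everything below is hypothetical (the addresses, ports, and fixed ack number are made up), and a real implementation like phantun also has to fake the three-way handshake, track the peer's sequence numbers, and keep the headers plausible for middleboxes. The only point is that each datagram goes out exactly once, wrapped in something that looks like TCP on the wire:

    # Sketch only: requires scapy and root privileges to send raw IP packets.
    import socket
    from scapy.all import IP, TCP, Raw, send

    PEER_IP, PEER_PORT, LOCAL_PORT = "192.0.2.10", 443, 51820   # hypothetical
    seq = 1000   # would normally come from a faked three-way handshake

    def forward_datagram(payload: bytes) -> None:
        """Send one tunneled datagram as a syntactically valid TCP segment."""
        global seq
        pkt = (IP(dst=PEER_IP)
               / TCP(sport=LOCAL_PORT, dport=PEER_PORT, flags="PA", seq=seq, ack=1)
               / Raw(load=payload))
        send(pkt, verbose=False)   # fire and forget: no retransmission, ever
        seq += len(payload)        # keep the sequence numbers plausible

    # Shovel datagrams from a local Wireguard-facing UDP socket into the fake stream.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 51821))   # hypothetical relay port
    while True:
        data, _ = sock.recvfrom(2048)
        forward_datagram(data)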

Good news (and oops), it looks like I've just accidentally described phantun (and maybe other solutions): https://github.com/dndx/phantun I'd be curious if this manages to sidestep the issues you're seeing with AWS and OVH.


> The retransmit-in-retransmit behavior is precisely the issue.

But you're concerned about an issue I do not have. In practice, retransmits are rare between my endpoints, and if they did occur, poor performance would be acceptable for some period of time. I just need it to be fast most of the time. To reiterate: I do not care about what happens in non-optimal conditions.

> it looks like I've just accidentally described phantun (and maybe other solutions): https://github.com/dndx/phantun

I'll definitely look into that. They specifically mention being more performant than udp2raw, so that's nice.


> In practice retransmits are rare between my endpoints

You seem to be mistaken about how (most) TCP implementations work. They regularly trigger packet loss and retransmissions as part of their mechanism to determine the optimal transmission rate over an entire path (made up of potentially multiple point-to-point connections with dynamically varying capacity).

That mechanism breaks down horribly when using TCP-over-TCP.


Can't the tunneling software detect when the upper TCP is retransmitting segments and drop them?

That would give the lower TCP enough time to transmit the original segment.


Maybe, but packet loss isn't the only problem. You'll also want to preserve latency (TCP has a pretty sophisticated latency estimation mechanism), for example.

Some middleboxes will also do terrible things to your TCP streams (restrictive firewalls only allowing TCP are good candidates for that), and then all bets are off.

If you're really required to use TCP, the "fake TCP" approach that others in sibling threads have mentioned seems more promising (but again, beware of middleboxes).


But my connection speed to my VPN endpoint is usually greater, and my loss much lower, than to whatever services I am accessing through that endpoint. As a result it doesn't affect things much. Further, accessing it with UDP is not always possible.


> [...] my loss is much less [...]

Unless it's actually zero, any loss on the "outer" TCP stream will cause a retransmission, visible to the inner one as a sharp jump in latency of all data following the loss. Most TCP stacks don't handle that very well either.


Sure, even when outer loss is pretty close to zero, it's conceptually not great.

On the other hand, I get 400mbps over TCP-over-TCP connections, and can't connect in any other reasonable way. 400mbps > 0.

Even tunneling in UDP is not great due to MTU effects.


One of the fun things about the median-of-medians algorithm is its completely star-studded author list.

Manuel Blum - Turing award winner in 1995

Robert Floyd - Turing award winner in 1978

Ron Rivest - Turing award winner in 2002

Bob Tarjan - Turing award winner in 1986 (oh and also the inaugural Nevanlinna prizewinner in 1982)

Vaughan Pratt - oh no, the only non-Turing award winner in the list. Oh right but he's emeritus faculty at Stanford, directed the SUN project before it became Sun Microsystems, was instrumental in Sun's early days (director of research and designer of the Sun logo!), and is responsible for all kinds of other awesome stuff (near and dear to me: Pratt certificates of primality).

Four independent Turing awards! SPARCstations! This paper has it all.


Job interview question for an entry-level front end developer: "Reproduce the work of four Turing award winners in the next thirty minutes. You have a dirty whiteboard and a dry pen. Your time begins... now."


And if you really want to impress, you reach into your pack and pull out the pens you carry just in case you run into dry pens at a critical moment.


I have actually done this. I bought myself a pack of premium pens at an art store and kept them in my laptop bag for customer site visits. They were designed for writing on glass but also worked great on whiteboards. It’s fun when your diagrams just look better than anyone else’s. It’s like a report document with good typography. It can be filled with made up BS but it still impresses subconsciously.


Here's a direct link for anyone who, like me, would be interested in reading the original article: https://people.csail.mit.edu/rivest/pubs/BFPRT73.pdf

That's an impressive list of authors, for sure.


Some other awesome stuff by Pratt:

Pratt parsing (HN discussion: https://news.ycombinator.com/item?id=39066465), the "P" in the KMP algorithm.


Great stuff dga :)

Turns out, Niall was also involved in one of the winning ZPrize submissions for fast multi-scalar multiplication (closely related to batch modexp, although over an elliptic curve rather than mod a prime); I assume it inherits from his work on CGBN.

He gave a very nice talk about it last year at a Stanford crypto lunch, and it turns out the slides and recording are online!

https://cbr.stanford.edu/seminarTalks/slides_20230526_niall_...

https://www.youtube.com/watch?v=KAWlySN7Hm8


Howdy, neighbor! Thanks for this - I'm glad to hear he's still doing cool stuff. I'm off to watch that now, in fact... :-)


Maybe we're looking at different things, but the link appears to discuss ElGamal encryption, which is discrete log based (which means modern implementations use elliptic curves; historically it would have been discrete log in a subgroup of a large prime field). It also talks about BLS signatures, which are exclusively elliptic curve based.

By and large, anything whose security relies on discrete log can be implemented using an elliptic curve, but beginning cryptography classes treat that as an implementation detail because mostly all you need is a prime-order group, and elliptic curves can mostly be treated as a black-box prime order group.

(BLS signatures are an exception; they require a bilinear pairing, which in turn requires a special kind of elliptic curve that's not just a black-box prime order group.)

There are all sorts of great algebraic geometry tricks to be played with elliptic curves, but those almost certainly aren't going to be found in an intro crypto class, or maybe any CS class...
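
To make the "all you need is a prime-order group" point concrete, here's a toy sketch of ElGamal written against a black-box group interface. The group here is a tiny subgroup of Z_23^* purely for illustration (obviously not secure); nothing in the encrypt/decrypt logic cares whether the group elements are field elements or elliptic curve points:

    import secrets

    # Black-box prime-order group: the order-11 subgroup of Z_23^*, generated by 2.
    # Toy parameters only; a real deployment would put an elliptic curve group
    # (or a large prime-field subgroup) behind this same tiny interface.
    P, Q, G = 23, 11, 2

    def g_exp(base, exp):           # the only group operations ElGamal needs:
        return pow(base, exp, P)    # exponentiation...

    def g_mul(a, b):                # ...and multiplication of group elements
        return (a * b) % P

    def keygen():
        sk = secrets.randbelow(Q - 1) + 1      # secret exponent in [1, Q-1]
        return sk, g_exp(G, sk)                # public key is G^sk

    def encrypt(pk, m):                        # m must itself be a group element
        r = secrets.randbelow(Q - 1) + 1       # fresh ephemeral randomness
        return g_exp(G, r), g_mul(m, g_exp(pk, r))    # (G^r, m * pk^r)

    def decrypt(sk, ct):
        c1, c2 = ct
        return g_mul(c2, pow(g_exp(c1, sk), -1, P))   # m = c2 / c1^sk

    sk, pk = keygen()
    assert decrypt(sk, encrypt(pk, 9)) == 9    # 9 = 2^5 mod 23 is a group element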


EdDSA signatures are specified to use deterministic nonce generation, so you're correct that they do not require randomness. But they certainly do require modular arithmetic in order to implement the elliptic curve operations!
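
For reference, here's (roughly) the deterministic nonce derivation from Ed25519 (RFC 8032), stripped of everything else. The rest of signing (computing R = rB, the challenge hash, the final scalar equation) is where the curve arithmetic and the remaining modular reductions live, and it's omitted here:

    import hashlib

    # Ed25519 group order (a ~252-bit prime), as given in RFC 8032.
    L = 2**252 + 27742317777372353535851937790883648493

    def ed25519_nonce(prefix: bytes, message: bytes) -> int:
        """Deterministic nonce: hash a secret prefix together with the message.

        In RFC 8032, `prefix` is the second half of SHA-512(seed). The same
        (key, message) pair always yields the same nonce, so no RNG is needed
        at signing time, yet the nonce is unpredictable without the secret.
        """
        h = hashlib.sha512(prefix + message).digest()
        return int.from_bytes(h, "little") % L    # reduced modulo the group order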


RFC6979 attempts to guarantee that the nonce is unbiased (under the assumption that HMAC's output is indistinguishable from random). It's definitely attempting to give a stronger property than simply preventing a repeated nonce.

See step (h) in Section 3.2. The nonce is selected by rejection sampling. Thus, under the above assumption about HMAC, the result is indistinguishable from uniformly random in the [1, q-1] range.
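
The shape of that rejection-sampling loop, with the HMAC-based bit generator from RFC 6979 swapped out for a generic bit source (so this shows the idea, not the actual RFC 6979 construction):

    import secrets

    def sample_nonce(q: int) -> int:
        """Rejection sampling: draw qlen-bit candidates until one lands in [1, q-1].

        In RFC 6979 the candidate bits come from an HMAC-based DRBG seeded with
        the private key and message hash; here secrets.randbits stands in for
        that, just to show the sampling logic. Either way, accepted values are
        uniform over [1, q-1]: there's no modular reduction, hence no bias.
        """
        qlen = q.bit_length()
        while True:
            k = secrets.randbits(qlen)   # candidate of exactly qlen bits
            if 1 <= k <= q - 1:
                return k                 # accept; otherwise discard and retry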


The nonce is taken modulo the order of the prime-order subgroup. For DSA that's generally a 256ish-bit prime (e.g., choose a 2048-bit prime p such that a 256-bit prime q divides p-1; then the multiplicative group mod p has a subgroup of order q).

For P-521, the base field has order p = 2^521 - 1, but the modulus used when computing the nonce is not that value; it's the order of the P-521 curve. By Hasse's theorem, the curve order is within 2*sqrt(p) of p+1, which is essentially p for numbers this large (and since the cofactor of P-521 is 1, the group order is itself prime).

So: both are 521-bit numbers, but the group order is less than 2^521-1. Its hex representation is 0x01fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffa51868783bf2f966b7fcc0148f709a5d03bb5c9b8899c47aebb6fb71e91386409.
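
A quick sanity check of those numbers (the order n is copied from above; the last assertion is the Hasse bound):

    from math import isqrt

    p = 2**521 - 1   # the P-521 base field modulus (a Mersenne prime)
    n = 0x01fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffa51868783bf2f966b7fcc0148f709a5d03bb5c9b8899c47aebb6fb71e91386409

    assert p.bit_length() == n.bit_length() == 521   # both are 521-bit numbers
    assert n < p                                     # the group order is the smaller one
    assert abs(n - (p + 1)) <= 2 * isqrt(p)          # Hasse: |n - (p+1)| <= 2*sqrt(p)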


Ahh. Thanks! (I don't think about FFDLP much, as you can see).


This vulnerability has very little to do with P-521 per se. The issue is with ECDSA: any use of ECDSA with biased nonce generation, regardless of the elliptic curve it's implemented over, immediately causes secret key leakage.
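
The extreme case of a biased nonce (a repeated one) makes the leak easy to see without any lattice machinery. Here's a toy sketch with made-up numbers standing in for the curve arithmetic: in real ECDSA, q is the group order and r is the x-coordinate of kG, but the algebra is the same. Biased-but-not-repeated nonces take more work (lattice attacks), but the end result is the same: the signing key falls out.

    # ECDSA in the abstract: s = k^-1 * (h + r*d) mod q. If two signatures
    # share the nonce k (and hence r), anyone can solve the two linear
    # equations for the two unknowns k and d.
    q = 2**255 - 19           # any prime of about the right size; made up for this demo
    d = 0xC0FFEE              # the "secret" signing key
    k = 0xBADBAD              # the repeated nonce
    r = 0x1234567             # stand-in for the x-coordinate of k*G
    h1, h2 = 0xAAAA, 0xBBBB   # hashes of two different messages

    s1 = pow(k, -1, q) * (h1 + r * d) % q   # signature on message 1
    s2 = pow(k, -1, q) * (h2 + r * d) % q   # signature on message 2

    # The attacker sees (r, s1, h1) and (r, s2, h2) and recovers everything:
    k_rec = (h1 - h2) * pow(s1 - s2, -1, q) % q
    d_rec = (s1 * k_rec - h1) * pow(r, -1, q) % q
    assert (k_rec, d_rec) == (k, d)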

(Rant: All these years later, we're all still doing penance for the fact that Schnorr signatures were patented and so everyone used ECDSA instead. It's an absolute garbage fire of a signature scheme and should be abandoned yesterday for many reasons, e.g., no real proof of security, terrible footguns like this.)


Schnorr wouldn't have helped in this specific case, since Schnorr is equally vulnerable to biased nonces (https://ecc2017.cs.ru.nl/slides/ecc2017-tibouchi.pdf).

EdDSA, which is essentially deterministic Schnorr, does solve the problem.

Also, the use of P-521 didn't specifically cause the vulnerability, but the bad interaction between SHA512 and P-521 did play a role. It is unfortunate that nature conspired against us to make 2^511 - 1 a composite number. The fact that you have to go up to 521 bits to get a Mersenne prime whereas the natural target length for a hash output is 512 bits is the fatal interaction here.
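
Both of those number-theoretic facts are easy to check: since 7 divides 511, 2^7 - 1 = 127 divides 2^511 - 1, so the latter can't be prime; and 2^521 - 1 passes the Lucas-Lehmer test, with no Mersenne prime for any exponent in between (127 and 521 are consecutive Mersenne prime exponents). A small verification sketch:

    def lucas_lehmer(p: int) -> bool:
        """True iff 2^p - 1 is prime, for odd prime p (Lucas-Lehmer test)."""
        m, s = 2**p - 1, 4
        for _ in range(p - 2):
            s = (s * s - 2) % m
        return s == 0

    def is_prime(n: int) -> bool:   # trial division is plenty for n <= 521
        return n > 1 and all(n % d for d in range(2, int(n**0.5) + 1))

    assert (2**511 - 1) % (2**7 - 1) == 0    # 7 | 511, so 2^511 - 1 is composite
    assert lucas_lehmer(521)                 # 2^521 - 1 really is prime
    assert not any(lucas_lehmer(p) for p in range(128, 521) if is_prime(p))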


Excellent points all around, and thank you for the pointer to the ECC slides :)

(And indeed, nature could have been kinder to us and given us a Mersenne between 127 and 521...)


Shouldn't there be another close enough prime? Like 2^510-1 or 2^511-19?


> Schnorr signatures

Never heard of it (which probably demonstrates that I know pretty much nothing about cryptography?), so seeing a name spelled like "Schn...r" in this context makes at least me think of an entirely different luminary in the area. Thought it was a typo at first.


I’m over here wondering why someone would want deterministic nonces.

Isn’t it kind of the point to just roll random numbers? Why would you calculate one instead?


It says in the OP. Windows at the time did not provide a cryptographic quality random number source.


I saw that, and wondered why PuTTY didn't contain its own good CSPRNG, something like Fortuna, if Windows didn't offer one.


You still need a source of entropy, which is easier for an OS. An app has to resort to the user moving the mouse or bashing keys, which is a worse UX, although I guess they did that for actual key generation (if PuTTY did that) but it would be annoying to do it every time you made a connection.


I'm sorry to say that your analysis is wildly incorrect.

- 10 billion people =~ 2^33

- 1000 CPUs =~ 2^10

- 1024 cores =~ 2^10

- 10 GHz =~ 2^33

So: one second's computation by all of these people is 2^86 UUIDs generated. UUIDs are 128 bits. With probability essentially 1, there will be a collision within one second.

The reason is known as the birthday paradox. If you sample random values from a set of size k, after you've chosen about sqrt(k) values you will have chosen the same value twice with probability very close to 1/2. By 10*sqrt(k) samples you'll have found a collision with probability well over 90%.

In this case, after sampling 2^64 values you'll have a collision with probability 1/2. That happens in roughly 250 nanoseconds (2^-22 seconds) in your thought experiment.

2^64 sounds like a lot, but in many contexts it's not all that much. Every bitcoin block mined takes well in excess of 2^70 SHA evaluations. Obviously the miners are not dedicated to generating UUID collisions, but if they were they'd easily find thousands of them in the time it takes to mine one block (this neglects the fact that it is much easier to sample a UUID than to evaluate double-SHA256).
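
If you want to check the arithmetic yourself, the standard birthday-bound approximation is P(collision) ~= 1 - exp(-n^2/(2k)) for n samples drawn from a space of size k:

    from math import exp

    def collision_prob(n: float, k: float) -> float:
        """Birthday-bound approximation: P(collision) ~= 1 - exp(-n^2/(2k))."""
        return 1.0 - exp(-(n * n) / (2 * k))

    k = 2.0**128                       # UUID space, treating all 128 bits as random
    print(collision_prob(2.0**64, k))  # ~0.39, i.e. roughly 1/2 after 2^64 samples
    print(collision_prob(2.0**86, k))  # 1.0 to machine precision: the one-second scenario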


> The reason is known as the birthday paradox.

Right. Forgot about that little thing. You're absolutely correct.

With this taken into account, a _single_ person with 1000 pieces of 1000-core, 10 GHz CPUs generating UUIDs would expect a match in about half an hour (that's 10^16 UUIDs per second, and you need roughly 2^64 of them).


Yes, a correct encryption algorithm can encrypt (essentially) any bit string. But it's quite easy to turn a correct encryption algorithm into an incorrect one by bolting on something seemingly innocuous.

Here's a concrete example. Let's say you decide you want to make AES encryption more efficient by defining a new standard, "lzAES", that is just:

    Enc_lz(key, msg) := AES-Encrypt(key, lzw_compress(msg))
    Dec_lz(key, ctxt) := lzw_decompress(AES-Decrypt(key, ctxt))
This "works", in the sense that you can correctly decrypt ciphertexts, and it certainly seems innocuous. But it is now an insecure cipher!

Here's why: the definition of a secure cipher is that ciphertexts resulting from the encryptions of any two messages of equal length are indistinguishable. In other words, for any two messages you can choose, if I encrypt both messages under a randomly generated key, you can't tell which ciphertext corresponds to which message. In contrast, lzAES as defined above does not have this property: you could choose one message that's easily compressible (say, a string of zeros) and one that's not (say, a uniformly random string), and then you'd be able to tell which ciphertext corresponds to which plaintext just by looking at the ciphertext lengths.

And this is not just a definitional issue! If you use lzAES to encrypt something, an attacker can guess your message and test whether compressing the guess gives the same length as your ciphertext. Guess-and-check isn't possible with a secure cipher, but it is with lzAES---in other words, it gives away information about your plaintext!
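
You can see the length signal without any encryption at all: a secure cipher gives equal-length ciphertexts for equal-length plaintexts, so the compression step alone carries the leak. A quick check with zlib standing in for lzw_compress:

    import os
    import zlib

    compressible = b"\x00" * 1000        # the all-zeros message
    incompressible = os.urandom(1000)    # a uniformly random message of the same length

    # A secure cipher would produce equal-length ciphertexts for these two.
    # Compress-then-encrypt leaks which is which through the length alone:
    print(len(zlib.compress(compressible)))    # a couple dozen bytes
    print(len(zlib.compress(incompressible)))  # a bit over 1000 bytes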


Thank you for taking the time to explain, your explanation is clear and gives me something to think about. But I have a question about:

> But it's quite easy to turn a correct encryption algorithm into an incorrect one by bolting on something seemingly innocuous.

Isn't every poorly designed web app essentially a giant "bolt on" to the encryption algorithm (HTTPS, etc) it is served through?

If there's a get_thread API, which zips the comment thread, includes it in some JSON as base64 along with other metadata (only the thread itself is zipped), and then sends that as the response over HTTPS, is that not secure? Nobody would bat an eye at this scenario, but it's essentially the same as your example because the plaintext is compressed before encrypting and sending. If it's okay to do this for a web app, why is it not okay to do it as part of a home-made RSA implementation?

(Of course, I'm not actually arguing for a second layer of encryption because it is unnecessary. But my understanding is that it wouldn't cause any harm and I'm trying to understand if that's correct or not.)


The example you give is similar to but not quite the same as "lzAES". The distinction is that in your example, the application is deciding whether to compress or not---the input/output behavior of the cipher doesn't include the compression step, so the cipher itself doesn't suffer from the problem I mentioned in my first note.

But it's still possible for an application to use a cipher incorrectly. In particular, an application-level decision about whether to compress some data before encrypting can have an effect on the application's security. In the case you mention it seems unlikely to be a problem (but that's an application-level question, so it could be).

As an example where it seems like the application-level decision to compress or not matters a lot, imagine an application that sends an encrypted password to a server. If the application compresses the password first, an attacker could learn which values are not my password via guess-and-check-length. (Of course, even without compression the attacker can learn something about the length of my password just by looking at the length of the ciphertext---so probably this is a case where the application should first pad the message to some fixed length before encrypting. But in any case it almost certainly shouldn't compress the password!)


You should look into the CRIME, BEAST, and BREACH attacks on TLS/SSL. They're related to using compression before encryption.

The TL;DR is that you generally should not compress secret data before encrypting, especially if part of the request might be reflected in the response.

If you look carefully at your browser's HTTPS traffic, you'll notice that dynamic data that might contain secrets is often not sent using HTTP compression, though static data (basically guaranteed to not contain anything secret) might still use it.

