
> This is what the patch does. It does not handle the case of VM resume yet.

Actually, it does. Look at the use of v->generation.
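
The rough shape of it (illustrative names here, not the patch's exact ones): the kernel bumps a global generation counter whenever the RNG reseeds or the VM forks/resumes, and the vDSO function refuses to emit output from stale state:

    /* Illustrative sketch, not the actual patch. The kernel bumps
     * rng_generation on every reseed and on VM fork/resume, which
     * invalidates all per-thread userspace states at once. */
    unsigned long kernel_gen = READ_ONCE(shared_page->rng_generation);
    if (state->generation != kernel_gen) {
            /* Stale (e.g. the VM was just resumed): pull fresh key
             * material from the kernel before producing any output. */
            refill_key_from_kernel(state);
            state->generation = kernel_gen;
    }
    /* ... only now expand state->key into the caller's buffer ... */

So a resumed VM looks exactly like any other reseed event from the vDSO code's perspective.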


It's kind of wild, yea. I'd rather not do it. But if it's between unsafe userspace implementations and this thing, this thing is better. Maybe people will decide they don't care about hyperspeed card shuffling or whatever else. But if they do, this is an attempt to provide it safely.


I guess my biggest concern here is the notion that vDSO is going to manage the state in user space, if I understand correctly. That seems like a big footgun.

If I call the getrandom system call, and it succeeds, I am (pretty much) guaranteed that the results are properly random no matter what state my userspace program might be in.

With vDSO, it seems we lose this critical guarantee. If a memory corruption occurs, or my process’s memory contents can be disclosed somehow (easier to do against a userspace process than against the kernel!), I don’t have truly random numbers anymore. Using a superficially similar API to the system call for this seems like a really bad idea.
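
For comparison, the syscall contract being relied on here is just this (glibc has wrapped getrandom(2) since 2.25):

    #include <stdlib.h>
    #include <sys/random.h>

    unsigned char key[32];
    /* Blocks only until the kernel RNG is initialized. After a
     * successful return, key[] is cryptographically random no matter
     * what state this process's own memory was in beforehand. */
    if (getrandom(key, sizeof(key), 0) != sizeof(key))
            abort();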


> If a memory corruption occurs, or my process’s memory contents can be disclosed somehow (easier to do against a userspace process than against the kernel!), I don’t have truly random numbers anymore.

Yea, that's definitely a downside of sorts. Jeffrey Walton mentioned that in the glibc discussion a few days ago: https://lore.kernel.org/linux-crypto/CAH8yC8n2FM9uXimT71Ej0m...

A mitigating factor might be that if your process memory leaks, then the secrets it generated leak anyway, no matter what generated them, so maybe the difference isn't as large as it seems. But of course a leaked generator state means future secrets potentially leak too. I suppose frequent reseeds could mitigate this, just as they do for potential leaks in the kernel.

But anyway, I agree that compromising application memory is somewhat more possible than compromising kernel memory, though both obviously happen.


If you have memory corruption in your process, what makes you confident your program state will let you do something useful with the randomness you get back from getrandom()?


I guess my concern is with “silent” memory corruption, e.g. someone putting in a “bzero(state, …)” by accident and winding up with deterministic randomness. Sure, they could also just as well do a “bzero(randombuf, …)” before using it but that’s much easier to detect (and in my head, somewhat harder to do by accident).

Silly mistakes like the Debian randomness bug come to mind - a program can be totally well-behaved even in the face of a glaring entropy failure, in a way that’s hard for developers to detect.


I guess? I mean, I see "something overflowed on the stack and into my randomness buffer" as being similarly common and about as undetectable. That's not to say we shouldn't invest in making APIs that are harder to misuse even if you hold them incorrectly, but I'm not sure the benefits are very compelling here.


The vDSO page is mapped without write though, just r and x.


> hyperspeed card shuffling

The article mentions this case too. getrandom() on my system returns the required amount of random bits for a full deck shuffle in less time than my clock has precision for... that's... too slow?
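
To put numbers on it: a full 52-card Fisher-Yates shuffle needs about 50 bytes of randomness, which a single syscall hands back. A quick sketch (modulo bias ignored for brevity; real shuffling code should reject-sample):

    #include <stdio.h>
    #include <sys/random.h>

    int main(void)
    {
            unsigned char deck[52], r[51];
            for (int i = 0; i < 52; i++)
                    deck[i] = i;
            /* One getrandom() call covers the entire shuffle. */
            if (getrandom(r, sizeof(r), 0) != sizeof(r))
                    return 1;
            for (int i = 51; i > 0; i--) {       /* Fisher-Yates */
                    int j = r[51 - i] % (i + 1); /* biased; fine for a sketch */
                    unsigned char t = deck[i];
                    deck[i] = deck[j];
                    deck[j] = t;
            }
            for (int i = 0; i < 52; i++)
                    printf("%d ", deck[i]);
            return 0;
    }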


There are cases where you want tons of random numbers (e.g. Monte Carlo) and the line between "good enough" and "disastrously bad" is often unclear. Providing cryptographic random numbers is the only possible API that's both safe and generic.

As the post says, it's worth entertaining the idea of having the kernel provide a blessed way for userspace to do that, though I admit I've never personally seen a scenario where RNG was truly the bottleneck. But it'd still be nice to kill all the custom RNGs out there.


Don't you always want a reproducible random sequence for such simulations? I.e. you use getrandom for the initial seed only, record it, and keep the rest of your RNG state in userspace code?
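
Something like this, say (a sketch, with splitmix64 standing in for whatever userspace PRNG you prefer):

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/random.h>

    /* Deterministic userspace generator; only the seed comes from the OS. */
    static uint64_t splitmix64(uint64_t *s)
    {
            uint64_t z = (*s += 0x9e3779b97f4a7c15ULL);
            z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
            z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
            return z ^ (z >> 31);
    }

    int main(void)
    {
            uint64_t seed;
            if (getrandom(&seed, sizeof(seed), 0) != sizeof(seed))
                    return 1;
            printf("seed = %016llx\n", (unsigned long long)seed); /* record it */
            /* ... drive the simulation from splitmix64(&seed); rerunning
             * with the recorded seed reproduces the run exactly ... */
            return 0;
    }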


It's a nice property, but a lot of people skip it because of the tradeoffs. I'm also sure there are lots of use cases I'm not aware of where you don't want reproducibility.


If you're into the open ISA idea but find the big guys a bit intimidating, you might have fun with OpenRISC. At least lately I've had a blast hacking on it. The kernel and QEMU implementations are very simple, and Stafford Horne is fun to talk with.



    CONFIG_RANDOM_TRUST_CPU=y
    CONFIG_RANDOM_TRUST_BOOTLOADER=y
    CONFIG_HW_RANDOM=y
    CONFIG_HW_RANDOM_TPM=y

and so forth all exist.


That's what this demo code project is about:

https://git.zx2c4.com/seedrng/tree/seedrng.c

https://git.zx2c4.com/seedrng/about/

https://twitter.com/EdgeSecurity/status/1509002499507818500

It's trying to do the seed file thing the "right way", and be portable enough that init systems can just copy and paste this where it fits.
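
The crux of the "right way" is that a seed must never be used twice and must never be credited unless it's known to be fresh. A heavily simplified sketch of the flow (the real seedrng.c adds hashing, creditability marking, and proper error handling):

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/random.h>

    int seed_rng(const char *seed_path)
    {
            unsigned char seed[512 / 8];
            ssize_t len;
            int seed_fd, random_fd;

            random_fd = open("/dev/urandom", O_RDWR);
            if (random_fd < 0)
                    return -1;

            seed_fd = open(seed_path, O_RDONLY);
            if (seed_fd >= 0) {
                    len = read(seed_fd, seed, sizeof(seed));
                    close(seed_fd);
                    unlink(seed_path);               /* a seed must never be reused */
                    if (len > 0)                     /* mixes in, but does NOT credit; */
                            write(random_fd, seed, len); /* crediting needs RNDADDENTROPY */
            }

            /* Immediately stash a fresh seed for next boot. */
            if (getrandom(seed, sizeof(seed), GRND_NONBLOCK) != sizeof(seed))
                    return -1;                       /* don't persist a weak seed */
            seed_fd = open(seed_path, O_WRONLY | O_CREAT | O_TRUNC, 0400);
            if (seed_fd < 0)
                    return -1;
            len = write(seed_fd, seed, sizeof(seed));
            close(seed_fd);
            close(random_fd);
            return len == sizeof(seed) ? 0 : -1;
    }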


There are a few closely related pitfalls that are each subtly different:

1) "premature first" wrt non-local attacker: this is the problem you identified - the RNG initializes when there's actually only 1 bit of entropy, and then SSH generates keys that some researchers bruteforce years later.

2) "premature first" wrt local attacker: the RNG has no entropy. Something legit feeds it 32 bits of entropy, and the kernel mixes that entropy directly into the key that's generating the /dev/urandom stream. Local unpriv'd attacker reading /dev/urandom (or some remote attacker who has access to overly large nonces or something) then bruteforces those 32 bits of entropy, compromising it, since it's only 32 bits.

3) "premature next": the RNG has some entropy. That entropy gets compromised somehow. Then the "premature first" wrt local attacker scenario happens. Maybe you think this is no big deal, since a compromise of the RNG state probably indicates something worse. But compromised seed files do happen, and in general, a "nice" property to have is that the RNG eventually recovers after compromise -- "post compromise security".

Problem set A) A malicious entropy source can currently cause any of these, due to the lack of a Fortuna-like scheduler. Since we just count "bits" linearly, and any credit-worthy source bumps that same counter, a malicious source can bump 255 bits and a legit source 1 bit, and then an attacker brute forces the 1 bit.

Problem set B) Making writes into /dev/[u]random automatically credit would cause the same issues, since it's already common for people to write non-entropic stuff into there (e.g. Android cmdline), and since others write and then manually credit afterwards. Mixing into the /dev/urandom key without crediting would also cause a premature next issue, since some things trickle in 32 bits at a time, and other things that trickle in more bits at a time might still only have a few entropic bits among them. Yada yada yada, it would cause some combination of the problems outlined above.

In spite of problem set A, the kernel currently does do a few things to guard against these issues. First, it avoids problem set B by not implementing that behavior. More generally, /dev/urandom extracts from the entropy pool every "256 bits" and every 5 minutes. And, in order to mitigate a "premature first", it relaxes that 5 minutes during early boot to 5 seconds, then 10 seconds, then 20 seconds, then 40 seconds --> 5 minutes, so at least a potential "premature first" gets mitigated somewhat quickly.
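
Illustratively (not the kernel's literal code), that early-boot schedule amounts to tying the extraction interval to uptime:

    /* Doubling schedule described above: 5s, 10s, 20s, 40s, ...,
     * saturating at the normal 5 minute interval. */
    unsigned int extract_interval_secs(unsigned int uptime_secs)
    {
            unsigned int interval = uptime_secs / 2;
            if (interval < 5)
                    interval = 5;
            if (interval > 300)
                    interval = 300;
            return interval;
    }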

Problem set A still exists, however. Whether anybody cares and what the code complexity cost is versus the actual risk of the issue remains to be seen, and should make for some interesting research.


Yeah. I see (1) and (2) as instances of the same basic problem, and (3) as mostly a non-problem (like, you do the best you can to get compromise recovery from a CSPRNG, you don't do nothing, but you don't hold up progress on it).

But from my read of the backstory here, the problem is userland regressions on (1) and (2), and I buy that you simply can't have those.


> (3) as mostly a non-problem (like, you do the best you can to get compromise recovery from a CSPRNG, you don't do nothing, but you don't hold up progress on it).

Mitigating that attack is the main selling point of Fortuna, which makes this attack way harder. I think this is the primary thing we would get from a Fortuna-like scheduler that we don't currently have or can't currently have given the present design.
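
For anyone unfamiliar: Fortuna round-robins incoming events across N pools and drains pool i only on every 2^i-th reseed, so some pool always accumulates attacker-unknown entropy for long enough between drains, no matter how a malicious source schedules its inputs. Schematically (pools reduced to byte counters; real Fortuna hashes events into each pool):

    #include <stddef.h>
    #include <stdint.h>

    #define NUM_POOLS 32

    struct fortuna {
            uint64_t pool_bytes[NUM_POOLS]; /* stand-in for hash contexts */
            uint64_t event_counter;
            uint64_t reseed_counter;
    };

    static void add_event(struct fortuna *f, size_t len)
    {
            /* Every source, honest or malicious, is spread across all pools. */
            f->pool_bytes[f->event_counter++ % NUM_POOLS] += len;
    }

    static void reseed(struct fortuna *f)
    {
            f->reseed_counter++;
            for (int i = 0; i < NUM_POOLS; i++) {
                    /* Pool i is drained only on every 2^i-th reseed, so higher
                     * pools collect for exponentially longer before being used. */
                    if (f->reseed_counter % (1ULL << i))
                            break;
                    /* ... hash pool i into the new key, then reset it ... */
                    f->pool_bytes[i] = 0;
            }
    }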

> But from my read of the backstory here, the problem is userland regressions on (1) and (2), and I buy that you simply can't have those.

Yea so the way these interact with the current story is in two totally opposite directions.

The original thing -- unifying /dev/urandom+/dev/random -- was desirable because it'd prevent (1)-like issues. People who use /dev/urandom at early boot instead of getrandom(0) wouldn't get into trouble.

Then, in investigating why we had to revert that, I noticed that the way non-systemd distros seed the RNG is buggy/vulnerable/useless, but fixing it in the kernel would lead to issue (2) by introducing problem set B. So instead I'm fixing userspaces by submitting https://git.zx2c4.com/seedrng/tree/seedrng.c to various distros and alternative init systems.

By the way, running around to every distro and userspace and trying to cram that code in really is not a fun time. Some userspaces are easygoing, while others have "quirks", and, while there has been reasonably quick progress so far, it's quite tedious. Working on the Linux kernel has its quirks too, of course, but it's just one project, versus a handful of odd userspaces.


> I'm surprised to be reading justifications that amount to "it's been deployed for several years now, so we think it's OK",

I'm not making any claims or justifications about it being good or not. Just a simple statement that Linus put it there, and there it is, and it's been that way for a while. I even referred to it as "voodoo". As I mentioned in the post, 5.18 didn't change anything about entropy sources and gathering. Certainly trying to see more rigorously whether or not the Linus Jitter Dance is defensible would make for a worthwhile research project.


Any comments about how related the Linus Jitter Dance (btw, glad to see that djb isn't the only one making new dance moves..) is to JitterRNG: https://www.chronox.de/jent.html ?

The PDF linked on that page goes into more detail, but it's measured across a number of CPUs and performs well in both throughput and entropy quality. You do need to measure it per CPU, but it does comply with SP 800-90B, which is what the US government considers the standard for entropy sources.
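
For reference, the common core of these jitter schemes is roughly the loop below; the real JitterRNG layers memory-access noise, health tests, and SP 800-90B-style conditioning on top of it:

    #include <stdint.h>
    #include <time.h>

    /* Fold the low, unpredictable bits of back-to-back timestamp
     * deltas together. Purely a sketch of the collection idea. */
    static uint64_t collect_jitter(void)
    {
            struct timespec ts;
            uint64_t acc = 0, prev = 0;

            for (int i = 0; i < 1024; i++) {
                    clock_gettime(CLOCK_MONOTONIC, &ts);
                    uint64_t now = (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
                    acc = ((acc << 7) | (acc >> 57)) ^ (now - prev); /* rotate + fold */
                    prev = now;
            }
            return acc;
    }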


Here's the document for that change: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2841.htm

> In this proposal, a function declarator without a parameter list declares a prototype for a function that takes no parameters (like it does in C++).

And it seems like gcc implements this under -std=c2x now:

    zx2c4@thinkpad /tmp $ cat a.c
    int blah()
    {
            return 7;
    }
    
    int main(int argc, char *argv[])
    {
            return blah(argc);
    }
    zx2c4@thinkpad /tmp $ gcc -std=c17 a.c
    zx2c4@thinkpad /tmp $ gcc -std=c2x a.c
    a.c: In function ‘main’:
    a.c:8:16: error: too many arguments to function ‘blah’
        8 |         return blah(argc);
          |                ^~~~
    a.c:1:5: note: declared here
        1 | int blah()
          |     ^~~~


Yes. This happens via random.c's add_hwgenerator_randomness() hook, which the hwrng framework calls from a kthread.
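
From the driver's side that's minimal; an abbreviated, untested sketch:

    #include <linux/hw_random.h>
    #include <linux/module.h>
    #include <linux/string.h>

    /* The hwrng core's kthread calls .read and passes the result to
     * add_hwgenerator_randomness(), crediting it per .quality. */
    static int toy_rng_read(struct hwrng *rng, void *data, size_t max, bool wait)
    {
            memset(data, 0, max); /* stand-in; a real driver reads the device */
            return max;
    }

    static struct hwrng toy_rng = {
            .name    = "toy-rng",
            .read    = toy_rng_read,
            .quality = 700, /* entropy estimate: 700 bits per 1024 bits of output */
    };

    static int __init toy_rng_init(void)
    {
            return hwrng_register(&toy_rng);
    }
    module_init(toy_rng_init);

    static void __exit toy_rng_exit(void)
    {
            hwrng_unregister(&toy_rng);
    }
    module_exit(toy_rng_exit);
    MODULE_LICENSE("GPL");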

