
Though it should be noted those “4 random word” passwords are strong only if the words are truly random (and the string is less likely to be memorable in this case).

A password generator that allows retries means people will hit that button until the string is memorable, reducing the entropy.
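For scale, a rough back-of-the-envelope in Python (the 7776-word list size is just the usual diceware-style example; any wordlist size works the same way):

    import math

    # Entropy of a passphrase of k words drawn uniformly and
    # independently from a list of n words: k * log2(n) bits.
    def passphrase_bits(n=7776, k=4):
        return k * math.log2(n)

    print(passphrase_bits())  # ~51.7 bits for 4 words from a 7776-word list

Every bit shaved off by rerolling until the string looks nice comes straight out of that budget.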



I was curious how much the entropy is reduced.

As a simplifying assumption, assume everyone agrees about which of any two strings is more memorable.

If someone takes m random samples and keeps the one they find most memorable, how much does this reduce the entropy? If there are N possible strings, the uniform distribution has -\log_2(1/N) = \log_2(N) bits of entropy (summing -(1/N)\log_2(1/N) over the N strings).

Identify each string with its rank in the list ordered by memorability, so "most memorable of m samples" becomes "max of m uniform draws". The cdf of one draw is P[x \le a] = a/N, and for m independent samples P[\max(x_1, ..., x_m) \le a] = (a/N)^m, so the pdf is roughly p(a) = (1/N)^m \cdot m \cdot a^{m-1} (approximating it as continuous because N is large; I am not sure how reasonable that approximation is).

Again approximating as continuous, the entropy is the integral from a = 0 to N of -p(a) \log_2(p(a)) da, i.e. of -p(a) (m\log_2(1/N) + \log_2(m) + (m-1)\log_2(a)) da, which splits into -(m\log_2(1/N) + \log_2(m)) minus (m-1) times the integral of (1/N)^m m a^{m-1} \log_2(a) da ...

OK, I just threw WolframAlpha at that and got -\log_2(m/N) + (m-1)/(m \ln 2). Subtracting that from the initial \log_2(N) gives a reduction of \log_2(m) - (m-1)/(m \ln 2),

and that (m-1)/(m \ln 2) term is small: it is 0 at m = 1 and climbs toward 1/\ln 2 ≈ 1.44 as m grows, so call it about 1.

So, if all the perhaps questionable approximations I made didn't mess this up (and I didn't mess it up in some other way), I think that says that picking the most memorable out of m random strings reduces the entropy by about \log_2(m) - 1 bits, i.e. a little less than \log_2(m).

That doesn't sound too bad to me, really. Well, I suppose it depends how many bits you have to spare, and how big of an m you pick.
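(If anyone wants to sanity-check the continuous approximation, here's a quick exact computation of the same model in Python. N and m below are arbitrary, and it keeps the assumption that everyone shares one memorability ranking and takes the max of m uniform draws.)

    import math

    # Exact entropy (in bits) of "pick the most memorable of m", modeled as
    # the max of m uniform draws over strings ranked 1..N by memorability.
    def entropy_best_of_m(N, m):
        h = 0.0
        for a in range(1, N + 1):
            p = (a / N)**m - ((a - 1) / N)**m   # P[max of m draws = a]
            h -= p * math.log2(p)
        return h

    N, m = 100_000, 10
    loss = math.log2(N) - entropy_best_of_m(N, m)
    approx = math.log2(m) - (m - 1) / (m * math.log(2))  # the formula above
    print(loss, approx)  # both come out around 2 bits for m = 10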


Love this! Thanks, it was a casual idea of mine that I didn’t really think through before.

Here’s a slightly different approach to this. Let’s instead assume that the set of “memorable” strings is fixed (say of size N/M, where N is the number of all strings) and the user hits retry as many times as needed to get a string from the memorable set. If the number of retries is a random variable X, then if we know the distribution of X we know M. Since the password then effectively comes from a set of size N/M, the number of bits lost is \log_2(N) - \log_2(N/M) = \log_2(M), so we just want to find out how X relates to M.

EX = \sum_{i\geq 0}i(1-1/M)^i(1/M) = WolframAlpha :) = M - 1

So it matches: if your average number of tries is M - 1, you lose something like \log_2(M) bits of entropy.

Makes me feel better about all those times when I hit retry a dozen times.
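A tiny simulation along those lines (M = 12 is just an example, and “memorable” is modeled as a fixed 1-in-M chance per reroll):

    import math, random, statistics

    # Reroll until the draw lands in the "memorable" set (probability 1/M per try);
    # X counts the retries before that happens, so X is geometric with p = 1/M.
    def retries_until_memorable(M):
        x = 0
        while random.random() > 1 / M:
            x += 1
        return x

    M = 12
    xs = [retries_until_memorable(M) for _ in range(200_000)]
    print(statistics.mean(xs))  # ~ M - 1 = 11 average retries
    print(math.log2(M))         # ~3.6 bits lost relative to the full set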





