Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I don’t think this is _that_ interesting because the probabilities don’t add up to exactly 50%.

We have:

- the smallest prime factor of less than half of all integers is ≤ 31

- the smallest prime factor of over half of all integers is ≤ 37

So, clearly, the prime p where exactly half of all integers have something ≤ p as their smallest prime factor must lie between 31 and 37 :-)

Alternatively, one could argue such a prime doesn’t exist. That, I think, is the better answer.

That doesn’t imply there’s no good definition of that median, though because it’s 100% acceptable to have a set of numbers whose median is not an element of that set. https://en.wikipedia.org/wiki/Median#Finite_data_set_of_numb...:

“If the data set has an even number of observations, there is no distinct middle value and the median is usually defined to be the arithmetic mean of the two middle values.[1][2] For example, this data set of 8 numbers

1, 2, 3, 4, 5, 6, 8, 9 has a median value of 4.5”

So i guess the hunt still is on for a good argument as to why a non-prime such as 36 or even a real such as 36.716… should be called the median of the smallest prime factors of all integers (I wouldn’t know whether that exists)



As the author says, this isn't the median of any particular data set, but the limit of the median as the number of points goes to infinity:

> "Obvoiusly" there's no uniform distribution on the natural numbers, so what does it even mean to choose a "random" one? The way the number theorists usually solve this problem is by fixing a large number N and looking at the probabilities when you pick a random number between 1 and N. Then you look at the N → ∞ limit of these probabilities.

And by using the λ₂(p) function, we can find the asymptotic proportion that each prime p appears in the data set: if the set has N items, then approximately λ₂(pN of them will be equal to p, and the actual number will asymptotically approach this approximation. Given this, we can look at the set in terms of the proportions of its elements. We have first, P(p < 37) = λ₂(2) + ... + λ₂(31) ≈ 0.49061; second, P(p = 37) = λ₂(37) ≈ 0.00964; and third, P(p > 37) = 1 − P(p < 37) − P(p = 37) ≈ 0.49975.

Thus, if we sort the set with N data points in ascending order, we'll ultimately see 0.00964·N instances of 37, flanked to the left by 0.49061·N values less than 37, and to the right by 0.49975·N values greater than 37. For N odd, the median is defined as the value at the 50% mark in the sorted set, which is clearly 37. For N even, the median is defined as the mean of the two values surrounding the 50% mark; since 0.00964·N → ∞ as N → ∞, there will eventually be at least two instances of 37 around the 50% mark, so the median will once again be 37. Thus, for N even and odd, the median will always end up at 37.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: