But you know neither the chance of the new rocket being explosion proof, nor the...

joshuamorton · on May 7, 2018

Enter the Beta(variate) distribution.

The beta distribution[1] is a cool statistical distribution defined by BetaDist{a,b} (or alpha, beta, but that's too much work), where a counts the successes and b counts the failures you've sampled, on top of whatever counts your prior starts with.

It has a number of cool properties, chief among them that given X = BetaDist{a,b}, then cdf(X, x) = the probability that the mean of the distribution you are approximating is less than x. It has a bunch of other nice properties too (like E[X] = a / (a + b), which should be obvious), but those aren't as relevant here.
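To make the cdf and mean concrete, here is a rough stdlib-only sketch. The helper names (`beta_pdf`, `beta_cdf`, `beta_mean`) and the midpoint-rule integration are my own choices for illustration, not anything from a particular library; a real stats package would do this more accurately.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)

def beta_cdf(x, a, b, steps=10_000):
    """P(X <= x), estimated with the midpoint rule (crude but dependency-free)."""
    h = x / steps
    return sum(beta_pdf((i + 0.5) * h, a, b) for i in range(steps)) * h

def beta_mean(a, b):
    """E[X] = a / (a + b)."""
    return a / (a + b)

a, b = 2, 1
print(beta_mean(a, b))      # 2/3
print(beta_cdf(0.5, a, b))  # close to 0.25, since the cdf of Beta(2, 1) is x**2
```

Read the second number as "given these counts, there's about a 25% chance the true success rate is below 0.5".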

So let's say that you assume a uniform prior. This is defined as BetaDist{1,1} [2]. This is probably the wrong prior, so you might have a better idea. If, for example, you believe there is a 10% chance of your rocket exploding based on some calculations you've done, you might use a differently tuned beta distribution, like BetaDist{9,1}, or BetaDist{4.5,.5} if you were feeling uncertain (but in general it would likely be better to use {8,2} in that situation iirc). But let's assume {1,1} for now.
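A quick way to see what "feeling uncertain" means for these priors is to compare their means and variances. This is just a sketch using the standard closed-form moments of the beta distribution; the function names are made up for the example.

```python
def beta_mean(a, b):
    """Mean of Beta(a, b)."""
    return a / (a + b)

def beta_variance(a, b):
    """Variance of Beta(a, b)."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

# The three priors mentioned above.
for a, b in [(1, 1), (9, 1), (4.5, 0.5)]:
    print((a, b), beta_mean(a, b), beta_variance(a, b))
```

{9,1} and {4.5,.5} have the same mean (0.9), but the second has the larger variance, so it gets pulled around more by each new launch.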

So you launch your rocket. Everything goes great. You update your distribution. It's a success, so you get BetaDist{2,1} [3]. So what is the chance your rocket explodes? Well, the cdf of your beta distribution is the probability that the mean is less than x. So the pdf of the beta distribution is the probability density of the mean being exactly x. So then

The integral from 0 -> 1 of `(1 - x) * pdf(X, x) dx` is the estimated probability that your rocket explodes on its next launch, since that's "for every value x, the likelihood of the distribution being that one multiplied by the chance your rocket explodes given that distribution". For the one rocket case, this happens to be equal to 1 - E[X] = b / (a + b), so it's 1/3.
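If you want to check that integral numerically, something like the following works. It's a sketch under my own assumptions: `update` and `p_explode` are hypothetical helpers, and the midpoint rule stands in for a proper integrator. The closed form it should match is 1 - E[X] = b / (a + b).

```python
import math

def update(a, b, success):
    """Conjugate update: a success bumps a, an explosion bumps b."""
    return (a + 1, b) if success else (a, b + 1)

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)

def p_explode(a, b, steps=10_000):
    """Midpoint-rule estimate of the integral of (1 - x) * pdf(x) over [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += (1 - x) * beta_pdf(x, a, b)
    return total * h

a, b = update(1, 1, success=True)  # uniform prior plus one good launch -> (2, 1)
print(a, b)
print(p_explode(a, b))             # numerically close to b / (a + b) = 1/3
```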

For the two rocket case, you apply reinforcement learning/k-armed bandit strategies like UCB1[4] or Thompson Sampling[5]. These are algorithms that will result in you picking the best rocket with as few unnecessary explosions as possible, provably.
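Here is roughly what Thompson Sampling looks like for two rockets, using Python's `random.betavariate`. Treat it as a toy sketch: the true success rates (0.7 and 0.9), the seed, and the function names are all made up for the simulation, and each rocket starts from the uniform {1,1} prior.

```python
import random

def thompson_pick(posteriors, rng):
    """Draw one sample per rocket from its Beta posterior; fly the highest draw."""
    draws = [rng.betavariate(a, b) for a, b in posteriors]
    return draws.index(max(draws))

def simulate(true_success, launches, seed=0):
    """Run Thompson Sampling against hypothetical true success rates."""
    rng = random.Random(seed)
    posteriors = [[1, 1] for _ in true_success]  # uniform prior per rocket
    picks = [0] * len(true_success)
    for _ in range(launches):
        k = thompson_pick(posteriors, rng)
        picks[k] += 1
        if rng.random() < true_success[k]:
            posteriors[k][0] += 1  # success: bump a
        else:
            posteriors[k][1] += 1  # explosion: bump b
    return picks, posteriors

picks, posteriors = simulate(true_success=[0.7, 0.9], launches=500)
print(picks)       # launches flown per rocket
print(posteriors)  # final [a, b] per rocket
```

The point of sampling rather than just comparing means is that a rocket with few launches still gets tried occasionally, because its wide posterior sometimes produces the highest draw.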

You can see some related discussion I've had on HN about these algorithms [6].

[1]: https://en.wikipedia.org/wiki/Beta_distribution

[2]: http://www.wolframalpha.com/input/?i=beta+distribution+(1,1)

[3]: http://www.wolframalpha.com/input/?i=beta+distribution+(2,+1...

[4]: http://banditalgs.com/2016/09/18/the-upper-confidence-bound-...

[5]: https://en.wikipedia.org/wiki/Thompson_sampling

[6]: https://news.ycombinator.com/item?id=17014232

pps43 · on May 7, 2018

Wouldn't it be easier to just add one success and one failure and calculate (events+1)/(trials+2) instead of events/trials?

joshuamorton · on May 8, 2018

I'm not sure what you mean.

pps43 · on May 8, 2018

I'm talking about pseudocounts (add one success and one failure) [1] or maybe the Agresti-Coull estimator (add two successes and two failures) [2].

[1] https://en.wikipedia.org/wiki/Rule_of_succession

[2] http://users.stat.ufl.edu/~aa/articles/agresti_coull_1998.pd...
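For what it's worth, the pseudocount suggestion and the uniform-prior beta approach can be compared directly. A small sketch, with function names invented for the example; the "add two" version is only the point estimate described in the comment above, not the full interval from the paper.

```python
def rule_of_succession(successes, trials):
    """(events + 1) / (trials + 2): add one success and one failure."""
    return (successes + 1) / (trials + 2)

def beta_posterior_mean(successes, trials, prior_a=1, prior_b=1):
    """Mean of the Beta posterior after updating a Beta(prior_a, prior_b) prior."""
    a = prior_a + successes
    b = prior_b + (trials - successes)
    return a / (a + b)

def add_two_estimate(successes, trials):
    """Point estimate after adding two successes and two failures."""
    return (successes + 2) / (trials + 4)

# One launch, one success.
print(rule_of_succession(1, 1))   # 2/3
print(beta_posterior_mean(1, 1))  # 2/3 as well
print(add_two_estimate(1, 1))     # 3/5
```

So adding one success and one failure gives the same number as the mean of the posterior under a BetaDist{1,1} prior; the beta distribution additionally carries the spread around that number.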