As I recall, the Gaussian is the only distribution that has the desired property (i.e. forms a monoid). While the Gaussian is important, it is far from the only distribution of interest in machine learning. I would like to hear what the author has to say about handling, for example, the beta distribution.
In summary: interesting idea but I'm not sure it applies outside a very narrow domain.
Gaussians are definitely not the only distribution that forms a monoid. For example, the Poisson distribution is uniquely determined by its mean, so trained Poissons can be merged by adding their observation counts and sums, just as with the Gaussian. The Categorical distribution is also implemented as a monoid in the HLearn library.
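A minimal sketch in Haskell of what this looks like (hypothetical types and names, not HLearn's actual API): the trained Poisson is just its sufficient statistics, and merging adds them.

```haskell
-- Hypothetical sketch, not HLearn's API: a trained Poisson is its
-- sufficient statistics, and merging two models adds the statistics.
data Poisson = Poisson
  { count :: !Int  -- number of observations seen
  , total :: !Int  -- sum of the observations
  } deriving (Show)

instance Semigroup Poisson where
  Poisson c1 t1 <> Poisson c2 t2 = Poisson (c1 + c2) (t1 + t2)

instance Monoid Poisson where
  mempty = Poisson 0 0  -- the model trained on no data

-- training a batch of observations
trainPoisson :: [Int] -> Poisson
trainPoisson xs = Poisson (length xs) (sum xs)

-- maximum likelihood estimate of the rate parameter
lambda :: Poisson -> Double
lambda (Poisson c t) = fromIntegral t / fromIntegral c
```

The law that matters in practice is that trainPoisson (xs ++ ys) gives the same model as trainPoisson xs <> trainPoisson ys, so training can be split across machines or done online.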
The beta distribution is uniquely determined by two parameters, called alpha and beta. Typically, these count the observations we've seen of two complementary events. (The same website has an example with heads and tails from coin flips.) The beta distribution forms a monoid simply by adding these parameters across distributions. This is very similar to the Categorical distribution, and it also generalizes to the Dirichlet distribution.
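The beta case follows the same pattern; here is an equally hypothetical sketch, with heads counting toward alpha and tails toward beta as in the coin flip example:

```haskell
-- Hypothetical sketch: alpha and beta are pseudocounts of the two
-- outcomes, and merging two trained distributions adds them.
data Beta = Beta
  { alpha :: !Double
  , beta  :: !Double
  } deriving (Show)

instance Semigroup Beta where
  Beta a1 b1 <> Beta a2 b2 = Beta (a1 + a2) (b1 + b2)

instance Monoid Beta where
  mempty = Beta 0 0

-- train from coin flips: True is heads, False is tails
trainBeta :: [Bool] -> Beta
trainBeta flips = Beta (n id) (n not)
  where n p = fromIntegral (length (filter p flips))
```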
Things get really interesting when you stop talking about distributions and start using them in the context of actual machine learning algorithms. If you know how the Naive Bayes algorithm works, for example, it should be plausible that it forms a monoid in the same way the Gaussian does. Future posts will cover how to apply this to some of the "important" learning algorithms like this one.
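To make that plausible, here is a hypothetical sketch (the NaiveBayes type and trainNB are illustrative names, not HLearn's implementation): a model is a map from class label to the monoidal distributions of that class's features, and two models merge pointwise.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical sketch: a Naive Bayes model maps each class label to a
-- monoidal summary of the features seen with that label.
newtype NaiveBayes label dist = NaiveBayes (Map.Map label dist)

instance (Ord label, Semigroup dist) => Semigroup (NaiveBayes label dist) where
  NaiveBayes m1 <> NaiveBayes m2 = NaiveBayes (Map.unionWith (<>) m1 m2)

instance (Ord label, Semigroup dist) => Monoid (NaiveBayes label dist) where
  mempty = NaiveBayes Map.empty

-- each labeled example trains a singleton model; a dataset is the
-- mconcat of its examples, in any order and any grouping
trainNB :: (Ord label, Semigroup dist)
        => (features -> dist) -> [(label, features)] -> NaiveBayes label dist
trainNB trainDist xs =
  mconcat [ NaiveBayes (Map.singleton l (trainDist x)) | (l, x) <- xs ]
```

Because the per-class distributions (Gaussian, Categorical, etc.) are themselves monoids, the whole model inherits its monoid structure from them.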
i wonder if you are thinking of another monoid property of a gaussian?
suppose we have:
x ~ N(0,1)
y|x ~ N(x, 1)
then we have:
y ~ N(0, 2)
i.e., gaussians are closed under marginalization.
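(a quick check of that marginal, writing y as x plus independent unit-variance noise:)

```latex
y = x + \varepsilon, \quad \varepsilon \sim N(0,1) \text{ independent of } x
\;\Rightarrow\;
\mathbb{E}[y] = 0, \qquad
\operatorname{Var}(y) = \operatorname{Var}(x) + \operatorname{Var}(\varepsilon) = 1 + 1 = 2
```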
however, i believe gaussians are not the only distribution with this property either: i think this property corresponds to the stable family of distributions: https://en.wikipedia.org/wiki/Stable_distributions
Stable distributions are something else, not related to marginals or conditioning. They come up when studying laws of averages.
Gaussian distributions belong to the class of stable distributions, though, because of another of their properties: independent Gaussians, when added, are again Gaussian.
Nope. Closure under convolution is the same as closure under summation of the associated random variables, which (up to rescaling) is the defining property of stable distributions. This is explained in the first paragraphs of the wikipedia page you linked to ;-)
Closure under marginalization is something else.
It so happens that the functional form of the gaussian satisfies both, but the two properties are not at all the same.
P1: X, Y independent gaussian => Z = X + Y gaussian
P2: X, Y jointly gaussian => X | Y gaussian
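Written out (these are the standard gaussian facts; \rho below is the correlation of X and Y):

```latex
% P1: closure under convolution / sums of independent gaussians
X \sim N(\mu_1, \sigma_1^2),\ Y \sim N(\mu_2, \sigma_2^2)\ \text{independent}
\;\Rightarrow\; X + Y \sim N(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2)

% P2: closure under conditioning, for (X, Y) jointly gaussian
X \mid Y = y \;\sim\; N\!\Big(\mu_1 + \rho \tfrac{\sigma_1}{\sigma_2} (y - \mu_2),\ (1 - \rho^2)\, \sigma_1^2\Big)
```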