
It's Bayesian because we are estimating a (posterior) distribution for the state of the system, rather than just a "point" estimate. Restricting to only the Gaussian distribution (while a nice example) is losing the forest for the trees.

> However, I'm not sure I understand a single component of the expression P(A|B) = P(B|A) P(A) / P(B) in this context. What does A|B or B|A even mean when A is a 3D Gaussian and B is 2D? What are the dimensionalities of these operations? What does it mean to divide a product of pdfs?

I'm not able to understand why any of that is bothersome. Suppose A is the system's state (x,y,z) and B is the sensor reading (measures just the projection along the x-axis, such as a depth measurement). Since the sensor is not perfect, it could generate a range of readings for any given position of the object, specified by p(B[x]|A[x,y,z]) -- suppose you're given one value of B which is sampled from that distribution (say that was a Gaussian with standard deviation sigma). Given that value, you need the likelihood: p(B|A) with B fixed at the measured value, viewed as a function of A. Alternatively, you can imagine performing variational inference in the family of Gaussian distributions (which happens to contain the true distribution in this case). Whichever way we go about it, the likelihood, as a function of A's x-component, is a Gaussian with the same standard deviation sigma, centered at the measured B. (Intuitively, we can see that a measurement B will constrain the "x" component of A but not the other two coordinates.)
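A minimal sketch of the sensor model described above, assuming a Gaussian noise model on the x-coordinate only. The function name and the numbers are made up for illustration; the point is that the likelihood is a scalar-valued function of the full 3D state that simply ignores y and z.

```python
import math

def measurement_likelihood(state, b, sigma):
    """p(B = b | A = state) for a sensor that reads only the x-coordinate.

    Hypothetical sensor model: b = x + Gaussian noise with std sigma.
    """
    x, _y, _z = state  # y and z do not affect the reading
    return math.exp(-0.5 * ((b - x) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

b, sigma = 2.0, 0.5  # illustrative measured value and sensor noise

same_x_a = measurement_likelihood((2.0, 0.0, 0.0), b, sigma)
same_x_b = measurement_likelihood((2.0, 7.0, -3.0), b, sigma)  # different y, z
other_x = measurement_likelihood((3.0, 0.0, 0.0), b, sigma)    # different x

print(same_x_a == same_x_b)  # identical: y and z are unconstrained
print(other_x < same_x_a)    # states whose x is far from b are less likely
```

Viewed as a function of the state with b held fixed, this is the Gaussian "centered at the measured B" in the x direction and flat in the other two.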

p(A) is the "prior" you get as an extrapolation from your knowledge of the system's past state. However complicated/nonlinear the system's evolution function, you can find p(A,t) either as a closed-form expression, or by doing simulations, etc. (typically from p(A,t-1), but note, for example, that this general formulation can be used even if your system's dynamics do not satisfy the Markov property).
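One way to picture the "by doing simulations" route is to push samples of p(A,t-1) through the dynamics. This is only a sketch: the `step` function, the noise levels, and the starting distribution are all invented for illustration, not taken from any particular system.

```python
import numpy as np

def predict_samples(samples, step, process_noise_std, rng):
    """Push samples of p(A, t-1) through the dynamics to get samples of p(A, t)."""
    moved = np.array([step(s) for s in samples])
    return moved + rng.normal(0.0, process_noise_std, size=moved.shape)

def step(state):
    """Hypothetical nonlinear evolution function for the state (x, y, z)."""
    x, y, z = state
    return np.array([x + 0.1 * np.sin(y), y + 0.1 * z, 0.95 * z])

rng = np.random.default_rng(0)

# Samples representing the previous belief p(A, t-1) (assumed Gaussian here).
samples_prev = rng.normal([0.0, 1.0, 2.0], 0.1, size=(5000, 3))

# Samples representing the prior p(A, t) before the new measurement arrives.
samples_prior = predict_samples(samples_prev, step, 0.05, rng)

# If a Gaussian summary is wanted, moments can be read off the samples.
prior_mean = samples_prior.mean(axis=0)
prior_cov = np.cov(samples_prior, rowvar=False)
```

Nothing here requires the dynamics to be linear; the samples themselves are the representation of the prior.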

If you have both the likelihood p(B|A) (as a function of A) and the prior p(A) manifestly in the form of probability density functions, multiplying them is easy. At the end of this process, it is good enough to have an un-normalized posterior for A, which specifies the relative probability of different states. Dividing by p(B) is equivalent to normalizing the candidate posterior distribution. More generally, if the prior and the likelihood are specified by moments, or by some variational parameters, or by simulations, we can think of other ways to combine them.
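To make "multiply, then normalize" concrete, here is a rough numerical sketch restricted to the x-coordinate, with densities tabulated on a grid. The prior (mean 0, std 2), the sensor noise (std 1), and the reading (5) are arbitrary illustrative choices.

```python
import numpy as np

def normal_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

x = np.linspace(-20.0, 20.0, 40001)  # grid over the x-coordinate of the state
dx = x[1] - x[0]

prior = normal_pdf(x, 0.0, 2.0)       # p(A): belief about x before measuring
likelihood = normal_pdf(5.0, x, 1.0)  # p(B=5 | A): a function of x, B held fixed

unnormalized = likelihood * prior     # pointwise product of two functions of x
evidence = unnormalized.sum() * dx    # numerical p(B=5), just a number
posterior = unnormalized / evidence   # now integrates to 1 over x

post_mean = (x * posterior).sum() * dx
post_var = ((x - post_mean) ** 2 * posterior).sum() * dx
print(post_mean, post_var)  # approximately 4.0 and 0.8
```

The values agree with the closed-form Gaussian result: precisions add (1/4 + 1/1 = 1/0.8) and the mean is the precision-weighted average of 0 and 5. The "division of pdfs" is division of a function of x by a single number.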




> I'm not able to understand why any of that is bothersome.

Ok. It's bothersome to me because Bayes gets trotted out in the context of Kalman filters as though it is shedding light on the topic or providing rigor to the approach, but I haven't found anyone who can really describe what the elements of that expression are without hand-waving.

You've switched from capital 'P' to lowercase 'p' in your message (which I take to mean Probability and pdf respectively). If that was intentional, it opens a new batch of questions for what the notation means.

Anyways, maybe I'm just a slow learner, too pedantic, or something else. If you're interested, read my other reply where I go into it more.


I'm not the poster you're responding to, but your observation about the distinction between P and p is crucial. The version of Bayes' rule that is relevant for the Kalman filter is the third one here: https://en.wikipedia.org/wiki/Bayes'_theorem#Random_variable... ("if both X and Y are continuous").
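For reference, the continuous form of Bayes' rule can be written as below. The mapping X = state (A) and Y = measurement (B) is my reading of this thread, not something the linked page specifies.

```latex
f_{X \mid Y = y}(x) \;=\; \frac{f_{Y \mid X = x}(y)\, f_X(x)}{f_Y(y)},
\qquad
f_Y(y) \;=\; \int f_{Y \mid X = x'}(y)\, f_X(x')\, \mathrm{d}x' .
```

Every f here is a density (lowercase p in the comments above), not a probability. With y fixed at the observed reading, the numerator is a function of x alone and the denominator is a constant, so X and Y are free to have different dimensions.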



