
Oh, I see what you're getting at.

I think I can clear that up for you: Bayes' formula works both for concrete probabilities (P(A), P(A|B), etc.) and for PDFs (p(A), p(A|B), ...). So

    p(A|B) = (p(B|A) * p(A)) / p(B)
is just

    P(A|B) = (P(B|A) * P(A)) / P(B)
The only difference is that in the first version the inputs and the output are density functions of A and B, whereas in the second version A and B are given and the output is a concrete number.

In the case of the Kalman filter (or LS estimation in general) all your p(..) PDFs are Gaussians.
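
If it helps, here's a rough 1-D sketch of the lowercase version in Python (the names are mine, not from any particular library); the observation is modelled as state plus Gaussian noise, and p(B) is computed by brute-force integration just to show it's only a normalizer:

    import numpy as np

    def gaussian_pdf(x, mean, var):
        # Density of N(mean, var) evaluated at x.
        return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

    def posterior_pdf(a, b, prior_mean, prior_var, obs_var):
        # p(A|B) = p(B|A) * p(A) / p(B); p(B) is found by integrating the
        # numerator over A, which is all the denominator ever does here.
        grid = np.linspace(prior_mean - 10.0, prior_mean + 10.0, 10001)
        numer = gaussian_pdf(b, grid, obs_var) * gaussian_pdf(grid, prior_mean, prior_var)
        p_b = np.sum(numer) * (grid[1] - grid[0])
        return gaussian_pdf(b, a, obs_var) * gaussian_pdf(a, prior_mean, prior_var) / p_b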




Yeah, the difference between lowercase p and capital P seems pretty important. Most places show capital P, so that's part of the confusion.

There's more though. Lowercase p(B|A) seems to really mean p_B(h(a)), and that's not obvious. Hell, I might still have it wrong.

And most everyone says to ignore p(B) in the denominator, but that's really sloppy hand-waving. The notation means something, and there should be a well-defined set of substitutions, but in each of the four terms, they do something radically different. I can't see a pattern to follow.


> Lowercase p(B|A) seems to really mean p_B(h(a)), and that's not obvious. Hell, I might still have it wrong.

I think you have it about right. It's "semi-obvious" in the sense that observation and state are related through the observation function (in your case, observing a 3D point as a 2D coordinate), and that function is naturally part of the PDF.
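
Something along these lines is roughly what p(B|A) = p_B(h(a)) works out to when h maps a 3D state to a 2D measurement; the projection below is just a stand-in I made up, your h is whatever your camera model says:

    import numpy as np

    def h(state_3d):
        # Example observation function: perspective projection onto the z = 1 plane.
        x, y, z = state_3d
        return np.array([x / z, y / z])

    def likelihood(observed_2d, state_3d, obs_cov):
        # p(B|A) = p_B(h(a)): a Gaussian measurement-noise density evaluated at
        # the difference between the actual observation and the predicted one.
        diff = np.asarray(observed_2d) - h(np.asarray(state_3d))
        inv = np.linalg.inv(obs_cov)
        norm = 2 * np.pi * np.sqrt(np.linalg.det(obs_cov))
        return float(np.exp(-0.5 * diff @ inv @ diff) / norm)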

> And most everyone says to ignore p(B) in the denominator, but that's really sloppy hand-waving.

It's not. You want to know the most probable value of A; in other words, you are looking for the argmax of a function of A. p(B) is purely a function of B, with no A involved, so it is a constant in your equation. Since it sits in the denominator, it's a normalizing constant.

Note that if you have the "Uppercase Bayes" (P(A|B) = ...) you are looking for concrete values, so the normalizing P(B) does matter.

Now in the case of "lowercase Bayes" (p(A|B) = ...) it matters just as much, but you can still ignore it if all you're looking for is the argmax of the resulting PDF, since p(B) is just a scaling constant and does not change the argmax of p(A|B).
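
A quick numerical sanity check of that point, with made-up numbers:

    import numpy as np

    a_grid = np.linspace(-5, 5, 1001)
    numerator = np.exp(-0.5 * (a_grid - 1.2) ** 2)     # p(B|A) * p(A), up to scale
    p_b = np.sum(numerator) * (a_grid[1] - a_grid[0])  # the normalizing constant p(B)
    posterior = numerator / p_b                        # p(A|B)

    # Dividing by p(B) rescales every value but leaves the argmax where it was.
    assert np.argmax(numerator) == np.argmax(posterior)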

> but in each of the four terms, they do something radically different. I can't see a pattern to follow.

I don't understand what you mean here.


> It's not. You want to know the most probable value of A.

Nah, it's really not that simple. When I've done this in the past, I've needed both the mean (which is the mode for Normal distributions) and the variance, so I can make confidence ellipses. I don't just care about the most probable location.

I already have a set of techniques for working with Kalman filters. The only reason I would want to understand applying Bayes' theorem in this context is if it offers insight into a wider class of problems (non-Gaussian PDFs) or if it helps me communicate with others. In both of those cases, I'd like to understand the thing first before I hand-wave the denominator away.
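
For what it's worth, what I do today is roughly this 1-D fusion (names are mine); it gives me both the mean and the variance, and the variance is what the ellipses need:

    def fuse_1d(prior_mean, prior_var, meas, meas_var):
        # Standard 1-D Kalman/Gaussian fusion: returns the posterior mean
        # (the argmax) and the posterior variance (what the ellipses use).
        k = prior_var / (prior_var + meas_var)   # Kalman gain
        post_mean = prior_mean + k * (meas - prior_mean)
        post_var = (1.0 - k) * prior_var
        return post_mean, post_var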


> Nah, it's really not that simple.

Well, you came here and asked, and I gave you a response because I happen to know the topic and wanted to help. It's your choice not to believe what I explained, but I doubt you'll get a very different answer from other people.


Heh, I think we stepped in the wrong direction. I didn't mean to offend you.

Take care.



