
For something whose state is not always fully known (for example, some cards have been played, so we know what they were, but we don't know what's left in the deck or in other players' hands), what we're describing is a _partially observable_ Markov decision process, or POMDP.

In short, a regular MDP models moving from one state to future states with some probability. In a POMDP, we don't know which state we're in, but we have some guesses, i.e. a probability distribution over all states (often called the belief state). So we model a sequence of actions and observations, using each observation to update that distribution, so that we can reduce the uncertainty as much as possible.
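
Concretely, the update is just Bayes' rule applied after each action/observation pair. Here's a minimal sketch in Python; transition_prob and observation_prob are hypothetical stand-ins for whatever world model you have, not any particular library's API:

    # One POMDP belief update (a Bayes filter step).
    # transition_prob(s2, s1, a) = P(s2 | s1, a) and
    # observation_prob(o, s2, a) = P(o | s2, a) are assumed
    # to be supplied by your model of the world.
    def update_belief(belief, action, observation, states,
                      transition_prob, observation_prob):
        new_belief = {}
        for s2 in states:
            # Predict: chance of landing in s2 under the old belief.
            predicted = sum(transition_prob(s2, s1, action) * belief[s1]
                            for s1 in states)
            # Correct: weight by how likely the observation is from s2.
            new_belief[s2] = observation_prob(observation, s2, action) * predicted
        # Renormalize so the probabilities sum to 1 again.
        total = sum(new_belief.values())
        return {s: p / total for s, p in new_belief.items()}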

The typical example is robot navigation, where POMDPs are used extensively. For example, a blind robot starts out somewhere (it doesn't know where), so the probability distribution (PD) over all possible locations is uniform. Then it moves north and discovers that it was successful; this observation shifts the PD toward states that are reachable by moving north from previously likely states (conceptually: this excludes all sorts of corners and dead ends from consideration). Then let's say it tries to drive north once again, but discovers it bumped into a wall and was unsuccessful. Now the PD gets updated again, based on the previous PD and the result of this action (conceptually: we bump up the probability of any state where driving north runs into a wall, such as a dead end or a T-intersection), and so on. As this sequence of actions and observations continues, we hope the probability distribution will collapse to the specific location we're in, or at least a small set of guesses.
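
Here's that story as a toy example, reusing update_belief from the sketch above: a made-up five-cell corridor with a wall at the north end, deterministic movement, and a perfect wall sensor standing in for the "bump" (so the observation depends only on the resulting state; all numbers are invented for illustration):

    # Toy corridor: cells 0..4, with a wall north of cell 4.
    states = [0, 1, 2, 3, 4]

    def transition_prob(s2, s1, action):
        # Deterministic "north": advance one cell, or stay put at the wall.
        target = min(s1 + 1, 4) if action == "north" else s1
        return 1.0 if s2 == target else 0.0

    def observation_prob(obs, s2, action):
        # Perfect sensor: reads "wall" iff a wall is directly north (cell 4).
        return 1.0 if obs == ("wall" if s2 == 4 else "clear") else 0.0

    belief = {s: 1 / len(states) for s in states}  # blind start: uniform
    belief = update_belief(belief, "north", "clear", states,
                           transition_prob, observation_prob)
    # Now 1/3 each on cells 1, 2, 3: "clear" rules out being at the wall.
    belief = update_belief(belief, "north", "wall", states,
                           transition_prob, observation_prob)
    # Now all mass on cell 4: the belief has collapsed to one location.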

(This is a narrative description, of course; for the nitty-gritty details there are many more precise materials on POMDPs! :) )

By the way, this is still a _Markov_ process: the trajectory of _how_ we got to some probability distribution doesn't matter. If two different sequences of actions and observations produce the same belief, they are treated as equivalent, per the Markov assumption; the current belief is a sufficient summary of the whole history.
