Hacker News

Card games, e.g. Poker, aren't perfect information games, so history does matter, so they're not Markovian.



Sorry, I should have given a more concrete example, closer to what I'm interested in.

I was thinking of Hearts or Spades, which aren't perfect information games, but since the whole deck is dealt out, you can (with a lot of memory :-) know what cards are left.
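A quick sketch of that card-counting idea: since the whole deck is dealt, whatever isn't in your hand and hasn't been played must be held by the other players. The names here (`unseen_cards`, the card encoding) are illustrative, not from any particular library.

```python
# Sketch: tracking the cards still unseen in Hearts/Spades.
# Because the entire deck is dealt out, set difference recovers
# exactly what the other players must be holding.

from itertools import product

RANKS = "23456789TJQKA"
SUITS = "CDHS"
DECK = {r + s for r, s in product(RANKS, SUITS)}  # all 52 cards

def unseen_cards(my_hand, played):
    """Cards that must be distributed among the other players."""
    return DECK - set(my_hand) - set(played)

# e.g. partway through a game:
hand = {"AS", "KH", "2C"}
played = {"QS", "JD", "9H", "3C"}
remaining = unseen_cards(hand, played)
assert len(remaining) == 52 - len(hand) - len(played)
```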


Yeah, so here we can "solve" the problem without using a POMDP by really expanding what constitutes the state space.

Initially, your state is your hand plus the initial rules (who goes first, which suit is trump, etc., depending on what kind of game you're playing). After each player's turn you transition into a new state, which is <<starting hand>> + <<history of cards played>>, so at each point in time the state contains all the available information. Obviously this leads to a combinatorially huge state space, since there are on the order of 52! possible sequences for a standard deck of cards.
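The construction above can be sketched as a tiny data structure: a state is the pair (starting hand, history of cards played), and each turn transitions by appending the card just played. The names `State` and `play_card` are made up for illustration.

```python
# Minimal sketch of the expanded state space: state = hand + history.
# Each turn's transition simply appends the played card, so every
# state carries all information available so far.

from typing import NamedTuple, Tuple

class State(NamedTuple):
    hand: frozenset           # your starting hand (fixed for the episode)
    history: Tuple[str, ...]  # cards played so far, in order

def play_card(state: State, card: str) -> State:
    """Transition: the new state is the old one plus the card played."""
    return State(state.hand, state.history + (card,))

s0 = State(frozenset({"AS", "KH", "2C"}), ())
s1 = play_card(s0, "QS")
s2 = play_card(s1, "JD")
assert s2.history == ("QS", "JD")  # the full history is retained
```

Because the history component grows with every play, the number of distinct states blows up factorially, which is exactly why a lookup table over states is hopeless here.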

Typically you have to use some sort of approximation architecture, because evaluating all possible responses to a given sequence of cards would take too long, and even if you could do it, the memory requirements would be enormous. See for example DeepStack[0], which uses a neural net to approximate the quality of positions + possible responses (along with other techniques), rather than having a lookup table for what to do in each possible game state.
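To make the lookup-table-vs-approximation contrast concrete, here is a toy value approximator: encode a state as a feature vector and evaluate it with a small network. The architecture, feature encoding, and weights here are entirely made up for illustration; DeepStack's actual network is far more sophisticated.

```python
# Toy function approximation: one small set of weights evaluates any
# state, instead of a table with an entry per game state.

import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 52   # e.g. an indicator vector over which cards are unseen
N_HIDDEN = 16

# Randomly initialised weights stand in for a trained network.
W1 = rng.normal(size=(N_HIDDEN, N_FEATURES)) * 0.1
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(size=N_HIDDEN) * 0.1

def value(features):
    """Approximate state value: one hidden layer with a ReLU."""
    h = np.maximum(0.0, W1 @ features + b1)
    return float(W2 @ h)

# Any of the ~2^52 possible card subsets maps to a 52-dim vector and
# gets a value from the same small set of weights -- no table needed.
state_features = rng.integers(0, 2, size=N_FEATURES).astype(float)
v = value(state_features)
```

The memory cost is fixed by the network size (here, a few hundred floats), independent of how many states exist; training adjusts the weights so that similar states get similar values.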

So you can formulate such games as MDPs, but the question then becomes: is this a useful way to think about it? Sometimes it is, because your approach to the problem scales and so you can throw enough resources (training data, compute power) at it to learn a useful strategy; other times there is a better way of formulating the problem[1].

I have some implementations of reinforcement learning algorithms and code for working with MDPs if you're interested in that sort of thing[2], and of course I highly recommend Rich's book.

---

0. https://www.deepstack.ai/ -- note that it's not really an RL system, but it does illustrate using neural nets to learn complicated functions in a large state space.

1. For example, I am not sure that such an MDP for hearts is well-posed: the convergence results that I'm familiar with rely on the MDP being ergodic and aperiodic, but the suggested construction is obviously periodic.

2. The algorithms: https://github.com/rldotai/rl-algorithms An MDP library: https://github.com/rldotai/mdpy (check out notebooks/example_notebook.ipynb for a basic demo).





