Why are you using a Markov process, though, to model time-dependent likelihood pathways?

Doesn’t make sense. Your next step depends on much more than just knowing where you are at state S; one needs to account for the history of where you were before.

Or maybe you’re just using technical words with precise meanings to describe a vague, imprecise heuristic?
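
For concreteness, a minimal sketch of the distinction being drawn (states, probabilities, and function names are made up): under the Markov assumption the next step is drawn from a distribution that depends only on the current state, while a history-dependent process conditions on the whole trajectory so far.

    import random

    TRANSITIONS = {  # hypothetical Markov transition probabilities
        "A": {"B": 0.7, "C": 0.3},
        "B": {"A": 0.5, "C": 0.5},
        "C": {"A": 1.0},
    }

    def markov_next(state):
        """Next step depends only on where you are now."""
        nexts, probs = zip(*TRANSITIONS[state].items())
        return random.choices(nexts, weights=probs)[0]

    def history_dependent_next(trajectory):
        """Next step also depends on where you have been before."""
        state = trajectory[-1]
        if "C" in trajectory and state == "A":  # history shifts the distribution
            return "B"
        return markov_next(state)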




> time-dependent likelihood pathways

Future reward trajectories are *the* core focus of multi-step MDPs; see Sutton [1]:

"Now we consider transitions from state-action pair to state-action pair, and learn the value of state-action pairs. Formally these cases are identical: they are both Markov chains with a reward process. The theorems assuring the convergence of state values under TD(0) also apply to the corresponding algorithm for action values: "

I wasn't going to differentiate, in my original post, between sub-types of "cycles" within increasingly complex MDPs for long-sequence reward estimation:

[1] http://incompleteideas.net/book/ebook/node64.html
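
To make the quoted passage concrete, here is a hedged sketch of tabular TD(0) applied to state-action pairs (i.e. the Sarsa-style update it describes). The environment and policy interfaces are assumed placeholders, not Sutton's code:

    def sarsa_episode(env_step, policy, Q, alpha=0.1, gamma=0.99, start_state=0):
        """One episode of on-policy TD(0) over (state, action) pairs.

        env_step(s, a) -> (reward, next_state, done)  # assumed interface
        policy(Q, s)   -> action                       # assumed interface
        Q: dict mapping (state, action) to an estimated value
        """
        s = start_state
        a = policy(Q, s)
        done = False
        while not done:
            r, s_next, done = env_step(s, a)
            a_next = policy(Q, s_next)
            # TD(0) update on the state-action value:
            #   Q(s,a) <- Q(s,a) + alpha * [r + gamma * Q(s',a') - Q(s,a)]
            target = r + (0.0 if done else gamma * Q.get((s_next, a_next), 0.0))
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
            s, a = s_next, a_next
        return Q

The transition here is from (s, a) to (s', a'), which is exactly the "state-action pair to state-action pair" Markov chain the quote refers to.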


You’re just quoting Sutton’s reinforcement learning book, which develops its learning algorithms under a Markov process assumption.

Markov processes are appealing because they are simple objects, and that simplicity gives them clean properties and solid mathematical proofs.

Many mathematical models are studied because they have nice theoretical properties and one can prove theorems about them. That should not be mistaken for an actual mechanistic explanation of complex emergent phenomena like human decisions.


Your question is valid. I think the person is just using bombastic words for something already well-known and simpler. A Markov chain is just an FSM with a probabilistic transition function, and in the limit it becomes an ordinary deterministic FSM when every transition distribution puts probability 1 on a single successor.
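
A quick illustrative sketch of that point (states and numbers are made up): the transition function returns a distribution over next states, and when every row is a point mass the chain collapses to a deterministic FSM.

    import random

    probabilistic = {           # Markov chain: distributions over successors
        "idle":    {"idle": 0.6, "running": 0.4},
        "running": {"idle": 0.2, "running": 0.8},
    }

    deterministic = {           # limiting case: every row is a point mass
        "idle":    {"running": 1.0},
        "running": {"idle": 1.0},
    }

    def step(chain, state):
        nexts, probs = zip(*chain[state].items())
        return random.choices(nexts, weights=probs)[0]

    # step(probabilistic, "idle")  -> a random successor
    # step(deterministic, "idle")  -> always "running"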



