Why are you using a Markov process, though, to model time-dependent likelihood pathways?

Doesn’t make sense. Your next step depends on much more than just knowing where you are at state S; one needs to account for the history of where you were before.

Or maybe you’re just using technical words with precise meanings to describe a vague, imprecise heuristic?
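
For concreteness, a minimal sketch of the distinction being drawn (states, probabilities, and function names are made up): under the Markov assumption the next step is drawn from a distribution that depends only on the current state, while a history-dependent process conditions on the whole trajectory so far.

    import random

    TRANSITIONS = {  # hypothetical Markov transition probabilities
        "A": {"B": 0.7, "C": 0.3},
        "B": {"A": 0.5, "C": 0.5},
        "C": {"A": 1.0},
    }

    def markov_next(state):
        """Next step depends only on where you are now."""
        nexts, probs = zip(*TRANSITIONS[state].items())
        return random.choices(nexts, weights=probs)[0]

    def history_dependent_next(trajectory):
        """Next step also depends on where you have been before."""
        state = trajectory[-1]
        if "C" in trajectory and state == "A":  # history shifts the distribution
            return "B"
        return markov_next(state)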




> time-dependent likelihood pathways

Future reward trajectories are *the* core focus of multi-step MDPs; see Sutton [1]:

"Now we consider transitions from state-action pair to state-action pair, and learn the value of state-action pairs. Formally these cases are identical: they are both Markov chains with a reward process. The theorems assuring the convergence of state values under TD(0) also apply to the corresponding algorithm for action values: "

I wasn't going to differentiate, in my original post, between sub-types of "cycles" within increasingly complex MDPs for long-sequence reward estimation:

[1] http://incompleteideas.net/book/ebook/node64.html
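
To make the quoted passage concrete, here is a hedged sketch of tabular TD(0) applied to state-action pairs (i.e. the Sarsa-style update it describes). The environment and policy interfaces are assumed placeholders, not Sutton's code:

    def sarsa_episode(env_step, policy, Q, alpha=0.1, gamma=0.99, start_state=0):
        """One episode of on-policy TD(0) over (state, action) pairs.

        env_step(s, a) -> (reward, next_state, done)  # assumed interface
        policy(Q, s)   -> action                       # assumed interface
        Q: dict mapping (state, action) to an estimated value
        """
        s = start_state
        a = policy(Q, s)
        done = False
        while not done:
            r, s_next, done = env_step(s, a)
            a_next = policy(Q, s_next)
            # TD(0) update on the state-action value:
            #   Q(s,a) <- Q(s,a) + alpha * [r + gamma * Q(s',a') - Q(s,a)]
            target = r + (0.0 if done else gamma * Q.get((s_next, a_next), 0.0))
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
            s, a = s_next, a_next
        return Q

The transition here is from (s, a) to (s', a'), which is exactly the "state-action pair to state-action pair" Markov chain the quote refers to.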


You’re just quoting Sutton’s reinforcement learning book, which develops its learning algorithms under a Markov process assumption.

Markov processes are appealing because they are simple objects, and that simplicity gives them clean properties and solid mathematical proofs.

Many mathematical models are studied because they have nice theoretical properties and one can prove theorems about them. That should not be mistaken for an actual mechanistic explanation of complex emergent phenomena like human decisions.


Your question is valid. I think the person is just using bombastic words for something already well-known and simpler. A Markov chain is just an FSM with a probabilistic transition function, and in the limit it becomes an ordinary deterministic FSM when every transition distribution puts probability 1 on a single successor.
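
A quick illustrative sketch of that point (states and numbers are made up): the transition function returns a distribution over next states, and when every row is a point mass the chain collapses to a deterministic FSM.

    import random

    probabilistic = {           # Markov chain: distributions over successors
        "idle":    {"idle": 0.6, "running": 0.4},
        "running": {"idle": 0.2, "running": 0.8},
    }

    deterministic = {           # limiting case: every row is a point mass
        "idle":    {"running": 1.0},
        "running": {"idle": 1.0},
    }

    def step(chain, state):
        nexts, probs = zip(*chain[state].items())
        return random.choices(nexts, weights=probs)[0]

    # step(probabilistic, "idle")  -> a random successor
    # step(deterministic, "idle")  -> always "running"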



