As an amateur, my understanding is that a Markov chain is explicitly a crude frequency association: it just records how often one token follows another. What a neural network stores to predict the next token is different. It holds learned representations encoded in its weights, which can be far more nuanced.
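To make the "crude frequency association" part concrete, here is a minimal sketch of a first-order (bigram) Markov chain over words. The function names `train_bigram` and `next_token_probs` and the toy sentence are made up for illustration, and this is only the simplest version of the idea, not how any particular library does it.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token is followed by each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    """Turn the raw follow-up counts for `prev` into probabilities."""
    followers = counts.get(prev)
    if not followers:
        # Token never seen with a successor: the model has nothing to say.
        return {}
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}

# Toy corpus (hypothetical example data).
tokens = "the cat sat on the mat the cat ran".split()
model = train_bigram(tokens)
probs = next_token_probs(model, "the")
print(probs)
```

The whole "model" is that lookup table of counts. It has no notion that "cat" and "mat" are related beyond how often each one followed "the", which is the contrast with representations learned in a network's weights.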
Thank you, this is a good, compressed, and less salty response. I appreciate your contribution to the conversation and will use parts of it when I try to explain the matter in the future. <3 :thumbsup: