For anyone interested in Markov chains (aimed at language) in Python, and their relationship to the larger world of language modeling (including "modern" RNNLMs and so on), this post by Yoav Goldberg is an excellent introduction, with clear and simple code [1]. Extending the Markov chain's history even a few steps can give really impressive generated results.
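To make the idea concrete, here is a minimal sketch of an order-4 character-level model in the same spirit (this is not Goldberg's actual code, and "shakespeare.txt" is just a placeholder corpus):

    import random
    from collections import defaultdict, Counter

    def train(text, order=4):
        # Map each length-`order` history to counts of the next character.
        model = defaultdict(Counter)
        for i in range(len(text) - order):
            model[text[i:i + order]][text[i + order]] += 1
        return model

    def generate(model, order=4, length=300):
        out = random.choice(list(model))      # seed with a known history
        for _ in range(length):
            counts = model[out[-order:]]
            if not counts:                    # unseen history: dead end
                break
            chars, weights = zip(*counts.items())
            out += random.choices(chars, weights=weights)[0]
        return out

    model = train(open("shakespeare.txt").read())
    print(generate(model))

Raising `order` is the "extending a few steps" part: each extra step of history makes the output noticeably more coherent.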
Clone/download [2] if you want to run the code yourself.
It's in Italian, since it was just a hobby project and I didn't spend much time adapting it for a wider audience, but it may be interesting to someone.
I have a Twitter bot that makes up book passages combining books from the Gutenberg project: https://twitter.com/markovian_lit
Code is also on GitHub if it interests anyone - not the prettiest, but it was fun to build :-)
Along those lines, maybe also something like Fleksy or Swype, which are reportedly usable by blind people -- Fleksy in particular is very smart about determining the size and orientation of the invisible "keyboard" the user is typing on.
I don't see how a Markov chain has anything to do with PageRank. I understand that PageRank uses graphs, and a Markov chain is some kind of graph, but I don't see anything else.
PageRank assumes your browsing activity is a Markov chain on the graph defined by the hyperlinks of the web (assuming probability 1/n for each of the n outgoing links on a page). The rank of a page is the limit of (#visits to the page) / (#total visits), which is the eigenvector of the transition matrix for eigenvalue 1.
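As a toy illustration (a sketch with a made-up 3-page web; real PageRank also adds a damping factor for random jumps and dead-end pages):

    import numpy as np

    # Row-stochastic transition matrix: entry [i][j] is the probability of
    # following a link from page i to page j (1/n per outgoing link).
    P = np.array([[0.0, 0.5, 0.5],   # page 0 links to pages 1 and 2
                  [1.0, 0.0, 0.0],   # page 1 links to page 0
                  [0.5, 0.5, 0.0]])  # page 2 links to pages 0 and 1

    # Power iteration: the stationary distribution pi satisfies pi = pi P,
    # i.e. pi is a left eigenvector of P with eigenvalue 1.
    pi = np.ones(3) / 3
    for _ in range(100):
        pi = pi @ P
    print(pi)  # long-run fraction of visits to each page, i.e. the ranks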
Am I right in taking from this that Markov chains don't necessarily adjust to history?
So in the example, if a "sunny day" state has a 0.9 chance of staying in that state and a 0.1 chance of transitioning to a rainy day, and I had already had 5 sunny days, the 6th day would still have a 0.1 chance of being a rainy day?
Yes that's right. The reason they're "Markov" chains is that they make the "Markov assumption" which is usually stated as "the future is independent of the past given the present". This just means that the probability of the next timestep depends only on what happened in the current one; it has no memory as such.
As detaro says:
"A Markov chain with memory for the last m steps is sometimes called a "Markov chain of order m""
So if you wanted it to depend on the last 5 timesteps, you'd create a 5th order Markov chain, and so on.
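You can check this empirically with a quick simulation (a sketch; the example only gives the sunny-day probabilities, so the rainy-day row here is an assumption):

    import random

    P = {"sunny": {"sunny": 0.9, "rainy": 0.1},
         "rainy": {"sunny": 0.5, "rainy": 0.5}}  # rainy row is assumed

    def step(state):
        return "sunny" if random.random() < P[state]["sunny"] else "rainy"

    # Generate one long run of weather from the first-order chain.
    days = ["sunny"]
    for _ in range(1_000_000):
        days.append(step(days[-1]))

    # Condition on "the last 5 days were all sunny" and look at day 6.
    streaks = rainy = 0
    for i in range(5, len(days)):
        if all(d == "sunny" for d in days[i - 5:i]):
            streaks += 1
            rainy += days[i] == "rainy"
    print(rainy / streaks)  # ~0.1, the same as after a single sunny day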
Probability doesn't adjust to history either: if you flip coins and already have 5 heads in a row, your chance of heads on the next flip is still exactly one half.
But weather is not driven by probability, so history might matter.
I feel like I hear less and less about Markov chains.
Are they still used in research and practice?
Markov chains will always be important, because they are a fairly fundamental concept (going back to the early 20th century).
The idea is certainly still widely used: two major classes of applications are Hidden Markov Models (where you attempt to infer the hidden state of a system from observations that depend on that state) [1], and Markov chain Monte Carlo (MCMC) [2] (where you sample from a probability distribution by constructing a Markov chain whose equilibrium distribution is that distribution).
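For a flavor of the MCMC case, here is a minimal Metropolis sampler (a sketch using a standard normal as a toy target):

    import math, random

    def f(x):
        # Unnormalized target density: a standard normal.
        return math.exp(-x * x / 2)

    # Build a Markov chain whose equilibrium distribution is f: propose a
    # symmetric random step, accept with probability min(1, f(new)/f(old)).
    x, samples = 0.0, []
    for _ in range(100_000):
        proposal = x + random.uniform(-1, 1)
        if random.random() < f(proposal) / f(x):
            x = proposal
        samples.append(x)

    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    print(mean, var)  # approach 0 and 1, the target's mean and variance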
Is PageRank still the most prominent search algorithm?
PageRank is really a ranking algorithm, rather than a search algorithm. It's been applied to many things other than web search, and has many variants (e.g. see the review article "PageRank Beyond the Web" [3]). That said, Google rankings depend on many things other than PageRank.
I always felt these probabilistic models were a conflation of correlation and causation
> I always felt these probabilistic models were a conflation of correlation and causation
> What do you mean?
The example from the post of weather events is a simplified version of what I mean.
The collected data expresses that there is a 50/50 probability of rain versus shine.
The first string of possible outputs is scattered and more random than the real data, so weights are introduced to add nuance to the output, and the second string is more closely representative of the data.
But this introduces a relative pseudo-causation, what the author refers to as stickiness.
It's reminiscent of the Monty Hall problem:
altering the state of the model causes its probability to change relative to a specific potential, but only within the limited scope of the probable space.
Choosing one door causes the probabilities associated with the state to change, but the system itself is unchanged by the choice.
And what you see with the weather is a simulation of the collected data instead of the underlying system, which I feel conflates correlation with causation.
RNNs (recurrent neural networks) can be seen as generalizations of Markov chains - or, as I prefer to think of it, Markov chains are very limited RNNs. There is an enormous amount of research happening around these sequential models (and differentiable DAGs (directed acyclic graphs), a.k.a. neural networks in general) these days.
Markov chains have never (in the work I have seen at least) tried to imply causation - purely correlation/probability. There is other recent research on causality (see the work of Pearl), but just because one thing very likely follows another doesn't mean the preceding event "caused" the next one, as both events could be side effects from the same (unobserved) root cause.
But the difference between neural nets and Markov chains is that the probabilistic element of an NN is the optimal traversal, and it could be replaced with a perfect algorithm if one were found to be computationally feasible,
whereas Markov chains start from a place of probability and require it.
The construction stage of the Markov model agrees with your statement:
> just because one thing very likely follows another doesn't mean the preceding event "caused" the next one
But when in use, the current state is the cause of the following state, as informed by your model.
An NN can simulate a system, whereas an MC can only simulate data from a system.
RNNs are optimizing a probability, at least with standard training to maximize likelihood. As you say, with a different training criterion sure it could do something different - but so could a Markov chain if it was built to optimize a different objective (though it wouldn't be a Markov chain anymore I don't think).
Look at reinforcement learning - loads of stuff making the Markov assumption and doing quite well on complex tasks, though once again this is quite different from a Markov chain.
In theory (at least my opinion on theory), if we had an infinite-size "state" it would be possible to just encode all past information in every state and use Markov rules. It is just not practical, and it is much easier to extend the lookup to multiple timesteps in the past, especially with better understanding of how to optimize things like RNNs.
Choosing the maximum p(X_{t+1} | X_t) does not infer causality in any way in my mind - it only says that, given the data, when X_t happens, some event at X_{t+1} is likely to happen. Classic correlation. I don't see where you are seeing causality in that part. I also don't see a distinction between simulating a system and simulating data from a system - to me it is just that NNs can model much more complex systems than typical Markov chains.
In general I agree with you, but I find these two concepts are much, much more similar than they are different.
> but so could a Markov chain if it was built to optimize a different objective (though it wouldn't be a Markov chain anymore I don't think).
I wanted to think about this, and I thought maybe you could have a purely Markov model if it were a self-constructing, self-similar model:
a Markov fractal.
At each event the chain reconstructs itself from its new perspective in the fractal.
But even done elegantly, I think it would be more undue overhead to constantly reconstruct the model than to feed a self-similar model constantly reconstructed inputs, which I see as having a clearer affinity to neural networks, or to something wholly different from either.
Just to add a comment on PageRank: from what I've heard, it is actually a very small contributor to the overall Google algorithm. That is, at least in the way it was originally conceived by Larry Page, PageRank is now only a minuscule tuner amongst hundreds of other tuners in the overall search algorithm, but because it was the first and core part of the Google algorithm, people still continue to assume it is the predominant one.
It doesn't necessarily matter whether they're still in use. All of these tools are worth learning, so that if a problem ever comes up that can be solved with a Markov chain, you'll recognize it right away.
Same for all the other tools, though. And there are a lot of them.
This isn't quite related to your question, but I think Markov chains made their debut in the original information theory paper by Shannon
What? That paper even says: "Stochastic processes of the type described above are known mathematically as discrete Markoff processes and have been extensively studied in the literature"; the corresponding footnote says "For a detailed treatment see M. Frechet ... 1938".
They're widely used for error correcting codes and MCMC and are generally important for modeling any kind of discrete sequential process. I guess the theory is well understood though, so deep learning etc. gets more attention.
I think I understand the concept. Extrapolating the example of rainy and sunny days, how would you model this over a year, where there is seasonal variation? In winter, for instance, rainy days would be more likely to continue than sunny days. And what are the recommended methods for creating models from existing data?
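Not an answer from the thread, but one standard method is maximum-likelihood estimation: count the transitions in the data and normalize. Seasonal variation can be handled by fitting a separate transition table per season, i.e. a time-inhomogeneous chain. A minimal sketch with made-up data:

    from collections import defaultdict, Counter

    def fit(observations):
        # observations: consecutive (season, weather) pairs, one per day.
        counts = defaultdict(Counter)
        for (season, today), (_, tomorrow) in zip(observations, observations[1:]):
            counts[(season, today)][tomorrow] += 1
        # Normalize the counts into conditional probabilities.
        return {key: {w: n / sum(c.values()) for w, n in c.items()}
                for key, c in counts.items()}

    data = [("winter", "rainy"), ("winter", "rainy"), ("winter", "rainy"),
            ("winter", "sunny"), ("spring", "sunny"), ("spring", "sunny"),
            ("spring", "rainy"), ("spring", "sunny")]
    print(fit(data))  # e.g. P(tomorrow's weather | winter, rainy)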
[1] http://nbviewer.jupyter.org/urls/gist.githubusercontent.com/...
[2] https://gist.github.com/yoavg/d76121dfde2618422139