Markov Chains Explained Visually (2014) (setosa.io)
419 points by sonabinu on March 20, 2016 | 44 comments



For anyone interested in Markov chains for language in Python, and their relationship to the larger world of language modeling (including "modern" RNN language models and so on), this post by Yoav Goldberg is an excellent introduction, with clear and simple code [1]. Extending the Markov chain even a few steps can give really impressive generated results. Clone/download [2] if you want to run the code yourself.

[1] http://nbviewer.jupyter.org/urls/gist.githubusercontent.com/...

[2] https://gist.github.com/yoavg/d76121dfde2618422139
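
To give a flavor of the approach, here's my own minimal character-level sketch in Python (not Goldberg's code - the order, file name, and function names are made up for illustration):

    import random
    from collections import defaultdict

    def train(text, order=4):
        # Map each length-`order` history to the characters that follow it.
        model = defaultdict(list)
        for i in range(len(text) - order):
            model[text[i:i + order]].append(text[i + order])
        return model

    def generate(model, seed, length=200):
        # Sample one character at a time, conditioning only on the
        # last `order` characters.
        out, order = seed, len(seed)
        for _ in range(length):
            followers = model.get(out[-order:])
            if not followers:  # unseen history: stop early
                break
            out += random.choice(followers)
        return out

    corpus = open("corpus.txt").read()  # any plain-text file (hypothetical path)
    model = train(corpus, order=4)
    print(generate(model, seed=corpus[:4]))

Bump `order` up to 7 or so on a large corpus and the output starts to look surprisingly coherent.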


Woah, spent way too much time making this: http://setosa.io/markov/index.html#%7B%22tm%22%3A%5B%5B0%2C0...


Very nice lattice.


Last year I made a pseudo-random text generator based on Markov chains as a summer project to learn C++: https://github.com/nmaggioni/MarkovText

It's in Italian since it was just a hobby project and I didn't spend much time adapting it for a wider audience, but it may be interesting to someone.


I have a Twitter bot that makes up book passages by combining books from Project Gutenberg: https://twitter.com/markovian_lit Code is also on GitHub if it interests anyone - not the prettiest, but it was fun to build :-)


Here's a Ruby version in 16 lines of code: https://gist.github.com/michaelfeathers/2cf9e1599e06d0563f2e


For the case of Markov chains with memory, Dasher is a good learning tool (even though the word Markov is never used).

http://www.inference.phy.cam.ac.uk/dasher/

(Actually it seems I mentioned Dasher when this same link was posted in 2014, but it bears repeating.)


Interesting you mention Dasher. A month or so ago it occurred to me that some variation on Dasher might be the text input method of choice in VR.


Along those lines, maybe also something like Fleksy or Swype, which are reportedly usable by blind people -- Fleksy in particular is very smart about determining the size and orientation of the invisible "keyboard" the user is typing on.


I like this. Markov chains can be hard to explain without examples.


It's a nice demo but when I moved one of the probability sliders from 0.5 to 0.9 my browser essentially locked up for nearly two minutes. (FF 45.0.1).

Seems we're still a ways from achieving native browser animation performance that Flash (for all its other problems) had a decade or more ago.


I didn't have any lockups in Chrome.


Find the source code for this and other explanations at: https://github.com/vicapow/explained-visually/tree/master/cl...


I don't see how a Markov chain has anything to do with PageRank. I understand that PageRank uses graphs, and a Markov chain is a kind of graph, but I don't see the connection beyond that.


PageRank assumes your browsing activity is a Markov chain on the graph defined by hyperlinks on the web (assuming probability 1/n for each outgoing link on a page). The rank of a page is the limit of (#visits to the page) / (#total visits), which is the eigenvector of the transition matrix for eigenvalue 1.
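
To make that concrete, here's a minimal power-iteration sketch in Python (toy 3-page web with made-up links; real PageRank also adds a damping factor for random jumps, which I've omitted):

    import numpy as np

    # Toy 3-page web: A[i][j] = 1 if page i links to page j (made up).
    A = np.array([[0, 1, 1],
                  [1, 0, 0],
                  [0, 1, 0]], dtype=float)

    # Row-stochastic transition matrix: probability 1/n per outgoing link.
    P = A / A.sum(axis=1, keepdims=True)

    # Power iteration: push the visit distribution through the chain until
    # it settles at the stationary distribution (eigenvector for eigenvalue 1).
    rank = np.full(3, 1 / 3)
    for _ in range(100):
        rank = rank @ P

    print(rank)  # the PageRank-style score of each page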


This site is excellent for maths! I recommend it to everyone.

Thanks sonabinu!


Thanks Max! I think it's a great resource too!


Nice way to learn new things!


I think it would be much more interesting with a guide to hidden Markov models.

I'm surprised how much focus there is on regular Markov chains, without any hidden states, in places like Hacker News.


This has previously appeared on HN in July 2014 :)

https://news.ycombinator.com/item?id=8103240


But there wasn't enough hype around ML back then. A few Go wins later and it's getting a top spot.


Cheap shots like this degrade the community, so please don't post them here.

The story spent hours at #1 back then, and had three times as many votes as this resubmission (so far).


Am I right in taking from this that Markov chains don't necessarily adjust to history?

So in the example, if a "sunny day" state has a 0.9 chance of staying in that state and a 0.1 chance of transitioning to "rainy day", and I've already had 5 sunny days, the 6th day would still have a 0.1 chance of being rainy?


Yes, that's right. The reason they're called "Markov" chains is that they make the "Markov assumption", usually stated as "the future is independent of the past given the present". This just means that the probability of the next timestep depends only on what happened in the current one; the chain has no memory as such.

As detaro says:

"A Markov chain with memory for the last m steps is sometimes called a "Markov chain of order m""

So if you wanted it to depend on the last 5 timesteps, you'd create a 5th-order Markov chain, and so on.
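
A rough sketch of the difference in Python (all transition probabilities here are made up for illustration) - an order-m chain is just a first-order chain whose state is the last m observations:

    import random

    # First-order chain: tomorrow depends only on today.
    first_order = {"sunny": {"sunny": 0.9, "rainy": 0.1},
                   "rainy": {"rainy": 0.9, "sunny": 0.1}}

    # Order-2 chain: the "state" is the last two days, so (hypothetically)
    # a longer sunny streak can make rain more likely.
    second_order = {("sunny", "sunny"): {"sunny": 0.8, "rainy": 0.2},
                    ("rainy", "sunny"): {"sunny": 0.9, "rainy": 0.1},
                    ("sunny", "rainy"): {"rainy": 0.9, "sunny": 0.1},
                    ("rainy", "rainy"): {"rainy": 0.8, "sunny": 0.2}}

    def step(table, state):
        # Sample the next day given the current (possibly tuple-valued) state.
        outcomes, probs = zip(*table[state].items())
        return random.choices(outcomes, weights=probs)[0]

    print(step(first_order, "sunny"))              # ignores all history
    print(step(second_order, ("sunny", "sunny")))  # sees the last two days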


Probability doesn't adjust to history either: if you flip coins and already have 5 heads in a row, your chance of heads on the next flip is still exactly one half.

But weather is not driven by pure chance, so history might matter.


Yes, the classical Markov chain has no memory. A Markov chain with memory for the last m steps is sometimes called a "Markov chain of order m".


You could count sunny days by encoding the count as different states in a more complicated Markov chain:

sunny1 -> sunny2 -> sunny3 -> sunny4 -> sunny5

But as with regular expressions, you can't go to arbitrary depth.
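
A sketch of that encoding in Python, with made-up probabilities - each extra sunny day advances the counter state, and rain resets it:

    import random

    # Unrolled chain: the state itself counts the streak ("sunnyK" means
    # K consecutive sunny days so far); rain resets the counter.
    transitions = {
        "sunny1": {"sunny2": 0.9, "rainy": 0.1},
        "sunny2": {"sunny3": 0.8, "rainy": 0.2},
        "sunny3": {"sunny4": 0.7, "rainy": 0.3},
        "sunny4": {"sunny5": 0.6, "rainy": 0.4},
        "sunny5": {"sunny5": 0.5, "rainy": 0.5},  # counter saturates at 5
        "rainy":  {"sunny1": 0.5, "rainy": 0.5},
    }

    state, history = "rainy", []
    for _ in range(10):
        choices, weights = zip(*transitions[state].items())
        state = random.choices(choices, weights=weights)[0]
        history.append(state)
    print(history)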


When this first hit HN it had an enormous score of 1070, and the comments are all about the breakthrough importance of Markov chains.

For those who follow: is this still the case?

I feel like I hear less and less about Markov chains.

Are they still in research and use?

Is PageRank still the most prominent search algorithm? Does PageRank still use Markov models?

I always felt these probabilistic models were a conflation of correlation and causation.


> the breakthrough importance of Markov chains

> I feel like I hear less and less about Markov chains. Are they still in research and use?

Markov chains will always be important, because they are a fairly fundamental concept (going back to the early 20th century).

The idea is certainly still widely used: two major classes of applications are Hidden Markov Models (where you attempt to infer the hidden state of a system from observations that depend on that state) [1], and Markov-chain Monte Carlo (MCMC) [2] (where you sample from a probability distribution by constructing a Markov chain whose equilibrium distribution is the target distribution).
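
To give a feel for the MCMC side, here's a toy random-walk Metropolis sampler (my own sketch, not from either reference): the accept/reject rule makes the chain's equilibrium distribution the target density, here a standard normal:

    import math
    import random

    def metropolis(log_density, x0=0.0, steps=10_000, scale=1.0):
        # Random-walk Metropolis: a Markov chain whose equilibrium
        # distribution is the target density.
        x, samples = x0, []
        for _ in range(steps):
            proposal = x + random.gauss(0, scale)
            # Accept with probability min(1, p(proposal) / p(x)).
            delta = log_density(proposal) - log_density(x)
            if random.random() < math.exp(min(0.0, delta)):
                x = proposal
            samples.append(x)
        return samples

    # Target: standard normal; an unnormalized log-density is enough.
    samples = metropolis(lambda x: -0.5 * x * x)
    print(sum(samples) / len(samples))  # close to 0, the mean of the target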

> Is PageRank still the most prominent search algorithm?

PageRank is really a ranking algorithm rather than a search algorithm. It has been applied to many things other than web search, and has many variants (e.g. see the review article "PageRank Beyond the Web" [3]). That said, Google rankings depend on many things other than PageRank.

> I always felt these probabilistic models were a conflation of correlation and causation

What do you mean?

[1]: https://en.wikipedia.org/wiki/Hidden_Markov_model#Applicatio...

[2]: https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo

[3]: http://epubs.siam.org/doi/10.1137/140976649


> I always felt these probabilistic models were a conflation of correlation and causation

> What do you mean?

The example from the post of weather events is a simplified version of what I mean.

The collected data expresses that there is a 50/50 probability of rain versus shine.

The first string of possible outputs is scattered and more random than the real data, so weights are introduced to nuance the output, and the second string is more closely representative of the data.

But this introduces a relative pseudo-causation, what the author refers to as stickiness.

It's reminiscent of the Monty Hall problem (o).

Altering the state of the model causes its probability to change relative to a specific potential, but only within the limited scope of the probable space.

Choosing one door causes the probabilities associated with the state to change, but the system itself is unchanged by the choice.

And what you see with the weather is a simulation of the collected data instead of the underlying system, which I feel conflates correlation with causation.

(o) https://en.wikipedia.org/wiki/Monty_Hall_problem


RNNs (recurrent neural networks) can be seen as generalizations of Markov chains - or, as I prefer to think of it, Markov chains are very limited RNNs. There is an enormous amount of research happening around these sequential models (and differentiable DAGs (directed acyclic graphs), aka neural networks, in general) these days.
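
One way to make the "limited RNN" view concrete (just a sketch, not a claim about any particular architecture): with a one-hot hidden state, identity activation, no input, and the transition matrix as the recurrent weights, an RNN step reduces to a Markov chain step:

    import numpy as np

    # Transition matrix from the weather example: rows = today, cols = tomorrow.
    P = np.array([[0.9, 0.1],   # sunny -> (sunny, rainy)
                  [0.1, 0.9]])  # rainy -> (sunny, rainy)

    # "RNN" with identity activation, no input, recurrent weights P^T:
    # h_{t+1} = P^T @ h_t. Starting from a one-hot hidden state, h_t is
    # exactly the chain's distribution over states after t steps.
    h = np.array([1.0, 0.0])  # start: definitely sunny
    for _ in range(3):
        h = P.T @ h
    print(h)  # distribution over (sunny, rainy) after 3 days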

Markov chains have never (in the work I have seen at least) tried to imply causation - purely correlation/probability. There is other recent research on causality (see the work of Pearl), but just because one thing very likely follows another doesn't mean the preceding event "caused" the next one, as both events could be side effects from the same (unobserved) root cause.


But the difference between neural nets and Markov chains is that the probabilistic element of an NN is the optimal traversal, which could be replaced with a perfect algorithm if one were found to be computationally feasible,

whereas Markov chains start from a place of probability and require it.

The construction stage of the Markov model agrees with your statement:

> just because one thing very likely follows another doesn't mean the preceding event "caused" the next one

But when in use, the current state is the cause of the following state, as informed by your model.

An NN can simulate a system, whereas an MC can only simulate data from a system.


RNNs are optimizing a probability, at least with standard training to maximize likelihood. As you say, with a different training criterion it could do something different - but so could a Markov chain if it were built to optimize a different objective (though it wouldn't be a Markov chain anymore, I don't think).

Look at reinforcement learning - loads of stuff making the Markov assumption and doing quite well on complex tasks, though once again this is quite different from a Markov chain.

In theory (at least my opinion on theory) if we had an infinite-size "state" it would be possible to just encode all past information in every state and use Markov rules. It is just not practical, and much easier to extend the lookup to multiple timesteps in the past, especially with a better understanding of how to optimize things like RNNs.

Choosing the maximum p(X_t+1 | X_t) does not imply causality in any way in my mind - it only says that, given the data, when X_t happens some event X_t+1 is likely to follow. Classic correlation. I don't see where you are seeing causality in that part. I also don't see a distinction between simulating a system and simulating data from a system - to me it is just that NNs can model much more complex systems than typical Markov chains.

In general I agree with you, but I find these two concepts are much, much more similar than they are different.


> but so could a Markov chain if it was built to optimize a different objective (though it wouldn't be a Markov chain anymore I don't think).

I wanted to think about this, and I thought maybe you could have a purely Markov model if it were a self-constructing, self-similar model:

a Markov fractal.

At each event the chain reconstructs itself from its new perspective in the fractal.

But even done elegantly, I think it would be more undue overhead to constantly reconstruct the model than to feed a self-similar model constantly reconstructed inputs, which I see as having a clearer affinity to neural networks, or to something wholly different from either.


Just to add a comment on PageRank: from what I've heard, it is actually a very small contributor to the overall Google algorithm. That is - at least in the way it was originally conceived by Larry Page - PageRank is now only a minuscule tuner among hundreds of other tuners in the overall search algorithm, but because it was the first and core part of the Google algorithm, people still continue to assume it is the predominant one.


This isn't quite related to your question, but I think Markov chains made their debut in the original information theory paper by Shannon: http://worrydream.com/refs/Shannon%20-%20A%20Mathematical%20...

It doesn't necessarily matter whether they're still in use. All of these tools are worth learning, so that if a problem ever comes up that can be solved with a Markov chain, you'll recognize it right away.

Same for all the other tools, though. And there are a lot of them.


> This isn't quite related to your question, but I think Markov chains made their debut in the original information theory paper by Shannon

What? That paper even says: "Stochastic processes of the type described above are known mathematically as discrete Markoff processes and have been extensively studied in the literature"; the corresponding footnote says "For a detailed treatment see M. Frechet ... 1938".


Always happy to be wrong, as it means I've learned. Thanks.


They're widely used for error-correcting codes and MCMC, and are generally important for modeling any kind of discrete sequential process. I guess the theory is well understood, though, so deep learning etc. gets more attention.


Sorry to be off-topic, but the site partially disables my input, i.e. causes high lag. It makes it unusable. I'm using Firefox on a modern PC.

Opera seems to run fine, but it maxes out two cores. Uhhh...?


I think I understand the concept. Extrapolating from the example of rainy and sunny days, how would you model this over a year, where there is seasonal variation? In winter, for instance, rainy days would be more likely to continue than sunny days. And what are the recommended methods for creating models from existing data?


Those animations are seriously burning my CPU on chrome.


Yup, got my processor up to a toasty 94 degrees Celsius / 200 degrees Fahrenheit.


My computer has actually turned into a toaster.



