Model-Based Machine Learning (mbmlbook.com)
221 points by seycombi on Dec 4, 2016 | 30 comments



The "model" in the title is the model of the world, as a probabilistic model. The good thing about such a model is that it explicitly states your beliefs about the world. Once you've defined it, in theory reasoning about it is straightforward. (In practice a lot of papers get written about how to do approximate inference.) It's also straightforward to do unsupervised learning.

This is a different perspective from (most uses of) neural networks, which do not have this clear separation between the model and how to reason about it. It's funny that Chris Bishop in 1995 wrote the textbook "Neural Networks for Pattern Recognition" and now is effectively arguing against using neural networks.

You can use both by using neural networks as "factors" (the black squares in a factor graph) in probabilistic models.
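To make the "explicit beliefs" point concrete, here's a toy sketch (mine, not from the book) of the simplest possible probabilistic model, a coin with unknown bias: the prior and likelihood are the stated beliefs, and inference is just Bayes' rule, done here by brute force on a grid:

    # Toy sketch (not from the book): a probabilistic model of a coin with
    # unknown bias. The prior and likelihood are the explicit beliefs; the
    # posterior then follows mechanically from Bayes' rule (here, on a grid).
    import numpy as np

    def posterior_over_bias(heads, tails, grid_size=101):
        theta = np.linspace(0.0, 1.0, grid_size)        # candidate coin biases
        prior = np.ones_like(theta) / grid_size         # belief: any bias equally likely
        likelihood = theta**heads * (1 - theta)**tails  # belief: flips are i.i.d. Bernoulli(theta)
        unnormalized = prior * likelihood
        return theta, unnormalized / unnormalized.sum() # Bayes' rule

    theta, post = posterior_over_bias(heads=7, tails=3)
    print("posterior mean bias:", (theta * post).sum()) # ~0.67 under the uniform prior

Swap in a different prior or likelihood and the inference step is unchanged; that separation between the model (your beliefs) and the inference machinery is exactly the point, and it's what gets hard once exact inference stops being tractable.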


> It's funny that Chris Bishop in 1995 wrote the textbook "Neural Networks for Pattern Recognition" and now is effectively arguing against using neural networks.

I haven't read "Neural Networks for Pattern Recognition", but his "Pattern Recognition and Machine Learning"[1] is the standard text for ML work, including Bayesian approaches.

I don't think one should view this as "arguing against" neural networks - it's more that Bayesian approaches give you something different.

[1] http://www.springer.com/gp/book/9780387310732


One of the most popular ways of using techniques like this is the "Variational Autoencoder". I've been working on using some alternate distributions with them as of late - it's very interesting, and quite powerful.


How does this work? You use the VAE to model variables and then somehow get the distribution from them?

Got a link? (I know the basics of VAEs, but I'm missing how to link them to this)


The VAE encoder models an approximation to the posterior p(z|x), and the decoder models the likelihood p(x|z).

I like these slides: https://home.zhaw.ch/~dueo/bbs/files/vae.pdf
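If it helps, here's roughly what that looks like in code: a minimal PyTorch-style sketch (mine, not from the slides; dimensions are arbitrary) where the encoder outputs the parameters of q(z|x), the decoder parameterizes p(x|z), and training minimizes the negative ELBO via the reparameterization trick:

    # Minimal VAE sketch (illustrative only): encoder -> q(z|x), decoder -> p(x|z),
    # trained by maximizing the ELBO: E_q[log p(x|z)] - KL(q(z|x) || p(z)).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class VAE(nn.Module):
        def __init__(self, x_dim=784, z_dim=20, h_dim=400):
            super().__init__()
            self.enc = nn.Linear(x_dim, h_dim)
            self.enc_mu = nn.Linear(h_dim, z_dim)      # mean of q(z|x)
            self.enc_logvar = nn.Linear(h_dim, z_dim)  # log-variance of q(z|x)
            self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                     nn.Linear(h_dim, x_dim))

        def forward(self, x):
            h = F.relu(self.enc(x))
            mu, logvar = self.enc_mu(h), self.enc_logvar(h)
            z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
            return self.dec(z), mu, logvar

    def neg_elbo(x, x_logits, mu, logvar):
        recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction='sum')  # -E_q[log p(x|z)]
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())              # KL(q(z|x) || N(0, I))
        return recon + kl

The "alternate distributions" mentioned upthread would presumably replace the Gaussian q(z|x) and/or the standard normal prior here, which mostly means swapping out that KL term.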


I have to say the layout of this website looks great! Very accessible and clean. Was it made with a specific framework?


One of the CSS files [0] includes a copyright notice for Skeleton ("A dead simple, responsive boilerplate") [1].

[0]: http://mbmlbook.com/HtmlReader.styles.base.css

[1]: http://getskeleton.com/


Hmm, not very responsive for me (iPhone 6, Safari, iOS 10).


I've never heard supervised learning referred to as model-based learning.


My take from the introduction is that the book is going to be mostly about probabilistic graphical models (PGMs).

I look forward to reading this book when it's finished and hope they find success with this presentation of the core ideas. As a practitioner I see a fair amount of "I have a hammer; now I just need this problem to be a nail" type thinking with regard to using off-the-shelf techniques.

In the intro to this book the authors have an example with Kalman filters. A similar example is how Latent Dirichlet Allocation (LDA) is treated by different communities. In a certain chunk of the CS-dominated topic-modeling literature and in the data science blogosphere, LDA is treated as a received, atomic technique: a black-box tool for modeling documents. In the Stan manual, it is one fairly boring example of a mixture model, only worth talking about explicitly because so many people ask about it.
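To make that concrete, here is the generative story LDA encodes, written out as a toy numpy sketch (my own paraphrase, not the Stan manual's code, and the hyperparameters are made up): each document draws a topic mixture, and each word draws a topic and then a word from that topic. Seen this way it's just another mixture model, and fitting it is "only" a matter of inverting this story from observed documents.

    # LDA's generative story (illustrative sketch). Each document is a mixture
    # over topics; each topic is a distribution over the vocabulary.
    import numpy as np

    rng = np.random.default_rng(0)
    n_topics, vocab_size, n_docs, doc_len = 3, 50, 5, 40
    alpha, beta = 0.5, 0.1  # Dirichlet concentration parameters (made up)

    topics = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)  # topic-word distributions

    docs = []
    for _ in range(n_docs):
        theta = rng.dirichlet(np.full(n_topics, alpha))          # this document's topic mixture
        words = []
        for _ in range(doc_len):
            k = rng.choice(n_topics, p=theta)                    # pick a topic for this word
            words.append(rng.choice(vocab_size, p=topics[k]))    # pick a word from that topic
        docs.append(words)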


As rm999 points out, this book's framing is vastly more useful than limiting distinctions such as supervised/unsupervised learning (what happens in brains is learning while predicting, which is not well captured by that delineation, nor even fully by reinforcement learning).

This book will provide a set of skills that will age far better than if it had been tied to a specific machine learning framework or set of ideas. It's one of the best I've seen on reasoning probabilistically, Bayesian networks, graphical models, and probabilistic programming generally. It also teaches the core of the algorithms involved. These skills will be important going forward as we seek to implement ever more brain-like systems (and better). The knowledge will also carry over to Gaussian processes (which are really a subset) and the more future-proof generative deep learning ideas.

It also teaches how to reason about your problem and diagnose machine learning systems. Whether you're designing features, trying to figure out how to make a research paper work in real life, or are one of the rare people capable of coming up with deep learning architectures, what the book teaches will be indispensable to you.


The introduction clarifies what the authors mean. In this context "model" isn't about implementing a supervised model, it's about "modeling" your problem to build a bespoke algorithm that closely matches the problem. Unsupervised methods like clustering would probably fit in here too.

I haven't read much of this early access book yet, but I'd give the authors a lot of benefit of the doubt. Christopher Bishop wrote one of my favorite machine learning books (I read it after my graduate study in machine learning and it filled in a lot of the gaps): https://www.amazon.com/Pattern-Recognition-Learning-Informat...


From the Hacker News guidelines:

> Please don't insinuate that someone hasn't read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that."

It is possible to edit the comment to remove the phrase if you wish.


It was an honest question, not snark (passive aggressiveness is not my style).

The introduction is kind of hidden on the page, and clarifies the meaning of "model" in this context. Otherwise, GP is correct that "model" is often used to mean a supervised model, and that people generally call it "supervised learning", not "model-based learning".


I'm glad it was an honest question. Editing the comment is an option.

I think the guideline exists because, even as an honest question, it does not add anything to the comment: at best an answer changes nothing, and at worst it detracts from meaningful dialog.

One feature of this particular guideline is that it provides an alternative phrasing that is likely to avoid misinterpretation.


> I think the guideline exists because, even as an honest question, it does not add anything to the comment

I hope you see the irony here considering how much you're derailing this conversation (I'm only responding because I realize your intentions are good). And I'm pretty confident my comment added plenty of value to the discussion - I realize sometimes tone is lost in text, but after my clarification I don't see why you need to harp on this. Anyway, original comment edited.


If I had thought of suggesting editing your comment before posting my second comment, then it might have been different. And in a similar situation in the future I well might. That said, until I thought about it a bit more, it didn't occur to me. Anyway, for me, writing is thinking.


This isn't a book about supervised learning, from what I can tell. Based on my reading of the murder mystery and the skill assessment, it's about defining models based on your understanding of the underlying system and then fitting them to the data.

This is a lot closer to classical statistics than machine learning.


Rather than downvoting, I'm actually curious: why do you think unsupervised learning is not ML?

There'd be so much less noise in these comments/discussions if we just did away with vague and ill-defined labels such as ML or AI.


To me at least, the major distinction between "classical statistics" and "machine learning" is that machine learning strives to work independently of the underlying distribution, while classical statistics tries to model it.

I.e., a statistician doing linear regression assumes that reality is linear (or at least differentiable) in the region of interest. A convergence proof of linear regression will use this assumption.

A machine learning practitioner does NOT assume reality actually has a random forest out there in the world somewhere, and as a result needs to prove far more general (and less accurate) convergence results for the random forest.

From what I can tell, this book falls into the former category.


> assumes that reality is linear

The assumption is that a particular relationship is reasonable to model as if it were linear. No one believes reality is strictly linear.

I've read your posts enough to believe you know how linear regression works. I'm criticizing your comment because it encourages a misunderstanding of traditional statistics as having nonsensical assumptions.


Out of curiosity, was my caveat "(or at least differentiable) in the region of interest" insufficient for that purpose?

I certainly didn't mean to imply that statistics has unreasonable assumptions. Merely that it tends to have stronger assumptions - and more accurate results - than machine learning. Personally I'm a huge fan of classical statistics and think it's currently underappreciated.


The caveat doesn't work for a technical reason and a more important practical reason. Most relationships, even ones that aren't proper functions, can be transformed into a linear model. An absolute value function is non-differentiable for one value of the input, but it'd be perfectly fine to model with linear regression. More importantly, the audience I worry about isn't the type to pay attention to parenthetical notes using jargon. Linear is somewhat accessible jargon, but differentiable is less so. I'm not claiming that I write clearly, but I aim to write such that I don't need caveats.
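A toy illustration of that "transform it into a linear model" point (my example, not the parent's): fit y = |x| exactly with ordinary least squares by regressing on the basis features max(x, 0) and max(-x, 0), so the model stays linear in the parameters even though the relationship isn't differentiable at zero:

    # Toy example: |x| is non-differentiable at 0, yet ordinary least squares
    # recovers it exactly once we regress on max(x, 0) and max(-x, 0).
    import numpy as np

    x = np.linspace(-5, 5, 101)
    y = np.abs(x)

    X = np.column_stack([np.maximum(x, 0), np.maximum(-x, 0)])  # linear-in-parameters basis
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)                # ordinary least squares

    print(coef)                      # ~[1.0, 1.0]
    print(np.allclose(X @ coef, y))  # True: the fit is exact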


Yes, it really is underappreciated. As quoted in other comments, "Most businesses think they need advanced ML and really what they need is linear regression and cleaned up data." A significant portion of businesses currently investing millions in ML should basically hire a couple of statisticians and get over it.


To be fair, the fully loaded cost of a couple statisticians (ones who can code, or combined with an engineer assistant) might be half a million or more annually.


>A machine learning practitioner does NOT assume reality actually has a random forest out there in the world somewhere, and as a result needs to prove far more general (and less accurate) convergence results for the random forest.

Of course, most of the time nowadays, "throw a neural network or an SVM at it" doesn't really require strong convergence results... even though there are some nice analytical results for support-vector machines.


I think yummyfajita's point is that a traditional statistics approach begins with some understanding of the system being modeled, and that you create a model using that understanding. There is usually a strong focus on parsimony and explainability, while in ML/AI you don't really care what the underlying model is or how it comes to a particular conclusion; the focus is on accuracy at the expense of explainability.


I am afraid you misunderstood. OP's point isn't that it's unsupervised learning and hence not ML; his point is that it's not learning at all. IMHO.


Anybody know if Scala's Figaro software is in the same category as Church?


Yes, it's in the same category: both are probabilistic programming frameworks.



