
I'm only starting out with all this machine learning / NN stuff, and like many others I want to ask for some guidance/resources/learning material. What I feel is especially lacking is something very broad and generic, some overview of existing techniques (but not as naïve as Ng's ML course, I assume). There exist a lot of estimators and classifiers, a lot of techniques and tricks to train models, and a lot of details on how to design a NN architecture. So how, for instance, do I even decide that Random Forest is not enough for this task and that I want to build some specific kind of neural net? Or maybe I don't actually need any of these fancy famous techniques, but rather there exists some very well defined statistical method to do what I want?

What should I read to start grokking this kind of thing? I feel quite ready to go full "DIY math PhD" mode and consume some heavy reading if necessary, but where do I even start?




So how, for instance, do I even decide that Random Forest is not enough for this task and that I want to build some specific kind of neural net?

The problem here is that it's really hard to give generic advice. As an analogy this is like asking "how do I know if Rails is enough for this task".

The answer is usually "yes", but the specifics matter a lot.

So in this specific case (and I realize you aren't looking for specific advice here, but I think the principles are useful):

Random Forests are very powerful; they work really well with hundreds, maybe thousands, of features on large (but not huge) amounts of data, and they are fairly easy to train.
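For concreteness, here is a minimal scikit-learn sketch of that "fairly easy to train" point; the built-in dataset and the parameter values are just stand-ins for your own data and tuning:

    # Random forest baseline with almost no tuning (scikit-learn).
    # The bundled dataset is a stand-in; swap in your own X and y.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)

    clf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5)
    print("5-fold accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))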

There are a large number of types of neural networks. One of the big advantages of deep neural networks is that they can reduce the need for manual feature engineering. For example, convolutional neural networks extract features from images that work better than any human-engineered features, and LSTMs (and variations) work well at extracting features from text. The problem with deep neural networks is that they (generally) need a lot of data to train.
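To make the "learned features" idea concrete, here is a rough sketch (assuming a reasonably recent Keras) that uses a pretrained convolutional net as an image feature extractor; VGG16 and the file path are just illustrative choices:

    # Use a pretrained convolutional net as a feature extractor instead of
    # hand-engineering image features. VGG16 is just one convenient choice.
    import numpy as np
    from keras.applications.vgg16 import VGG16, preprocess_input
    from keras.preprocessing import image

    model = VGG16(weights="imagenet", include_top=False, pooling="avg")

    img = image.load_img("some_image.jpg", target_size=(224, 224))  # placeholder path
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

    features = model.predict(x)  # shape (1, 512): learned image features
    # These vectors can then go into a simple classifier (logistic regression, RF, ...).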

So, as usual the answer is "it depends".

In industry though, 90% of the time the question isn't "what classifier should I use". It's "how do I get the data"/"how do I extract features" and then "let's try all the classifiers and see what works best".
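For what it's worth, "try them all" often looks something like this in scikit-learn; the candidate list and the synthetic data below are arbitrary:

    # Same data, same CV protocol, several off-the-shelf classifiers.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=2000, n_features=40, random_state=0)

    candidates = {
        "logreg": make_pipeline(StandardScaler(), LogisticRegression()),
        "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
        "gbm": GradientBoostingClassifier(random_state=0),
        "mlp": make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)),
    }

    for name, model in candidates.items():
        scores = cross_val_score(model, X, y, cv=5)
        print("%-14s %.3f" % (name, scores.mean()))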


"Try 'em all" is not just an answer, but the only answer.

The No Free Lunch Theorem says that averaged across all possible problems, no single classifier is the best; in fact, they're all equivalent.

However, you probably don't care about all possible problems, but a specific one. Over the last decade or so, we've discovered that deep learning works really well on certain classes of problems, particularly those that may have some kind of nested structure, as in object or speech recognition. If your problem resembles one of those, a deep neural network might be a good place to start.


Right - this is good advice.

To paraphrase the learnings of thousands of data scientists over years of Kaggle competitions:

A quick and dirty model for a baseline: Random Forest

Structured data: Use a boosted tree algorithm (specifically the XGBoost implementation of gradient boosting), ensembled with maybe Extra Trees, Random Forests and MLPs (a minimal XGBoost sketch follows this list)

Some kind of time component on large datasets: FTRL (follow-the-regularized-leader) regression, XGB

Binary data (images or sound): Deep neural nets

Text: Try LSTMs, but this will often be beaten by manual feature engineering and Word2Vec-derived features fed into XGB.
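As mentioned above, a minimal sketch of the structured-data recipe using XGBoost's scikit-learn wrapper; the synthetic data and the parameter values are typical starting points rather than tuned choices:

    # Gradient boosting on structured/tabular data via XGBoost's sklearn API.
    import xgboost as xgb
    from sklearn.datasets import make_classification
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=5000, n_features=30, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

    model = xgb.XGBClassifier(
        n_estimators=500,
        learning_rate=0.05,
        max_depth=6,
        subsample=0.8,
        colsample_bytree=0.8,
    )
    model.fit(X_tr, y_tr)
    print("AUC: %.3f" % roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))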


LightGBM (https://github.com/Microsoft/LightGBM) is shaping up to beat XGBoost; it has near API parity, and it was already winning benchmarks before v2 shipped with a new algorithm.
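To illustrate the API-parity point, a rough sketch using LightGBM's scikit-learn-style wrapper, which is close to a drop-in for the XGBoost one; parameters here are illustrative only:

    # LightGBM's sklearn-style wrapper, nearly interchangeable with XGBClassifier.
    import lightgbm as lgb
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=5000, n_features=30, random_state=0)

    model = lgb.LGBMClassifier(
        n_estimators=500,
        learning_rate=0.05,
        num_leaves=31,  # LightGBM grows trees leaf-wise, so num_leaves is the main knob
        subsample=0.8,
        colsample_bytree=0.8,
    )
    print("5-fold accuracy: %.3f" % cross_val_score(model, X, y, cv=5).mean())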


I tried LightGBM for a Kaggle. I couldn't get anywhere near XGB.

I was using the LambdaRank stuff. Given the boasting the LightGBM team had done, I had assumed it would be close to XGB out of the box for a ranking problem (since XGB only does pairwise ranking). It was far enough away that I had to ask whether I was misinterpreting the output[1].

That was 6 months ago now, so maybe it has improved. I know they made big claims.

[1] https://github.com/Microsoft/LightGBM/issues/37


Development was rapid when I was using the tool for a blog post in January. Things have likely improved if you want to give it another shot.


Yeah, I might, thanks.

Did you manage to replicate their results vs XGB?

I don't think anyone has successfully used it for a high result in a Kaggle yet; Kaggle, for all its faults, is a good way to see what the maximum performance of a software package seems to be.

LibFFM is the other thing I should have mentioned previously as being worth trying.


Another less-recognised point is that in industry, you also need to ask "how can I maintain this?" and "what can go wrong with my algorithm?".

In one use case, a "blip" in your algorithm might mean showing the wrong kind of advertisement to a user. Not great, but ultimately no big deal. In another, it might mean automatically buying billions of dollars' worth of pumpkin futures (cf. Knight Capital).

In the latter case you need a much greater penalty on model complexity, and much more emphasis on interpretability.


While I agree with your point (and often use this in interview questions), that wasn't what caused the Knight Capital problem.

That was bad software engineering and deployment practices, and had nothing to do with interpretability of the model (actually it had little to do with the model at all). They repurposed a feature toggle, then misdeployed the code: http://pythonsweetness.tumblr.com/post/64740079543/how-to-lo...

I understand that this was an example, but I'm sure someone will misread it as what happened in that case.


Yep - meant it as an example of a general catastrophic software glitch rather than an ML algorithm gone haywire.


Generally for structured data (i.e. each column represents a distinct type of information, such as 'revenue' or 'color') you'll want random forest or GBM.

For unstructured data, where you'd otherwise need lots of complex feature engineering, you'll generally want to let the model learn those features - so use deep learning. E.g. images, natural language, audio...

I've won competitions with random forests and teach deep learning - both definitely have their place, but they are generally for quite different types of data. (This may change in the future, however, with deep learning showing that it has the potential to work well for structured data too.)

(Don't worry about the No Free Lunch theorem - it has little to do with predictive modeling in the real world. Recent research shows that a random forest will give amongst the best results for the vast majority of real world datasets.)


Which research are you referring to?


Nothing beats reading papers. Check this out for a very comprehensive list of the most influential deep learning papers: https://github.com/songrotek/Deep-Learning-Papers-Reading-Ro...


Thanks, I'll try that as well. But then again, this is specifically about deep learning. I'm asking more for a generic, systematic overview that would help me know that I'm using some specific technique for actual reasons, and not because "deep learning is cool". Something that would cover very basic, "manual" statistical approaches as well as an intro to NNs. I mean, I probably know that I need a CNN when I'm presented with a picture, and I might guess that I want an RNN when I'm presented with text I don't know how to parse, but when I want to predict something given a bunch of numbers and stuff, it's not at all obvious which approach is likely to be "the right one" and which is popular just "because fashion".


Even though you specifically say you are willing to go full-blown PhD and are interested in digging deep on algorithms etc., I strongly recommend working through the "Practical Deep Learning for Coders" course at fast.ai. It's free :)

It gives you an excellent feel for what is possible, and they are very focused on solving interesting and practical problems right away. They explicitly try to take the "requires a math PhD" out of deep learning. Once you're through with the course you have a very solid practical overview and understanding and can solve tons of real-world problems (it's almost a startup idea generator, tbh). Once you're at that stage, it becomes tons easier to dive deep into specific algorithms and optimizations.

tl;dr: Take the course (they also walk you through setting up an AWS GPU server, so no fancy hardware required) and you'll be able to solve real-world problems with state-of-the-art algorithms.


I'd definitely watch the first few episodes of Ng's stuff, up to and including logistic regression (unless you know all of that already, in which case: read papers and do practice projects for yourself, or compete on Kaggle if you don't have any application ideas).

The most common way to apply machine learning is supervised classification. The basic formula: we learn a model (a set of weights) that approximately maps data (a matrix X) to corresponding labels (a matrix Y). Wherever you can use logistic regression to learn a set of weights, you can use a Keras-based neural network.
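As a sketch of that equivalence (with random placeholder data, assuming Keras), a single sigmoid unit trained with binary cross-entropy is essentially logistic regression:

    # Logistic regression expressed as a one-layer Keras network.
    import numpy as np
    from keras.layers import Dense
    from keras.models import Sequential

    n_features = 20                              # placeholder dimensionality
    X = np.random.rand(1000, n_features)         # stand-in for your data matrix
    y = (X.sum(axis=1) > n_features / 2).astype("float32")  # stand-in labels

    model = Sequential([Dense(1, activation="sigmoid", input_shape=(n_features,))])
    model.compile(optimizer="sgd", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, epochs=20, batch_size=32, verbose=0)

    # Stacking more Dense layers before the final sigmoid turns this into an MLP.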

If all of that makes sense to you already, I think you're well prepared to read Keras' documentation.


It surely does make sense to me, but I seriously think (maybe hope, even?) that the "hacking-driven" approach here is significantly overvalued, for sociological reasons. After all, these are all mathematical problems, and while I'm aware that NNs are pretty much unexplored space, there surely must exist a significant amount of knowledge at the level below NNs that can actually be learned systematically. All these various statistical methods the R community is buzzing about that I'm not even aware of, some rationale about "why NN and not just a regression", etc. You know, the math.


If you just pick up a math book, you'll learn lots of stuff that you don't need to know. That's fine, but it strikes me as a good way to avoid actually doing anything and gaining practical experience.

If you hit a wall in practice because you don't understand the math, you'll usually have enough of an idea of the problem to ask more intelligent questions about what kind of math you need. That will, incidentally, help you understand the math better because you're coming to it out of an actual need rather than just seeing it mixed into a bunch of chapters.

Unless you're going to write a machine learning framework or be a researcher, the required math isn't too bad and it sounds like you might have enough of a background already. So don't be afraid to dive into something practical (like a kaggle competition).

FWIW this is a really good blog for insight into the math and intuition behind deep learning: http://colah.github.io/ (I'm not sure if it's quite what you're looking for, though)


The problem with looking for a theoretical reason why one method should be chosen over another is that you run into the "No Free Lunch theorem"[1]:

any two optimization algorithms are equivalent when their performance is averaged across all possible problems

Once you accept that, then you start looking at practical considerations.

Having said that, if you do want to do the math, then you might like the Oxford course from Nando de Freitas (now at DeepMind/Oxford)[2].

[1] https://en.wikipedia.org/wiki/No_free_lunch_theorem

[2] https://www.youtube.com/playlist?list=PLE6Wd9FR--EfW8dtjAuPo..., https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearni...



