I think this "machine learning for hackers" approach is just not enough. Oftentimes, you do need a solid theoretical/mathematical background. Most people seem to approach ML the way they approach programming tools or libraries: learn just enough to get the job done and move on.
I was studying machine learning from Andrew Ng's CS229 (the class videos are online; I think they date from 2008 or thereabouts). There is no way you can progress beyond lecture 2 (out of 20) without a solid probability background. A solid background in probability/statistics probably means a good first course in probability, or maybe the first five chapters of "Statistical Inference" by Casella and Berger. Similarly, for SVMs you need a solid background in linear algebra, and so on. You probably also need a background in linear optimization. Here are the recommendations by Prof. Michael Jordan: https://news.ycombinator.com/item?id=1055389
Not a lot of people want to dive in that deep. They have things to do, and who cares about proofs anyway? The thinking goes: "Most of the mathematics is abstracted away by libraries like scikit-learn. Let's get shit done." Well, I think a lot of Google/Facebook's competitive advantage in ML comes from having staffed their engineering teams with people who have studied these things for years (as PhDs). Compare that to Flipkart's recommendations.
However, I don't think this problem is unique to ML/data science. It's equally bad in distributed systems: "Let's use Docker, that's the future!"
I understand where you're coming from and agree in principle, but I'd change the claim that "this approach is just not enough" to "this approach is just not enough for achieving many things in machine learning, including breaking new ground". I think there's always a way to be creative within the constraints and concepts/axioms you take as given. For example, even though I have absolutely no control over (or knowledge of, for that matter) how to design or improve the microprocessor in my computer, I don't feel this limits my creativity in software development at all. Once changes and improvements happen at the hardware level, I'm sure they will find their way to the software development layer, and then I'll be handed even more degrees of freedom to be creative (not that I'm complaining about the freedom I currently have). Don't you think the same might apply to machine learning - i.e., those with a solid theoretical/mathematical background are analogous to the chip designers, and the "machine learning hackers" are the software developers?
I don't think Andrew Ng would agree with your assertion. His Coursera ML class assumes little more than a basic high school math education, and at the start of the course, he teaches the very small subset of linear algebra required to understand his course materials.
I think what Andrew Ng would say is that without a rigorous statistical background, you will be limited in your ability to use ML, and you will certainly be more liable to blow your foot off by using it improperly. That being said, in a subset of cases, you may be able to achieve non-trivial insights through the techniques he teaches in the course.
So how I would rephrase your assertion is that a hacker can probably get a lot more out of ML techniques if they are willing to learn the math underlying them.
> And at the start of the course, he teaches the very small subset of linear algebra required to understand his course materials.
I tried doing the ML course without any prior knowledge of linear algebra and dropped out after the first three weeks. In hindsight, I realized it wouldn't have been possible to appreciate how PCA works without understanding eigenvectors, how collaborative filtering is an elegant application of matrix factorization, and so on.
But after I completed Strang's Linear Algebra course, the entire ML class was a breeze.
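For anyone curious what that connection looks like in practice, here's a minimal numpy sketch (my own illustration, not from either course) of PCA as an eigendecomposition of the covariance matrix:

    import numpy as np

    # PCA "by hand": the principal components are the eigenvectors of the
    # data's covariance matrix, ordered by eigenvalue (explained variance).
    X = np.random.randn(200, 5)                  # toy data
    Xc = X - X.mean(axis=0)                      # center the data
    cov = Xc.T @ Xc / (len(Xc) - 1)              # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]            # largest eigenvalues first
    components = eigvecs[:, order[:2]]           # top-2 principal directions
    X_reduced = Xc @ components                  # project onto them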
Sure! The course has a slow start. The first few lectures focus on the mechanics of matrix operations instead of starting with linear transformations and compositions of linear transformations - the core ideas behind linear algebra. But after lecture 10, the course picks up pace.
I've been interviewing for ML positions, and what struck me was the general disdain for details. I had one manager claim that they're set to beat their competitors now because they're moving to Google's new TensorFlow. Others knew little more than TensorFlow and backprop.
Frankly, I regret spending time understanding all the math instead of working for some company, munging through their data, and applying some black-box stuff. The hype is quite bad, IMO.
Totally agree. I've got a solid applied-calculus background from my electrical engineering undergrad degree and some DSP from my first job, but I avoided learning statistics and probability because I thought they were intuitive. After doing a MOOC on machine learning, I realize that statistics and probability are more complex than I anticipated.
If you don't work on your fundamentals, you end up simply memorizing the algorithms and basic applications. Much like the neurons discussed in deep learning, we need to create rich relationships between our own neurons to retain and apply the knowledge in the long term, and that starts with laying a foundation of fundamentals.
For brushing up on those fundamentals, I recommend Bertsekas & Tsitsiklis, "Introduction to Probability": theory supported with lots of examples, along with comparisons of the theory to "intuition" and why applying the theory is much more effective.
> I think a lot of Google/Facebook's competitive advantage in ML comes from having staffed their engineering teams with people who have studied these things for years (as PhDs).
This is not entirely true. Most of their advantage comes from their corpus of data. Of course, I'm not discounting the fact that they're pioneers in the field, but at this stage data is their competitive advantage (hence they open-sourced TensorFlow).
I feel the present state of ML libraries (and even distributed-systems libraries) not being black-box, "just works" solutions is a growing pain, and they will evolve into something more accessible and robust in the future. The whole point of a "layer of abstraction" is that you don't need to know the details.
An alternative theory is that you can use ML for useful tasks with just high school math (basic algebra) and the basics of Python. I'm not sure that's actually true, but I'm inclined to be in this camp, as abstractions are used in virtually every other task. The amount of extra value to be had from applying even basic ML techniques is so great that there is probably a lot of upside to using ML even if companies are only able to hire practitioners.
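As a rough illustration of that camp's claim, here's a minimal scikit-learn sketch: a few lines of Python, with no math beyond algebra needed to run it (using sklearn's built-in iris dataset just to keep it self-contained):

    # Train a classifier and report held-out accuracy, entirely via sklearn's API.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print(clf.score(X_test, y_test))   # accuracy on held-out data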
A great resource specifically tailored to those who don't have an especially strong grasp of probability and statistics is Grokking Deep Learning.
This is backward thinking. It borders on elitist, although I know it's not meant that way.
Developers everywhere use Paxos without even knowing it, much less having read Lamport's papers, because they're building on top of solid tools that use Paxos (or Raft or what have you). This is more true at Google and Facebook than anywhere.
Same goes for ML. You can study the theory, and you can learn to apply it. In the field's nascency you basically need to understand the theory in order to apply anything, but eventually robust tools are built upon which developers can build systems without having "studied these things for years (by PhD)".
Let's put it this way. A startup that insists every one of its developers touching ML has a sound basis in fundamental theory is going to get left in the dust.
Anyone who wants to be the guy/gal who understands the fundamentals will be valuable. But we don't need everyone trying to be that person. And most wouldn't be successful, though they'd be successful as the guy/gal who does other stuff.
Every time there is a paradigm shift, there is always that voice: "If you don't understand the paint at the chemical-compound level, you can't make a beautiful painting." Wait, what?
Eh, let's revise that analogy. It's more like: not understanding the bricks means you can't make a good building. You can get by on good intuition, but it won't be spot-on perfect (as it would be by calculating all the physics), and you'll need more luck the higher you build.
Well, most states won't let me practice as an engineer, even if I get really good at SolidWorks from watching YouTube videos, because I have no background in the field.
> Oftentimes, you do need a solid theoretical/mathematical background. Most people seem to approach ML the way they approach programming tools or libraries: learn just enough to get the job done and move on.
I've been coming across this on the HN front page and it's worrisome to an extent.
If you are a hacker, you tend to be driven to learn techniques (ML, DL, etc.) to solve a problem at hand, rather than learning a technique and then hunting for a problem it can solve. For example, my motivation for learning ML and the associated statistical methods went through the roof when I was confronted with the problem of figuring out a better way to identify and predict which devices (from a huge set) would go bust, based largely on available indicators like power drain (a toy sketch of this below). I wouldn't have made the effort to read a bunch of papers and watch relevant videos if I didn't have a problem to solve. Maybe that happens if you've been a code monkey for 20+ years.
To do serious production-level ML, I agree that you need to understand the math. But the machine-learning-for-hackers approach is a great place to start.
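To make the device problem above concrete, here's a toy sketch; the indicators, the synthetic data, and the "went bust" labels are all invented for illustration:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    n = 1000
    X = np.column_stack([
        rng.normal(5.0, 1.0, n),    # power drain (hypothetical indicator)
        rng.normal(40.0, 5.0, n),   # temperature (hypothetical indicator)
    ])
    # Synthetic "went bust" label, loosely tied to power drain.
    y = (X[:, 0] + rng.normal(0, 1, n) > 6.0).astype(int)

    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    print(clf.predict_proba(X[:5]))  # per-device failure probabilities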
I think writing some algorithms and using them to solve problems provides great motivation for the math. In particular, the math will explain why certain approaches did and did not work. Without the hacking, that material can get a bit dry.
> Well, I think a lot of Google/Facebook's competitive advantage in ML comes from having staffed their engineering teams with people who have studied these things for years (as PhDs). Compare that to Flipkart's recommendations.
Not entirely true. Google & FB have orders of magnitude more data than Flipkart.
You can have the smartest ML people on the planet churning out the most clever, advanced ML algorithms & models, but without enough data, it's not going to be useful or effective.
I recently attended slash n[0], Flipkart's annual technical conference, and spoke to a bunch of their ML folks. They have master's & PhD degrees in ML from the IITs and IISc and are as smart as they come.
Sure, Flipkart doesn't have marquee names like Yann LeCun or Andrew Ng, but I wouldn't doubt the ML talent Flipkart has.
ML is easy to get set up but often difficult to debug if you don't really understand the details. Within the last week, I pair-reviewed a recommender system written in MLlib (a la this post: http://spark.apache.org/docs/latest/mllib-collaborative-filt...) that was doing strange things despite performing well on a test set. It turned out the metric used on that page was not a good one for our purposes, and the algorithm had zoomed in on a degenerate solution that nailed the test score. This was clear to me after about two minutes of looking at the auxiliary matrices generated. The less experienced person I was helping did not know how to proceed.
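For what it's worth, here's a rough sketch of the kind of sanity check I mean, assuming Spark's DataFrame-based ALS API and hypothetical column names (this is illustrative, not the actual code from that review). A degenerate fit often shows up as near-constant or near-zero latent factors, so summary statistics of the factor matrices are a cheap first look:

    from pyspark.ml.recommendation import ALS
    from pyspark.sql import functions as F

    # "ratings" is an assumed DataFrame; the column names are hypothetical.
    als = ALS(rank=10, maxIter=10, regParam=0.1,
              userCol="userId", itemCol="itemId", ratingCol="rating")
    model = als.fit(ratings)

    # model.itemFactors has columns "id" and "features" (array of floats).
    # Eyeball the distribution of factor values for signs of collapse.
    (model.itemFactors
        .select(F.explode("features").alias("f"))
        .agg(F.mean("f"), F.stddev("f"), F.min("f"), F.max("f"))
        .show())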
Thanks for sharing. Here's a set of deep learning resources I've found useful for building a good theoretical background as well as starting to apply techniques to real-world problems:
1. Intro to deep learning, with a bit of theory and intuition building while applying it to a toy problem:
The free resources available nowadays for learning machine learning/deep learning are plentiful and easy to comprehend (indeed, Andrew Ng's Coursera class is very good). And running ML code is even easier, with libraries like TensorFlow/Theano to abstract the ML gruntwork (and Keras to abstract the abstraction!).
I suspect there may be a machine learning knowledge crash, where the basics are repeated endlessly but there is little unique, real-world application of the knowledge learned. I've seen many internet testimonials saying "I followed an online tutorial and now I can classify handwritten digits, AI is the future!" The meme that Kaggle competitions are a metric of practical ML skill encourages budding ML enthusiasts to minimize log-loss (see the sketch after this comment) or maximize accuracy without considering time/cost tradeoffs, which doesn't reflect real-world constraints.
Unfortunately, many successful real-world applications of ML/DL are not the ones taught in tutorials, as they are trade secrets (this is also the case with "big data" literature, to my frustration). OpenAI is a good step toward transparency in the field, but that won't stop the ML-trivializing "this program can play Pong, AI is the future!" think pieces (https://news.ycombinator.com/item?id=13256962).
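A side note on the log-loss point: the metric is simple enough to compute by hand, and doing so shows why a model can ace it on a test set while still being miscalibrated elsewhere. A quick numpy sketch of binary log-loss (my own illustration):

    import numpy as np

    def log_loss(y_true, p_pred, eps=1e-15):
        # Standard binary cross-entropy; clip to avoid log(0).
        p = np.clip(p_pred, eps, 1 - eps)
        return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

    y = np.array([1.0, 1.0, 0.0])
    print(log_loss(y, np.array([0.9, 0.8, 0.2])))   # well calibrated: small loss
    print(log_loss(y, np.array([0.9, 0.8, 0.99])))  # one confident mistake: loss blows up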
Nailed it. Neural net execution speed is so critical for many production systems, and it's very difficult to hit the sweet spot on tradeoffs, but I never hear about these issues in the wild.
> Nailed it. Neural net execution speed is so critical for many production systems, and it's very difficult to hit the sweet spot on tradeoffs, but I never hear about these issues in the wild.
That's probably because you're not listening. There's plenty of literature on scaling down neural nets for smaller devices, because everyone knows it's an issue and anyone can trivially get hold of a smaller device; see techniques such as quantization, distillation, and pruning, or architectures specifically designed for the task, such as YOLO.
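For instance, the core idea behind weight quantization fits in a few lines. Here's a minimal numpy sketch of symmetric int8 quantization (a simplified illustration, not any particular library's implementation):

    import numpy as np

    def quantize_int8(w):
        # Symmetric linear quantization: map the largest |weight| to +/-127.
        scale = np.abs(w).max() / 127.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(256, 256).astype(np.float32)  # toy weight matrix
    q, scale = quantize_int8(w)                       # 4x smaller than float32
    print("max abs error:", np.abs(w - dequantize(q, scale)).max())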
Distributed systems and ML are probably the two most interesting things I have on the radar, and they have me scared to the point where I don't know where to start, or, most importantly, for what. Most of my free time (time spent on personal projects) went into writing physics simulations in Java, playing with Lisp, and doing some backend development. Nothing amazing. A year and a half ago I got really interested in operating systems (tried FreeBSD and it blew my mind) and played with Docker. And at the end of this year, I'm asking myself:
"Ok Philip what shall I focus on for year to come?" And the thing is If I choose to go Ai route, I do not know from where to start (I consider my math background to be pretty good, I was studying EE before I dropped out after 2 years, and enrolled to CS, done all of the math courses which were pretty rough), Ai/ML looks interesting but it looks so high level to program and so abstract to understand. It's really looking like arcane magic to me. With Dist. Systems is that I have a feeling that is more "engineering" and "industrial" thing, where you can't do much by yourself at home, besides reading and writing some code in relevant languages about backend, sometimes lower level, and learning about systems and computer innards. And the third option was to go and play with Erlang/Elixir, which is most attractive since results will come pretty soon, and may be relevant form my interest in Distributed systems.
> If I choose to go the AI route, I don't know where to start ...
This type of comment is often made in machine learning (ML) related submissions.
The prerequisite list is long: calculus, linear algebra, stats, probability, numerical methods (for optimization, linear algebra, maybe interpolation), etc. BUT you don't really need to go through the entirety of each subject for ML. For example, in calculus you probably only need the aspects necessary for optimization (see the sketch after this comment), rather than integration techniques, convergence of sequences, etc. The trouble is that it's difficult to know which subtopics of each subject are worth spending time on unless you already know machine learning (or have the luxury of someone with experience guiding you).
That difficulty is compounded by the fact that there seem to be many more resources (at least among popular submissions on the web) for learning neural nets, or some specific framework for implementing them, than for learning the mathematical and statistical foundations of ML. This is fine -- neural nets are a popular and powerful model, and people like to work on something tangible to get acquainted with a topic.
I wonder if people might enjoy a well-written textbook covering the basic math for ML -- something like "All the Math You Missed (But Need to Know for Machine Learning)" [1]. I might enjoy working on such an ebook if there were a desire for one, but my time is pretty limited (like most people's).
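As an example of the "only the optimization slice of calculus" point above, here's a tiny numpy sketch: minimizing a simple function by following the negative gradient, which is essentially the calculus most ML training loops lean on (the function and step size are arbitrary illustrations):

    import numpy as np

    # Minimize f(x, y) = (x - 3)^2 + 2*(y + 1)^2 by gradient descent.
    def grad(v):
        x, y = v
        return np.array([2 * (x - 3), 4 * (y + 1)])  # partial derivatives of f

    v = np.zeros(2)
    for _ in range(200):
        v -= 0.1 * grad(v)   # step against the gradient
    print(v)                 # converges toward the minimizer [3, -1]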
Thanks, that was the answer I was looking for; you said it much better than I did! When I look at AI/ML, I see a lot of mathematics, not frameworks and programming languages. Anyone can learn to use a specific framework or adapt to a certain programming language and environment. What concerned me was the mathematics, since in my EE courses the math was much more application-oriented, with integration techniques and geometry, and not so much statistics and probability.
A counterpoint: deep learning is currently hyped, making you overlook other techniques that might work better, or that are simpler and work just as well. Deep learning might have a limited scope and turn out to be a dead end for areas other than the ones already examined.
> making you overlook other techniques that might work better
Currently, DL is the most powerful technique for many problem types. I think for a beginner, learning it is a safe bet; the "next thing" is likely to be an elaboration on DL. It's good if some people ignore the hype - they may come up with the next paradigm, going in a completely different direction from DL. But the people doing that won't be beginners; they'll be people with levels of experience similar to Yann LeCun's or Geoffrey Hinton's. Those well-rounded people with deep theoretical knowledge will make the big breakthroughs. Beginners should start with what we know works well now, and expand out.
> [Other techniques may be] simpler and work just as well
This is a much better reason to learn things other than DL: efficiency and lower complexity, if you know the problem domain is amenable to the technique.
I have almost the opposite problem. I spent years learning a lot of ML and worked at a job doing this kind of work for a couple of years or so. I think the issue was that the data we had at the organization, and the internal politics, made it difficult to use ML in a way that mattered to the business. I grew frustrated at having spent a lot of time learning exciting things, only to realize it didn't really matter when some manager could just say, "We're doing it this other way that makes sense to me" (based not on data, but on gut feelings).
I'm not sure what to do with that. ML probably works best in organizations and situations that are on board with using ML to make business decisions. Here's the other thing: finding a business where ML is core to its decision making that will hire a person with no formal ML-related education may be difficult. Perhaps I'm wrong about that and have just given up on ML after my frustrating experience.
Now I'm building data systems that the business uses on a daily basis to get things done. I feel a lot better doing that than ML stuff, even though I loved playing with data and ML. I guess I've given up on ML for now; maybe I'll find my way back to it again.
I agree. Most of what I've seen suggests that everyone _thinks_ they need and want machine learning specialists. But mostly they need people who are flexible in how they combine business acumen with statistics, a little ML, analysis, and programming. Business owners usually insist they need ML soooo much, but they're rarely willing to go all the way and actually deploy it. Plus, often they don't really need it... or even modeling of any sort. They may need automated dashboards (for keeping an eye on important KPIs), decision-support platforms, etc. - lots of things which require something more than analysts, statisticians, or programmers alone. In essence, they need highly capable jacks-of-all-trades who can, on occasion, bring to bear advanced algorithms and stats. But that's a lot rarer than you'd be led to believe by job postings and all the news articles.
It is about data and algorithms, and if calling them "machines" and "learning", or "cognition", or "intuition", or "thinking", or whatever makes you happy, then nobody can stop you from doing so.
I, too, recently started with ML/DL, but my approach is more theoretical. I started with Andrew Ng's course while also working through the Python Machine Learning textbook and testing myself on Kaggle. I hope to build some interesting systems soon. The only thing I'm worried about is getting a full-time job, which I think always requires someone with 2+ years of experience.
Admirable intentions by the author, but I hope they change their font/formatting style.
The current font with dense paragraphs makes it hard for me to read without getting a headache; sparser sentences (either via bulleted lists or illustrative images) are much easier for me to parse.
Unfortunately, the author begins by citing the fraud Taleb. After that, I have to doubly examine everything he writes for signs of subtle nonsense, and it's just necessary to close the tab.