Contrary to the other advice around here, I would strongly advise NOT starting with a course. Taking one is a good idea at some point, but it is not the first thing you should be doing.

The very first thing you should do is play! Identify a dataset you are interested in and get the entire machine learning pipeline up and running for it. Here's how I would go about it.

1) Get Jupyter up and running. You don't really need to do much to set it up. Just grab a Docker image.

2) Choose a dataset.

I wouldn't collect my own data first thing. I would just choose something that's already out there. You don't want to be bogged down by having to wrangle data into the format you need while learning NumPy and Pandas at the same time. You can find some interesting datasets here:

http://scikit-learn.org/stable/datasets/

And don't go with a neural net first thing, even though it is currently in vogue. It requires a lot of tuning before it actually works. Go with a gradient-boosted tree. It works well enough out of the box.
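To give a rough idea of how little code steps 1 and 2 take, here is a minimal sketch. The choice of the breast cancer dataset, the 75/25 split, and the default hyperparameters are my assumptions for illustration, not a recommendation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# One of the small datasets bundled with scikit-learn (an arbitrary pick).
# as_frame=True returns pandas objects, which are easier to poke at in Jupyter.
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target
print(X.shape)

# Hold out a quarter of the rows so the score is measured on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# A gradient-boosted tree with default settings, no tuning at all.
model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

The point is only that an untuned gradient-boosted tree gives you a usable baseline right away, so you can spend your time on the rest of the pipeline.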

3) Write a classifier for it. Set up the entire supervised machine learning pipeline. Become familiar with feature extraction, feature importance, feature selection, dimensionality reduction, model selection, hyperparameter tuning using grid search, cross-validation, and so on.

For this step, let scikit-learn be your guide. It has terrific tutorials, and the documentation is a better educational resource than beginning coursework.

http://scikit-learn.org/stable/tutorial/
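Here is one way the pieces of step 3 can fit together. Treat it as a sketch under my own assumptions: the dataset, the particular pipeline stages, and the tiny parameter grid are all chosen for illustration and to keep it quick to run.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, stratify=data.target, random_state=0
)

# Chain preprocessing, feature selection, and the model into one estimator,
# so that cross-validation refits every stage on each training fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("model", GradientBoostingClassifier(random_state=0)),
])

# Grid search: "<step name>__<parameter>" addresses a parameter of a stage.
param_grid = {
    "select__k": [10, 20],
    "model__n_estimators": [50, 100],
}
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
print("held-out accuracy:", round(search.score(X_test, y_test), 3))

# Feature importance, mapped back to the names of the selected columns.
best = search.best_estimator_
selected = data.feature_names[best.named_steps["select"].get_support()]
importances = best.named_steps["model"].feature_importances_
for i in np.argsort(importances)[::-1][:5]:
    print(f"{selected[i]}: {importances[i]:.3f}")
```

Keeping everything inside the Pipeline is the design choice that matters here: the scaler and the feature selector never see the validation fold, so the cross-validated score is not inflated by leakage.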

4) Now you've built out the supervised machine learning pipeline all the way through! At this point, you should just play:

4a) Experiment with different models: Bayes' nets, random forests, ensembling, hidden Markov models, and even unsupervised learning models such as Gaussian mixture models and clustering. The scikit-learn documentation is your guide.
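Swapping models is mostly a matter of changing one line, which is what makes this kind of play cheap. A small sketch, assuming the iris dataset and using naive Bayes as a simple stand-in from the Bayesian family (it is not a Bayes' net). As far as I know, hidden Markov models are not part of scikit-learn itself and live in the separate hmmlearn package.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Supervised models share the same fit/predict interface,
# so comparing them is just a loop.
candidates = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "naive Bayes": GaussianNB(),
}
scores = {}
for name, model in candidates.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")

# Unsupervised: fit a Gaussian mixture without ever showing it the labels,
# then check how well the clusters line up with the true classes.
gmm = GaussianMixture(n_components=3, random_state=0)
clusters = gmm.fit_predict(X)
ari = adjusted_rand_score(y, clusters)
print(f"GMM adjusted Rand index vs. true labels: {ari:.3f}")
```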

4b) Let your emerging skills loose on several datasets. Experiment with audio and image data so you can learn about a variety of different features, such as spectrograms and MFCCs. Collect your own data!
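For audio, a spectrogram is a few lines with SciPy. This sketch uses a synthetic 440 Hz tone instead of a real recording, and the sample rate and window length are arbitrary assumptions. MFCCs need an extra library, so I have left them out.

```python
import numpy as np
from scipy import signal

# One second of a synthetic 440 Hz sine wave (a stand-in for real audio).
sample_rate = 8000
t = np.arange(0, 1.0, 1 / sample_rate)
tone = np.sin(2 * np.pi * 440 * t)

# Short-time power spectrum: rows are frequency bins, columns are time frames.
freqs, times, power = signal.spectrogram(tone, fs=sample_rate, nperseg=256)
print(power.shape)

# Log-scaled power is the usual form to feed to a classifier;
# the small constant avoids taking the log of zero.
features = 10 * np.log10(power + 1e-12)

# Sanity check: the strongest bin should sit close to 440 Hz.
peak_freq = freqs[power.mean(axis=1).argmax()]
print(f"peak frequency: {peak_freq:.1f} Hz")
```

With 256-sample windows at 8 kHz the bins are about 31 Hz apart, so the peak lands on the bin nearest to 440 Hz rather than on 440 Hz exactly.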

4c) Along the way, become familiar with the SciPy stack, in particular, NumPy, Pandas, SciPy itself, and Matplotlib.

5) Once you've gained a bit of confidence, look into convolutional and recurrent neural nets. Don't reach for TensorFlow directly. Use Keras instead. It is an abstraction layer that makes things a bit easier, and you can actually swap out TensorFlow for Theano as the backend.
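To show what the abstraction buys you, here is a minimal convolutional net in Keras. It assumes the `keras` package is installed with a working backend (which backends are available depends on your Keras version), and it trains on random placeholder data, so the numbers it produces mean nothing. The layer sizes are arbitrary.

```python
import numpy as np
import keras
from keras import layers

# A tiny convolutional net for 28x28 grayscale images and 10 classes.
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Random placeholder data, only to show the fit/predict workflow.
rng = np.random.default_rng(0)
images = rng.random((32, 28, 28, 1)).astype("float32")
labels = rng.integers(0, 10, size=32)

model.fit(images, labels, epochs=1, batch_size=8, verbose=0)
probabilities = model.predict(images[:4], verbose=0)
print(probabilities.shape)
```

Notice that nothing in the model definition mentions the backend. That is the whole appeal: the same few lines describe the network regardless of what runs underneath.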

6) Once you feel that you're ready to learn more of the theory, then go ahead and take coursework, such as Andrew Ng's course on Coursera. Once you've gone through that course, you can go through the course as it has actually been offered at Stanford here (it's more rigorous and more difficult):

https://see.stanford.edu/Course/CS229

I will also throw in an endorsement for Cal's introductory AI course, which I think is of exceptionally high quality. A great deal of care was put into preparing it.

http://ai.berkeley.edu/home.html

There are other good resources that are more applied, such as:

http://machinelearningmastery.com/

I hope this helps. What I am trying to impart is that you will understand and retain coursework material better if you've already got experience, or better yet, projects in progress that are related to your coursework. You don't need to undergo the extensive preparation that is being proposed elsewhere before you can start PLAYING.


