As someone who uses R for just about all of my ML/data analysis needs, I'm surprised not to see Theano[0] mentioned. SciPy, scikit-learn, pandas etc. are great and all, but they don't offer much that's really different from what you get with R (except of course having it all in a general purpose language). But Theano (plus its related deep learning tools) really stands out for me as something the R tool chain can't compete with.
I feel like eventually I should become as fluent with SciPy/Scikit-learn/pandas as I am with R, but learning Theano well is much higher on my list.
If you're interested in Theano primarily because of deep learning, I highly recommend you check out pylearn2 (there's not much documentation, but the docs are here: http://deeplearning.net/software/pylearn2/ and the source is on GitHub here: github.com/lisa-lab/pylearn2 ).
Pylearn2 is a set of deep learning algorithms implemented with Theano. The LISA (deep learning) group at the University of Montreal (the same group that created Theano) maintains this library and puts a lot of the code they use for their papers in pylearn2. Pylearn2 thus makes it quite easy to use a lot of state-of-the-art algorithms, such as maxout.
Thanks! This is exactly the sort of thing I've been looking for. I really want to experiment with some deep learning techniques for some problems I have, but the startup cost of "oh yeah, you have to build all the tools yourself right now" keeps putting me off.
> except of course having it all in a general purpose language
But that's the point though. A lot of data analysis is just munging numbers / getting things in shape so you can actually do the analysis, and so being able to do that in a general purpose language is a breath of fresh air.
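To make the munging point concrete, here's a rough sketch using only the Python standard library. The sample data and the helper names (`clean_rows`, `mean_by_name`) are made up for illustration; a real pipeline would more likely lean on pandas, but the idea is the same: the cleanup and the analysis live in one ordinary program.

```python
import csv
import io
import statistics

# Hypothetical raw export: stray whitespace, mixed case, and a missing value.
RAW = """name, score
alice, 3.5
BOB , 4.0
carol,
alice, 4.5
"""


def clean_rows(text):
    """Yield (name, score) pairs, skipping rows that have no usable score."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        name = row["name"].strip().lower()
        raw_score = (row["score"] or "").strip()
        if not raw_score:
            continue  # drop rows with a missing score
        yield name, float(raw_score)


def mean_by_name(pairs):
    """Group scores by name and return the mean score for each name."""
    grouped = {}
    for name, score in pairs:
        grouped.setdefault(name, []).append(score)
    return {name: statistics.mean(scores) for name, scores in grouped.items()}


result = mean_by_name(clean_rows(RAW))
print(result)
```

Nothing here is specific to data analysis; it's plain string handling, generators, and dictionaries, which is kind of the point.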
Indeed,
R's libraries are superior to Python's, and R is more of a lingua franca in the data world, so if you are 'just' doing data, it is probably the better choice.
But I would never attempt to build a production system in R. So if you want to go from research to production in the same language, or as the same programmer, Python has all the advantages. You also have the rpy2 route for missing libs, though that is not the same as doing it natively.
That said, if anybody got the pandas/NumPy/IPython workflow going in Go, I would drop Python in a heartbeat. I would love faster loops (natively, not just through Numba) and better concurrency in Python.
BTW, IPython now runs R code (and Octave!) interactively, so there is an advantage to knowing both from the Python perspective.
0. http://deeplearning.net/software/theano/