I've gotten so used to GitHub-style pages that I expect to scroll down and see some examples, documentation and links. I actually found myself frustrated when I scrolled down and found nothing but a copyright notice. Even this site's GitHub page only had a changelog. Not a knock against the site, I just don't see much there to sell me on the library.
Here you go: http://packages.python.org/milk/ The link was right there at the bottom of the GitHub page. Some of the links from the docs to the source code do not work, though one can always browse the GitHub repository directly.
@luispedro I did not see your reply, hence the duplication. I see that Milk isn't doing too well on PCA (11 times slower than the best performer). From the code it seems you are importing numpy.linalg. I think if you import scipy.linalg instead, your results will be better without any change to the algorithm (unless you have built numpy with ATLAS or MKL).
scipy.linalg links against the underlying BLAS if it's available, whereas the standard build of numpy implements linalg on its own (though it is possible to override that). Second point: I think for large data sets you are better off providing an API for computing a user-specified number of principal components rather than all of them. Thirdly, if you don't find limiting yourself to gcc too restrictive, you can stick with STL-style algorithms in place of hand-written C++ loops. The advantage is that you will get parallelism for free http://gcc.gnu.org/onlinedocs/libstdc++/manual/parallel_mode...
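To make the second point concrete, here is a rough sketch of what a "give me only k components" function could look like. This is not milk's API: the name top_k_pca and its parameters are made up for illustration, and the method (orthogonal/subspace iteration followed by a small Rayleigh-Ritz step) is just one of several ways to avoid a full decomposition. It only uses numpy calls that I know exist (numpy.linalg.qr and numpy.linalg.eigh).

```python
import numpy as np


def top_k_pca(X, k, n_iter=100, seed=0):
    """Return the k leading principal directions (as rows) and their variances.

    Hypothetical helper for illustration only, not part of milk.
    Uses orthogonal (subspace) iteration, so only k directions are tracked.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    n, d = Xc.shape

    # Random orthonormal starting basis of shape (d, k).
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))

    for _ in range(n_iter):
        # Multiply by Xc^T Xc without ever forming the d x d matrix.
        Z = Xc.T @ (Xc @ Q)
        Q, _ = np.linalg.qr(Z)                   # re-orthonormalize

    # Rayleigh-Ritz: diagonalize the small k x k projected covariance.
    B = Xc @ Q                                   # shape (n, k)
    S = (B.T @ B) / (n - 1)
    w, V = np.linalg.eigh(S)                     # eigenvalues in ascending order
    order = np.argsort(w)[::-1]                  # largest variance first
    components = (Q @ V[:, order]).T             # shape (k, d)
    return components, w[order]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic data whose per-feature scales differ a lot.
    X = rng.standard_normal((500, 6)) * np.array([10.0, 5.0, 2.0, 1.0, 0.5, 0.1])
    components, variances = top_k_pca(X, k=2)
    print(components.shape, variances)
```

Each iteration costs roughly O(n * d * k) instead of the O(d^3) of a full eigendecomposition, which is where the saving for large d comes from. Convergence depends on the gap between the k-th and (k+1)-th eigenvalues, so a fixed n_iter is a simplification; a real implementation would want a convergence check.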
Thanks for the tips. I know that the PCA implementation is very naïve, but I never put as much work into it as into some of the other parts of the library.
Sorry, I wasn't really complaining about your example, just in general. I've become trained by GitHub to expect a certain thing. Keep on making cool stuff!
Could you please comment on why a Python machine learning developer should use your code and not, say, Shogun or scikits.learn? In what use cases would your code be preferable, or dispreferred?
How does your k-means implementation compare to that of scipy's?
Why not push your code as modules into scikits.learn? Their library is designed to be many loosely coupled components.
Why would you use my code instead of others? None of the packages covers all of machine learning, so it depends on what you're looking for. I focus mainly on supervised learning and k-means. I want my algorithms to be as scalable as possible too.
Other projects have other priorities/functionality.
I like my interfaces better too, but I might be biased by being so much more familiar with them.
"""How does your k-means implementation compare to that of scipy's?"""
I think that my code is faster: http://bit.ly/e8VOXy and it is probably more scalable.
Scikits.learn has more functionality in certain aspects. So, if you need those, use it. I started milk when there was no scikits.learn. The interfaces are different and they work together here.