That's very neat. Would you mind posting a link to your version from the comments on my article? Thanks.
I learnt Haskell before I tried using Clojure and really like it as a language. It doesn't surprise me that it runs faster. I'm guessing the feature value look-ups don't involve the extra unboxing that Java's maps do.
Values in Haskell are boxed as well, though it's possible to get around this with some GHC-specific hackery; I haven't done it.
But the biggest problem is the parsing code -- around 70% of the runtime is spent parsing the input! Switching to regular expressions might help, but I gave up trying to find good documentation for Text.Regex.Posix.
"The reported accuracy is simply the cumulative total number of errors divided by the number of steps."
Use a moving average.
I use:
m <- m - (2/t) * (m - x_t)
when estimating the current training error.
1/t would be the exact historical average. 2/t gives more weight to recent events, which is good when your distribution is non-stationary (as is the case when your model is changing). With a constant learning rate (independent of t) you get an exponential moving average.
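That update rule can be sketched as follows (Python for illustration; the function and variable names here are my own, not from the comment above):

```python
def running_estimate(xs, rate):
    """Running estimate updated as m <- m - rate(t) * (m - x_t).

    rate(t) = 1/t      -> exact historical average
    rate(t) = 2/t      -> weights recent observations more heavily
    rate(t) = constant -> exponential moving average
    """
    m = 0.0
    for t, x in enumerate(xs, start=1):
        m = m - rate(t) * (m - x)
    return m

# With rate 1/t this recovers the plain mean of the sequence:
running_estimate([1.0, 2.0, 3.0, 4.0], lambda t: 1.0 / t)  # -> 2.5
```

With a constant rate c the same loop computes the usual exponential moving average, since each step blends m toward x_t by a fixed fraction c.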
I'm the author of the blog post, and to answer your question: yes, I know about Incanter. In fact, I found out about Parallel Colt via Incanter. I've played around with it a little and it looks very cool.
I was initially going to build my algorithm using its libraries as a base but I thought a simpler first step would be to write it without pulling in too many extra dependencies.
http://gist.github.com/147988