I can recommend this episode of The REPL podcast, where the author talks about some of the whys, hows, and current state of data science in Clojure: https://www.therepl.net/episodes/25/
Dragan, thank you for your continued hard work (and your well-written posts!). I haven’t had the time yet, but I’m very much looking forward to reading both your “deep learning from scratch” series of posts and your books.
You don't have the mature bindings to things like TensorFlow or Torch, you don't have good viz libraries, you don't have broad support for the types of analysis scipy allows, and beyond Weka and random stuff like XGBoost having Java bindings, you don't have access to a lot of different models.
That said, Clojure is _much_ better than both Python and R for data prep. You can build very nice, fast (parallel) pipelines with transducers etc., and stuff that seems like magic to tidyverse consumers in R is just everyday data transformation in Clojure. And despite the fact that Incanter more or less died, I still think the language would be a great fit for data science if the community were there, and Dragan's work really deserves that sort of attention. The foundations are already far superior to what's available in R and Python (e.g. you are doing stuff on the GPU on day one, and you can do Bayesian analyses in some cases thousands of times faster than Stan, etc.).
You don't need to sell me on Clojure being a nicer foundation than Python in most respects, but the thing I keep running afoul of when doing data science in any JVM language is the performance hit from all the copying it takes to pump data back and forth across JNI.
The showdown that's more interesting to me is Clojure vs Julia, which is very nearly an acceptable Lisp, and also has a nicer interface to C libraries. And, IIRC, also the ability to interface directly with C++ libraries, without having to first wrap them in a C-compatible interface.
Well, there wouldn't be once the data is already copied into Uncomplicate data structures. But surely you can't just pass a pointer to the guts of a Java array, so you do have to copy data back and forth to get it into Uncomplicate data structures in the first place, don't you? Otherwise, how does the C/Fortran/whatever code deal with the fact that the JVM's garbage collector reserves the right to move data around?
Why would you pass a pointer (or the contents it points to) to the guts of a Java array? Neanderthal does not require Java arrays (although it supports transfer to/from arrays for convenience).
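For anyone wondering how numeric data can live on the JVM without sitting in a Java array: the standard mechanism is a direct NIO buffer, whose contents may reside outside the garbage-collected heap, so native code can be handed its address via JNI. Whether Neanderthal uses exactly this internally is an assumption here, not something stated in this thread. A minimal sketch in plain Java (the class and method names `OffHeapVector`, `allocate`, and `sum` are made up for illustration):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

public class OffHeapVector {
    // Allocates room for n doubles in a direct buffer, i.e. memory that
    // may live outside the garbage-collected heap.
    public static DoubleBuffer allocate(int n) {
        return ByteBuffer.allocateDirect(n * Double.BYTES)
                .order(ByteOrder.nativeOrder()) // match the platform's byte order
                .asDoubleBuffer();
    }

    // Sums the buffer from the Java side using absolute reads.
    public static double sum(DoubleBuffer v) {
        double s = 0.0;
        for (int i = 0; i < v.limit(); i++) {
            s += v.get(i);
        }
        return s;
    }

    public static void main(String[] args) {
        DoubleBuffer v = allocate(4);
        for (int i = 0; i < 4; i++) {
            v.put(i, i + 1.0); // written straight into the direct buffer
        }
        System.out.println(v.isDirect()); // true
        System.out.println(sum(v));       // 10.0
    }
}
```

A native routine would obtain the buffer's address through JNI's `GetDirectBufferAddress` rather than receiving a copied array; that side is not shown here.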
Please try Neanderthal; there are lots of getting-started resources. You can benchmark it yourself (very easy to do in Clojure) and see...
I assure you that the only copy you would need is the same one you need in C, C++, or any language: the one from the source of your data (IO such as a database, network, CSV string, etc.). And even this is not required if you initialize the vectors randomly (which is often the case).
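To make the "one copy from the source" point concrete, here is a rough sketch in plain Java that parses a CSV row and writes each value straight into a direct buffer, with no intermediate `double[]`. It illustrates the general idea only; it is not Neanderthal's API, and the names `CsvToOffHeap` and `parseRow` are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

public class CsvToOffHeap {
    // Parses one comma-separated row of numbers into a direct buffer.
    // Each parsed value is written once, directly to its final location.
    public static DoubleBuffer parseRow(String csvRow) {
        String[] fields = csvRow.split(",");
        DoubleBuffer out = ByteBuffer.allocateDirect(fields.length * Double.BYTES)
                .order(ByteOrder.nativeOrder())
                .asDoubleBuffer();
        for (int i = 0; i < fields.length; i++) {
            out.put(i, Double.parseDouble(fields[i].trim()));
        }
        return out;
    }

    public static void main(String[] args) {
        DoubleBuffer row = parseRow("1.5, 2.5, 4.0");
        System.out.println(row.limit()); // 3
        System.out.println(row.get(2));  // 4.0
    }
}
```

The text fields still exist briefly as strings during parsing, as they would in any language; what is avoided is a second copy of the numeric data from a heap array into native memory.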
Open source software: https://github.com/uncomplicate
Books: https://aiprobook.com
Deep Learning for Programmers: An Interactive Tutorial with CUDA, OpenCL, MKL-DNN, Java, and Clojure
https://aiprobook.com/deep-learning-for-programmers/
Numerical Linear Algebra for Programmers: An Interactive Tutorial with GPU, CUDA, OpenCL, MKL, Java, and Clojure
https://aiprobook.com/numerical-linear-algebra-for-programme...