Fast Tensors in Clojure – A Sneak Peek

dragandj · on Aug 26, 2019

Some more coordinates related to this post:

Open source software: https://github.com/uncomplicate

Books: https://aiprobook.com

Deep Learning for Programmers: An Interactive Tutorial with CUDA, OpenCL, MKL-DNN, Java, and Clojure

https://aiprobook.com/deep-learning-for-programmers/

Numerical Linear Algebra for Programmers: An Interactive Tutorial with GPU, CUDA, OpenCL, MKL, Java, and Clojure

https://aiprobook.com/numerical-linear-algebra-for-programme...

sansnomme · on Aug 26, 2019

Congrats on shipping! Are you going to write the "JVM for Clojure people" book you suggested before? I'm really looking forward to it!

dragandj · on Aug 26, 2019

First I have to finish the ones that are in progress :)

jonahbenton · on Aug 26, 2019

Amazing, amazing work. Thank you.

fnordsensei · on Aug 26, 2019

I can recommend this episode of The REPL podcast, where the author talks about some of the whys, hows, and current state of data science in Clojure: https://www.therepl.net/episodes/25/

dkersten · on Aug 26, 2019

Dragan, thank you for your continued hard work (and your well-written posts!). I haven't had the time yet, but I'm very much looking forward to reading both your series of "deep learning from scratch" posts and your books.

Scarbutt · on Aug 26, 2019

What's the pitch for using Clojure instead of Python for data science? Production workloads?

thom · on Aug 26, 2019

You don't have the mature bindings to things like TensorFlow or Torch, you don't have good viz libraries, you don't have broad support for the types of analysis scipy allows, and beyond Weka and random stuff like XGBoost having Java bindings, you don't have access to a lot of different models.

That said, Clojure is _much_ better than both Python and R for data prep. You can build very nice, fast (parallel) pipelines with transducers etc., and stuff that seems like magic to tidyverse consumers in R is just everyday data transformation in Clojure. And despite the fact that Incanter more or less died, I still think the language would be a great fit for data science if the community were there, and Dragan's work really deserves that sort of attention. The foundations are already far superior to what's available in R and Python (e.g. you are doing stuff on the GPU on day one, and you can do Bayesian analyses in some cases thousands of times faster than Stan etc.).

thom · on Aug 26, 2019

If you're downvoting me, tell me what I am getting wrong cos I do this stuff as part of my day job and I'm very much open to better workflows.

mumblemumble · on Aug 26, 2019

You don't need to sell me on Clojure being a nicer foundation than Python in most respects, but the thing I keep running afoul of when doing data science in any JVM language is the performance hit from all the copying it takes to pump data back and forth across JNI.

The showdown that's more interesting to me is Clojure vs Julia, which is very nearly an acceptable Lisp, and also has a nicer interface to C libraries. And, IIRC, also the ability to interface directly with C++ libraries, without having to first wrap them in a C-compatible interface.

dragandj · on Aug 26, 2019

There is no copying back and forth across JNI, thus no particular performance hit there (in Uncomplicate libraries).

mumblemumble · on Aug 26, 2019

Well, there wouldn't be once data is already copied into Uncomplicate data structures. But surely you can't just pass a pointer to the guts of a Java array, and do have to copy data back and forth to get it into Uncomplicate data structures in the first place, don't you? Otherwise, how does the C/Fortran/whatever code deal with the fact that the JVM's garbage collector reserves the right to move data around?

dragandj · on Aug 26, 2019

Why would you pass a pointer (or the contents it points to) to the guts of a Java array? Neanderthal does not require Java arrays (although it supports transfer to/from arrays for convenience).

Please try Neanderthal; there are lots of getting-started resources. You can benchmark it yourself (very easy to do in Clojure) and see...

I assure you that the only copy you would need is the same one you need in C, C++, or any other language: the one from the source of your data (I/O such as a database, the network, a CSV string, etc.). And even this is not required if you initialize the vectors randomly (which is often the case).
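To make the "no Java array, no GC move, no copy" point concrete, here is a minimal plain-Java sketch of the general idea: data lives in off-heap memory from the start, so there is nothing on the GC heap to pin or copy before handing it to native code. This is an illustration under the assumption that native-backed libraries like Neanderthal work with off-heap buffers in a similar spirit; it is not Neanderthal's API, and the class and method names (`OffHeapVector`, `allocate`, `fillRandom`, `dot`) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;
import java.util.Random;

// Hypothetical illustration, not Neanderthal's API.
public class OffHeapVector {
    // Allocate n doubles outside the Java heap. The GC does not relocate this
    // memory, so native code could use its address without copying or pinning.
    static DoubleBuffer allocate(int n) {
        return ByteBuffer.allocateDirect(n * Double.BYTES)
                .order(ByteOrder.nativeOrder())
                .asDoubleBuffer();
    }

    // Initialize in place with random values: no intermediate double[] exists,
    // so there is no "source" copy at all in this case.
    static void fillRandom(DoubleBuffer v, long seed) {
        Random rng = new Random(seed);
        for (int i = 0; i < v.capacity(); i++) {
            v.put(i, rng.nextDouble());
        }
    }

    // Stand-in for a native kernel: reads the off-heap data directly.
    static double dot(DoubleBuffer x, DoubleBuffer y) {
        double acc = 0.0;
        for (int i = 0; i < x.capacity(); i++) {
            acc += x.get(i) * y.get(i);
        }
        return acc;
    }

    public static void main(String[] args) {
        DoubleBuffer x = allocate(1000);
        DoubleBuffer y = allocate(1000);
        fillRandom(x, 42);
        fillRandom(y, 7);
        System.out.println("direct: " + x.isDirect());
        System.out.println("dot: " + dot(x, y));
    }
}
```

In a real library the `dot` loop would be replaced by a call into a native BLAS routine that receives the buffer's address; the point of the sketch is only that the data never has to pass through a heap-allocated Java array.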

marmaduke · on Aug 26, 2019

> you can do bayesian analyses in some cases thousands of times faster than Stan

What exactly does that mean? What sort of problem takes 1000x longer in Stan (C++) than in a JVM language?

thom · on Aug 26, 2019

It's more C++ vs CUDA than C++ vs JVM.

iamcreasy · on Aug 28, 2019

> you are doing stuff on the GPU on day one, you can do bayesian analyses in some cases thousands of times faster than Stan etc

I am curious. Could you please elaborate on it a bit more? Thank you!

elwell · on Aug 26, 2019

Not a Clojure Data Scientist, but I would imagine a functional programming language would lend itself well to data consumption and transformation.