
What do you mean by "TensorFlow is a subset of general purpose computing, and thus will always be limited to niches"? It's not clear to me at all what one could mean by this. Doesn't TensorFlow have to use matrix math deep down (just like any other digital computing system)?



I'm not sure if you or albertzeyer asked first, but what I meant by that is that MATLAB is similar to any other C-like language, except that it uses the vector as its primitive instead of something like an integer or float. That's really all there is to it. Other than a few details about notation, every major concept of MATLAB stems from that and is easily understood and predictable. MATLAB (or non-proprietary analogs like GNU Octave) lets you write C-like code, and its runtime deals with architecture and optimization details internally (so there are no limitations on vector size or number of samplers or anything like that).
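
To make that concrete, here's a minimal sketch in Python+NumPy (used as a stand-in for MATLAB/Octave, since the semantics are close): the whole-vector expression replaces the explicit C-style loop.

    import numpy as np

    # "Vector as the primitive": one expression operates on the whole array;
    # the runtime handles iteration and optimization internally.
    x = np.linspace(0.0, 1.0, 1_000_000)
    y = 3.0 * x**2 + 2.0 * x + 1.0

    # The C-like equivalent loops over scalar primitives:
    y_loop = np.empty_like(x)
    for i in range(x.size):
        y_loop[i] = 3.0 * x[i]**2 + 2.0 * x[i] + 1.0

    assert np.allclose(y, y_loop)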

Whereas things like TensorFlow, CUDA, OpenCL, OpenGL, etc. seem to deal more with DSP-style processing of buffers. They all have their own abstractions and lingo which work extremely well for certain use cases, kind of like domain-specific languages (DSLs).

The end result is that it's trivial (at least in theory) to go from a TensorFlow implementation to a MATLAB implementation. But it's very difficult to go the other direction. Another way to think of this is that any solution written in TensorFlow can be run by MATLAB, but the reverse is not necessarily true. Trying to run MATLAB code within TensorFlow might encounter hardware limitations or other restrictions that make the code run thousands of times slower.

Now I could be wrong about this - maybe they truly are equivalent. But until I'm able to transpile MATLAB code directly to TensorFlow or OpenCL or whatever and have it be performant, I'm going to continue working under this assumption.


In TensorFlow, a vector (or a tensor) is also one of the fundamental datatypes.

> The end result is that it's trivial (at least in theory) to go from a TensorFlow implementation to a MATLAB implementation. But it's very difficult to go the other direction.

I would say just the opposite. From Matlab to TF should be trivial, but the other way not so much. Or can you give me an example in Matlab which would be hard to translate to TF?

I can give you one in TF which would be very hard in Matlab: E.g. how to implement async SGD, with a parameter server and all that logic?


I'm much more familiar with MATLAB than I am with TensorFlow so I might be wrong about the lowest-level computation stuff. Part of it is an architecture issue. I can imagine writing a transpiler from MATLAB that automatically dices up the matrix calls to work over a distributed system, maybe with something like Go or Elixir running under the hood and sending the jobs in batches to other processors, then joining all of those results, which abstracts away most concurrency issues.

I find it more difficult to think about TensorFlow this way, because to me it seems more similar to OpenCL or OpenGL, where you are dealing with multiple cores reading from a single memory space and transpiling the TensorFlow code to something like a shader. That type of code works more like SIMD or VLIW and has trouble with things like branching or dynamic codepaths or even reading/writing from/to a computer's main memory.

So what I'm looking for is a way to write at the high-level abstraction of MATLAB syntax and then have the runtime dice that up internally as TensorFlow or Elixir or whatever (that part doesn't matter to me as much). Maybe Julia can do that but I haven't learned it yet. But I want to stay away from the boilerplate - things like binding buffers or carrying the mental burden of having separate vector and system memories.


Yeah, I am pretty certain you're not right in this understanding. I am not the most knowledgeable, but basically all I know is scientific computing, so here are my two cents:

No C-like language uses the "vector" as its "primitive" (it's not clear what you mean by primitive, so I am interpreting it as "machine type"): integers, floats, and characters are the "primitives" (machine types), which are the things that the hardware itself "knows" how to store using a consistent system ultimately involving groups of bits.

On top of these primitives, single-type arrays (e.g. basic C arrays) are built; for instance, a string is an array of characters.

Using C/typical hardware, a simple array is literally stored in memory "contiguously". Only two pieces of information are needed to define an array: the memory address of the first element, and how many elements there are. Thus, if you want the 6th element, the machine literally finds the first element, moves 5 memory "blocks" forward (C arrays can only have a single type, so each element needs the same amount of space to store), and gives you back what it finds there. The simplest/fastest interpretations of a C array don't even care about how many elements there are, so if your array has 5 things but you ask for the 6th thing, you just get back the data stored where the 6th thing would have been. Typically this is garbage, but maybe it's super valuable (e.g. the first character of a stored password is put there..."overflow vulnerability"...to get the second character, ask for the 7th element, etc. etc. etc.)
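
You can poke at this address arithmetic from Python via NumPy (a sketch of the idea, not C itself):

    import numpy as np

    a = np.arange(5, dtype=np.int32)  # 5 contiguous 4-byte elements
    print(a.strides)                  # (4,): one index step moves 4 bytes

    # Element i lives at base_address + i * itemsize; the 6th element
    # (index 5) would sit 5 "blocks" past the first one.
    print(5 * a.itemsize)             # 20 bytes from the start

    # Unlike raw C, NumPy bounds-checks: a[5] raises IndexError instead of
    # returning whatever bytes happen to sit past the end of the array.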

Okay, going off topic. Back to the point.

Other data structures that can be built upon primitives are structs, enums, etc., but the array is especially relevant to us because it's easy to think of an array as a linear algebra vector. C does not implement dot products natively, but one can easily write a function called "dot_product" that takes two arrays of integers or floats and returns a scalar integer or float. Some higher level languages (e.g. MATLAB) do exactly this, and save you the work of implementing all of linear algebra again.
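
In Python terms, the hand-rolled version vs. the built-in looks something like this (a sketch):

    import numpy as np

    def dot_product(xs, ys):
        # What you'd hand-write in C: pairwise multiply and accumulate.
        total = 0.0
        for x, y in zip(xs, ys):
            total += x * y
        return total

    u = [1.0, 2.0, 3.0]
    v = [4.0, 5.0, 6.0]
    assert dot_product(u, v) == np.dot(u, v) == 32.0  # the built-in saves you the work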

Matrices are more tricky: for usual non-parallelized hardware, they are still stored contiguously as an array of primitives, but some extra data might go along with this array to tell you how many elements you have to pass before you're onto the next row. Again, MATLAB just provides this sugar-coating.
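
NumPy exposes exactly this extra bookkeeping, if you want to see it (a sketch):

    import numpy as np

    m = np.arange(12, dtype=np.int64).reshape(3, 4)  # 3x4, stored row-major
    print(m.strides)  # (32, 8): pass 4 elements (32 bytes) to reach the next row

    # Element (i, j) sits at flat index i * ncols + j in the contiguous storage.
    i, j = 2, 1
    assert m[i, j] == m.ravel()[i * m.shape[1] + j]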

So where do GPUs come in?

Well, think of a simple grayscale image on a monitor, and note that it is built up of little pixels: so, this image can be thought of as a matrix of integers that range from 0 to 255 inclusive (256 values total), where entry (i, j) corresponds to how light/dark pixel (i, j) is: if your monitor is a square that contains 1000 pixels by 1000 pixels, you can represent it by a 1000x1000 matrix. Using C-like languages on typical hardware, you are basically representing your 1000x1000 matrix as a 1,000,000-long array. Imagine you want to transform each pixel (independently of the others) by applying some function you have written --- on typical non-parallelized hardware, you would go through each of the million entries in that array and apply your function, one after the other (serially).
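
In Python, the serial version of that per-pixel transform looks like this (a sketch; brighten is a made-up example function):

    import numpy as np

    img = np.random.randint(0, 256, size=(1000, 1000), dtype=np.uint8)

    def brighten(p):
        # Some per-pixel function; each pixel is independent of the others.
        return min(int(p) + 50, 255)

    # Serial: visit each of the 1,000,000 entries, one after the other.
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = brighten(img[i, j])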

This is the sort of operation you typically have to do when you want to transform images on a screen. You can imagine that doing it serially would get more and more tiresome as your images get more detailed, your monitors get higher resolution, etc., so purpose-built hardware called GPUs was created, which can represent a matrix as a true machine type: a true "primitive", where memory is actually (i, j) addressable. You can pass matrices to the thing, and get matrices back. If you can do matrices, you can also obviously do vectors. Most importantly, you can give the GPU instructions that say "this function should be applied to each pixel, and it doesn't care about other pixels when transforming one", and the GPU will apply that function in parallel. It will do 1,000,000 calculations at once (assuming it can store a 1000x1000 matrix; otherwise it might have to get into some other abstractions, but now we are going off topic).

Eventually people figured out that any problem that can ultimately be thought of as "I have X data points, and I want to apply a function to each data point independently of the others" could be run efficiently on a GPU.
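
As a sketch of that "map a function over data points" pattern, Numba's @vectorize (mentioned further below) can compile an elementwise function for parallel execution; target='cuda' would run the same thing on a GPU, assuming you have one:

    import numpy as np
    from numba import vectorize

    # "Apply f to each data point, independent of the others", compiled once;
    # target='parallel' uses all CPU cores, target='cuda' would use a GPU.
    @vectorize(['float32(float32)'], target='parallel')
    def f(x):
        return x * x + 1.0

    data = np.random.rand(1_000_000).astype(np.float32)
    result = f(data)  # one logical "map", executed in parallel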

Machine learning is just one application where you need to deal with lots of matrices.

People have come up with languages that know about matrices as a machine type (early examples being graphics shader languages), and TensorFlow fits somewhere in this space, with built-in sugar-coating for things the creators thought were "relevant". I have never used TensorFlow though, so someone else can probably give you more detail.

--------------------------------------

Some things I have not mentioned:

* vectorization: basically, how to do vector operations "smarter" on non-parallelized hardware, something that many languages (e.g. MATLAB, NumPy) and hardware now support (a tiny timing sketch follows below)

--------------------------------------
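
To make the vectorization point concrete, a tiny timing sketch (Python+NumPy; numbers will vary by machine):

    import timeit
    import numpy as np

    x = np.random.rand(1_000_000)

    # The same reduction: scalar Python loop vs. vectorized (SIMD-friendly) kernel.
    loop_t = timeit.timeit(lambda: sum(x), number=3)
    vec_t = timeit.timeit(lambda: np.sum(x), number=3)
    print(f"python loop: {loop_t:.3f}s, numpy vectorized: {vec_t:.3f}s")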

Anyway: "transpiling" vector/matrix operations from MATLAB/NumPy to TensorFlow/OpenCL/Cuda is a breeze conceptually (but I bet it's kind of a boring problem for advanced programmers). If a transpiler doesn't exist, it's probably because no one has put in the work to open source it. One example of a Python+NumPy to Cuda "transpiler" is Continuum's Numba: http://numba.pydata.org/numba-doc/0.38.0/cuda/index.html
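
For flavor, a Numba CUDA kernel looks roughly like this (a sketch assuming a CUDA-capable GPU; double_kernel is a made-up example, not from the linked docs):

    import numpy as np
    from numba import cuda

    @cuda.jit
    def double_kernel(arr):
        i = cuda.grid(1)   # this thread's global index
        if i < arr.size:   # guard: the grid may have extra threads
            arr[i] *= 2.0

    data = np.arange(1024, dtype=np.float64)
    threads = 256
    blocks = (data.size + threads - 1) // threads
    d_data = cuda.to_device(data)
    double_kernel[blocks, threads](d_data)  # 1024 elements handled in parallel
    result = d_data.copy_to_host()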

The devs there are also thinking about OpenCL, and some progress was made. This is the current state of that task: https://github.com/numba/numba/pull/582

SPIR-V is basically a standardized "middle language" that "transpilers" can use. What you do is translate from language X -> SPIR-V -> OpenCL/Vulkan/whatever -> hardware drivers -> machine language

And SPIR-V is work in progress: https://en.wikipedia.org/wiki/Standard_Portable_Intermediate...

There is also similar work for MATLAB, but it's done by the owners: https://www.mathworks.com/matlabcentral/answers/25973-matlab...

Also, I have to echo other comments about how no one really cares about MATLAB that much in sci-comp, probably because it's proprietary and expensive. Heavy work is usually done in Fortran (legacy code), C/C++, or experimental languages like Julia, or languages that are more flexible/open-source like Python (NumPy/SciPy are just wrappers around lots of C/Fortran).

People who use MATLAB tend to be those who were just "introduced to it" (e.g. through school) and have never hit any limitations that would force them to switch to more flexible languages. Or they don't even realize that more flexible languages exist? It's a matter of comfort/goals (I don't advocate that everything should be done in C, and MATLAB is great for quickly doing many things, but many other things that have been done in MATLAB would have been less painful/faster if done in...say, Python+NumPy, because then you can use everything the Python ecosystem has to offer, and aren't limited to the MATLAB ecosystem).


>say, Python+NumPy, because then you can use everything the Python ecosystem has to offer, and aren't limited to the MATLAB ecosystem

That has an implicit assumption that Python's ecosystem is better for this kind of work than MATLAB's, which just isn't true in many areas of scientific computing. Take differential equations for example. MATLAB has matcont which is a great bifurcation analysis library which is unrivaled by Python (PyDSTool is a toy in comparison). Python doesn't really have usable DDE solvers. Python's ODE (SciPy, Sundials wrappers, etc) solvers are much less developed than MATLAB's, doing okay since it's wrapping some standard software but with a lot less flexibility than MATLAB. And you can keep going. And to top it off, compilation with Numba or Cython doesn't work well in the case of differential equations since there are a lot of small function calls. So it's not so clear that Python is good at scientific computing at all, and in fact in some areas like differential equations it's quite a step downwards. Generally, the ecosystem in this area is well developed in Julia + the commercial offerings (MATLAB, Maple, Mathematica).

That said, Python's ecosystem outside of scientific computing is so much better than MATLAB's. However, at that point I would think Julia is a good choice.


> MATLAB has matcont which is a great bifurcation analysis library which is unrivaled by Python (PyDSTool is a toy in comparison).

I know lots of people who work with dynamical systems (math bio), and most people don't use MATLAB's matcont, as it is a toy compared to XPP/AUT. People tolerate XPP/AUT's archaic user interface for its power, and can also export data easily to Python for graphing.

Also, the DDE thing is no longer true: https://aip.scitation.org/doi/10.1063/1.5019320

> Python's ODE (SciPy, Sundials wrappers, etc) solvers are much less developed than MATLAB's

What do you mean by "much less developed"? ODE solving is an "old problem" in the sense that it has been optimally addressed in C/Fortran/C++, and it's just better to make wrappers around that existing code. ODEPACK/SUNDIALS/PETSc are great examples of cutting-edge standard ODE/PDE solvers which have Python wrappers, and if you've got wrappers around them, you're going to be hard pressed to find anything better.
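
For example, SciPy's LSODA method is a thin wrapper over the classic ODEPACK Fortran solver; a minimal sketch:

    import numpy as np
    from scipy.integrate import solve_ivp

    # Exponential decay dy/dt = -0.5*y; LSODA (from ODEPACK) auto-switches
    # between stiff and non-stiff methods.
    sol = solve_ivp(lambda t, y: -0.5 * y, t_span=(0.0, 10.0), y0=[1.0],
                    method='LSODA')
    print(sol.y[0, -1], np.exp(-5.0))  # numeric vs. analytic solution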

Then there are people who are developing new ways of integrating ODEs numerically, and for them, MATLAB is only good as a first pass/prototype thing. A professor I know working on something like this doesn't bother using MATLAB for it (C++/Python).

> compilation with Numba or Cython doesn't work well in the case of differential equations since there are a lot of small function calls

Again, I have no clue what you mean by this. I just had a paper published which modelled a system involving hundreds of ODEs, with every involved function in the dynamics being compiled using numba's "nopython" mode. The stuff is blazing fast, only about 10 times slower than a C implementation. The structure of the program looked like this:

scipy.integrate.odeint -> f -> sub-function ... -> sub-function ...

`f` (the function to compute derivatives at each time point) and `sub-function`(s) (which are, I guess, the small function calls you were referring to?) were compiled using numba's nopython mode. It was super easy. You can have a look at the relevant code here: https://github.com/bzm3r/numba-ncc/blob/master/core/dynamics...
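
In miniature, the structure was something like this (a sketch with a made-up linear coupling, not the paper's actual model):

    import numpy as np
    from numba import njit
    from scipy.integrate import odeint

    @njit
    def coupling(u, v):  # a "sub-function", compiled in nopython mode
        return -0.5 * u + 0.1 * v

    @njit
    def f(y, t):         # the derivative function handed to the solver
        dy = np.empty_like(y)
        n = y.size
        for i in range(n):
            dy[i] = coupling(y[i], y[(i + 1) % n])
        return dy

    y0 = np.linspace(0.1, 1.0, 100)  # a system of 100 coupled ODEs
    t = np.linspace(0.0, 10.0, 200)
    trajectory = odeint(f, y0, t)    # scipy.integrate.odeint -> f -> coupling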

So I don't get what you mean by "Numba/Cython doesn't work well in the case of DEs since there are a lot of small function calls".

Note that Maple/Mathematica are not in the same camp as MATLAB, since they are primarily symbolic mathematics engines. Both are fantastic though, I agree.

No complaints about Julia. Awesome stuff. Just like Rust. I am not a huge Python fan, in the sense that I am actively moving away from it to "better tools", but I still think Python mostly beats MATLAB.


>I know lots of people who work with dynamical systems (math bio), and most people don't use MATLAB's matcont, as it is a toy compared to XPP/AUT. People tolerate XPP/AUT's archaic user interface for its power, and can also export data easily to Python for graphing.

No, XPP/AUT is almost strictly less powerful than matcont. It can recognize and handle a much smaller set of bifurcations. Even PyCont (part of PyDSTool) can do some things XPP/AUT can't (though there it's much more of a tradeoff). XPP/AUT is good enough for most math bio though since these higher order bifurcations are much more rare to actually find in models.

>Also, the DDE thing is no longer true: https://aip.scitation.org/doi/10.1063/1.5019320

That's matching dde23, which is only for non-stiff problems with constant lags. Still very, very simple and cannot handle a lot of DDEs. Mathematica and MATLAB handle state-dependent DDEs. Maple, Julia, and Fortran via Hairer's RADAR5 handle stiff state-dependent DDEs.

>The stuff is blazing fast, only about 10 times slower than a C implementation.

This is the overhead I was mentioning. I say it's slow since it's 10x slower than the C implementation. If that's fast enough for you, that's fine, but there's still a lot to be gained there.

>Note that Maple/Mathematica are not in the same camp as MATLAB, since they are primarily symbolic mathematics engines. Both are fantastic though, I agree.

Look at their differential equation solver merits in full detail and you'll see that there's a ton of things these cover that Python libraries don't. I was surprised at first too, but they aren't just symbolic engines. Maple has some of the best stuff for stiff DDEs, for example, and Mathematica's Verner + interpolation setup is very modern and matches the Julia stuff, while MATLAB/SciPy etc. are still using Dormand-Prince (dopri5, ode45).

I am not saying Python's libraries aren't fine. They definitely are fine if you don't need every little detail and if you don't need every lick of speed. But as you said, it's leaving a 10x on the table. Also, a lot of its integrators don't allow complex numbers. Also, it doesn't have access to much IMEX and exponential integrator stuff. So the Python libraries are fine, but they are far from state of the art.


Hmm. Interesting stuff. I am surprised that Julia has come along so far. Where can I read more about Julia's integrators?


It's all in the docs for DifferentialEquations.jl. For example, here's the page for the first order ODE solvers: http://docs.juliadiffeq.org/latest/solvers/ode_solve.html


> does not use the "vector" as its "primitive"

Well, doesn't the SIMD architecture make vectors "primitives", even if in some limited sense?


As I said:

-------------------------------------

Some things I have not mentioned:

* vectorization: basically, how to do vector operations "smarter" on non-parallelized hardware[SIMD?], something that many languages (e.g. MATLAB, NumPy) and hardware now support

-------------------------------------



