FORTRAN has multidimensional arrays, and the compilers know about them and optim...

cf498 · on Nov 28, 2018

> C people just do not get multidimensional arrays. They think an array of arrays is good enough.

I have to say, i dont get what you are trying to say. Do you have a source that explains it?

What is the difference between a multi dimensional array and arrays of arrays? And what does a FORTRAN compiler optimize here? It is a piece of memory written and organized in a certain way. What is there to organize. You basically have a pointer to a memory region with an offset for the position at row X and column Y.

What am I missing? Are you only talking about the syntax for the final address? What has that to do with the compiler? Do you mean the parser?

sparky_z · on Nov 28, 2018

I'm not a Fortran or C programmer (so take this with a grain of salt), but as I understand it, Fortran is so optimizable because it's arrays are not fundamentally just a memory layout with pointer math. In Fortran, arrays are abstract entities, which allows the compiler to lay them out in the best possible way to take advantage of vectorization for the current code. Originally, pointers weren't even a feature of the language, though they were bolted on in a later version.

Sure, an expert programmer can replicate this speed in C by being careful about memory layouts, etc. But in Fortran, everyone gets it for free.

geoalchimista · on Nov 28, 2018

I would add that since Fortran 90, you can also get scalar-array and array-array operations for free. The `elemental` keyword that automatically broadcasts pure scalar functions to multidimensional arrays works almost in a way reminiscent of NumPy (I'm a weird one who came to do some serious work in Fortran long after known C and Python).

alfalfasprout · on Nov 28, 2018

an array of arrays is not necessarily contiguous in C(++). Indeed, if allocating on the heap, you end up with a bunch of discontiguous memory that's not necessarily correctly aligned.

A good tensor implementation accounts for strides that are SIMD compatible (eg; each dimension is a multiple of the SIMD register width).

int_19h · on Nov 28, 2018

An array of arrays is necessarily contiguous in C - this is implied by the type. An array of pointers to arrays will, of course, not be contiguous - and is the only way to get a dynamically sized heap-allocated 2D array in C (VLAs give you stack-allocated arrays, with all the size limits that entails).

In C++, this all is best handled by a library class.

cygx · on Nov 28, 2018

Heap-allocated dynamically-sized NxM matrix in C99:

    double (*mat)[M] = calloc(N * M, sizeof (double));

int_19h · on Nov 28, 2018

Ah, good point. I always forget that VLA types in C99 are actually types, and so you can use them in these contexts as well.

It's a shame they killed VLAs as a mandatory language feature. They didn't make C into Fortran (which I think was the hope, between them, complex numbers, and "restrict"?), but they did make some things a great deal more pleasant to write.

cf498 · on Nov 28, 2018

>and is the only way to get a dynamically sized heap-allocated 2D array in C (VLAs give you stack-allocated arrays, with all the size limits that entails).

Why wouldnt you be able to create a dynamic array of arrays with placement new and a cast.

int_19h · on Nov 28, 2018

A cast to what, though? You need to be able to write the type of that array, but you can't do that unless dimensions are compile-time constants.

Const-me · on Nov 28, 2018

> A good tensor implementation accounts for strides that are SIMD compatible (eg; each dimension is a multiple of the SIMD register width).

Never implemented any tensors, but in my experience, sometimes you better do what GPUs do with some texture formats: switch from linear layout into dense blocks layout. E.g. for dense float32 matrix and SSE code, a good choice is a 2D array of 4x4 blocks: exactly 1 cache line, and only consumes 4 out of 16 registers while being processed i.e. you can do a lot within a block while not hitting any RAM latency.

bane · on Nov 28, 2018

This might help

https://dzone.com/articles/memory-layout-of-multi-dimensiona...

http://exodus.physics.ucla.edu/Fortran95/scf95.lect3.pdf

xiii1408 · on Nov 28, 2018

There's a better solution, to your problem: pass an "index object" as the argument to operator[].

So `T& operator[](index_t)`, where `index_t` could just be a tuple.

  a[{1, 2}] = 12;

This has virtually the same syntax, will be compiled in the same way by any modern compiler, and lets you create special index types that have invariants (suppose you want i0 < i1).

beojan · on Nov 28, 2018

That's not a "better solution", it's a workaround.

kitd · on Nov 28, 2018

I disagree. The tuple is an index into an array. The multiplicity of the tuple indicates (literally, in fact) the multiple dimensions of the array. Ie to the coder, it is the index that takes on the multi-dimensionality, not the array.

slavik81 · on Nov 28, 2018

That's a current topic of discussion for C++20 and beyond. https://cor3ntin.github.io/posts/sandiego/#deprecate-comma-o...

Animats · on Nov 28, 2018

Wow, two decades after I was pushing for it.

jhasse · on Nov 28, 2018

So the problem with C++ is that you need to write one of

    foo[a][b]
    foo[{a,b}]
    foo(a,b)

for multidimensional data instead of

    foo[a,b]

? Doesn't sound like that big of a deal to me.

std_throwawayay · on Nov 28, 2018

It's one of those many tiny things that add up. There's good reasons why C++ gets so much hate and there's good reasons why it's used in so many applications. I see no good reason to not have multiple array indices.

jhasse · on Nov 29, 2018

> I see no good reason to not have multiple array indices.

It would add a little bit complexity to the language with only a very smal benefit: The same can be achieved with just a few characters more or () instead of [].

std_throwawayay · on Dec 1, 2018

Using Matrix[i,j] is the most intuitive way for most people. It's what you'd use if you don't know anything and it's what you'd understand immediately. That's the benefit for the user.

AndrewGaspar · on Dec 3, 2018

And to be honest, in the list of things that the C++ folks seems to think is important, multi-index index operators seems like it would rank substantially higher than a lot of the other things in the list in terms of value, and a lot lower in-terms of additional complexity.

shaklee3 · on Nov 28, 2018

If this was true then the BLAS Fortran library would run faster when calling with Fortran as opposed to C, but it doesn't. As someone else pointed out, multi-dimensional arrays are contiguous in C, by definition. It's not like Fortran has some magical powers to unlock instructions only available to that language.

jacoblambda · on Nov 28, 2018

The argument is not that because Fortran has multidimensional arrays it is faster, it is that because Fortran has multidimensional arrays it can and will often be faster in most cases for free.

Obviously an extremely optimised library such as BLAS will run just about as fast in C and Fortran. For a good majority of code out there however people just don't want to put in the effort to carefully optimise array layouts. The feature moves the responsibility of optimisation to the compiler which can in many cases do a better job than a normal programmer with reasonable assurances that the optimisations will not introduce bugs and / or side effects into the code.

celrod · on Nov 28, 2018

Languages like C(++), Rust, and Julia have vector intrinsics that are unfortunately lacking in Fortran. Is there a way to wrap [SLEEF](https://github.com/shibatch/sleef) in Fortran?

While writing fast code in Fortran is easy, I think it is unfortunately harder to write "extremely optimised" libraries like BLAS in Fortran than in these other languages for that reason. Without the low level control, you can't be as explicit about vectorization patterns.

Eg, for BLAS, you normally write a lot of kernels, which you then apply to blocks of the matrices. The blocks will not be contiugous; each column will be separated by the matrix stride (or rows separated by a stride, if row major).

gfortran 8.2 will not succesfully vectorize a matmul kernel (I did file an issue on bugzilla). C (with vector intrinsics) does great. Julia does well too, but not perfect: there are a couple redundant move instructions per for loop iteration.

When it comes to applying the kernel to blocks, Fortran will also try and make array temporaries of each block, which would cripple performance. In languages with pointers, all you'd have to do is pass a pointer and the stride between columns. I haven't tried playing with pointers in Fortran, but I guess that would work too? Julia often refuses to autovectorize anything as soon as you start playing with pointers; does Fortran's stance on aliasing mean it fairs better? If not, Julia at least has the LLVM vector intrinsics and metaprogramming that let you force it to generate nearly optimal code despite its protests.

But to say something nice about Fortran: dynamically-sized-mutable-stack-allocated-arrays. (-fstack-arrays)

In Julia, stack-allocated objects are immutable, and (partly for that reason) also generally very small. You wouldn't want to create a new 30x30 array every time you want to edit a single value. You also can't get stack pointers, meaning you can't use masked load/store operations to vectorize code when the array dimensions aren't a multiple of SIMD-vector-width.

This means to write fast code, I normally heap allocate everything. And to avoid triggering the GC, that means keeping the same arrays alive, and avoiding any temporaries.

With Fortran, you don't have to worry about any of that. You can always write convenient and clear code, and it's likely to be very fast. If you hold vectorization in mind while laying out your data and computations, compilers will normally do a great job figuring it out.

celrod · on Nov 29, 2018

I think it's worth pointing out that someone who knows way more about Julia internals than I do (yuyichao) said this comment is wrong on several points on the [Julia discourse](https://discourse.julialang.org/t/fortran-vs-julia-stack-all...).

Highlights include that that "Stack allocation is a property of the object, mutability is a property of the type", and that mutable objects that do not escape a function are likely to avoid heap allocation (ie, try to avoid calling non-inline functions).

Unfortunately, that can sometimes mean having to roll your own functions like `sum` and `dot_product`. A simple for loop is easy, but that some of these basic functions don't inline by default can make it a little more cumbersome.

pjmlp · on Nov 28, 2018

You need to look at commercial Fortran compilers when talking about performance, e.g. PGI, Intel, XL, not gfortran.

celrod · on Nov 30, 2018

Big disclaimer: my experiences are limited, and compiler comparisons are going to be highly variable, depending on the particular code bases.

That said, I have a statistics model that has to be fit thousands (millions?) of times. As a baseline, running it using JAGS (a popular tool for Gibbs sampling) for a given number of iterations and chains in parallel took >160 seconds.

The same model in Julia took 520 ms, g++ & gfortran 480 ms, and ifort took 495 ms.

The code I compiled with g++ used vector intrinsics and SLEEF. Not exactly a revelation that code like that will be fast. But part of my point is that C++ easily lets you take measures like that when you want more performance or control, therefore it is more optimizable. Switching to a commercial compiler wasn't an automatic performance boon.

All else equal, I'd much prefer sticking with an open source compiler suite. I also won't be a student with access to the Intel compilers for much longer.

The actual model is addressing a fun problem (improving the accuracy of satellite-space debris collision probability estimates), and I'm organizing some of it into a "Gibbs Sampling" presentation for tomorrow, so I could put the material online.

chewxy · on Nov 28, 2018

Do you mean syntax? Because if layout, then Gorgonia's tensors are actually flat dense layouts controlled by an access pattern. Gorgonia supports fortran layout as well as C layout.

Syntax is a bit painful though.

cheez · on Nov 28, 2018

I just overload operator() for multidimensional data.