Fortran is an excellent choice for numerical computation. Its non-aliasing assumption allows for compiler optimizations that dramatically speed up computations. It is even easy to get started with concurrent code through OpenMP. When I need to run a simulation or a discontinuous optimization algorithm, I turn to Fortran first.
Here are some resources I have collected over the years.
Glad this was posted. First thing I did on loading the comment section was ^F "alias". Fortran is still used for numerics because it makes writing fast numerical code easy. C simply wasn't designed for it, just as Fortran wasn't designed for low level system/OS programming.
In C99, you can use the "restrict" keyword to tell the compiler an array is not aliased, and even without restrict it's often possible to get the same optimisations in C or C++. When I was still doing HPC in academia, all our C++ code was maxing out the memory bandwidth. I think in practice the difference rarely matters these days.
Having seen plenty of HPC code written by science academics, both when I was one and later when I became an HPC guy: as much as I cringe at some of the Fortran, the C/C++ can be far worse... and I wish that I could retroactively restrict some of my colleagues and customers to sticking with Fortran.
Wow, look at the downvotes! I've been saying this to people for 20+ years, in conference talks and face-to-face conversations, and I've never had a negative reaction. Happy to learn if folks think I'm wrong about my personal experiences. If you've had a different experience, happy to learn about that too (although I'd be surprised if you'd downvote me for that.)
Note that I'm not saying that science academic C++ is never great; I'm saying that it's often unfortunate. Is the code you're talking about code that's progressed through several maintainers without becoming sad? That's often where things go wrong, not the initial code, but keeping it going, cleanly, through multiple collaborators / post-docs / grad students.
I agree that meshing is definitely a difficult situation; I've never seen a solution (for unusual geometry) that I liked much. Things that can pretend to be rectangular? No problem. Very adaptive meshes? Argh.
Hard to say, but I'm looking at 39 contributors, 539 stars, 3,005 commits [1]. Nicely documented, pretty extensive list of users, CI on Linux and Windows. But with that said, yes, I've also dealt with absolutely horrendous academic C++.
Are there any users that are science academics? I wasn't talking about computer scientists / computer graphics / visualization researchers at all in this thread; they do great with C++. I clicked on a bunch of links of the users of this library and didn't see any physics, astronomy, chemistry, fluid dynamics, etc.
So it sounds like you're somewhat like the folks I'm talking about, even if the other users of this code are not... but have you had a lot of turnover of your coders, like grad students and postdocs that turn over every 2-4 years? Because that was the full nature of my complaint: science not-computer-scientists over several maintainers doing better in Fortran.
The hand-unrolling is predicated on the knowledge that the arrays do not overlap. It is obvious to the compiler that the locals do not alias with the arrays, or with each other.
The stores to c[] could alias with a[] or b[], but that hardly matters; we assume that all the array loads and stores take place. We have done by hand the unrolling that the compiler's suspicion of overlapping arrays would otherwise hinder, and could do more of it.
When coding in C for tightness, it helps to pretend that locals (at least basic type scalar ones) are registers and references to arrays and structs are memory loads and stores. Then think like an assembly language programmer, somewhat. It's better to use a few more locals than a few more loads and stores. Try to load once, process with locals, store once.
Yes, but restrict is limited in that it can only assert that two pointed-to regions never alias. Finer-grained aliasing information can be expressed using unions and structs, but it's not as flexible as Fortran, and the non-aliasing guarantee is easily broken just by allowing an address to be taken.
Some processors expose performance counters, but you don't need to measure memory bandwidth accurately to know you are maxing it out. Suppose you know that the memory bandwidth is 1 GB/s and your program reads 100 GB of data to produce its result. If the execution takes ~100 s, then you know you hit the maximum memory bandwidth.
As an introduction, you might want to look into John McCalpin's history as a shallow-water oceanographer and his subsequent fixation on the STREAM benchmark. TL;DR: some algorithms aren't friendly to caching.
I have no idea if this applies to smarnach, but it definitely applies to other people.
Nice bit of trivia! I was an admin of a Primos machine briefly while also doing scientific programming.
Primos wasn't much good compared to the commercial Unixes that were appearing. It did have one nice-ish feature late in its history: the CPL scripting language.
Wow, big trip down memory lane here because my first job was on a Prime mini-computer! Wrote a complete system using CPL scripts (with the occasional bit of Fortran thrown in) on Primos.
Shame to hear that Prime / Primos is now defunct, but that job was back in the '80s!
It means that the compiler knows that pointers/arrays don't overlap in memory. Data can therefore be safely kept in registers instead of being continually re-read from memory to guard against possible alterations between computations [1].
As a counterpoint, it is worth noting that essentially no high-performance BLAS or LAPACK implementations (e.g., GotoBLAS/OpenBLAS, ATLAS, BLIS, etc.) are implemented in Fortran and instead tend to use a combination of C (in order to use vector intrinsics) and assembly.
To be fair, there are GCC built-ins that allow you to tell the compiler that arrays are nonaliasing, but you're right in that it's more difficult in general to get a compiler to spit out fast numeric code with C than it is with Fortran.
I feel like it's a "right tool for the job" situation. Fortran was literally built for the purpose of numerical computation. Multi-dimensional arrays, non-aliasing arrays and pointers, multi-dimensional array intrinsics, intrinsic array binary operators... all come with Fortran right out of the box. Of course, C/C++ can do all of these things, but all of the proposed methods, at least that I have seen, feel like they are trying to hammer a screw.
I have one question - if I compile and link a program written in C and Fortran, can the linker inline Fortran functions in C? I know it can do it if I link object files produced from C source, but can it do it if I link object files from separate languages?
Having never done it with Fortran, I don't know. It should be possible, because e.g. on Linux, everything adheres to the ABI. However, the ABI doesn't specify how to interpret pointers AFAIK. Array ordering [0] and indexing are different in Fortran compared to C, so I'm sure there are pitfalls you'd have to account for.
It's not the link editor's function to inline; an optimizing compiler does that. A linker can, in some cases and depending on the platform, perform some binary optimization, as is for example the case with Sun Studio compilers on SPARC (-xlinkopt=2) or with GCC (-flto), but that's not inlining.
To nitpick, neither compilers nor linkers are typically able to do "binary optimization", if by that you mean analyzing the asm and optimizing it. What happens when you compile with -flto or equivalent is that the compiler hides some internal representation in a special section of the object files; then at the link stage (if you also use -flto when linking), the linker invokes the compiler again, which deserializes those trees from the object files and does the inter-procedural optimizations on those.
Fortran is still better at multidimensional arrays. Other than Matlab, few other languages do multidimensional arrays at all. C-derived languages use arrays of arrays, and use templates, macros, or objects for multidimensional arrays. Fortran optimizers have lots of multidimensional array-oriented optimizations which other compilers usually lack.
I tried to talk the Go and Rust people into putting in multidimensional arrays, but the discussion degenerated into bikeshedding, with arguments for fancy but slower representations so you could take arbitrary N-dimensional slices from a multidimensional array.
As a member of the Rust team, I personally would like Rust to be a suitable replacement for Fortran, and I suspect most Rust core contributors would feel the same. Rust already has competitive performance, so it feels like a small step to be able to fit into Fortran's niche. I don't know the particular multidimensional array discussion you experienced, but evolving a language takes time, and meanwhile people experiment with what is possible out-of-tree, as with the ndarray library, which seems to be well respected.
Right now there is a focus on SIMD in Rust, which is crucial for this domain, so there is progress being made.
> I don't know the particular multidimensional array discussion you experienced, but evolving a language takes slow time, and meanwhile people experiment with what is possible out-of-tree, like with the ndarray library, which seems to be well respected.
A Fortran-like multidimensional array library could effectively happen out of tree, but ndarray is closer to Python's numpy.ndarray than to Fortran arrays. And this means that ndarray cannot support negative indexing (see here: https://github.com/bluss/rust-ndarray/issues/152). I started a prototype of a Fortran-like array here: https://github.com/Luthaf/mudi, if anyone wants to look at the code or even work on this.
My only problem at the time was that `Index` must always return a reference, and cannot return a new struct. From my understanding, this could be solved with the HKT/ATC work.
> Right now there is a focus on SIMD in Rust, which is crucial for this domain, so there is progress being made.
This is very exciting! And knowing the Rust community, we may even get a cross-platform cross-architecture SIMD library with a nice API!
The SIMD stuff is extremely complex. The focus for now seems to be to _not_ work on a nice API, and tackle the problems with just getting the absolute basics done in an acceptable way. Higher level stuff will come later.
So in my previous life as a physics student I did talk to many physicists about why Fortran was popular (and later when I discovered Rust, discussed Rust with them).
The main blocker is just that Fortran has a wonderful ecosystem of scientific computing packages, and you can find almost any operation you need out there. This is not true of even Matlab or Mathematica. I have often had to reimplement random algorithms in Mathematica. Fortunately, Mathematica gives some pretty good high-level tools, so it's not that bad, but it's still annoying.
Multidimensional arrays were brought up as an issue with C/C++. AIUI, using macros in C++ makes it tolerable, but not great. I did ask about `arr[(1, 2, 3)]` and folks seemed okay with it. Many have used Mathematica, where indexing is done with double brackets (`arr[[1,2,3]]`), so it isn't too bad.
Implicit bounds checking was something that was frowned upon. It could even be a dealbreaker. While bounds checks can usually be elided by LLVM, scientific computing does tons of indexing and the overhead might crop up in the profile. I'm not actually sure how well this maps to the real world, and I never got a chance to profile it, but it's a concern.
Libraries like ndarray could turn it off but it would basically be unsafe :|
-------
I did use Rust for some scientific computing. At the time, IIRC numeric types had to be more explicit, which was annoying, but that was pretty much my only issue with it (I didn't need any existing scientific libraries). I didn't know of any good visualization libraries so I would usually output to json/csv and visualize in Mathematica. But I have done the same for Fortran code; Mathematica is just awesome at visualization and analysis.
I recall having access to a scientific computing cluster where the only up to date thing was ifort. It had an ancient python, and icc (IIRC it was ancient too, but I'm not sure). Had to compile a bunch of compilers myself to get my own code to run. Asked about this and apparently everyone just used fortran. Python is common in post-processing but a lot of the tools they were using were ancient or maintained by people with similarly ancient toolchains so stuff still worked on it.
> I tried to talk the Go and Rust people into putting in multidimensional arrays, but the discussion degenerated into bikeshedding, with arguments for fancy but slower representations so you could take arbitrary N-dimensional slices from a multidimensional array.
This is the most frustrating aspect of Rust, to me, as a developer of scientific software. Rust has great potential to challenge Fortran in this domain, but numerics is mostly an afterthought in Rust.
> Rust has great potential to challenge Fortran in this domain, but numerics is mostly an afterthought in Rust.
A common problem with Rust: many people from different backgrounds see it as a godsend (for some good reasons) and would like to replace their former language with it, but the language can't evolve fast enough to satisfy everybody at the same time.
It started as a language for desktop software (like a web browser, its first use case), and currently the focus is on evolving it to be suitable for back-end development (with asynchronous IO). I hope one day scientific software will become a first-class citizen too.
While this is true in some sense, there's also a difference between the language changing to support a use-case, and the library ecosystem. The language doesn't _need_ to change for the async io stuff, though there is one feature that will be extremely helpful, "impl Trait". That feature is also needed for other reasons too, so it's not solely motivated by the async io stuff.
It is totally true that there are many areas that can be improved, and it's gonna take time and effort to knock them out. It's tough! My personal pet area is OS dev...
Can you go into a bit about what specifically is missing?
A lot of the people who work on the compiler/library support for this stuff aren't the primary consumers of it. I personally like working on this kind of code when I have time (I've done a bit of work on the rust bignum library), but it helps to know what's actually going to be useful to people.
Basically, OPs article gives a good overview on what scientists want out of a language used for large scale numeric simulations.
1.) Multi-dim arrays supporting array math, slicing and flexible indexing, sizeof always known (internally solved through lookup tables).
2.) Fast-by-default rather than safe-by-default. A safe language is useless if it gives 2x or more runtime overhead. Make it safe where it doesn't give you any performance penalty (i.e. compile-time checks), otherwise make the default performant or leave the choice to the programmer. Example: Pass-by-reference as default, no GC.
Essentially, I don't think Fortran will be replaced any time soon - maybe strong AI that can program all of this stuff from a pure math specification will be here sooner than a Fortran replacement. To everyone not understanding this, read the article again. There's been a huge amount of effort going into this language over the last 20 or so years - almost more than is justified by the size of the HPC market, as is visible from all the compilers lagging behind the specs. If you want to help, go and improve gfortran (e.g. with better GPU support, so people don't need the lagging PGI or locked-down Cray compilers anymore).
There's a lot of activity for D in this area. For instance, there's a BLAS replacement[1] with good performance. Supporting numerical programming is a priority for Walter and Andrei. There's more work to do, but it is adding enough functionality that it will be a realistic option for those that like the language.
If you want to get started, here's my recommendation: Program Kernels only in Modern Fortran, learn about f2py bindings and do the other parts in NumPy. If you use Fortran purely as a number crunching language - no I/O, no setup, no config code etc. - it's pretty sweet actually - IMO there isn't much need for replacing it with something new that has 50 years of compiler work to catch up (which is near impossible anyways).
Not the OP, but a computational physicist writing a fairly decent sized codebase for a long term project, just went through the language choice and initial prototyping phase. I have extensively used Julia, Python, MATLAB, C99, C++98/11/14, Fortran and D for work and I believe in picking the right tool for the job.
I chose C++ (writing in a fairly modern style, but not eschewing aliasing-restricted pointers, etc. where beneficial) because it handily beats everything except Fortran for performance and is vastly superior to Fortran in terms of zero-cost abstraction. I'm writing code that's largely portable between CPUs and accelerators (mostly GPUs) and has a lot of interchangeable parts. With C++ I can use templates to achieve compile time polymorphism and genericity, while retaining Fortran level performance on CPUs at the cost of higher code complexity.
The Fortran story on GPUs has also been very disappointing initially, especially for those of us not interested in proprietary compilers, though I understand the situation is improving there. Ultimately, in my experience, while individual strengths of C++ can be reproduced in other languages, there's no other language that does it all. I hope that changes some day (for now, D looks like the most promising alternative to me) as writing C++ is often an exercise in frustration, but that's where we're at.
I'm in a similar situation as you, albeit with a large Fortran codebase that needs to stay - therefore I chose to write a transpiler in python that enables a Fortran codebase to be parallelised on CPU and GPU - see my profile link.
Hybrid-Fortran looks like an elegant workaround for an ugly problem, kudos for dealing with the storage order as well.
Ultimately, it seems to me the only way to justify using Fortran for scientific computing nowadays (apart from legacy issues, of course) is if you have no intention of ever using accelerators and absolutely must have optimal CPU performance. NVidia and, to a lesser extent, Intel/AMD could have maintained the language advantages Fortran offers in the accelerator space as well, but chose not to, and now you need solutions like this coupled with proprietary compilers just to approach CUDA C / OpenCL performance. Legacy projects will be around for decades, of course, but for new things I think C++ is/will increasingly be the default choice.
Julia has strong numerical and multi-dimensional array support, but it's JIT compiled. However, once a function is JITted, it has comparable performance to Go & Rust.
Yep, Julia is a very nice language with very good performance. But I abandoned it because it is not (yet?) oriented toward library development, being more usable in an exploratory style.
I ran into issues a year ago when trying to organize a library with more than 8000 loc, when the "dump everything into a global namespace" approach started to be more of a hassle and less a nice way to get stuff done. I am waiting for it to stabilize a bit before giving it a second try.
Could you elaborate on why you felt the need to "dump everything into the global namespace"? For quite a long time now, you can use modules in Julia to keep things in separate namespaces. With "import MyModule" (instead of "using MyModule") things will not enter the global namespace, but will need to be qualified, like "MyModule.myfunction(...)". I feel this is very much like Python. You could even introduce abbreviations for the modules, using "const mm = MyModule" (admittedly, Python's "import...as" is cleaner).
Yes, but `using MyModule` is (was?) the default and preferred way to use a module. Which is equivalent to recommending that everyone do `from module import *` in Python.
Not really. They're just written in a way such that the memory layout is compatible with BLAS libraries — which are frequently written in Fortran. So they can be passed directly to functions that happen to have been written in Fortran.
Additionally, invoking BLAS/LAPACK only happens for dense arrays of specific types (single/double precision real/complex numbers). The great thing in Julia is that almost all of the array and linear algebra functionality is implemented completely generically, so that algorithms can be written using these higher-level concepts, and the low-level functionality will (very efficiently) dispatch to BLAS/LAPACK if possible; if not, the generic implementations still allow your algorithm to work for e.g. your user-defined number type.
Haha, yeah, I guess I should have been more precise with my wording. I'm currently prototyping a custom number system in Julia, and obviously the multidimensional array ops will not be backed by BLAS.
I haven't played with Ada arrays much, but I know it supports various different representations (so it can be binary compatible with Fortran column-major arrays and C row-major arrays), it's got nice array slicing semantics, and it's got good support for generics which can operate on arbitrarily sized arrays.
I don't know whether it has Fortran-like aliasing guarantees, though. You can't take an address of an array element unless the array has been declared to support it, which helps, but I don't know whether you can e.g. pass the same array by mutable reference into a function twice. (It makes sense that this isn't supported, as it's kinda meaningless, but I can't find an actual reference.)
Also check out ParaSail language designed for easy, safe parallel and concurrent programming. It's an Ada variant from Tucker Taft at AdaCore. Might have something interesting on this topic. Might not.
The Julia programming language which is gaining a lot of popularity in scientific computation has in fact multidimensional arrays which are very efficient. The whole idea of Julia is to be comparable to C and Fortran in speed while having the ease of development of Python.
I'd dabbled in Go and Rust as well but don't see them as serious competition to Julia in this space. As much as I love Go, it is really poorly suited for numerical computing. No operator overloading of any sort among other things. You really want that for vector based math.
Also, Rust seems like more of a language for professional software developers. Rust looks neat, but I think it is too complicated for scientists. They don't want to focus that much on their language.
In this respect I'd actually think Swift has more appeal as it is quite fast, but easier to use. By easy, I mean there are fewer concepts to learn.
But Swift will have the same problems as Go and Rust, in that it is not being designed with the needs of people doing scientific computing in mind.
That is what is nice with Julia. Out of the box it does so many of the things you want. Right out of the box, without explicitly importing anything, you can start doing advanced matrix operations and load large CSV files, and it groks imaginary numbers. It even has a number system which distinguishes between rational and irrational numbers, for example.
What facility is Rust lacking that prevents you from coding this as an extension/package rather than needing core coding support?
That sort of thinking is what drives number-crunching people away from a language. You end up with ten different representations for matrices, and can't pass matrices from one package to another.
If the numerics people can't agree on something, throwing the decision over to the core Rust folks is unlikely to result in something better.
So, the question still stands: What in Rust prevents an ambitious number cruncher with good design taste from adding the appropriate pieces to the language?
Fortran has 1st class many-dimensional arrays; can you really do that efficiently (as in "works great with a loop nest optimizer/LNO" and "has pretty syntax" and "there's one way to do it") as a Rust extension? C++ says "NOOOOOOO!!!" I don't know Rust well enough to answer the question.
You can do `arr[(1,2,3)]` in Rust (not pretty, but ok). Assuming you've marked the indexers as inline, LLVM should be able to optimize it pretty well. If it doesn't, optimizations can be added.
Rust can't enforce "one way to do it" since this would be done as a library. Currently there's only one library offering this (ndarray), so in a sense there's only one way to do it :)
What does arr[(1,2,3)] have to do with what I asked for? The code that I write for my personal scientific stuff (numerical hydrodynamics) needs loop jamming/fusion and splitting, SIMD without having to use any special notation, and other stuff which is typically provided by compiler LNO. I happen to have interchanged loops in my code, but if I goofed that up, it'd be nice for the compiler to do it for me.
No array slices needed. Plain Fortran 77 notation provides what I need.
Thanks for trying to be helpful. The place where C and C++ go wrong is multi-dimensional arrays. Some googling tells me that ndarray gives Rust syntax like foo[[1,2,3]] for a 3-dimensional array. I have no guess whether this optimizes well or not.
It will compile down to `foo[1][2][3]` and get optimized the same way. Rust additionally does hint a lot more about aliasability to LLVM (everything is restrict by default, basically), so there probably is more scope for optimization here.
This is already true in the Fortran world. When I was a student I was working with some condensed matter physics software and there was all kinds of madness with flipping conventions in there.
And Rust doesn't have ten different ways to do n-dimensional arrays. There's one major library out there (ndarray). I just learned about a second (https://github.com/Luthaf/mudi) which seems to use the same conventions.
There are multiple ways to do matrices, but these are in graphics-oriented libraries.
Well, it can work if one package gets dominant - e.g. the Python ecosystem works quite well with having all this outside of core language in the numpy package.
Nothing prevents a person from doing that, but that isn't what folks doing scientific computing are interested in, generally speaking. Some are, sure, but in aggregate as a matter of what the "day to day" is, it's just not of much interest.
I mean, maybe I was an outlier, but when I was in academia I wanted to have my models compute as accurately and quickly as possible, and I generally didn't have much time for or interest in the computer coding beyond the necessary implementation details required to conduct my research.
Telling me "implement an extension", then, would have made me roll my eyes and go back to Fortran.
I don't think the idea is necessarily that you personally implement an extension, but that "someone" needs to implement it and the bar for "someone" to do so is lower if it can be done via extensions or libraries rather than having to be done in the core. That someone should share their work, of course, and that someone may find they have a feature or two they have to get into core, but even that's easier than sticking a whole whackload of changes into core.
It would be nice to have access to an overloadable index operator that works on multiple dimensions at once (like C#'s [1, 2, 5] syntax). It's sometimes not easy to write something to be efficient if it has to go one-by-one like [1][2][5].
One of the nice things about D is that it offers multidimensional indexing. For instance, you can write x[1, 2..6], and you can do it for user defined structs. In my case it makes it really nice for interfacing with R.
Fortran is also better at complex numbers. In Fortran these are a native type with every bit as much support as for real floating point types. Not so in the top competing languages (C/C++).
Last time I looked at C, the libraries had yet to be ported to C99. You had to choose between z = u + 2 * i * w and zs = fft(ws); you couldn't have both. Therefore Fortran kept winning.
It is kind of a "too little, too late" problem. C added complex numbers a decade after C++ spun off, so the two languages have different approaches. And C99 complex number support was slow to be added to compilers. On the UNIX side, it was never added to Solaris 9 (EOL 2014); Microsoft finally added Visual Studio support in 2013. And even if your compiler supported it, code generation quality has been historically bad. Some things were never added to the standard, like printf support, so complex double is still a lesser native type than double.
In C, I've seen people do "multidimensional arrays" by A[n * i + j], with A a double[m * n], rather than by arrays of arrays. Especially if the array is being passed as a parameter, and the array's dimensions are given by other arguments, and the array is to be used with BLAS.
Is the compiler smart enough to do strength reduction and get rid of the multiply when indexing? FORTRAN compilers have been doing that for multidimensional arrays since 1954.
Guy Steele and his group had a really interesting thing going with the language Fortress (the name is a nod to Fortran, of course, but fortified). It was meant for high productivity as well as high speed (petaflops range). It got shelved after Oracle bought Sun. It was one of the three languages seeded by DARPA funding for high-productivity computing.
It is simply due to apprenticeship. Physicists do research, and their code is research code. Like most other research, the research code is maintained and used in-house. Like most in-house, non-shared code (in any language), it is impenetrable unless you go through an apprenticeship. The masters are typically old, FORTRAN is their language, and it is the only choice for the apprentices. Once the apprentices graduate and become masters themselves, they further develop their own in-house code, and it is inevitably in FORTRAN too. As with any over-developed code, it is too expensive to migrate even when the new master recognizes the shortcomings. Bottom line: FORTRAN is not bad once you are familiar with (and invested in) it, so FORTRAN persists over another generation of apprentices...
However, there is a slight change. The new generation of masters often do know other languages, so they sometimes tolerate their new apprentices using another language and are still able to teach them. So at this point, far from all physicists use FORTRAN.
FORTRAN will die out, just takes time.
One of the sticking points is the BLAS/LAPACK library. It is in FORTRAN. Or more precisely, its interface is in FORTRAN. I don't foresee this changing for a long time. So FORTRAN will be like COBOL: it will persist even when no one really uses it anymore.
I think that's only partly true. Physicists would probably not be exposed to Fortran if it weren't for "apprenticeship". But the same would be true for any other very domain-specific language (say, Mathematica). However, that doesn't change the fact that Fortran (in its modern syntax) is a great language for numerics. Its abstractions are extremely well suited for any computation focusing on multidimensional arrays. It is certainly a more expressive language for numerics than C/C++ (and it's not as easy to shoot yourself in the foot in Fortran). There are not many other languages that can compete for high-performance applications. Julia looks like it can do some interesting things. Python is great as a glue language (but not for number crunching, and of course it's slow). I would still (and regularly do) choose Fortran for new projects. It's not going away anytime soon.
First, if someone is familiar with or invested in a language, they will be biased toward choosing that language, and it will make sense for them. Before you argue: I admit I can't prove that you are biased, but neither can you prove that you are not. So let's agree that anecdotes like yours are not a strong argument.
So let's look at it from another angle. If physicists choose FORTRAN for reasons beyond apprenticeship, there must be people who choose FORTRAN even when their masters do not introduce them to it or require it. In other words, we need examples of people who chose FORTRAN despite being unfamiliar with it. There are plenty of such cases for, e.g., Rust. I know of none for FORTRAN. How about you? For example, how successful have you been at influencing other people (other than your students) to use FORTRAN? On the other hand, I do know people who chose another language despite being familiar with FORTRAN.
Of course knowing a programming language strongly biases someone to choose that language for a project. I also agree you will not find many people (if any) using Fortran who have not been introduced to the language as part of their research, following the recommendation of someone senior (e.g. their advisor). However, your original argument was that Fortran is only used because advisors force their students to use it, and that Fortran will eventually die out along with those dinosaurs. I don't think this is true, and I don't think the fact that people generally don't just stumble into Fortran by themselves is any evidence to the contrary. As the original article points out, the reason Fortran continues to be used is that it is the most suitable language for its domain (though of course not as a general purpose language).
The prerequisite for using Fortran is that someone works in a field that relies heavily on large-scale, high-performance numerical computation. Since Fortran is pervasive in that domain, it's extremely likely that they would be exposed to Fortran through "apprenticeship". Basically, most people who should be using Fortran are using Fortran! I've seen a few instances where people used C/C++ or Matlab in situations where Fortran would have been more appropriate. Matlab has its own drawbacks, but it is also a very domain-specific numerical language, and likewise its users are exposed to it through apprenticeship! C/C++ is still the default language for high performance computing outside of numerics, and thus it's not surprising that people who have not been exposed to Fortran would attempt to use it for numerical purposes as well.
There isn't exactly a lot of choice for numerical programming: Python and Matlab are (partly) designed with numerical purposes in mind, but don't scale to high performance. C/C++ are performant, but not expressive for numerics. There really isn't any other language that would even be in consideration, besides now Julia, but Julia is extremely new, so time will tell.
I don't know anything about Rust, but based on reading its Wikipedia article, it's not a language designed for numerics but rather something designed to replace C++ for systems programming. Thus, like C/C++, it will be performant, but not expressive for numerical purposes.
It is not animosity -- I love the simplicity of FORTRAN -- I am just stating my speculation.
C replaces FORTRAN -- that is, on merit. In reality, computational physicists mostly go to C++ plus Python. Truth be told, physicists are not computer scientists, and they are not even interested in becoming them. They simply pick a language out of necessity/convenience -- which is why FORTRAN was picked back then.
PS: That FORTRAN is fast is overstated, IMHO. I am familiar with high performance computing. In terms of parallel computing, it all boils down to algorithm design. In terms of vector/linear algebra, it all depends on machine-tuned BLAS (which is mostly due to hand crafting rather than the language itself). There is no evidence that FORTRAN code necessarily runs faster than C, comparing first-draft, non-optimized code. FORTRAN is less capable of, and more cumbersome for, doing trendy stuff than C -- and that is the actual merit of FORTRAN over C.
From my point of view, it's understated, and should be given much more credit than it gets. Intel has done a lot of work to make their ifort compiler leverage their latest instruction sets like AVX2, resulting in even bigger speedups, and PGI has done absolutely amazing work writing a compiler which can ingest unmodified Fortran code and compile it to run on NVidia GPUs. It's nothing short of mind-blowing to me and gets way too little mention. Modern Fortran is a really nice language, and all this bashing of it because it's not trendy is completely unfounded. Just ask Intel.
I believe one of Fortran's biggest problems is that if you want all the advantages modern Fortran has to offer, you have to use closed-source commercial tools. If the open source tools were closer to what PGI and Intel offer in terms of features and performance, I don't doubt that Fortran would be more popular.
> Interestingly, C/C++ beats Fortran on all but two of the benchmarks, although they are fairly close on most.
Unless I missed something, this directly contradicts the quoted benchmark source [1], where C beat Fortran on all tests except one (where they were tied), and usually by a wide margin. Plus, I believe gcc usually produces slower code than Intel's compilers (at least for C), so it's not clear that C-gcc vs. Fortran-Intel is a fair comparison.
But beyond that, I'm always confused as to why there is a significant difference. I would expect two compiled languages with reasonable compilers to perform almost exactly the same when they're doing nothing but arithmetic operations, since those are almost entirely determined by the CPU, not the cleverness or speed of the standard library algorithms. Can anyone enlighten me?
There are some important things to note about these benchmarks. On the Mandelbrot set, I noticed that the program does not use arrays despite the obvious 2D-array structure of the bitmap. Two of the four cores were also underutilized. Next, the k-nucleotide, reverse-complement, and fasta programs only utilized a single core, compared to the C programs utilizing multiple cores. It is not clear to me why this should be the case, but it leads me to cast doubt on the tests I listed.
I have cut back: x64 single core & x86 quad core are no longer measured, x86 single core updates are not frequent, and x64 quad core updates are less frequent than they used to be (although more frequent than the chart dates might suggest).
I've kind-of been hoping the hardware would fail -- but someone would probably gift me a new machine :-)
From my perspective "defence of the benchmarks game" is a matter of saying "Keep it in perspective" and "Keep it in context" until we do ;-)
Suppose a function takes two pointer arguments, a and b. Then maybe b is equal to a + 3, so your compiled code has to assume that whenever it writes to an element of a, it might have changed an element of b, and so reload b from memory. Alternatively, it can do a bunch of tests at the start of the function, and compile multiple copies of the function for different situations.
In FORTRAN, this isn't possible (you don't tend to pass around raw pointers), so it is much easier for the compiler to reason about what objects might be in the same place, and which aren't.
The restrict keyword goes some way toward fixing this in C, but it is tricky to use correctly, and it doesn't solve all problems.
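To make that concrete, here is a toy C sketch (the function names are mine, not from any real code):

```c
#include <stddef.h>

/* Without restrict, the compiler must assume a and b may overlap
   (e.g. b == a + 3), so a store to a[i] can invalidate any b[j]
   it has already loaded into a register. */
void scale_mayalias(float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = 2.0f * b[i];
}

/* With restrict, the programmer promises the arrays don't overlap,
   so the loop can be vectorized without runtime overlap checks --
   essentially the guarantee Fortran gives for every argument. */
void scale_noalias(float *restrict a, const float *restrict b, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = 2.0f * b[i];
}
```

Compile both with -O3 and inspect the assembly under gcc or clang; the restrict version typically drops the overlap checks and version dispatch.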
Your Fortran summary is incorrect. F77-style arrays are generally passed as a raw pointer (under the hood.) F90-style allocatable arrays are passed as an array descriptor containing size information, again under the hood.
This doesn't affect aliasing in Fortran at all. Aliasing is simply prohibited, which leaves the compiler free to generate code that gets the wrong answer, if the programmer screws up and has actual aliasing at run-time.
Apologies, you are right. It's so long since I did Fortran that I had misremembered the language as making aliasing impossible, as opposed to it being undefined behaviour.
mypair* restrict position = &object->pair;
float* restrict position_x = &position->x;
Is undefined behaviour, as you can't take a restricted pointer to a child of a restricted pointer in the same scope (the rule isn't quite that simple, but that's part of it). In general, rules about pointers to pointers get complicated, and I've had trouble with them in the past.
In general, it's a good idea to try to only use restrict for "leaf" pointers, not intermediate objects, but that does block some optimisations.
There is also the obvious problem of people getting carried away, and marking something 'restrict' where they shouldn't have.
While 'restrict' is very easy to use on simple structures (like just int*), if you look around you'll find very little, if any, material about using it for complicated nested data structures.
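For contrast, a pattern that stays within the rules (my own sketch, reusing the names from the snippet above): keep the intermediate pointer unqualified and apply restrict only at the leaves.

```c
typedef struct { float x, y; } mypair;
typedef struct { mypair pair; } object_t;

/* The intermediate pointer makes no restrict promise; the two leaf
   pointers each refer to a distinct float, so qualifying them is
   safe and still tells the compiler the loads can't interfere. */
float leaf_sum(object_t *object) {
    mypair *position = &object->pair;   /* plain pointer */
    float *restrict px = &position->x;  /* restrict at the leaves */
    float *restrict py = &position->y;
    return *px + *py;
}
```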
There's also the fact that restrict is only a promise to the compiler "this function will not be called with aliased arguments", not also a command "do not allow people to call this function with aliased arguments".
As far as I know, Fortran, at least in its original form, enforced the latter. (That was fairly easy to do, as it had no dynamic memory allocation, pointers, or recursion; the compiler determined the memory location of every variable.)
Neither of your options is the actual rule. The actual rule is that when the program runs, there can't be any runtime aliasing.
You can totally call a subroutine with several arguments being the same thing, as long as you don't do anything bad inside the subroutine.
I'm aware of this because I tried to get the PathScale compiler team to add a warning for incorrect aliasing. That wasn't possible. But I did get a flag which made the compiler obey C aliasing rules for Fortran, and it was part of the "got the wrong answer? run with these flags and if you get the correct answer, your program is not standard-conforming" thingie.
Please educate me and explain how that contradicts anything I claimed, as I don't see it.
I claim it is only a promise, you claim you can call it with identical arguments. I don't see a contradiction; making a promise isn't the same as keeping it.
Similarly, your argument that it is fine to break that promise as long as you don't do anything bad inside the subroutine, I read as "you sometimes can get away with breaking a promise", but again, that's not the same as not making the promise.
"An object that is accessed through a restrict-qualified pointer has a special association with that pointer. This association, defined in 6.7.3.1 below, requires that all accesses to that object use, directly or indirectly, the value of that particular pointer."
So, it hinges on the meaning of 'accesses'; it is a runtime thing, not a compile-time thing.
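A toy C illustration of that runtime reading (my own example, not from the standard):

```c
#include <stddef.h>

/* Nothing is written through either pointer, so even the aliased
   call sum2(v, v, n) involves no access the restrict rule forbids --
   it is well-defined despite the restrict qualifiers. */
float sum2(const float *restrict a, const float *restrict b, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i] + b[i];
    return s;
}

/* Here the elements of a are modified, so axpy(v, v, n) would be
   undefined behaviour: the same objects would be accessed through
   two different restrict pointers while being modified. */
void axpy(float *restrict a, const float *restrict b, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] += b[i];
}
```

Same signatures, same promise at compile time; only what happens when the program runs decides whether the promise was kept.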
> I would expect two compiled languages [...] to perform almost exactly the same when they're doing nothing but arithmetic operations [...]
Different languages have different memory models, which can result in vastly different performance -- e.g. a GC vs. reference counting vs. linear types. If you are doing arithmetic on something that needs to be managed (e.g. an array), you'll see differences.
Some language features close the door for optimizations. E.g. the ability to dynamically add or remove methods to an object/class can make method dispatch more or less expensive.
Finally, some languages have undefined behavior (e.g. what happens when there's an overflow or a division by zero) while other languages want specific behavior for the edge cases. This all affects the performance of the resulting code, even for basic arithmetic operations.
In this case though, C doesn't have garbage collection or OO features, and (I believe) has unspecified behavior for arithmetic exceptions. I assume Fortran does the same (since it's comparably fast), so I guess I still don't really see where people expect the difference to come from.
C and Fortran both obey IEEE 754 and have well-defined arithmetic exceptions on most platforms -- but programmers usually instruct the compilers to ignore all that in the interests of faster code.
As for what the differences are, try talking to a compiler engineer. The better ones love how Fortran features like the argument aliasing rule for 100% of code makes generating fast code easier. In fact that's one of my tests for compiler engineers in interviews; if they've worked on a C/C++/Fortran compiler and can't say anything good about the challenges and opportunities of Fortran optimization relative to C/C++, then they haven't really drilled down on performance issues!
Fortran hasn't been all caps for nearly 30 years now...
It is very easy for a physical scientist to write Fortran that performs well - the syntax looks very similar to the maths that is being modeled. It is a lot harder for a scientist to write tight C code.
Historically, C compilers just couldn't do such a good job at optimising, because C is pretty complex and has loads of undefined behaviour. Also, Fortran compilers have always been optimised for numerical computation.
Recently*, a lot of effort has gone into making C compilers produce very optimised code. In general, this is a good idea, because humans are bad at optimising and also at remembering every damned Intel vectorisation instruction.
So historically, Fortran was much faster; recently*, with compiler advances in e.g. reordering and AMD64, it's possible for C compilers to beat Fortran compilers.
* edit: adjust your definition of recently to encompass the entire timeframe of Fortran's existence
C can be faster than Fortran if you optimize it to be. Fortran is built to run very close to the maximum C speed for these things out of the box (as I understand it after doing some extensive googling a while back).
> I would expect two compiled languages with reasonable compilers to perform almost exactly the same when they're doing nothing but arithmetic operations
Auto-vectorization? These implementations vary wildly between GCC and Intel.
When I was doing my graduate studies in astrodynamics (spacecraft trajectory design) at Purdue University circa 2003, we worked closely with the Jet Propulsion Lab (JPL) and got to use a lot of their excellent astrodynamics code (used for missions like Galileo and Cassini). It was all Fortran.
I heard that there was an effort at JPL to convert over to using Python. I'm not sure how that went. Maybe they were just writing Python bindings for their Fortran code? I know that much of SciPy consists of Python bindings for Fortran and C code [1].
The software I used to do "nonlinear programming" (i.e. parameter optimization), called NPOPT, was also written in Fortran.
The Python replacement succeeded - I slightly know the people involved, who got an award for the work. But I don't use the library so I don't know if the wrapper approach was used. That would be perfectly legit IMHO.
You didn't read the article apparently. Fortran (which officially dropped all caps in the standard in 1990) has had new versions every couple years, up to 2013. Each revision of the language brings modern features into it, and old features go through a process of deprecation and then obsolescence. Most people seem to be unaware that it is an evolving language that has changed significantly since the standard most people are familiar with from 1977. I am a bit biased though: I work on a Fortran compiler, so I've had reason to stay on top of the standards as they come out.
In the 70s, when Fortran 66 still ruled the night, the usual suspects at Bell Labs wrote Ratfor, which was pretty much what Coffeescript did to Javascript, bending the syntax towards C.
When you compare the current Fortran standards to its '66 version, you'll see how much progress it made.
And I hope that you can actually use it these days. Round the turn of the millennium, it was still quite common for people to use Fortran 77 because that's all that was supported or fit in the general codebase.
The author is I think being a bit unfair to C++ when it comes to matrix/array handling. Their example of things that would be difficult to do in C/C++:
A = B
A = 3.24 * B
C = A * B
B = exp(A)
norm = sqrt(sum(A**2))
Quoting:
"Similar C/C++ code simply does not exist ... Having to use libraries instead of intrinsic functions means the resulting code is never as neat, as transferable, or as easy to learn."
While this statement is true, I think strongly undersells how well these kinds of libraries can be "baked" into C++ through operator overloading, templating, etc. See for example, the Eigen library [http://eigen.tuxfamily.org/].
Not to mention all of that array behavior is available in the standard C++ library with std::valarray<T>, e.g., https://godbolt.org/g/JBSvuH. There was talk of a more thorough standard multi-dimensional array type, but I have no idea what the status of that is.
std::valarray does not have multidimensional support at all. In fact, std::valarray is a relatively abandoned part of the standard library. You'll have to go with one of the 3rd party libraries for better support.
My point was not so much that valarray is awesome, but that articles like this one tend to fight a very strawman version of C++, even when just sticking to strictly standard components.
If anything, saying that C++ has no support for linear algebra is giving it too much credit. C++ has an abundance of libraries that are broken in various strange and marvelous ways. Typically, you find out about the brokenness only after you made the investment of learning the library and put it into production, so that there is no way back. Your comment about valarray is a good example of this.
That's a criticism I'm entirely on board with, and is not limited to linear algebra libraries. HTTP client libraries are worse, there's not a single good one.
Here's the problem: these kinds of libraries will never come close to touching Fortran's built-in features, simply because the compiler can't automatically optimise based on knowledge about the code that only the library has. Even arrays not knowing their own size is still an issue, let alone array-based ops like those shown in the article.
Fortran is not the fastest language around the block, but if you don't want to go into non-portable machine-based hand-tuned optimisations (e.g. loop unrolling) it's hard to beat. Basically, if you optimise for both runtime and programmer time for HPC applications I'd argue that Fortran is top of the crop.
Well, in the case of Eigen it's an entirely header based library, so the compiler can optimize some things. Not to mention the template based meta-programming strategies [1] Eigen itself uses to simplify and optimize during compilation (including vectorization and loop unrolling). The fact that this kind of infrastructure can be built up in a library in C++, i.e. it doesn't have to be baked into the compiler, is a huge plus I think relative to Fortran.
> If a function that takes a ‘real’ is fed a ‘const real’, it will return an error. It is easy to imagine how this can lead to problems with interoperability between codes.
The amount of existing scientific code in Fortran is huge, and there's a chicken-and-egg situation (new code uses/extends old code, so new code will also be in Fortran) that will only increase it further!
It's funny that someone has to write an article to defend modern FORTRAN to people who write in JavaScript, which is the best proof we programmers have of entropy.
Yes but if you go into a discussion of Star Trek movies complaining loudly about how bad you think the new Star Wars is, you're completely irrelevant at best.
Yes, and what does that have to do with this article? If you come to a discussion about thing x to complain about thing y, no matter how valid your complaint you are out of order.
To quote The Dude: "You're not wrong, you're just an asshole."
Yes, and what does that have to do with this article?
Excellent question, glad you asked! Just what do you think all the people who are asking why Fortran is still in use are using?
My money is on the bet that they're not using Fortran. I've worked in the finite element analysis / high performance computing fields for quite a few years and I've never heard anyone who used Fortran complain that they were, or had to use it. Could it be that it's because people like that they get high performance from a language out of the box?
Because JavaScript is not very well thought out, by the JavaScript author's own admission. If the father of the language says he just hacked it together, there isn't anything to argue about.
Even if you are using python, R or java you are still best off linking to matrix subroutines in fortran if you want to solve or diagonalize, get eigenvalues and such. This is not just about raw performance but also correctness, ecology of roundoff errors, etc.
For Python, at least, the most popular numeric computation library (NumPy) uses a fair bit of Fortran under the hood. I'm not certain whether the matrix operations do, but it certainly wouldn't surprise me.
The vast majority of BLAS and LAPACK are written in Fortran (at least, the ones provided by netlib.org). Github also reports that OpenBLAS is around 50% Fortran, so there's quite a bit there too [1]. Usually when you call BLAS functions from C you're really calling a cBLAS wrapper to some Fortran function. This is generally why you see -lgfortran -lf77blas when you compile programs that require numerical tools.
At least in OpenBLAS, most (all?) of the BLAS routines are C functions that call kernels written in assembly, but many of the LAPACK routines are the Netlib reference Fortran versions.
Yet, I root for D and its GLAS library [0]. It is a young project, but it beats the usual suspects for single-threaded matrix-multiplication. Stuff like "solve or diagonalize, get eigenvalues and such" is not written yet. It is very promising for a one-man project, though. We'll see in 10 years, I guess.
Fortran certainly has a mythical reputation for performance, but it should be challenged every once in a while.
As a physicist please have faith in us. We don't all use Fortran. I use Ruby as much as possible and it's fun because it forces me to write libraries that mimic ones that are found in other languages. So I end up learning a lot.
When I was in undergrad I did some computer simulation work for my professor. He programmed in FORTRAN while I had been trained in C/C++. I did my simulations in C/C++ and I also wrote some libraries for his FORTRAN code so he could have more dynamic configuration files. At the time I thought that FORTRAN was an anachronism and was surprised that it was still being used in physics.
I've been looking for a Matlab alternative and been disappointed with either the performance or the syntax of Octave, Python, and Julia. I'd known Fortran had modern features and updates every few years, but never tried to pick it up since the excitement seems to be elsewhere. This article is convincing that Fortran is worth trying out.
If you want to get down into the nuts and bolts of numerical computation, with convenient concurrency, you should definitely look into Fortran + OpenMP.
I know a person who works in atmospheric physics group. All the heavy physics simulation and modeling happens in Fortran, and then they do the "lighter" analysis, summary statistics, plots, visualizations, etc with a Python-based stack.
If I understand correctly, your solution, as it is, would still be wrong to pass to the Fortran code, since in Fortran the column index is the fastest-changing. And if you blindly translated some FORTRAN code to C with the above arrays, even if you got the same results, the C version would be slower because of that, as the FORTRAN code could have been written with knowledge of which index arrangement is faster.
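To make the layout point concrete, here's a hedged C sketch (sizes and function names are mine): C is row-major, so the rightmost index should vary fastest in the inner loop; Fortran is column-major, the opposite.

```c
#define M 256
#define N 256

/* Row-major traversal: a[i][j] and a[i][j+1] are adjacent in memory,
   so the inner j loop walks sequentially through cache lines. */
double sum_row_order(double a[M][N]) {
    double s = 0.0;
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Column-major traversal -- the loop order a naive FORTRAN-to-C port
   would inherit. Same result, but each access strides N doubles,
   which is typically slower on large arrays. */
double sum_col_order(double a[M][N]) {
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < M; i++)
            s += a[i][j];
    return s;
}
```

Both return the same answer; only the memory access pattern, and hence the cache behaviour, differs.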
Practically you are right, but the article is more right than you. I got into some fights with quite a few elite teachers/coders who teach doing 2D arrays exactly as described, and they wouldn't agree that the method you describe is less error prone... because: it is tradition. (And as every person doing research knows, tradition is not supposed to change -- that's what science values.)
It seems absurd, but so many things are absurd these days I am not even surprised.
Oh, the method I describe is more error prone than array of arrays, because if you ever modify the two variables you used to specify the array size, the compiler will just use the new values for array dimensions. If I had to do array of arrays though, I'd at least do something like
template<typename T>
T** make_2d_array(size_t m, size_t n) {
    // TODO: add padding for alignment,
    // but you're unlikely to need something aligned
    // on a larger extent than sizeof(void*)
    void* memory = malloc(m * sizeof(void*) + m * n * sizeof(T));
    T** array = (T**)memory;
    T* data = (T*)((char*)memory + m * sizeof(void*));
    for (size_t i = 0; i < m; i++) {
        array[i] = data + i * n;
    }
    return array;
}
And you can then delete the entire array with a single free().
Except that as of C11 variable length arrays (VLAs) are optional. So with a fully standard compliant C11 compiler your code could potentially fail. While the old way of using arrays of arrays will work with any C compiler.
(Not that important if you use a pure C compiler, but your code will also fail if you use a C++ compiler.)
My reason for using fortran (when I was studying physics) was that when going through the university library looking for a computational physics textbook, I couldn't find one that wasn't in fortran—actually I found one (computational physics, Giordano & Nakanishi) but it wasn't enough to get by. Fortunately I already knew fortran, because all the legacy textbooks+journals were written in the language.
Some of the information here is a bit wrong. The idea that const is weaker than parameter is true, but not for the reasons given: any function taking a float, int, etc. in C++ can also take a const float or const int.
It's also unfair to suggest that C/C++ is at fault for not allowing a = b * c with array types. This is a failing of the standard library and easily fixed... operator overloading lets you create exactly this functionality, and std::valarray already does this for a lot of what you get in Fortran...
Rolling your own is pretty easy; it is a shame that the C++ standard library is a dire wasteland of bad implementations. You are right that valarray will do heap allocations, and making it work on the stack is a real ballache.
It seems like a number of R packages use Fortran. I'm on Mac, and a Fortran compiler needs to be installed so that at least some of the R packages can be compiled when doing updates.
Here are some resources I have collected over the years.
"Introduction to Modern Fortran" (Fortran 2003): http://people.ds.cam.ac.uk/nmm1/Fortran/index.html
Fortran Wiki: http://fortranwiki.org/fortran/show/HomePage
List of Fortran tutorials: http://www.fortran.com/the-fortran-company-homepage/fortran-...
Fortran vs. C: http://www-cs-students.stanford.edu/~blynn/c/fortran.html
Fortran's influence on C's "restrict" keyword: http://www.drdobbs.com/the-new-c-it-all-began-with-fortran/1...