> Note that no combination of any of the faster implementations of Python + Numpy libraries has ever been used at the most demanding level of scientific computation. That has always been Fortran, with some C and C++, and now Julia.
This still seems like an overstatement, but maybe it depends on what you mean by "most demanding level." I work on systems for the Rubin Observatory, which is going to be the largest astronomical survey by a lot. There's a bunch of C++ certainly, but heaps of Python. For example, catalog simulation (https://www.lsst.org/scientists/simulations/catsim) is pretty much entirely in Python.
Maybe this isn't the "most demanding" level, but I don't really see why it wouldn't count.
By a large-scale calculation I have in mind something like this: https://arxiv.org/pdf/2006.09368.pdf, which is in your field of astronomy. It uses about a billion dark-matter elements and was run on the Cobra supercomputer at Max Planck, which has about 137,000 CPU cores. It used the AREPO code, which is a C program that uses MPI. If you know of any calculation in this class using Python I would be interested to hear about it. But generally one doesn't have one's team write a proposal for time at a national supercomputing center and then, once it's approved and the 100-hour slot is finally scheduled, upload a Python script to the cluster. But strange things happen.
Take a look at `lsst/imsim`, for example, from the Dark Energy collaboration at LSST: https://github.com/LSSTDESC/imSim.
> But if A and B are numpy arrays, then A + B will calculate the elementwise sum on a single core only, correct? It will vectorize, but not parallelize.
That's correct, but numba can parallelize the computation for you if you ask it to (https://numba.pydata.org/numba-doc/latest/user/parallel.html). It's pretty common to use numba's parallelization when relevant.
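For anyone who hasn't tried it, here's roughly what that looks like. This is a minimal sketch assuming numba is installed; `add_parallel` is just a made-up name for illustration, not part of any library:

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def add_parallel(a, b):
    out = np.empty_like(a)
    # prange tells numba this loop is safe to split across threads/cores
    for i in prange(a.shape[0]):
        out[i] = a[i] + b[i]
    return out

a = np.random.rand(10_000_000)
b = np.random.rand(10_000_000)
c = add_parallel(a, b)        # compiled on first call, then runs multi-threaded
assert np.allclose(c, a + b)  # same result as the single-core numpy expression
```

Plain `A + B` on numpy arrays stays on one core; with `parallel=True` numba spreads the loop over the available cores, so you keep writing Python but get something closer to OpenMP-style threading.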