Python displaced a lot of very expensive proprietary software in the biosciences. Ease of use was a major factor, since many bioscientists have relatively little programming background, but the ability to escape expensive, restrictive software licenses was also very attractive to a scientific community whose historical norms emphasize the open sharing of methods and results:
> "A program that performs a useful task can (and, arguably, should) be distributed to other scientists, who can then integrate it with their own code. Free software licenses facilitate this type of collaboration, and explicitly encourage individuals to enhance and share their programs. This flexibility and ease of collaborating allows scientists to develop software relatively quickly, so they can spend more time integrating and mining, rather than simply processing, their data."
Now there isn't any area of molecular biology and biochemistry that doesn't have a host of Python libraries available to assist researchers, with tasks ranging from designing PCR strategies and searching for nearest matches up to X-ray crystallography of proteins.
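For readers who haven't touched this ecosystem, here is a hedged sketch (not from the original post, primer sequence invented) of the kind of routine task meant, assuming Biopython is installed:

```python
from Bio.Seq import Seq

primer = Seq("ATGGCCATTGTAATGGGCCGC")          # hypothetical primer sequence
print(primer.reverse_complement())             # antisense strand, reversed
gc = (primer.count("G") + primer.count("C")) / len(primer)
print(f"GC fraction: {gc:.2f}")                # crude GC content check for primer design
```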
C, C++, and Fortran are still used; most Python users just don't see them because they're hidden away underneath the calling function.
I've been surprised by the rise of Python in some ways although not at all in others. Languages like C, C++, Fortran, and dare I say it Rust are too low-level in their raw state for numerical computing. You had the US federal government funding language competitions because of this (see: Chapel). Languages like Python and R (and before that things like Lisp) came along and gave people a taste of something different, and it's obvious what people migrated to.
Part of it is timing: multivariate computational statistics (ML/data science/DL/whatever you want to call it) just sort of started taking off in computer science communities before LLVM-based languages like Julia or Nim could get a foothold. OCaml might have fit that niche but never got there because of a desire to take a different path, or take the path more slowly.
So people looked for a nice expressive language, found it in Python, and buried all the messy stuff behind wrapper functions and called it a day. It was furthered along by Matlab being another comparison on the other side -- Python looks kludgy compared to modern Fortran or C, but not compared to Matlab.
All that wrapper time in Python has its costs, so I suspect as limits get pushed further we'll eventually see a migration to something else like Julia or Nim, or something else not on anyone's radar.
One moral to this story is that expressiveness matters. People will go out of their way to avoid talking directly to machines at a low level.
> People will go out of their way to avoid talking directly to machines at a low level
I would put it differently. At 30 bugs per kLOC, I'd prefer my codebase express a problem and its solution, and as little below that level as possible.
Each well-vetted layer of abstraction between a scientific programmer and the machine's low level interface eliminates whole classes of bugs that are irrelevant to the problem that user is actually working on.
> but rather the discovery that the ratio is pretty stable
The thing is, it isn't stable. It just doesn't depend on the language, which is very surprising. But it varies enormously from one study to another and, AFAIK, nobody has a good set of factors explaining it.
I don't find it that surprising. I think what programming languages (and styles) do is fill up each line of code with information until a roughly constant level of cognitive effort is required to process that line.
At that constant level of effort, we make a certain constant number of mistakes. And that's what I think these studies show.
Some languages are very dense, others break things down in more lines. Some languages care about hard to control details of your computer's working, others handle that automatically. Some languages come with builtin validators, others let you write any kind of trash and try to make sense of it.
Personally, I suspect the number of bugs per line is defined by social and psychological factors, and what changes from one language to the other is the amount of effort one has to put into testing and debugging. But well, none of this is obvious to me.
> Python looks kludgy compared to modern Fortran or C
I’m not sure I can agree with this. Both Python and Matlab provide very nice, high level ways to interact with multidimensional data using simple syntax. Under the hood, both will wind up using fast algorithms to implement the operations. C and Fortran require much more low-level considerations like manually managing memory, futzing with pointers or indices, and generally writing a lot more boilerplate code to shuffle data around.
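A minimal sketch (assuming NumPy) of the high-level array style being contrasted here; the point is that scaling, transposition, and matrix products need no explicit loops or index bookkeeping:

```python
import numpy as np

a = np.random.rand(1000, 1000)
b = np.random.rand(1000, 1000)

# whole-array arithmetic: no manual loops, pointers, or bounds checks
c = 2.0 * a + b.T @ b        # scale, transpose, and matrix-multiply in one line
row_means = c.mean(axis=1)   # reduce along an axis without writing the loop yourself
```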
Matlab, despite all its quirks, could probably have won if it was open source. It’s got a very long history of use in scientific computation and a large user base despite its high price.
Matlab works fine for anything purely "numerical" but fails hard as soon as you need to do more "general computing". Just string handling for example. Or, as far as I know, it's still not possible to implement a custom CLI interface in a matlab script, like you would with argparse in python.
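For the record, this is roughly what the argparse comparison refers to; a hedged sketch with made-up script and option names:

```python
import argparse

parser = argparse.ArgumentParser(description="Process a measurement file")
parser.add_argument("input", help="path to the input data file")          # positional argument
parser.add_argument("--threshold", type=float, default=0.5,
                    help="detection threshold")                           # optional flag
args = parser.parse_args()
print(args.input, args.threshold)
```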
Matlab also historically was really bad for abstraction and code architecture in general. For example, the hard "1 function per file" rule, which encouraged people to not use functions at all, or if you really had to, write 2 or 3 really huge functions (in separate files). Only in recent years (the past 5 or 10 years) did matlab get OOP stuff (classes) and the option for multiple (private) functions in a single script file (still only one public/exported function is possible per file, because the file name is the function name and matlab uses path-based resolution).
Fortran does not require (nor has much available for) manual memory management, and its array syntax is more convenient than Numpy (and far more convenient than Python without Numpy), obviating any futzing around with pointers or indices.
Fortran is definitely much more high-level than C, and it is way easier to write performant numerics in Fortran than in the C family of languages.
I don’t follow why its array syntax is easier than python though. They’re mostly very similar and the numpy developers seem to come from a Fortran background.
You may know this, but since you mentioned Nim & Julia together, it might confuse passers-by. Nim does not, in fact, need LLVM (though there is a hobby side project using that). Mainline Nim compiles directly to C (or C++ or Javascript) and people even use it on embedded systems.
What seems to attract scientists is the REPL and/or notebook UI style/focus of Matlab/Mathematica/Python/Julia/R/... As projects migrate from exploratory to production, optimizing for interactivity becomes a burden -- whether it is Julia Time To First Plots or dynamic typing causing performance and stability/correctness problems in Python code or even just more careful unit tests. They are just very different mindsets - "show me an answer pronto" vs. "more care".
"Gradually typed" systems like Cython or Common Lisp's `declare` can sometimes ease the transition, but often it's a lot of work to move code from everything-is-a-generic-object to articulated types, and often exploratory code written by scientists is...really rough proof of concept stuff.
The time to first plots in Julia is drastically lower now. And still, it was something you only paid once per session, due to JIT.
Julia is the first language I find truly pleasant to use in this domain. I am more than happy to pay a small initial JIT overhead in exchange for code that looks like Ruby but runs 1/2 the speed of decent C++.
Plus, lots of libraries are really high quality and composable. Python has exceptionally good libraries, but they tend to be big monoliths. This makes me feel Julia or something like Julia will win in the long run.
Sorry I meant 1/2 the speed or 2x the time, edited :)
Consider that BLAS written in pure Julia has very decent performance. If you are into numerical computing, you will quickly understand this is crazy.
Carefully written Julia tends to be surprisingly fast. Excessive allocations tend to be a bigger performance problem than raw speed. Of course excessive allocations eventually have an impact on speed as well. There are some idiomatic ways to avoid this.
Having taught a number of scientists both pre and post grad, I agree with your take on notebooks/REPLs. Data-scientists are not generalist programmers, in some cases, they are hardly more advanced than some plain end-users of operating systems. They shy away from the terminal, they have fuzzy mental models of how the machine operates.
Being a generalist programmer that sometimes deploys the work that data-scientists craft, I'd really like an environment for this that can compile to a static binary.
Having to compile a whole machine with all the right versions of shared libraries is a terrible experience.
That's a good point about Nim. Nim has a nice set of compilation targets, which I tend to forget.
You might be right about the REPL aspect of things. On the other hand, R took off with a pretty minimal REPL, and my first memories of Python didn't involve a REPL. I think as the runtime increases a REPL becomes less relevant, and it seems like most languages with significant numerical use eventually get a REPL/notebook style environment even if it wasn't there initially.
R had a REPL from day one (or at least near it) because the S it was copying did. You could save your "workspace" or "session" and so on. Just because it was spartan compared to Jupyter or just because that might be spartan compared to MathWorks' GUI for Matlab doesn't alter "waiting/Attention Deficit Disorder (ADD)" aspects.
When you are being exploratory even waiting a half second to a few seconds for a build is enough time for many brains to forget aspects/drift from why they pressed ENTER. When you are being careful, it is an acceptable cost for longer term correctness/stability/performance/readability by others. It's the transition from "write once, never think about it again" to "write for posterity, including maybe just oneself"..between "one-liners" and "formatted code". There are many ways to express it, but it underwrites most of the important "contextual optimizations" for users of all these software ecosystems - not just "speed/memory" optimization, but what they have to type/enter/do. It's only technical debt if you keep using it and often you don't know if/when that might happen. Otherwise it's more like "free money".
These mental modes are different enough that linked articles elsewhere here talk about typeA vs typeB data science. The very same person can be in either mode and context switch, but as with anything some people are better at/prefer one vs. the other mode. The population at large is bimodal enough (pun intended) that "hiring" often has no role for someone who can both do high level/science-y stuff and their own low-level support code. I once mentioned this to Travis Oliphant at a lunch and his response was "Yeah..Two different skill sets". It's just a person in the valley between the two modes (or with coverage of both, or able to switch "more easily" or at all). This is only one of many such valleys, but it's the relevant one for this thread. People in general are drawn away by modes and exemplars and that represents a big portion of "oversimplification in the wild".
This separation is new-ish. At the dawn of computing in the 50s..70s when FORTRAN ruled, to do scientific programming you had to learn to context switch or just be in the low-level work mode. Then computers got a million times faster and it became easier to have specialized roles/exploit more talent and build up ecosystems around that specialization.
FWIW, there was no single cause for Python adoption. I watched it languish through all of the 90s, largely viewed as too risky/illegitimate. Then in the early noughties a bunch of things happened all at once - Google blessing it right as Google itself took off, numpy/f2py/Pyrex/Cython (uniting rather than dividing, unlike the py2/py3 split that came soon after), a critical mass of libs - not only scipy, but Mercurial etc., latter-day deep learning toolkits like tensorflow/pytorch and the surrounding neural net hype, and, compared to Matlab/etc., generally low cost and simplicity of integration (command, string, file, network, etc. handling as well as graphics output) - right up until dependency graphs "got hard" (which they are now), driving Docker as a near necessity. These all kind of fed off each other in spite of many deep problems/shortcuts with CPython design that will cause trouble forever. So, today Python is a mess and getting worse, which is why libs will stay monoliths as the easiest human way to fight the chaos energy.
Nim is not perfect, either. For a practicing scientist, there is probably not yet enough "this is already done for me with usage on StackOverflow as a one-liner", but the science ecosystem is growing [1], and you can call in/out of Python/R. I mean, research statisticians still tell you that you need R since there is not enough in even Python...All software sucks. Some does suck less, though. I think Nim sucks less, but you should form your own opinions. [2]
It’s because Matlab (and Mathematica, etc) is proprietary, and therefore you always have to pay the Danegeld. So we use numpy instead because it’s extensible, it uses all the super fast C/C++/FORTRAN stuff on the backend, and is fairly easy to learn.
I actually still would prefer Matlab as the syntax is more compact and natural than numpy (which is like a matlabified Python), but that’s probably just due to more experience in Matlab.
Octave is free software with Matlab syntax and Matlab-style interactivity (autoreload, etc.) I'm not a huge fan of the language (Matlab/Octave) but it certainly does make it quick to whip things up.
It sucks compared to Matlab, though. Unfortunately. (Scilab is better, although not compatible.) But I have also used it in a pinch. Size of the community means Matlab or Numpy are your best options. If you aren't happy with Matlab due to cost or licensing stuff, numpy is really good. Also integrates with a lot of Python stuff like machine vision, machine learning, etc, which have expensive or nonexistent packages in Matlab.
I used Octave for a year when my institution's Matlab license servers were being improperly administered. (I had a lot of project code written in Matlab, but the license server going down on the weekend before a conference deadline with nobody available to reboot it until Monday was a dealbreaker.) The biggest stumbling block was that Matlab has a huge and heavily used proprietary package library, and a lot of my existing code, (official) tutorials and Stack Overflow code assumed these libraries were available. In Octave I found myself reimplementing the newer parts of the Matlab image processing libraries. This led to the discovery that the Matlab and Octave builtins for handling image data are subtly different, so I ended up having to run tests in the code and write different conditional flows to make it cross-compatible. There are also subtle differences in basic behaviors (was it variable scoping? file handling?) which resulted in some surprise and frustration.
Following the licensing and Octave debacle, all my latest code is written in numpy.
Yep. When I was in grad school all the labs were furiously migrating away from matlab because of its costs and confusing licensing around running multiple replicas.
I'd definitely recommend checking out Julia for this usecase. You get code that looks pretty much like matlab, but which runs like fortran/C++. (Also there is very solid and fast interop with python, so you can call anything you need from the python side).
What does a Julia environment look like, in practice? Is it anything like the Matlab environment, where not only is there a console and integrated editor and super easy to use debugging/performance measurement, but also all the variables are visible in the GUI?
If so, I'd consider switching (as Matlab does that better than vanilla numpy). Julia is pretty great in theory. It is still a very new language for my uses, which means the documentation and community are orders of magnitude smaller than Matlab or Numpy/python.
Yes(-ish). You can use Julia in a Jupyter (the Ju- is Julia) notebook, just like Python. This is a pretty user-friendly experience for students, academics and data scientists.
Jupyter notebook is a lot like Mathematica. I’m wondering if visibility to variables is similar to Matlab? With Matlab, I have a list of all the variables in a nice little box, with summary of their contents (byte size, dimensions, type, contents if it’s small enough to be displayed, etc), and to see the full contents, I just double click on it.
Looks like Jupyter Lab and Atom/Juno is what I’m looking for. Still not as well-integrated as Matlab is.
I suppose that’s my attraction to Matlab. There’s not a bunch of different programs/environments to juggle to get a good, consistent experience for rapidly developing scientific code (for simulation, modeling, etc). All still possible.
I have the same experience, but it's more than just syntax. The Matlab IDE pulls together so much in a polished and robust product. Python notebooks and IDEs (Spyder, Jupyter, PyCharm, VSCode among a few others I've tried) are frustrating to use in comparison.
Yup, I agree 100%. I've been trying to use just vanilla python because of interdependency hell (and changing terms of service for anaconda), and I've been succeeding, but it's a LOT more work and less clear what's going on.
Python was pragmatic and adopted changes that numpy needed and advocated for. Maybe Julia is the only other worthy comparison?
Also, dynamic typing is a boon - and default & keyword arguments are a great feature for complicated, versatile, useful algorithm implementations and interfaces to them. Both of these features have a cost in bigger programs, but they really make Python stand out.
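A hedged sketch of the kind of interface meant here (the function and parameter names are invented): defaults let casual callers ignore the knobs, while keyword arguments let experts override only what matters.

```python
def fit_curve(x, y, model="linear", max_iter=1000, tol=1e-8, verbose=False):
    """Fit y = f(x); most callers only ever pass x and y."""
    if verbose:
        print(f"fitting {model} model, max_iter={max_iter}, tol={tol}")
    # ... the actual fitting code would go here ...
    return model

fit_curve([0, 1, 2], [0, 1, 4])                                  # simple call
fit_curve([0, 1, 2], [0, 1, 4], model="quadratic", tol=1e-10)    # expert call
```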
> C, C++, Fortran are still used, most Python users just don't see it because it's hidden away underneath the calling function.
Yes, the article talks about this: Python is a glue language and the actual heavy duty computation is being done inside an extension module like numpy that's written in a faster language.
A single-threaded, non-vectorized for loop in C++ runs faster than calling BLAS from Python with Numpy, though. The Python glue makes everything slow. For example, if you read out the camera in Python using OpenCV, you'll get a way lower framerate than if you do the same from C++, even if in both cases you use OpenCV, which is C++.
As a physicist, having spent eight years in academia, Python did not win by beating Fortran. Nor did it beat C++. It didn't really compete with Ruby or Lisp, although Lua (Torch) was a briefly serious competitor before everyone realized that a language developed by four people, one of whom doesn't get along with the others, couldn't be responsive to users' needs.
Python defeated Matlab. I know because I cheered it on. I was there. I watched my roommates and friends struggle with introductory scientific computing in Matlab and I joined the chorus that was practically begging for Python, even though I didn't really like it. I can't even begin to explain how awful it is to try to teach programming concepts in Matlab. But something like Python or Matlab had to be the choice because the schools wanted to teach programming through a language where you could just call "graph" and the computer would display a graph.
Python's team, unlike Lua's, aggressively courted educational institutions by offering scientific, numerical and graphical libraries within a programming language that works like a programming language, not a glorified computer algebra system. They even added a dedicated operator for matrix multiplication. It's a great example of finding a niche and filling it: I still don't like using Python, but I can't dispute that no other language/ecosystem comes close to offering what we need to teach programming to physics students.
You want to beat Python? Build a type system that can capture dimensional analysis. Warning: it won't be easy.
I'm in engineeering at a major engineering company historically using simulink and matlab. Python took over here in large part because matlab licensing caused so much friction, and we wanted to scale the simulink and matlab models up to run on a cluster of machines. We wanted to give scripts to people without matlab licenses quickly. etc. It was not the cost per-se, but the red tape.
We also ditched simulink because it is very difficult to version control and collaborate with a graphical interface.
Matlab is pushed heavily in the schools so all the engineers knew it and were comfortable with it. Matplotlib and numpy mimicing matlab very closely allowed the transition to be easy. We're not looking back. Only a handful of people still use matlab for their individual work because the python camp hit critical mass and the transition is not hard.
Matlab working to control serial ports, ethernet, visa/gpib instruments, all without the friction of getting extra licenses was icing on the cake. Matlab has a buy the cadillac model: the wheels, doors, hood, gas cap, mirrors are all optional add-ons. Each point causes friction, as only a few people had the whole tool, and therefore nobody could reliably share code.
> You want to beat Python? Build a type system that can capture dimensional analysis. Warning: it won't be easy.
Curious about your thoughts on pint and Unitful.jl — pint doesn’t really go all the way to a full type system, and Unitful.jl doesn’t work with everything (autograd is a problem still I think). But Unitful.jl is super cool.
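For anyone curious, a hedged sketch of pint's basic usage (this only shows the units bookkeeping, not the autograd interaction mentioned above):

```python
import pint

ureg = pint.UnitRegistry()
distance = 3.0 * ureg.meter
duration = 1.5 * ureg.second
speed = distance / duration                      # quantity carries units: meter / second
print(speed.to(ureg.kilometer / ureg.hour))

# distance + duration  # would raise pint.DimensionalityError: incompatible dimensions
```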
This is the answer. Scientific Python was originally an alternative to MATLAB. When I was in grad school, I did most of my research in MATLAB. Then we had a visiting student who was doing very similar computations in SciPy, and he assured me performance was not a problem. I migrated my MATLAB scripts to Python and never looked back.
It was only after being a viable alternative to MATLAB did people decide it can be used for much more than what you typically get with MATLAB.
I think a factor in Python vs Matlab is that Python grew into areas where Matlab was not entrenched. Also, students with an aptitude for programming and an eye for the market want to learn languages that are used by software developers. Very few engineers actually want to program in Matlab. If they can program, then they want to market themselves as programmers.
A benefit of Matlab remains that it all comes from one place, with one installer, meaning that you can get a classroom full of students up and running almost instantly. And it offers some relief for students who will never grasp programming, through its collection of pre-written apps.
> where you could just call "graph" and the computer would display a graph.
Hang on! In what world can you just call "graph" in Python and it would display a graph?
In matplotlib on MacOS at least you try that and you get some bizarre shit about how Python isn't a framework, and you google it and find you have to do some obscure import and the import has to be in a particular order relative to other imports (totally unpythonic). https://stackoverflow.com/a/34583958/583763
Jupyter notebook... don't get me started! You do one thing and it starts a "server", and then you use that to start a "kernel" (and if my CS is dodgy and I don't really know wth these things are then I'm not having a great time already). Then this kernel thing is running Python. But oh, what version? And is it using my virtualenv? And then you google some matplotlib imports. And finally, yes you call "graph" and an ugly matplotlib png is displayed rather small in your web browser.
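For reference, this is roughly the minimal incantation being complained about; the explicit backend selection (hedged, and only sometimes needed) is the macOS-specific part:

```python
import matplotlib
matplotlib.use("TkAgg")              # explicit backend; a common workaround on macOS setups
import matplotlib.pyplot as plt      # imported after choosing the backend (matters on older versions)

plt.plot([0, 1, 2, 3], [0, 1, 4, 9])
plt.show()                           # pops up a window; in Jupyter you rely on inline output instead
```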
As a physicist who spent a decade in academia, including a PhD where all the new work was done in Python, it absolutely won in some fields by beating - or rather, by conveniently wrapping - Fortran.
(In particular, that’s how things have gone in the materials physics/solid-state/quantum chemistry field. It absolutely beat out Matlab in other fields. One of the underrated benefits was being a lingua franca across more of physics!)
Always nice to hear an authentic telling of history from someone who was there and had the necessary insight to interpret events and motivations. So much of what we read is "the victors' written revision".
It’s amazing how often the authors point of “agility” arises in real world circumstances. I’m not a programmer, but I use Python a lot in my engineering job. There have been 3 times in the past month where I got an order of magnitude speed up because SciPy implements a very complex but highly efficient algorithm which I would never have had time to deploy.
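A hedged example of the sort of thing meant (not the poster's actual problem): scipy ships a fast k-d tree, so nearest-neighbour lookups that would be slow and tricky to hand-roll become a couple of lines.

```python
import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(100_000, 3)              # reference point cloud
queries = np.random.rand(10, 3)

tree = cKDTree(points)                           # build once
dist, idx = tree.query(queries, k=1)             # nearest reference point for each query
```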
> There have been 3 times in the past month where I got an order of magnitude speed up because SciPy implements a very complex but highly efficient algorithm which I would never have had time to deploy.
Yes. I feel like the author conflates the language with the package ecosystem. Pure Python is pretty horrible for scientific computing (3*[3]=[3,3,3] is about as counterproductive to scientific computations as it gets), but Numpy changes the semantics of those operations.
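The semantic difference being described, in two lines:

```python
import numpy as np

print(3 * [3])             # plain Python list: repetition -> [3, 3, 3]
print(3 * np.array([3]))   # NumPy array: elementwise arithmetic -> [9]
```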
In other words, Python has an absolutely stellar package ecosystem. There have been attempts to bring a package ecosystem to C, but it never took off. However, I do wonder how C would fare if it had.
This implies Python's advantage is not having a package manager, but better teachers, or at least teaching better practices, so it isn't even language related.
If you know actual scientists, this isn't counter intuitive at all. My partner is a scientist, so now I know tons of them, and I have done a bunch of Python coding and support for scientists, have been a Python programmer (as well as other languages) since 2005-ish. I saw this coming (as did many) 15 years ago.
Most scientists, and their grad students, are trying to do a whole bunch of things in their research, and programming is just one of them. Field work, experiments, data wrangling, writing papers, defending papers, teaching, etc. And most of them do not have access to budgets for programmers or when they do, it's for a limited amount of time and work, meaning they need to be able to pick up and run with whatever the programmer did. So the fact that with Python they and their grad students (who might be there for only 2 years) can be working productively, and figure out what the hell the code did when they come back to it months later, is HUGE. As in, literally blows every other consideration to smithereens. This has meant that over the last 20 years the scientific libraries in Python got mature faster than in any other language, and this in turn has had a snowball effect. And when speed is necessary, C++ extensions can be written. But honestly, most of the time speed is not the main factor.
The downside of Python in my experience is that junior teams can make heinous atrocities when a project gets really big (I have had to step in as CTO to one of those messes, so much as I love Python, I must admit this is true!) But the stuff the scientists are doing is very rarely that big. It's tools programming, scripting, making utilities, data analysis and so on.
Readability counts. In some fields, it counts more than anything. I've worked in about 10 languages now over the last 20 years, and Python is still the easiest to read when you come back to some old code or have to pick up code for a small job, or hand it to a beginner to extend without having them create an unreadable mess. This is what scientists need to do all the time.
Re other people's comments on Python packaging and setup being hard, well honestly I've had just as much pain with Ruby or Node. The shining exception there is R, which is giving Python a run for its money in many scientific areas. R Studio has the best "hit the ground running" experience out there and is really slick for data programming.
In addition to not having budgets for programmers, we also don't know how to manage them, for instance how to communicate our needs, decide if their implementation plans make sense, or gauge their progress. Nearly half a century after The Mythical Man Month, managing software development is still generally acknowledged to be an unsolved problem.
The other two obstacles are that most programmers hate the scientific work environment, with its ever-changing requirements and frequent dead ends. And, the programmers who can work on math related stuff are in the highest demand.
Spot on with my experience! Much of our work was helping them manage the project and figure out how to work with us. And someone went on sabbatical, and then someone dropped their program, and someone else left for another school, and someone was stuck managing the program for a semester who had literally no time or experience doing that, etc. It's a Dynamic Environment. lol.
There is no other language I have used that makes it as easy to read code from somebody else, especially where that contributor is likely to be a domain expert with very limited programming experience. It's not actually my favourite language anymore (hello Scheme!) but if you want me to do work in that environment, I'll reach for Python first.
Network with scientists. Doing some small jobs or favours for scientists who will tell other scientists about you is the way to go. Universities are a good source of connections.
If labs are really struggling to find math-literate programmers, I would imagine it's in part because the process for matching them with the work is so terrible. Generally speaking, skilled programmers do not want to (and certainly don't have to) shake hands and do favors to find work.
I wonder if there's any concerted effort to fix that for academia, or if the "shortage" of math-literate programmers just isn't a problem worth fixing.
That was kind of my point. What skilled programmers do to find work is one thing, what scientists, who just need some help for a short project, do to find people is another. Assuming you are a programmer who wants to do work for scientists for some reason, you need to go where they are - they won't find you in your regular tech recruiting circles, which tend to be all about full time jobs. I happen to like doing some work for scientists so that my career isn't entirely about making private equity companies richer, but I don't expect them to pay my enterprise rates or find me on Linked In.
To make matters worse, university staff software engineering jobs usually pay 1/3 to 1/2 of comparable jobs in industry (even after excluding FAANG-level outlier salaries), and in most cases offer no meaningful career progression.
I think universities will never be able to compete for engineering talent until they can create attractive career paths for people who aren't professors.
Universities will never pay competitive salaries, because academic research is not supposed to create direct monetary value for the employer. An engineer does not create enough value in academia to justify anything approaching a competitive industry salary.
It's also ethically difficult to advocate for higher salaries in academia if you are already living a comfortable middle-class life. The money would ultimately come from taxes and tuition fees. If you think those should be increased, the money would be better spent on helping your colleagues who are earning poverty-level wages.
Engineering is a support role in academia, because pure engineers don't teach or set research directions. Most labs and most departments are too small to employ more than a handful of engineers, if any. Only large research institutes have enough engineers working on similar topics to justify creating senior engineering roles.
Anybody who wants to do this has to be willing to step off the gravy train. That sounds snarky, but it just reflects the Hard Problem of a skill that's of value in two sectors with vastly different economics.
There are people who have stepped off the gravy train because they don't like it, or they don't fit into the enterprise workplace for whatever reason. I might be one of those people. I work in industry, but in an early-stage R&D team.
Maybe the status quo is a reasonable solution: Find grad students who are willing to do the work in return for a chance to sharpen their programming skills. This process could be improved by providing scientists with training on how to write better code. The result will be a certain amount of attrition of scientists into software development jobs, but we have to get used to the idea that attrition into a more employable field is actually a good thing, and there will be plenty of scientists.
>I've personally done this by finding normal work that is part time, so I can round it out doing work for scientists
That's what I did for the first 10 years after graduating from university. Eventually I transitioned to a full time 'normal' job but that made me unhappy.
Yep, can confirm. Ended up doing some physics simulation work for my PhD (as a computer engineering major), my advisor constantly emphasizes that I focus on picking up the math so he doesn't have to put as much effort into explaining exactly what he wants.
It's pretty fun to do for me, but it's certainly challenging to balance with programming despite my advantage of having taken a lot of extra math classes.
> Python is still the easiest to read when you come back to some old code
Lucky you. You must not have seen the "pythonic" monstrosities I've seen.
Python has such a low barrier for entry that one can "get stuff done" with absolutely atrocious and often very overly complicated OOP-ish code.
Ruby is not my favorite language, but I would bet real money that without dependence on libraries, nobody could show me Python code for which I could not show more logical, consistent, and readable Ruby code that solves the same problem. I say Ruby because it's of the same "type" and follows similar methodologies.
Python suffers from far too many years under the leadership of one odd person. It has a cult-like following, whereby anyone who disagrees is an outcast. Where else could you hear comments like, "why would you ever need a switch statement? if/if else works fine!" That's just the tip of the iceberg.
Python is great for integration glue code, but only because of the libraries it has. But now it is becoming more Javascript like, and the dependencies are multiplying to the point where you're better off writing your own left-pad instead (or even re-evaluating your approach) instead of taking on new duct tape like django-database-view.
Sometimes the bar needs to be high enough to force the juniors to actually learn something before they start building "MVP" startups. On the other hand, who cares if the MVP is a horror show as long as you get that IPO and take your f-u money and leave.
So my real job is technical due diligence on companies being purchased. I get the keys to the kingdom when we do a diligence and trust me, there are just as many people making unmaintainable monstrosities that get bogged down in tech debt in Ruby. Looking at this scenario is literally my job, and the company I work for does more of these than anyone in the world.
Bad coders can make terrible stuff in any language, and with two as similar as Python and Ruby, the minor differences are a drop in the bucket in the grand scheme of things. Both Django-database code and RoR's Active Record have bogged down many a startup when they got big enough that DB size and query performance mattered.
None of which, as I pointed out, is relevant to the vast majority of scientists writing code.
In your experience, what are most buyers looking for when they get you to do technical DD? Is there a specific set of things they are worried about? Specifically looking to confirm, etc?
Typically they are looking to us to surface areas where they will have to make "disproportionate investment" (as they say) to allow the company to support a ramp up in growth. "What will be a problem when you have 2x as many customers? 10x as many DB entries? 10x as many customers?" etc. Because private equity funds buy companies that are growing and (usually) already profitable, this often equates to tech debt that happens once the DB is big. A very common scenario is that reporting, for example, has become a problem and it's time for the target (as we call them) to be using heavier weight architecture patterns like command query segregation or dedicated reporting databases or materialized views and such. So in the case of Ruby and Python shops, we will definitely be asking if their domain layer is working ok and trying to find out if they've written themselves into a corner by having the code assuming it will always be an RoR app or whatever. I have interviewed more than a few that were in serious trouble from not isolating their Active Record dependency and thus got themselves in a situation where efficiently fixing the database was going to be a lot of rewriting. We see this in other languages too, but Active Record is absolutely a smoking gun there.
The takeaway: always, always, have a domain layer that allows you to refactor your model access without changing tons of code. Data load grows in ways people don't predict. If your company succeeds, in five years you'll wish you had it!
Tech debt kills loads and loads of companies and most folks never hear about it because that's not the stuff that gets publicized or written about. We call it the silent killer...
That's really reassuring to hear. Whenever I say we should spend time on tech debt it's always greeted with "we can worry about that later when we're really successful" as if there will suddenly be an opportunity to completely rewrite everything (because that's what it will take).
Yeah, the idea that tech debt doesn't matter and can just be fixed later is the biggest bullshit myth out there in tech land. The thing is, no one notices or writes about when a startup does an "underwater sale", which is typically an investment firm's portfolio company (company they already own) buying up a competitor for less than the company was worth on the last round, done usually in order to buy the customers, staff, or IP. It happens tons. It's a "cut bait" scenario (i.e. let's lose less now instead of everything later) for the selling company/owners and is usually a result of technical debt.
> Ruby is not my favorite language, but I would bet real money that without dependence on libraries, nobody could show me Python code for which I could not show more logical, consistent, and readable Ruby code that solves the same problem. I say Ruby because it's of the same "type" and follows similar methodologies.
I keep hearing all the stories about Ruby supposedly being more logical and sound than Python. I really would love to see actual source code being cited to back those claims.
- The OOP features of Ruby are consistent and ubiquitous (everything in Ruby is an object); Python depends on manual patterns to do OOP (self as the required first argument). Python also depends on special decorators to indicate what functions are instance vs static. Ruby does have a difference in definition, but it's simpler and more obvious (and requires one line fewer of code to define)
- Only recently has Python finally gotten a switch statement, and surprisingly it has adopted some Elixir-ish pattern matching features (see the sketch after this list). Incidentally, some in the Python community are strongly against this new thing. "Why would you need that!?" Prior to 3.10, you would need more complicated if/elif structures in Python to do the same thing you could do in a concise and clear Ruby case (switch).
- Operations on collections: this is often described as functional programming, but it really is just "doing stuff on collections of data". And in that story, Python's list comprehensions are arguably less readable and less logical than Ruby's. Many of the tools you need in Python must be explicitly imported from the functools module.
- Ternary operator: many languages have `expression ? do_true_path : do_false_path`. It's a very common pattern which is concise and honestly quite clear. "This thing is true? then do this; else do that". But in Python you break that up into "do_true_path if expression else do_false_path".
- Everything in Ruby has a return value, but not so in Python. So in Ruby you can make assignments (or return values) from the result of if/case. For example, assume you want to return a specific value based on some series of conditions, such as handling an error and returning some enriched data based on the error code. In Python you will have to define a local variable and explicitly set that variable equal to some value in each branch of the conditional. Then afterward, you can use the value of the local variable. Or you would have multiple returns, one in each conditional branch. In Ruby you can simply do x = case ..., or because the last statement of a function is the return value, you wouldn't even have to return it. You just have 'case ...', and the value of the branch is what is returned.
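A hedged sketch of the Python 3.10+ feature referenced in the switch point above, plus Python's conditional-expression form from the ternary point (the status codes are invented):

```python
def describe(status):
    match status:                      # Python 3.10+ structural pattern matching
        case 200 | 201:
            return "ok"
        case 404:
            return "not found"
        case code if code >= 500:      # capture pattern with a guard
            return "server error"
        case _:
            return "unknown"

label = "success" if describe(201) == "ok" else "failure"   # do_true_path if expression else do_false_path
```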
There's a lot more. Some of it is subjective, but my belief is that once someone really knows both, they will prefer the Ruby way. And the more languages you know, the more you develop refined tastes. Ruby still holds up well after knowing 10 languages for me.
(added): the whole whitespace as code thing of Python. It does have one common pitfall, and that is when a line of code gets accidentally indented or unindented below a block which was indented. That line changes scope, likely changing the runtime result; but it may be technically valid, so the developer may not notice the mistake. This is just not a problem with languages that have { } or begin/end delimiters.
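The pitfall described above, in miniature (a contrived, hedged example): an accidental dedent keeps the code valid but silently changes what it does.

```python
samples = ["a", "b", "c"]
for sample in samples:
    print("processing", sample)
print("saving result for", sample)   # meant to be inside the loop; now runs once, for "c" only
```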
> Readability counts. In some fields, it counts more than anything. I've worked in about 10 languages now over the last 20 years, and Python is still the easiest to read when you come back to some old code or have to pick up code for a small job, or hand it to a beginner to extend without having them create an unreadable mess. This is what scientists need to do all the time.
Meh. Python might be readable at the smallest scale, but then COBOL is even more readable. What matters is large-scale development, and your implied point that large Python projects turn into unstructured big-ball-of-mud monstrosities is well taken. A big ball of mud is not surveyable, or "readable".
Which is where other modern languages (e.g. Julia in the scientific programming domain, heck even Go or Rust) will probably have an advantage.
Think what you will; the scientists disagree. Which was the point. I'm not holding my breath to find many scientists matching my description who would rather learn Go or Rust...
I'll be the counter example. PhD in life sciences, but I've also been programming since I was a teen. Rust is by far my most used language for both general fun projects and in my role as a programmer in the life-sciences. Python is OK for ad-hoc analyses, but I cannot stand to use dynamically typed languages for anything "real" given how much difficulty dynamic typing imposes on reading and understanding code.
Sure, but by your description, you aren't really the people I'm describing. If you've been doing this since you were a teen, you're a "Real Programmer". My point was that people who have to do this as item 7 of 10 things in their job description are very much less likely to learn something like Rust than Python. That is undeniably a bigger lift to a non-programmer. Python's success in the sciences is in large part due to how good a fit as a language it is for part-time occasional programmers.
I like all kinds of languages, but the only ones I would encourage my partner to bother with as tools for her science work would be R and Python.
They might even care, they just don't know any better. When the typical scientist "learns to code", there's no one around telling them how to do proper software engineering.
At best, they engage in the polite fiction that it just doesn't matter, because all that code is inherently "throwaway" stuff that's only used for playing with in the context of research. Of course even that is wrong, the code doesn't really disappear like that.
No. Scientists are smart people. It's not that they don't care or don't know better, it's that they have different priorities. Every scientist I know is smart enough to be well aware that they don't have the know how for proper software engineering, but they also do not have the time or resources to learn to write code the way you would for a long term product.
I would not be patronizing to these people, they are very smart. They just live and work in a world that is completely different from technical product firms in pretty much every regard.
You're just rephrasing the "throwaway code" polite fiction. Increasingly, publishable-quality research is expected to be publically reproducible, and that means the code must stay around, potentially in the "long term". Every scientist loves it when their research gets cited a lot, right? Well, those citations become worthless if you can't reproduce the research because the code is an unsurveyable mess relying on bitrotted, unsupported external components.
Some of the best code I’ve worked on has been python. Unfortunately, also some of the worst. 5000 line, single file “modules”, with spaghetti class hierarchies (5+ levels deep) and dynamic method calls making it nearly impossible to debug.
In a relatively terse language like Python, anything beyond a few screenfuls of code is already "large scale" development. It's unwise to keep it all in a single module.
Not to mention, what they are working on is often very abstract compared to the math many programmers are used to doing. I write a lot of boolean; my scientist partner writes regressions, surface transformations, eigenmaps, linear algebra, and so on. Imagine being something other than a programmer by trade, and trying to apply linear algebra to your problem without good tools or libraries.
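To make that concrete with a hedged sketch (invented data): an ordinary least-squares regression is a few lines with NumPy, versus a real project without such libraries.

```python
import numpy as np

x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + np.random.normal(scale=0.5, size=x.size)   # noisy synthetic data

A = np.column_stack([x, np.ones_like(x)])                      # design matrix [x, 1]
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]        # ordinary least squares
print(slope, intercept)                                        # roughly 3 and 2
```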
I'm afraid I don't agree. MicroPython is neat, but if there's one thing Python is not suited for, it's microcontrollers. Coupling one of the slowest scripting languages with low latency near-realtime requirements is not a recipe for success. It might be useful for teaching basic concepts, but it is not going to be useful for real applications. And Arduino already has the teaching of basic concepts nailed very effectively.
I certainly think that MicroPython serves a niche, primarily very simple hobbyist/educational roles. However, I do not regard it as suitable for anything beyond this. It's the wrong tool for the job, and if you want a scripting language for low-latency low-overhead use, there are smaller and more efficient languages which fit better into an embedded role.
You’re an engineer. And engineering programs will continue to use engineering focused tools. The goals of engineering program projects are vastly different than other areas of academia. Developing the tool is the project for engineering.
Bioscience / environmental science programs will find MicroPython good enough for their needs. The tool itself is just the means to the end of real science. MicroPython lets you deploy in lower-power applications without having to learn much beyond what you already know from Jupyter notebooks.
I really don’t know any PhD students or post-docs in microbiology/environmental sciences who have the time to learn embedded C or similar languages.
As a rubyist, it makes me sad that python ended up here rather than ruby. And I sometimes wonder why.
> As the name suggests, numeric data is manipulated through this package, not in plain Python, and behind the scenes all the heavy lifting is done by C/C++ or Fortran compiled routines.
So I wonder, was it easier to write C/C++ or fortran compiled extensions in python than it was in ruby?
Readability, 100%. I have programmed in large projects in both Python and Ruby.
Ruby is very productive to write, because everything and the kitchen sink is at your fingertips at all times.
But because of Ruby's many ways to skin a cat, everyone's code is very different. Add to that the penchant for domain-specific sub-languages in Ruby: new syntaxes that you might have to learn half a dozen of to integrate a large project, all of which end up being more limiting than if you could just, you know, write Ruby.
Contrast with Python, which goes so far at normalizing as to have a language-wide coding standard in PEP8. Python has its problems, package management and distribution is still ugly for example. But I can read any project I find and understand it without loads of context.
My impression is that Perl, Ruby, and Lisp all suffer from this issue.
Even proponents will say things like "this language is so expressive, I feel so productive in it, but people do such idiosyncratic and clever things that it's hard for anyone else but the author to understand".
That sort of "solo rockstar" programming culture doesn't really lend itself to large-scale FOSS projects, which need to be inviting to wide participation.
A language like Ruby can be very productive for someone who has climbed the learning curve to learn all its ins and outs. However, in my experience, this turns into a large productivity drain the moment someone else (who is less of an expert) has to touch it.
For large projects with multiple developers, readability should win over writability every time; most code is read more than it is written (my hypothesis). You can see evidence of this, given the success of languages like Python and Go.
That said, for scientific compute, in a lot of cases, writability matters way more, as your job as a scientist is to produce results as fast as possible, code quality be damned. However, only a small number of scientists are expert developers who have climbed the learning curve and can write code with ease. The vast majority of them are junior at best, and Python's approachability (which is rooted in its readability, of course) wins. With most of the people using Python, the ecosystem develops and there are no other viable alternatives. In the long run, I suspect even languages like MATLAB and Mathematica will die out as the open source stack becomes more mature and (eventually, if not already) significantly more capable. Julia might be a wildcard due to its (potential) performance advantages, but the aesthetics of the programming language is simply not in the minds of 99% of the scientific compute users out there.
I am totally interested in hearing opinions from people who have done serious hours of programming in both ruby and python, as to readabilty comparison.
If it's just people who have done a lot of work in ruby and little in python saying ruby is more readable, and vice versa, I don't find it very useful even anecdotally.
As someone who has spent a lot of time in both Ruby and Python (and specifically a lot of hours in the Python scientific compute stack), I would say that Python is significantly more readable. Python is also significantly easier to teach as opposed to Ruby, especially if the target audience already has a bit of programming experience (from MATLAB, or other courses).
I suspect the main reasons are:
1. Python's guiding philosophy of "There should be one-- and preferably only one --obvious way to do it". With later additions to the language, this is getting less true (3 different ways to format strings, shown in the sketch after this list; asyncio; type hinting; etc.). Some libraries also don't conform to this (matplotlib). That said, it's a lot better than the Ruby code I've encountered, which is like the wild west.
2. Python's syntax is reasonably simple to teach. The object model could be condensed into something very simple if you don't need a lot. With very basic knowledge, you can go a long way. Ruby's a bit more chaotic with things like inheritance, extend, and include; proc, block, lambda; having to use attr_accessor; syntax things like a = b could be a function call or not; if/unless; and many more things that are confusing.
3. Even basic things like loops in Ruby are not idiomatic, as it wants you to apply a function/block instead. Beginners, especially those with a bit of background, like their loops better than functional programming.
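The "3 different ways to format strings" from point 1, side by side:

```python
name, value = "tau", 6.2832
print("%s = %.3f" % (name, value))         # printf-style formatting
print("{} = {:.3f}".format(name, value))   # str.format
print(f"{name} = {value:.3f}")             # f-string (Python 3.6+)
```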
Having spent many years working on a Ruby code base, I still get lost all the time. Python in my experience has been a lot better, although recent Python versions have regressed a bit as they introduced more syntax to do the same things.
>So I wonder, was it easier to write C/C++ or fortran compiled extensions in python than it was in ruby?
Don't know about technical aspects of "easier" but it may have simply been an accident of history.
E.g. in 1995 (before Ruby 1.0 December 1996[0]), David Beazley was already wrapping C Language code to Python. Deep link to presentation:
https://youtu.be/riuyDEHxeEo?t=52m27s
So DB's Python scientific code was released in Feb 1996 and presented in July 1996. Python, released in 1991, was already being talked about in magazines by 1995. David's presentation also references Jim Hugunin[1], who authored the 1995 Numeric package which was the ancestor to NumPy. Once an ecosystem gets started, it can attract more mindshare and snowball into an insurmountable lead that neither Ruby nor Julia will ever catch up to.
In other words... If the opposite timeline happened and Ruby was released earlier in 1991 and Python later in 1996, things may have played out differently.
So folks like David Beazley and Jim Hugunin chose Python as the scripting host language for their C Language code probably because Ruby wasn't mature and well-known back in 1995. Apparently, Ruby didn't widely spread outside of Japan until 1998 when the first documentation in English appeared.[2]
In 2009 I began writing code for a new company doing natural language processing. I was the engineer at the time and got to pick my tools. I started with Ruby because I was sick of C++ and Perl and Ruby looked like the future. But I soon discovered the NLTK and then Numpy, and so I started playing around in Python. I never again wrote a line of Ruby… until the later-hired front-end devs threw a fit about not being able to use Rails.
It was clear at the time that there basically was no non-web Ruby community. Ruby was Rails and Rails was Ruby. Ruby had a nice little niche in 2009, but Python had Numpy and there were a lot of ML people doing lots of math, and Ruby wasn't going to cut it unless they wrote their own libraries, which wasn't worth the effort since Python and Numpy already existed and already had a growing community behind them.
I honestly think it all boils down to numpy being developed long before matrix libraries became a standard part of software development.
Ruby's early "killer app" (remember that term?) was Rails. Even to this day there is almost no major code out there built in Ruby that isn't ultimately related to building CRUD web apps. While Ruby may be losing popularity now, it moved the web-development ecosystem ahead in the same way that Python has moved the scientific computing world ahead.
20 years ago, if you wanted to use open source tools to write performant vector code, there was Python and a handful of OSS clones of commercial products. Given that Python was also useful for other programming tasks in a way that, say, Matlab/Octave is not, it was the choice for more sophisticated programmers who wanted an OSS solution and needed to do scientific computing. This created a positive feedback loop that persists to this day.
Given that Python remains a decent language relative to its contemporary peers and it has a massive and still growing library of numerical computing software, it is extremely unlikely to be dethroned, even by promising new languages like Julia.
Even to this day there is nothing even close to numpy in Ruby. I do DS work in an org that is almost entirely Ruby, but we still use Python without question because we know re-implementing all of our numeric code in Ruby would be a fool's errand.
Had Ruby had early support for matrix math, it wouldn't have surprised me if it had replaced Python.
But that begs the question -- why did numpy develop in python and not ruby?
The rest of the thread offers some suggestions though. One is simply that python was born first, and got the numpy precursor before ruby 1.0 even happened. Which seems like a thing.
Ruby had a numpy style library since the early 00's, I forget exactly when. But it never got the kind of momentum numpy and the Python ecosystem surrounding it did.
Lots of comments in this thread from people who's Ruby experience is only from the post Rails era after ~2008, and don't understand that the post Rails culture wasn't really a thing when Python was first gaining momentum for scientific computing.
Perl was my horse in the race. I attribute it's, lisp's, ruby's, etc loss to
1. "There should be one-- and preferably only one --obvious way to do it" being part of python's ethos.
2. ipython repl
1. pairs with jaimebuelta's artistic vs engineering dichotomy, but also plays into the scientist wearing many more hats than just programmer. Code can be two or more degrees removed from the published paper -- code isn't the passion. There isn't reason, time, or motivation to think deeply about syntax.
2. For a lot of academic work, the programming language is primarily an interface to an advanced plotting calculator. Or at least that's how I think about the popularity of SPSS and Stata. Ipython and then jupyter made this easy for python.
For what it's worth, the lab I work for is mostly using shell, R, matlab, and tiny bit of python. For numerical analysis, I like R the best. It has a leg up on the interactive interface and feels more flexible than the other two. R also has better stats libraries. But when we need to interact with external services or file formats, python is the place to look (why PyPI beat out CPAN is similar question).
Total aside: Perl's built in regexp syntax is amazing and a thing I reach for often, but regular expressions as a DSL are supported almost everywhere (like using languages other than shell to launch programs and pipes -- totally fine but misses all the ergonomics of using the right tool for the job). It'd love to explore APL as an analogous numerical DSL across scripting languages. APL.jl [0] and, less practically april[1], are exciting.
From what I remember, people were actively promoting Python as the first programming language already in the 90s. Many universities started teaching Python, creating a steady supply of non-CS majors who were familiar with Python but no other language. And because the community was there, people started building the ecosystem.
In contrast, I've never really encountered anyone advocating for Ruby outside web development.
van Rossum was one of the implementers of ABC, which indeed was created to experiment with how to develop a programming language for beginning programmers. (Note: he was not one of the designers of ABC.)
While van Rossum drew from that experience when making Python, the initial driving goal was as a scripting language for system admin tasks in the Amoeba distributed operating system.
David Beazley talks about this in a YouTube video somewhere. (Can't find it right now, maybe someone will in the comments.)
It was a lot of serendipity. Python was up and running when the US national labs wanted to collaborate and their tools all sucked. Since they wanted visualization this left only Tcl/Tk or Python/Tk. And Beazley was hanging around as a grad student in a national lab with a connection machine, no real boss, no real oversight, and very little budget. He built stuff out of Python, and it snowballed to other labs.
Timing could be a factor. Python was released in 1991. Numeric, the ancestor of NumPy, followed in 1995, the same year Ruby was released. So Python already had its hooks into scientific computing before Ruby even started.
Fortran interop (f2py in particular) was a significant factor, and as soon as you get one thing (in this case LAPACK and BLAS bindings) it snowballs. Also, Python is significantly more initially familiar for informal programmers and that’s critical; the hard part of learning a language is often believing that you can -and Ruby looks weirder than Python, so it makes people doubt themselves.
I don't know how easy it is in Ruby so I cannot give you a comparison.
However it is very very easy to write Python bindings for a C/C++ library with minimal work. Solutions range from "just works" like ctypes to "actually integrates with the language" like Cython. You also have automated tools for wrapping like pybind11 which does a lot of the heavy lifting for you.
It was multiple things, really. I would attribute ute some of it to Swig, Perl attrition, SCons/Software Carpentry, integration with GUI libraries, good documentation, and various other efforts in the mid 2000s. A lot of those things were solving research problems simply, and Python’s use just kept expanding.
Python was already taking over in many use cases by late 2000s.
Ruby was known, but it didn’t have the following at multiple levels in academia like Python did
You describe what happened, which I saw happen too. The question I have is why though. Right, why did python's use in scientific computing keep expanding, and not ruby's? Why was python already taing over many use cases by the 2000s, but not ruby? Why did python develop the following at multiple levels in academia, and not ruby? (Why is Perl attrition relevant, when ruby was in fact explicitly based on Perl?)
That's the question, not the answer!
It seems like a lot of the answer is NumPy, which makes the question -- why did NumPy happen on python, not ruby?
Certainly one answer could be "nothing having to do with the features of the language, it's just a coincidence, they chose to write it in Python, if those working on numpy had chosen to use ruby instead, history would be different."
But one hypothesis is that maybe NumPy wouldn't have been as easy in ruby as python.
Someone else suggested the first numpy release happened before the first ruby release, so that could also be an answer.
I think the difference is in the community. I've used both Python (extensively) and Ruby (a little bit). While the capacities of the languages are relatively similar, the people around the languages, at least the ones creating packages and driving the discussion in conferences are actually quite different, for some reason.
People attracted to Ruby are mostly of an "artistic mindset", they want to be expressive, write code that doesn't look like programming code and using "magic" like dynamically created methods, monkey-patching, etc is accepted or even encouraged.
On the other hand, Python attracts more people with "engineering mindset", they like straight forward code that's readable, clear and understandable, even if it's not as expressive. "Magic" elements are frowned upon: for example, imports are explicit and always included in each file.
Obviously, I'm exaggerating it, but I think is a clear differentiation between the communities.
My guess is that the "Python mindset" got into creating better integrations for "engineering applications", like NumPy or SciPy, and that created some positive feedback in certain environments. The main strength of Python is its rich ecosystem of third party packages. There's a compounding effect, making it grow faster and faster.
I think that’s exactly it, and that there’s much less understanding required to start reading and writing Python code. Ruby has some beautiful features, but they make it much less clear to newbies who are trying to figure out what on earth’s going on.
Ruby makes it easy to do "magic". Which is fun to write, but painful to read for others.
I've encountered real cases of ruby code where a simple code snippet behaves differently with and without a `require` (IIRC, some utility function added to a class with monkey patching). In another case I've also had to modify (and to some extent maintain) a codebase that relied on overriding `method_missing` in the happy case / normal flow. I was trying to find out where some method was being defined by grepping the whole codebase. It probably cost me half a day of unabridged profanity.
In theory you can do the same thing with python -- thing is it usually doesn't happen for some reason (likely the ones you mentioned). Something about the language features and the culture in the community lead to devs doing different things with the different languages. But the effect is real, and I know which language to avoid if I had the choice.
I was using Perl and Python in the 1990s for scientific work.
Around 1993 I got hooked on Perl. I read the Perl book and it was great. But 1) I couldn't figure out how to handle complex data structures (this was Perl 4), and 2) I couldn't embed it into other projects.
More specifically, worked on a molecular visualization program called VMD. It had its own scripting language. I wanted a language to embed in VMD that was usable by my grad student users. This is when I first learned about Python, but I chose Tcl because it fit the existing command language almost perfectly.
At around the same time, UCSF started embedding Python for their molecular visualization package, Chimera, so it was already making in-roads in structural biology.
I later (1997) went into more bioinformatics-oriented work, where I did a lot of Perl. I tried out one implementation (a Prosite pattern matcher) in Perl - which took me reading an advanced Perl book to learn how Perl 5 objects worked. I then tried the same in Python, a language I wasn't as familiar with. And it was just so much easier!
At this time Perl was THE language for bioinformatics, but I thought it was a difficult language for complex data structures. (Bioinformatics at that time was mostly string related, plus CGI and databases - Perl was a great fit.)
I then moved over (1998) to cheminformatics, working more directly on molecular graphs. Python was a much better fit for those data structures than Perl. I started using Python full-time, and it's been that way since.
We used a third-party commercial package for the underlying cheminformatics called the Daylight toolkit. It had C and Fortran bindings. Someone else had already written the SWIG configuration to generate Perl, Python, and Tcl bindings, but these still meant manual garbage collection.
I was able to use __getattr__, __setattr__, and __del__ to turn these into a natural-feeling high-level API, hooked into (C)Python's reference-counted garbage collector.
I presented a couple of talks about this work, got an article in Dr. Dobb's (!) and got consulting work helping companies which either had existing Python work, or were moving to Python.
By contrast, I don't think I heard about Ruby until 2000 or so, years after Python started entering structural biology/cheminformatics. [1]
I wasn't particularly cutting edge - others had already developed tool like SWIG, which was because Beazley and others were using Python at LANL. Numeric Python started in part because of work at LLNL and other research organizations. The concept already firmly established was that Python would be used to "steer" a high-performance kernel.
And Python in turn changed, to better reflect the needs of numeric computing, in particular, the "..." notation in array slices was added to make matrix operations easier. (This was 20 years before '@@' was added to simplify matrix multiplication.) I believe the needs of numeric computing also influenced the changed to "rich" comparisons.
This all took place around the time Matz started developing Ruby. Python had a clear head-start. And except for bioinformatics, Perl never had much presence in the fields I worked in.
So:
> why did python's use in scientific computing keep expanding, and not ruby's?
Because Python was in-use several years before Ruby, and already rather visible as one of the three main languages to consider in that space (Tcl and Perl being the other two).
> Why was python already taing over many use cases by the 2000s, but not ruby?
Because people didn't really know about Ruby, while Python already had a pretty large user community. Probably also because Python's work was all in English, while a lot of the Ruby community was using Japanese.
> Why is Perl attrition relevant, when ruby was in fact explicitly based on Perl?
Perl attrition started before Ruby was much known. The complexity of the language, and the cumbersome need to roll-your-own OO, made it difficult for me to recommend to the typical software developers I work with - grad students and researchers in the physical sciences with little formal training in CS. Python by comparison which easier to pick.
So a language which explicitly based on Perl also picks up that negative impression.
(FWIW, I think Tcl is an easier language to start with than Python.)
"""In 1995 the special interest group (SIG) matrix-sig was founded with the aim of defining an array computing package; among its members was Python designer and maintainer Guido van Rossum, who extended Python's syntax (in particular the indexing syntax[8]) to make array computing easier."""
"""The first public release of Ruby 0.95 was announced on Japanese domestic newsgroups on December 21, 1995. ... In 1997, the first article about Ruby was published on the Web. ... In 1999, the first English language mailing list ruby-talk began, which signaled a growing interest in the language outside Japan."""
""" I think my criteria for
selecting Python over Perl is still true for Python over Ruby,
in that it has too many special characters (like @ and the
built-in regexpes), features (like continuations and code
blocks) which are hard to explain well (I didn't understand
continuations until the Houston IPC), and 'best practices'
(like modifying base classes like strings and numbers)
which aren't appropriate for large-scale software
development."""
Nitpick: Numpy is the newest, revised and reconciled vector library for Python; The first one was called “Numeric”; then there was “Numarray” which was not fully compatible, which caused a bifurcated ecosystem; and then IIRC it was Travis Oliphant who decided enough is enough, created Numpy which was somehow magically backward compatible with both, and reunited the community.
> It seems like a lot of the answer is NumPy, which makes the question -- why did NumPy happen on python, not ruby?
One of Python's original use cases was as a macro/script language you could import into your C application. Adding python to your C app took a day or so, and a side effect often was you'd make a Python library out of your app's library and suddenly, you could write standalone Python that called your app's code. Because it was so easy to write a python wrapper for an existing C library, by 1995/96 when Ruby hit the scene, Python already had quite a bit of importable functionality. The first serious web framework for Python was Zope, and it came out in 1998. I think Rails was around 2005/2006, and it was really cool, but one of the rubs was Ruby didn't have the libraries that Python did. In reality, it's amazing how good Ruby and Python have been.
It’s all about the community. As soon as a language gets attached to a profession it’s hard to break. Ruby has primarily been a web dev language, also the syntax is bad =P
> As a rubyist, it makes me sad that python ended up here rather than ruby. And I sometimes wonder why.
Work on numerical packages and scientific computing started almost as soon as the language did, for instance the origins of Numpy lie in the Numeric package which was introduced in 1995.
And the core team introduced several niceties at the behest of the scientific community (advanced slicing for instance, more recently the matmul operator).
not sure. there are many factors that contributed to python's success.
i discovered the language in 98 or 99. it came with some obscure linux distribution and the tkinter module stood out for me. it showed pretty scientific graphs and charts. but the language has to reinvent its community many times since then.
my intuition is that it was popular in europe in the scientific community. not sure i can say the same for ruby.
The performance Python is a real problem but Python has succeeded because scientific computing really needs interactive and dynamic programming languages. You need something which lets you easily experiment with data, plot, change code in rapid iterations without constant recompilations and reloading of data.
This has been recognized for some time. The compromise had been to build performance sensitive parts in C/C++ and do the experimental/iteration part in Python.
But today you don’t really have to compromise anymore. We got Julia. It solves the whole problem. You get the interactivity you need combined with the performance.
Of course in my his industry momentum matters. Python has built up the momentum of an oil tanker. Even if you shut off the engines it is going to keep going for many years.
But Julia is the obvious end station. It does all the things HPC and scientific computing needs. But building mains share, documentation, community, polish tools etc will of course take time.
It solves the whole problem. You get the interactivity you need combined with the performance.
There are other aspects to the "whole problem" -- you also need a massive ecosystem with adoption across disparate communities (devops, web development, etc). And decades of momentum.
That's why Python isn't going away anytime soon, despite its slowness and warts.
As many already noticed, the rise of Python is not counter-intuitive at all. (I'm a scientist myself).
Basically modern python offers you a spectrum from easy to understand and quick to write python programs (those will be slow), to purely glue code that connects a lot of high performance c/C++/fortran code.
And many scientists will start from pure python code with the help of numpy. In many cases it will be good enough. But if needed you can always interface with other libraries, or write yourself high performance c/c++/fortran code for the most performance critical bit, and use python to glue it together. That flexibility where you can trade speed of writing the code with the speed of execution is very valuable.
At this point we can say that against the two criteria of a spectrum from prototyping to heavy lifting and ease of embedding external high-performance libraries, Julia is simply better than Python. Julia does have two drawbacks of being tied to the one, rather heavy metal, implementation and lacking the wealth of libraries outside scientific computing.
From just my personal experience, I've had a python code interfaced with C that I rewrote just for fun in pure julia. It was significantly slower then the C code and I couldn't as easily use OPENMP parallelization (although the symbolic derivatives are great). Obviously I know julia much less well, but so far in a few cases I tried, I could not convince myself that julia offers me enough advantages over my current approaches that rely on python & C/C++
As I guess you're hinting at, writing performant pure Julia is a quite different kettle of fish to writing performant C: it's not surprising that a first attempt isn't a rousing success. But there's a spectrum of rewriting possibilities: you can write Julia-interfaced-with-C as a direct analog of the existing code, and you can convert just those parts of the C code that you think would most benefit from Julia's JIT into Julia, leaving most of the heavy lifting in C.
This is changing BTW. There's been lots of improvements in escape analysis and hoisting allocations out of loops in the upcoming v1.9 that will start to make "bad codes" a lot less bad. In fact, it's already starting to impact how to write tutorials on what is a bad code haha.
True, although the Python-Julia interop is surprisingly good. Gradual migration of legacy code from Python to Julia might be a possibility for some of these groups. But I was really thinking about the situation for new projects when writing that comment.
Same thing, you will want to reuse existing code, and you do not want to split knowledge in your student group. There is no such thing as 'gradual' migration: I either have to support Python in my group, or Python and Julia -- until everything is migrated.
If I have a working ecosystem using Python, with students trained in Python, and all previous work in Python, there's a whole lot of opportunity cost associated with me deciding to have the next student use Julia. I'd rather have that student build on existing tools and knowledge and do something new with their time.
This article has been written a hundred times. "We abandoned a fast language for a language that is slow but can use fast libraries, and so the result is fast. It's faster because the programmer discovered existing libraries that do a better job of what they were doing already."
There's so many convolved factors here I don't even know where to begin, so I guess I'll just say that I'm glad Julia exists. The author glosses over many decades of programming language and compiler research -- which makes sense, because this is not their specialty. However, what I see is the field of scientific computing migrating from a dinosaur language (Fortran isn't, actually. It just is used this way) and dinosaur practices of writing everything oneself, to one of the slowest interpreted languages that happens to be the most difficult to JIT, and saying this or that about how interpreted languages are slow but library calls are fast to justify this. At the same time they're learning to build a functioning library ecosystem.
Basically, grad students are learning proper programming practices and collaboration after switching to a more expressive language, they just managed to pick the slowest and most difficult to optimize one. Maybe they just managed to wipe some of the slate clean by switching away from Fortran and its culture (the culture being the bad part), and the culture of Python filled the space, creating a net positive but somewhat unfortunate situation.
Just one more time -- the idea that you can call a "faster" language to do the heavy lifting is true of every language and does not justify the choice of Python in particular. The justification for Python is the momentum, and this is in my opinion the only one.
Your conclusion seems a bit reductive/circular and begs the question "yes but why does Python have momentum?"
Python is/was chosen because the syntax is clean and expressive and obvious if you speak English (which most people in this context do) and because although performance is almost always worse in interpreted languages, there are clear productivity benefits when doing the kind of programming that is demanded from data science / data processing / etc. Same for dynamic vs static typing and a number of other choices made by Python.
Specifically, many (most?) programs are not long-term maintained projects in this space. A lot of them are just little scripts to convert one dataset into another format, or scrape specific content from somewhere, or support a scientific paper that will not get updated after publication.
Python is sufficiently readable, and with the right extension, it is sufficiently fast for vast majority of the purposes. For Julia to truly gain momentum, I think it needs a "killer app/library". However, I'm not sure what it would be that would not already be built for Python.
My personal killer app would be a significantly revamped plotting library/app. While matplotlib is great, it is fundamentally based on imaged-based plotting. The next generation of data visualization, imo, will likely be interactive. Having an interactive plotting library that allows you to produce publication-quality plots faster and simpler (think of all the time spent aligning text manually..) could be a big deal, but it could also not matter as no one else wants the same things I do.
Have a look at Makie.jl[1] in Julia. I've been using it for exploring large data sets recently. Ticks your boxes. Jupyter version is image based though, as Jupyter is inherently static. You could use Pluto.jl[2] to build a reactive page.
Thanks for the link. Makie.jl looks interesting. I didn't find it last time I looked into Julia. I'll get it a shot at some point to see how usable it is.
That is a big part of the author's point: the library ecosystem is here for Python today. While there is a heavy penalty for anything written in Python itself, it doesn't really matter since there isn't much of a penalty once the data is passed to highly optimized libraries and those libraries allow developers to select efficient algorithms rather than implementing their own algorithms (which are likely to be less efficient).
In this thread I am seeing a number of explanations, including:
Ecosystem; mind-share; readability and engineering mind-set; history/Numpy/Matlab; teachability and academic focus.
There are also comments emphasizing the "dynamic" scientific environment and need to just pick up code left by others.
In terms of the latter, could one apparent requirement be this: The main contact should be with top-level code which at least looks like it is interpreted -- even if through compile-with-run-combined and/or memoization? Need part of the user interface, so to speak, be to hide all intermediate artifacts, even the very thought of object code and executables? That such stuff is for, say, "module creators" not primary users?
Do you think Julia will chip away at Python's marketshare, or Fortran's? I thought it was aiming to be more of a replacement for the latter, but I've never written a line of Julia in my life, so I am very uninformed.
Imo, it eats away at both. Julia makes it relatively easy to meet or exceed Fortran performance, but also gives you the high level abstractions and ease of use of a language like python. I think the biggest problem for Julia currently is the difficulty of AOT compilation and the lack of tiered compilation (like Java/Javscript). Making the story for either of these better would be a significant quality of life improvement for Julia, and would make it pretty much unrivaled for scientific computing in my opinion.
Julia advertises itself as solving the "two-language problem". This assumes that people first write exploratory code in python or something similar, and then rewrite it in Fortran etc.
So in this scenario, Julia takes marketshare from both.
Personally, I find that many Fortran codes are still used because they have been build for many years, and they can't be rewritten easily. On the other hand, new data science projects start all the time, and the transition to Julia is easy (and worth it in my opinion). That means that in my experience, Julia is mostly competing for marketshare with NumPy/SciPy/SKLearn/Pandas/R/Matlab.
If you mean whether it's possible, PyCall.jl has existed since nearly the beginning of Julia, and PythonCall.jl [1] is a more recent package for the same core functionality - calling into Python code.
Counter-intuitive? I picked it because it was the closest scripting language to C (see the select and socket APIs for good examples). And it had numeric array support early-on (making it an attractive replacement for matlab).
Python is an API to efficient scientific computing code. It's good for that, assuming you're using old and more verbose languages.
Look into Julia as a promising alternative -- the language itself is superbly fast (aside from initial compilation) and there's an impressive scicomp ecosystem to say the least, all written in native Julia. This allows for program rewriting / metaprogramming more broadly and is insanely powerful once you get a feel for it.
It is really remarkable how much more expressive some languages are over others. If you are satisfied with Python for everything you do, then you are not hitting the limits of its expressiveness. But for more naturally expressive code, other languages may have huge advantages for certain applications.
I feel like python acts like a kind of bus in scientific computing, connecting various high performance libraries and DSLs together.
That said, this article's story of someone using the wrong algorithm is a bad example in my view. Python hasn't succeeded because people are more likely to use more efficient algorithms due to easier experimentation, it has succeeded because the of the size of the ecosystem and the fact such algorithms are easily available.
I recommend one of the recent videos by Dave Beazly [1]. He lived through and contributed to the raise of Python in scientific computing first hand in the 90s, and offers some interesting insights. Plus he's always quite an entertainer.
For those unfamiliar, CERFACS (Centre Européen de Recherche et de Formation Avancée en Calcul Scientifique, i.e. European center for research and advanced training in scientific computing) is a leading research institution, with two main branches: meteorology, and engineering computational fluid dynamics. I am not affiliated and can only evaluate the engineering part, their combustion modeling group is one of the best in the world.
A lot of thanks should go to Oracle. Back in the days Java was go-to language for everything. After Oracle acquired it in 2009, the only respectable languages with good numerical libraries were Python, Julia and R. Unfortunately, Julia’s marketing wasn’t strong enough and R was decisively an ugly thing to work with.
Can you explain what you mean more in detail? Libraries can't change the syntax of the Python language, not in the formal sense.
Is this about things you want to be able to express in syntax but can't? Or the other way around - stuff that uses syntax/operators but should really be methods?
Numpy syntax comes to mind. The extra commas often aren’t valid pure Python but are required for some operations on numpy arrays. I don’t know how this works under the hood, but expect it’s a state machine under the numpy ndarray looking for the extra commas and such.
i.e. some_array[0:5,0] which isn’t valid pure Python notation.
Extra commas are "valid in pure python" in the following sense that I can demonstrate.
Open ipython3
In [3]: class Test:
...: def __getitem__(self, index):
...: print(index)
...:
In [4]: Test()[1, 2, 1:3, ..., :]
(1, 2, slice(1, 3, None), Ellipsis, slice(None, None, None))
It's valid and we get the complicated tuple of integers, slices, ellipsis etc as printed.
Numpy has existed for a long time. Its needs have been taken care of in upstream Python, to a big extent, and other libraries can use the same features.
Interesting! Neither myself nor my coworkers could get the snippet I posted working outside the context of an ndarray, so I had speculated at that time that it there was something else going on under the hood.
You seem to have a much better grasp of Python than us, would you mind posting an example where the snipped I posted successfully accesses data from an array in pure Python? That way I can not only take the L, but correct the record and learn something in the process.
This program is quick & lazy but it uses a 1D python list and pretends it's a 2D list. It implements 2D slicing, giving you a square subset just like ndarray. It doesn't intend to be all correct or nice or useful.
I'm not sure what you're referring to. Nothing you import into Python changes its syntax.
Maybe you're thinking of things like x[:, np.newaxis] where x is a numpy array? This is valid Python code outside of numpy as well, although the built-in data structures like lists and dicts won't know what to do with the :.
To be precise, you could model the same behavior on your custom types by using the dunder method magic. In that case, everything is "valid" Python code.
Numpy and Pandas libraries have some non-standard ways to slice arrays, get the subarrays and the data from them.
What language wouldn't suffer from this, besides APL? Even very recent and well designed libraries like Elixir's Nx look like another APL-like language bolted on. Pipe syntax helps but not much.
I wrote scientific python for several years at a university research project, coming from a statistics background. I wrote a forecasting tool and related plotting, simulation, ML, evaluation etc tools.
The reasons for python’s success are obviously the ecosystem. Numpy is the foundation. On top we have sklearn, statsmodels, pandas, matplotlib. Before our project most work in the department was done in Stata, a proprietary language/tool that works well for some classical regression and stats work but falls apart as soon as things get complicated. Moving to python allowed us, a group of social scientists, to work on some really hard problems.
Now we have boosted tree models and other tools that just can’t be used in the old tools like Stata.
I am really curious how Zig lang eventually does in scientific computing. It's already speedy compiler, language server (zls), and upcoming hot code reloading feature, makes me think that reactive coding and visualization notebooks in Zig should be feasible. Although, Zig has no operator overloading, and no dynamic dispatch though, making it fundamentally pretty different than say, Julia lang. Just as an aside: for my day job, I write Python in a scientific computing (geospatial and ML).
I'm genuinely surprised that no one here is mentioning D language in addition to Nim or Julia for replacing Python. D has already beaten Fortran in speed more than 5 years back, the legendary scientific programming language that's mentioned in the article [1]. The Fortran based libraries that are overcome by the D language apparently are still being used by Python, Nim and Julia for most of their high speed processing until today. As they always said the proof is in the pudding, and compare to all alternative D language is designed to have a similar feel to Python. By default it supports GC for easier and manageable scientific programming that is very attractive for the type A data scientist that are mainly deals with analysis and exploratory programming [2]. The latest D language is now also natively support the C language (lingua franca of scientific programming) in its compiler thus can import and compile C files directly [3].
[1] Numeric age for D: Mir GLAS is faster than OpenBLAS and Eigen:
> Of course, If the best algorithm is known beforehand or the manpower is not a problem, a lower level-language is probably faster, but this is seldom the case in real life.
One is wary of one-dimensional analysis of anything in a software context.
Who cares if the Fortran library runs like the blue blaze, if it cannot be readily maintained?
It is possible to write maintainable modern Fortran without gotos with small functions and subroutines. OOP with inheritance and dynamic polymorphism is possible since the Fortran 2003 standard.
This article expresses the ancient Python(/Matlab) v Fortran argument beautifully ... but it's kind of shocking that the argument is still going on at all. My generation came out of school happy to use FORTRAN indirectly, via a scripting language, for rapid prototyping. That was 30 years ago.
I don't think Python displaced Fortran in HPC as much as it displaced Matlab (and Octave) and R in scientific computing.
Displacing Fortran was a side-effect of that trend, as now it wasn't about productionizing Matlab code into Fortran, but Python could do general purpose computing adequately as well.
Python excels in several domains. For example, the non-speed-critical numerical computing this article is about. It's also nice for backend web development, and scripting. Embedded isn't one of its strengths, and I'm suspicious micropython was an attempt at bringing embedded programming to people who don't want to learn more than one language.
As the article notes, various numerical kernels have been wrapped as Python compiled modules/libraries, and numpy and other systems seem to work OK for many applications.
People always give the argument that python calls c++ libraries, but I use both Python and c++ a lot, and writing c++ directly, calling the the same libraries, is way faster.
I suspect the reason people claim that is they are training ML models, which may take O(hours ~ days) to run anyways. In this usecase, C++ calling C++ is faster than Python calling C++ slightly in O(seconds) is outweighed by the fact that Python is more convenient for the ML practitioner trying different models.
Python is what has been popular for the last 15 years. Scientists are not programing language geeks, they just use whatever is popular, viable, and established.
Harsh isn't the same as shallow. I don't believe snark is ALWAYS uncalled for, only that you deploy it when it's reasonably necessary. And this is a thing that people, especially HERE, get wrong a LOT. That it's unpleasant or hits people where it hurts doesn't mean it's not important, and I think it is.
I wonder how many days have been wasted on non-programmers trying to get their Conda environment up and running or similar. Half the data science stuff isn't reproducible, not because of the science, but because getting the notebooks running with its dependencies is almost impossible.
I think a lot of this has to do with just how bad/incomplete the docs are, how unnecessarily janky the shell integration is, and how the Anaconda launcher itself makes a huge mess and actively works against best practices.
The docs for building your own packages are even worse, to the point where you basically are left copying snippets from Conda Forge to build anything nontrivial.
Basically Conda is a tremendous engineering achievement, but it's very much still a "first draft" in a lot of ways, and Continuum/Anaconda made some weird decisions that work against its user-friendliness. Imagine for example if third-party repos on anaconda.org could have a description box, link to a homepage, etc...
> non-programmers trying to get their Conda environment up and running
I see this issue brought up a lot, but I have yet to see a language that addresses this reliably. By definition setting up an environment for non-programmer is a tall order, what language should they use?
I'm just grumbling because even I as a professional dev can sometimes spend days getting some python project up and running correctly. Then I feel sorry for non-devs for which all this is only a tool.
The simple and easy Java way to do it is to just bundle everything into a Jar. Then it really is a single file "environment". Then you only have the problem of different Java versions rejecting the jar file because it is too new grumble.
I'm increasingly convinced that the majority of so-called "data science" is pure sciencism with little to no actual science for exactly this reason. It's reading correlations from digital tea leaves.
> Half the data science stuff isn't reproducible, not because of the science, but because getting the notebooks running with its dependencies is almost impossible.
As someone who did scientific programming in other languages (Fortran/C++), I can assure you the nonreproducibility was there in those projects as well. Not because of the tech stack but because no one valued reproducibility.
The current situation with notebooks isn't worse. It's more of the same. I think people criticize it more because notebooks are advertised as reproducible research.
Because it's a hard problem and people love hating on Python because it doesn't come with a way to handle all the compiled dependencies that work for every OS.
Data science has a ton of moving library parts, it is genuinely difficult to distribute precompiled libraries for everyone when you have 2-3 actively maintained CUDA version with 2 cuDNN version for accelerators that change every 2 years. Most team fail to standardize on an environment (say Python 3.8, Ubuntu 20.04, CUDA 11.1, and cuDNN 8) and then get hung up on a dependency not building as if it's Python's fault that it does not have control of your entire OS.
But why is it such a big problem in Python compared to other stacks? Why does all python projects end up depending on you having those exact tools of things installed locally and the planets aligned a certain way, when other stacks do not?
It's not a big problem in Python in general, only in scientific computing / number crunching projects, because of the dependencies on huge complex software, some of it ancient, written in C, Fortran, and C++. So why do we hear about this problem in Python a lot? Well, because it's what's used for the glue/frontend, which is what users work with directly. It's selection bias. Sure, another language might fare somewhat better or worse for this or that reason, but at the end of the day it's gonna be a pain in the ass (at least until next-generation, complete, deterministic, language-agnostic solutions like Nix/GUIX really gain traction).
There are 365k projects in the "official" package index. While not all of these are important, it's a tip-off to the magnitude of the problem. The habit of blowing past a problem by grabbing a random library and moving on to the next problem leaves us with a mess of dependencies. And many of those were either written by amateurs like us, not maintained, etc.
Maybe other languages have fewer libraries, or maybe the habit of grabbing libraries at random evolved concurrently with the rise of Python.
My team has a rule that we don't let a project get past a certain stage without proving that it can be installed and run on a clean machine and archiving all of the necessary repo's with the project. It's easily forgotten that testing your installer is part of testing your program.
I'm not sure that it is a problem in Python more than other languages.
It might look worse because many Python projects use tools such as CUDA, which are notoriously dependent on the specific OS, architecture, method of installation etc. But that same issue will exist in most languages - if you're linking against CUDA, you will sometimes have problems with the package installation. Particularly if you try to run the code on a different OS, CPU architecture, using a different GPU, etc.
I don't think it really has anything to do with Python. It just happens that most people doing work that depends on tricky packages such as CUDA also happen to be using Python.
Python is for all intents and purposes a "glue" language. You don't do the heavy computing in Python, you just pull in a C++ library that has a Python interface. This adds a ton of friction because these dependencies will often not be precompiled so you need to have the right system libraries to build the module before using it.
It's not much of a problem for other stacks because they either are fast enough that they have a library written in the same language for problem X (C#/Java/Rust) or they aren't targeting the same type of work (JS, Ruby, etc...). C++ has the exact same problem as Python and I'd argue that it's even worse.
They have: Nix and Guix for example can handle Python dependencies as well as native dependencies, and can build them reproducibly. They just haven’t caught on yet.
As someone who uses numpy almost daily, I think that numpy is "overextended" beyond its core niche, sure. So - making it work with things outside that niche (e.g. streaming, non-rectangular data, non-uniform data, nonhomogeneous data, etc) is painful. However, 1) there's Pandas for that, and 2) I disagree with "misleading" and "surprising". What makes you think that?
Not GP, but I’m using pandas daily to build up a BI platform within a financial institution. Compared to Matlab and even Fortran it has some issues IMO:
* why distinguish between Series and DataFrame? just give me an interface for m x n matrices or even higher dimensions.
* pure vs. in-place operations. not such a big fan of having multiple versions of the same function, e.g. a more pythonic
df[“my_col”] = series
vs. a more functional
df.assign({“my_col”: series})
; I’d rather have everything like the latter to be able to more easily have best practices in place.
That brings me to another point: if we keep everything purely functional, then python’s syntax is making things a bit awkward. Where in something like JS you could just put every function call with its dot on a new line without the need to assign, in Python this requires putting line break characters or wrapping it in round brackets. This is one place where a language with explicit assignment terminators (semicolons) are a bit cleaner to work with.
All that being said scipy is still a great choice to have both system programming and numerical business logic in one language.
Little things, like some functions want to be called with a tuple of dimensions,
np.zeros((rows, cols))
others just want to be called like
np.random.randn(n, m)
The 1d array is a huge, fundamental design flaw in numpy. It makes zero sense that I can do matrix-vector multiplication against both an nx1 2d array as well as a 1d array. The latter is complete nonsense.
When you slice a column from a matrix, and get a not an nx1 vector, but a 1d array, it makes me want to shell out $10,000 for matlab (yes, I know I can get a column vector with the slice A[:, [2]], but I shouldn't have to).
This problem leaks out into the ecosystem. For example, when you try to use scipy to integrate an ODE, and pass it an initial condition vector that is nx1, the scipy integrator will silently coerce your vector to a 1d array, pass it to your RHS function, which then either blows up, or more likely, produces silently wrong result because of numpy's insane array broadcasting rules.
This problem further leaks into the ridiculous function hstack. If you just used the function vstack, which made a 2x3 matrix from 2 1d 3 element arrays, you might imagine that hstack would produce a 3 x 2 matrix. But no. It creates a 1d 6 element array. For what you wanted, you actually need np.column_stack.
I think the way Eigen handles this is the most intuitive. You do linear algebra with 2d objects, and cast to arrays for elementwise operations.
There is also a huge inconsistency between what numpy exposes as an object oriented interface vs a "functional" interface. What I mean by this, is that I can call x.sum() on an array, but not x.diff(). For that, I need np.diff(x). There seems to be no pattern to what is exposed as a method vs a function.
The array slicing api is also really inconsistent. For instance, given a 3 element array x,
a = x[5]
is an IndexError. However, this perfectly fine
a = x[2:5]
I just can't forgive that this is not also an IndexError.
In my opinion and experience, I think you’re right about “there’s Pandas for that” and “that” can be almost anything. It can do almost anything but making it do almost anything requires constant reference to the docs. And I find maintainability difficult. It seems like there’s 50 kwargs for every method. Sometimes things happen in place by default, other times they don’t. Compound indexes still confuse me. But I’m not a data scientist so I don’t do much ad-hoc analysis that seems typical with pandas users.
I actually don't remember the details, I haven't used numpy in 4-5 years. I remember being bitten a few times by some operators that had a different behavior based on how you had arrived to what looked to be the same data. These were issues I don't remember encountering with e.g. Mathematica, MatLab or R, but then, I was manipulating different kinds of data.
Next time I find myself manipulating numerical data, I'll definitely take a look at Pandas!
Pandas is better than nothing, but I would look to R's dplyr/tidyverse for a really well-designed tabular data manipulation ecosystem. Compared to tidyverse, the pandas API feels bloated, obscure, and inefficient. I often see people using very slow apply-based solutions in pandas because the faster solution is so non-obvious.
The tidyverse ironically ends up feeling more Pythonic, with more of a "there is one obvious way to do it" vibe.
Pandas is some probably very nice and clever Cython wrapped up in disastrous Python. As someone says below, doing anything requires constant reference to the docs, unless you did it yesterday. The semantics given originally to square bracket indexing have unacceptable edge cases with weird fallbacks, and instead of fixing it a bunch of other strange indexing syntaxes have been added on (but any Python programmer will use square brackets first). It's basically a distinct language (and a powerful one if you use it regularly).
jupyter notebooks encourage disorganized, unprincipled programming; chaotic re-running of cells in the face of global mutable state; and prevent budding programmers from learning to use version control because the JSON format was designed without version control in mind.
For Jupyter, it depends on the workflow. Especially with data sciences. In data science, you spend a lot of time playing with the data, testing things, drawing charts, computing, etc. When you do that, the cost of starting a python interpreter, loading the imports, loading the (usually big) data becomes a real pain 'cos you iterate like hell. Working in a REPL becomes really important.
But even more, working with Jupyter allows you to work out a very detailed explanation of your thought process, describing your ideas, your experiments. Being able to mix code and explanations is really important (and reminiscent of literate programming). You got the same kind of flow with R.
As data scientist, I'm concerned about data, statistics, maths, understanding the problem (instead of the solution). I don't care about code. Once I get my data understanding right then comes the time of turning all of that into a software that can be used. Before that, Jupyter really gives a productivity boost.
For the code part, yep, you need other principles where Jupyter may not be suitable.
It's interesting, I never feel like I get these exploratory benefits from jupyter notebooks. I just end up feeling like one hand and half my brain is tied behind my back. I'm most productive iterating in a similar way to what you describe, but in an ipython terminal, running a script and libraries that I'm iterating on in a real editor. If there are expensive computations that I want' to check point, I just save and load them as pickle files.
I have to say I think a jupyter notebook format is a 10x improvement in productivity over ipython. It's just so much easier to work with - and a step more reproducible too, my scribbles are all there saved in the notebook, at least!
really interesting. I may have overlooked IPython a bit (I just thought Jupyter was its improved version). For the moment, maybe like you, I prerpocess the data (which takes minutes) into numpy array which then take seconds to load. But once I add imports, everything takes about 5 or 6 seconds to load everything I need. So Jupyter remains a good idea. Moreover, I love (and actually need) to mix math and text, so markdown+latex maths is really a great combo. I dont' know if one can do that in IPython, I'll sure look!
I've programmed in a number of languages over the past 40+ years, starting with BASIC, and every one of them encourages sloppy coding. The good discipline always has to be taught, learned, and willingly practiced. The closest I came to a language designed for teaching good practices was Pascal.
I find it easier to read and understand bad code written in Python, than good code written in the C family languages.
Yes, what I was saying was that writing Python code in files is a better and more educational way to program than writing Python code in a Jupyter Notebook. It wasn't a criticism of Python.
I use Jupyter a lot, but have a personal rule to do "restart kernel and run all cells" once in a while, to scare up any kind of hidden state or out-of-order execution problems. For instance, if I'm about to leave a notebook for a while, I'll make sure it runs without error from top to bottom.
In that sense, I'm making it work like Python code in a file. The advantage of code in files is that I can use all of the slick code analysis tools that will warn me about my mistakes. I wish there were something that would let those tools go through the code in a Python notebook from top to bottom.
papermill is good and ploomber is a thing to watch.
Ploomber makes it systematic - store notebooks as .py (py:percent files for example), parameterize them with papermill and execute as a batch job. One can view the resulting jupyter notebooks as .ipynb later and produce reports as html if wanted. It's really good already, and better if ploomber gets more development.
The whole reason it works is because it's easy to open the .py notebook and work on it, interactively, in jupyter.
The main idea - jupytext for .py notebooks and papermill for parameters & execution - that's already "stable" and easy for anyone to use for their own purposes.
Maybe I haven't come far enough with my ploomber use to tell yet! It works nicely but I know I'll learn more and open my eyes more as I go.
As a first impression, I eventually found meta.extract_upstream = False which I think is important. Reason: The code for each step should be a lego piece, a black box with inputs and outputs. That code should not itself hardcode what its predecessor in the pipeline is - you connect the pieces in pipeline.yaml. (extract_upstream = False is not by itself enough to solve this, since you also need to be able to rename inputs/outputs for a notebook to be fully reusable as a lego piece, but it's good enough for now.)
I also for my own sanity need to know more about how the jupyter extension part works, how it decides to load injected-parameters or not. But maybe I could learn that somehow from docs.
In general I want components that are easy to understand and plug together and less magic (but the whole jupyter ecosystem's source code feels this way to me unfortunately, lots of hard to follow abstractions passing things around). But it's developing rapidly and already very useful, thank you so much!
I'll ensure we display the "extract_upstream" more prominently in the docs, we've been getting this feedback a couple times now :)
Re: the Jupyter extension injects the cell when the file you're opening is declared in the pipeline.yaml file. You can turn off the extension if you prefer.
Feel free to join our community, this feedback helps us make Ploomber better!
Being able to hack out code to explore and experiment with data while not having to reload and reprocess data (thanks to that global mutable state!) saves a hell of a lot of time in the long run.
The one place matplotlib sucks is any kind of interactivity. But other than that, matplotlib has the best, most intuitive interface of all the python plotting libraries I've tried. It's also one of the few libraries that doesn't rely on generating html for a webbrowser, which makes for a miserable workflow.
I still think Matlab's plotting is untouched by open source options.
Matplotlib hides a lot of complexity if you ask me. As soon as you do something in a different way than intended you're off searching stackoverflow for a post that did something similar to what you want. Then you tweak it a little and hope it works.
Not infinity, but yeah it's worth more than people generally think. But in the end you don't really lose many clock cycles anyway because everything actually runs in C/CUDA/etc. behind the scenes
This feels misguided, too. It begs the question that python has good usability. Anyone that has tried managing dependencies in it will know that is mostly a lie.
What python had, was that it was preinstalled on many computers and then had a large cohort of users that are insisting that others use it. And mostly force proclaiming that it is easy and readable.
I'll not claim that it is hard, per se. More that it is not intrinsically easier than any other dynamic language.
For evidence, the main packages that are popular are often clones of packages from other environments that were not widely installed. Jupyter can be seen as free version of many scientific applications. Matlab, Mathematica, etc. Matplotlib is rather direct in it's copy. Pretty sure there are more examples.
Is managing deps in Python a pain? sure is!. Is it a pain in the other contenders of easily available dynamic languages? Yup. So that's a wash. Managing deps in dynamic languages is not a simple problem, I can't say I've tried one that did it super well yet.
I'm... Not sure insulting other languages is a path to victory.
Npm, as much as it annoys me, is light-years ahead of anything in python. Quicklisp is rather pleasant, now. Ruby has had gems for a long time.
I grant that it is a hard problem. I am not griping that it is not solved. More that the option community has largely failed to even pick a direction. The most used dependency methods are, in the modern spirit of python, deprecated already.
> Npm, as much as it annoys me, is light-years ahead of anything in python.
Hence PEP 582:
> This PEP proposes to add to Python a mechanism to automatically recognize a __pypackages__ directory and prefer importing packages installed in this location over user or global site-packages. This will avoid the steps to create, activate or deactivate “virtual environments”. Python will use the __pypackages__ from the base directory of the script when present.
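You can already approximate that lookup by hand today. A sketch of what the PEP describes, with the X.Y/lib layout taken from my reading of the PEP's examples (the real proposal would build this into the interpreter):

    import sys
    from pathlib import Path

    # Prefer a local __pypackages__ over user/global site-packages,
    # roughly what PEP 582 proposes to do automatically.
    pkgs = (Path(__file__).parent / "__pypackages__"
            / f"{sys.version_info.major}.{sys.version_info.minor}" / "lib")
    if pkgs.is_dir():
        sys.path.insert(0, str(pkgs))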
Right. I am confident they will get things in place to solve this. That it took so long for them to even want to strengthens my assertion that the community piggybacked on the operating system's package management for a lot of its initial popularity.
It isn't that similar. Most languages actually have solutions that the community is pushing together. Python alone bungled a major version change and then refused to endorse a package management system for so long.
Yeah it’s counter-intuitive, and it’s because it does not make much sense.
Slowness is one thing, but the tooling is also clearly subpar compared to languages of similar popularity; the dynamic typing makes things difficult to maintain; the 2.7 vs 3 shit show; etc., etc.
The very fact that many smart people have been saying for years that Python is a fairly bad tool for data analysis should at least raise some people’s eyebrows. But no, the entire field of data science has decided that it knows better…
Python won because people who knew math/science domains only knew Python (or it was the best language they knew). And so they made libraries for Python. And it propagated like many other bad ideas based on ignorance.
Python is a miserably bad language for modern times. If you know any of half a dozen other languages, then you understand.
There was a good essay, from Paul Graham?, about the ladder of awareness of programming languages. Unfortunately I can't find it now.
The point is, Python has won and is frankly terrible. It has inconsistent features, an awkward OOP approach (at a time when OOP is finally being recognized as bad in itself), and it seriously lacks basic language features that are only now appearing as of 3.9 and 3.10.
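(My guess at the kind of features meant, not the parent's list: dict union and builtin generics only landed in 3.9, and structural pattern matching only in 3.10.)

    # Requires Python 3.10+
    defaults = {"retries": 3, "timeout": 10}
    overrides = {"retries": 5}
    merged = defaults | overrides        # dict union: new in 3.9 (PEP 584)

    def first(xs: list[int]) -> int:     # builtin generics in annotations: 3.9 (PEP 585)
        return xs[0]

    match ("ok", 200):                   # structural pattern matching: 3.10 (PEP 634)
        case (status, code):
            print(status, code)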
Frameworks like Django and Django Rest Framework expand on these bad ideas, creating monstrosities which make the PHP code of yore look arguably decent.
Sadly, I don't think there's any way to kill this. The only option is to vastly outperform the Python people and produce reliable, readable, performant solutions in half the time and beat them to market. Perhaps someday they will die off.
One of the reasons for them knowing it in the first place was false marketing that python "reads like English" (as if that would be a good thing).
The problem with these really smart people is that they hate not knowing everything... and a lot of them, a decade or two ago, were never exposed to programming before they started doing research... so when they hear that it "reads like English", they feel that they can conquer it... eventually, being the smart people they are, they learn enough to get their jobs done... and some of them learn it quite well, while others write terrible code that somehow works, though they themselves don't quite understand why. But most of them would not take on another language unless someone comes up with a false claim that "it is easier than English" or some BS like that.
> One of the reasons for them knowing it in the first place was false marketing that python "reads like English" (as if that would be a good thing).
Sounds like the story of BASIC (and a bunch of other early languages besides - BASIC was originally a simplified variety of FORTRAN, with a REPL-style, terminal-driven workflow tacked on as a key innovation), except that Python is a lot more semantically complex than BASIC, even at a novice level. Perhaps we could build development tools that make, e.g., Rust read "like English" too. (After all, the Rust compiler's diagnostics literally read like nicely phrased English, so extending the same approach to the rest of the language representation has some meaningful precedent.) Then novice scientists might learn to program by tweaking their code and reading what it actually means, pretty-printed in plain English.
At one of my previous companies we hired a really intelligent (generally) and very likeable data scientist who arrived and wrote the worst Python I've seen. Actually it wasn't the worst, because it was all inline copy/paste code rather than convoluted OOPish code.
He could solve very complex problems, but his tooling was horrible. It wasn't his fault: he had a solid education (with a PhD) from a German STEM uni, but there was a serious lack of programming skill.
It would seem that because Python is "so easy" to get started with, people don't feel they have to bother learning any real programming skills beyond solving their immediate problem.
I don't blame this on the scientists; software is not their domain. The problem is with PHBs who don't know better and who make decisions based on the toolset used by the "special" people.
> Python won because people who knew math/science domains only knew Python.
This doesn't explain why they knew Python in the first place, a pretty critical step. It reached popularity without a platform mandate (JS, Swift) or corporate backing (Java, Go), so there's something going on.
Python has a lot of great libraries. It's the inverse of the chicken-egg problem.
Because there are now some critically important libraries (pandas, numpy), it is the obvious starting place if you want to hit the ground running with minimal effort. I think that's totally fine for uni. But there should be a capstone-level class for data/AI scientists before they can graduate, one which shows other languages and teaches some general best practices of software development.
There are plenty of other languages that can do the same job. And honestly, the algorithms that are available can be recreated where they don't exist. Most of it is not "rocket science".
But the greater problem is that Python itself is a poorly designed and warty language. Whether a scientist or not, choosing Python means fighting these warts. No amount of make-up can cover some of these; and plenty of other languages start with clearer foundations.
>This doesn’t explain why they knew Python in the first place, a pretty critical step.
>>Python has a lot of great libraries. [...] But there should be a capstone level class for data/ai scientists before they can graduate which shows other languages
I didn't downvote your gp reply, but your answer just pushes the question to an earlier point. Why did early (circa 1995) programmers at science labs like David Beazley and Jim Hugunin, who already knew "other languages" such as C, assembly, Fortran, etc., choose Python as the scripting wrapper for their C code? See my other comment about their earlier history: https://news.ycombinator.com/item?id=30813528
The "Python having a lot of great libraries" wouldn't have been a compelling reason for David Beazley since those earlier creators of scientific packages for Python chose Python before it had a lot of scientific libs. They were among the very first.
Here are some bullets from another deep link at a different point in the video[1]:
- David's first attempt was writing his own homegrown scripting language.
- he also looked at alternatives like Tcl/Tk and Perl and they weren't as appealing as Python.
- David mentioned Python had a more powerful REPL.
- Python was also open source C code, so he could easily modify it to run on the Thinking Machines CM-5 computer[2] in the physics lab.
- he wanted a language & runtime that encouraged the wider community to build more science tools.
In your opinion, what was the superior programming language that David Beazley and Jim Hugunin should have chosen in 1995 that checks all the bullet points above?
Python is more accessible in modern times than C, assembly, and Fortran.
But we are SO far past that now. My argument isn't about what should have happened in 1995; it's about the complacency that has allowed Python to become a top 1 or 2 language in 2022. It's like having proximity detectors on the back of your car while still starting the vehicle with a crank at the front. We can do better; we have the technology.
> "A program that performs a useful task can (and, arguably, should) be distributed to other scientists, who can then integrate it with their own code. Free software licenses facilitate this type of collaboration, and explicitly encourage individuals to enhance and share their programs. This flexibility and ease of collaborating allows scientists to develop software relatively quickly, so they can spend more time integrating and mining, rather than simply processing, their data."
https://journals.plos.org/ploscompbiol/article?id=10.1371/jo...
Now there isn't any area of molecular biology and biochemistry that doesn't have a host of Python libraries available to assist researchers with tasks like designing PCR strategies or searching for nearest matches on up to x-ray crystallography of proteins.