The homogenization of scientific computing (2013) (talyarkoni.org)
71 points by stickhandle on May 11, 2014 | 54 comments



I heard a saying at work: "Python is the second best language for anything." Scientific computing is a perfect example. By being close to the best, Python ends up taking over each sub-section of the SC world.


One of the best descriptions of Python that I've ever heard


Man, if Python is the second best in readability, I want to become best friends with number one. (and I don't mean ugly syntax like forwarded arguments that makes Python look like ass, I mean normal scripts that read like prose)


What would be number one... does Ruby have any claim?


So true. And being second best with a very friendly syntax often makes it a first choice.


> Scientific computing is a perfect example

Just out of curiosity, what's the best one?


The quote (which I really like, though I only use Python for a very small subset of things, generally quick throwaway scripts) doesn't have to imply that there is an agreed upon best.

Bob likes R. But he'll take Python second. Ted likes Matlab. But he'll take Python second. Etc.

The point is everyone agrees Python will do. Consensus plays into the OP; even if Python isn't perfect for the domain, it's probably good enough, and the lingua franca.


So, in other words, it doesn't excel at anything in particular but is more of a jack of all trades. Got it. Thanks.


You know, the quote actually goes:

    Jack of all trades, master of none,
    Often times better than a master of one.


For raw speed at crunching through arrays (which is still vitally important in many areas of scientific computing), it's still FORTRAN. It also has very good (and well-tested) libraries for all sorts of mathy-sciency things.

A PhD friend of mine who does scientific computing works almost exclusively in FORTRAN.


I believe the comment was in reference to the supercomputing (HPC) world where Python is indeed very popular and has great libraries for doing large-scale scientific computing. You lose quite a bit of performance on the hardware but it makes up for it in speed of programming. For small-scale scientific computing other languages tend to be used.

The best language for large-scale scientific computing is C/C++, which surprises some people. Python binds to these libraries. C/C++ has two big advantages: it can run much faster than any other language and the language is better suited for designing massively parallel codes than most others. The latter point makes sense when you realize that HPC software only runs a single process per core and explicitly, adaptively schedules execution and messaging in order to optimize throughput. It is a bit like very old school single-threaded UNIX server programming.


For high-energy experimental physics (HEP or particle physics), most tend to use a CERN developed C++ framework called ROOT[1]. It's not overly pleasant, but it gets the job done.

There are Python bindings to ROOT (PyROOT), but in my experience Python is a bit too slow when handling the large (10TB+) datasets.

As an aside, it's interesting how ROOT attempts to provide C++ with some basic reflection[2] and saving of C++ objects to disk. Unfortunately it doesn't necessarily do a very good job of it, but perhaps things will change with ROOT6 as it transitions to being based on clang, as opposed to an in-house C interpreter.

[1] http://root.cern.ch/ [2] http://root.cern.ch/drupal/content/reflex


R is by far the most commonly used for statistical sciences. As far as I know, the list continues as follows: Matlab & Labview for experimental physics. SPSS for social psychology. SAS for the pharmaceutical industry/medicine research.


It really depends on the application and area of study. There is no "one best" language for all of scientific computing. For some things R is the best, for others SPSS, and for some matlab. The larger the datasets and the more computationally intense the task, the more low-level languages like C++ or Fortran are best.

But across subfields and concentrations Python is almost always the second best language. Also Python kills for rapid prototyping and testing a concept on a small scale before the much more labor intensive porting to a lower-level language.


http://arstechnica.com/science/2014/05/scientific-computings... suggests it's probably still Fortran. The new breed of contenders: Haskell (new?), Clojure, or Julia.


Depends entirely on what you are doing. Basically, for any given task you'll probably find a language better suited than Python, but it will be a different language for each task. That is why Python excels: no other language is "good enough" for so many different things.


R I thought.


If python is good enough to replace R or Matlab for you, then you are using a negligible fraction of what those platforms have to offer.

R is a lot like vim or javascript. It has a lot of warts, but it's an incredibly expressive toolkit for its task. I usually buy into a language once I find a few extremely gifted developers working with it (and who seem to do so voluntarily). For instance: for vim it's Tim Pope, for R, it's Hadley Wickham, for javascript it's Mike Bostock.

Python, despite its many good decisions, is likewise full of warts. So, who are good developers to follow in the python/numpy community?


> who are good developers to follow in the python/numpy community?

Off the top of my head: Travis Oliphant, creator of numpy and founder of two companies in the scientific Python space; Jake Vanderplas, enthusiastic developer and blogger; and Fernando Perez, creator of IPython (disclaimer: I work for Fernando). In the broader Python world, Kenneth Reitz, the author of requests.


That's how idiomatic one-way-to-do-it generic languages slowly win. They raise the bar for the specialty languages so that, for example, after R the next statistical language really, really needs to shine in what it does in order to justify the trouble of learning another language over using a Python with a statistics library.


There's a catch though - once you start using numpy/scipy, there's no longer one-way-to-do-it and code becomes pretty messy.



Thanks. Burying as dupe.


I'm seeing this phenomenon in my own work. I do a fair amount of computational stuff (in the old sense of computing something, as opposed to just using a computer), and I find myself gravitating more & more to Python.

Performance is, of course, an issue. But in one case I noted that the week or so it took me to write and execute some Python code was almost certainly less time than it would have taken just to write the code in C/C++. I concluded that -- for this problem at least -- the extra performance I might have gotten out of C++ simply did not matter.

However, these days I do not do much with large scientific models or anything resembling big data. Performance is not the issue for me that it would be for others. And with articles having titles like "Why Python is Slow: Looking under the Hood"[1] out there, I find it difficult to see how Python can displace Fortran (or maybe C) in the realm of traditional supercomputing.

[1] https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow...


remember there are ways to squeeze performance out of python, projects like cython for example. you can also make a python library in C... I don't know how practical it is though


If you have a C function, calling it from python is trivial. It even has great support for passing numpy arrays to functions that expect C arrays, and taking C arrays and treating them like numpy arrays.


> If you have a C function, calling it from python is trivial.

with this ?

https://docs.python.org/2/extending/extending.html


I use ctypes with numpy.ctypeslib. The end results aren't as robust as 'real' python extensions (you have to keep track of types yourself and write your own python wrappers), but it is much quicker and easier to get started. You don't have to learn anything about the python API, your C code remains 'pure' since it doesn't have to know anything about python, and there is almost no boilerplate. For simple one-off bindings to C functions it's great, especially if the data you're passing back and forth are numpy arrays.
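
For anyone who hasn't seen it, here is a minimal sketch of that ctypes workflow. It is not the poster's code. To stay stdlib-only it calls the C library's qsort on a plain ctypes array instead of a numpy array (with numpy you would typically declare the argument with numpy.ctypeslib.ndpointer instead). It assumes a POSIX system where ctypes.CDLL(None) exposes libc; the names c_sort, CompareFn and _compare are made up for illustration.

```python
import ctypes

# Assumption: a POSIX system, where CDLL(None) exposes the C library
# that is already loaded into the Python process.
libc = ctypes.CDLL(None)

# qsort's comparator is int (*)(const void *, const void *);
# here we choose to view both arguments as pointers to double.
CompareFn = ctypes.CFUNCTYPE(
    ctypes.c_int,
    ctypes.POINTER(ctypes.c_double),
    ctypes.POINTER(ctypes.c_double),
)

# "Keeping track of types yourself": the C signature is declared by hand.
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CompareFn]
libc.qsort.restype = None


def _compare(a, b):
    # Dereference the two double pointers and return -1, 0 or 1.
    x, y = a[0], b[0]
    return (x > y) - (x < y)


def c_sort(values):
    """Hand-written wrapper: copy into a C array, sort it in C, copy back."""
    n = len(values)
    buf = (ctypes.c_double * n)(*values)
    libc.qsort(buf, n, ctypes.sizeof(ctypes.c_double), CompareFn(_compare))
    return list(buf)


print(c_sort([3.0, 1.0, 2.0]))
```

The C side knows nothing about Python. All of the glue is the argtypes/restype declaration plus one small wrapper function, and nothing checks that the declaration matches the real C signature.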

There is also CFFI, which I haven't really used, but lots of people seem to like it.


I think the author really needs to define scientific computing. Some people would think of that as simulations, clusters, astronomy, epidemiology and fluid dynamics. That may be typified by a scripting-language wrapper (including Python) around C, C++ or even Fortran programs.

He himself seems to be talking more about standard stats and data analysis or prediction. In this field R is growing at least as fast as Python. R is the no. 1 Kaggle language and the no. 1 academic stats language, with Python 2nd and stable there. The Software Carpentry movement to train scientists concentrates on both R and Python.

Then the biggest scientific programming field is bioinformatics, which really is a mishmash of Python, Perl, Java, C++, whatever piped CLI tools, and a lot of R again. Here Python is growing mostly at the expense of Perl (there is a big Perl legacy), but as most software is designed for piping together in Bash scripts, the diversity is not too big a deal. My own perusal of job adverts in this area sees employers asking for "R, Perl or Python", which are viewed as interchangeable.


But he also talks about document parsing and web dev. I think Python finds its edge when you have to combine things on the sciencey/mathy side of things with more user-facing software development. As the author said, it's nice not having to code switch between a dozen best-in-field languages, when Python is highly capable across the board. Besides maybe C# and maybe Java, what other languages are as versatile?


True, but have you seen most academic websites? They really aren't that into web dev. They are mainly concerned with static publication. But yes, Python is growing in this area and generally as command line scripting glue. I'm not sure it's got all our lunch yet though.


The easy objection is that none of what he's doing actually depends on Python; those libraries could have been as easily linked to Javascript, Ruby, Lua, Clojure, etc. The critical point: NumPy does not really take advantage of any specific feature of Python other than its popularity. Python isn't really taking over when most of the actual code that runs was written in C, but it's the best glue right now.

But really, just this second part is the point. Why wasn't this post written five years ago, when Python had already become a popular and established language? Well, if you look at it a little differently, it took roughly five years from the time Python took over undergrad CS courses to the time it became the lingua franca of scientific computing -- see a connection?


This is a partial example of worse-is-better. No one wants to have 10 languages in their toolbelt.

I personally find I use C# for almost everything. You can find good libraries for virtually everything you need. And it has the best tooling I've found. Is it actually the best language for everything? Absolutely not, but it's close enough that I'm probably the most productive in virtually everything with it.


Python is such a ubiquitous language, even outside of scientific computing. It seems like mobile dev is the only area where it's not present.


Python faces the same problem that Clojure does on mobile: start-up times are too slow and battery use is too high.

People have made valiant efforts to overcome this issue in both Python and Clojure: Kivy, Py4A and Clojure on Android, in particular. But they still can't provide reasonable performance compared to native mobile frameworks.

It's just a product of Python's implementation: too much complexity and too heavyweight objects (and almost everything is an object!).
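
To put a rough number on "heavyweight objects", here is a small sketch, assuming CPython on a 64-bit machine. It compares a list of boxed Python ints with an array.array of raw 64-bit integers. The variable names are only for illustration, and the boxed figure is an upper bound, since CPython shares small int objects.

```python
import sys
from array import array

N = 1000
boxed = list(range(N))          # a list of pointers to full int objects
packed = array("q", range(N))   # raw 64-bit integers, laid out like a C array

# Boxed cost: every int object plus the list that points at them.
boxed_bytes = sum(sys.getsizeof(v) for v in boxed) + sys.getsizeof(boxed)
# Packed cost: the array object including its data buffer.
packed_bytes = sys.getsizeof(packed)

print("one int object:", sys.getsizeof(1), "bytes")
print("one packed item:", packed.itemsize, "bytes")
print("boxed total:", boxed_bytes, "packed total:", packed_bytes)
```

The exact figures vary by Python version, but a single int object costs several times the 8 bytes of the raw value, before any interpreter or start-up overhead is counted.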

To a certain extent, the same applies to Clojure -- there is a price to be paid for abstractions and (in the case of Clojure) immutability, and it rears its ugly head the most when trying to fit these language implementations on resource-constrained mobile platforms.


The whole freesmartphone.org middleware was initially implemented in Python and there is a bunch of Python apps in distros like SHR.

Python is great for mobile when it comes to hacking on-the-fly - I actually tracked down and fixed a few bugs in early Openmoko while getting bored on the tram etc., thanks to the most useful parts being implemented in Python. Unfortunately, on the Freerunner the difference in performance was very visible - however, it shouldn't be as bad on more recent hardware.


The Kivy project seems to be making progress recently. However, I believe it still has a limited scope. It has poor support for native framework features/UI.


I bought an Android 3.x tablet in rapt anticipation of using the Android Python interpreter, but what killed me was a lack of packages that I could easily deploy and use, most notably Tkinter.


If I analyzed the toolset I used years ago vs. now, I'd conclude the world is moving from Python to Haskell :)


I wonder what it'll take for JavaScript to start taking over Python's niches. Is there a numeric library (like Scipy/pandas) hooked up to asm.js yet? I can imagine that being faster than Python.


Javascript will need to fix the mess that is its numeric types. As it currently is, you either need a bignum library with an ungodly syntax, or you are open to losing precision above a certain level, without warning. I tried solving Euler problems with JS, but it's so painful I switched back to Python after a few.
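
The silent precision loss can be reproduced from Python, since a Python float is an IEEE 754 double, the same format as a JavaScript Number. A minimal sketch (LIMIT is just an illustrative name for 2^53, the first point where doubles can no longer represent every integer):

```python
# A Python float is an IEEE 754 double, the same format as a JavaScript Number.
LIMIT = 2 ** 53

exact = LIMIT + 1                 # Python int: arbitrary precision, no loss
as_double = float(LIMIT) + 1.0    # double arithmetic: the +1 is rounded away

print(exact)            # 9007199254740993
print(int(as_double))   # 9007199254740992
print(as_double == float(LIMIT))  # True, and no warning is raised
```

Python's plain ints avoid the problem without any special syntax, which is a big part of why Euler-style problems are more comfortable there.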


The value objects proposal for es6/es7 might fix that:

http://wiki.ecmascript.org/doku.php?id=strawman:value_object...


> Is there a numeric library (like Scipy/pandas) hooked up to asm.js yet? I can imagine that being faster than Python.

Faster than Python using LAPACK and other native libs?


CoffeeScript proved to a lot of people how much more enjoyable JS could be with some of the lessons in language ergonomics learned from Python and Ruby. I think that ES6 really reflects this. With some of the long-awaited improvements, like better lambdas, generators, Set/Map, assignment sugar, better metaprogramming of objects, and standard-blessed implementations of promises and classes, I think that JS is finally close to being able to compete in the multi-paradigm space. Sweet.js (or some other macro layer) will also really help in letting people develop new syntactic abstractions for domain-specific stuff. It'll be interesting to see how it all develops.


Python is good for scientific computing. But this is one person's experience. Users of other languages may sleep calm.


Isn't Golang slowly eating Python's stolen lunch though?


In scientific computing? No way. I'd expect something like Julia to overtake Python before golang does.


Not outside the HN bubble


I don't think so, but it should be siphoning off people who use Cython.


::shudders::


I hate Cython that much. I mean, find it non-optimal in a disagreeable way.


What about Zope/Plone?

Not good enough, or any other reason?

I'm curious.


In the early 2000s, the Zope community felt that Zope 2.x had grown long in the tooth, so they decided to create a new, modern code base that they called "Zope 3," initially released in 2004.

Zope 3 was not backward compatible with Zope 2.x, nor with the impressive ecosystem of Zope 2.x plugins, and for years there was confusion about the direction, momentum, and relative importance of those two parallel projects. Zope 3 never gained any significant traction, because in adopting a "component architecture" they decided to use XML to connect those individual components, and it felt like you spent more time writing XML than Python. IMO this was a strategic mistake.

In 2005 the Django framework was released. In 2006, Ruby on Rails was released. After a couple years of confusion about the Zope roadmap, developers now had multiple options. You couldn't sell management on new Zope 2 apps, Zope 3 wasn't ready for prime time, and Rails was a significantly(!) more productive environment than Zope 2. (Django presumably is, too, but I have no Django experience and so cannot say.)

In 2010, the Zope community renamed "Zope 3" to "bluebream" to clarify their messaging, but that was after six years of ambiguity. Developers moved to other tools and frameworks, and Zope's developer community shrank until it no longer had a critical mass of developer interest.


Thank you for that history lesson!



