Python is fast enough until it isn't, and then there are no simple alternatives.
If your problem is numerical in nature, you can call popular C modules (NumPy, etc.) or write your own.
If your functions and data are picklable, you can use multiprocessing, but you run into Amdahl's Law.
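To put a number on the Amdahl's Law ceiling mentioned above, here is a minimal sketch. The function name `amdahl_speedup` and the 90% parallel fraction are illustrative assumptions, not measurements from any real workload.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only part of a job can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Assumed example: 90% of the runtime parallelises, the rest stays serial
# (pickling arguments, collecting results, and so on).
for workers in (2, 4, 8, 64):
    print(workers, round(amdahl_speedup(0.9, workers), 2))
```

However many worker processes you add, the speedup in this assumed case stays below 10x, because the serial 10% never shrinks.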
Maybe you try Celery or Gearman, which introduces IO bottlenecks when transferring data to workers.
Otherwise you might end up with PyPy (poor CPython extension module support) and still be restricted by the GIL. Or you'll try Cython, a bastard of C and Python.
Python has been my primary language the past few years and it's great for exploratory coding, prototypes, or smaller projects. However it's starting to lose some of the charm. Julia is filling in as a great substitute for scientific coding in a single language stack, and Go / Rust / Haskell for the other stuff. I've switched back to the static language camp after working in a multi-MLOC Python codebase.
> Julia is filling in as a great substitute for scientific coding in a single language stack, and Go / Rust / Haskell for the other stuff.
I've been wondering why so many Python devs have migrated to Go recently instead of Julia, given that Julia is a lot closer to Python and has performed as well as, if not better than, Go in some benchmarks [1]. Granted, I've really only toyed with Julia and Go a few times, as I've never really needed the performance much myself, but I'm curious about your preference for Go/Rust over Julia for "the other stuff".
What would you say makes Julia less suitable (or Go more suitable) for nonscientific applications? Is it just the community/support aspect? Cause that seems like an easy tide to overturn by simply raising more awareness about it (we see Go/Rust/Haskell blog posts on the front page of HN every week, but not too many Julia posts).
Just curious cause I'm not nearly experienced enough with any of these young languages yet to know any better, and have only recently started to consider taking at least one of them up more seriously.
The sorts of Python developers who have migrated to Go are not the sorts of Python developers who use it for scientific purposes. Julia is rather targeted towards the scientific crowd in its marketing[0], so it's likely not even on most Python developers' radars. Marketing means a lot.
You can write Julia code in an IPython notebook, which is great. And (as I recently learnt) there's a PyCall package to easily call Python functions from within Julia, so you can take advantage of Python packages too.
I'm yet to try Go, but Julia is great for me: it's like all the good parts of Python + extra speed. You get the interactivity, the good documentation, the community, the packages, and all bundled into something that's easy to use and gives you code that runs many times faster than native Python with no real extra effort.
Static typing is a boon when refactoring large codebases, even with >90% test coverage.
I'm migrating an in-house ORM to SQLAlchemy. Lack of compiler support and/or static code analysis makes the transition more difficult than it needs to be.
Dynamic typing allows one to defer error handling to the future, essentially creating technical debt for the sake of developer speed and convenience. For many use cases this is an acceptable trade off.
However, as a codebase grows in complexity, it's better to handle errors as early as possible, since the cost of fixing an error grows exponentially the farther it is from the developer (costs in ascending order: editor < compiler < testing < code review < production).
A very large Smalltalk application was developed at Cargill to support the operation of grain elevators and the associated commodity trading activities. The Smalltalk client application has 385 windows and over 5,000 classes. About 2,000 classes in this application interacted with an early (circa 1993) data access framework. The framework dynamically performed a mapping of object attributes to data table columns.
Analysis showed that although dynamic look up consumed 40% of the client execution time, it was unnecessary.
A new data layer interface was developed that required the business class to provide the object attribute to column mapping in an explicitly coded method. Testing showed that this interface was orders of magnitude faster. The issue was how to change the 2,100 business class users of the data layer.
A large application under development cannot freeze code while a transformation of an interface is constructed and tested. We had to construct and test the transformations in a parallel branch of the code repository from the main development stream. When the transformation was fully tested, then it was applied to the main code stream in a single operation.
Less than 35 bugs were found in the 17,100 changes. All of the bugs were quickly resolved in a three-week period.
If the changes were done manually we estimate that it would have taken 8,500 hours, compared with 235 hours to develop the transformation rules.
The task was completed in 3% of the expected time by using Rewrite Rules. This is an improvement by a factor of 36.
from “Transformation of an application data layer” Will Loew-Blosser OOPSLA 2002
As a Pythoner previously I have big interest in learning Julia and it indeed looks nice. But based on some small project I tried, it didn't show a significantly improved performance vs. Python. On the other hand, it doesn't have the whole Python ecosystem, which is super nice for anybody who doesn't want to reinvent the wheel. Therefore, I don't have a strong motivation to switch from Python to it for algorithm/learning-related tasks.
Moreover, the benchmarks you listed here are still mostly related to scientific computation. Just from the benchmarks, I'm not convinced that Julia beats Go for backend systems, which is what I primarily use Go for.
Go and Julia really serve totally different purposes. Go wants to replace Python in the backend, where the Python ecosystem matters little. Julia wants to replace Python in scientific computing, where the Python ecosystem matters a LOT.
> As a Pythoner previously I have big interest in learning Julia and it indeed looks nice. But based on some small project I tried, it didn't show a significantly improved performance vs. Python.
Are you sure you were using the language properly?
Some constructs are still slow, and others prevent type inference, which can also slow things down...
- It doesn't have Google / Thompson / Rsc / etc behind it
- It looks more like Ruby than C (say what you will about C-like syntax, but I think the success of Java and C++ has proved that point)
I can't find the source, but I recall Rob Pike talking about how the Go language and syntax are designed to "scale". Naturally, Mr. Pike would opt for a C style syntax, but with tools like go fmt and the package manager built in to the tooling from day 1, he kind of has a point.
I'd rather expand the numerics support/coverage in Haskell, get a nice and easy-breezy type system, nicer refactoring/static analysis, and a more powerful language.
If you think of CPU cycles as currency, "fast enough" shows its true colors: if you're profligate in your spending of CPU cycles, you simply don't have any left when you really do need them.
Living CPU paycheck to paycheck and on occasion taking performance payday loans (breaking out to C) is not an efficient way to manage resources.
Given the cost of CPU cycles vs Developers I don't think this analogy works.
Sure, you can hire an extra C++ developer for 150k or just give your Python developer a company credit card to use for extra AWS machines. I'm quite sure the second option is a lot cheaper.
There are actually two common fallacies here; I'll try to handle them separately.
CPU Cycles vs Developers
This is a common false dichotomy: that expending more CPU cycles on the language runtime makes a language more efficient in terms of developer productivity than a language that expends fewer CPU cycles on its runtime.
The inherent truth of this statement isn't deductively obvious, however.
If you assume, for example, that dynamic languages are more productive for developers (I don't, but it's a common argument, so we'll go with it), then this is easily disproven simply by comparing the performance of a highly optimized JIT -- such as V8's -- against Python's interpreter.
Irrespective of the language, there exist differing levels of runtime quality. V8 is faster than Python; this is simply because V8 is a better JIT runtime than Python is an interpreter.
If we discard the unproven "dynamic languages are more developer efficient" hypothesis, things become even more stark. The JVM, for example, with real threads, highly optimized JIT, and the advantages of operating on a much more well-typed system, is in fact faster than V8 -- all with a "managed" language, and not C++.
Taking that a step further, we've recently begun rediscovering ahead-of-time compilation of so-called "managed" or "high-level" languages. The favoritism given towards JIT arose out of efforts to achieve high performance in dynamic languages where very little can be statically guaranteed. What has recently become clear is this: in more static languages, we can achieve the same level of "managed" runtime without introducing the overhead or complexity of JIT at all!
All combined, I see very little argument for a dichotomous choice between "inefficient, high level language" or "efficient, low level language" -- the choices seem to simply be "inefficient" vs "efficient".
Relative Value of High-Paid Developers
Lastly, I wish to address the "extra C++ developer for 150K". I'll keep this one brief -- simply put, I would hypothesize that a $150K expert-level engineer is worth anywhere from 2-10 $80K non-expert engineers.
This is simply due to an expert-level engineer's experience and deep knowledge of the technology stack allowing them to architect systems to achieve maximum maintenance, developer and system efficiency over time.
Computing is a value multiplier: the potential gains and losses of lower multipliers can be objectively enormous.
Having managed teams where I've inherited cheaper, more junior engineers, versus teams where I've hand-picked a small group of extremely experienced engineers, I've saved time, money, and headaches with the more expensive engineers every time.
Very interesting response. I strongly agree with most of it.
However, I think there's another false dichotomy there: a $150K expert-level engineer versus $80K non-expert engineers. In reality, there are only expert and non-expert engineers; pay doesn't seem to be a particularly good discriminator. Further, in all respects it is difficult to tell the difference between an expert and a non-expert, hence all the fun interviewing follies. And even the best of those don't work particularly well.
Thanks for your reply; I concur with both of your assertions.
When it comes to differentiating expertise, we often have to look for secondary indicators; pay has an extremely poor correlation with competence, especially in high-demand job markets.
Besides Julia, I think another alternate language to Python for scientific computation would be Scala. Breeze (from ScalaNLP project) is an effort to bring Numpy and Matlab syntax to Scala: https://github.com/scalanlp/breeze/wiki/Breeze-Linear-Algebr...
This is actually not true. All Fortran routines are packaged with SciPy, as one of the explicit goals of NumPy is to be installable without a Fortran compiler.
Do you have any specific publicly available code which you claim is too slow in Python? Otherwise, I don't believe you.
For example, hg is now faster than git. Git is written by great C hackers (including Linus no less), and yet hg is faster than git. See the Facebook benchmarks for evidence.
Python has static checking of interfaces, and types if you want. It also has IDEs which can check a lot of things for you. It turns out that dynamically typed languages can be checked for quite a lot of things.
Check out using mmap based datastructures to do shared memory between workers.
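As a rough illustration of the mmap suggestion above, here is a minimal sketch of a fixed-size array of integers backed by a memory-mapped file. The helper names (`create_shared_file`, `open_shared`, `write_slot`, `read_slot`) and the slot layout are assumptions made for this example. For brevity it opens two mappings in one process; separate worker processes would each call `open_shared` on the same path.

```python
import mmap
import os
import struct
import tempfile

SLOT = struct.Struct("<q")  # one signed 64-bit integer per slot


def create_shared_file(n_slots):
    """Create a zero-filled backing file big enough for n_slots integers."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\x00" * (SLOT.size * n_slots))
    return path


def open_shared(path):
    """Map the whole file; the default mapping is shared, so writes made
    through one mapping are visible through every other mapping of it."""
    f = open(path, "r+b")
    return f, mmap.mmap(f.fileno(), 0)


def write_slot(buf, index, value):
    SLOT.pack_into(buf, index * SLOT.size, value)


def read_slot(buf, index):
    return SLOT.unpack_from(buf, index * SLOT.size)[0]


# Two independent mappings of the same file stand in for two workers.
path = create_shared_file(4)
f_writer, writer = open_shared(path)
f_reader, reader = open_shared(path)
write_slot(writer, 2, 12345)
print(read_slot(reader, 2))
for handle in (writer, reader, f_writer, f_reader):
    handle.close()
os.remove(path)
```

Nothing is pickled or copied between the two mappings, which is the point of the approach; any locking between real workers is left out of this sketch.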
'And yet, statically typed languages can still be checked for a lot more things.'
What is your evidence? Can you provide a citation?
Modern tools can statically check python for a lot of things.
Here are some things you can statically check with python:
* implements an interface
* unused names
* assigned but never used
There are hundreds more things you can check for with tools like pylint and some of the IDEs.
That 'implements an interface' one is important, since with interfaces you can specify things like return types and type arguments.
There are also @param markers in docstrings, where you can specify argument and return types. These can be used by tools like IDEs (e.g. PyCharm) and other static checkers.
Full program type inference is now doable for large parts of Python. See the Shed Skin and RPython restricted subsets for two implementations.
Using a Numpy-based program as an example of how Python can be fast is a bit strange.
It shows that Python can be fast enough if you leave the heavy computational parts to libraries written in other languages. Which is interesting, but doesn't say much about the speed of Python itself.
Yet, numpy exists for python, so why can't you use it in a comparison? Why can't you use libraries for languages? The python approach is all about gluing bits together, not staying within a pure walled garden.
Also, python is not an implementation of python.
These benchmarks are always funny, because real systems use different components, yet the benchmarks stick to some fake, non-real-world way of measuring.
Oh, the garbage collection in Java pauses for multiple seconds sometimes... but it's not slow because we ignore that in our benchmarks. Oh, it's not fast the first time, because the JIT hasn't warmed up? Let's ignore that in our benchmarks too. Um... yeah. Good one.
This benchmark is also flawed, since people would probably use numexpr in the real world, which is much faster than plain NumPy. So Python would be even faster than they say.
Mercurial (hg) is now faster than git. Git is written by great C hackers (including Linus no less), and yet hg is faster than git. See the Facebook benchmarks for evidence.
Using the right tool for the job can mean using multiple languages together for where they are best. Want clarity and performance? Then C/asm + python is an ok combination.
Well, this particular example shows instead that the overhead of calling libraries written in other languages can be so large that a pure Python solution can be faster. The current top answer to the question shows a pure Python version that is about 20 times faster; commenters explain that the arrays are so small that the overhead of calling NumPy outweighs any advantage NumPy's speed gives.
Measuring implementations of the same algorithm is how you benchmark algorithms, not a language as a whole. If a language or library allows for enhancements based on its strengths (in this case, the ability to code in early exits), those are perfectly valid in benchmarking languages and libraries.
Put another way, one of the benefits of Python (and drawbacks of using an external library) is that you have more control over the algorithm and exactly what it does.
That said, a sample size of one will hardly give you an accurate picture.
Python itself includes all the options: PyPy, Cython, Psyco, NumPy and other C extensions, Jython and more. It's all Python. That is one of the strengths of Python.
Would you exclude use of stdlib parts that are written in C? Would you say Javascript can't run serverside cause Node.js is not part of your imagined "core language"? Or it doesn't say much about the speed of C when you use a better/more optimized compiler or compiler flags?
It isn't strange, it's standard practice. What would be strange is to force Python to hold one hand behind its back and be used in a totally unrealistic way that doesn't reflect normal practice. And by strange I mean basically dishonest about the performance available when using Python.
Python apps CAN be fast. Python itself isn't. It's dishonest to do a NumPy benchmark to show how "Python" is fast if the article implies that all code written in pure Python will be as fast. Then newbies will come, try to write fast algorithms, and find out they have to rewrite them in a faster language to get that advertised performance.
If someone wants to do the kinds of computations done by NumPy, what's relevant is how fast that actual practice is. Not some stupid alternative implementation intended to show that Python is slow, but which nobody would ever actually use.
In other words, it's nonsense to discard Python with NumPy as an option for fast computations just because we can imagine an imaginary world where NumPy was written differently, purely to penalize this environment for having used more than one language.
To be honest, we should benchmark what we actually have, not imaginary made-up handicapped versions of the things we are judging.
You only need to know C to get fast Python code for a tiny minority of situations. For one, most interesting C wrappers have already been written and second, the majority of C code you will use in normal business software contexts isn't used by FFI.
For a normal web application the request flow goes something like: nginx (WSGI) -> uwsgi -> web application code -> psycopg2 driver -> postgres. Only the web application part is written in Python, so for practical purposes you actually have a C stack that uses Python for business logic. Loads of libraries, from json parsers to templating libraries include optional C code for speedups.
In the database? About 90% of bottlenecks boil down to something along the lines of "this query is slow" or "there are a lot of queries here". This has been true since the dotcom boom.
The whole rise of memcached and NoSQL should pretty clearly indicate that many developers are finding their database to be the bottleneck.
There's much less of a push for high performance languages, even though there are many that are also quite nice to work with (eg, Clojure). Since this is a Python discussion, searching for "Django PyPy" and "Django NoSQL" should be instructive.
You're combining a false dichotomy with snark, which really shouldn't have a place here on HN.
Those are fairly useless synthetic benchmarks. How many apps do you have in production that are purely key-value stores with a random access profile?
And if you really have these performance characteristics (hint: it's unlikely), maybe it's time to get an edge-caching CDN? Cutting 400ms of latency as you cross from Amsterdam to California is going to be the easiest performance boost you ever got.
The point of these benchmarks is that, given the same database and the same queries, languages have different performance profiles.
If I build an application in Java and Python, with the same database backend, running the exact same set of SQL queries, the Java application should perform better.
If you look at those benchmarks, for the same db queries on the same db backend, throughput and latency can differ by an order of magnitude. We're talking the difference between 200 or 2000 requests per second, or 50ms vs 500ms latency. That's not a rounding error.
The ubiquitous implementation of Python is written in C. So, for example, the list-sorting algorithm is written in C. If you want to eliminate C from the equation completely, I guess we now have to rewrite the C sort algorithm to be slower to show off "how slow Python really is" in isolation from C. But this is a benchmark of a fantasy world. That Python isn't the real Python we use. In the real world Python is not in isolation from C, by design, and this works well.
No. The language in which the interpreter is written is not the point here. The point is the interpretation overhead of the best (or "only" if we're being pedantic since Python is de facto whatever CPython does) interpreter.
That overhead is the price you pay for using Python so it's what matters when you compare languages for CPU bound projects.
It's a little strange to insist that CPython is the only Python interpreter and then reject all the C libraries that you can access for performance (because those C libraries are the big loss when shifting from CPython 2.7 to PyPy).
I think interpreter overhead and language ecosystem performance are probably best looked at as separate questions.
Anecdotal data point: For a great many years I used to consider Python a very slow language. Then I switched to Ruby and realized how slow a language can really be and yet still be practical for many use cases.
I suppose it depends on how you define "that long ago" and "mainline implementations". From my point of view it's coming along to about half a decade, but I guess 1.8's remained popular in many places as recently as 2-3 years ago.
The difference does not primarily stem from raw interpreter performance but rather from the different community mindsets and resulting ecosystems.
The average code-quality in rubygems is just not very high. Consequently most libraries are completely oblivious to performance aspects.
This reaches straight into the core infrastructure (rubygems, bundler) and all major projects (Rails). Leading to the simple fact that Ruby loses out on many practical benchmarks before your script has even loaded.
Likewise, the synergies of less-than-bright behaviors from all the gems in your average Rails project (and not least Rails itself) do indeed make the performance gap towards an average Django project much larger than the mere difference in sheer interpreter performance.
None of that is meant to bash Ruby, anyway. It's a trade-off that many others and I are willing to make, for the convenience that Ruby provides after it has finally set its fat belly into motion.
But let's not pretend these differences don't exist when everyone who has ever used both languages knows them all too well.
Obviously, this is a silly benchmark and we should stop giving it any credit.
However, even "real world" anecdotes in this area can be a minefield.
Take, for example, an existing Python application that's slow and requires a rewrite to fix fundamental architectural problems.
Because you feel you don't necessarily need the flexibility of Python the second time around (as you've moved out of the experimental or exploratory phase of development), you decide to rewrite it in, say, Go, or D or $whatever.
The finished result turns out to be 100X faster—which is great!—but the danger is always there that you internalise or condense that as "lamby rewrote Python system X in Go and it was 100X faster!"
I spend a lot of time debating program speed (mostly C vs MATLAB), but the problem is that the programming and compile time usually makes more of a difference than people consider.
If my C is 1000x faster and saves me 60 seconds every time I run the program, but takes an extra 2 days to write initially, and the program is seeing lots of edits, meaning that on average I have to wait 2 minutes for it to compile, then I am MUCH better off with the slower MATLAB until I am running the same thing a few thousand times.
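The break-even point in that trade-off is simple arithmetic. A minimal sketch follows, assuming the "2 days" are full 24-hour days and ignoring the per-edit compile wait; both are assumptions for illustration, and `break_even_runs` is a made-up helper name.

```python
def break_even_runs(extra_dev_seconds, seconds_saved_per_run):
    """Number of runs needed before the faster program repays its extra cost."""
    return extra_dev_seconds / seconds_saved_per_run


# Assumption: "2 days" means 2 * 24 hours of extra development time.
extra_dev_seconds = 2 * 24 * 60 * 60
print(break_even_runs(extra_dev_seconds, 60))  # 2880.0
```

Under those assumptions the C version only pays off after roughly 2,900 runs, which matches the "few thousand times" estimate; counting 8-hour working days instead would lower the figure to 960.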
Plus there is the fact that I can look at HN while a slightly slower program is running, so I win both ways.
I think a lot of that delta is going to prove to have been an accident of history, though. In the past 10-15 years, we've had a lot of "dynamic" languages, which have hit a major speed limit (see another comment I made in this discussion about how languages really do seem to have implementation speed limits). Using a "dynamic" language from the 1990s has been easier than using gussied-up static 1970s tech for a quick prototype, but what if the real difference has more to do with the fact that the 1990s tech simply has more experience behind the design, rather than an inherent ease-of-use advantage?
It's not hard to imagine a world where you instead use Haskell, prototyping your code in GHCi or even just writing it in Haskell directly, pay a minimal speed penalty for development since you're not being forced to use a klunky type system, and get compiled speeds or even GPGPU execution straight out of the box. (And before anyone freaks out about Haskell, using it for numeric computations requires pretty much zero knowledge about anything exotic... it's pretty straightforward.) It's not out of the question that using Haskell in this way would prototype even faster than a dynamic language, because when it gives you a type error at compile time rather than at runtime, or worse, running a nonsense computation that you only discover afterwards was nonsense, you could save a lot of time.
I don't think there has to be an inherent penalty to develop with native-speed tech... I think it's just how history went.
> In the past 10-15 years, we've had a lot of "dynamic" languages, which have hit a major speed limit
Exactly. I think that is correlated very well with single core CPU speedups.
Remember when Python was rising the fastest, single core CPU speed was also pretty much doubling every year. SMP machines were exotic beasts for most developers back then.
So just waiting for 2-3 years you got very nice speedup and Python ran correspondingly faster (and fast enough!).
Then we started to see multiple cores, hyperthreads, and so on. That is when talk about the GIL started. Before that nobody cared about the GIL much. But at some point, it was all GIL, GIL, GIL.
> It's not hard to imagine a world where you instead use Haskell
Hmm interesting. I wonder if that approach is ever taken in a curriculum. Teach kids to start with Haskell. It would be interesting.
I share that theory. Part of what led me down this road was when I metaphorically looked around about two years ago and realized my code wasn't speeding up anymore. Prior to that I'd never deeply thought about the "language implementation speed is not language speed" dogma line, but just accepted the sophomoric party line.
"Hmm interesting. I wonder if that approach is ever taken in a curriculum. Teach kids to start with Haskell. It would be interesting."
To be clear, I was explicitly discussing the "heavy-duty numerical computation" case, where typing as strong as Haskell's isn't even that hard. Learn some magic incantations for loading and saving data, and it would be easy to concentrate on just the manipulations.
But yes, people have done this and anecdotally report significant success. The Google search "Haskell children" (no quotes in the real search) comes up with what I know about, so I'll include that in this post by reference. It provides support for the theory that Haskell is not that intrinsically hard, it's just so foreign to what people know. If you don't start out knowing anything, it's not that weird.
Makes sense if you are the only person running your programs (and you are allowed to ignore things like hardware and power costs).
Also, 2 minutes per change to compile the object files affected and link the executable seems a bit excessive considering the entire Linux kernel can generally be built from scratch in less time than that (assuming a modern system).
You didn't read the thread. The OP's code used very small arrays, and using NumPy was slowing the code down by an order of magnitude. The pure Python solution is 17x faster.
It's really important to remember that the interface of a VM can be one of the slowest parts. When your LuaJIT code is making a ton of calls to tiny C functions, it's gaining hardly any benefit from the JIT.
Yeah, things like marshaling data across the interface barrier, or chasing the pointer indirections inherent in calling functions, can have a significant cost. Usually it isn't significant enough, but as always the devil is in the details.
Your other post [1] is marked as dead. The filters here will do that to duplicate posts (maybe within a time limit?). You may want to reply to that comment again with a rephrasing of this post rather than the same content.
NumPy contains no Fortran code, and can be compiled without a Fortran compiler. What NumPy does provide is f2py, which is used to compile SciPy, which does have Fortran code.
I have found on this page: http://docs.scipy.org/doc/numpy/user/install.html following quote: "Various NumPy modules use FORTRAN 77 libraries, so you’ll also need a FORTRAN 77 compiler installed." And, on the same page: "NumPy does not require any external linear algebra libraries to be installed. However, if these are available, NumPy’s setup script can detect them and use them for building. A number of different LAPACK library setups can be used, including optimized LAPACK libraries such as ATLAS, MKL or the Accelerate/vecLib framework on OS X." The linear algebra libraries mentioned are obviously Fortran libraries. So, it seems like I was wrong saying that Numpy modules are written in Fortran, but it's still not clear to me if only SciPy modules are the ones which call Fortran libs. I will delete my comment above, anyway.
LAPACK and BLAS have many implementations, some of which are written in Fortran, but as you've said external LAPACK and BLAS libraries are not necessary to successfully compile NumPy, though they do accelerate certain NumPy routines.
Why would you use Numpy for arrays that small? Oh, looks like someone actually just wrote it in CPython, no Numpy, and it clocked in at 0.283s. Which is fine. It's Python.
This thread reminds me of the scene in RoboCop where Peter Weller gets shot to pieces. Peter Weller is Python and the criminals are the other languages.
Judging by the top submission also being written in Python, I think this just shows how unoptimized the OP's original code was rather than how slow the language is.
Not that Python is fast; it isn't. And using NumPy seems a bit disingenuous anyway: "Oh, my Python program is faster because I use a library that's 95% C."
The same author previously posted this code as a question on Stack Overflow: http://stackoverflow.com/questions/23295642/ (but we didn't speed it up nearly as much as the Code Golf champions).
This sort of thing comes up a lot: people write mathematical code which is gratuitously inefficient, very often simply because they use a lot of loops, repeated computations, and improper data structures. So pretty much the same as any other language, plus the extra subtlety of knowing how and why to use NumPy (as it turned out, this was not a good time for it, though that was not obvious).
You can make this far faster by changing the data representation. You can represent S as a bit string so that if the i'th bit is 0 then S[i] = 1 and if the i'th bit is 1 then S[i] = -1. Let's call that bit string A. You can represent F as two bit strings B,C. If the i'th bit in B is 0 then F[i] = 0. If the i'th bit of B is 1 then if the i'th bit of C is 0 then F[i] = 1 else F[i] = -1. Now the whole thing can be expressed as parity((A & B) ^ C). The parity of a bit string can be computed efficiently with bit twiddling as well. Now the entire computation is in registers, no arrays required. The random generation is also much simpler, since we only need to generate random bit strings B,C and this is already directly what random generators give us. I wouldn't be surprised if this is 1000x faster than his Python.
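Here is a minimal sketch of the A/B/C encoding described above, in Python for readability rather than speed. Note the swap: instead of the parity expression, this sketch computes the full inner product of S and F with population counts, which can be checked directly against a plain loop. All function names are assumptions made for the example.

```python
import random


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def encode(S, F):
    """Pack S (entries 1/-1) into A and F (entries 0/1/-1) into B and C."""
    A = B = C = 0
    for i, (s, f) in enumerate(zip(S, F)):
        if s == -1:
            A |= 1 << i          # bit set means S[i] == -1
        if f != 0:
            B |= 1 << i          # bit set means F[i] is nonzero
            if f == -1:
                C |= 1 << i      # bit set means F[i] == -1
    return A, B, C


def inner_product_bits(A, B, C):
    """sum(S[i] * F[i]): each nonzero term is -1 exactly where A and C differ."""
    return popcount(B) - 2 * popcount((A ^ C) & B)


def inner_product_naive(S, F):
    return sum(s * f for s, f in zip(S, F))


S = [random.choice((1, -1)) for _ in range(16)]
F = [random.choice((0, 1, -1)) for _ in range(16)]
print(inner_product_bits(*encode(S, F)) == inner_product_naive(S, F))
```

In a compiled language the three masks fit in registers and the population count is typically a single instruction, which is where the claimed speedup would come from.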
It's really fast to develop in, and with NumPy/Pandas/Scipy it runs numerical models fairly fast too. You do have to spend time getting to know `cProfile` and `pstats`; saved over 80% on runtime of something the other day.
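For anyone who has not used the two modules mentioned above, a minimal sketch of the usual workflow follows. `slow_sum_of_squares` and `profile_call` are made-up names for illustration; only the `cProfile` and `pstats` calls come from the standard library.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop so that something shows up in the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time and show only the top five entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()


result, report = profile_call(slow_sum_of_squares, 100_000)
print(report)
```

The report lists call counts and per-function times, which is usually enough to see which function deserves attention first.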
I no longer accept the idea that languages don't have speeds. Languages place an upper bound on realistic speed. If this isn't true in theory, it certainly is true in practice. Python will forever be slower than C. If nothing else, any hypothetical Python implementation that blows the socks off of PyPy must still be executing code to verify that the fast paths are still valid and that nobody has added an unexpected method override to a particular object or something, which is an example of something in Python that makes it fundamentally slower than a language that does not permit that sort of thing.
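A small sketch of the kind of dynamism described above: a method can be replaced on the class, or shadowed on a single instance, at any time, so any fast path has to keep checking that its assumptions still hold. The `Point` class and `total_norm` function are made up for illustration.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm1(self):
        return abs(self.x) + abs(self.y)


def total_norm(points):
    # A runtime cannot assume p.norm1 is always Point.norm1.
    return sum(p.norm1() for p in points)


points = [Point(1, -2), Point(3, 4)]
before = total_norm(points)          # 3 + 7 = 10

# Replace the method on the class while the program is running.
Point.norm1 = lambda self: 0
after_class_patch = total_norm(points)      # 0 + 0 = 0

# Shadow it on one instance only; the instance attribute wins the lookup.
points[0].norm1 = lambda: 99
after_instance_patch = total_norm(points)   # 99 + 0 = 99

print(before, after_class_patch, after_instance_patch)
```

A language that forbids this kind of rebinding can resolve the call once at compile time; a Python implementation has to guard against it on every specialised path.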
The "misconception" may be the casual assumption that the runtimes we have today are necessarily the optimal runtimes, which is not generally true. But after the past 5-10 years, in which enormous amounts of effort have been poured into salvaging the run speeds of our "dynamic" languages (Python, JS, etc.), which has pretty much resulted in them flatlining around ~5 times slower than C with what strikes me as little realistic prospect of getting much lower than that, it's really getting time to admit that language design decisions do in fact impact the ultimate speed a language will be capable of running at. (For an example in the opposite direction, see LuaJIT, a "dynamic" language that due to careful design can often run at near-C speeds.)
(BTW, before someone jumps in, no, current Javascript VMs do NOT run at speeds comparable to C. This is a common misconception. On trivial code that manipulates numbers only you can get a particular benchmark to run at C speeds, but no current JS VM runs at C speeds in general, nor really comes even close. That's why we need asm.js... if JS VMs were already at C speeds you wouldn't be able to get such speed improvements from asm.js.)
This is also related to the "sufficiently smart compiler" argument - to have a Python compiler that can produce output equal to C, it would basically have to output nearly identical binaries to ones that an expert C programmer and compiler would produce. That looks to me to be an almost practically impossible task, since there are so many things you can do in C, which can improve performance drastically (e.g. avoiding dynamic allocation, pointer tricks, etc.) and simply can't be done in Python code, "Pythonic" or otherwise, and it would have to recognise and apply all of these rules. Even with C -> Asm which is a far more direct mapping, and on which a great deal of effort has been expended, the compilers still have nowhere near the intelligence to make use of the machine's instructions in a way an Asm programmer would.
It is always a matter of ROI, how far one is willing to invest, money and time, in a compiler/interpreter/JIT implementation for the use cases a language is targeted for.
As for the current state of native compilers for dynamic languages, they suffer from the fact that, past the Smalltalk/Lisp Machines days, very little focus has been given to them in the industry.
Hence little money for research, while optimizing compilers for statically typed languages were getting improved.
Dylan was the last dynamic language with an AOT compiler targeted at the industry as a systems programming language, before being canceled by Apple.
If it wasn't for the JavaScript JIT wars, probably the situation of compilers for dynamic languages would be much worse.
Lisp Machines never had sophisticated compilers. The compilers for the Lisp Machines were primitive in their capabilities. They compiled to a mostly stack architecture, where some speed was recovered in hardware from generic instructions. The work on better compilers for Lisp came mostly from other places: CMU for their Unix-based CMUCL, Franz with Allegro CL, Harlequin with LispWorks, Lucid had a very good compiler with Lucid CL, Dylan at Apple for early ARM, CLICC in Germany, SBCL as a clean up of CMUCL, Scieneer CL as a multi-core version of CMUCL, mocl as a version of CLICC for iOS/Android, ... plus a few special purpose compilers...
Once you add sophisticated compilation, dynamic language implementations are no longer 'dynamic'.
This topic is pretty much solved for Lisp. On one extreme we have fully dynamic interpreters, and then dynamic AOT-compiler-based implementations. For delivery there are static delivery modes available (for example with treeshakers as in LispWorks).
On the other extreme you get full program compilers like Stalin (for a subset of Scheme) or like mocl (a recent compiler for a static subset of Common Lisp, for iOS and Android).
"It is always a matter of ROI, how far one is willing to invest, money and time, in a compiler/interpreter/JIT implementation for the use cases a language is targeted for."
That's just a long way of agreeing with me. If one language can require a great deal more effort than another to run quickly (and I'm not assuming the slow language even makes it to the fast language's speed here), then it is therefore the case that language designs do impact the experienced run time performance. That's the entire point.
And given where the plateau on the dynamic languages seems to be getting drawn, I see no reason to even hope that a dynamic language will ever run as fast as C in general. They are plateauing way too high for that to be the case, probably even with infinite investment on current hardware.
Go has an extremely fast compiler. If they ever add modules to C/C++ they should get a big bump in speed too. A lot can be done to fix the slow compile cycle of some languages.
Summary: Question asker wrote a program in Python using numpy (A Python library that calls C code) which could've been more performant if written in pure Python (something to do with array sizes being used) and Python in general is slower than C/C++/Fortran/Rust. Anything else new?
Yet another attempt at a comparison scuttled by using randomness.
Different things use different types of randomness. Some are fast. Some are slow. If your comparison is not using the same type of randomness, that comparison is comparatively useless.
It's not being scuttled by randomness, since the times are so wildly different. My measurements of my Rust code indicate "only" 20% of the time is being spent in the RNG, ranging up to 35% if I use StdRng rather than XorShiftRng. This difference is peanuts compared to the 750× (not percent) speed-ups the Fortran/Rust/C/C++ versions see over the original Python (and even compared to the 30× speed-up seen over the optimised Python).
OK, I retract the word "scuttled". But the comparison is still meaningfully damaged by it once you get to closer comparisons, say between two of the top performers.