Pystone isn't a performance benchmark, or at least it isn't a useful one. It's more of a regression test to see if anything has changed between versions. It's not useful as a performance benchmark because it doesn't weight the results according to how much the individual features matter in real life. There are three versions of Python besides CPython that are in commercial use. Two are much slower than CPython (up to three times slower), and PyPy is (currently) faster in some applications and slower in others.
The CPython interpreter is not a simple switch. It uses computed gotos if you compile it with gcc. Microsoft VC lacks the language support needed for writing fast interpreters, so the Python source is written to fall back to a switch when compiled with MS VC. So, with every compiler except that one, it's a computed goto.
Modern CPU performance is very negatively affected by branch prediction failure and cache effects. A lot of the existing literature that you may see on interpreter performance is obsolete because it doesn't take those factors into account, but rather assumes that all code paths are equal. Threaded-code dispatch worked well with older CPUs, not so well with newer ones.
I am currently working on an interpreter that recognises a subset of Python for use as a library in complex mathematical algorithms. As part of this I have benchmarked multiple different interpreter designs for it and also compared it to native ('C') code. It is possible to get a much faster interpreter, provided you limit it to doing very simple things repetitively. These simple things also happen to be the sorts of things which are popular with benchmark writers (because they're easy to write cross-language benchmarks for), but which CPython does not do well in.
A sub-interpreter which targets these types of problems should give improved performance in this area. Rewriting the entire Python interpreter though would probably have little value, as the characteristics of opening a file or doing set operations, or handling exceptions are entirely different from adding two numbers together.
There is no such thing as a single speed "knob" which you can crank up or down to improve performance. There are many, many features in modern programming languages, all of which have their own characteristics. Picking out a benchmark which happens to exercise one or a few of them will tell you nothing about how a real-world application will perform unless it corresponds to the actual bottlenecks in your application. For that, you need to know the application domain and the language inside and out.
One thing about Python developers is that they tend to be very pragmatic. When someone comes to them with an idea, they say "show me the numbers in a real life situation". More often than not, the theoretical advantage of the approach being espoused evaporates when subjected to that type of analysis.
Anyway, I've been told that the CPython interpreter is kept very simple on purpose, to allow it to function as a standard 'definition' of the language behaviour. A simple JIT does wonders, as does a less brain-dead GC. Superinstructions, threading, ... are all possible. But you're absolutely right: it's really difficult to predict how much each improvement would contribute.
Have a look at the lines starting at line 821 in the very file you referenced. I have quoted a bit of it here:
"Computed GOTOs, or the-optimization-commonly-but-improperly-known-as-"threaded code" using gcc's labels-as-values extension (...) At the time of this writing, the "threaded code" version is up to 15-20% faster than the normal "switch" version, depending on the compiler and the CPU architecture."
They also have an explanation of the branch prediction effect which I mentioned earlier.
They have both methods (switch and computed goto) since some compilers don't support computed gotos, and some people want to use alternative compilers (e.g. Microsoft VC).
In my own interpreter, I tried both switch and computed gotos, as well as another method called "replicated switch". I auto-generate the interpreter source code (using a simple script) so that I could change methods easily for comparison. In my own testing, computed gotos were about 50% faster than a simple switch, but keep in mind that was for strictly numerical code. More complex operations would water that down somewhat, as less of the execution time would be due to dispatch overhead.
Computed gotos aren't really any more complex than a switch once you understand the format, and as I said above you can convert between the two with a simple script. What does get complex is doing Python-level static or run-time code optimization to try to predict types or remove redundant operations from loops. CPython doesn't do that, while PyPy does it extensively. It's these types of compile-time and run-time recompilation optimizations which make the big difference.
Overall, my interpreter is currently about 5.5 times faster than CPython with the specific simple benchmark program I tested. However, keep in mind it only does (and only ever will do) a narrow subset of the full Python language. Performance is never the result of a single technique. It's the result of many small improvements each of which address a specific problem.
So the conclusion really is: CPython is way slower than it should be.
Question: if the subset is small, isn't it better to use something like Shed Skin?
http://code.google.com/p/shedskin/
I once looked at it, and it does a fairly literal translation. The only problem is that it changes the semantics of the primitive types. For example, a Python integer becomes a C++ int, and the overflow semantics change.