This is needlessly complicated. You can just use Cython and annotate the types, or, even better, use shedskin to compile your Python code, usually with no changes, to C++ and make a module out of it.
I don't understand why stock Python doesn't include some sort of optional type annotation, as SBCL and other Lisps do. Some years ago there was a lot of fuss in the Python community about including optional type checking in Python 3... what happened? In the end, I found most of the changes cosmetic and not really exciting.
I can't either :/ It would be fantastic to be able to annotate your code a bit with types and speed it up many times. Hopefully, with Python 3, many tools (e.g. Cython) will be able to do just that.
I am yet to be sold on using Cython or its ilk, but that is a philosophical/personal-preference thing rather than a commentary on the quality or usefulness of these projects. Since I have no trouble switching code to C++ and integrating it with Python, these projects don't offer me what they offer a coder with less C++ experience.
As an aside, some simple coding changes will let Psyco do much of the by-hand optimisation that Cython gives you. Not all of it, and it is easy for another coder to come along and ruin everything. All it takes is some understanding of how Psyco performs its magic.
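A minimal sketch of the kind of Psyco-friendly change I mean (the function name and numbers are just illustrative): keep the hot loop in a small function that uses only local variables and simple types, then bind it if Psyco is available. The code runs unchanged on interpreters without Psyco.

```python
def dot(xs, ys):
    # Simple index loop over locals with plain floats: the shape of code
    # Psyco's specializing JIT can compile to fast machine code.
    total = 0.0
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total

try:
    import psyco        # harmless no-op where Psyco isn't installed
    psyco.bind(dot)
except ImportError:
    pass

print(dot([1.0, 2.0], [3.0, 4.0]))  # 11.0
```

The same function written with generator expressions, closures, or attribute lookups in the loop body would defeat the specialization, which is what makes it so easy for the next coder to ruin things.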
I saw that; it looks quite good, but I'm not sure how much better than Unladen Swallow it can get... RPython is another promising contender, if the PyPy guys ever polish it for end users...
>So imagine: You want to stick to Python because it’s so fast to develop in but need the performance of C/C++. Let me introduce you to C extensions in Python.
Actually, a lot of people do write them this way. When you take something you can write quickly and with no dependencies other than python, and instead pull in swig or cython - you now have two problems instead of one.
As the author shows, writing a Python C extension is fundamentally easy, which is why Python has such a huge ecosystem of them to begin with.
Writing C extensions is only easy if you do trivial things. As soon as you do something significant, writing all this C code by hand becomes increasingly complex, especially because of reference counting and error handling (http://docs.python.org/release/2.5.2/ext/thinIce.html).
If the build dependency is an issue for distribution, you can just ship the generated C code (as we do in NumPy and SciPy, as a matter of fact). Also, Cython can generate code compatible with both Python 2 and 3.
It depends on what kind of "trivial". I have a lot (> 30,000 SLOC; hundreds of functions) of C code that does intricate math/stats/optimization but is simple from a control-flow and memory-allocation point of view. That kind of code is a perfect fit for the simple C-extension API of Python as described in the OP.
In fact, if you set up the gateway interface to your C code cleanly, you can have the same C code callable directly from (e.g.) Matlab via the "mex" mechanism.
Yep, in that case it's relatively easy - but Cython is even better. Cython is used a lot in the SciPy community; if you don't know it, you may want to look at it. Since I started using Cython, I try to avoid writing against the raw C API as much as possible; there is just no point.
I wrote a few things in Cython to give it a try, and it's amazing what static typing can do for the efficiency of the code. Just don't look at the auto-generated .c file, though; it looks scary, but that seems to be a common occurrence (SWIG comes to mind).
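For readers who haven't seen it, a sketch of what "annotating with types" looks like in a Cython .pyx file (the function names are illustrative, not from any real codebase): the first version is interpreted-style dynamic code; the second adds C type declarations so Cython compiles the loop down to plain C arithmetic.

```cython
def mean_py(xs):
    # Untyped: every iteration goes through boxed Python objects.
    total = 0.0
    for x in xs:
        total += x
    return total / len(xs)

def mean_typed(double[:] xs):
    # Typed memoryview + cdef locals: the loop compiles to a C for-loop
    # over raw doubles, with no Python objects in the hot path.
    cdef double total = 0.0
    cdef Py_ssize_t i
    for i in range(xs.shape[0]):
        total += xs[i]
    return total / xs.shape[0]
```

The generated .c file for even this small example is hundreds of lines, which is the "scary" part - but you never have to read it.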
"But even if I find the cause of the leak, I may not be able to plug it. M2Crypto is written using SWIG, which contains lots of obfuscated code and doesn't exactly give me a warm fuzzy feeling that it's always doing the reference counting correctly."
I prefer a clean handwritten C extension any day. There is a lot of repetitive work involved, but most of that is really just copy&paste, so it tends to go quickly.
My programming knowledge base is strongest in C++ and Python. I always start writing projects in Python. Then, if psyco hasn't gotten me the performance I need in the (often very small) sections of code that do the heavy lifting, a quick C++ extension gets whipped up.
This method is so productive it always surprises me how quickly things get done!
It's the best of both worlds, really. You write everything in a language that's fast to write, then you write the bottlenecks in a language that's fast to execute.
This works only if your bottlenecks are a small portion of your code. For example, if your Twisted application is slow, you will have a hard time optimizing it by writing a bit of C.
Because of the Python <-> C marshalling cost. Say you have code that uses a lot of small objects which interact with each other; maybe each object holds a small list with a few integers. Rewriting this in C would make sense, because you can be much faster than Python, but you still pay the cost of interfacing with Python (creating the objects, boxing integers back and forth, etc.).
That's why something like NumPy manages to be so fast when used well: internally it uses an efficient (C-like) representation, but as soon as you need to touch every item of an array from Python, it is not just much slower - it can even be slower than standard Python containers, because of this cost.
What language do you think CPython's internal data structures are implemented in?
I know when I write an XS extension in Perl, though, that getting the C struct associated with my Perl object requires one C function call -- no overhead at all. Passing an object to Perl involves allocating memory and adjusting the pointers. All very fast. If this sort of thing is your bottleneck, it's time to step back and rethink what you are trying to achieve.
Being implemented in C does not make things fast. Integers are implemented in C in Python, but they are really slow because of boxing/unboxing: you have to chase a pointer to get the actual C int out of the Python object (a PyIntObject), and that cost alone is high. NumPy is much faster for this kind of thing because it does not pay that cost: it runs the core loops and function calls in pure C, without going through the Python runtime at all.
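You can see the boxing the comment describes from pure Python (a small demonstration using the stdlib array module, which stores raw C values the way a C extension would): a list hands back the same already-boxed object on every access, while an array must allocate a fresh Python float each time you index it.

```python
from array import array

values = array('d', [1.5, 2.5])   # raw C doubles, no per-item objects
lst = [1.5, 2.5]                  # references to boxed PyFloat objects

a, b = values[0], values[0]       # each access boxes a brand-new float
print(a == b)                     # True: same value...
print(a is b)                     # False: ...but two distinct objects
print(lst[0] is lst[0])           # True: the list returns the stored object
```

That per-access allocation is exactly the cost you pay at the Python/C boundary, no matter how fast the C side is.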
Yes, exactly. Every time you context-switch between the high-level and the low-level language, you sacrifice performance. If you want to write a fast extension, you don't do this: you let the high-level language treat your pointer as an opaque object, and when you invoke a function, you invoke it on that pointer. That way you don't marshal back and forth (except perhaps for incidental parameters, results, or exceptions).
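A sketch of the opaque-pointer idea using ctypes and libc (this assumes a Unix-like system where `find_library("c")` resolves): Python holds the C `FILE*` as an opaque handle and passes it straight back to C on each call, so nothing about the underlying struct is ever marshalled.

```python
import ctypes
import ctypes.util
import os
import tempfile

libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.fopen.restype = ctypes.c_void_p                      # FILE* stays opaque
libc.fopen.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
libc.fputs.argtypes = [ctypes.c_char_p, ctypes.c_void_p]
libc.fputs.restype = ctypes.c_int
libc.fclose.argtypes = [ctypes.c_void_p]

fd, path = tempfile.mkstemp()
os.close(fd)

handle = libc.fopen(path.encode(), b"w")  # Python just sees an integer handle
libc.fputs(b"hello from C", handle)       # handle passed back untouched
libc.fclose(handle)

with open(path) as f:
    contents = f.read()                   # "hello from C"
os.unlink(path)
```

Only the edges (the path string in, the file contents out) cross the boundary; every intermediate call hands C its own pointer back.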
What I am saying is that the whole approach of keeping the logic in the high-level language and pushing the work down into a C implementation cannot work when your logic depends on handling many small objects.
For example, a few years ago I wrote a small audio app to track frequency lines in audio. Each track was a list of positions (integers), generally a few tens of items at most, and I needed to handle millions of such lists. Just moving the track object into C was not that effective, because the high-level logic needed to access the contents of each list.
Basically, if your logic needs to handle those millions of small objects, you end up writing almost everything in the compiled language. Abstracting this away becomes very difficult.
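One middle ground worth mentioning for exactly this shape of data (millions of short integer lists) is the stdlib array module - a hedged sketch, with made-up numbers standing in for the frequency positions: each track is stored as raw C ints, so the per-track memory overhead is far smaller than a list of boxed Python ints, while the high-level logic can still index and iterate it.

```python
from array import array

# One track: positions stored as contiguous C ints, one machine word
# of payload per item instead of a full PyInt object per item.
track = array('i', [440, 441, 445, 452])
track.append(460)

print(track[2])        # 445 - still directly accessible from Python
print(len(track))      # 5
print(track.itemsize)  # typically 4 bytes per position on CPython
```

It doesn't remove the boxing cost on access, so it doesn't contradict the point above - but when the problem is holding millions of tracks at once, the storage side of the cost goes away without writing any C.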
Oh, Python has a nice solution in this problem space: NumPy. But again, this only works in some cases. Unfortunately, the model fails performance-wise once you use a non-native type - think an array of arbitrary-precision floats. Haskell, being much more powerful and expressive, might be able to express those concepts more elegantly, but I could be wrong (I know next to nothing about Haskell).
Is there any scalable/cloud Python hosting that allows C extensions? I know App Engine doesn't, unless they changed it recently. I'm guessing a VPS is necessary?
Until you forget a decref/incref somewhere ;) Personally, I prefer to use boost.python, Cython, ctypes or SWIG. That said, writing C routines that operate directly on NumPy arrays is fairly easy.
There are many options to speed up your Python code by ~50 times effortlessly. See my blog posts:
http://www.korokithakis.net/node/109
http://www.korokithakis.net/node/117