CPython internals: A ten-hour codewalk through the Python interpreter (2015) (youtube.com)
338 points by melqdusy on March 9, 2017 | 51 comments



I put together an ebook on the internals of the Python interpreter. Get it for free at https://leanpub.com/insidethepythonvirtualmachine


This is awesome - I wish every large software project had something like this that was a prep-course to be able to start contributing meaningfully!


Not a prep course, but for example Redis has a very good source code overview here https://github.com/antirez/redis#redis-internals

More remarkable is the fact that antirez updated the documentation in response to a post on Reddit. https://www.reddit.com/r/redis/comments/3re0aw/any_pointers_... Thank you antirez! :-)


Also check out libpng's source.

https://github.com/glennrp/libpng


Oh wow. That's beautifully done. Simple comments that explain clearly what the code is doing, pretty clear choice of variable names so little head-scratching going on.


The documentation of redis is really good for a large open source project. I am not a contributor, but still read the source code from time to time. Full credit to antirez for taking the time to make it easy to contribute to redis!


Shameless plug here but I have put together a free ebook detailing the inner workings of the python virtual machine @ https://leanpub.com/insidethepythonvirtualmachine


I agree - this basically gives you enough information to bootstrap your own learning about the CPython internals. I feel like all companies/projects should have a similar intro which gives new-joiners enough information that they can figure out most things themselves without too much pain, and without spoon-feeding too much.


I feel like I should understand this but I don't: What names are looked up by name vs. by number in CPython?

That is, I think local variables and constants are looked up by a small integer index which the CPython compiler assigns at compile time.

But any globals must be looked up by name: functions, classes, modules, global variables. And methods on classes, attributes on classes.

I'd be interested to get clarity on that, and any pointers to relevant code/docs. Is this addressed in the videos? I have looked through the CPython source a lot, and even patched it, but the lookups are a little hard to follow. I've played with the "dis" module and code objects.

EDIT: Answering my own question, it seems like I was confused about the index into co_names, which is a small integer into a list of strings, and then the lookup of that string. So it's a 2-step process?
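
A minimal sketch of that two-step process, assuming a recent CPython 3. The function `area` and the global `PI` are made-up names for illustration. Locals are stored in `co_varnames` and addressed by index, while globals and attributes appear as strings in `co_names`; the bytecode refers to an entry in that tuple, and the string is then looked up in a dict at run time.

```python
import math

PI = 3.14159  # hypothetical module-level global, for illustration only


def area(radius):
    scaled = radius * radius        # 'radius' and 'scaled' are locals
    return math.floor(PI * scaled)  # 'math', 'floor' and 'PI' are resolved by name


code = area.__code__

# Locals: the bytecode addresses these slots by index.
print("co_varnames:", code.co_varnames)

# Globals and attributes: the bytecode holds an index into this tuple,
# and the resulting string is then looked up at run time.
print("co_names:", code.co_names)
```

The exact order of entries in `co_names` is a compiler detail and should not be relied on.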


You can find out by using dis and checking which instructions are LOAD_FAST, LOAD_CONST, or LOAD_GLOBAL. Attribute lookups are always by name, as far as I know. The dot operator has a bunch of lookup paths.


Yes thanks, I think that is right... LOAD_FAST and LOAD_CONST are by number, and used for local variables and constants.

LOAD_NAME, LOAD_ATTR, and LOAD_GLOBAL are all lookups by name, and are used for everything else: globals, object attributes and methods, modules, etc.

It seems that if Python had a static module system, all the lookups by name could be compiled down into lookups by number.

https://docs.python.org/3/library/dis.html#python-bytecode-i...
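
A small sketch of checking this with `dis`, assuming CPython 3. The function `bump` and the global `counter` are hypothetical names. Exact opcode names vary between CPython versions (method calls, for instance, have used `LOAD_METHOD` in some releases and `LOAD_ATTR` in others), so the code groups whatever opcodes it finds instead of assuming a fixed set.

```python
import dis

counter = 0  # hypothetical global, for illustration only


def bump(step):
    total = counter + step     # global 'counter', local 'step'
    return total.bit_length()  # attribute lookup on a local


# Group the argument of every instruction under its opcode name.
ops = {}
for ins in dis.get_instructions(bump):
    ops.setdefault(ins.opname, []).append(ins.argval)

# Show only the load instructions, which are the ones discussed above.
for opname, args in ops.items():
    if opname.startswith("LOAD_"):
        print(opname, args)
```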


I'm not clear on the status of PEP 509, but it could/should make LOAD_NAME and LOAD_GLOBAL approach the speed of LOAD_FAST. It adds a version tag to each dict that changes whenever the dict is mutated, so repeated lookups in an unmutated dict can be cached and served quickly.

https://www.python.org/dev/peps/pep-0509/

Dictionaries got some sweet upgrades for v3.6.



Do you need an understanding of compilers to go through this? What are the prerequisites?


No. They skip the Python-to-bytecode compiler and go straight to the interpreter and runtime, more or less. You should know C.


Curious: At any point, is it explained why the Global Interpreter Lock is necessary? If so, I'll spend the time to watch.


It's not necessary; it was a design choice that made sense back in the '90s. In a multithreaded environment you can lock at a fine-grained level or at a coarse-grained level (or you can crash, but let's ignore that as an option). Python chose coarse-grained, giving up parallel interpreter computations but saving a lot of thread synchronization overhead. All attempts so far to remove the GIL have resulted in a (usually much) slower interpreter, but the latest attempt shows some promise, and it's conceivable (but not guaranteed) that in a few years there will be an official GIL-less CPython.


Removing Python's GIL will never make much sense. Not today and not in the future. If you need CPU-fast code and would bother to multi-thread, it's much more worthwhile to write the code in Cython.

If your code is CPU bound and you're using native Python, you're going to be making a tonne of heap allocations and pointer dereferences. This will be very slow.

If you implement the relevant stuff in Cython, even without using multi-threading you'll likely see 10x performance improvement, and can often see up to 100x.

Removing the GIL makes Python worse at the stuff it's good at, for questionable improvements in the areas where Python is really terrible. This is not a good trade.


What if you were trying to thread a non-processor-bound task?


Do you mean waiting for I/O? It's already possible to do I/O asynchronously or with separate locks. The interpreter lock only applies to the interpreter.


Do you happen to have any papers about the current efforts to remove the GIL?

I love Python, and use it a lot for ETL type work, but if threading worked well, I could/would possibly use it for far more purposes.


Larry Hastings - Removing Python's GIL: The Gilectomy - PyCon 2016 https://youtu.be/P3AyI_u66Bw

(I'm pretty sure this is the video I'm thinking of) It's 30m, but worth it if you're interested. Not sure what progress has been made since then.


http://pyparallel.org is one of the more interesting experiments currently going on in the GIL area. They're basically working on removing all the practical limitations of the GIL without actually removing the entire GIL.


PyParallel v1 was a nice checkpoint. I'm working on the next incarnation of it now.


Looking forward to it. PyParallel is one of the more exciting python implementations out there


Please remember that threads are not the only way. If you can simply break your function/routine into smaller pieces that are independent, you can easily get by with a fork. (Well, unless you are on Windows...)
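
A minimal sketch of the fork approach, assuming a Unix-like system (`os.fork` is not available on Windows). The helper `run_in_fork` and the workload `sum_of_squares` are hypothetical names, not part of any library. The child computes the result and sends it back through a pipe as a pickle, which is also where the serialization cost of process-based parallelism comes from.

```python
import os
import pickle


def run_in_fork(func, arg):
    """Run func(arg) in a forked child process and return its result (Unix only)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: compute, write the pickled result to the pipe, then exit
        # immediately without running the parent's cleanup handlers.
        os.close(read_fd)
        status = 1
        try:
            with os.fdopen(write_fd, "wb") as out:
                pickle.dump(func(arg), out)
            status = 0
        finally:
            os._exit(status)
    # Parent: read the result, and always reap the child afterwards.
    os.close(write_fd)
    try:
        with os.fdopen(read_fd, "rb") as inp:
            return pickle.load(inp)
    finally:
        os.waitpid(pid, 0)


def sum_of_squares(n):
    # Stand-in for an independent, CPU-bound piece of work.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    print(run_in_fork(sum_of_squares, 1000))
```

In practice the standard `multiprocessing` module wraps this pattern and also works on Windows, at the cost of pickling arguments and results between processes.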


It's been a while since I tried to use the multiprocessing module. But the last time I did try, I ran into issues with it interacting poorly with pyodbc. It's been years, so I don't recall what the problem was, but I spent a few days trying to resolve or work around the issue with no satisfaction.

Also, most of my Python scripts run on both Linux and Windows, so I have that restriction, as well.


Can you give an example where the GIL is really holding you back?

Because with multiprocessing and greenlets, 99.99% of concurrency problems are trivially solved by current CPython.


Not actually GP, but it has held me back in the past.

I'm writing a transpiler that uses global information from codebases, so it transpiles potentially hundreds of files at once and creates rather complex data structures. It is compute-bound for quite a while, so I tried speeding it up with multiprocessing (since multithreading would be useless). But with multiprocessing it took longer to serialize/deserialize the complex data structures for each process, so I had to give up. Next time I have time for this I'd probably try to use Jython as a drop-in replacement and see whether I can get it to run with GIL-less multithreading.


It sounds like you have a couple of hot paths and are not optimizing them. I can't tell for sure without seeing any code, but nothing in your post screams out "this will be slow" or "I need parallelism/concurrency". Perhaps it's the data structures you are using?


I already did extensive profiling and performance improvements. At this point I'm quite sure that if I could do multithreading on my lab's 24-core Xeon Haswell machines, I'd be getting a nice speedup.


Sounds like you might be iterating dictionaries. That's much faster in Python 3.6 due to the compaction of dict storage.


2 TB hash join


Isn't it up to the RDBMS whether that's multithreaded or not? Unless the RDBMS is implemented in Python, CPython doesn't force extension code to be single-threaded. Just Python bytecode.


That's pretty much the point, isn't it? If I need true multithreading, then I am forced to write an extension in C or offload the multithreading work to another process (such as an RDBMS in your case). It would have been nice if true multithreading were possible in Python itself. It would immediately make Python more useful in a variety of scenarios where splitting the work into multiple processes is not optimal or is more convoluted.


Sure, I guess I'm just used to writing my performance bottlenecks in a lower level language already, so I'm used to the GIL not actually being held most of the time in any intensive computation.

So if I want to call two Fourier transform functions at the same time in Python I can, because neither of them is implemented in Python and so they don't hold the GIL.
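
A rough sketch of that pattern using only the standard library. `hashlib` stands in here for the Fourier transform (an assumption made to keep the example dependency-free): it is implemented in C and releases the GIL while hashing large buffers, so several threads can hash at the same time. The helper name `digest` and the buffer sizes are arbitrary.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(data):
    # The hashing itself runs in C; for large inputs the GIL is released
    # while it runs, so other threads can make progress in parallel.
    return hashlib.sha256(data).hexdigest()


# Four distinct 8 MiB buffers (sizes chosen arbitrarily for illustration).
blocks = [bytes([i]) * (8 * 1024 * 1024) for i in range(4)]

# Hash all buffers from a pool of threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, blocks))

# Same work done one buffer at a time, for comparison.
sequential = [digest(block) for block in blocks]

print(threaded == sequential)
```

Whether the threaded version is actually faster depends on the machine and the workload; the sketch only shows that the results match and that no explicit locking is needed.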

That's the kind of parallelism use case I most often see come up, so although the GIL dismayed me early on I've come to see it as pretty irrelevant.

But maybe it makes more sense for other applications, for the performance critical parts of the code to be actual pure Python. I do mostly numerical simulations, so pure Python is usually a non-starter, you fix that long before you think about parallelism.


If your answer to any GIL problem is to write in a different language, then I guess you don't have a problem.


It's not that I write the whole program in another language, it's that I either write the bottleneck in another language (usually Cython), or it turns out that the Python package I'm calling already has its bottlenecks written in another language, whether I wrote it or not.

Day to day, I'm writing Python code which is actually parallel, because a large fraction of the run time is dominated by things that aren't pure Python. I suspect this is true even for people who are not going out of their way to make it true. It's simply the case that most RDBM systems, Fourier transforms, etc. with Python bindings are not written in Python.

The GIL sounds scary, but I think people overestimate the fraction of time it is actually held in their code.


Exactly, see my own post a few branches up.


It still makes sense today, for the trade-off you described. Also because getting around the GIL is easy.


The first 5 minutes of this covers that in general: https://www.youtube.com/watch?v=P3AyI_u66Bw


In the first video he states that every language has a compiler.

An interpreted language does not need to be compiled into bytecode. Some languages are compiled to bytecode; some are interpreted as is.


It seems to be about Python 2. Too bad it's not about 3.


Don't worry about Python 2.x vs 3.x here. Under the hood there's not a great deal of difference in the areas that this course covers. The "dis" module is as useful as ever, all python objects will still be of type (PyObject *), the main execution loop is still there, the concept of frames is relevant still, etc.

The lectures are very interesting and if you have a spare evening it's possible to just blast through the first 3 or 4 without sweating too much.


I watched this series more than once; it has so much detail. I believe Python 3 is not a complete rewrite of Python 2, so there must be a lot of common code between them. So it's useful regardless of whether the series covers Python 2 or not.


> I believe python-3 is not complete rewrite of python-2.

Python 3 is not even remotely close to a Python 2 rewrite. Much changed UI-wise, but the core is very similar if not identical.


This series of blog posts is also great:

https://tech.blog.aknin.name/tag/internals/

I spent a lot of time reading it while working on the mixed-mode Python debugger for Visual Studio (which, coincidentally, supports both Python 2 and Python 3; they really do share a lot of things). Much of that work involved parsing and writing internal Python data structures directly, since the Python interpreter may be unusable when the current instruction pointer is inside native code (GIL not held, various system locks held, etc.).


Between Python 2 and Python 3, what are the differences in CPython?


That's a great question! I never did a diff of the source, but a good place to start is to diff ceval.c, which contains the main interpreter loop.


Dr. PG has a Youtube channel? I never knew.

This looks awesome!


Kinda painful to watch... thank god for the playback speed X 1.5



