I embed Python in my application. My highly graphical application requires XY,000 KB of memory. It's probably not wise to re-invoke my entire app Z times simply so that Python can properly take advantage of the user's multi-core OS while scripting inside my application.
Any time you embed Python (e.g. a game, a browser, etc), the multiprocessing support isn't going to work very well as it is today. Ideally we get to the point where threading works great in the core interpreter.
I asked Guido about this in front of his keynote audience a few PyCons ago, and his answer was a sad shrug and a suggestion to use Jython or IronPython instead.
I'd love to see the GIL improved/removed because CPython-embedding apps don't have the same options as everyday pure scripting.
No, it doesn't. The whole interpreter was designed from the beginning to be embedded in another (typically C or C++) project, and they didn't want to have any concurrency infrastructure in the core language that would clash with the surrounding project's design. (It has excellent support for coroutines, though.)
Rather than having static variables and a GIL, the interpreter keeps all its state behind an opaque pointer to a VM context. The VM is quite lightweight (a few hundred KB), and running one in each thread or process is not a big deal. With forking, you even share the standard libraries. In practice, this just means that every call to the Lua C API takes a lua_State *L as its first argument. Not a big deal.
Lua also has real tail-calls, closures, and lambdas. It's a great language. Rather like a smaller, cleaner, less opinionated Python.
Also, Lua right now has one of the best JITs out there (LuaJIT). The guy behind the project is simply a genius. Really good stuff. And it's also kept small. It's 2-3x faster than V8 on the specific benchmarks I've done (mainly dealing with 3D model meshes and modifying them).
It's only for x86 right now, but he's working on an x64 port.
LuaJIT is really impressive, but honestly, the stock interpreter is fast enough for me in general, and I'm fine with just moving hotspots to C as needed.
I'm more impressed with how concise Lua is. I've got the Lua 5.1 Reference Manual sitting in front of me. It's 103 pages for the whole language (roughly half of which is the C API), the entire syntax fits on page 95, etc. I can keep it all in my mental L1 cache, so to speak. Yet, it's expressive enough to have displaced Python as my go-to language. It's not perfect, but I'm very happy with the trade-offs it makes.
I'll be psyched once it's ported to a couple more hardware platforms, though.
ANSI C, the standard to which Lua complies, has no mechanism for managing multiple threads of execution.
The method above for implementing threading in Lua using a global mutex is inefficient on multiprocessor systems. As one thread holds the global mutex, other threads are waiting for it. Thus only one Lua thread may be running at a time, regardless of the number of processors in the system.
It's a bit confusing. Essentially, "the method above" (using multiple OS threads running on the same Lua VM) extends Lua, adding a GIL along the way.
Lua itself does not use pre-emptive threads (the kind where you need to worry about locks, race conditions, "thread safe" libraries, etc.). It has coroutines, co-operative threads, in which independent light processes run and yield to each other. Only one is running in the interpreter at a time, so locking and race conditions are much less of an issue. If something needs to happen atomically, you just don't yield and resume in the middle of it. (Coroutines are as versatile as one-shot continuations, incidentally - each can be implemented in terms of the other.)
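That yield-until-you-choose-to model is easy to sketch in Python terms. This is my own illustration (the names are made up, and Python generators only approximate Lua's full coroutines, which can also yield from nested calls):

```python
# Rough analogue of Lua-style cooperative coroutines using Python
# generators: each task runs until it yields, and a simple scheduler
# resumes them round-robin. Only one task runs at a time, so no locks
# are needed around the shared list.

def worker(name, shared, count):
    for i in range(count):
        shared.append((name, i))  # atomic w.r.t. other tasks: no yield mid-append
        yield                     # cooperatively hand control back to the scheduler

def run(tasks):
    while tasks:
        task = tasks.pop(0)
        try:
            next(task)
            tasks.append(task)    # still alive: reschedule at the back
        except StopIteration:
            pass                  # task finished

log = []
run([worker("a", log, 2), worker("b", log, 2)])
print(log)  # tasks interleave: [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```

If something must happen atomically, the task simply doesn't yield until it's done - exactly the property described above.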
BUT, since Lua is designed for use in other projects, it may be running as a scripting engine inside of (say) a multi-threaded C++ game. In this case, Lua can have multiple threads running in the VM at once. There are no-op macros (lua_lock and lua_unlock) defined in the code around operations which need to be atomic. Such a project can recompile Lua with those macros calling appropriate locking operations. You can run Lua multithreaded, if you want, but then you have to worry about thread safety. (I have to admit some ignorance here, though - I think running multiple independent VMs is much simpler than having to deal with debugging multi-threaded code.)
FWIW, the source for Lua 5.1.4 is ~626 KB. Adding a customized version of it to a project's source tree is nowhere near as big a deal as including, say, all of Python (which is, what, 40+ MB?). In the Lua source, luaconf.h collects the items that are most likely to be tweaked in project-specific ways.
Anyway, Lua does not have any threading implementation of its own because it would make it far more difficult to run inside of other projects. The Lua VM itself is pretty light (a few hundred KB), and since all VM state is specific to each instance, you can run several without running into the sort of problems that people have with Python's GIL.
This is one of the cases where keeping Lua small, portable, and easy to embed means that functionality people expect ends up in an extension rather than as part of the "official" standard library. I'm happy with it as a trade-off - it makes Lua as expressive as Python but keeps the core language as small and clean as SQLite.
I agree with you, but in practice, there won't be as much memory replication as you assume. Linux (and I assume most modern OSes) implement copy-on-write: no data is copied until a write occurs. So as long as the separate process only read that large data set, that large data set should exist in memory only once.
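A minimal sketch of what that looks like in practice (this example is mine; it's *nix-only because of os.fork, and note the CPython caveat in the comments):

```python
# Copy-on-write sharing on a fork()-based OS: the child reads the
# parent's large structure without any copy or pickling at spawn time.
# The kernel only duplicates pages that a process writes to.
#
# CPython caveat: reference counting writes to object headers, so
# merely *touching* objects (as sum() does here) gradually un-shares
# the pages holding them. COW sharing works best for data kept in
# flat buffers (mmap, array, numpy) rather than object graphs.
import os

big = list(range(1_000_000))       # "large data set" built before forking

r, w = os.pipe()
pid = os.fork()
if pid == 0:                       # child: read-only access, pages start shared
    os.write(w, str(sum(big)).encode())
    os._exit(0)
else:                              # parent: collect the child's answer
    os.close(w)
    result = int(os.read(r, 64))
    os.waitpid(pid, 0)
    print(result)  # 499999500000
```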
As I like to say: that's Microsoft holding everybody back. ;-)
Now, seriously, can we implement a better multiprocessing module for those with modern OSs and leave a less functional one (one that will raise a LesserOSError when certain functionality is invoked)?
Just because your frontend uses XY,000 KB of memory doesn't mean that worker processes have to, too. The working set size for a fresh Python 2.6 run on my Windows 7 box is 6,620 KB. See Chrome for a good example of this approach :)
Chrome's approach was not chosen because of its general purpose suitability as a multi-core processing model, but because of the security advantages of relying on OS process separation in a web browser.
Chrome is a very specific use case with unusual requirements. If you take anything away from the Chrome example, it would be that securely replicating the functionality easily available from shared-memory threads in a multi-processing model is both very complicated and not very portable.
It uses fork under the covers, so small worker processes aren't an option for that module (unless we forgo interpreter embedding entirely, which isn't the point).
There are other techniques for spawning processes to do some external process work, but those, of course, need entirely separate plumbing.
And the concomitant disadvantages are that multi-process solutions eat memory like there's no tomorrow, and can share cached data only via IPC. So you get your scalability at the cost of a very large constant factor drop in overall performance.
Some problems don't care. Some do. But for those where threading is a more appropriate solution, the Python GIL is a significant limitation. And this is a great exposition as to why.
Announcing that threading should never be used just tells the world that you only work in one problem domain (probably web content generation) and haven't thought through the details elsewhere.
I honestly would say MOST people who reach for threads do so instinctively, not after actual analysis of what type of concurrency would best serve the project they're on.
Lots of devs come from the Java world or the Windows world, where threads are done all the time but multiprocess concurrency is rare.
I'd say that of the projects where I've seen people grab for threads, only about 10-15% actually needed the kind of heavily shared data or marginally smaller memory footprint that threading gives you. For the rest, it was a toss-up, or it leaned towards the simpler data structures, easier porting to multiple processors, and overall less complex components that IPC gives you.
Additionally, I'd contend you're doing something wrong if you see a HUGE increase in memory use when you go from a threaded to a multiprocess solution. Usually you just see a small overhead increase, unless you're doing something questionable like loading the same data structures over and over in every process.
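One common fix for that "load it over and over" mistake, sketched with the stdlib Pool (the names and the fake table here are mine; the dict comprehension stands in for an expensive load):

```python
# A Pool initializer runs once per worker process, so an expensive
# data structure is loaded once per worker rather than once per task.
# Tasks then reuse it via a module-level global in that worker.
from multiprocessing import Pool

_table = None  # populated once per worker by the initializer

def init_worker():
    global _table
    _table = {i: i * 2 for i in range(1000)}  # stand-in for an expensive load

def lookup(key):
    return _table[key]           # every task reuses the worker's copy

def run():
    with Pool(2, initializer=init_worker) as pool:
        return pool.map(lookup, [1, 10, 100])

if __name__ == '__main__':
    print(run())  # [2, 20, 200]
```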
May I recommend Stevens' Unix Network Programming, Vol. 2: IPC for some common designs of multiprocess programs? You may see large memory footprint gains just by looking through some of the programs in that book and rethinking how responsibilities are split across your different processes.
I by no means think EVERYTHING should be multiprocessed; I just think MOST things can be done equally well with threads or processes, and MANY programmers have never seen multiprocess programs, especially large ones, while most have worked on multithreaded programs (ones with more than one user thread).
Having worked with both, the errors you see in multiprocess programs are MUCH more traceable and manageable. Multi-threading errors can take months to debug sometimes, and weeks to even get a good reproduction scenario for. Then again, for a SMALL amount of concurrency, threading is simpler to set up, for sure.
That's a fair case, but it's not exactly a rebuttal to what I said either. If you can fit your problem into a multi-process solution (and most web stuff qualifies), then yes, there are many advantages there. But some problems benefit from both CPU parallelism and fast shared memory (think numerics work, graphics, games, data stores...). And this is an area that (due to the GIL contention visualized in the linked article) Python serves poorly.
But don't fool yourself that "MOST" of anything works with any one architecture. It's a big world.
>But don't fool yourself that "MOST" of anything works with any one architecture. It's a big world.
Of the methods of concurrency people use, there are two main approaches: threads and multiprocessing. I just contend that only about 15% of solutions are super clear winners for one or the other, leaving 65-70% of apps that can go either way.
And I'm not a web developer, I'm an embedded software developer with years of experience in embedded video, streaming video, and hardware emulation. Lots of THAT stuff also fits just fine in either system. As do the test systems, large analytics programs, etc. that I've seen outside the field.
Actually, Python largely inheriting the inconsistencies of the underlying OS with regard to mmap and other shared-memory solutions is the biggest obstacle there. Oh, and the fact that hardcore numerics are still callouts to C libs in Python.
>And I assure you I've read Stevens.
Lots and lots of people read Vol1, it seems like 1/10th as many read Vol2. Hell, I'm pretty sure most people who read Vol1 haven't a clue what's in Vol2.
I'm by no means saying I want the GIL to stay the same; I just think numerics work, graphics, and games need threads, and after that you get to use either one (i.e., 15% need threads, the other 85% need processes or can use either).
pygame does a lot of its threaded graphics and sound work entirely in C world, because Python threads are so limited.
Some of us have been using mmap'd surfaces with Python/pygame for years, though. For graphical work, it's not all that hard to share data with Python, and you can use the same approaches with numpy. See this cookbook example: http://www.pygame.org/wiki/MmapSurfaces - it's only a dozen lines of code to share data via mmap. There are lots of issues with the Python and numpy mmap modules, though, and mmap behaves quite differently across platforms (Linux, Windows, and Mac OS X all differ).
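The anonymous-mapping version of that trick is tiny. This sketch is mine (not the cookbook's code), and it's *nix-only because of os.fork - which is exactly the cross-platform caveat mentioned above:

```python
# An anonymous shared mmap created before fork() is visible to both
# processes, so the child can fill a "surface" buffer that the parent
# then reads with no copying or serialization.
import mmap
import os

buf = mmap.mmap(-1, 16)           # anonymous mapping, shared across fork on *nix

pid = os.fork()
if pid == 0:                      # child: write data into the shared buffer
    buf[0:4] = b'\x01\x02\x03\x04'
    os._exit(0)
else:                             # parent: wait, then read the child's writes
    os.waitpid(pid, 0)
    print(buf[0:4])  # b'\x01\x02\x03\x04'
```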
Another approach that's good for some graphics problems is forking. Forking is pretty fast, and it lets you share memory in a fairly nice way. However, mixing forking and threading is quite tricky with Python. It's also not so good on Windows; mainly it's a good method on *nix.
Disks are slow, very slow, and memory is getting cheaper. That's why a lot of data analysis stuff moves to in-memory solutions. Building those kinds of apps on top of shared memory BLOBs is incredibly unproductive because you cannot share pointers. You're right that these apps may be a minority today. But it's a fast growing minority.
And yes multi-threading errors can be difficult to debug, but all the approaches to avoid these errors can be used with threads just as well as with multiple processes. Message passing, features like those in Clojure or other functional languages, or Go's goroutines.
And to be honest, the fact that most people use threads when they don't actually need them, or design their programs badly, is a non-issue for me. Most people are incredibly bad at most things, I'm afraid. Does that mean Python should not empower people who know what they are doing?
Multiprocess approaches are both significantly more complicated to implement and significantly more constrained in their functionality.
Unless you have overriding design restrictions (such as requiring privilege separated worker processes), it seems that the only genuine argument for the "multiprocessing" model is simply that Python doesn't support anything else.
Complicated looking perhaps. But the ERRORS you see are significantly easier to debug.
I come from the "why not at least look at multiprocessing" C world as well. Threads are a way lots of very subtle bugs happen. It's an "all is well until someone loses an eye" type of situation.
Both are valid ways to solve problems, but both have their share of issues.
And they're not "significantly" more complicated if you're actually testing your code. Testing multiprocess solutions is easy. Testing for threading issues is a pain in the ass.
It's also not practical for many scenarios. Signals in Python can only be received by the main thread, and since that's what multiprocessing uses for communication, it means many common scenarios are out: GUI programs where the main thread is the rendering loop, network programs where the main thread is the connection handler, etc.
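The restriction is easy to see directly (my illustration; the handler does nothing, and SIGINT is used just because it exists on every platform). Installing a signal handler from anywhere but the main thread raises ValueError, which is why designs that tie up the main thread elsewhere run into trouble:

```python
# Python only lets the main thread install (and receive) signal
# handlers. Trying it from a worker thread raises ValueError.
import signal
import threading

def handler(signum, frame):
    pass  # placeholder handler for illustration

errors = []

def worker():
    try:
        signal.signal(signal.SIGINT, handler)  # not allowed off the main thread
    except ValueError as e:
        errors.append(str(e))

signal.signal(signal.SIGINT, handler)  # fine: we're in the main thread

t = threading.Thread(target=worker)
t.start()
t.join()
print(errors)  # one ValueError message mentioning the main thread
```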
http://www.dabeaz.com/blog/2010/01/python-gil-visualized.htm...