I embed Python in my application. My highly graphical application requires XY,000 KB of memory. It's probably not wise to re-invoke my entire app Z times simply so that Python can properly take advantage of the user's multi-core OS while scripting inside my application.
Any time you embed Python (e.g. a game, a browser, etc), the multiprocessing support isn't going to work very well as it is today. Ideally we get to the point where threading works great in the core interpreter.
I asked Guido about this in front of his keynote audience a few PyCons ago, and his answer was a sad shrug and a suggestion to use Jython or IronPython instead.
I'd love to see the GIL improved/removed because CPython-embedding apps don't have the same options as everyday pure scripting.
No, it doesn't. The whole interpreter was designed from the beginning to be embedded in another (typically C or C++) project, and they didn't want to have any concurrency infrastructure in the core language that would clash with the surrounding project's design. (It has excellent support for coroutines, though.)
Rather than having static variables and a GIL, the interpreter keeps all its state behind an opaque pointer to a VM context. The VM is quite lightweight (a few hundred KB), and running one in each thread or process is not a big deal. With forking, you even share the standard libraries. In practice, this just means that every call to the Lua C API takes a lua_State *L as its first argument. Not a big deal.
Lua also has real tail-calls, closures, and lambdas. It's a great language. Rather like a smaller, cleaner, less opinionated Python.
Also, Lua right now has one of the best JITs out there (LuaJIT). The guy behind the project is simply a genius. Really good stuff. And it's also kept small. It's 2-3x faster than V8 on the specific benchmarks I've done (mainly dealing with 3D model meshes and modifying them).
It's only for x86 right now, but he's working on an x64 port.
LuaJIT is really impressive, but honestly, the stock interpreter is fast enough for me in general, and I'm fine with just moving hotspots to C as needed.
I'm more impressed with how concise Lua is. I've got the Lua 5.1 Reference Manual sitting in front of me. It's 103 pages for the whole language (roughly half of which is the C API), the entire syntax fits on page 95, etc. I can keep it all in my mental L1 cache, so to speak. Yet, it's expressive enough to have displaced Python as my go-to language. It's not perfect, but I'm very happy with the trade-offs it makes.
I'll be psyched once it's ported to a couple more hardware platforms, though.
ANSI C, the standard to which Lua complies, has no mechanism for managing multiple threads of execution.
The method above for implementing threading in Lua using a global mutex is inefficient on multiprocessor systems. As one thread holds the global mutex, other threads are waiting for it. Thus only one Lua thread may be running at a time, regardless of the number of processors in the system.
It's a bit confusing. Essentially, "the method above" (using multiple OS threads running on the same Lua VM) extends Lua, adding a GIL along the way.
Lua itself does not use pre-emptive threads (the kind where you need to worry about locks, race conditions, "thread safe" libraries, etc.). It has coroutines, co-operative threads, in which independent light processes run and yield to each other. Only one is running in the interpreter at a time, so locking and race conditions are much less of an issue. If something needs to happen atomically, you just don't yield and resume in the middle of it. (Coroutines are as versatile as one-shot continuations, incidentally - each can be implemented in terms of the other.)
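That yield-until-you-choose-to model is easy to sketch in Python terms. This is my own illustration (the names are made up, and Python generators only approximate Lua's full coroutines, which can also yield from nested calls):

```python
# Rough analogue of Lua-style cooperative coroutines using Python
# generators: each task runs until it yields, and a simple scheduler
# resumes them round-robin. Only one task runs at a time, so no locks
# are needed around the shared list.

def worker(name, shared, count):
    for i in range(count):
        shared.append((name, i))  # atomic w.r.t. other tasks: no yield mid-append
        yield                     # cooperatively hand control back to the scheduler

def run(tasks):
    while tasks:
        task = tasks.pop(0)
        try:
            next(task)
            tasks.append(task)    # still alive: reschedule at the back
        except StopIteration:
            pass                  # task finished

log = []
run([worker("a", log, 2), worker("b", log, 2)])
print(log)  # tasks interleave: [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```

If something must happen atomically, the task simply doesn't yield until it's done - exactly the property described above.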
BUT, since Lua is designed for use in other projects, it may be running as a scripting engine inside of (say) a multi-threaded C++ game. In this case, Lua can have multiple threads running in the VM at once. There are no-op macros (lua_lock and lua_unlock) defined in the code around operations which need to be atomic. Such a project can recompile Lua with those macros calling appropriate locking operations. You can run Lua multithreaded, if you want, but then you have to worry about thread safety. (I have to admit some ignorance here, though - I think running multiple independent VMs is much simpler than having to deal with debugging multi-threaded code.)
FWIW, the source for Lua 5.1.4 is ~626 KB. Adding a customized version of it to a project's source tree is nowhere near as big a deal as including, say, all of Python (which is, what, 40+ MB?). In the Lua source, luaconf.h collects the items that are most likely to be tweaked in project-specific ways.
Anyway, Lua does not have any threading implementation of its own because it would make it far more difficult to run inside of other projects. The Lua VM itself is pretty light (a few hundred KB), and since all VM state is specific to each instance, you can run several without running into the sort of problems that people have with Python's GIL.
This is one of the cases where keeping Lua small, portable, and easy to embed means that functionality people expect ends up in an extension rather than as part of the "official" standard library. I'm happy with it as a trade-off - it makes Lua as expressive as Python but keeps the core language as small and clean as SQLite.
I agree with you, but in practice, there won't be as much memory replication as you assume. Linux (and I assume most modern OSes) implement copy-on-write: no data is copied until a write occurs. So as long as the separate process only read that large data set, that large data set should exist in memory only once.
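A minimal sketch of what that looks like in practice (this example is mine; it's *nix-only because of os.fork, and note the CPython caveat in the comments):

```python
# Copy-on-write sharing on a fork()-based OS: the child reads the
# parent's large structure without any copy or pickling at spawn time.
# The kernel only duplicates pages that a process writes to.
#
# CPython caveat: reference counting writes to object headers, so
# merely *touching* objects (as sum() does here) gradually un-shares
# the pages holding them. COW sharing works best for data kept in
# flat buffers (mmap, array, numpy) rather than object graphs.
import os

big = list(range(1_000_000))       # "large data set" built before forking

r, w = os.pipe()
pid = os.fork()
if pid == 0:                       # child: read-only access, pages start shared
    os.write(w, str(sum(big)).encode())
    os._exit(0)
else:                              # parent: collect the child's answer
    os.close(w)
    result = int(os.read(r, 64))
    os.waitpid(pid, 0)
    print(result)  # 499999500000
```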
As I like to say: that's Microsoft holding everybody back. ;-)
Now, seriously, can we implement a better multiprocessing module for those with modern OSs and leave a less functional one (one that will raise a LesserOSError when certain functionality is invoked)?
Just because your frontend uses XY,000 KB of memory doesn't mean that worker processes have to, too. The working set size for a fresh Python 2.6 run on my Windows 7 box is 6,620 KB. See Chrome for a good example of this approach :)
Chrome's approach was not chosen because of its general purpose suitability as a multi-core processing model, but because of the security advantages of relying on OS process separation in a web browser.
Chrome is a very specific use case with unusual requirements. If you take anything away from the Chrome example, it would be that securely replicating the functionality easily available from shared-memory threads in a multi-processing model is both very complicated and not very portable.
It uses fork under the covers, so small worker processes aren't an option for that module (unless we forgo interpreter embedding entirely, which isn't the point).
There are other techniques for spawning processes to do some external process work, but those, of course, need entirely separate plumbing.
And the concomitant disadvantages are that multi-process solutions eat memory like there's no tomorrow, and can share cached data only via IPC. So you get your scalability at the cost of a very large constant factor drop in overall performance.
Some problems don't care. Some do. But for those where threading is a more appropriate solution, the Python GIL is a significant limitation. And this is a great exposition as to why.
Announcing that threading should never be used just tells the world that you only work in one problem domain (probably web content generation) and haven't thought through the details elsewhere.
I honestly would say MOST people who reach for threads do so instinctively, not after actual analysis of what type of concurrency would best serve the project they're on.
Lots of devs come from the Java world or the Windows world, where threads are done all the time but multiprocess concurrency is rare.
I'd say that of the projects where I've seen people grab for threads, only about 10-15% actually needed the kind of heavily shared data or marginally smaller memory footprint that threading gives you. For the rest, it was a toss-up, or it leaned towards the simpler data structures, easier porting to multiple processors, and overall less complex components that IPC gives you.
Additionally, I'd contend you're doing something wrong if you see a HUGE increase in memory use when you go from a threaded to a multiprocess solution. Usually you just see a small overhead increase, unless you're doing something questionable like loading the same data structures over and over in every process.
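One common fix for that "load it over and over" mistake, sketched with the stdlib Pool (the names and the fake table here are mine; the dict comprehension stands in for an expensive load):

```python
# A Pool initializer runs once per worker process, so an expensive
# data structure is loaded once per worker rather than once per task.
# Tasks then reuse it via a module-level global in that worker.
from multiprocessing import Pool

_table = None  # populated once per worker by the initializer

def init_worker():
    global _table
    _table = {i: i * 2 for i in range(1000)}  # stand-in for an expensive load

def lookup(key):
    return _table[key]           # every task reuses the worker's copy

def run():
    with Pool(2, initializer=init_worker) as pool:
        return pool.map(lookup, [1, 10, 100])

if __name__ == '__main__':
    print(run())  # [2, 20, 200]
```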
May I recommend Stevens' Unix Network Programming, Vol. 2: IPC for some common designs of multiprocess programs? You may see large memory footprint gains just by looking through some of the programs in that book and rethinking how responsibilities are split across your different processes.
I by no means think EVERYTHING should be multiprocessed; I just think MOST things can be done equally well with threads or processes, and MANY programmers have never seen multiprocess programs, especially large ones, while most have worked on multithreaded programs (ones with more than one user thread).
Having worked with both, the errors you see in multiprocess programs are MUCH more traceable and manageable. Multi-threading errors can take months to debug sometimes, and weeks to even get a good reproduction scenario for. Then again, for a SMALL amount of concurrency, threading is simpler to set up, for sure.
That's a fair case, but it's not exactly a rebuttal to what I said either. If you can fit your problem into a multi-process solution (and most web stuff qualifies), then yes, there are many advantages there. But some problems benefit from both CPU parallelism and fast shared memory (think numerics work, graphics, games, data stores...). And this is an area that (due to the GIL contention visualized in the linked article) Python serves poorly.
But don't fool yourself that "MOST" of anything works with any one architecture. It's a big world.
>But don't fool yourself that "MOST" of anything works with any one architecture. It's a big world.
Of the methods of concurrency people use, there are two main approaches: threads and multiprocessing. I just contend that only about 15% of solutions are super clear winners for one or the other, leaving 65-70% of apps that can go either way.
And I'm not a web developer, I'm an embedded software developer with years of experience in embedded video, streaming video, and hardware emulation. Lots of THAT stuff also fits just fine in either system. As do the test systems, large analytics programs, etc. that I've seen outside the field.
Actually, Python largely inheriting the inconsistencies of the underlying OS with regard to mmap and other shared-memory solutions is the biggest obstacle there. Oh, and the fact that hardcore numerics are still callouts to C libs in Python.
>And I assure you I've read Stevens.
Lots and lots of people read Vol1, it seems like 1/10th as many read Vol2. Hell, I'm pretty sure most people who read Vol1 haven't a clue what's in Vol2.
I'm by no means saying I want the GIL to stay the same; I just think numerics work, graphics, and games need threads, and after that you get to use either one (i.e., 15% need threads, the other 85% need processes or can use either).
pygame does a lot of its threaded graphics and sound work entirely in C world, because Python threads are so limited.
Some of us have been using mmap'd surfaces with Python/pygame for years, though. For graphical work, it's not all that hard to share data with Python, and you can use the same approaches with numpy. See this cookbook example: http://www.pygame.org/wiki/MmapSurfaces - it's only a dozen lines of code to share data via mmap. There are lots of issues with the Python and numpy mmap modules, though, and mmap behaves quite differently across platforms (Linux, Windows, and Mac OS X all differ).
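The anonymous-mapping version of that trick is tiny. This sketch is mine (not the cookbook's code), and it's *nix-only because of os.fork - which is exactly the cross-platform caveat mentioned above:

```python
# An anonymous shared mmap created before fork() is visible to both
# processes, so the child can fill a "surface" buffer that the parent
# then reads with no copying or serialization.
import mmap
import os

buf = mmap.mmap(-1, 16)           # anonymous mapping, shared across fork on *nix

pid = os.fork()
if pid == 0:                      # child: write data into the shared buffer
    buf[0:4] = b'\x01\x02\x03\x04'
    os._exit(0)
else:                             # parent: wait, then read the child's writes
    os.waitpid(pid, 0)
    print(buf[0:4])  # b'\x01\x02\x03\x04'
```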
Another approach that's good for some graphics problems is forking. Forking is pretty fast, and it lets you share memory in a fairly nice way. However, mixing forking and threading is quite tricky with Python. It's also not so good on Windows; mainly it's a good method on *nix.
Disks are slow, very slow, and memory is getting cheaper. That's why a lot of data analysis stuff moves to in-memory solutions. Building those kinds of apps on top of shared memory BLOBs is incredibly unproductive because you cannot share pointers. You're right that these apps may be a minority today. But it's a fast growing minority.
And yes multi-threading errors can be difficult to debug, but all the approaches to avoid these errors can be used with threads just as well as with multiple processes. Message passing, features like those in Clojure or other functional languages, or Go's goroutines.
And to be honest, the fact that most people use threads when they don't actually need them, or design their programs badly, is a non-issue for me. Most people are incredibly bad at most things, I'm afraid. Does that mean Python should not empower people who know what they are doing?
Multiprocess approaches are both significantly more complicated to implement and significantly more constrained in their functionality.
Unless you have overriding design restrictions (such as requiring privilege separated worker processes), it seems that the only genuine argument for the "multiprocessing" model is simply that Python doesn't support anything else.
Complicated looking perhaps. But the ERRORS you see are significantly easier to debug.
I come from the "why not at least look at multiprocessing" C world as well. Threads are a way lots of very subtle bugs happen. It's an "all is well until someone loses an eye" type of situation.
Both are valid ways to solve problems, but both have their share of issues.
And they're not "significantly" more complicated if you're actually testing your code. Testing multiprocess solutions is easy. Testing for threading issues is a pain in the ass.
It's also not practical for many scenarios. Signals in Python can only be received by the main thread, and since that's what multiprocessing uses for communication, it means many common scenarios are out: GUI programs where the main thread is the rendering loop, network programs where the main thread is the connection handler, etc.
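The restriction is easy to see directly (my illustration; the handler does nothing, and SIGINT is used just because it exists on every platform). Installing a signal handler from anywhere but the main thread raises ValueError, which is why designs that tie up the main thread elsewhere run into trouble:

```python
# Python only lets the main thread install (and receive) signal
# handlers. Trying it from a worker thread raises ValueError.
import signal
import threading

def handler(signum, frame):
    pass  # placeholder handler for illustration

errors = []

def worker():
    try:
        signal.signal(signal.SIGINT, handler)  # not allowed off the main thread
    except ValueError as e:
        errors.append(str(e))

signal.signal(signal.SIGINT, handler)  # fine: we're in the main thread

t = threading.Thread(target=worker)
t.start()
t.join()
print(errors)  # one ValueError message mentioning the main thread
```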
http://www.dabeaz.com/blog/2010/01/python-gil-visualized.htm...