There are also doubts internally about how much of this stuff will actually be useful to the rest of the world.
This seems like the wrong approach. If it's useful to Google, it might not be useful to the rest of the world in exactly that form, but I'm sure outside kernel developers would have opinions on it and could suggest a version that helps everyone.
Google makes a lot of use of the out-of-memory (OOM) killer to pare back overloaded systems. That can create trouble, though, when processes holding mutexes encounter the OOM killer. Mike wonders why the kernel tries so hard, rather than just failing allocation requests when memory gets too tight.
This is something that really bothers me about Linux too, and I'm not Google; I run a workstation and some servers. Processes with runaway memory allocations can easily grind the system to a halt by causing crazy amounts of swapping. If you don't catch it in time, you basically need a hard reset. This shouldn't be possible from user space.
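Not a kernel fix, but one per-process mitigation available from user space today is to cap a process's address space with setrlimit(), so a runaway allocation fails with ENOMEM instead of dragging the whole box into swap. A minimal C sketch, assuming a 2 GB cap is a sensible bound for the workload (the value is an arbitrary example):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/resource.h>

    int main(void)
    {
        /* Cap this process's virtual address space at 2 GB (example
         * value); the limit is inherited by child processes. */
        struct rlimit rl = { .rlim_cur = 2UL << 30, .rlim_max = 2UL << 30 };
        if (setrlimit(RLIMIT_AS, &rl) != 0) {
            perror("setrlimit");
            return 1;
        }

        /* A runaway allocation now fails cleanly instead of swapping
         * (assumes a 64-bit build so the 4 GB request doesn't wrap). */
        void *p = malloc(4UL << 30);
        if (p == NULL)
            fprintf(stderr, "malloc failed as expected: cap exceeded\n");
        free(p);
        return 0;
    }

Because the limit survives fork()/exec(), a small wrapper can impose it on anything started from a shell; ulimit -v does the same thing at the shell level.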
Mike concluded with a couple of "interesting problems." One of those is that Google would like a way to pin filesystem metadata in memory. The problem here is being able to bound the time required to service I/O requests. The time required to read a block from disk is known, but if the relevant metadata is not in memory, more than one disk I/O operation may be required. That slows things down in undesirable ways. Google is currently getting around this by reading file data directly from raw disk devices in user space, but they would like to stop doing that.
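For readers wondering what "reading file data directly from raw disk devices in user space" looks like in practice: typically it means opening the block device with O_DIRECT and issuing block-aligned reads that bypass the page cache. A minimal sketch; the device path and the 4096-byte block size are illustrative assumptions, not anything Google has published:

    #define _GNU_SOURCE           /* for O_DIRECT on Linux */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        /* Hypothetical device path; O_DIRECT bypasses the page cache. */
        int fd = open("/dev/sdb", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        /* O_DIRECT requires buffer, offset, and length to be aligned
         * to the device's logical block size; 4096 is a common value. */
        void *buf;
        if (posix_memalign(&buf, 4096, 4096) != 0) { close(fd); return 1; }

        ssize_t n = pread(fd, buf, 4096, 0);   /* read the first block */
        if (n < 0)
            perror("pread");
        else
            printf("read %zd bytes from the raw device\n", n);

        free(buf);
        close(fd);
        return 0;
    }

The cost of this approach is exactly what the article implies: the application has to reimplement the filesystem's metadata handling itself, which is why Google would like a kernel-supported way to pin metadata instead.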
While the kernel is certainly able to handle hundreds of thousands of threads, the main limit is memory: by default (in libc) around 2 MB is allocated for each thread's stack.
With the pthread library you can create threads with a smaller stack size, but that limits some aspects of your program (think of recursion, local objects, etc.).
My experience shows that it is possible to be happy with a 64K stack size in an event-driven program if you are careful, so you can run ~500,000 threads on 64 GB of memory if you want to. Nevertheless, you'd better design your program more cleverly, use only a handful of threads, and put them in a blocking state only when you must :)
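Here is a minimal sketch of the smaller-stack approach mentioned above, using pthread_attr_setstacksize(); the 64K figure comes from the comment, and PTHREAD_STACK_MIN guards against going below the system minimum (build with -pthread):

    #include <limits.h>    /* PTHREAD_STACK_MIN */
    #include <pthread.h>
    #include <stdio.h>

    static void *worker(void *arg)
    {
        /* Keep locals small and avoid deep recursion on a tiny stack. */
        (void)arg;
        return NULL;
    }

    int main(void)
    {
        pthread_attr_t attr;
        pthread_attr_init(&attr);

        size_t stack = 64 * 1024;          /* 64K, per the comment above */
        if (stack < PTHREAD_STACK_MIN)     /* respect the system minimum */
            stack = PTHREAD_STACK_MIN;
        pthread_attr_setstacksize(&attr, stack);

        pthread_t tid;
        if (pthread_create(&tid, &attr, worker, NULL) != 0) {
            fprintf(stderr, "pthread_create failed\n");
            return 1;
        }
        pthread_join(tid, NULL);
        pthread_attr_destroy(&attr);
        return 0;
    }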
"Google is currently getting around this by reading file data directly from raw disk devices in user space"

Ouch!

"but they would like to stop doing that."

No kidding.