There are also doubts internally about how much of this stuff will actually be useful to the rest of the world.
This seems like the wrong approach. If it's useful to Google, it might not be useful to the rest of the world in exactly that form, but I'm sure outside kernel developers would have opinions on it and could suggest a version that helps everyone.
Google makes a lot of use of the out-of-memory (OOM) killer to pare back overloaded systems. That can create trouble, though, when processes holding mutexes encounter the OOM killer. Mike wonders why the kernel tries so hard, rather than just failing allocation requests when memory gets too tight.
This is something that really bothers me about Linux too, and I'm not Google; I run a workstation and some servers. Processes with runaway memory allocations can easily grind the system to a halt by causing crazy amounts of swapping. If you don't catch it in time, you basically need a hard reset. This shouldn't be possible from user space.
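Not a kernel fix, but one per-process mitigation available from user space today is to cap a process's address space with setrlimit(), so a runaway allocation fails with ENOMEM instead of dragging the whole box into swap. A minimal C sketch, assuming a 2 GB cap is a sensible bound for the workload (the value is an arbitrary example):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/resource.h>

    int main(void)
    {
        /* Cap this process's virtual address space at 2 GB (example
         * value); the limit is inherited by child processes. */
        struct rlimit rl = { .rlim_cur = 2UL << 30, .rlim_max = 2UL << 30 };
        if (setrlimit(RLIMIT_AS, &rl) != 0) {
            perror("setrlimit");
            return 1;
        }

        /* A runaway allocation now fails cleanly instead of swapping
         * (assumes a 64-bit build so the 4 GB request doesn't wrap). */
        void *p = malloc(4UL << 30);
        if (p == NULL)
            fprintf(stderr, "malloc failed as expected: cap exceeded\n");
        free(p);
        return 0;
    }

Because the limit survives fork()/exec(), a small wrapper can impose it on anything started from a shell; ulimit -v does the same thing at the shell level.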
Mike concluded with a couple of "interesting problems." One of those is that Google would like a way to pin filesystem metadata in memory. The problem here is being able to bound the time required to service I/O requests. The time required to read a block from disk is known, but if the relevant metadata is not in memory, more than one disk I/O operation may be required. That slows things down in undesirable ways. Google is currently getting around this by reading file data directly from raw disk devices in user space, but they would like to stop doing that.
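For readers wondering what "reading file data directly from raw disk devices in user space" looks like in practice: typically it means opening the block device with O_DIRECT and issuing block-aligned reads that bypass the page cache. A minimal sketch; the device path and the 4096-byte block size are illustrative assumptions, not anything Google has published:

    #define _GNU_SOURCE           /* for O_DIRECT on Linux */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        /* Hypothetical device path; O_DIRECT bypasses the page cache. */
        int fd = open("/dev/sdb", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        /* O_DIRECT requires buffer, offset, and length to be aligned
         * to the device's logical block size; 4096 is a common value. */
        void *buf;
        if (posix_memalign(&buf, 4096, 4096) != 0) { close(fd); return 1; }

        ssize_t n = pread(fd, buf, 4096, 0);   /* read the first block */
        if (n < 0)
            perror("pread");
        else
            printf("read %zd bytes from the raw device\n", n);

        free(buf);
        close(fd);
        return 0;
    }

The cost of this approach is exactly what the article implies: the application has to reimplement the filesystem's metadata handling itself, which is why Google would like a kernel-supported way to pin metadata instead.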
While the kernel is certainly able to handle hundreds of thousands of threads, the main limit is memory: by default (in libc) around 2 MB is allocated for each thread's stack.
With the pthread library you can create threads with a smaller stack size, but that limits some aspects of your program (think of recursion, local objects, etc.).
My experience shows that it is possible to be happy with a 64K stack size in an event-driven program if you are careful, so you can run ~500,000 threads on 64 GB of memory if you want to. Nevertheless, you'd better design your program more cleverly, use only a handful of threads, and put them in a blocking state only when you must :)
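Here is a minimal sketch of the smaller-stack approach mentioned above, using pthread_attr_setstacksize(); the 64K figure comes from the comment, and PTHREAD_STACK_MIN guards against going below the system minimum (build with -pthread):

    #include <limits.h>    /* PTHREAD_STACK_MIN */
    #include <pthread.h>
    #include <stdio.h>

    static void *worker(void *arg)
    {
        /* Keep locals small and avoid deep recursion on a tiny stack. */
        (void)arg;
        return NULL;
    }

    int main(void)
    {
        pthread_attr_t attr;
        pthread_attr_init(&attr);

        size_t stack = 64 * 1024;          /* 64K, per the comment above */
        if (stack < PTHREAD_STACK_MIN)     /* respect the system minimum */
            stack = PTHREAD_STACK_MIN;
        pthread_attr_setstacksize(&attr, stack);

        pthread_t tid;
        if (pthread_create(&tid, &attr, worker, NULL) != 0) {
            fprintf(stderr, "pthread_create failed\n");
            return 1;
        }
        pthread_join(tid, NULL);
        pthread_attr_destroy(&attr);
        return 0;
    }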
"Google is currently getting around this by reading file data directly from raw disk devices in user space"

Ouch!

"but they would like to stop doing that."

No kidding.