Scaling out is natural when you have a large number of tasks, but it's not always an option when you have one big task. Maybe you need to sort a giant collection, or hash a giant input, or multiply a couple of giant matrices. Threads with shared memory can get to work on those problems easily, because every thread can read and write the same data in place. Separate processes can't, at least not without reorganizing the problem to avoid doing tons of I/O just to move the data between them.
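To make the shared-memory point concrete, here's a rough sketch of splitting one big hashing job across threads. The function name `chunked_digest`, the 1 MiB chunk size, and the worker count are all made up for illustration. Note that the result is a toy two-level hash (SHA-256 of the per-chunk SHA-256 digests), not the plain SHA-256 of the input, since ordinary SHA-256 can't be split up this way. The sketch also leans on CPython's `hashlib` releasing the GIL while it hashes large buffers, so the threads can actually run in parallel.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def chunked_digest(data, chunk_size=1 << 20, workers=4):
    """Toy two-level hash: SHA-256 each chunk on a thread, then hash the digests.

    This is NOT the same value as hashlib.sha256(data); it's only meant to
    show several threads working on one shared buffer.
    """
    # Slicing a memoryview doesn't copy the bytes, so every thread reads
    # directly from the one shared buffer.
    view = memoryview(data)
    chunks = [view[i:i + chunk_size] for i in range(0, len(view), chunk_size)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so the digests line up
        # with the chunks no matter which thread finishes first.
        digests = list(pool.map(lambda c: hashlib.sha256(c).digest(), chunks))

    return hashlib.sha256(b"".join(digests)).hexdigest()


# One "giant" input: 4 MiB of repeating bytes, split into four 1 MiB chunks.
big_input = bytes(range(256)) * (1 << 14)
digest = chunked_digest(big_input)
```

Doing the same thing with separate processes would mean shipping each chunk to a worker first, which is exactly the extra I/O that threads get to skip.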