Although you are right that threads only scale so far, remember that network I/O carries a rather large overhead.
If you always assume your code is going to run over a network, you might miss the opportunity to solve some problems efficiently on a single machine with a bunch of cores.
I think frameworks like Celluloid let you deal with this elegantly, but they need help from the language to realize that potential, which is why bascule is requesting these features.
An example: a computer game might be built concurrently by running the rendering system, the physics engines, any AIs, and the main game loop on separate threads. Obviously these systems need to share a bunch of information with as little delay as possible.
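To make that concrete, here is a minimal Ruby sketch (plain threads and a Mutex, not Celluloid itself) of one subsystem publishing simulation state into memory shared with the rest of the process, with no network hop in between. The names `world` and the 16 ms timestep are illustrative, not from any real engine:

```ruby
# Game-style subsystems as threads sharing in-process state.
# `world` is visible to every thread; `lock` guards concurrent access.
world = { tick: 0, positions: [] }
lock  = Mutex.new

physics = Thread.new do
  3.times do
    lock.synchronize do
      # Advance the simulation and publish results directly into shared memory.
      world[:tick] += 1
      world[:positions] << world[:tick] * 0.016 # hypothetical 16 ms timestep
    end
    sleep 0.01
  end
end

physics.join
lock.synchronize { puts world[:positions].length }
```

An actor library like Celluloid wraps the same idea in message passing between in-process actors, so you get the isolation of the network model without paying network latency.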
Simply put, if you map out storage levels like this:
L1 -> L2 -> (L3) -> Memory -> Disk/Network
These levels differ by orders of magnitude in performance. The network can be faster than disk, but generally not by an order of magnitude.
So, everything you know about memory vs. disk for performance ought to translate fairly well to memory vs. network.
It's a good observation that extremely performance-bound jobs might want to look to other languages, but avoiding a level of that storage hierarchy is no meager 2-3x speedup.
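As a rough illustration of the gap, this Ruby sketch times in-memory reads against TCP round trips over loopback (which is the *best* case for the network; a real network is far slower). The echo server and iteration counts are arbitrary choices for the demonstration:

```ruby
require 'socket'
require 'benchmark'

data = Array.new(10_000) { |i| i }

# Tiny echo server on an ephemeral loopback port.
server = TCPServer.new('127.0.0.1', 0)
port   = server.addr[1]
Thread.new do
  client = server.accept
  while (line = client.gets)
    client.puts line
  end
end

sock = TCPSocket.new('127.0.0.1', port)

# 1,000 random in-memory reads vs. 1,000 loopback round trips.
mem_time = Benchmark.realtime { 1_000.times { data[rand(data.size)] } }
net_time = Benchmark.realtime { 1_000.times { sock.puts 'x'; sock.gets } }

puts format('memory: %.6fs  loopback TCP: %.6fs', mem_time, net_time)
```

Even with the kernel short-circuiting everything on loopback, the socket path loses badly, which is the whole argument for keeping hot shared state in one process when you can.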