Personally, I was thinking of running a controller on each client machine and having it control the number of connections that client opens to the backend.

You can use the P90 latency as the control signal and increase or decrease the number of connections to the backend machines accordingly.
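To make the idea concrete, here is a rough sketch of what such a client-side controller could look like. It uses an additive-increase / multiplicative-decrease rule, which is just one possible choice, not something I have tested against a real backend. The names (`ConnectionController`, `target_p90_ms`, `backoff`) and all the numbers are made up for illustration.

```python
def p90(samples):
    """Nearest-rank 90th percentile of a non-empty list of latencies."""
    ordered = sorted(samples)
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n) in integer arithmetic
    return ordered[rank - 1]


class ConnectionController:
    """Hypothetical per-client controller: adjusts the connection limit from P90 latency."""

    def __init__(self, target_p90_ms, min_conns=1, max_conns=64, backoff=0.5):
        self.target_p90_ms = target_p90_ms  # assumed latency target
        self.min_conns = min_conns
        self.max_conns = max_conns
        self.backoff = backoff              # multiplicative decrease factor
        self.conns = min_conns

    def update(self, latencies_ms):
        """Feed one window of observed latencies; return the new connection limit."""
        if not latencies_ms:
            return self.conns  # no data in this window, keep the current limit
        if p90(latencies_ms) > self.target_p90_ms:
            # Too slow: back off quickly.
            self.conns = max(self.min_conns, int(self.conns * self.backoff))
        else:
            # Within target: probe for more capacity one connection at a time.
            self.conns = min(self.max_conns, self.conns + 1)
        return self.conns
```

Each client would call `update()` once per measurement window with the latencies it saw in that window, and then open or close connections to match the returned limit.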

Similarly, the backend can decide to drop connections if it sees that its reply latency is too high, that its CPU usage is too high, or that whatever other metric makes sense has crossed a threshold.
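The backend side could be as simple as a threshold check like the one below. Again, this is only a sketch: `should_shed` and the default thresholds are placeholders I picked for the example, and a real server would probably want smoothed metrics rather than instantaneous readings.

```python
def should_shed(reply_latency_ms, cpu_utilization,
                max_latency_ms=250.0, max_cpu=0.85):
    """Return True if the backend should drop a connection.

    reply_latency_ms: recent reply latency observed by the backend.
    cpu_utilization:  CPU usage as a fraction between 0.0 and 1.0.
    Both thresholds are illustrative defaults, not recommendations.
    """
    return reply_latency_ms > max_latency_ms or cpu_utilization > max_cpu
```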

What I can't tell is whether a system like this would ever settle into a stable enough state.