While network latency may overshadow that of a single query, many apps have many such queries to accomplish one action, and it can start to add up.
I was referring more to how it's extremely rare to have a stack as simple as request --> LB --> app --> DB. Instead, the app is almost always split into microservices, even when that wasn't warranted, and each service is still making calls to DBs. Many of the services depend on other services, so there's no parallelization there. Then there's the caching layer stuck between service --> DB, because by and large the RDBMS isn't understood or managed well, so the fix is to just throw Redis between them.
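For the Redis-between-service-and-DB part, this is roughly what I mean: a minimal cache-aside sketch in Python. The redis-py calls are real; `query_db` is a made-up stand-in for whatever data-access layer the service actually has.

```python
# Cache-aside layer jammed between the service and the RDBMS.
# The redis-py calls (get/setex) are real; query_db() is a hypothetical
# stand-in for the service's actual data-access code.
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def query_db(user_id: int) -> dict:
    # Imagine a psycopg2/SQLAlchemy call here; returns a row as a dict.
    return {"id": user_id, "name": "example"}

def get_user(user_id: int) -> dict:
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)          # hit: the DB is never touched
    row = query_db(user_id)                # miss: fall through to the RDBMS
    r.setex(key, 300, json.dumps(row))     # cache for 5 minutes
    return row
```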
> While network latency may overshadow that of a single query, many apps have many such queries to accomplish one action, and it can start to add up.
I don't think this is a good argument. Even though disk latencies can add up, unless you're doing IO-heavy operations that should really be async calls anyway, they are always a few orders of magnitude smaller than the overall response time.
The hypothetical gains you get from eliminating 100% of your IO latencies top out at a couple dozen milliseconds.
In platform-as-a-service offerings such as AWS' DynamoDB or Azure's CosmosDB, which involve a few network calls, an index query normally takes between 10 and 20 ms. You'd barely see single-digit-millisecond gains even if you lowered disk latencies to zero.
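Back-of-envelope, with made-up but plausible numbers (20 DB calls per request, ~10 ms of network round trip each, ~1 ms of storage latency each), eliminating the storage component entirely saves on the order of 20 ms:

```python
# Rough arithmetic behind the "it tops out at a couple dozen ms" claim.
# All numbers are assumptions, not measurements.
db_calls_per_request = 20      # assumed fan-out for one API request
network_rtt_ms = 10.0          # low end of a managed-DB round trip
disk_ms_now = 1.0              # assumed storage latency per call today
disk_ms_ideal = 0.0            # hypothetical: disk latency eliminated entirely

current = db_calls_per_request * (network_rtt_ms + disk_ms_now)
ideal = db_calls_per_request * (network_rtt_ms + disk_ms_ideal)
print(f"saved {current - ideal:.0f} ms out of {current:.0f} ms")  # saved 20 ms out of 220 ms
```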
In relative terms, if you're operating an app where single-millisecond deltas in latency actually matter, you'll get far greater reductions in response times from regional and edge deployments than from switching to bare metal. And you can forget about regional deployments if you're running your own hardware in-house.
There are many reasons why any discussion of performance needs to start with gathering performance numbers and figuring out where the bottlenecks actually are.
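In the simplest case that just means timing the suspect calls before assuming where the time goes; a trivial sketch, where `handle_request` is a stand-in for whatever code path is under suspicion:

```python
# Measure first: record where the time actually goes before blaming disk,
# network, or the DB. handle_request() is a hypothetical stand-in.
import time

def handle_request() -> None:
    time.sleep(0.05)   # pretend work: ~50 ms

start = time.perf_counter()
handle_request()
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"request took {elapsed_ms:.1f} ms")
```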
Did you miss where I said “…each service is still making calls to DBs. Many of the services depend on other services…”?
I’ve seen API calls that result in hundreds of DB calls. Yes, of course that should be refactored away, but the fact remains that if even a small fraction of those calls has to read from disk, the latency starts adding up.
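The shape of it is the classic N+1 pattern; a sketch with hypothetical `get_order` / `get_line_items` helpers standing in for the real data-access code:

```python
# One API call fanning out into hundreds of DB calls (classic N+1).
# get_order()/get_line_items() are hypothetical stand-ins; each is a separate
# round trip, and any of them can land on a cold page that has to come off disk.
def get_order(order_id: int) -> dict:
    return {"id": order_id}          # imagine: SELECT * FROM orders WHERE id = ...

def get_line_items(order_id: int) -> list[dict]:
    return []                        # imagine: SELECT * FROM line_items WHERE order_id = ...

def build_order_view(order_ids: list[int]) -> list[dict]:
    views = []
    for order_id in order_ids:                         # e.g. 100 orders
        order = get_order(order_id)                    # 1 query each
        items = get_line_items(order_id)               # 1+ more queries each
        views.append({"order": order, "items": items})
    return views                                       # 200+ DB calls for one response
```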
It’s also not uncommon to have a horrendously suboptimal schema, with UUIDv4 as the PK, JSON blobs, etc. Querying those often results in lots of disk reads simply because of how an RDBMS lays out and traverses its data. The only way those queries deliver anything resembling acceptable UX is with local NVMe drives for the DB, because EBS just isn’t going to cut it.
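To illustrate the UUIDv4 point (a toy sketch, not a benchmark): rows inserted back-to-back get keys scattered across the whole keyspace, so they land on different B-tree pages instead of the same hot ones, and reads keep missing the buffer pool and going to disk.

```python
# Toy illustration of why UUIDv4 primary keys hurt locality: consecutive
# inserts produce keys with no relation to insert order, so they spread
# across B-tree leaf pages instead of clustering on the same hot page.
import uuid

seq_ids = [1001, 1002, 1003, 1004]                  # bigserial-style: adjacent keys, adjacent pages
uuid_ids = [str(uuid.uuid4()) for _ in range(4)]    # v4: effectively random

print("insert order :", uuid_ids)
print("index order  :", sorted(uuid_ids))           # bears no relation to insert order
```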