We struggle with this too. There are so many ways the dynamics can change--daily/weekly/seasonal cycles, robots, client caching, etc. One best practice we use for high-volume services is to minimize the variance in call mixtures: if two calls have vastly different call patterns, it's probably worth splitting them into separate services, so you can tune throttling, GC, load balancing, etc. specifically for each call, instead of having to tune one service to support both (which is often difficult or impossible). Of course, it's hard to predict how your service will evolve over time, so making the split is often painful for you and your clients.

Some of our services can't be handled by a single load balancer, so we use DNS round robin, which comes with a whole other class of problems when you have mixed call patterns. Gotta earn your pay...
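To make the "tune throttling per call" point concrete, here's a rough Python sketch. The class, call names, and rates are all made up for illustration; the point is just that once the calls live in separate services, each one gets its own knobs instead of a single compromise setting:

```python
import time

class TokenBucket:
    """Minimal token-bucket throttle; rate and burst are the per-call knobs."""
    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# With both call types in one service, a single bucket has to be tuned for
# the worst case of either pattern; split apart, each gets its own limits.
throttles = {
    "cheap_read":   TokenBucket(rate_per_sec=5000, burst=10000),  # illustrative numbers
    "batch_report": TokenBucket(rate_per_sec=10,   burst=20),
}

def handle(call_type, request):
    if not throttles[call_type].allow():
        raise RuntimeError("throttled")  # in a real service: return a throttling error / 429
    ...  # do the actual work
```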
Some other techniques we use are one-box deployments, which receive a small proportion of production traffic and "bake" new changes before we deploy to the whole fleet, and shadow fleets, which let you tune and test against live traffic. We've found that simply replaying production traffic at higher volumes sometimes isn't sufficient, because our calls don't necessarily scale that way (some of them scale with upstream traffic, some of them scale with downstream fleet sizes due to client caching).
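To illustrate the one-box/shadow idea, here's a toy Python sketch. The hostnames, the 1% fraction, and doing this in application code at all (rather than in the load balancer or a sidecar, where it really belongs) are assumptions for illustration:

```python
import random
import threading
import requests

PROD_HOSTS   = ["prod-1.internal", "prod-2.internal", "prod-3.internal"]
ONE_BOX_HOST = "one-box.internal"   # runs the new build, gets a small slice of real traffic
SHADOW_HOST  = "shadow.internal"    # gets a copy of live traffic; its responses are ignored

ONE_BOX_FRACTION = 0.01  # ~1% of production traffic "bakes" on the new build

def route(request):
    # Send a small slice of real traffic to the one-box before a fleet-wide deploy.
    if random.random() < ONE_BOX_FRACTION:
        target = ONE_BOX_HOST
    else:
        target = random.choice(PROD_HOSTS)

    # Fire-and-forget copy to the shadow fleet so it sees the live call mixture,
    # not a replay; nothing it returns ever reaches the client.
    threading.Thread(
        target=lambda: requests.post(f"http://{SHADOW_HOST}/call", json=request, timeout=2),
        daemon=True,
    ).start()

    return requests.post(f"http://{target}/call", json=request, timeout=2)
```

The mirroring is the part replay can't give you: the shadow fleet sees whatever mix of upstream-driven and cache-fill calls production is actually seeing at that moment.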