> Well, when it comes to performance - you can’t use the average if you don’t kn...

apk · on March 20, 2020

> The core point here is that all summary statistics are misleading. You need to be clear on what you care about

I couldn't agree more. A few months ago I gave a talk that tried in part to emphasize this point (https://www.youtube.com/watch?v=EG7Zhd6gLiw). mjb, I hadn't seen your post until just now but I wish I'd known about it earlier.

Another hard-earned lesson from many teams I've worked with is that humans just aren't very good at judging the variance that's intrinsic to many [summary] statistics. Even when your system is operating in what a human would consider a steady state, summary statistics naturally bounce around a bit over time. The variance is often higher for tail percentiles simply because the density of the PDF is lower in that region: fewer samples land near the tail, so each window's estimate rests on less data. When faced with a question like "did the behavior of my system get worse?" after an external change (a config change, a code deploy, a traffic increase, etc.), it can be difficult to come up with a reliable answer just by eyeballing a squiggly time series line.
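To make the tail-variance point concrete, here is a minimal sketch rather than anything taken from a real system. It assumes latencies are drawn from a fixed log-normal distribution (an illustrative choice), splits them into equal-sized windows, and compares how much the per-window p50 and p99 move around even though nothing about the "system" changes. The names `percentile`, `window_percentiles`, and `relative_spread` are hypothetical helpers, and the window count and sample size are arbitrary.

```python
import math
import random
import statistics


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already-sorted list; q is an int in 1..100."""
    rank = max(1, math.ceil(q * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def window_percentiles(rng, windows=200, samples_per_window=1000):
    """Simulate a steady-state system and return the per-window p50 and p99 series."""
    p50s, p99s = [], []
    for _ in range(windows):
        # Assumption: latencies follow a fixed log-normal distribution in every window.
        latencies = sorted(
            rng.lognormvariate(0.0, 1.0) for _ in range(samples_per_window)
        )
        p50s.append(percentile(latencies, 50))
        p99s.append(percentile(latencies, 99))
    return p50s, p99s


def relative_spread(values):
    """Standard deviation divided by the mean (coefficient of variation)."""
    return statistics.stdev(values) / statistics.mean(values)


rng = random.Random(42)  # fixed seed so the run is repeatable
p50s, p99s = window_percentiles(rng)
p50_spread = relative_spread(p50s)
p99_spread = relative_spread(p99s)

print(f"p50 relative spread across windows: {p50_spread:.3f}")
print(f"p99 relative spread across windows: {p99_spread:.3f}")
```

Under these assumptions the p99 series should wobble noticeably more than the p50 series, even relative to its own mean, despite every window coming from the same distribution. That wobble is the baseline noise any "did it get worse?" judgment has to be measured against.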