
> Well, when it comes to performance - you can’t use the average if you don’t know the distribution.

This is frankly wrong. Performance comes in multiple flavors. Latency is one of those, and there we know that percentiles really matter (see Andrew Certain's section of this talk: https://www.youtube.com/watch?v=sKRdemSirDM&feature=youtu.be... for Amazon's experience).

But for others, like throughput and scale, you don't need to know the distribution. In fact for throughput, the only thing that really matters is the long-term mean latency. For concurrency, it's that and long-term mean arrival rate. I wrote a blog post about it a while back (http://brooker.co.za/blog/2017/12/28/mean.html).
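To make the concurrency claim concrete, here is a minimal sketch of Little's law (mean concurrency = mean arrival rate × mean latency). It assumes Poisson arrivals and no queueing, and the three latency distributions are made up for illustration; none of this comes from the linked post. All three have the same 100 ms mean but very different tails.

```python
import random

def mean_concurrency(arrival_rate, draw_latency, n_requests=200_000, seed=0):
    """Approximate time-average number of requests in flight.

    Assumes Poisson arrivals and that every request starts immediately
    (no queueing). Requests still in flight at the end of the window are
    counted in full, which is negligible for a long window.
    """
    rng = random.Random(seed)
    now = 0.0
    request_seconds = 0.0
    for _ in range(n_requests):
        now += rng.expovariate(arrival_rate)   # time of the next arrival
        request_seconds += draw_latency(rng)   # time this request spends in flight
    # Total request-seconds divided by the window length.
    return request_seconds / now

# Three hypothetical latency distributions, all with a mean of 0.1 s.
def constant(rng):
    return 0.1

def exponential(rng):
    return rng.expovariate(10.0)

def bimodal(rng):
    # 90% fast (10 ms), 10% slow (910 ms): 0.9 * 0.01 + 0.1 * 0.91 = 0.1
    return 0.01 if rng.random() < 0.9 else 0.91

if __name__ == "__main__":
    for name, dist in [("constant", constant), ("exponential", exponential), ("bimodal", bimodal)]:
        print(name, round(mean_concurrency(100.0, dist), 2))
```

At 100 requests per second, each distribution should land close to a mean concurrency of 10, even though their p99 latencies differ by roughly an order of magnitude. That is the sense in which the mean is the right summary for this particular question.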

The core point here is that all summary statistics are misleading. You need to be clear on what you care about, and making absolute statements about the mean isn't a good way to do that.

Edit: This came across a bit more confrontational than I had intended. The OP makes some good points, but I think his point about the mean is overly broad.




> The core point here is that all summary statistics are misleading. You need to be clear on what you care about

I couldn't agree more. A few months ago I gave a talk that tried in part to emphasize this point (https://www.youtube.com/watch?v=EG7Zhd6gLiw). mjb, I hadn't seen your post until just now but I wish I'd known about it earlier.

Another hard-earned lesson on many teams I've worked with is that humans just aren't very good at judging the variance that's intrinsic to many [summary] statistics. Even when your system is operating in what a human would consider a steady state, summary statistics are naturally going to bounce around a bit over time. The variance is often higher for tail percentiles because the density of the PDF is lower in that region, so fewer samples land near the value being estimated. When faced with a question like "did the behavior of my system get worse?" in response to an external change (such as a config change, a code deploy, or a traffic increase), it can be difficult to come up with a reliable answer just by eyeballing a squiggly time series line.
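A rough way to see this is to sample repeatedly from one fixed distribution and watch how much each percentile estimate moves between windows. The sketch below assumes lognormal latencies and a window of 1000 samples; both are arbitrary choices for illustration, not measurements from any real system.

```python
import random
import statistics

def percentile(samples, q):
    """Simple order-statistic percentile: the value at index floor(q * n)."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[idx]

def relative_spread(q, windows=200, window_size=1000, seed=1):
    """Std dev of the per-window percentile estimate, divided by its mean.

    Every window is drawn from the same lognormal(0, 1) distribution, so
    any movement between windows is pure sampling noise.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(windows):
        window = [rng.lognormvariate(0.0, 1.0) for _ in range(window_size)]
        estimates.append(percentile(window, q))
    return statistics.stdev(estimates) / statistics.mean(estimates)

if __name__ == "__main__":
    for q in (0.5, 0.9, 0.99):
        print(f"p{q * 100:g}: relative spread {relative_spread(q):.3f}")
```

Under these assumptions the p99 line wobbles noticeably more than the median, even though nothing about the system changed. A comparison against that baseline noise is more trustworthy than eyeballing the graph.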



