Hacker News new | past | comments | ask | show | jobs | submit login

Most quantile sketches (and t-digest in particular) do not assume stationarity.

Note also that there are other bounds of importance and each has trade-offs.

T-digest gives you a strict bound on memory use and no dynamic allocation. But it does not have guaranteed accuracy bounds. It gives very good accuracy in practice and is very good at relative errors (i.e. 99.999th percentile estimate is between the 99.9985%-ile and 99.9995%-ile)

KL-sketch gives you a strict bound on memory use, but is limited to absolute quantile error. (i.e. 99.99%-ile is between 99.9%-ile and 100%-ile. This is useless for extrema, but fine for medians)

Cormode's extension to KL-sketch gives you strict bound on relative accuracy, but n log n memory use.

Exponential histograms give you strict bounds on memory use, no allocation and strict bounds on relative error in measurement space (i.e. 99.99%-ile ± % error). See the log-histogram[1] for some simple code and hdrHistogram[2] for a widely used version. Variants of this are used in Prometheus.

The exponential histogram is, by far, the best choice in most practical situations since an answer that says 3.1 ± 0.2 seconds is just so much more understandable for humans than a bound on quantile error. I say this as the author of the t-digest.

[1] https://github.com/tdunning/t-digest/blob/main/core/src/main...

[2] https://hdrhistogram.org/




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: