The 95th percentile response time is a bad metric. A typical user executes more than a single request, so it is close to 100% probable that they get a response time above the 95th percentile. Better to tune the maximum response time instead.
Edit: When comparing two systems, the Fisher–Tippett–Gnedenko theorem gives guidance about how to do large-sample statistical inference on estimates of maxima: https://en.wikipedia.org/wiki/Fisher%E2%80%93Tippett%E2%80%9...
The 95th percentile response time is a bad metric. A typical user executes more than a single request, so it is close to 100% probable that they get a response time above the 95th percentile. Better to tune the maximum response time instead.
Edit: When comparing two systems, the Fisher–Tippett–Gnedenko theorem gives guidance about how to do large-sample statistical inference on estimates of maxima: https://en.wikipedia.org/wiki/Fisher%E2%80%93Tippett%E2%80%9...