No idea why "story points" or "cups of coffee" or "shirt sizes" have much relati...

ohthehugemanate · on July 12, 2022

I hear what you're saying in two parts:

"Why not just estimate hours anyway?"

Because humans are extremely bad at estimating time, which has been borne out by studies many times over. A good overview is the original research on this, for which Kahneman et al won a Nobel prize (yes, the same Kahneman who would later go on to write HN favorite, "Thinking, Fast and Slow"). The broad stroke is that the very best time estimators in the very best circumstances only underestimate their time needs by 33%. The norm is more like 80%. The researchers propose a few time estimation strategies to get around it, like "third party estimation" and "tripartite estimation". But the simplest approach (which emerged in later research) is to ask people to estimate the "size" of a task, and use statistical correlation to convert that size to time.

This last part is hand-wavy unless you're familiar with the law of large numbers, the law that makes casinos profitable. A casino cannot (without cheating) determine the outcome of a single roulette spin. But it can predict with extremely high certainty the aggregate outcome of a thousand spins. It's the same with your estimates. You can't predict the correlation to time of a single story point. As you pointed out, sometimes something that looked complicated turns out to be easy and vice versa. But given a sufficient sample size (of estimates with a consistent correlation to time), you can predict with extreme accuracy the time for 1000 story points.

"Consistent correlation to time" is a bit of a PITA in a group, BTW. If you have developers do their own estimations individually, each one will have a different correlation to time. You would need a very large sample size to overcome that much variation. This is why so many systems encourage team estimation, so the consistency is dependent on the team dynamic, which is much more stable even when adding/removing engineers. But as I said, if it's the same person or team always writing your tasks, you can use their team dynamic instead, since their story size will be consistent.

FWIW by sufficient sample size, I mean after about 3 sprints (of any duration) you can make reasonable predictions. After 6 sprints you'll have confusing outliers, and after about 9 sprints it will be clear with some numerical weight to it.
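
If the law of large numbers part still feels abstract, here's a rough simulation of it. Everything in it is made up for illustration: the 4 hours per point average, the exponential noise model, and the names `HOURS_PER_POINT`, `total_hours` and `relative_spread`. It only shows how the relative uncertainty of the total shrinks as the number of points grows, assuming each point's time is independent.

```python
import random
import statistics

HOURS_PER_POINT = 4.0  # assumed long-run average; invented for illustration

def total_hours(points, rng):
    # Assumption: each point independently takes an exponentially
    # distributed time, so a single point is very noisy (spread ~100%).
    return sum(rng.expovariate(1 / HOURS_PER_POINT) for _ in range(points))

def relative_spread(points, trials=500, seed=0):
    # Simulate the same backlog many times and report
    # standard deviation / mean of the total hours.
    rng = random.Random(seed)
    totals = [total_hours(points, rng) for _ in range(trials)]
    return statistics.stdev(totals) / statistics.mean(totals)

spread = {points: relative_spread(points) for points in (1, 30, 1000)}
for points, value in spread.items():
    print(f"{points:>4} points: total hours vary by about +/-{value:.0%}")
```

Under these assumptions the spread falls roughly with the square root of the number of points, which is the casino effect: one point is a coin toss, a thousand points are not.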

Which brings up question 2, "the commitment ends up being a deadline". This is a human nature thing, you're right! But the problem isn't a mismatch between human nature and your estimate. The mismatch is between human nature and the uncertainty of reality. How you push to improve this depends on your org. In hard situations I reverse the statement of my estimate, to "if we set Dec 15 as the deadline, there's a 5% chance we won't make it. What's our fallback?" Asking that question a lot is helpful. But there's no magic bullet for making leadership - or worse, people who are afraid of leadership - plan appropriately for uncertainty. The best you can do is expose the uncertainty as clearly as possible, and give lots of lead time for the times when they still run into conflict between deadline, resources, and scope. After that, it's the manager's job to "manage" things and decide which variable they will alter to break the conflict.
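
One way to arrive at a statement like "there's an X% chance we won't make it" is to resample your own sprint history. This is a sketch, not a prescribed method: the function name `chance_of_missing`, the sprint history, and the backlog numbers are all invented, and it assumes future sprints will look like past ones.

```python
import random

def chance_of_missing(remaining_points, sprints_left, history, trials=10_000, seed=0):
    # Monte Carlo: build many possible futures by drawing past sprint
    # results with replacement, and count how often the work doesn't fit.
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        done = sum(rng.choice(history) for _ in range(sprints_left))
        if done < remaining_points:
            misses += 1
    return misses / trials

# Hypothetical story points completed in each past sprint
history = [21, 34, 26, 30, 18, 29, 27, 31, 25]

risk = chance_of_missing(remaining_points=120, sprints_left=5, history=history)
print(f"Chance we miss the deadline: {risk:.0%}")
```

The useful part is less the exact number than being able to say it out loud early, and to show how it moves when scope or the date changes.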

Put another way: reality is uncertain. When that uncertainty leads to a conflict between deadline, scope, and available resources - because that will happen sometimes per point 1 - only someone with deadline, scope, or hiring authority can solve it. That's (usually) not within your purview as a lead engineer. The best you can do is to 1) call out the uncertainty as clearly as you can, as early as you can, and 2) signal that conflict as early as you can, so those managers have maximum leeway. Abstracted estimation makes that possible. Guesses and hopes don't.