Estimating a Full Backlog Based on a Sample of It

I want to address a question I was sent recently and that I get asked about once a month. The question is how to estimate how many hours it will take to deliver a given product backlog when we have no historical data at all. My first bit of advice is always to try to put off answering until you have at least one sprint of historical data. But that's not always possible. When it's not, my general recommendation is to conduct one or more capacity-driven sprint planning meetings and use those to forecast likely velocity.

However, some people are still more comfortable thinking in hours and, rather than forecast velocity, they want to forecast the number of hours a product backlog will take. From there they will usually estimate the duration of the project (something that could be done more directly with a velocity estimate). But since it's a common question, I'd like to address it. Here's essentially the question I was asked. I've paraphrased, simplified and left out extraneous information:

We have no historical velocity data. We have a product backlog of 300 user stories. Each user story has been estimated, most in the range of 3-13 points. Would the following approach be reasonable:
1. Grab a random sample of 40 stories.
2. Break each of those stories into tasks and estimate the tasks.
3. From the task estimates come up with an average number of hours per story.
For example, suppose the randomly selected 40 user stories total 150 points and that the tasks identified for those 40 user stories total 600 hours. We would then theorize that it is about 4 hours per story point?

Well, yes, the idea here is fine, but it has two problems:

1. It's important to remember that the relationship between points and hours is not an equivalence. It is not that 1 point equals 4 hours (which is what the example above showed). It is that 1 point equals a mean of 4 hours with a standard deviation of, say, 45 minutes. Assuming the hours per point are roughly normally distributed, this would mean that most of the time (about 68% of the time) 1 point takes from 3:15 to 4:45 to finish (one standard deviation either side of the mean). It would mean that almost always (about 95% of the time) 1 point takes between 2-1/2 and 5-1/2 hours to finish (two standard deviations).
2. The above approach assumes that 1-point stories and 13-point stories are estimated perfectly in relative terms. In other words, it assumes that if the mean duration of a 1-point story is 4 hours, the mean of a 13-point story will be 13 x 4 = 52 hours. For many reasons, this is unlikely to be true. And the data I've collected from a variety of teams shows that, as we'd expect, teams are not perfect, even though many are amazingly consistent.
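To make the first point concrete, here is a minimal Python sketch of the ranges implied by a mean of 4 hours per point and a standard deviation of 45 minutes. It assumes hours per point follow a normal distribution, which is an assumption made for illustration; the helper names (range_around_mean, as_clock, coverage) are hypothetical.

```python
from statistics import NormalDist

# Example figures from the text: mean of 4 hours per point,
# standard deviation of 45 minutes (0.75 hours).
MEAN_HOURS_PER_POINT = 4.0
STD_DEV_HOURS = 0.75


def range_around_mean(mean, std_dev, k):
    """Return the (low, high) range k standard deviations either side of the mean."""
    return (mean - k * std_dev, mean + k * std_dev)


def as_clock(hours):
    """Format a number of hours as h:mm."""
    whole_hours, minutes = divmod(round(hours * 60), 60)
    return f"{whole_hours}:{minutes:02d}"


def coverage(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist()
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2):
    low, high = range_around_mean(MEAN_HOURS_PER_POINT, STD_DEV_HOURS, k)
    print(f"{k} standard deviation(s): {as_clock(low)} to {as_clock(high)} "
          f"(about {coverage(k):.0%} of the time)")
```

If real task data turns out to be skewed (a few stories taking far longer than the rest), these percentages would only be a rough guide.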

So, what can we do to address these problems? A first, really simple improvement is to calculate the average number of hours for stories of each size rather than one overall average. In the example above, we said that the 40 stories were 150 points in total and 600 hours, for an average of 4 hours per point. But if we averaged the 1-point stories on their own, we might find that they were 3.2 hours per point, the 2-point stories that were broken into tasks were 4.3 hours per point, the 3-point stories were 4.1 hours per point, and so on. We can then multiply the average number of hours per story of each size by the number of stories of that size on the product backlog. An example using the average hours given above is shown in the following table:

Points   Hours Per Story   # of Stories   Total Hours
1        3.2               5              16.0
2        8.6               8              68.8
3        12.3              7              86.1
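The arithmetic behind the table can be sketched in a few lines of Python. This is only an illustration using the example figures from this post; the names hours_per_point, stories_on_backlog and estimate_backlog_hours are hypothetical.

```python
# Average hours per point for each story size, from the example averages
# given in the text (1-point: 3.2, 2-point: 4.3, 3-point: 4.1).
hours_per_point = {1: 3.2, 2: 4.3, 3: 4.1}

# Number of stories of each size on the product backlog, from the table.
stories_on_backlog = {1: 5, 2: 8, 3: 7}


def estimate_backlog_hours(hours_per_point, stories_on_backlog):
    """Return one row per story size plus the total expected hours.

    Each row is (points, hours_per_story, story_count, total_hours).
    """
    rows = []
    for points in sorted(stories_on_backlog):
        # Hours per story is the per-point average times the story's size.
        hours_per_story = hours_per_point[points] * points
        story_count = stories_on_backlog[points]
        rows.append((points, hours_per_story, story_count,
                     hours_per_story * story_count))
    total = sum(row[3] for row in rows)
    return rows, total


rows, total_hours = estimate_backlog_hours(hours_per_point, stories_on_backlog)
for points, hours_per_story, story_count, size_total in rows:
    print(f"{points}-point: {hours_per_story:.1f} h/story x {story_count} "
          f"stories = {size_total:.1f} h")
print(f"Total: {total_hours:.1f} hours")
```

Keeping the averages keyed by story size means a size with an unusual hours-per-point figure only affects its own row rather than shifting one overall average.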

First, notice that the second column of the table is hours per story, not hours per point. The 2-point stories were assumed to take 4.3 hours per point, so 8.6 (4.3 x 2) hours per story is shown in that column. The table shows that we have five 1-point user stories. Each is expected to take about 3.2 hours, so we expect to spend 16 hours in total on the 1-point stories. Summing the last column gives an expected total of 170.9 hours.

Note that this approach is subject to all the shortcomings introduced during the task identification and estimation step. Most importantly, the team will fail to identify all tasks. So this approach estimates the number of hours to deliver only the tasks that were identified. Some adjustment will need to be made for the work the team failed to identify, and the duration adjusted accordingly. I'll write about other improvements to this simple approach in upcoming blog posts.
