Size-Growth-Pace Model

troy.magennis
Forecasting using data
11 min read · Jul 7, 2017

Chapter 3

This chapter looks at simple models that can forecast software projects before actual data is available, or when we think the future isn’t going to resemble the past well enough for a reliable forecast. We discuss how to minimize or eliminate many of the estimates your teams may currently perform.

Goals of this chapter

  • Introduce the Size-Growth-Pace model for forecasting time
  • Learn how to combine size, growth and pace range estimates into a forecast

Forecast models using estimates (or not)

In Chapter 2, assumption-based forecasting was introduced. The basic concept is that all the necessary facts (assumptions until proven) that make a forecast likely to succeed are written down. We first look for assumptions that make our forecast impossible (or implausible). I see a lot of teams laboring over an estimation process at a fine level of detail when even a rough estimate would have ruled out that option as viable. Don’t do this. Try to rule out forecast assumptions with the least effort possible, and only estimate in finer detail when absolutely necessary (the assumption is near its tipping point).

Many people feel that traditional “estimation” is a total waste of time, and they are probably right the majority of the time. The error isn’t in doing the estimates. The error is estimating irrelevant things. We need only estimate assumptions that are near their tipping-points, and only then move on with other estimates.

The general assumptions that will help build a reliable forecast fall into the following categories –

1. Necessary conditions — Must Haves

a. Requirements and ability to start delivery (team, equipment, space).

b. Agreement of acceptable quality to be able to deliver to customers.

c. Requirements and ability to deliver to customers (environments, process, logistics).

2. Measures and estimates

a. A measure of initial scope and size.

b. A measure of expected scope growth and rework.

c. A measure of expected progress of scope delivery over time.

Necessary conditions (the assumptions) have binary “pass, yes” or “fail, no” answers. Failing even one necessary condition is a sure sign the forecast is questionable, yet forecasting often proceeds as if wishful thinking and pixies will solve the blocker. Failed necessary conditions are the most common cause of eventual project delay, and the failure was known even before the first line of code was written. Ironically, most forecasting effort is spent achieving group consensus on the measures and estimates, especially feature size and scope. Find ways to brainstorm and confirm the necessary conditions before moving on to measurement (numerical) assumption analysis. Don’t proceed if the necessary conditions are violated.

Measurement assumptions are the numerical building blocks of forecasting. Forecasting how long work will take to complete requires three pieces of information. If these three pieces of information are accurately estimated, the forecast will be reliable. The three pieces of information needed are -

1. How much initial work (size)

2. How much work gets added over time as work is done (growth)

3. How fast we build and deliver that work (pace)

I wrap forecasting using these assumptions into a model called size-growth-pace. It’s the simplest model for determining how long it might take to complete work over time.

Size-Growth-Pace Model

When we embark on a lengthy road-trip vacation, we often use a pace-based method for estimating when we will arrive. We start by knowing the distance we need to travel and a rough travel pace (speed, distance per hour). We simply divide the distance to travel by the expected travel pace to see how long that trip may take in hours. We then adjust for contextual factors, for example adding extra time to account for stopping along the way to refuel our bodies (food) and our cars (fossil fuel, Teslas excepted).

Software projects are no different. We have an initial amount of work and a delivery pace in mind. We know that we will learn as we build and this might alter and add scope. We also assume we will not get everything perfect first off and there will be defects. It sounds simple, but it’s an accurate reflection of what occurs in real life. The basic formula for making a time based forecast in its simplest form is –
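Consistent with the worked example later in this chapter (20 to 30 stories, a 1 to 3 growth multiplier and 3 to 10 stories per week giving 2 to 30 weeks), the formula can be written as follows. The names size, growth and pace are the three estimates listed above; treating growth as a multiplier on size is inferred from that example.

```latex
\text{duration} = \frac{\text{size} \times \text{growth}}{\text{pace}}
```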

This equation makes forecasting look simple. Although single point estimates could be used for each estimate, there are good reasons not to. If we get agreement on average values for each estimate, the final result will have an average chance of being accurate. If we ran the same project 100 times, roughly 50 runs would finish sooner than our forecast and 50 would take longer. We don’t run the same project 100 times, we run it once. We also don’t want a coin toss chance (50–50) of being late, we want to be more certain.

To allow more certainty in our forecast, the uncertainty in the estimates needs to be included as data in its own right. To include uncertainty, all estimates should be ranges. The ranges should be wide enough that the team is confident the actual values (once they are eventually known) fall within the estimated range. For example, the estimates for a feature or project might be: the initial size will be 20 to 30 feature stories, we expect 1 to 3 defects per story, and hope to deliver 3 to 10 stories every week.

Solving the size-growth-pace equation using range estimates keeps the uncertainty of the estimates in the final result. Forecasts can then be given at whatever certainty level is wanted: an 85% forecast is more certain than a 50% one. How this is calculated will be covered soon.

Performing mathematics on ranges is hard for humans to do in their heads. Spreadsheets have no difficulty though. The rest of this chapter looks at how to solve range estimate mathematics using paper and then a spreadsheet.

Computing the result of a Size-Growth-Pace Model

Having estimated the required ranges, it’s now just a matter of plugging those estimates into the formula -

This means doing mathematics on ranges, for example 20 to 30 stories, multiplied by a growth factor of 1 to 3 to account for splitting and defects, divided by a delivery pace of 3 to 10 items per week. If you are struggling to work out how to compute that in your head or with a fancy calculator, then you are not alone. Ordinary single-number arithmetic doesn’t solve this type of problem.

Equation 1 — Example feature forecast using range estimates
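Written out with the letters the text uses for the three estimates (a for size, b for growth, z for pace), the range form of the formula and its two extremes would look like this. This is a sketch inferred from the numbers quoted in this chapter, assuming every combination inside the ranges is allowed.

```latex
\text{duration} = \frac{a \times b}{z},
\qquad a \in [20, 30],\quad b \in [1, 3],\quad z \in [3, 10]

\text{shortest} = \frac{20 \times 1}{10} = 2 \text{ weeks},
\qquad
\text{longest} = \frac{30 \times 3}{3} = 30 \text{ weeks}
```

The extremes only bound the answer. They say nothing about which durations in between are more likely, which is what listing the combinations provides.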

Solving Equation 1 on paper (or Excel, which is paper in my world) involves creating a list of every possible combination of range estimate values (rounded to whatever precision is wanted) and plotting the range of answers in a chart type called a histogram. Given there are 11 equally possible story count starting points (a), 3 growth factor multipliers (b) and 8 different pace values (z), there are 11 × 3 × 8 = 264 combinations. That’s not too many, and they are easily permuted in rows of a spreadsheet, as shown in columns A, B and C (holding a, b and z) of Figure 3.

Figure 3 — Spreadsheet of all input combinations shown in Equation 1 (rows 11 to 255 excluded for brevity).
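The same list of combinations can be built in a few lines of code instead of spreadsheet rows. This is a minimal Python sketch, assuming whole-number steps inside each range; the variable names are illustrative and not taken from the spreadsheet.

```python
from itertools import product

# Range estimates from the worked example (whole-number steps assumed).
sizes = range(20, 31)    # a: 11 possible starting story counts
growths = range(1, 4)    # b: 3 growth multipliers
paces = range(3, 11)     # z: 8 weekly pace values

# One row per combination, mirroring the spreadsheet columns:
# size, growth, pace, and the resulting duration in weeks.
rows = [(a, b, z, a * b / z) for a, b, z in product(sizes, growths, paces)]

weeks = [row[3] for row in rows]
print(len(rows))               # number of combinations
print(min(weeks), max(weeks))  # shortest and longest duration in weeks
```

Each tuple in `rows` plays the role of one spreadsheet row, with the last element standing in for column D.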

Column D of Figure 3 holds the result of the size-growth-pace formula for each permutation. Figure 4 shows how often the same result was calculated as a bar chart, called a histogram. Histograms put results into equal-sized bins of value groupings to show how often values occurred in those bin ranges. Higher bars mean values in that range happened more often. Lower bars mean values happened less often and are less likely to occur. This chart tells us that there is a very wide range of possible outcomes, from 2 weeks to 30 weeks. It also shows, by the number of input combinations that gave each result, that there is a much higher chance of 2 to 10 weeks than 11 to 30 weeks. The 20 to 30 week outcomes are still possible, but less likely. We can use this chart to understand probability. Probability is the number of results that were on or before a certain number of weeks (or date) divided by the total number of possibilities.

Figure 4 — Results plotted as a histogram for 20–30 stories, 1 to 3 growth rate, 3 to 10 stories per week rate. The x-axis is a range of weeks to complete. 2.00, 4.00 means any answer between 2 and 4.
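The binning behind a histogram like this can be sketched in Python as a rough text chart. Two assumptions are mine rather than the spreadsheet's: bins are 2 weeks wide, and a result that falls exactly on a boundary is counted in the lower bin.

```python
from collections import Counter
from itertools import product

BIN_WIDTH = 2  # weeks per bin (assumed, matching the 2-week groupings)

counts = Counter()
for a, b, z in product(range(20, 31), range(1, 4), range(3, 11)):
    # Upper edge of the bin that holds a*b/z. Integer ceiling division
    # keeps results that sit exactly on a boundary out of floating point.
    upper = BIN_WIDTH * -(-(a * b) // (BIN_WIDTH * z))
    counts[upper] += 1

# Print one bar per bin: range of weeks, bar, and the raw count.
for upper in sorted(counts):
    label = f"{upper - BIN_WIDTH:>2} to {upper:<2}"
    print(label, "#" * counts[upper], counts[upper])
```

The bars are tallest at the short durations and thin out towards 30 weeks, which is the shape the paragraph above describes.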

For the results shown in Figure 3, there are 184 results that are less than or equal to 10 weeks, out of 264 total possible results. The chance the result will be 10 weeks or less is the ratio 184/264 = 0.69697, or 69.7% as the more common percentage. Table 1 shows the computed probabilities for 2 week increments.

Table 1 — The full set of results and probabilities calculated for Equation 1 and Figure 3.
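The probabilities for each 2 week increment can be recomputed directly from the 264 combinations. Here is a minimal Python sketch; the function name `chance_within` is illustrative and not part of the spreadsheet.

```python
from itertools import product

# Every combination of size (a), growth (b) and pace (z) from the example.
combos = list(product(range(20, 31), range(1, 4), range(3, 11)))

def chance_within(weeks):
    """Fraction of combinations that finish in `weeks` or less."""
    # a * b / z <= weeks, rearranged to whole numbers to avoid rounding issues.
    hits = sum(1 for a, b, z in combos if a * b <= weeks * z)
    return hits / len(combos)

for weeks in range(2, 31, 2):
    print(f"{weeks:>2} weeks or less: {chance_within(weeks):.1%}")
```

Calling `chance_within(10)` gives 184/264, the 69.7% worked out above.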

These are mathematically and statistically correct answers. We can’t compute a single result, because there isn’t one. There is a range of possible results, some more likely than others. We took three range estimates and combined them to determine a set of possibilities and probabilities. It allows us to say “There is an 84% chance of 14 weeks or less, but it could be anywhere between 2 and 30 weeks.” We make that statement knowing we have accounted for three major assumptions: initial scope, how much that scope will grow when doing the work, and expected delivery pace.

Mostly we are interested in the likelihood of delivering on or before a certain date, so we need to convert the number of weeks to a calendar date by adding the weeks to a specified start date, a pretty simple calculation. My only reservation about converting to a date is that if work doesn’t actually start on the assumed start date, people remember the date and still expect delivery. The start date is the most often violated assumption. Often the team isn’t in place or trained sufficiently to call it a start date in reality. Staying with duration helps compare trade-offs about scope and team size without prematurely locking stakeholders’ memories onto a calendar date they will remember and expect.
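The conversion itself is a one-liner in most languages. Here is a small Python sketch; the start date is a made-up example, and `delivery_date` is an illustrative helper name.

```python
from datetime import date, timedelta

def delivery_date(start, weeks):
    """Add a forecast duration in weeks to an assumed start date."""
    return start + timedelta(weeks=weeks)

# Hypothetical start date, used only to illustrate the calculation.
assumed_start = date(2017, 7, 10)
print(delivery_date(assumed_start, 14))  # 2017-10-16
```

Keeping the start date as an explicit input makes it easy to recompute the delivery date if work begins later than assumed.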

Solving more complex models using probabilistic forecasting

Solving range mathematics involves playing out all of the possibilities our range estimates specify and using that set of results to determine a final answer. Or it would if we had infinite computing time. Creating EVERY possible outcome for all model assumptions could take prohibitive computation time on more complex models or when the number ranges are large. And what if not every value within a range estimate is equally probable? Up to now all numbers within the range estimate have been considered equally possible. That is one possibility, but an unlikely one. Solving more complex models requires a subtle refinement of our approach.

Sampling — We don’t need every possible outcome, we just need a random sampling of possible outcomes.

As a computational shortcut, we play out 500 randomly chosen possibilities. Knowing what we learnt about statistical sampling, that sample size will be more than ample to give a reliable (similar) result. And that’s what we do in the Throughput Forecasting spreadsheet (downloadable here). 500 random trials are built that hypothetically complete the feature or project using the size-growth-pace model. The 500 trials allow a clear picture of the range and probabilities of outcomes.

To generate possible outcomes, the spreadsheet completes the project many times (each completion is a trial). For each trial it picks a random starting point (number of stories) and a random amount of growth. It then hypothetically burns down that total by subtracting a random sample of pace each week until all work is completed. It does this 500 times.
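The trial loop described above can be sketched in Python. This is not the spreadsheet's actual implementation. It assumes whole-number values picked with equal chance inside each range and growth applied as a multiplier on size. The names `simulate` and `weeks_at` are illustrative.

```python
import random

def simulate(trials=500, size=(20, 30), growth=(1, 3), pace=(3, 10), seed=1):
    """Return the number of weeks each simulated trial took to finish."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    results = []
    for _ in range(trials):
        # Random starting scope: stories multiplied by a growth factor.
        remaining = rng.randint(*size) * rng.randint(*growth)
        weeks = 0
        # Burn down the scope with a freshly sampled pace every week.
        while remaining > 0:
            remaining -= rng.randint(*pace)
            weeks += 1
        results.append(weeks)
    return results

def weeks_at(results, certainty):
    """Smallest week count that at least `certainty` of trials met."""
    ordered = sorted(results)
    needed = certainty * len(ordered)
    for position, weeks in enumerate(ordered, start=1):
        if position >= needed:
            return weeks
    return ordered[-1]

results = simulate()
print("50% of trials finished within", weeks_at(results, 0.50), "weeks")
print("85% of trials finished within", weeks_at(results, 0.85), "weeks")
```

Because pace is drawn again every week, a trial rarely stays at the slowest or fastest pace for its whole length, so the results differ somewhat from the exhaustive list of 264 combinations.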

The necessary inputs are –

1. A starting date

2. A range estimate of size (count of items or points)

3. A range estimate of growth rate

4. A range estimate of pace

Figure 5 — Input values for the Throughput Forecaster spreadsheet that implements the size-growth-pace model.

When combined, these inputs allow a simulated project to be completed. Graphically, the trials can be visualized as multiple burndown chart paths, as shown in Figure 6. Burndown charts are line plots that show progress towards completion over time. Each path is based on estimates and chance. The outcome is multiple possible forecast dates, with some being more likely than others. The initial starting point varies based on the random number of stories (20 to 30) and the random story growth rate (1 to 3). This spreadsheet also computes a weekly throughput pace for EVERY week along the way until no work remains.

Some trials have bad luck, getting the highest story count, the highest growth multiplier and the lowest pace. Some trials get good luck: a low starting point, a low growth multiplier and a fast weekly pace. Most get a mixture of both.

Figure 6 — Hypothetical burn down paths for the first 50 trials the Throughput Forecaster computed.

The burndown paths are interesting in understanding how the forecasting process works, but a little hard to interpret. The spreadsheet also produces a histogram of results, and a more human readable table form as shown in Figure 7.

Figure 7 — Result outputs from the Throughput Forecaster spreadsheet as a histogram and as a probability table.

The Throughput Forecaster spreadsheet is a simple Monte Carlo simulation forecasting tool. It uses plain Excel formulas, with no macros, add-ins or programming other than formulas. It will give accurate forecasts using the range estimates provided, and in future chapters we will look at how to replace the range estimates with actual data when it becomes available.

Summary

This chapter has introduced a simple model for forecasting agile delivery of a feature or project using estimated size, growth and pace. It really is a simple model that is no more complex than forecasting how long to travel in a vehicle of your choice.

This chapter has also introduced probabilistic forecasting, first by showing how to do it manually, then how to do it using a spreadsheet. There is a lot more to learn before we can understand the assumptions made in this spreadsheet, and how it actually works, but for now, it’s a good enough takeaway to trust that it is combining three assumptions — size, growth and pace.

Resources: Throughput Forecasting spreadsheet (downloadable here).
