Estimation — how can we estimate with confidence in software development?
‘When will it be done?’ is one of the most common and difficult questions to answer in software development.
Estimating software work is notoriously difficult and inaccurate most of the time, yet project teams spend a significant amount of time on the process.
There are a number of contributing factors. Part of it is that software development is a design activity, and thus hard to plan and estimate. Software is built by people, and outcomes depend on which individuals are involved; individuals are hard to predict and quantify, and humans in general are inherently bad at predictions. The teams that build the software also operate in an ever-changing business and technology landscape.
The cost of estimation is typically high, resulting in teams either postponing estimation until it becomes too late, or not readapting the estimate when they come into possession of new information.
Estimation is not a process that can be automated; no magic formula can be created for it. It requires skill and knowledge, and an understanding of the estimation process is critical.
This paper sets out to introduce aspects that should be considered when putting together an estimate, with the aim of reducing the cost of estimation and increasing its accuracy. A note of caution though: these principles and techniques should not be seen as exhaustive, nor as the only way to estimate. Context, knowledge and skills are important.
I. Guessing, Forecasting, Estimating
Before we move forward, we need to clarify some of the language used in estimation.
Troy Magennis in his book “Forecasting Using Data” highlights a subtle difference between forecasting and estimating. Forecasting is estimating in advance, and estimating is carefully forming an opinion, calculating approximately. Guessing is taking a stab in the dark, having little or no information about a topic.
- Forecast: to predict or calculate (weather, events, etc.), in advance.
- Estimate: to form an approximate idea of (distance, size, cost, etc.); calculate roughly; gauge.
- Guess: to form an uncertain estimate or conclusion (about something), based on insufficient information.
To highlight the subtlety between estimating and forecasting, consider the difference between estimating the time of day hours after last consulting a working watch vs. estimating the likelihood that a plane scheduled for tomorrow will leave on time.
For the current time, even though the information we have is out of date, we can use it to estimate. Because the question is about the present and there is an actual correct answer, this is an estimate rather than a forecast.
For the plane leaving on time, the question is about the future and there is not yet a correct answer, so the answer will be a forecast. To forecast, we can use historical data of previous departure times combined with other pieces of information, such as the weather forecast or other significant events for the day. Forecasting is estimating in advance.
All forecasts are estimates, but not all estimates are forecasts.
Forecasting is carefully answering the question about the future, to a transparent degree of certainty, with as little effort as possible.
II. Avoiding estimation
If estimation is so difficult, why estimate in the first place? If it can be avoided, then it should be avoided.
Kent Beck, the creator of Extreme Programming (XP), said: “Alternative to estimates: do the most important thing until either it ships or it is no longer the most important thing”.
Other alternatives include the one advocated by Gojko Adzic. He proposes using the budget as a design constraint for the delivery team, similar to other non-functional constraints such as scalability or performance; the delivery team is asked to come up with a solution that fits the budget constraint.
However, there are situations where these or other non-estimation techniques cannot be applied, and the need to know ‘How long?’ must be satisfied.
The rest of this paper will focus on those situations where estimation cannot be avoided.
III. One question, in two flavors
For completeness, the ‘How long will it take?’ question usually comes in two flavors:
- having a scope in mind and a start date, our clients ask us ‘How long will it take to complete (the scope)?’
- having a start and an end date in mind, our clients ask us ‘How many items can we build (in this time-frame)?’
The forecasting techniques presented in this material are applicable to both flavors.
IV. Deterministic vs probabilistic
Dan Vacanti in his book ‘When will it be done?’ advocates that a mental shift needs to happen in the software community in the way we produce forecasts: a move away from a deterministic approach towards a probabilistic one.
A deterministic forecast assumes that there is only one possible outcome to a problem; such forecasts therefore provide a single answer with 100% certainty implied (e.g. ‘we will complete on 1st of December 2017’).
A probabilistic forecast on the other hand accepts that there are multiple possible outcomes, and such forecasts produce multiple answers, each accompanied by a confidence level (e.g. ‘we have a 75% confidence level of completing by 1st of March, and an 85% confidence level of completing by 17th of March’).
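To make the contrast concrete, here is a minimal sketch of how a probabilistic forecast is reported; the normally distributed durations are purely illustrative stand-ins for the output of a real forecasting model:

```python
import random

rng = random.Random(7)

# Illustrative stand-in for a real model's output: 10,000 simulated
# completion durations, in days.
durations = sorted(rng.gauss(90, 15) for _ in range(10_000))

# A deterministic forecast would report one date with implied certainty;
# a probabilistic forecast reports several answers, each with a confidence level.
for confidence in (50, 75, 85, 95):
    days = durations[int(len(durations) * confidence / 100) - 1]
    print(f"{confidence}% confidence of completing within {days:.0f} days")
```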
Dan makes a great analogy between weather and software forecasting.
Forecasting in software development is not too dissimilar to forecasting the path of a hurricane. Same as the storms, software development can be influenced by many factors outside our control and it is full of uncertainties. A probabilistic approach to software forecasting is more suitable.
V. Forecasting using models
The Wright brothers became more successful than others in building airplanes because they built models and tested them in wind tunnels, overcoming uncertainties before ever leaving the ground.
Similarly, to forecast in software we can build models of our situations, and use those models to simulate the uncertainties we face. It’s like putting our plans through a wind tunnel.
Before we look in more detail at the elements of a forecasting model, it is worth considering what George Box, one of the great statistical minds of the 20th century, said: “all models are wrong, some models are useful”.
We should not take any model as gospel; however, even very approximate models can help us think about a problem, and they tell us more than gut feel alone.
A model should be designed around things we don’t understand, rather than things we do.
“A successful model tells you things you didn’t tell it to tell you” — Jerry P. Brashear, Washington, D.C., consultant.
VI. A software forecasting model
The building blocks of a Forecasting model are:
1. A start date
2. A delivery team
3. Work that we want to do
4. A working method (necessary to complete the work)
5. Data (or lack of) about size of the work and speed of delivery
1. The start date
One of the most common errors in forecasting is the use of a wrong start date. Before we use a start date we need to make sure that the conditions to start the work effectively are met, such as the team being in place and ready, with all the skills and tools required to do the work.
If these conditions are not met, then it would be wise to publish the forecast as a duration only. Attention should be paid, though, to whether the likely duration could overlap with typical holiday periods.
2. The team
When we look at the team selected for the work, we should consider whether the team has the right skills to do the work, and whether there is a good spread between those who can teach and create, those who can do and maintain, and the novices and learners.
If the skills are not present, or if the team is not complete, then we should take this into account in the ramp up phase of our forecast (see more under the ‘Delivery Pace, S-curve’ section).
What about the pace of delivery of one team over another? Not all teams are the same, and if we know that one team is more effective than another, then we should take this into consideration while building our model.
3. The work
We are estimating because we want to know when we would complete some work. The work can vary, from building something brand new, to enhancing existing solutions, to supporting existing ones (or a combination of the above).
The nature of software is that at times we don’t even know what the work will entail. Even when we do know what we want, we might not know what the final solution will look like, or how we might build it.
Once we discover the solution, things get a bit easier, but we are faced with other challenges when we start building it, such as the complex interactions between people, the various external constraints that are imposed on us (time and/or money, resourcing, processes), the unexpected events that we need to deal with, the dependencies that we need from others and others need from us. We make new discoveries that might invalidate some of the early findings and the original solution might not be fit for purpose.
Typically, work is referred to as scope, and at times these terms are used interchangeably. The small subtlety is that scope might imply knowing what we want. As described earlier, at times this might not be the case, and work constitutes discovering the scope as well. When we build our model we need to take this into account. For the purposes of this paper, we will use the term work to encompass the scope plus the discoveries needed to arrive at it.
4. The working method
For the work that we want to perform, we need to select a working method, a way of working that helps us complete the work. This method varies based on the nature of the work, and we might find ourselves constantly adapting the method to suit the type of work we’re dealing with at a given time.
To help navigate this landscape, we can consider:
- slicing the work into different phases: Discovery, Alpha, Beta, Live
- choosing a methodology that is best suited for these phases, remembering that in certain situations more than one method would be applicable. These methods can range from well-structured (such as Kanban or Scrum) to less structured (such as time-boxed spikes or experiments). For instance:
- for Discovery, we can use experiment based techniques to surface emerging solutions. These could include activities such as user-research, technical spikes.
- for Alpha, the use of prototyping and further experimentation can be considered
- for Beta, methods such as Kanban or Scrum could be used to implement the findings from Discovery and Alpha
There are two types of slicing, one supporting the other.
One type is concerned with slicing the work into features. Scope tends to start off in a fuzzy state, and we need to work to transform it into sliceable pieces.
These features should be created as manageable chunks, taking in consideration the following:
- can each slice be allocated to a team?
- can development of those slices happen in parallel?
We refer to this type of slicing as decomposition strategy of the scope.
These features tend to group functionality in Features/Epics, which get further broken down into User stories and Tasks.
Features/Epics, User stories and Tasks are work items. These items have different granularities, Features/Epics being coarser-grained than User stories. These work items need to feed into our model.
The other type is concerned with slicing the work per phase. This is a sequencing concern driven by release scheduling (answering the question ‘what to release in what sequence?’).
Careful consideration is needed when slicing per phase. Consider whether multiple releases will run in parallel, each with its own phase. Which skills and team members will cover phases such as Discovery, and which will cover Build? Is one release run by one team, or do multiple teams work on the same release? Visualizing release cycles and the phases per cycle helps build the picture for your model.
When choosing a working methodology, we should favor those processes that are measurable. A measurable process gives us essential data for our model.
What are we measuring? We are taking measurements on work items, measuring how long they take to complete (time) and their rate of completion (speed).
For instance, in a Kanban method we can easily measure three simple metrics: work in progress, cycle time (which answers the ‘how long do items take to complete?’ question) and throughput (which answers the ‘what is the rate of completion?’ question). We can apply these measurements to all work items.
How to measure these metrics is outside the scope of this paper, and it is well described in Dan Vacanti’s Actionable Agile Metrics for Predictability book.
These metrics, especially the delivery pace (throughput), are important elements of our model.
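As a sketch of how cycle time and throughput might be derived from raw item data (the dates below are invented for illustration):

```python
from datetime import date

# Invented work items with the dates they entered and left the delivery system.
items = [
    {"started": date(2023, 1, 2),  "finished": date(2023, 1, 9)},
    {"started": date(2023, 1, 3),  "finished": date(2023, 1, 12)},
    {"started": date(2023, 1, 9),  "finished": date(2023, 1, 13)},
    {"started": date(2023, 1, 10), "finished": date(2023, 1, 20)},
]

# Cycle time: how long each item took to complete.
cycle_times = [(i["finished"] - i["started"]).days for i in items]

# Throughput: the rate of completion, here expressed as items per week.
elapsed_days = (max(i["finished"] for i in items)
                - min(i["started"] for i in items)).days
throughput = len(items) / (elapsed_days / 7)

print(f"cycle times (days): {cycle_times}")
print(f"throughput: {throughput:.1f} items/week")
```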
5. The data
The final element of the model is the data, a numerical representation of concepts introduced previously:
- size of the work — expressed as aggregation of work items. For example, this can be the total size of a backlog.
- delivery pace — the rate at which teams delivers the work items. For example, in a Kanban system this is the throughput.
Some important aspects of the data need to be clarified. Questions such as “How can we obtain the data? Do we always have data? Is the data fixed? Can we trust the data?” need answers before proceeding.
We can obtain the data by working continuously on:
- understanding the size of the work — finding out the number of work items required to complete the work
- measuring our rate of delivery — measuring our delivery process
We should emphasize the continuous nature of data gathering. As our understanding of what and how we build expands, the size of the work increases or decreases. Similarly, our rate of delivery can change over time.
The way to deal with this situation is to express the data points as assumptions (e.g. ‘we assume that we need to build between 20–35 user stories’ and ‘we assume that our rate of delivery is between 10–12 stories per week’). These assumptions should be clearly stated and well communicated.
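A quick sketch of how such ranged assumptions might feed a rough duration estimate; the ranges are the illustrative ones above, and sampling them uniformly is an assumption, not a prescription:

```python
import random

rng = random.Random(42)

# Assumptions expressed as ranges rather than single numbers.
backlog_size = (20, 35)       # "we assume we need to build 20-35 user stories"
weekly_rate = (10, 12)        # "we assume we deliver 10-12 stories per week"

# Crude estimate: repeatedly sample each assumption and divide.
samples = [rng.uniform(*backlog_size) / rng.uniform(*weekly_rate)
           for _ in range(10_000)]
print(f"likely duration: {min(samples):.1f} to {max(samples):.1f} weeks")
```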
At times we have very limited data, or none at all. For instance, it is quite common at the beginning of new deliveries to have no data about the rate of delivery. For these situations, we proceed much as we do when we have data:
- we capture our assumptions on delivery rate; these assumptions might be mere guesses, or views of experts based on similar experiences
- we start the delivery, start measuring our process, and as soon as we have enough data points we replace our initial assumptions with measured values. Both Dan Vacanti and Troy Magennis suggest that only between 7 and 11 data points are needed for this transition.
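A minimal sketch of that transition, assuming a simple rule that switches to measured values once seven observations exist (all numbers illustrative):

```python
# Assumed throughput range used before any real measurements exist.
assumed_rate = (10, 12)                  # stories per week, an expert guess

# Weekly throughput observed once delivery starts.
measured = [9, 11, 10, 12, 8, 11, 10]

# Switch from assumption to measurement once we have enough data points
# (Vacanti and Magennis suggest roughly 7-11 are sufficient).
if len(measured) >= 7:
    rate_range = (min(measured), max(measured))
else:
    rate_range = assumed_rate

print(f"forecasting with {rate_range[0]}-{rate_range[1]} stories/week")
```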
What about risks? Our assumptions could carry significant risks, on both the size of the work, as well as on delivery rates. If those risks materialize then our forecasts could be significantly impacted.
To deal with this situation, we should work on turning these risks into additional data-points.
For instance, we could express our risk as “we have a 50% likelihood that our scope will increase between 12–20 stories if the performance results breach our page load NFRs”.
If risks are turned into data, and those are part of our model, then the negative impact on the forecast accuracy is reduced.
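Using the page-load risk above, the extra data points might be modelled like this (the sampling approach and distributions are illustrative assumptions):

```python
import random

rng = random.Random(1)

def sampled_scope() -> int:
    """Sample the backlog size with the risk folded in as extra data points."""
    stories = rng.randint(20, 35)        # assumed base scope
    # Risk: 50% likelihood of 12-20 extra stories if the page-load NFR is breached.
    if rng.random() < 0.5:
        stories += rng.randint(12, 20)
    return stories

sizes = [sampled_scope() for _ in range(10_000)]
print(f"scope samples range from {min(sizes)} to {max(sizes)} stories")
```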
Delivery Pace S-curve
A special mention needs to be made of the delivery rate’s characteristics.
The delivery rate throughout the life-cycle of a feature, release or project can vary, at times significantly.
For instance, it is quite common for projects that the plotted delivery rate over time takes the shape of an S-Curve, showing a slower rate at the beginning and end of the project.
The S-curve is made up usually from multiple phases:
- a ramp-up (or starting) pace as the team is forming and learning, or transitioning to new work
- a stride pace, which is a sustainable delivery rate once we reach steady-state
- a ramp-down pace as the team is in the final delivery phases
Observed over a long period of time, even if a team is maintained and doesn’t need to ramp up, the S-curve can manifest itself as the team transitions from one feature to another, a “rollercoaster” of delivery rates. This can have a significant impact on longer-term forecasting.
The S-curve needs to be taken into consideration while building our model. A “rollercoaster” effect can have a big negative impact on delivery and is best avoided. If this cannot be achieved, then it needs to be modelled.
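One way to model the S-curve is a piecewise delivery rate; the phase lengths and stride pace below are illustrative parameters, not prescriptions:

```python
def weekly_throughput(week: int, stride: float = 10.0, ramp_up: int = 4,
                      ramp_down_start: int = 16, total: int = 20) -> float:
    """Piecewise-linear sketch of an S-curve delivery rate."""
    if week < ramp_up:                   # ramp-up: team forming and learning
        return stride * (week + 1) / ramp_up
    if week >= ramp_down_start:          # ramp-down: final delivery phases
        return stride * (total - week) / (total - ramp_down_start)
    return stride                        # stride: sustainable steady-state pace

rates = [weekly_throughput(w) for w in range(20)]
print(rates)
```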
The role of the system predictability
The quality of our data determines the accuracy of our forecasts.
The more predictable our system is, the more we can rely on our data, the more accurate our forecasts will be.
Therefore, teams should pay attention to building predictable working systems. How to build such systems is outside the scope of this paper, but teams should not forget about this important aspect. Dan Vacanti in his Actionable Agile Metrics for Predictability book describes one way of building predictable systems using a Kanban system.
VII. Forecasting is a continuous process
Forecasting is a continuous process. As soon as we have new information about our model we should re-forecast.
To be able to rapidly re-forecast and allow the team to perform a number of ‘what-if’ scenarios, we want to keep the cost of forecasting low.
“The goal of forecasting is to know earlier rather than later if we’re in trouble” — Troy Magennis.
If the cost of reforecasting is high, it likely won’t get done. Make both short- and long-term forecasts; shorter forecasts will be more accurate than longer ones.
VIII. Putting the model through wind-tunnel
Now what? We have the elements of the model; now we need to put them together, build the models and put them through the ‘wind tunnel’, as the Wright brothers did with their model aeroplanes.
An important tool in the wind-tunnel arsenal is the Monte Carlo simulation, which we need to introduce before moving forward.
A brief introduction into Monte Carlo simulation
Sam Savage in his book “The Flaw of Averages” defines the Monte Carlo simulation eloquently.
The last thing we do before climbing a ladder to paint the side of our house is to give it a good shake. By bombarding it with random physical forces we simulate how stable it will be when we climb on it.
A Monte Carlo simulation is a computational technique similar to shaking the ladder to test the stability of uncertain plans. The technique bombards the model with thousands of random inputs, while keeping track of the outputs.
Monte Carlo simulation is a statistical sampling technique, and it is made up of four steps:
- define a domain of possible inputs
- generate inputs randomly from the domain
- perform a computation
- aggregate results
This can be translated to the model we’ve described in this paper as:
1. domain of possible inputs — this is the delivery rate (throughput)
2. start a simulation at the selected start date; set the end date equal to the start date
3. generate inputs randomly from domain — select randomly a delivery rate
4. perform a computation:
- use the randomly picked delivery rate and deduct it from the number of work items that we need to complete;
- increase the end date of the simulation by 1;
- check if the remaining number of items after the deduction is a zero or negative value — if it is, then stop the simulation, otherwise continue with step 3
5. once step 4 has been completed, we have completed one simulation and have one delivery date
6. we continue creating thousands of simulations by repeating steps 2–5
7. once all the simulations are done, we create an aggregation by grouping the simulation end dates together and calculating the % of simulations for each date
The aggregated results look like a histogram.
Alternatively, we can present the output of a Monte Carlo simulation in a table like format.
The % of simulation completions act as our confidence levels.
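The steps above can be sketched end to end; the throughput history and backlog size are illustrative, and sampling weekly throughput with replacement is the simplest possible input domain:

```python
import random
from collections import Counter

rng = random.Random(7)

# Step 1: domain of possible inputs - observed weekly throughput samples.
throughput_history = [8, 10, 12, 9, 11, 10, 13]
backlog = 60                 # work items left to complete
simulations = 10_000

weeks_taken = []
for _ in range(simulations):
    remaining, weeks = backlog, 0
    while remaining > 0:
        # Steps 3-4: sample a delivery rate at random, deduct it, advance time.
        remaining -= rng.choice(throughput_history)
        weeks += 1
    weeks_taken.append(weeks)           # step 5: one simulated completion

# Step 7: aggregate - cumulative % of simulations finishing by each week,
# which acts as the confidence level for that date.
counts = Counter(weeks_taken)
cumulative = 0.0
for week in sorted(counts):
    cumulative += counts[week] / simulations * 100
    print(f"{cumulative:5.1f}% confidence of completing within {week} weeks")
```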
In the spirit of transparency and using a probabilistic approach, when we present back our results we should consider:
1. present a range of dates, such as: “based on the assumptions behind the forecast (<list the assumptions here>) we have:
- a 75% confidence level of completing by 23rd of October, 2017
- an 85% confidence level of completing by 26th of October, 2017”
2. surface the impact of risks: “the risks (<list the risks here>) increase the duration by 25%. Let’s look at them, and see how we can eliminate them.”
We work with the client to choose acceptable confidence levels for the project.
What do experts say?
Avoiding estimation is best. If estimation cannot be avoided, then building models that can be forecast using Monte Carlo simulations is a good starting point.
What about expert estimation?
Reach out to experts. Ask for an expert estimate. Carefully blend expert estimates with forecasts. Diversity of thinking mitigates the risk of missing a big item through cognitive bias or unconscious incompetence.
Finally, putting it all together
To put it all together, build a series of models, aiming to learn new information from them.
Define what the work is, and slice the work per phase. Identify what method of estimation can be applied for each phase. Sometimes the only thing we can do is time-box a phase.
Define your decomposition strategy. At times, this is done in parallel with the phase slicing, sometimes not.
Form your teams, paying attention to start dates, availability and skills.
Allocate work to teams. Forecast where possible using Monte Carlo simulation. Offer ranges of dates using different confidence levels. Show the impact of the risk on the date. Blend in carefully expert estimates.
Don’t forget about the S-curve, or the rollercoaster.
Visualize the plan. Ask other stakeholders to look at it. Is the sequencing right, have dependencies been identified?
Special thanks to Martin Aspeli (http://martinaspeli.net/) for the feedback and editing of this article.
- Planning Fallacy, https://en.wikipedia.org/wiki/Planning_fallacy
- Definitions from Collins Dictionary
- Troy Magennis, “Forecasting Using Data”
- David Evans and Gojko Adzic, “Fifty Quick Ideas to Improve Your User Stories”
- Dan Vacanti, “When Will It Be Done?”
- Quote also attributed to W. Edwards Deming, ‘father of quality management’
- Sam Savage, “The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty”