Forecasting, strategy

troy.magennis
Forecasting using data
20 min read · Jul 7, 2017

Chapter 2

This chapter looks at forecasting goals and strategies. It defines the differences between guesses, estimates and forecasts and explains why you should care. It also pays tribute to Carl Sagan’s baloney detection tools by showing how to apply them to our world of software forecasting (using data).

Goals of this chapter:

  • Learn about “assumption based forecasting” as a strategy.
  • Discuss forecasting versus estimation versus guessing.
  • Discuss how to test forecasting models for reliability and predictive power.
  • Learn how to know when the information you are given is likely baloney!

Assumption based forecasting

If forecasting is, as I believe, the art of carefully answering questions that help make more informed decisions, then forecasting is a complete waste of time if it isn’t directly answering a well-considered question that leads to a better decision.

Forecasts aren’t just numerical or date values. A forecast is a numerical value plus the embedded assumptions that allow that value to be reliable. We have to make assumptions when we model the future, and these may or may not come true in the real world. For example, we assume a team will be in place by an assumed start date; often it isn’t. We assume that historical data and team performance will remain unchanged; they won’t. Littered throughout our analysis are stated and often unstated assumptions. It’s these assumptions we are actually forecasting. If they all come true, so will the date, cost or numerical value we predicted.

Forecast the assumptions to hit values, dates or dollars.

The value, cost or date will happen if our modeled assumptions ALL come true, and we have modeled the right assumptions.

Whenever possible I reverse the typical approach used to forecast software projects. I fix the delivery date (or a set of them), and work backwards to what is needed to hit that date. After presenting forecasts as dates to clients and my own management teams, it was obvious there was always an expected date in people’s heads, so I just started asking for it in advance. When I was asked to forecast how long, my question was “When do YOU need it?” This avoids the uncomfortable moment when faces in the room contort in either elation or sadness depending on how well our analysis matched their assumed and desired expectation.

Analyzing what is needed to hit a given date and presenting it in a positive way always receives an elated response because it’s an “it can be done” statement. Often the resulting requirements are far too expensive or impractical, but all of the important puzzle pieces are on the table to discuss and trade. Those puzzle pieces are a list of assumptions: the necessary conditions for THAT desirable forecast date to be honestly viable. It’s hard to say “just work weekends” when you can also see that you need another test environment, a designer and access to specialist equipment. It’s less about how much they like or trust you personally, or a negotiation about your work ethic; it’s about the entire system needed for delivery.

Get the (often pre-conceived) date people expect and forecast the assumptions to hit that date.

Stakeholders have a date in their head. Pry it out. Present what needs to come true, or be managed, to hit that date. Don’t negotiate to do more in less time; negotiate what’s needed to deliver more in less time (or to do less).

If I can’t pry a date from the stakeholders, I present a set of dates. The dates I pick are somewhat arbitrary, landing on convenient week, month or quarter boundaries. I might do two or three of these. The assumptions are often the same, but the estimates and quantities differ. The number of people, costs and risk values might be higher or lower depending on how long the team has to satisfy an assumption. The discussion then becomes: if all of these are POSSIBLE, how much does it make economic sense to spend to deliver by one date versus a later one? Giving a single delivery date forecast needlessly anchors that date and shuts down discussions about cost of delay or alternative investment options.

Forecasting is a discussion, not a negotiation.

Avoid trying to force assumptions into estimates that hit a date you didn’t predict. Turn the conversation into a discussion about what investment level is needed to succeed.

Every piece of work has a different cost of delay to the customer and market, and a different lifetime value to the company. Some features inhibit competitors and pay off tenfold over many years; others, not so much. If the delay cost is low, then later dates and lower investment may well be the optimal decision. If there is a significant cost of delay, perhaps greater investment is the better decision. This is a superior conversation to being asked “can you do a lot more work in a lot less time or for less money?”, which just sets the feature and project up to fail on quality in order to hit the tight timeline. Nobody wants that outcome.

If nobody will give you a “desired date,” make one up. Discuss the options of build cost versus cost of delay.

Model the assumptions required to hit a series of dates. Negotiate what the right investment level is for the stakeholders given their sense of cost of delay (or calculated cost of delay).
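
To make that trade-off concrete, here is a minimal sketch in Python with entirely made-up numbers (the option names, build costs and weekly cost of delay are all hypothetical). It compares the total economic cost of an aggressive date that needs extra investment against a later, cheaper date that accrues more cost of delay.

```python
# Minimal sketch (illustrative numbers only): compare the total economic cost
# of an earlier date that needs extra investment with a later, cheaper date
# that accrues more cost of delay.

WEEKLY_COST_OF_DELAY = 25_000  # assumed value lost per week the feature is not live

options = {
    "End of Q3 (aggressive)": {"build_cost": 400_000, "weeks_until_live": 12},
    "End of Q4 (relaxed)":    {"build_cost": 250_000, "weeks_until_live": 24},
}

for name, option in options.items():
    delay_cost = option["weeks_until_live"] * WEEKLY_COST_OF_DELAY
    total = option["build_cost"] + delay_cost
    print(f"{name}: build {option['build_cost']:,} + delay {delay_cost:,} = {total:,}")
```

With these numbers the aggressive date is cheaper overall; drop the weekly cost of delay to a few thousand dollars and the relaxed date wins. Either way, the discussion is about investment versus cost of delay, not about working harder.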

The other advantage of assumption based forecasting is that it offers an ongoing report card on how well the model and forecast are tracking. The moment an assumption is missed or project circumstances change, the forecast should no longer be expected to be reliable. Reporting on progress towards making and keeping assumptions is what keeps the forecast within its originally intended calendar dates and certainty. The assumptions become the status report, meaning nobody has to hope or lie about achieving a faster pace later in the project to compensate for being behind schedule.

Use assumptions as the forecast scorecard and status report.

Track met and at risk assumptions. Any assumption that fails means the forecast is out of date and likely invalid.

A third benefit of assumption based forecasts is that they can be reviewed by others. Making the assumptions public, and taking seriously the feedback about whether those numbers are timid or aggressive, helps you quickly see where things will go wrong via crowd consensus. Forecasts fail because we determine too late that an assumption isn’t valid against actual data. The later we detect a problem, the fewer options remain to solve it. Knowing earlier opens up alternative decision options and requires less corrective effort.

Knowing the reasons why people think an assumption is either too aggressive or too timid helps increase confidence that the actual delivery system is well understood. Different people know different parts of the process better than others, and surfacing the different opinions helps everyone learn about their process and improvement opportunities. People often find it easy to critique someone else’s opinion, even if they are less clear about their own! As annoying as this is, getting assumptions in front of a wider team helps find poor assumptions earlier.

Another myth I hear is that a forecast has to be exact. It doesn’t. Just like being in the Serengeti with a camera crew shooting a wildlife film about how lions vigorously protect their offspring, you don’t have to be the fastest, you just can’t be the slowest. Any proposed forecasting method just has to be better than what you do now, or at least less expensive with a similar result. Exactness is secondary to understanding why something is possible or impossible; better or worse; cheaper or more expensive; a great idea or just mediocre. Don’t hold any new forecasting technique to a higher standard than what’s currently done.

Any new forecasting method just has to be better than the current method. Avoid the “it’s not perfect” trap.

Probing for the first failing assumption — tipping points

It’s in everyone’s interest to get an answer quickly. The first goal should be to determine if a project or feature forecast is ruled out simply because it violates one unbreakable assumption. If you can plug in a minimum or maximum value for any assumption and the feature or project becomes unviable, then there is no need for further analysis. For example, if the minimum estimates for the task time of a feature or project are summed and the total is already too expensive in time or cost, then your decision is made: “No-go.” Don’t proceed as is; revisit scope expectations and remove something.

You are probing for the first assumption that is impossible to meet. If all assumptions turn out possible, then it’s time to tighten the estimates on the most sensitive assumptions. The most sensitive assumptions are those that impact the resulting costs or time the most.

A technique I employ to validate the assumptions is to look for tipping points. For each assumption, get the team to determine what value would flip the decision from “OK” to “Unacceptable.” Throw out an estimate and, if it still sounds OK to the group, double it, and keep doing that until you get a “No.”

Knowing the threshold values that flip the decision allows expensive analysis time to be allocated in the right spots. The riskiest assumptions are the ones where the current estimates fall close to those thresholds. Knowing the critical assumptions helps put a value on the time spent discussing those estimates with the team, and on how eagerly you need to obtain corroborating evidence.

Capture any tipping-point values where an assumption goes from “OK” to “Unacceptable.”

Spend time on assumption estimates that fall close to these tipping-point thresholds. If an assumption crosses one of these points, no further analysis will change the decision.
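
Here is a small sketch in Python of that doubling probe (the assumption, seed value and acceptance rule below are hypothetical; in a real session the acceptance answer comes from the group, not a function).

```python
# Sketch of the tipping-point probe: keep doubling an assumption's value until
# the answer flips from "OK" to "No". The last acceptable value and the first
# unacceptable value bracket the tipping point.

def find_tipping_point(seed, is_acceptable, max_doublings=10):
    """Return (last_ok, first_not_ok) for an assumption value.

    `is_acceptable` stands in for asking the team: "still OK at this value?"
    """
    value = seed
    for _ in range(max_doublings):
        doubled = value * 2
        if not is_acceptable(doubled):
            return value, doubled
        value = doubled
    return value, None  # the decision never flipped within the probe range


# Hypothetical example: how many escaped defects per story before the date slips?
last_ok, first_no = find_tipping_point(seed=1, is_acceptable=lambda defects: defects <= 6)
print(f"Decision flips somewhere between {last_ok} and {first_no} defects per story")
```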

Assumption based forecasting recap

The process of assumption based forecasting comes down to six steps –

1. Ask stakeholders when they “expect” delivery. If they don’t know, make up your own targets.

2. Determine what is necessary to hit that date. These are assumptions. Get rigorous feedback on any missing assumptions.

3. Determine what value makes each assumption untenable. These are tipping-points that make a forecast viable or comical.

4. Get rough estimates for each assumption. You can stop the moment one assumption fails by falling past the tipping-point. This forecast is a no-go.

5. Spend time getting data and doing more analysis (firming up the estimates) for assumption estimates that are closest to their tipping-point values. You can stop the moment one assumption fails.

6. Always share the assumptions when giving the forecast. Use the assumptions as the status report of the forecast validity going forward.
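
A minimal sketch in Python of steps 3 to 5 above, using entirely hypothetical assumptions, rough estimates and tipping-point values: stop at the first assumption that is already past its tipping point, otherwise spend analysis time on the assumptions with the least headroom.

```python
# Sketch of steps 3-5: each assumption has a rough estimate and a tipping point.
# Stop at the first assumption whose estimate is past its tipping point (no-go);
# otherwise rank the survivors by relative headroom so analysis effort goes
# where being wrong matters most. All names and numbers are hypothetical.

assumptions = [
    # (name, rough_estimate, tipping_point) -- estimate at or past the tipping point = no-go
    ("Stories in backlog",              28,  40),
    ("Escaped defects per story",        2,   6),
    ("Weeks to hire the full team",      6,   8),
    ("Staging environment cost ($k)",   90, 100),
]

for name, estimate, tipping_point in assumptions:
    if estimate >= tipping_point:
        print(f"NO-GO: '{name}' estimate {estimate} is past its tipping point of {tipping_point}")
        break
else:
    # No assumption failed: firm up estimates where the headroom is smallest.
    by_headroom = sorted(assumptions, key=lambda a: (a[2] - a[1]) / a[2])
    print("Viable so far. Firm up estimates in this order:")
    for name, estimate, tipping_point in by_headroom:
        print(f"  {name}: estimate {estimate}, tipping point {tipping_point}")
```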

Sometimes a guess is all that is needed. Sometimes detailed analysis involving gathering more data is warranted. Assumption based forecasting helps you understand what level of scrutiny a numerical value needs. Should we guess, should we roughly estimate, or should we spend more time modeling and forecasting a value?

Forecasting versus Estimation versus Guessing

There is a subtle difference between forecasting and estimating. Forecasting is estimating in advance, and estimating is carefully forming an opinion and calculating approximately. All forecasts are estimates, but not all estimates are forecasts (some are imprecisely measured current values: an estimate of something that could be measured more accurately).

Forecast: To estimate or calculate in advance; predict or seek to predict.[1]

Estimate: to form an opinion or judgment about; to judge or determine generally but carefully (size, value, cost, requirements, etc.); to calculate approximately[2]

A lot of estimates in the software world would be better classified as guesses. We often don’t have enough information to satisfy the “carefully” test that the definition of estimate mandates.

If you are guessing, then just stop. A guess presented as fact makes it appear that the stated or written value carries more information than it does. If you are guessing, say so!

Guess: to form a judgment or estimate of (something) without actual knowledge or enough facts for certainty; conjecture; surmise.[3]

You are reading this book, so you obviously have an opinion about forecasting using data versus guessing or expert estimates alone. I certainly do, but it may not be the one you think. I put a lot of weight on expert estimates, but I don’t rely solely on them. I forecast using data and then blend well-considered estimates with those forecasts. Forecasting works best when we explore a wider range of possible outcomes, drawing on diverse experience and skills. Diversity of thinking helps neutralize the risk of missing a big risk item through the unconscious incompetence or cognitive bias of any one individual or group.

I’m not saying that experts should estimate everything. A lot of the pain around estimation in the software world is that it is done to the same level of detail for every item. If a piece of work is important and needs to be completed no matter what, why estimate it? Just get on with it. I only spend time forecasting and estimating as a last resort.

When the answer to a question might change the path forward in an economically non-reversible way, then it’s worth estimating and forecasting. Stop estimating and forecasting the moment that question is answered. How much effort do I put into estimating? For the estimates that answer an assumption closest to its tipping point, a lot. For others, not so much. Drive estimation effort by how much being wrong on that value would impact the final outcome. Estimate the big ones, guess the rest.

Definition of forecasting that works best for me:

Forecasting is carefully answering a question about the future, to a transparent degree of certainty, with as little effort as possible.

Time for a quick quiz: are these methods guessing, estimating or forecasting? Think about how much information the participant has on hand. Think about what you would need to move from guess to estimate, and from estimate to forecast.

1. You have been locked alone in an empty, dark, windowless room for at least one week with no watch or timekeeping device. What time is it?

2. You had a smart-watch on when you checked the time earlier today. Its battery is now dead. What time is it?

3. You have a regular correctly operating watch that gave the correct time recently. What time is it?

4. What time will tomorrow’s 9am train leave from a station in Switzerland?

Question 1 demonstrates a guess. If you survived without water and food, and the question was still relevant to you, all you can do is take a stab in the dark (pun intended) about the time of day.

Question 2 demonstrates that although you had recent information, it’s out of date. But given that information, you can estimate what the current time is. Because the question is about the present, and there is an actual correct answer, this is an estimate rather than a forecast.

Question 3 implies you have an operating timepiece, and it was correct recently. However, like any time measure, it’s approximate, so this is still an estimate, but a much more confident one.

Question 4 is easy to answer. Given my experience with Swiss trains, this will be a forecast because it’s in the future, and my forecast is 9am and not a minute later! Not all systems are as reliable as the Swiss railway network. How can you tell whether you should expect your model to be a reliable predictor of the future?

How do you know a forecast model is reliable?

All estimates and forecasts are wrong. We are trying to predict the future, which is impossible to do reliably every time. Forecasting is more accurately defined as forming an opinion about the future and using that opinion to make better educated guesses, I mean decisions.

A forecast being “right” isn’t only about it turning out as predicted; it’s about being right the right proportion of the time. To be reliable, a forecast’s stated probability needs to match how often the eventual outcome actually occurs.
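
As a tiny sketch of what that means, assume we kept a record of past forecasts that were all given at 85% confidence and whether each one came true (the history below is invented):

```python
# If forecasts stated at 85% confidence come true roughly 85% of the time,
# the forecasting model is well calibrated. Hypothetical history:
past_85pct_forecasts_met = [True, True, False, True, True, True, False, True, True, True]

observed_hit_rate = sum(past_85pct_forecasts_met) / len(past_85pct_forecasts_met)
print(f"Stated 85%, observed {observed_hit_rate:.0%}")  # close => reliable; far apart => not
```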

There are two main techniques I use to determine if my models are reliable. Back-testing before work starts, and Assumption Testing once work has started.

Back-testing (before work has started)

The first test to determine whether your model is even remotely working is to forecast something from the past. Hypothetically go back in time and see if your model would have predicted where you eventually ended up. This is called back-testing, and it should always be done on a forecasting model before you tell others the result.

A quick way of back-testing our forecasting models is to go back three months. Observe how many features there were in the “let’s do these” list. Using your model, forecast how many of those features would be completed by today. Compare that with how many features actually got done. How closely did they match? At what probability percentile did your model place the actual outcome?
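
Here is a hedged sketch of that back-test in Python, assuming a simple Monte Carlo model that resamples historical weekly throughput; the throughput samples, window length and actual count are all made up, and your real model may be quite different.

```python
import random

# Back-test sketch: pretend it is three months ago, forecast how many features
# would be finished by today using only the throughput data available back then,
# then compare against what actually happened.

weekly_throughput_history = [2, 3, 1, 4, 2, 3, 3, 2]  # features per week, before the window
weeks_in_window = 13                                   # roughly three months
actually_completed = 31                                # what really got done by today

def simulate_completed(throughput_samples, weeks, trials=10_000):
    """Monte Carlo: resample weekly throughput to build a distribution of totals."""
    totals = [sum(random.choice(throughput_samples) for _ in range(weeks))
              for _ in range(trials)]
    return sorted(totals)

totals = simulate_completed(weekly_throughput_history, weeks_in_window)
low, high = totals[int(0.15 * len(totals))], totals[int(0.85 * len(totals))]
percentile_of_actual = sum(t <= actually_completed for t in totals) / len(totals)

print(f"Model predicted {low} to {high} features (15th to 85th percentile)")
print(f"Actual result of {actually_completed} sits at the {percentile_of_actual:.0%} percentile of the model")
```

If the actual result lands at an extreme percentile of the model, the model is probably missing something; landing somewhere in the middle, back-test after back-test, is the behavior you want to see.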

I don’t want to give the impression that if you back-test successfully, then the model “works.” Just because your model shows promise doesn’t mean that the information it used is still current. Imagine using weather observations from 1900 to forecast pack ice in the Northern Passage in 2016. You would predict much more ice than there was. This problem of the forecast model merely reproducing the past is called “over-fitting.” Over-fitting means the data we train our forecasting models on is incomplete, and your future mileage may vary. Systems change, and the problems being solved are different. A good back-test is a good start, and the question about the model then becomes, “will it work in the new context, the next project?” But it does prove there are no simple typing errors. In a sense, it is the spelling and grammar checker of models.

Assumption-testing — assumptions coming true (after work has started)

The more assumptions we get right in our models, the better the forecast outcome. This is why the first measure of a correct model is confirmation of assumptions by similar actual outcomes. For example, the team was in place when we said it would be, or the defect rate was within the range that we said it would be. Seeing these major assumptions coming true as a project or feature progresses is a solid indicator the eventual forecast is moving into likely territory.

To test how reliable a forecast is during a project, each assumption needs to have a reliable test. Tests based on actual observed measurements are great: you confirm that actual measures fall within the low and high bounds of the assumed estimate. If measured values fall outside that range, the forecast is in jeopardy.

Another good set of tests covers binary assumptions: they either come true when expected, or they occur early or late. Most often, a binary assumption being met late means the outcome will more than likely be late as well.

Assumptions that are hard to test don’t add much value. When documenting assumptions, write down the tests and how often they will be reported. This makes it transparent to everyone the first moment they will find out that a forecast is falling short (the actual outcome is going to take longer than expected). Aim for a good blend of binary and range-estimate assumptions. The better the assumptions and the clearer the tests, the more likely the forecast is reliable.

Every feature and project will have different assumptions and tests. A minimum set of assumptions should cover these aspects –

1. A measure of the ability to start delivery (team, equipment, space).

2. A measure of initial scope and size.

3. A measure of expected rework and scope growth.

4. A measure of expected progress of scope delivery over time.

5. A measure of acceptable quality to be able to deliver to customers.

6. A measure of the ability to deliver to customers (environments, process, logistics).

Most assumptions will fall into one of these categories. A warning indicator for me is a forecast that hasn’t covered one of these aspects. Some assumptions are the same for every project, for example the expected quality requirements. Still call these out specifically to the team so that everyone understands expectations and can give reasons why that assumption may be too tight or too weak for this feature or project.

Coming up with the assumptions as a group surfaces important conversations, especially when they conflict with each other. Bringing these conflicts to resolution early in a project causes fewer problems later when stress is high. If we can’t agree now, when there is nothing to lose, imagine how hard it will be after people have sunk time and money into “their way.”

Here is a sample set of assumptions for a proposed new website feature for discussion. These assumptions include their tests -

1. Team will be hired and trained, with three developers who can code to our web standards and two testers who can write automated test code to our automated test standards, in place by 1st August 2016. First accepted stories delivered by 7th August.

2. Original backlog story count is between 20 and 30 stories.

3. Defects found by others outside of the sprint in which the team signed off the story will be between 0 and 2 per story.

4. Open defects in the backlog will not exceed 5 per developer (15 total) at any sprint end. This count only includes actual code defects agreed by the product owner (the product owner is expected to triage within 2 days of submission).

5. Team throughput will be between 3 and 5 pre-split stories from the backlog per sprint.

6. Staging environment for performance testing is in place and available to the development team by 1st September 2016.

7. Production environment ready for live releases by 15th September 2016.

There is a mix of assumptions here. Assumption 1 confirms the team is in place and capable of delivering stories; just having the team in place is a weak assumption, the flow of accepted stories is the measurement that counts. Assumption 2 confirms the scope is as intended and agreed by everyone. Assumptions 3 and 4 spell out the expectations of quality. They make clear that defects escaping the sprint boundary are expected to be rare, and that the team isn’t expected to carry defects and prioritize feature stories ahead of defects. A growing defect count left unchecked makes the progress of feature stories a poor predictor of the future; this assumption aims to make that visible as early as possible. Assumption 5 is the rate of delivery required to hit the agreed forecast date. Assumptions 6 and 7 make it clear that certain staging and production environments need to be in place for successful delivery.
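
To show how such assumptions can double as the status report, here is a small sketch in Python that encodes a few of them as tests run against observed values each sprint (the observed numbers and helper names are hypothetical).

```python
from datetime import date

# Sketch: encode a few of the sample assumptions as tests so they can be
# reported every sprint. The observed values below are invented.

observed = {
    "backlog_story_count": 27,
    "throughput_last_sprint": 4,
    "open_defects": 11,
    "staging_env_ready_on": date(2016, 8, 28),  # None until the environment exists
}

def in_range(value, low, high):
    return low <= value <= high

def met_by(actual_date, deadline):
    return actual_date is not None and actual_date <= deadline

assumption_tests = {
    "Backlog is 20-30 stories":          in_range(observed["backlog_story_count"], 20, 30),
    "Throughput is 3-5 stories/sprint":  in_range(observed["throughput_last_sprint"], 3, 5),
    "Open defects do not exceed 15":     observed["open_defects"] <= 15,
    "Staging ready by 1 Sep 2016":       met_by(observed["staging_env_ready_on"], date(2016, 9, 1)),
}

for name, holding in assumption_tests.items():
    print(f"{'OK     ' if holding else 'AT RISK'} {name}")
```

The moment any of these tests fails, the forecast is out of date, which is exactly the early warning this status reporting is meant to give.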

How can you tell when an estimate of an assumption told to you is a guess? How can you tell it’s baloney? Let’s look skyward to a cosmologist for some ideas.

The Fine Art of Baloney Detection — Carl Sagan

“Like all tools, the baloney detection kit can be misused, applied out of context, or even employed as a rote alternative to thinking. But applied judiciously, it can make all the difference in the world — not least in evaluating our own arguments before we present them to others.” ~ Carl Sagan

Carl Sagan, in his book The Demon-Haunted World: Science as a Candle in the Dark (Sagan, 1997), devotes a chapter to avoiding being misled by baloney. I’m not going to do justice to his work in this summary (buy and read the book), but I’m introducing it here to build the critical thinking that helps us understand whether we are hearing a guess or an estimate. We need to be aware when an expert estimate is veering into the bright headlights of a guess. I find this thinking lens works for me.

His basic toolkit consists of nine thinking tools –

1. Wherever possible there must be independent confirmation of the “facts.”

2. Encourage substantive debate on the evidence by knowledgeable proponents of all points of view.

3. Arguments from authority carry little weight — “authorities” have made mistakes in the past. They will do so again in the future. Perhaps a better way to say it is that in science there are no authorities; at most, there are experts.

4. Spin more than one hypothesis. If there’s something to be explained, think of all the different ways in which it could be explained. Then think of tests by which you might systematically disprove each of the alternatives. What survives, the hypothesis that resists disproof in this Darwinian selection among “multiple working hypotheses,” has a much better chance of being the right answer than if you had simply run with the first idea that caught your fancy.

5. Try not to get overly attached to a hypothesis just because it’s yours. It’s only a way station in the pursuit of knowledge. Ask yourself why you like the idea. Compare it fairly with the alternatives. See if you can find reasons for rejecting it. If you don’t, others will.

6. Quantify. If whatever it is you’re explaining has some measure, some numerical quantity attached to it, you’ll be much better able to discriminate among competing hypotheses. What is vague and qualitative is open to many explanations. Of course there are truths to be sought in the many qualitative issues we are obliged to confront, but finding them is more challenging.

7. If there’s a chain of argument, every link in the chain must work (including the premise) — not just most of them.

8. Occam’s Razor. This convenient rule-of-thumb urges us when faced with two hypotheses that explain the data equally well to choose the simpler.

9. Always ask whether the hypothesis can be, at least in principle, falsified. Propositions that are untestable, unfalsifiable are not worth much. Consider the grand idea that our Universe and everything in it is just an elementary particle — an electron, say — in a much bigger Cosmos. But if we can never acquire information from outside our Universe, is not the idea incapable of disproof? You must be able to check assertions out. Inveterate skeptics must be given the chance to follow your reasoning, to duplicate your experiments and see if they get the same result.

These thinking tools help uncover how confident you should be in what you are told. My process is to turn each tool into a set of open-ended questions that probe for the underlying rigor and logic that went into a decision. Often these questions help well-intended people who are giving you an estimate think it through and gain confidence in their answer.

Here is a starting list to get your baloney detection juices flowing with respect to estimation and decision making –

1. Wherever possible there must be independent confirmation of the “facts.”

a. Did you run the [start date/estimate] past the upstream teams that need to finish before we can start?

b. How did you calculate the [defect rate]?

c. Did we use data? From what source(s)? Can I get a copy?

d. Who analyzed the data?

e. Who else might be able to shed light on this estimate?

2. Encourage substantive debate on the evidence by knowledgeable proponents of all points of view.

a. Who disagrees with this approach?

b. Who disagrees with the data used in this approach or estimate?

c. Can I get us together with a group of dissenters to hear their reasoning?

3. Arguments from authority carry little weight …

a. Who will do the actual work? What’s their perspective?

b. What pressures might the team have felt when deciding this estimate?

4. Spin more than one hypothesis…

a. What else did we consider?

b. Why was this one chosen?

c. What would be the simplest reason our approach might fail?

d. What ideas might we have missed that could change our opinion?

e. Who can we ask to disprove our data and estimate?

5. Try not to get overly attached to a hypothesis just because it’s yours…

a. What else could happen?

b. What else could cause this?

c. Do we have all the viable ideas on the table?

d. Say, “I don’t like my idea” (and wait to see who tries to save it).

6. Quantify…

a. How did we quantify this estimate?

b. How else can we calculate this quantity?

c. Who might have a better way of getting this number firmer?

7. If there’s a chain of argument, every link in the chain must work …

a. What assumptions have to happen for this to work?

b. How can we confirm those assumptions are valid?

c. How might we know earlier if any assumption is going to fail?

8. Occam’s Razor. This convenient rule-of-thumb urges us when faced with two hypotheses that explain the data equally well to choose the simpler.

a. What if we did nothing? Would the outcome change? (very simplest!)

b. Which option has the least effort? Would that do?

9. Always ask whether the hypothesis can be, at least in principle, falsified…

a. What assumptions need to hold true for this forecast? (if there are none, worry)

b. What could go wrong if we take this path? (if there is nothing, worry)

c. How would you break this idea? (if no ideas are given, worry)

d. Have we tried this approach in the past and had it fail?

e. Why will it be different this time?

The idea is that by asking questions you get to hear more about the analysis and logic that went into the forecast you heard. It’s fine if these questions can’t be answered; perhaps answering them is uneconomical for the question being asked. It’s more worrying if these questions aren’t being considered at all, as this is information that directly changes how certain you should feel when hearing the forecast. Forecasting is about answering the right questions, to a transparent degree of certainty, with as little effort as possible.

Summary

This chapter has introduced a new strategy for forecasting: “assumption based forecasting.” Assumption based forecasting surfaces and documents all of the necessary things that must come true for reality to match the forecast.

Assumption based forecasting has the following benefits –

  • People seeing the forecast can see the assumptions and disagree earlier.
  • People have a date in their head. The assumptions communicate how likely it is to achieve that date. And if not, why.
  • A range of forecasts can be given and the discussion becomes about what investment is warranted to hit the “desired” date rather than just a negotiation about your team working harder.
  • Assumptions can be tracked as status indicators. If an assumption is missed, then the forecast is in jeopardy.

This chapter also introduced definitions of a guess (personal judgement), an estimate (a careful approximation), and a forecast (an estimate of something in the future, with a transparent degree of certainty). It also offered some ways to detect when something given to you as an estimate or forecast is actually a cleverly disguised guess, using open-ended questions based on Carl Sagan’s baloney detection toolkit.

Finally, this chapter introduced some ways to estimate how reliable a forecasting model might be, before and during actual delivery progress.

[1] http://www.collinsdictionary.com/dictionary/american/forecast

[2] http://www.collinsdictionary.com/dictionary/american/estimate

[3] http://www.collinsdictionary.com/dictionary/american/guess

Next chapter: Chapter 3 — Size, Growth, Pace Model
