Using Probabilistic Forecasts for Feature Planning

Benjamin Huser-Berta
11 min read · Sep 16, 2023

“We don’t need to plan, we’re agile!” If you have heard this before, you’re not alone. I’m convinced that by now you’re aware this is not the case: we plan and replan a whole lot. If you’re doing Scrum, for example, you plan in Sprint Planning and again in the Daily Scrum. However, those plans are rather short-term. While this might work well in some environments, others require mid- to long-term planning.

In the spirit of the Agile Manifesto, we tried to find better ways of doing this. We were already running probabilistic forecasts, so we looked into ways to apply them to features that take a couple of weeks to implement.

Forecast instead of estimate, learn how to apply this to features in this post — Source: Freepik

In this post, we’ll share how we ended up using Monte Carlo Simulations to forecast when features will be done, how we use this for planning releases, how we visualize it, and how it helps us take action.

Features, Epics & Product Backlog Items

Before we dive deeper, let’s make sure we’re aligned on the language. During this post, I’ll be talking about Features. For us, a feature is something that should be of value to our users, often driven by problems the customers have and that we want to solve. Such a feature tends to take 4–6 weeks to implement.

In some screenshots in this post, you’ll also see the term Epics. Epics are the type we’re using to track the features in our backlog management system. We’ll break features down into smaller units of value, those items are referred to as Product Backlog Items (PBIs for short). These are the items that are pulled into Sprints and should take only a couple of days to finish.

In our Backlog, Features are tracked as Epics and have PBIs as Children

You might use different words, hierarchies, or colors in your context. But the above terminology is what will be used throughout the rest of this post.

Why use Forecasts for Features Anyway?

Now that we’re aligned on the language, you might wonder, why would I want a forecast for features? Should we not simply go for a small version of the feature, release it, and see whether it needs adjustments or not? This might work well if you can deploy continuously and collect feedback from your “production environment” easily. If this is your environment, you should go for such a way of working (but please read on if you’re curious anyway).

If you can, you should absolutely go for Continuous Deployment. Source: Red Hat

In our organization, we have many products that have longer release cycles. This can range from a couple of weeks to multiple months. If you have products that involve hardware changes or develop software that runs in isolated systems that are not connected to the internet, it might not be possible to update too often.

Also, many of our products work together. They might be set up on premises first, where it’s verified with the customer that everything works as expected, before the full systems are shipped to the places where they run. Those acceptance tests cannot easily be moved.

Thus, for the organization, it’s relevant to know whether a feature can be done by a certain target date, or when we expect it to be done. When do we plan an acceptance test? How many features will fit until that point in time?

Sometimes our products actually need to go on a ship to be delivered — Source: Freepik

Previous Approach to Feature Planning

We’ve been using probabilistic forecasts based on Monte Carlo Simulations for a while already. The main use case so far was short-term planning, for example, checking how many items will most likely fit into a Sprint. If you’re new to probabilistic forecasting, I suggest first reading an introductory article to get an overview.
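
To make this concrete, here is a minimal sketch of such a Sprint forecast in Python. The throughput history is made up; the idea is simply to resample your own team’s daily numbers many times:

```python
import random

# Hypothetical history: PBIs closed on each of the last 20 working days.
daily_throughput = [0, 2, 1, 0, 3, 1, 0, 0, 2, 1,
                    1, 0, 2, 0, 1, 3, 0, 1, 0, 2]

SPRINT_DAYS = 10   # working days in the Sprint we want to forecast
TRIALS = 10_000    # number of Monte Carlo trials

def simulate_sprint():
    # One possible Sprint: draw a random historical throughput per day.
    return sum(random.choice(daily_throughput) for _ in range(SPRINT_DAYS))

results = sorted(simulate_sprint() for _ in range(TRIALS))

# 85% of the simulated Sprints completed at least this many items.
p85 = results[int(TRIALS * 0.15)]
print(f"With 85% confidence, at least {p85} items fit into the Sprint.")
```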

Planning beyond a short-term horizon was more difficult, as it would have required us to create all the PBIs for a certain feature. And that would have felt like a lot of waste.

To plan for features, we looked at what we managed in the past and tried to extrapolate it into the future. You do this once and hope that it works out.

This made it hard to communicate with stakeholders about the progress. Will we manage to hit the target date? Do we have space for another feature? Or how much later would we expect the release to be ready if we were to add another one?

“When Will It Be Done?” — Source: Goodreads/D. Vacanti

As we were not able to answer those questions, we wanted to change something.

Forecasting a Release with Multiple Features

We wondered how we could forecast a release with multiple features. That is, we’d like to know how likely it is that we reach a certain target date with the features we plan for this release. If we have not started yet, we would also like to know when we’re most likely done with what we’ve planned so far, to start communicating the next target date for an upcoming release.

We want to run forecasts for our releases to help the business — Source: Freepik

Forecasting Features using Monte Carlo Simulations

To limit waste, we break features down into PBIs only when we’re certain we’re going to work on them.

The good thing is, we don’t need them to be broken down: we can run a forecast on the features themselves. In theory, at least. In practice, it turned out that the results were not usable for us. The reason is that there are only very few days on which a feature is set to done, and with such sparse throughput the Monte Carlo Simulation does not perform well.

If your throughput is mainly 0 per unit of time, the Monte Carlo Simulation will not be accurate — Source: Actionable Agile Demo/55 Degrees

One option could have been to adjust our unit of time, for example, use weeks or months instead of days. But for simplicity, we wanted to keep it the same for all our forecasts.
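
For illustration, this is what such a change of unit could look like; the daily history below is invented, and we did not pursue this option ourselves:

```python
# Hypothetical daily history: a feature is finished only every few weeks,
# so the day-based throughput is almost always zero.
daily_features_done = [0] * 13 + [1] + [0] * 9 + [1] + [0] * 11

# Aggregating the same history into working weeks (5 days each) gives a
# less degenerate distribution to sample from.
weekly_throughput = [sum(daily_features_done[i:i + 5])
                     for i in range(0, len(daily_features_done), 5)]
print(weekly_throughput)  # [0, 0, 1, 0, 1, 0, 0]
```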

Forecasting based on Product Backlog Items of a Feature

As forecasting the features directly was not working well for us, we fell back on what we were already doing: forecasting based on the individual PBIs of a feature.

Feature that is broken down

If a feature is already broken down, we can forecast when all its PBIs will be done. Once they are, the feature is done.

As a prerequisite, we try to add all potential items we think we need to get the feature done (for example by applying a technique like User Story Mapping).

The PBIs don’t need to be refined yet; often they just contain a title. They’ll be refined further before we start working on them. We also change things frequently, adding or removing items: once we start working, we notice things we did not consider in the beginning, as well as items that have become obsolete.

It’s in the doing of the work that we discover the work we must do — Woody Zuill

Feature that is not broken down

For us, most things won’t be broken down until shortly before we start working on them. We did not want to change that approach, as we didn’t want to introduce waste into the system. Instead, we simply take a value based on the features we worked on recently. As we try to right-size our features, they should all be in a similar range, without big outliers.

We opted to use a value slightly above average. You could also use the median, the maximum value, or whatever would best fit your situation.

Number of PBIs per Feature that we worked on in the recent past
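
As a sketch, picking such a default could look like this; the sizes and the 10% markup are invented, not our exact figures:

```python
import math

# PBI counts of recently completed features (hypothetical numbers).
recent_feature_sizes = [14, 9, 12, 16, 11]

average = sum(recent_feature_sizes) / len(recent_feature_sizes)  # 12.4

# A value slightly above the average; rounding up adds a bit of slack.
default_size = math.ceil(average * 1.1)
print(default_size)  # 14
```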

Putting it all together

Last but not least, we need to forecast based on the remaining items. Our assumption is that we’re working on one feature at a time. Depending on your setup (for example, multiple teams working on the same product) this assumption might not work for you.

We forecast each feature individually and include the remaining items from previous features in the forecasts for the later ones. The last feature of a release determines whether we’re on track or not.

Let’s assume we have 2 features pending: Feature A and Feature B. We’ve already started on Feature A; it’s broken down into PBIs, and we’ve completed everything but 3 items. We forecast when this feature will be done by checking when the remaining 3 PBIs will be done.

Now we forecast Feature B. To do so, we take the number of items pending for this feature (or the default, if it isn’t broken down yet) and add the 3 pending items from Feature A. So Feature B will be done when we complete the pending items for Feature A as well as all the items for Feature B.
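
Here is a minimal sketch of this chaining, assuming a day-based Monte Carlo simulation over PBI throughput and a default size of 14 for Feature B; all numbers are illustrative:

```python
import random

# Hypothetical daily PBI throughput history.
daily_throughput = [0, 2, 1, 0, 3, 1, 0, 0, 2, 1, 1, 0, 2, 0, 1]
TRIALS = 10_000

def days_until_done(remaining):
    # One trial: count the days until `remaining` items are completed.
    days = 0
    while remaining > 0:
        remaining -= random.choice(daily_throughput)
        days += 1
    return days

def forecast_p85(remaining):
    results = sorted(days_until_done(remaining) for _ in range(TRIALS))
    return results[int(TRIALS * 0.85)]  # done within this in 85% of trials

remaining_a = 3    # Feature A: 3 PBIs left
remaining_b = 14   # Feature B: not broken down, so we use the default size

print("Feature A:", forecast_p85(remaining_a), "days (85th percentile)")
# Feature B only finishes once Feature A's leftovers are also done:
print("Feature B:", forecast_p85(remaining_a + remaining_b), "days (85th percentile)")
```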

This forecast is run once a night, so every morning we can judge whether we’re still on track for the release and the individual features, or whether some action is needed.

Visualizing Forecasts

To inspect the results from the forecasts and to adapt our plans accordingly, we visualize the forecasts as follows:

Release Prediction Visualization

Let’s dive a bit deeper into the individual elements of this.

Wet and Dry Predictions & Traffic Light

At the top, we see which release we’re talking about and what our target date for this release is. Next to it, we see two sets of predictions: dry and wet.

Predictions with indications about the likelihood to reach the target date

Both predictions are based on the number of items left in the release. While the “dry” one uses the real number of items, the “wet” one tries to factor in “unexpected items” that will come up, such as bugs that need to be fixed or work we discover along the way.

Next to it is a simple traffic light that indicates the need for action, based on the 85th percentile of the wet forecast (a minimal sketch follows the list):

  • Red means the forecast predicts that we’ll miss our target date, so we should act now.
  • Amber means we’re “just managing” the target date: we’re on track, but there is not much slack.
  • Green means we’re ahead of schedule: no action is needed.
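
The sketch below shows one way to derive such a traffic light; the 20% allowance for unplanned work, the slack threshold, and the dates are all invented:

```python
import datetime

# The real ("dry") item count, plus an allowance for unplanned work
# to get the "wet" count.
dry_items = 25
wet_items = round(dry_items * 1.20)

# `forecast_finish_p85` would come out of the Monte Carlo run over the
# wet item count; it's hard-coded here to keep the sketch short.
target_date = datetime.date(2023, 11, 30)
forecast_finish_p85 = datetime.date(2023, 12, 8)

slack_days = (target_date - forecast_finish_p85).days
AMBER_THRESHOLD = 5  # our (arbitrary) definition of "just managing"

if slack_days < 0:
    light = "red"     # we miss the target date: act now
elif slack_days <= AMBER_THRESHOLD:
    light = "amber"   # on track, but with little slack
else:
    light = "green"   # ahead of schedule

print(light)  # "red": the forecast is 8 days past the target
```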

Item Forecasts

We also run daily forecasts of how many items we’ll manage over different periods. We use this for short-term planning (like the next Sprint) and to get a feeling for how realistic a target date is.

Forecasts of how many items we’ll manage over different periods of time including a recommendation.

There is also a “recommendation” as we run forecasts that take “different histories” into account: Do the next 30 days look more like the last 14 or the last 30?

This is based on an article from Nick Brown, and I’d recommend diving deeper into this if you want to learn more. In a nutshell, the higher the score, the closer the prediction was to the real number of items we closed.
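
A rough sketch of how such a score could be computed; the scoring formula and the data are assumptions made for illustration, so see Nick Brown’s article for the real method:

```python
import random

random.seed(1)  # reproducible demo data

def mc_how_many(history, days, trials=10_000):
    # 85th-percentile answer to "how many items in `days` days?"
    results = sorted(sum(random.choice(history) for _ in range(days))
                     for _ in range(trials))
    return results[int(trials * 0.15)]

# Hypothetical daily throughput, most recent day last.
throughput = [random.choice([0, 0, 1, 1, 2, 3]) for _ in range(60)]

actual = sum(throughput[-14:])  # what really happened in the last 14 days

# Re-run the forecast as it would have looked 14 days ago, once per
# history window, and score each window by how close it came.
for window in (14, 30):
    history = throughput[:-14][-window:]
    prediction = mc_how_many(history, days=14)
    score = 1 / (1 + abs(prediction - actual))  # higher score = closer
    print(f"last {window} days as history: predicted {prediction}, "
          f"actual {actual}, score {score:.2f}")
```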

Feature Progress

The last element is the scatter plot, which shows the progress of features over time. On the x-axis, we show the completion rate: how many of a feature’s total PBIs are already done.

Progress of Features over time

On the y-axis, there is the likelihood of finishing all remaining items before the target date of the release. The lower a bubble, the more likely it is that we’ll manage everything in time.

Every bubble represents one feature. The bigger the bubble, the newer the data. This allows us to see the trend: is the feature moving from green to amber? Or is it on a good trajectory from red towards green?

This chart was inspired by the Visual Portfolio Management approach from Steven Tendon’s book TameFlow.
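
If you want to experiment with a similar chart, here is a minimal matplotlib sketch with invented snapshots; it plots the risk of missing the date on the y-axis, so lower bubbles mean we’re more likely to make it:

```python
import matplotlib.pyplot as plt

# Three daily snapshots per feature, oldest first:
# (completion %, risk of missing the target date %).
history = {
    "Feature A": [(40, 35), (55, 25), (70, 10)],
    "Feature B": [(10, 55), (15, 65), (20, 75)],
}
colors = {"Feature A": "tab:green", "Feature B": "tab:red"}

for name, snapshots in history.items():
    for i, (done, risk) in enumerate(snapshots):
        size = 100 * (i + 1)  # newer snapshot -> bigger bubble
        plt.scatter(done, risk, s=size, alpha=0.5, color=colors[name],
                    label=name if i == len(snapshots) - 1 else None)

plt.xlabel("PBIs completed (%)")
plt.ylabel("Risk of missing the target date (%)")
plt.legend()
plt.show()
```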

How do we take Action?

So this looks cool and colorful. That’s not the point, though; the point is how we use it to make decisions.

The forecasts of how many items we’ll manage over a certain number of days are a useful input for our Sprint Planning. The longer-range forecasts also give you a rough idea of what’s reasonable and what’s unrealistic.

The feature forecasts allow us to make decisions early: do we need more focus on a feature, or should we remove some aspects of it to make it in time? For a feature that hasn’t been started, they let us see whether it’s still reasonable to expect it in time. If not, we might choose to skip the feature entirely for this release, or to explore early whether we can move our target date.

Lastly, the overall prediction and traffic light help us gauge how we’re doing overall. Attentive readers will have noticed that the features are “in the green”, yet the traffic light shows red. How can that be? Simply because there are things we planned to do that are not tied to a specific feature. Should we decide not to make such improvements and rather focus on the features, or drop a feature instead?

These are discussions that the visualizations allow us to have on a regular basis. They don’t help us get faster, but they help us take action early, so we as a team can become more predictable.

Conclusion

Continuous forecasts help to make the current state transparent. We’re using the forecasts to frequently inspect where we stand and adapt our plans. We’re adding, removing, or exchanging features for a release given the circumstances. Or we adjust our plan and find different ways to solve a problem, so we can keep the target date and achieve the goal.

If you are already using probabilistic forecasts, you might benefit from forecasting bigger chunks of work. If your environment requires you to batch multiple changes into a single release, this gives you a leading indicator of whether your original plan is working out or whether you should take action. By visualizing it, you also get a tool to communicate the state to stakeholders who might not be interested in too many details.

Special thanks to Lorenzo, Peter and Henadz for reviewing this post and providing valuable feedback.
