Why Engineers Cannot Estimate Time

Hesham Meneisi
Published in The Startup · 7 min read · Oct 18, 2020

A statistical approach to explaining bad deadlines in engineering projects

Whether you are a junior, a senior, a project manager, or a top-level manager with 20 years of experience, estimating the time a software project will take never becomes easy. No one, no matter how experienced or brilliant, can claim to know for certain how long a software project will take.

This problem is especially prevalent in software engineering, but other engineering disciplines are known to suffer from it as well. So while this article focuses on software engineering, it also applies to other disciplines, to an extent.

Overview

Let’s first take a bird’s-eye view of the problem, the consequences, and the potential root causes. I will be covering most of these over the course of this series.

The Problem

Software projects seldom meet the deadline.

The Consequences

Marketing efforts can be wasted, clients can be dissatisfied, stressed developers can write poor-quality code to meet deadlines and compromise product reliability, and ultimately, projects can be canceled outright.

The Known Causes

  • Wrong time estimates (the focus of this article).
  • Unclear requirements at the start of the project and, later, changing requirements.
  • Gold-plating: too much attention to details outside the scope of work.
  • Not taking enough time in the research and architecture design phase or, conversely, taking too much time.
  • Overlooking potential issues with 3rd party integrations.
  • The desire to “get it right the first time”.
  • Working on too many projects at the same time or getting distracted (breaking the flow too often).
  • An unbalanced quality–throughput trade-off.

Over-optimism, Dunning-Kruger effect, pure uncertainty, or just math?


It’s easy to dismiss the notion of over-optimism altogether: common sense says that no developer who has ever struggled to meet a deadline will be optimistic when setting the next one. And if project management doesn’t come from an engineering background and sets deadlines without knowing what they are doing, that’s a whole different issue, outside the scope of this article.

Some also attribute bad time estimation to the Dunning-Kruger effect. However, if inexperience or overestimating one’s ability is behind underestimating time, then surely more experience should alleviate the issue, right? Yet the biggest companies out there, with almost infinite resources, still miss deadlines at a shockingly high rate, so that hypothesis doesn’t hold up. Not to mention, we have all experienced this ourselves: more experience barely helps when it comes to time estimates.

Most developers, especially rather experienced ones, quickly conclude that it’s just pure uncertainty. It follows that time estimates will always be wrong, that this is simply a fact of life, and that the only thing we can do about it is try to meet client demands and tell developers to “just crunch” when things go wrong. We are all familiar with the stress, the garbage code, and the absolute mayhem that this philosophy causes.

Is there a method to the madness? Is this really the best way we can get things done? Well, I didn’t think so, and that’s when I embarked on a journey to find a rational, mathematical explanation for why all those smart people are unable to estimate how long it would take them to do something.

It’s just math!

One day I was working on a task that should have taken 10 minutes and ended up taking 2 hours. I started contemplating why I thought it would take 10 minutes, and what root cause pumped that number all the way up to 2 hours. My thought process was a bit interesting:

  • I thought it would take 10 minutes because I knew, exactly and in my head, the code I needed to write.
  • It did in fact take me around 7–10 minutes to finish the code. The remaining 2 hours were spent on a bug in the framework that was completely unknown to me.

This is what project managers like to call “force majeure”: external, uncontrollable causes of delay.

Now you might be thinking that I’m just proving the uncertainty argument with this scenario. Well, yes and no. Let’s zoom out a bit. Sure, uncertainty is the root cause of the delay of this particular task, because I would never have guessed that the bug existed. But should it be held responsible for the delay of the whole project?

That’s where we need to draw a distinction: a single task isn’t representative of the whole project, and vice versa.

How we “normally” estimate time

A normal distribution (bell curve)

Normal distributions are all around us, and the human brain is well adapted to them. We are natural experts at estimating quantities that follow a normal distribution; that’s the basis of gaining experience by exposure.

Say you went to the nearest 7-Eleven almost 20 times this month, and every time it took you 5 minutes, except for the time the elevator needed maintenance and you had to wait 10 minutes, and maybe that other time you decided to wait a couple of minutes for the rain to stop. What would be your guess for the time it takes you to go there right now? 5 minutes?

I mean, it doesn’t make sense to say 15 minutes, since that was a rare incident, or 7, unless it’s raining outside. And you’d most likely be right. If 18 out of 20 trips took 5 minutes, then there’s a big chance, roughly 90%, that it will just take 5 minutes (the median) this time (without getting into more complex probability math, of course).
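To make the numbers concrete, here is the same back-of-the-envelope calculation in a few lines of Python (a quick sketch; the trip durations are just the ones from the story):

```python
from statistics import mean, median

# The 20 trips from the example: 18 normal, one elevator delay, one rain wait.
trips_min = [5] * 18 + [10, 7]

print(median(trips_min))                    # 5    -> our "natural" guess
print(mean(trips_min))                      # 5.35 -> the long-run average
print(trips_min.count(5) / len(trips_min))  # 0.9  -> chance the guess is right
```

Note that the mean is already above the median; that small gap is exactly what the next section is about.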

It’s skewed!

Even if you are really good at estimating how long a task will take, that doesn’t mean you will be good at estimating how long the project will take! Counterintuitively, you will be more wrong.

Now, the math nerds (or data scientists and statisticians) reading this must have already recognized that tiny graph in the previous meme as a right-skewed distribution. Let me enlarge and clarify:

The median still has a higher probability of being true than the mean, for that single task! And if you were to guess the mode, the value with the highest probability of all, you’d be even more wrong at a larger scale.

Do you see how things can go wrong here? Our “natural” guess is based on the median, which maximizes our probability of guessing right. However, when that “event” occurs enough times, the average will always approach the mean. In other words: the more similar tasks you do, the more that error accumulates!
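Here is a quick simulation of that accumulation effect (a sketch: the lognormal shape and its parameters are my assumptions, picked only because it is a common right-skewed distribution, not measured data):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical right-skewed task durations, in hours (lognormal, median = 1.0).
tasks = rng.lognormal(mean=0.0, sigma=0.75, size=10_000)

median_t = np.median(tasks)  # ~1.0  -> the "natural" per-task guess
mean_t = tasks.mean()        # ~1.33 -> what tasks actually cost on average

for n in (1, 10, 100, 1000):
    estimate = n * median_t     # what we'd promise for n similar tasks
    actual = tasks[:n].sum()    # what they actually take
    print(n, round(actual - estimate, 1))  # the gap tends to grow with n
```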

Delay equation, based on that hypothesis
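In symbols (my notation; a sketch of the idea rather than a formal derivation), with $n$ similar tasks, per-task median $\tilde{x}$ (our natural guess), and per-task mean $\mu$:

$$\text{Total delay} \approx n \cdot (\mu - \tilde{x})$$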

Programming tasks on a project are usually pretty similar, or at least grouped into a few similar clusters! The equation also implies that the problem scales. While we want everything in software projects to be scalable, problems are certainly not welcome.

So, how to use this knowledge?

To be honest, while writing this article I had no intention of giving “instructions” based on this hypothesis. It’s meant as an exploratory analysis concluding with a hypothesis that’s up to you, the reader, to interpret however you wish.

However, I do know that many will be disappointed by that open-ended conclusion, so here’s what I personally make of it:

  1. It’s easier to tell whether task X will take more, less, or the same time compared to task Y than it is to tell exactly how long either will take. This is because comparing medians works just as well as comparing means, as long as the skewness of the curves is roughly the same (which is true for similar tasks).
  2. I don’t recall or record every single similar task to do the math and get the mean (and I couldn’t find any data to torture). So I usually estimate the inevitable error (mean minus median) as a percentage of the task time that goes up or down depending on how comfortable I am with the dev environment: Do I like this language/framework? (40%) Do I have good debugging tools? (30%) Good IDE support? (25%) And so on.
  3. I started splitting sprints into equally sized tasks, just to create some uniformity in the time estimation process. This lets me benefit from point 1, since it should be easy to tell whether two tasks are roughly equal in time. It also makes tasks even more similar, so the hypothesis applies even better and things become more predictable.
  4. With these principles applied, you can do a “test run” if you have the resources. For example, if Y1 developers completed Z1 of the uniform tasks in X1 days, then we can easily solve for X2 (days) given Y2 (developers available) and Z2 (total tasks left), as in the sketch after this list.
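For what it’s worth, here is a minimal sketch of points 2 and 4 in Python; the function names, the percentages, and the example numbers are mine, purely for illustration:

```python
def adjusted_estimate(median_estimate_h: float, error_pct: float) -> float:
    """Point 2: pad the 'natural' (median) estimate by an assumed
    mean-minus-median error, expressed as a fraction of the task time."""
    return median_estimate_h * (1 + error_pct)

def forecast_days(x1_days: float, y1_devs: int, z1_done: int,
                  y2_devs: int, z2_left: int) -> float:
    """Point 4: if Y1 developers finished Z1 uniform tasks in X1 days,
    solve for X2, the days Y2 developers need for the remaining Z2 tasks.
    Assumes uniform tasks and throughput that scales linearly with devs."""
    dev_days_per_task = x1_days * y1_devs / z1_done
    return z2_left * dev_days_per_task / y2_devs

# I dislike the framework, so I assume a 40% error margin on a 2-hour task.
print(adjusted_estimate(2.0, 0.40))  # 2.8 hours

# Test run: 3 devs finished 12 tasks in 5 days; 40 tasks left, 4 devs now.
print(forecast_days(x1_days=5, y1_devs=3, z1_done=12,
                    y2_devs=4, z2_left=40))  # 12.5 days
```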

Finally, make sure to follow if you don’t want to miss the upcoming articles covering the other causes of delay.
