How do you estimate stories?
Hi fellow engineers!
The importance of estimating our work should be obvious.
If you're new to agile practices, know that by estimating work (usually in the form of Jira stories) a team can better prioritise and plan their sprints' capacity and scope, and so deliver value predictably and without stress.
How many times has your team been asked to estimate a piece of work and the answer wasn't convincing? Can you imagine that hesitant voice replying with a random number of hours, days or story points?
In my early days as an engineer, we once had an "elephant" story, an estimate which meant nothing to anyone, and after a while we stopped estimating completely.
Big mistake!
There are many ways to estimate: time-based estimates, story points (which usually refer to the complexity of a task), animal sizes, T-shirt sizes, and many more.
It doesn't matter which style of estimation your team uses; what matters is keeping those estimations consistent.
In a previous Medium post, I shared how we adopted BDD and that we document BDD scenarios in our tickets.
Part of our "Definition of Ready", which indicates that a story can be picked up, is adding a story point estimate to the ticket.
Let's address that "elephant" in the room then.
During refinement, or as we call it, the "BDD session", we gather the requirements, both technical and visual. Once we feel we have enough detail, we use an estimation tool we've built on a simple spreadsheet.
For each category, as you'll soon see, we pick the option that best describes the story. Each option has a score between 0 and 0.75.
Again, it doesn't matter what score each category gets as long as it's consistent.
After all categories are addressed, we sum all the scores and "round" the total up to the next number in the Fibonacci sequence, i.e. 1, 2, 3, 5, or 8.
For example, if the summed score is 3.5 or 4, we'll "round" it to 5.
If it's more than 5, like 5.5, 7, or even 11, we'll "round" it to 8.
There is no reason for a score larger than 8, as we won't work on stories that are "too complex". When we get such a big estimation, we discuss how to slice the requirements into smaller stories.
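The "rounding" rule above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet formula we actually use: the function name `round_to_story_points` is made up for this example, and I'm assuming that a total landing exactly on a Fibonacci value (say, 3) stays as it is.

```python
from bisect import bisect_left

# The story point values named in the post.
FIBONACCI_POINTS = [1, 2, 3, 5, 8]


def round_to_story_points(total: float) -> int:
    """Round a summed score up to the next value in FIBONACCI_POINTS.

    Assumption: a total that is already one of the values is kept as is.
    Anything above 8 is capped at 8, which is the cue to slice the story.
    """
    # Index of the first value that is greater than or equal to the total.
    index = bisect_left(FIBONACCI_POINTS, total)
    # Totals larger than 8 fall off the end of the list, so clamp the index.
    return FIBONACCI_POINTS[min(index, len(FIBONACCI_POINTS) - 1)]


for total in (3.5, 4, 5.5, 7, 11):
    print(total, "->", round_to_story_points(total))
```

In a spreadsheet the same idea is usually a lookup against a small table of thresholds; the point is only that the rule is mechanical, so two people summing the same options always land on the same story points.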
These are the categories that we use (I'll cover the most common ones):
UI
- N/A (not applicable):
If the story doesn't require UI changes, the score is 0.
- Design system only:
If we can build the entire UI requirement using our existing UI components, the score is 0.25.
- Design system + custom widgets:
If we can use our design system but have to create a few components from scratch, the score is 0.5.
- Custom widgets only:
When we can't use our design system, have to completely refactor it, or have to build a new design system component, the score is 0.75.
Endpoints
- N/A (not applicable):
If the story doesn't require BE (backend) changes, the score is 0.
- Single:
When we implement a new endpoint or modify an existing one, the score is 0.25.
- A couple:
Same as single, but with two endpoints. A typical screen will usually have two endpoints at the most. The score is 0.5.
- Multiple:
In the rare case when we have to implement or modify more than two endpoints, the score is 0.75.
That might be an indication of an opportunity for slicing the requirements.
Affected packages
We call our feature teams BAs, which stands for Business Areas, e.g. Onboarding BA, Payments BA, etc.
As we use story points to estimate complexity, the more BAs involved in a story, the more complex the solution is.
- BA only:
If we only need to modify our BA's codebase, the score is 0.25.
- BA + Core:
If we have to modify our BA's codebase and a few of the core packages, the score is 0.5.
- Multiple BAs:
When we have to collaborate with other BAs, the score is 0.75.
Localisation
- N/A (not applicable):
If the story doesn't require any text, the score is 0.
- Single translation:
If we don't have to translate, or have only one language for the texts, the score is 0.25.
- Multiple translations:
When we have to translate to all supported languages, the score is 0.5.
- Dynamic text:
When we have texts that contain dynamic content, the score is 0.75.
Solutionising
- Simple:
When we have a typical story, like implementing a new screen with a single endpoint, the solution will usually be very intuitive, with a lot of existing implementations to reference if needed. The score is 0.25.
- Design patterns / Algorithms / 3rd party integrations:
When a solution is a bit more complicated and requires some more thinking, usually with a couple of sub-tasks, the score is 0.5.
- Full refactoring:
Similar to the second option but with more than a couple of sub-tasks.
Usually, such stories involve multiple BAs. The score is 0.75.
Analytics
- N/A (not applicable):
If the story doesn't require any analytics, the score is 0.
- User interaction:
For classic analytics like screen view, tap, submit, etc., the score is 0.25.
- Business rules:
When we track backend responses and/or errors, the analytics events become dynamic and a bit more complicated. The score is 0.5.
- UI + Business rules:
For new stories, if we can't slice them, we might have to include both the "User interaction" and "Business rules" options. The score is 0.75.
BDD
If your team is keen on collaboration, you can include the complexity of the scenarios as a category. Otherwise, please read my previous Medium post first.
In our BDD world, inputs are the number of user interactions and outputs are the number of outcomes.
For example, a signup form might have multiple authentication providers, e.g. Google, Facebook, etc. Each one represents another input, which results in another scenario.
An example of multiple outputs is an email evaluation. The result can be a valid email, an invalid email, an existing email, and maybe other "statuses".
If each status results in different navigation, messages, errors, etc., we consider them multiple outputs.
We aim for 100% coverage of the requirements.
- N/A (not applicable):
If the story doesn't require new scenarios, the score is 0.
- Simple scenarios:
For "land on", tap, submit, etc., the score is 0.25.
- Combinatorial scenarios (inputs or outputs):
When we have multiple combinations of either the inputs or the outputs, the score is 0.5.
- Combinatorial scenarios (both inputs and outputs):
When we have multiple combinations of both inputs and outputs, the score is 0.75.
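To get a feel for why combinations of both inputs and outputs score the highest, here is a tiny Python sketch that enumerates the scenarios. Pairing the authentication providers with the email statuses in a single story is a hypothetical combination I made up for illustration, and `enumerate_scenarios` is not part of our tool.

```python
from itertools import product


def enumerate_scenarios(inputs: list[str], outputs: list[str]) -> list[str]:
    """List one scenario title per (input, output) combination."""
    return [f"Given {i}, then {o}" for i, o in product(inputs, outputs)]


# Inputs and outputs borrowed from the examples in the post.
providers = ["Google signup", "Facebook signup"]
email_statuses = ["valid email", "invalid email", "existing email"]

scenarios = enumerate_scenarios(providers, email_statuses)
for title in scenarios:
    print(title)
print(len(scenarios), "scenarios to cover")
```

Two inputs and three outputs already give six scenarios, and every extra input or output multiplies that number rather than adding to it.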
Risk
Every change involves a risk. Some are minor and some might result in breaking changes and regression bugs.
It's important to consider the risk when estimating a story, as it sometimes leads to better solutionising (see "Affected packages" too).
- Minor:
For a new feature, text change, etc., the score is 0.5.
- Major:
For deprecating a feature, modifying an existing feature, updating dependencies, etc., the score is 0.75.
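If you prefer code to a spreadsheet, the categories above fit in a plain dictionary. The sketch below is an assumption about how you might model them in Python, not our actual tool: the names `SCORES` and `sum_scores` are invented, and only the categories, options and scores come from this post.

```python
# Category -> option -> score, copied from the lists above.
SCORES = {
    "UI": {"N/A": 0.0, "Design system only": 0.25,
           "Design system + custom widgets": 0.5, "Custom widgets only": 0.75},
    "Endpoints": {"N/A": 0.0, "Single": 0.25, "A couple": 0.5, "Multiple": 0.75},
    "Affected packages": {"BA only": 0.25, "BA + Core": 0.5, "Multiple BAs": 0.75},
    "Localisation": {"N/A": 0.0, "Single translation": 0.25,
                     "Multiple translations": 0.5, "Dynamic text": 0.75},
    "Solutionising": {"Simple": 0.25,
                      "Design patterns / Algorithms / 3rd party integrations": 0.5,
                      "Full refactoring": 0.75},
    "Analytics": {"N/A": 0.0, "User interaction": 0.25,
                  "Business rules": 0.5, "UI + Business rules": 0.75},
    "BDD": {"N/A": 0.0, "Simple scenarios": 0.25,
            "Combinatorial scenarios (inputs or outputs)": 0.5,
            "Combinatorial scenarios (both inputs and outputs)": 0.75},
    "Risk": {"Minor": 0.5, "Major": 0.75},
}


def sum_scores(selection: dict[str, str]) -> float:
    """Sum the scores of the picked options; every category must be addressed."""
    missing = SCORES.keys() - selection.keys()
    if missing:
        raise ValueError(f"No option picked for: {sorted(missing)}")
    return sum(SCORES[category][option] for category, option in selection.items())


# A hypothetical "typical" story: one new screen, one endpoint, translated texts.
typical_story = {
    "UI": "Design system only",
    "Endpoints": "Single",
    "Affected packages": "BA only",
    "Localisation": "Multiple translations",
    "Solutionising": "Simple",
    "Analytics": "User interaction",
    "BDD": "Simple scenarios",
    "Risk": "Minor",
}
print(sum_scores(typical_story))  # 2.5, which we would "round" up to 3
```

Forcing a choice in every category is deliberate here: a skipped category is usually a requirement nobody has discussed yet.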
Use Fibonacci or any other scoring system. You don't have to use numbers at all; letters or animals work too. Add conditional formatting for visual feedback on categories and options, or use tools other than spreadsheets.
Feel free to test our categories, add or remove some, and modify the options to better suit your team's needs.
The only important thing is that by using such a tool, you train your team and yourself to be consistent with your estimations. Soon you'll see how your team gains confidence and how sprints become more predictable and simpler to plan.
I'll be happy to learn what you come up with!
Thank you!