How do you estimate stories?

Rony Brosh
Tide Engineering Team
Image by vectorjuice on Freepik

Hi fellow engineers 👋

The importance of estimating our work should be obvious.
If you’re new to agile practices, know that by estimating work (usually in the form of Jira stories), a team can better prioritise and plan each sprint’s capacity and scope, delivering value predictably and without stress.

How many times has your team been asked to estimate a piece of work and the answer wasn’t convincing? Can you imagine that hesitant voice replying with a random number of hours, days or story points? 🤥
In my early days as an engineer, we once had an “elephant” story that meant nothing, and after a while we completely stopped estimating.
Big mistake!

There are many ways to estimate. It can be time-based, story points (which usually refer to the complexity of a task), animal sizes, T-shirt sizes, and many more.


It doesn’t matter which style of estimation your team uses; what matters is making those estimations consistent.

In a previous Medium post, I shared how we adopted BDD and how we document BDD scenarios in our tickets.

Part of our “Definition of Ready”, which indicates that a story can be picked up, is adding a story-point estimation to the ticket.

Let’s address that “elephant” in the room then.

During refinement, or as we call it, a “BDD session”, we gather the requirements, both technical and visual. Once we feel we have enough details, we use an estimation tool we’ve built on a simple spreadsheet.

For each category, as you’ll soon see, we pick the option that best describes the story. Each option has a score between 0 and 0.75.

Again, it doesn’t matter what score each category gets, as long as it’s consistent.

After all categories are addressed, we sum the scores and “round” the total up to the nearest number in the Fibonacci sequence, i.e. 1, 2, 3, 5, or 8.

For example, if the summed score is 3.5 or 4, we’ll “round” it to 5.
If it’s more than 5, like 5.5, 7, and even 11, we’ll “round” it to 8.
There’s no reason for a score larger than 8, as we won’t work on stories that are “too complex”. When we get such big estimations, we discuss how to slice the requirements into smaller stories.
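
For illustration, here’s a minimal sketch of that rounding logic in Kotlin. It’s only an approximation of what our spreadsheet does, and the names are hypothetical:

```kotlin
// Illustrative sketch only; our actual tool is a spreadsheet.
// Each category contributes a score of 0, 0.25, 0.5 or 0.75.
val fibonacciPoints = listOf(1.0, 2.0, 3.0, 5.0, 8.0)

fun estimate(categoryScores: List<Double>): Int {
    val total = categoryScores.sum()
    // "Round" the total up to the nearest Fibonacci number, capping at 8.
    val points = fibonacciPoints.firstOrNull { it >= total } ?: 8.0
    return points.toInt()
}

fun main() {
    println(estimate(listOf(0.5, 0.25, 0.25, 0.5, 0.25, 0.25, 0.5, 0.5)))   // total 3.0 -> 3
    println(estimate(listOf(0.75, 0.5, 0.75, 0.5, 0.5, 0.25, 0.5, 0.75)))   // total 4.5 -> 5
}
```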

These are the categories that we use (I’ll cover the most common ones):

UI

  1. N/A (not applicable):
    If the story doesn’t require UI changes, the score is 0
  2. Design system only:
    If we can build the entire UI requirement using our existing UI components, the score is 0.25
  3. Design system + custom widgets:
    If we can use our design system but have to create a few components from scratch, the score is 0.5
  4. Custom widgets only:
    When we can’t use our design system, have to completely refactor it, or have to build a new design-system component, the score is 0.75

Endpoints

  1. N/A (not applicable):
    If the story doesn’t require BE (backend) changes, the score is 0
  2. Single:
    When we implement a new endpoint or modify an existing one, the score is 0.25
  3. A couple:
    Same as single, but with two endpoints. A typical screen will usually have at most two endpoints. The score is 0.5
  4. Multiple:
    In the rare case when we have to implement or modify more than two endpoints, the score is 0.75
    That might be an indication of an opportunity for slicing requirements 😉

Affected packages

We call our feature teams BAs, which stands for Business Areas, e.g. the Onboarding BA, the Payments BA, etc.

As we use story points to estimate complexity, the more BAs involved in a story, the more complex the solution is.

  1. BA only:
    If we only need to modify our BA’s codebase, the score is 0.25
  2. BA + Core:
    If we have to modify our BA’s codebase and a few of the core packages, the score is 0.5
  3. Multiple BAs:
    When we have to collaborate with other BAs, the score is 0.75

Localisation

  1. N/A (not applicable):
    If the story doesn’t require any texts, the score is 0
  2. Single translation:
    If we don’t have to translate, or have texts in only one language, the score is 0.25
  3. Multiple translations:
    When we have to translate to all supported languages, the score is 0.5
  4. Dynamic text:
    When we have texts that contain dynamic content, the score is 0.75

Solutionising

  1. Simple:
    When we have a typical story, like implementing a new screen with a single endpoint, the solution will usually be very intuitive with a lot of existing implementations to reference if needed. The score is 0.25
  2. Design patterns / algorithms / third-party integrations:
    When a solution is a bit more complicated and requires some more thinking, usually with a couple of sub-tasks, the score is 0.5
  3. Full refactoring:
    Similar to the second option but with more than a couple of sub-tasks.
    Usually, such stories involve multiple BAs. The score is 0.75

Analytics

  1. N/A (not applicable):
    If the story doesn’t require any analytics, the score is 0
  2. User interaction:
    For classic analytics like screen view, tap, submit, etc., the score is 0.25
  3. Business rules:
    When we track backend responses and/or errors, the analytics events become dynamic and a bit more complicated. The score is 0.5
  4. UI + Business rules:
    For new stories, if we can’t slice them, we might have to include both options 2 and 3. The score is 0.75

BDD

If your team is keen on collaboration, you can include the complexity of the scenarios as a category. Otherwise, please read my previous Medium post first 😅

In our BDD world, inputs are the number of user interactions and outputs are the number of outcomes.

For example, a signup form might have multiple authentication providers, e.g. Google, Facebook, etc. Each one represents another input, which results in another scenario.

An example of multiple outputs is an email evaluation. It can be a valid email, an invalid email, an existing email, and maybe other “statuses”.
If each status results in different navigation, messages, errors, etc., we consider them as multiple outputs.

We aim for 100% coverage of the requirements.
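
To make “combinatorial” concrete, here’s a tiny Kotlin sketch. The inputs and outputs are hypothetical examples, not taken from a real story:

```kotlin
// Hypothetical illustration: every (input, output) pair is one BDD scenario.
val inputs = listOf("tap Continue", "press Enter")
val outputs = listOf("valid email", "invalid email", "existing email")

fun main() {
    val scenarios = inputs.flatMap { input ->
        outputs.map { output -> "$input -> $output" }
    }
    println(scenarios.size) // 2 inputs x 3 outputs = 6 scenarios to cover
}
```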

  1. N/A (not applicable):
    If the story doesn’t require new scenarios, the score is 0
  2. Simple scenarios:
    For “land on”, tap, submit, etc., the score is 0.25
  3. Combinatorial scenarios (inputs or outputs):
    When we have multiple combinations of either the inputs or outputs, the score is 0.5
  4. Combinatorial scenarios (both inputs and outputs):
    When we have multiple combinations of both inputs and outputs, the score is 0.75

Risk

Every change involves a risk. Some are minor and some might result in breaking changes and regression bugs.

It’s important to consider the risk when estimating a story as it sometimes leads to better solutionising (see “Affected packages” too).

  1. Minor:
    For a new feature, text change, etc., the score is 0.5
  2. Major:
    For deprecating a feature, modifying an existing feature, dependency updates, etc., the score is 0.75
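
Putting it all together: a typical new screen might score 0.25 (design system only) + 0.25 (a single endpoint) + 0.25 (BA only) + 0.5 (multiple translations) + 0.25 (a simple solution) + 0.25 (user-interaction analytics) + 0.25 (simple scenarios) + 0.5 (minor risk). That sums to 2.5, which “rounds” up to 3 story points.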

Use Fibonacci or any other scoring system; maybe don’t use numbers at all and use letters or animals instead. Add conditional formatting for visual feedback on categories and options, or use tools other than spreadsheets.

Feel free to test our categories, and to add, remove, or modify the options to better suit your team’s needs.

The only important thing is that by using such a tool, you train your team and yourself to be consistent with your estimations. Soon you’ll see how your team gains confidence and your sprints become more predictable and simpler to plan.

I’ll be happy to learn what you come up with!

Thank you 😎
