Building Better Products

Implementing Behavior-Driven Development with Component Teams

Marcel van der Poel

Published in

ING Blog

15 min read · Oct 27, 2023

Introduction

In bigger organisations, IT products are seldom the result of one team’s work. With 37 million customers in 40 countries, served by 60,000 employees of whom 15,000 work in Tech, this also applies to ING.

ING uses component teams as the organisational building block for software delivery. Each component team is centred on a component that provides a piece of functionality. If ING were in the shoe sales business, we would have component teams for shopping cart, inventory, expedition, user profile, shoe search, etc.

The problem with component teams is that they do not deliver customer value. At least, not by themselves. I know of no customer who gets value from filling and emptying a shopping cart without placing an order. This does not mean that what component teams deliver is not valued. Quite the opposite.

But customers have a need or desire to be fulfilled, like buying blue sneakers or returning shoes that do not fit. Business value is only achieved when components work together to fulfil these larger business processes or end-to-end customer features, like buying shoes.

Behavior-Driven Development (BDD) helps component teams understand the desired behavior of end-to-end customer features and how their components contribute to them. BDD also helps manage the dependencies that exist between contributing component teams, by using conversation and concrete examples.

This article describes an adapted BDD process and the extensions we made to BDD tooling that allow an organisation to build better products when feature teams are not an option. (A feature team is a team of software product developers that implements end-to-end customer features as a single team.)

If you are new to BDD, then this section on cucumber.io is a nice starter.

Discovery workshop

The usual starting point for BDD is a discovery workshop that helps structure the conversation about the new customer feature.

Let’s stick with our shoe sales business. We already sell shoes, but we have not implemented any business process that allows customers to return shoes.

We have a user story for that: “As Joe the customer, I want to return the shoes I bought, so I can get a full refund.”

A single team would organise a three amigos, example mapping, event storming or other session to get business and IT on the same page. With multiple teams, it is not very different: just bring more amigos to the session.

For our example we invite the component teams that we expect will be needed to implement the e2e customer feature(s) that will be derived from this user story:

Order management (they handled the original order)
Return shipment handling (new team to handle return shipments)
Inventory (returned items go back in stock)
Finance (they do the refund)

Also in the session are a product lead as domain expert and an architect or feature engineer for high-level technical guidance.

The actual discovery session will not be very different from what you would expect from a single team setting. Examples of the e2e customer feature(s) are discussed, explained and challenged with the goal to reach a common understanding of how the combined components should behave to fulfil the need of the customer.

In my experience, it is hard to get teams together in one room at the same time. Having a separate central team do all the preparation for the teams, communicate it by document and integrate everything once delivered appears to be more convenient. But it also comes at a cost: teams do not know their context, align late in the delivery process and work on their delivery at different moments. As a result, misunderstandings are discovered late in the process and software is shelved, waiting for all parts to be ready.

Formulation

Now that we have discovered what the software could do, during formulation we formalise what the software should do. We do this by creating human- and machine-readable scenarios for the examples we discovered. We use a lightweight descriptive language called Gherkin (https://cucumber.io/docs/gherkin/reference/) for this purpose.

A happy flow scenario is a good example:

Feature: Joe returns items to get a refund

Scenario: Joe gets full refund

Given Joe has bought shoes for 80 euros 2 weeks ago
And Joe has created a return ticket
When the shoes are returned and received in order
Then Joe's bank account is credited for 80 euros

If you look closely, you might be able to see how the steps relate to the teams that were present in the discovery session. Return shipment handling, for example, relates to the step ‘Joe has created a return ticket’, because the step’s business logic will be implemented in their component.

What we want to achieve is that every step can be completely related to a single component team, because later every team will not only make the changes needed for their component, but also create the step that relates to their component.
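To make the "one step, one team" idea concrete, here is a minimal Java sketch. It is only an illustration: the class `StepOwnership` and its methods are hypothetical, and the step-to-team assignment is one possible reading of the return scenario, not something prescribed by Gherkin or Cucumber.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: checks that every step of a scenario is claimed by a component team.
final class StepOwnership {

    // Step text -> owning team; an illustrative reading of the return scenario.
    static Map<String, String> owners() {
        Map<String, String> owners = new LinkedHashMap<>();
        owners.put("Joe has bought shoes for 80 euros 2 weeks ago", "Order management");
        owners.put("Joe has created a return ticket", "Return shipment handling");
        owners.put("the shoes are returned and received in order", "Inventory");
        owners.put("Joe's bank account is credited for 80 euros", "Finance");
        return owners;
    }

    // Returns the steps no team has claimed; an empty list means the scenario is fully covered.
    static List<String> unowned(List<String> steps, Map<String, String> owners) {
        List<String> missing = new ArrayList<>();
        for (String step : steps) {
            if (!owners.containsKey(step)) {
                missing.add(step);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> steps = new ArrayList<>(owners().keySet());
        steps.add("Joe receives a confirmation email"); // hypothetical extra step
        System.out.println("Unowned steps: " + unowned(steps, owners()));
    }
}
```

A step that shows up as unowned is a signal that the formulation session is not finished yet: either a team is missing from the conversation, or the step still mixes the responsibilities of several components.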

For order management it is now clear they need to provide data about the shoes Joe has bought in the past, because return shipment handling needs it to create the return ticket.

And when inventory then receives the shoes in proper condition, they can signal finance to pay out.

Teams discuss how their components will interact. This defines clear boundaries where one step stops, the other begins and how information flows between them.

With clarity on the interfaces, clarity where the business functionality goes and a common understanding of the e2e customer feature, we have what we need to create the steps and change the components.

Automation

From here on I stop using the shoe example, and instead use real examples from our banking business.

Automation is the part where we take the human-readable scenarios from the formulation step and make them executable. When automating BDD scenarios in a single team setting, this single team works with these five things:

The scenarios. Scenarios are found in feature files when you use Gherkin or in scenario tables when you use a wiki-based BDD tool. An example of a scenario for handling a PSD2 payment initiation request:

Feature: PSD2 payment initiation request

Scenario: DE sepa credit transfer with booking pattern

Given a PSU has an account (DE, ING, EUR)
And the PISP has a creditor account (DE, ING, EUR)
And the PISP YACA obtains a TPP AT
And the PISP initiates a sepa credit transfer request for the PSU of EUR 0.02
And the PSU provides the debtor account
When the PSU authorises the payment order
Then after 0:05 hr I expect the following booking pattern has been followed:
     | account | amount   | debit or credit |
     | debtor  | EUR 0.02 | debit           |

A test runner like Cucumber. Cucumber reads the scenarios step by step (line by line) and will look for a matching step definition (https://cucumber.io/docs/gherkin/step-organization/?lang=java).
The matching step definitions and their underlying test code. They are packaged in a JAR and Cucumber uses the annotated lines like @Given or @And in the example below to match steps from a scenario with code in the JAR. When there is a match, it will execute that code. The step “the PSU provides the debtor account” matches to this code:

    @And("^the PSU provides the ?(WBAAC)? debtor account$")
    public void the_PSU_provides_the_debtor_account(String wbAccount) throws Throwable {
        ingWebComponentEntity.setTpp((TPP) PicklesSession.get("tpp"));

        // The PSU gets a list of accounts
        String accounts = ingWebComponentEntity.getPaymentAccounts();

        // etc., etc.
    }

The component or system that is tested using the steps’ test code.
The test output.
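The way a test runner pairs a scenario line with a step definition can be sketched in a few lines of plain Java. This is a toy illustration and not Cucumber's implementation: the class `ToyStepMatcher` and its methods are hypothetical, and the handler is represented by a name only. The regular expression is the one from the step definition shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Toy illustration of step matching; NOT how Cucumber is implemented internally.
final class ToyStepMatcher {

    // Registered step patterns, each paired with the name of the code it would run.
    private final List<Map.Entry<Pattern, String>> definitions = new ArrayList<>();

    void register(String regex, String handlerName) {
        definitions.add(Map.entry(Pattern.compile(regex), handlerName));
    }

    // Strips the Gherkin keyword and returns the first handler whose pattern matches the whole step text.
    Optional<String> match(String scenarioLine) {
        String text = scenarioLine.trim().replaceFirst("^(Given|When|Then|And|But)\\s+", "");
        for (Map.Entry<Pattern, String> definition : definitions) {
            if (definition.getKey().matcher(text).matches()) {
                return Optional.of(definition.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ToyStepMatcher matcher = new ToyStepMatcher();
        matcher.register("^the PSU provides the ?(WBAAC)? debtor account$",
                "the_PSU_provides_the_debtor_account");
        System.out.println(matcher.match("And the PSU provides the debtor account"));
        System.out.println(matcher.match("And the PSU provides the WBAAC debtor account"));
        System.out.println(matcher.match("When the PSU authorises the payment order"));
    }
}
```

The first two lines match the registered pattern, with and without the optional account type. The third line has no step definition in this toy registry, which corresponds to a step that still has to be automated.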

During automation in a single team setting, the team first automates the scenarios by automating each step. That results in a scenario that can be executed but will fail, because the underlying implementation is missing. In other words, the system under test does not contain the implementation that can make the scenarios pass. The team then creates the underlying implementation and runs the scenarios until they pass. This is done in both the developers’ IDE and in the build pipeline.

In a multi-team situation, where multiple component teams collaborate on a customer feature, we cannot replicate this without solving the following issues:

Platform: To test against all components that support an e2e customer feature, we need the test runner to be able to reach all systems under test. Systems under test become a service under test. The e2e customer feature is part of a bigger service offering. Testing in an IDE or component pipeline is not an option anymore.
Contributors: Steps come from multiple teams, instead of one (assuming we do not want to centralise what doesn’t need to be centralised).
Ownership: Who owns the scenario? In my experience, joint ownership of these scenarios is not acceptable to the teams. Ownership by the product lead is. Since product leads do not have access to an IDE, they need something else to take ownership.
Stakeholders: When we are going to continuously integrate the e2e customer feature by running the scenarios, we have a way of giving continuous feedback to stakeholders on progress, quality and dependencies. How can we distribute this information to a large group of people?

We solved these issues in an organic way, not by a predefined plan. Starting with plain vanilla Cucumber, whenever we ran into its limitations within our context, we did small experiments. It helped us to understand what works for us and what doesn’t. The current state of our solution is discussed in the rest of the article.

Platform

Our Cucumber test runner runs on a central server. The file system of this server contains the feature files, the jars with step code and the code we created that solves the limitations Cucumber has in our context. We named the whole thing Testcenter.

The Testcenter platform can reach all systems under test that are part of a service under test. Users can interact with Testcenter via a web application.

Testcenter’s main page, showing the latest test runs

Test runs on the platform are triggered manually or by a schedule. Triggering from a team’s pipeline is possible, but not used at the moment.

Contributors

The contributors of steps are the component teams. We learned that component teams are not able to take responsibility for the whole scenario but are able to take responsibility for the part related to their component. For example, making a simple payment involves a team that initiates the payment, one that handles the approval process, a team that creates a debit and a credit part of the payment, etc.

Each team delivers the part (steps) they feel responsible for. When a step has a clear relation with one component, it can easily be automated by the team owning that component. This way one scenario is implemented by multiple teams.

These teams have a separate Testcenter step project for their component and whenever they add or change steps, they publish a new version of their project’s jar in an outgoing feed. Testcenter then downloads the new jar to the Testcenter server, making the new steps available for everyone to test with.

Ideally, teams first automate the steps and then change their components.

Then, whenever a team releases a new version of their component to a connected environment (meaning an environment in which the components are connected to each other), scenarios can be run to continuously integrate the e2e customer feature.

Here you can see we load all the jars as part of the glue:

@RunWith(LocalPicklesCucumberRunner.class)
@CucumberOptions(features = { "classpath:features" },
  // dryRun = false,
  monochrome = true,
  glue = { "classpath:com.ing.e2e.team1.steps",
           "classpath:com.ing.e2e.team2.steps",
           "classpath:com.ing.e2e.team3.steps",
           // ..... one entry per contributing team
           "classpath:com.ing.e2e.teamn.steps" })

We plan to implement an option where you can specify the jars you want to load for your test run. The main reason for this is that, with a growing number of teams on the Testcenter platform, a badly written step can negatively influence scenarios that don’t need that step. Another reason is duplicate steps. Using fewer jars lowers the chance that you run into a duplication problem; Cucumber will always use the first step implementation it finds, not necessarily the one you need.
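The duplication problem can be made visible before a test run starts. The following Java sketch reports step patterns that are contributed by more than one jar. It is a hypothetical helper, not part of Testcenter or Cucumber: the class `DuplicateStepCheck`, the jar names and the way the patterns are collected are all assumptions, and it only finds textually identical patterns, not different regular expressions that happen to match the same step.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: finds step patterns that more than one jar contributes.
final class DuplicateStepCheck {

    // Input: jar name -> step patterns in that jar.
    // Output: pattern -> jar names, only for patterns that were registered more than once.
    static Map<String, List<String>> duplicates(Map<String, List<String>> patternsByJar) {
        Map<String, List<String>> jarsByPattern = new TreeMap<>();
        for (Map.Entry<String, List<String>> jar : patternsByJar.entrySet()) {
            for (String pattern : jar.getValue()) {
                jarsByPattern.computeIfAbsent(pattern, p -> new ArrayList<>()).add(jar.getKey());
            }
        }
        jarsByPattern.values().removeIf(jars -> jars.size() < 2);
        return jarsByPattern;
    }

    public static void main(String[] args) {
        Map<String, List<String>> patternsByJar = new LinkedHashMap<>();
        // Made-up jar names and patterns, for illustration only.
        patternsByJar.put("team1-steps.jar", List.of("^a PSU has an account$", "^the PSU logs in$"));
        patternsByJar.put("team2-steps.jar", List.of("^the PSU logs in$"));
        System.out.println(duplicates(patternsByJar));
    }
}
```

Running such a check over only the jars selected for a test run would show directly how loading fewer jars reduces the number of possible clashes.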

Ownership

As mentioned, component teams feel they cannot take full ownership of e2e customer features and therefore scenarios, which are formalized examples of these features.

Product leads can, but they lack the Integrated Development Environment the developers in the teams use to write Cucumber scenarios.

Product leads also want to work together and collaborate on the scenarios, like developers would on code.

To support this, we created a project environment in Testcenter where users can create projects, edit feature files as part of these projects and schedule test runs.

Everyone can create a project and invite colleagues to their project. Within a project everyone has the same rights. Project resources are only visible to project members.

The most important resources of a project are the feature files.

For now, basic file operations like copy, rename and delete are enough. Clicking a file opens the feature file in the open-source Ace editor. The nice thing about the Ace editor is that it has syntax highlighting for Gherkin.

Besides editing a feature file, it is possible to “run” the contents of the web editor. This provides the person creating the scenario with easy and fast feedback on the scenario. What we typically see, is that when initial scenarios are created and pass, users start creating variations by changing step parameters and steps. The “run” option creates a one-time test run that is planned for immediate execution.

When users are satisfied with their new scenario, they tag it and use the tag as part of a test run definition.

A test run definition contains all the data needed for Cucumber to handle a test run plus a schedule. In the example above we take all scenarios within the project YODA that owns this definition and run the ones that are tagged @Granting. We do that based on the schedule.
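As a rough illustration of what such a definition holds, the Java sketch below models it as a project, a tag and a schedule, and uses it to select scenarios. All names here (`TestRunDefinitionSketch`, `Scenario`, `select`, the scenario names and the schedule label) are hypothetical; the actual Testcenter data model is not shown in this article.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a test run definition: which project, which tag, and when to run.
final class TestRunDefinitionSketch {

    // A scenario as the definition sees it: owning project, name and tags.
    static final class Scenario {
        final String project;
        final String name;
        final List<String> tags;

        Scenario(String project, String name, List<String> tags) {
            this.project = project;
            this.name = name;
            this.tags = tags;
        }
    }

    final String project;
    final String tag;
    final String schedule; // free-form label in this sketch, e.g. "daily"

    TestRunDefinitionSketch(String project, String tag, String schedule) {
        this.project = project;
        this.tag = tag;
        this.schedule = schedule;
    }

    // Returns the names of the scenarios in this definition's project that carry its tag.
    List<String> select(List<Scenario> all) {
        List<String> selected = new ArrayList<>();
        for (Scenario scenario : all) {
            if (scenario.project.equals(project) && scenario.tags.contains(tag)) {
                selected.add(scenario.name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        TestRunDefinitionSketch definition = new TestRunDefinitionSketch("YODA", "@Granting", "daily");
        List<Scenario> scenarios = List.of(
                new Scenario("YODA", "Grant access", List.of("@Granting")),
                new Scenario("YODA", "Revoke access", List.of("@Revoking")),
                new Scenario("OTHER", "Grant elsewhere", List.of("@Granting")));
        System.out.println(definition.select(scenarios));
    }
}
```

Only the first scenario is selected: the second has a different tag and the third belongs to another project, which mirrors the rule that project resources are only visible within their project.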

The schedule options are simple. We used to have a cron expression here that turned out too complex for most users.

By scheduling test runs, continuous feedback on integration of components is provided. The number of scenarios passing and the number of steps passing in a single scenario will tell stakeholders how development of the new e2e customer feature is progressing.
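The two progress numbers mentioned here are simple to derive from step results. The Java sketch below shows one way to do it, under the assumption that a scenario counts as passing only when all of its steps passed. The class `ProgressSummary`, its `Status` enum and the scenario names are hypothetical and do not reflect how Testcenter stores results.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: derives progress figures from per-step results.
final class ProgressSummary {

    enum Status { PASSED, FAILED, SKIPPED }

    // Number of steps in a scenario that passed.
    static long passedSteps(List<Status> steps) {
        return steps.stream().filter(status -> status == Status.PASSED).count();
    }

    // Assumption: a scenario passes only if it has steps and all of them passed.
    static boolean scenarioPassed(List<Status> steps) {
        return !steps.isEmpty() && passedSteps(steps) == steps.size();
    }

    // Builds a small text report: passing scenarios overall, then passing steps per scenario.
    static String summarize(Map<String, List<Status>> stepsByScenario) {
        long passing = stepsByScenario.values().stream().filter(ProgressSummary::scenarioPassed).count();
        StringBuilder report = new StringBuilder();
        report.append(passing).append(" of ").append(stepsByScenario.size()).append(" scenarios passing");
        for (Map.Entry<String, List<Status>> scenario : stepsByScenario.entrySet()) {
            report.append("\n  ").append(scenario.getKey()).append(": ")
                  .append(passedSteps(scenario.getValue())).append("/")
                  .append(scenario.getValue().size()).append(" steps passing");
        }
        return report.toString();
    }

    public static void main(String[] args) {
        Map<String, List<Status>> results = new LinkedHashMap<>();
        results.put("Scenario A", List.of(Status.PASSED, Status.PASSED));
        results.put("Scenario B", List.of(Status.PASSED, Status.FAILED, Status.SKIPPED));
        System.out.println(summarize(results));
    }
}
```

In a multi-team setting the per-scenario step count is the more telling figure: a scenario that stops passing at the same step run after run points at the component, and therefore the team, whose part is not ready yet.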

Stakeholders

The two main stakeholders are the product lead and the component teams. Both stakeholders are interested in how they are progressing towards a done state, the product lead for the e2e customer feature and the component teams for their component and for their dependencies with other teams.

Component teams are also interested in implementation details, especially when a scenario fails. For them it is important that there is sufficient data available that can help analyse what went wrong.

After selecting a test run from the main page, a dashboard is presented with the feature files of that test run. Cucumber output is saved by us and stored in a database, from where we can retrieve it and display it via the Testcenter web app.

The standard Cucumber states are there, like passed, failed etc. Pending is new; it is explained further on in the article. We learned that easy access to Cucumber’s log is useful, so there is a button for that. The button that creates a PDF report containing the same data as the web app is used to create evidence. It saves teams a lot of work.

Clicking the features brings us to the level of the scenarios. This is where we can find the information about what really happened on a technical implementation level.

Example scenario with detailed information

Besides the scenario, including step run time and moment of execution, there are also several tabs. These tabs contain data logged by the creator of a step (E2E Report), API requests and responses logged by the creator of a step (APIs), the error message thrown by Cucumber and, although not in this example, screenshots made with Selenium WebDriver if the step tests via a front end (GUI tab).

In the example scenario we refer to accounts by describing the kind of account that is involved, for example a German ING Euro WBAAC_IRT account. In E2E Report, we can find the actual IBAN of the account we used when running the scenario. When a SEPA credit transfer request is created further on in the scenario, the request-id is also logged in the E2E Report.

The API-tab contains a sequence of all API-calls where each call can be expanded to show all headers, bodies and response code.

All the tabs are a rich source of information to help teams determine where things go wrong in implementation and dependencies.

The platform provides standard methods for developers to reuse. If they log whatever they want to log in the correct way, they do not have to concern themselves with the presentation of the data on Testcenter. The idea is to provide anything that is not business domain specific as a service.

In case of scenarios that are used as part of regression testing, the number of failures can be overwhelming. A few root causes can result in a lot of red scenarios. The Errors menu item was created to solve that problem. It allows users to sort all failed scenarios within a test run on a few items, including the error message. Grouping on error message helps to prioritise which errors to solve first and which later.
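Grouping failures by error message is a small operation once the failures are available as data. The Java sketch below shows the idea with the standard streams API; the class `ErrorGrouping`, the `Failure` type and the sample messages are hypothetical and only illustrate the principle behind the Errors view.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: groups failed scenarios by error message, biggest group first.
final class ErrorGrouping {

    static final class Failure {
        final String scenario;
        final String errorMessage;

        Failure(String scenario, String errorMessage) {
            this.scenario = scenario;
            this.errorMessage = errorMessage;
        }
    }

    // Returns (error message -> scenario names), ordered by number of affected scenarios, descending.
    static List<Map.Entry<String, List<String>>> byErrorMessage(List<Failure> failures) {
        Map<String, List<String>> grouped = failures.stream().collect(Collectors.groupingBy(
                failure -> failure.errorMessage,
                LinkedHashMap::new,
                Collectors.mapping(failure -> failure.scenario, Collectors.toList())));
        List<Map.Entry<String, List<String>>> sorted = new ArrayList<>(grouped.entrySet());
        // List.sort is stable, so equally sized groups keep their first-seen order.
        sorted.sort((a, b) -> Integer.compare(b.getValue().size(), a.getValue().size()));
        return sorted;
    }

    public static void main(String[] args) {
        List<Failure> failures = List.of(
                new Failure("Scenario 1", "Connection timed out"),
                new Failure("Scenario 2", "Unexpected HTTP status 500"),
                new Failure("Scenario 3", "Connection timed out"));
        for (Map.Entry<String, List<String>> group : byErrorMessage(failures)) {
            System.out.println(group.getValue().size() + " x " + group.getKey() + " " + group.getValue());
        }
    }
}
```

The largest group is the most promising place to start: fixing one root cause there turns the most scenarios green at once.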

The pending state

I promised that I would get back to the pending state. The introduction of the pending state required us to start persisting the state of a scenario. And by doing that, we later had the option to display test run output on our web application. Pending state was also our first big change on top of standard Cucumber. Why did we need it?

I often have discussions about where e2e customer features should begin and end. I look at it from the perspective of the user. When a user plans a payment for future execution, I see at least two features: the planning part and the execution part. The planning part might also contain the option for the user to change the original planning. In that case the execution part remains unchanged. Others might argue these parts are one thing and should be tested as one thing.

When features are defined longer than really needed, you have a bigger chance of encountering batch processing or waiting time somewhere in your scenario. This happens when you must deal with external parties that are open at different times (like trading in foreign currencies) or because of your application architecture (batches).

In the beginning we worked on scenarios where we had to wait till start-of-day and end-of-day moments or wait for 10 minutes (because a batch would run every 10 minutes).

With the number of scenarios growing, this became a problem. We solved it by creating two Gherkin additions to the ‘then’ keyword, which we called ‘then after’ and ‘then at’. We pre-process our feature files and split them into smaller, technical feature files that run at different times. We run the first part immediately, until we encounter a ‘then after’ or ‘then at’. We then persist the current state of the scenario and plan the execution of the rest of the scenario for later, ‘after’ an amount of time or ‘at’ a certain time during the day.

We cannot wait for exactly the specified amount of time (we can be 30 seconds late), but we will never continue processing too early.
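The splitting step can be sketched in Java as follows. This is a simplified illustration and not the Testcenter pre-processor: the class `ThenAfterSplitter` and its `Part` type are hypothetical, only ‘then after’ is handled, and the `h:mm hr` notation is assumed from the single example in the PSD2 scenario earlier in this article.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: splits a scenario into parts at every 'Then after h:mm hr ...' line.
final class ThenAfterSplitter {

    // One executable part of a scenario plus the minimum delay after the previous part.
    static final class Part {
        final Duration delay;
        final List<String> steps = new ArrayList<>();

        Part(Duration delay) {
            this.delay = delay;
        }
    }

    // Captures hours, minutes and the remaining step text.
    private static final Pattern THEN_AFTER =
            Pattern.compile("^Then after (\\d+):(\\d{2}) hr (.*)$", Pattern.CASE_INSENSITIVE);

    static List<Part> split(List<String> scenarioLines) {
        List<Part> parts = new ArrayList<>();
        Part current = new Part(Duration.ZERO); // the first part runs immediately
        parts.add(current);
        for (String raw : scenarioLines) {
            String line = raw.trim();
            Matcher matcher = THEN_AFTER.matcher(line);
            if (matcher.matches()) {
                Duration delay = Duration.ofHours(Long.parseLong(matcher.group(1)))
                        .plusMinutes(Long.parseLong(matcher.group(2)));
                current = new Part(delay);
                parts.add(current);
                // The delayed part starts with a plain 'Then' step that a standard runner understands.
                current.steps.add("Then " + matcher.group(3));
            } else {
                current.steps.add(line); // regular steps and table rows stay in the current part
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        List<Part> parts = split(List.of(
                "Given a PSU has an account (DE, ING, EUR)",
                "When the PSU authorises the payment order",
                "Then after 0:05 hr I expect the following booking pattern has been followed:",
                "| debtor  | EUR 0.02 | debit           |"));
        for (Part part : parts) {
            System.out.println("wait at least " + part.delay + ", then run " + part.steps);
        }
    }
}
```

The delay is treated as a lower bound, in line with the rule above: a scheduler may start the second part a little late, but never before the delay has elapsed.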

Everything that is planned to be executed has a yellow color, to indicate the state pending.

Example of the same then after but two minutes later

Multiple then after/at are possible in one scenario.

The state is persisted in what we call a pickles session and it can be inspected by looking at the Pickles Session tab, where the pickles session JSON is shown. The team creating a step needs to decide if they will support then after/at. If they do, they need to persist and retrieve state to and from a pickles session that is stored in the Testcenter database.
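What persisting and retrieving state means for a step can be illustrated with a very small Java sketch. It is a hypothetical stand-in and not the pickles session API: the class `SessionStoreSketch` and its methods are invented for this example, an in-memory map plays the role of the Testcenter database, and values are limited to strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a session store; the real pickles session API is not shown in this article.
final class SessionStoreSketch {

    // Plays the role of the database: scenario run id -> persisted key/value state.
    private final Map<String, Map<String, String>> database = new HashMap<>();

    // Called before the scenario is parked at a 'then after' or 'then at'.
    void persist(String runId, Map<String, String> state) {
        database.put(runId, new HashMap<>(state)); // defensive copy
    }

    // Called when the remaining part of the scenario starts; unknown run ids give an empty state.
    Map<String, String> restore(String runId) {
        return new HashMap<>(database.getOrDefault(runId, new HashMap<>()));
    }

    public static void main(String[] args) {
        SessionStoreSketch store = new SessionStoreSketch();
        Map<String, String> state = new HashMap<>();
        state.put("requestId", "example-request-id"); // made-up value
        store.persist("run-1", state);
        // Later, possibly in a different Cucumber run:
        System.out.println(store.restore("run-1").get("requestId"));
    }
}
```

The important property is that a step in the delayed part depends only on what was stored, not on objects that lived in the memory of the first Cucumber run.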

Testcenter takes care of splitting the scenario, running the different parts at the right times and combining the reporting from the different Cucumber runs into one functional test run.

Conclusion

BDD scenarios are lightweight executable artifacts that support communication and collaboration between product lead and component teams, while working on e2e customer features.

The BDD process helps teams to better understand what their role is in the e2e customer feature. It provides context and clarifies dependencies, both in terms of interfaces and in terms of collaboration (this is well explained in Maximizing Dependencies with Interdependent Teams).

The automation of the scenarios can be shifted left to support the development of the e2e customer feature and it can be decentralised, minimising the time and effort put into centralised end-of-the-line testing.

And finally, we can achieve continuous integration of the e2e customer feature.