Cucumber and Mobile, BDD or Automation? Thoughts from CukeUp!

TL;DR

As Aslak Hellesøy puts it: ‘BDD is like Jazz. There is no authoritative definition of what it is.’

It’s correct that BDD has no formal definition but it’s easy to spot two distinct use cases of tools such as Cucumber.

1. Specification of requirements + ongoing verification of behaviour. 
2. A way to drive UI automation tools to replace manual regression of software/hardware compatibility testing.

In mobile development a clearer distinction between the two is helpful in understanding which we’re applying. It’s clear that the majority of QA teams in mobile are trapped using automation and this just isn’t the right way to implement BDD. These automation monsters mostly fail to deliver a return on investment.

Ever-growing QA teams are a hindrance rather than a help, an anti-pattern of testing that’s harming mobile software delivery. Only through a wider appreciation of the distinction between the two use cases above, and a better understanding of the first, will we see improvements to both development and testing practices in mobile.

My suggestion is that improvement in quality begins with a greater involvement of developers in BDD and testing. That starts with a better understanding of Clean Architecture through which we enable BDD testing. BDD cannot be only a QA concern. Developers really can test their own code (and absolutely should).

Aslak Hellesøy’s keynote reminded us of these fundamental (but oft overlooked) principles of BDD.

To understand how I arrived at this conclusion, and to get the full round-up of talks and thoughts from this year’s CukeUp! ’16, read on.


Thoughts from CukeUp! 2016

This year I attended CukeUp!, a BDD conference with a strong Cucumber influence. The aim of the conference is stated clearly, although interestingly it doesn’t mention Cucumber, even though Cucumber is heavily referenced in the name itself.

‘We want to bridge the communication gap between business and IT to deliver precise software, faster.’

I presented a lightning ‘ramble’ at the end of the conference where I tried to illustrate that aim with my graph on detail, and to show why BDD and Agile are essential to the software development process.

I fluffed the 10-minute time limit, but I did get to show the graph, which sums up why the conference (and BDD) exist and are essential to effective software development.

Kind of Green

Things began with a keynote from Aslak Hellesøy, the creator of Cucumber, with a summary of BDD and a current understanding of it for 2016. The general gist was that BDD is a constantly ‘evolving interpretation’ of various practices.

The section of ‘Kind of Green’ that I was most interested to see was coverage of the do’s and don’ts of test design; specifically, not testing through the UI. The reasons were explained and are well known: UI tests are slow, brittle and complex, and they don’t scale.

There was a short Clean Architecture discussion using its ‘Ports and Adapters’ synonym, and one of the nicest drawings I’ve seen of it yet.

It was good to see this fundamental concept highlighted from the get-go. As the conference proceeded, I sensed this topic had possibly been lost on most of the QA audience in attendance. From talking and listening to people, there was little evidence that this approach was being adopted or used. Somehow, somewhere, this message has been lost.
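For readers unfamiliar with the idea, here is a minimal sketch of what testing beneath the UI through a port might look like. It is written in Python purely for brevity, and every name in it (OrderStore, InMemoryOrderStore, OrderService) is hypothetical, invented for illustration rather than taken from the keynote or from any Cucumber tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


class OrderStore(Protocol):
    """Port: what the domain needs from storage, with no UI or database details."""

    def save(self, order_id: str, items: list[str]) -> None: ...

    def load(self, order_id: str) -> list[str]: ...


@dataclass
class InMemoryOrderStore:
    """Adapter for fast tests; a real app would plug in a persistent adapter."""

    _orders: dict[str, list[str]] = field(default_factory=dict)

    def save(self, order_id: str, items: list[str]) -> None:
        self._orders[order_id] = list(items)

    def load(self, order_id: str) -> list[str]:
        return list(self._orders[order_id])


class OrderService:
    """Domain logic: depends only on the port, never on a screen or a UI driver."""

    def __init__(self, store: OrderStore) -> None:
        self._store = store

    def place(self, order_id: str, items: list[str]) -> None:
        if not items:
            raise ValueError("an order needs at least one item")
        self._store.save(order_id, items)

    def items_for(self, order_id: str) -> list[str]:
        return self._store.load(order_id)


def check_placed_order_can_be_read_back() -> bool:
    # Given an empty store
    service = OrderService(InMemoryOrderStore())
    # When an order is placed
    service.place("order-1", ["margherita"])
    # Then the same items come back, with no UI in the loop
    return service.items_for("order-1") == ["margherita"]
```

Because the check talks to the domain through the port, it runs in milliseconds and does not break when a button moves. The UI becomes just another adapter that can be tested thinly on its own.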

High Impact!

The rest of CukeUp! was divided between talks and workshops. The format worked nicely, reflecting the BDD spirit of learning through collaboration.

One of those workshops was John Smart and Jan Molak’s Agile Project Planning. Here I encountered the technique of ‘impact mapping’. You can read more about that on the SlideShare, but I think there’s something in this. One of the techniques demonstrated showed how using feature points as well as story points provides a means to calculate the prioritisation of work. Not new, but not used much.

Prioritising is seemingly one of the hardest things for organisations to get their heads around. “But we need it all so let’s just do the first thing, what about the Login screen?” An awkward developer asks, “Login so the user can do what, exactly!?” The BA rolls their eyes, and so it begins.

John provided us with a formula using estimated feature points together with story points that took a lot of the mystery out of figuring this out. It’s pretty simple.

Relative ROI = (Return - Investment) / Investment

What’s fascinating about this is that it allows you to arrive at negative numbers! I know because it happened to me during one of the exercises. Negative means, quite simply, *don’t do it!* It will actually harm the project. Sometimes you may have a gut feel during a planning session about some feature not being right, but it’s often hard to argue without something rational to base that feeling on.

I found it did sort things into the most useful priority order, one which wasn’t based just on individual subjective decision making. This is valuable because developers tend to dislike arbitrary prioritisation when it disagrees with their own arbitrary analysis of the usefulness of the work they’re being asked to do. We all want to do meaningful work; this is one way to validate that the work is meaningful, using a technique where everyone can see that clearly.
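As a rough illustration of how the formula above can order a backlog, here is a minimal Python sketch. It assumes feature points stand in for Return and story points for Investment, which is an assumption rather than the workshop’s stated definition, and the feature names and numbers are made up.

```python
def relative_roi(ret: float, investment: float) -> float:
    """Relative ROI = (Return - Investment) / Investment."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    return (ret - investment) / investment


# Hypothetical backlog: (feature, feature points as Return, story points as Investment)
backlog = [
    ("login screen", 3, 8),
    ("saved payment card", 13, 5),
    ("animated splash screen", 5, 20),
]

# Highest relative ROI first; negative scores are candidates for "don't do it"
ranked = sorted(backlog, key=lambda f: relative_roi(f[1], f[2]), reverse=True)

for name, ret, inv in ranked:
    print(f"{name}: {relative_roi(ret, inv):+.2f}")
```

Anything that scores below zero costs more than it is estimated to return, which gives the room something concrete to discuss instead of a gut feeling.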

Planning Meetings You’ll Love

I had a personal experience of how powerful this could be during a subsequent workshop. Gaspar Nagy and Matt Wynne introduced ‘example mapping’ in A Planning Meeting You’ll Love. During this I was able to do my impression of a product owner. The feature we chose to map was the innocuous-sounding:

“let the user change their pizza order after they’ve ordered”.

Sounds like a nice feature. By going through the exercise it turned out that to implement this we were going to introduce a 10-minute delay to every pizza order, probably an eon in pizza time-and-motion study circles. As a PO I realised this was not going to look good at my next performance review.

We canned the feature before a line of code had been written. As Apple puts it “there are a thousand no’s for every yes”. This was one of those important no’s and anything that helps organisations figure this out is both a step towards better designed software and improved efficiencies in software development processes.

This and other workshops focussed effectively on better software planning and design in practice.

‘There be Dragons’

On the second day Nat Pryce gave an enjoyable talk, close to my heart, on the subject of “Test Automation Tales of Terror”. Together with Jenny Martin’s “A BDD Manifesto”, it reminded us that despite our best intentions with all this BDD stuff we should be careful.

We loaded up our caravans and wagons with good advice, best practices and tools but didn’t spot the legend — “Here be Dragons” So we got burnt again. — Liz Keogh

I revisited my own experiences with the use of BDD tools in mobile testing and couldn’t help but feel an affinity with Nat’s slide “They Created a Monster”. It captured best a shared sentiment about what happens when QA teams descend on tools and tools become the point and take over.

For an amazing technical demonstration of one of these incredible monsters we met the BBC’s “Hive CI”. It was met with enthusiasm just for the sheer technical feat of implementing it; however, it terrified the bejeezus out of me.

BDD-style tools such as Cucumber, when applied to these kinds of projects, tend to add complexity and have very little to do with BDD. My own prediction is that projects like these require so much effort and complexity that they are inevitably abandoned (usually to be replaced with another project to bring into existence almost exactly the same kind of monster a few years later).

It occurred to me as the conference went on that it would be worthwhile for the audience to have a clearer distinction between Cucumber implementations that are designed to facilitate automation and those that are intended to facilitate verification of the business domain. When we run unit tests we don’t call it automation, so why do we so heavily associate BDD with the term automation?

In the case of something like Hive CI I find myself questioning the value of ‘Given, When, Then’ language in the domain of automation testing. Does anyone from the business side read these? If not, why would you use it? Given that English by its very nature adds ambiguity, it’s a poor way to describe binary statements, exactly the kind that are required for a test. At best, it seemed to add only inconvenience along with lots of extra words.

The Mobile Money Pit

Turning this back to my own experiences and applying John Smart’s formula, I came up with a back-of-an-envelope calculation.

Relative ROI of UI based testing in Mobile

Return = 20, Investment = 100
Relative ROI = (20 - 100) / 100 = -0.8

The conclusion is that the effort to use BDD through UI-based automation tools in mobile is off the charts. In terms of investment I give it the maximum of 100 points. What about the return? The tests are slow, so we don’t run them often. They are fragile, so we end up doing away with many of them. They localise defects poorly and can’t cover much of the business logic. Considering all that, I could really only muster a valuation of 20.

“Businesses should be shutting down QA teams not growing them!”

When you run those numbers you end up with -0.8. That’s a worrying conclusion. It doesn’t add up: the return doesn’t warrant our effort. Businesses should be shutting down QA teams not growing them! In fact, even if the return were the maximum of 100, you would arrive at best at a value of exactly zero. It can never pay for itself. So why is everyone doing it? Have we found ourselves victims of yet another software cargo cult?
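The arithmetic behind that claim is easy to check. The sketch below fixes Investment at the 100-point maximum used above and sweeps Return across a 0 to 100 scale. The scale bounds come from the figures in this section, and the function name is purely illustrative.

```python
def relative_roi(ret: float, investment: float) -> float:
    """Relative ROI = (Return - Investment) / Investment."""
    return (ret - investment) / investment


INVESTMENT = 100  # maximum effort score, as estimated above

# The back-of-an-envelope estimate: Return = 20
estimate = relative_roi(20, INVESTMENT)

# Best case over every whole-number Return score from 0 to 100
best_case = max(relative_roi(r, INVESTMENT) for r in range(0, 101))

print(estimate)   # -0.8
print(best_case)  # 0.0
```

With the investment pinned at the top of the scale, no return on that same scale can push the result above zero. Break-even is the ceiling.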

Cucumber tools in my own domain of iOS have become deeply associated with UI Automation, and that’s unfortunate. I don’t think that’s what Cucumber’s proponents had in mind, and it certainly wasn’t anywhere in the stated goal of CukeUp!

Despite the best intentions of the maintainers of the various Cucumber tools such as Calabash, they remain misguided tools, all repeating the same mistake of being UI-centric.

Introducing Cucumberish

There was light on the horizon for iOS with the emergence of Ahmed Ali’s Cucumberish, which facilitates component-level testing using Cucumber, taking us a step in the right direction. Despite this being the most interesting thing about Cucumberish, it wasn’t talked about much.

The Cucumberish demo had a strong UI bias and the following Q&A discussion focussed solely on technical discussion of UI Automation frameworks and integration with XCTest. No one seemed that interested in the component level access to the code it provided and it would be a shame if this aspect of it went completely unnoticed and unused. No amount of Xcode integration is going to tame the monster.

There was talk of Cucumberish being the officially sanctioned Cucumber tool for iOS/Mac and so it’s worth watching if you’re someone invested already in Cucumber tooling.

‘Be-monster not thy feature’

There is clearly a very dominant mindset at work in the Cucumber testing community. An alternative BDD tool like FitNesse isn’t confused for a UI Automation tool (although you could certainly use it as one) because it’s clearer in its limited goals and intentions. Maybe therein lies the reason why no one uses it much outside of certain industries. Is it because QA aren’t really looking for BDD? Is it possible they’re instead just trying to repeat what’s familiar by automating something that was previously manual?

This has led to confusion. Mobile development teams can believe they are ticking the ‘BDD’ box when it’s actually an illusion. What’s worse, under this pretence it is potentially doing more harm than good, as we create automation nightmares with costly appetites that we expect our organisations to feed. Are we still meeting the stated goal of adding value through all this?


Could Testing Actually Be a Liability?

Ulrika Malmgren explored “How Testers Can be a Liability for your team”. Is it possible? In mobile I believe it is, and her talk explains how. Her point that developers should be more involved in testing their code was an astute observation, but one which, bizarrely, is often challenged by QA teams.

‘Developers can’t test their own code!’ — QA team

This siloing of QA responsibility is damaging. That brings us back to Aslak’s keynote and the fundamental need for developers to build with Clean Architecture from the outset if BDD is to be applied effectively.

If we return to this idea perhaps we can begin to move the mindset of QA away from its unhealthy fixation with UI Automation. With this we could greatly improve our ability to more effectively close the communication gap between business and development. Let’s remember that was the reason why we started doing all this in the first place.

I’ve submitted my suggestion so maybe I’ll be back for CukeUp! 2017. I definitely recommend it.