Uncertainty management using Agile

They say data scientists love uncertainty. That’s partially true.

Markus Reimegård
Swedbank AI
Jan 6, 2020


Photo by Karsten Würth (@karsten.wuerth) on Unsplash

The ability to train and optimize a model towards a set goal and then apply it to unseen data is often what we look to data science products to do, and dealing with uncertainty is thus one of the key strengths of data science. It is also one of its key challenges, because a model’s statistical uncertainty is not the only kind we have to grapple with.

In our group at Swedbank, we recently worked on an NLP use case involving label propagation, where broad topics are identified in a large corpus and new documents are given a probability score for each topic. The highest score wins, and the text is categorized accordingly. This is a more sophisticated way of categorizing new texts, since they don’t have to follow a strict ‘if-then’ formula. It works better, but it is also less transparent: you basically have to have faith in it. The question is how to explain this to stakeholders. With no rule-based dichotomies, only a continuum of something never quite true or false, always floating between 0 and 1, data science can make many people feel uneasy.

A piece of the graph-based NLP solution.
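The post does not show how the categorization works in code, so purely as an illustration, here is a minimal label-propagation sketch in Python. It assumes a toy, hand-made document graph in which a few seed documents already carry a topic; every other document starts with a uniform score and repeatedly takes the average of its neighbours’ topic distributions, and the highest probability then decides its category. The graph, node names, topics and the `propagate_labels`/`categorize` helpers are hypothetical and not the Swedbank implementation, whose graph construction and embeddings are not described here.

```python
# Hypothetical sketch of graph-based label propagation; not the
# Swedbank implementation. Graph, node names and topics are made up.
from typing import Dict, List

Distribution = Dict[str, float]


def propagate_labels(
    edges: Dict[str, List[str]],
    seeds: Dict[str, str],
    topics: List[str],
    iterations: int = 20,
) -> Dict[str, Distribution]:
    """Spread topic labels from seed documents to their neighbours."""
    # Seeds start as one-hot distributions; unlabeled nodes start uniform.
    scores: Dict[str, Distribution] = {}
    for node in edges:
        if node in seeds:
            scores[node] = {t: 1.0 if t == seeds[node] else 0.0 for t in topics}
        else:
            scores[node] = {t: 1.0 / len(topics) for t in topics}

    for _ in range(iterations):
        updated: Dict[str, Distribution] = {}
        for node, neighbours in edges.items():
            if node in seeds or not neighbours:
                # Seeds stay clamped to their known label; isolated nodes keep their prior.
                updated[node] = scores[node]
            else:
                # Average of the neighbours' distributions, so the scores
                # remain probabilities that sum to 1.
                updated[node] = {
                    t: sum(scores[n][t] for n in neighbours) / len(neighbours)
                    for t in topics
                }
        scores = updated  # synchronous update: every node used last round's scores
    return scores


def categorize(distribution: Distribution) -> str:
    """Highest probability score wins."""
    return max(distribution, key=distribution.get)


# Toy undirected document graph (each edge listed in both directions).
edges = {
    "seed_loans": ["doc_x", "doc_y"],
    "seed_cards": ["doc_z"],
    "doc_x": ["seed_loans", "doc_y"],
    "doc_y": ["seed_loans", "doc_x", "doc_z"],
    "doc_z": ["seed_cards", "doc_y"],
}
seeds = {"seed_loans": "loans", "seed_cards": "cards"}
topics = ["loans", "cards"]

scores = propagate_labels(edges, seeds, topics)
for doc in ("doc_x", "doc_y", "doc_z"):
    print(doc, categorize(scores[doc]), round(scores[doc]["loans"], 3))
```

In this toy graph, `doc_x` and `doc_y` end up leaning towards "loans" and `doc_z` towards "cards", yet none of them is ever exactly 0 or 1. That is the “floating between 0 and 1” quality that can be hard to explain to stakeholders used to rule-based categories.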

Uncertainty extends beyond the methods and models used; it propagates into the process and the product too. When someone has a problem that might be solved with an AI model, and they are not themselves at home in the domain of models and algorithms, there is often a certain amount of uncertainty in the problem formulation itself. The business problem might be clear enough, but its solution usually is less so. The initial phase of a project therefore often sees the data science team and the stakeholder going in circles around the problem and its solution: the stakeholder asks “what would a machine learning solution for this problem be, exactly?”, and the data science team suggests one and replies “would this approach solve your problem?”. The lack of data science understanding on one side confronts the limited knowledge of the subject matter on the other.

Many practitioners have pointed out an important difference between software development and data science here: no one comes to the data science team and says ‘build me an app’. Once the data science team is brought in, all of a sudden everyone is talking about training data, recall and precision, graph embeddings and GANs. Stakeholders are lost in a random forest, and uncertainty abounds.

And uncertainty in process and goals is, as many have surely experienced, also where data science projects often start bursting at the seams. Fatigue sets in, rabbit holes are dug, stakeholder skepticism rises and delivery quality goes south. It is vital to resolve such uncertainty, both within data science teams and in the broader group of stakeholders, on the so-called “business side” as well as on the side providing infrastructure and data.

Why we do agile

This post is about how we at Analytics & AI at Swedbank are experimenting with agile. The reasons follow from the opening paragraphs: we need ways to handle and restrain the uncertainties in the process of doing data science. We are by no means agile experts, and none of us is particularly well versed (at least not yet) in all the different flavors agile comes in. We get help from agile coaches in our organization, pick our favorite pieces, and merge them into something that works for us.

We have focused on the parts of the Agile Manifesto that address some inherent difficulties of delivering data science solutions to a big organization: quick deliveries, tight collaboration, motivated teams and meta-level retrospectives are some of the agile ways of working that we cherish. This very text is part of that process, a way to reflect on how we work.

Agile in vivo

Here is how things work for us in practice. Analytics & AI currently consists of around 30 people, a mix of data scientists, business analysts and agile product owners (APOs). My team has seven people: four in Stockholm, one in Tallinn and two in Vilnius. Apart from me (I’m an APO), it is an all-data-scientist team. The team typically works on four to five different use cases during a sprint. We have as many stakeholders as we have use cases, and we also try to reserve at least ten per cent of working time for R&D outside of ordinary deliveries. As far as agile goes, we run two-week sprints and do all the ceremonies: planning, stand-ups, grooming, demos and retrospectives.

In a nutshell, here is what agile does for us: the 15-minute daily stand-ups bring the cross-national team together and make physical distance less important; we start every day as a team. The agile ceremonies give us a common overview. Over the course of a working day, we track our work in common tools like Jira and Git, and use cases and retrospectives are continuously documented on a wiki. Besides meeting in ceremonies, we practice pair and “mob” programming, either sitting in front of the same computer or connecting via Skype.

With no dedicated scrum master, we have adopted “hybrid” roles: data scientists take turns running stand-ups, handing the responsibility over to a colleague when a new sprint starts. We spend time describing epics and tasks quite thoroughly, so that anyone in the team can pick them up. We onboard each other to new use cases, and in retrospectives we discuss our good and bad experiences and define actions for the next sprint.

This work structure makes the team more self-contained and frees up time for APOs to plan work ahead and clear the way for new use cases. It also aligns our cycles with units in our vicinity, like the aforementioned data and infrastructure providers, making the process smoother.

Space for creativity

Agile has been criticized for killing creativity and innovation. Our view, however, is that by restraining uncertainty around the work process and how we measure success, we can create spaces where data scientists and business analysts can experiment and let loose their problem-solving creativity: spaces of compartmentalized uncertainty, if you will.

In essence, we use agile as a tool to get rid of the wrong kinds of uncertainty. Our common responsibility is to keep the balance of power between stakeholders, APO/CPO and team. Stakeholders are free to formulate their challenges and APOs have the power to prioritize among the available use cases, but the freedom to select tools and methods, and to experiment with them, should lie firmly in the hands of the team.

We want our group’s data scientists to share our stakeholders’ goals, and we want to give them the autonomy to exercise their creativity and expertise; this is our team’s currency. When that happens, APOs can look ahead rather than mend current projects, and stakeholders can benefit from AI solutions that fit their needs. That is where we want to be.
