Running Agile Machine Learning Experiments

Yotam Drechsler
Published in The Launchpad · Aug 2, 2018

One of the main challenges for any data-driven company is how to reconcile longer-term, less flexible data science development with rapid iteration and cycles of learning in agile software development.

Traditional software development can be broken up into small, predictable components which can be tested and iterated on in a fast, modular way. This “agile” methodology is not directly applicable to data science work, since data is harder to obtain, data experiments are slower, and it is more challenging to break the model into modular pieces. Data science is, in essence, research-oriented work. Hence, there is an inherent tension between deepening the research and staying agile.

At BrainQ, we develop a set of artificial intelligence tools to identify spectral patterns in brain waves and build personalized electromagnetic treatment procedures aimed at enhancing neuro recovery for spinal cord injury and stroke patients. Traditionally, this kind of process requires a clinical trial and several months of execution, with a very long feedback loop. As an early-stage startup, we can’t wait that long to course-correct our algorithms and product strategy.

Illustrative visualization of high resolution spectral patterns, characterizing upper limb motor movement in healthy population brainwaves (as observed via EEG)

A data-driven healthcare company like ours has two goals: maximize the efficiency and speed of our data science team, and quickly validate as many hypotheses as possible without waiting on clinical trials. We have to be rigorous about which experiments we run, and how; hence the need for a new kind of flexible, agile process.

In our quest to develop our own agile data science process, we have found a few areas particularly challenging, namely A) Triaging focus areas, B) Quantitative evaluation of the data science team/project, and C) Course-correcting. To address these challenges we came up with a process that lets our data science team plan future experiments within a day of getting the first data (which is a big deal for us!).

A) Triaging focus areas

How does a data-driven startup choose which areas should be prioritized for the resource-constrained data science team?

Prioritizing focus areas requires gathering the following information for every proposed experiment (a minimal template sketch follows the list):

  1. Hypothesis: Each proposed experiment must start with a clear, written hypothesis which serves a specific product/business goal. For example, when running a classification of EEG data for stroke patients, a hypothesis might be that a specific CNN architecture improves classification accuracy by a quantifiable amount.
  2. Value proposition: What is the (customer-centric) value proposition associated with this hypothesis? You should always have a good answer to the question “assuming the hypothesis is correct, why does the customer care?”
  3. Resources needed: Different experiments will require different amounts of time from the data science team. The questions to answer are “what resources are necessary to A) prove/disprove the hypothesis, and B) if successful, deploy the feature/model at scale?”
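To make triage concrete, the write-up can be captured as a structured record so the “wish list” can be sorted and compared. The following is a minimal sketch in Python; the `ExperimentProposal` class, its fields, and the numbers are illustrative assumptions, not BrainQ’s actual tooling:

```python
from dataclasses import dataclass

@dataclass
class ExperimentProposal:
    """One proposed experiment, written up before any work starts."""
    hypothesis: str         # falsifiable statement tied to a product/business goal
    value_proposition: str  # why the customer cares if the hypothesis holds
    research_weeks: float   # estimated effort to prove/disprove the hypothesis
    deploy_weeks: float     # estimated effort to ship the result at scale

proposal = ExperimentProposal(
    hypothesis="A deeper CNN raises stroke-EEG classification accuracy measurably",
    value_proposition="Higher accuracy enables better-targeted treatment protocols",
    research_weeks=2.0,  # illustrative estimate
    deploy_weeks=4.0,    # illustrative estimate
)
```

With every proposal in this shape, prioritization can be as simple as ranking the wish list by expected value per week of effort.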

There will inevitably be more experiments on the “wish list” than the data science team can complete (and the company can afford to invest in), but if each proposed experiment has these three things written up, the process of prioritization becomes much more straightforward.

While these may seem obvious, being rigorous about putting them in writing helps data science teams be more agile by limiting scope, helping the team focus, and producing a record of the rationale and results of experiments, as well as documenting why other experiments were not pursued.

B) Quantitative evaluation of data science team/project

Once the most important focus areas are prioritized and agreed upon by the team, what is the process for benchmarking the team against the internal goals, especially before the experiments are complete?

As in agile software development, teams should aim for small steps, where progress is measured against a plan on a weekly basis and pivots can be made. These short cycles of learning are necessary because new data is constantly coming in that supports or refutes the assumptions made when the experiment started. Combining new data with the initial assumptions and the scope of the experiments is more art than science. The goal is to avoid ending up with an “inconclusive” result, the worst possible outcome of an expensive experiment.
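One way to reduce the risk of an “inconclusive” outcome is to pre-register, in code, what will count as success or failure before the experiment starts. The function and thresholds below are a hypothetical sketch of such a weekly checkpoint, not a methodology the article prescribes:

```python
def judge_checkpoint(metric: float, success_bar: float, failure_bar: float) -> str:
    """Classify a weekly result against criteria agreed on before the experiment.

    A result is only "inconclusive" if it lands in the pre-declared gray zone
    between failure_bar and success_bar; a wide gray zone is itself a signal
    to redesign the experiment.
    """
    if metric >= success_bar:
        return "hypothesis supported"
    if metric <= failure_bar:
        return "hypothesis refuted -- pivot"
    return "inconclusive -- tighten the experiment or gather more data"

# Illustrative weekly check: 78% accuracy against pre-registered bars.
print(judge_checkpoint(metric=0.78, success_bar=0.85, failure_bar=0.70))
```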

We found that we were better able to evaluate our data science team’s work when we modularized components of the experiments as much as possible. For example, one of the things that can impact EEG measurement is how the sensor sits on the head. If someone has thick hair, there may be artifacts in the EEG signature compared with someone with no hair. Segmenting experiments based on factors like this helps us understand root causes and evaluate why the model behaves the way it does.
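As a sketch of what this modularized, segment-based evaluation might look like, assume a results table with one row per recording and a segmentation label; the column names and data below are invented for illustration:

```python
import pandas as pd

# Hypothetical per-recording results; hair thickness is one factor suspected
# of introducing sensor-contact artifacts into the EEG signal.
results = pd.DataFrame({
    "recording_id":   [1, 2, 3, 4, 5, 6],
    "hair_thickness": ["thick", "thick", "thin", "thin", "none", "none"],
    "correct":        [0, 1, 1, 1, 1, 1],
})

# Accuracy per segment: a large gap between segments points toward a
# data-fidelity problem (sensor contact) rather than an algorithm problem.
print(results.groupby("hair_thickness")["correct"].mean())
```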

C) Course-correcting

Since startups constantly make decisions based on limited information, one of the most important skills of an agile data science team is the ability to course-correct in real time as new information arrives. For a data-oriented company, discovering early what is not working is just as important as discovering what is working. More is at stake: if some of the early fundamental assumptions are wrong, you want to know as soon as possible to avoid wasting time answering the wrong questions. If a hypothesis fails, we do our best to quickly understand why, and course-correct or “pivot” as soon as possible.

For BrainQ, we found that the information which causes us to course-correct usually lives in three categories:

  • Product: Did we receive any new information that changes our product requirement assumptions? This type of information would come from interactions with the end customer/user, usually owned by the PM.
  • Data Fidelity: Are there any signals which might lead us to believe that there is a flaw in the data collection strategy? This type of information might come from trying to repeat a test on a given patient and being unable to reproduce the results.
  • Algorithm: Assuming the data is “true” (without bias/artifacts), is the algorithm robust enough to produce the desired insights, for example classification accuracy? How brittle is the model? (A brittleness-check sketch follows this list.)
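One simple way to probe brittleness is to re-score the model on slightly perturbed inputs and watch how quickly accuracy degrades. The sketch below assumes a generic scikit-learn-style classifier and additive Gaussian noise on a preprocessed feature matrix; both are illustrative assumptions, not BrainQ’s pipeline:

```python
import numpy as np

def brittleness_curve(model, X, y, noise_levels=(0.0, 0.01, 0.05, 0.1)):
    """Accuracy as a function of additive Gaussian noise on the feature
    matrix X. A steep drop-off suggests a brittle model."""
    rng = np.random.default_rng(0)
    return {
        sigma: float((model.predict(X + rng.normal(0.0, sigma, X.shape)) == y).mean())
        for sigma in noise_levels
    }

# Usage, assuming a fitted classifier `clf` and held-out data:
# print(brittleness_curve(clf, X_test, y_test))
```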

In the highly competitive and fast-moving startup space, an organization does not have the privilege to sit and wait for feedback. If you don’t constantly validate and iterate your approach, chances are you will not survive. An agile methodology allows startups to maintain their nimbleness and speed as they go to market, even as they build machine learning/data science into their products and companies.

About BrainQ

BrainQ’s mission is to help paralyzed individuals following neuro trauma get back on their feet, restoring their ability to perform activities of daily living. BrainQ is currently conducting clinical trials at leading hospitals in Israel.

The company is developing a medical device that utilizes artificial intelligence tools to identify high resolution spectral patterns in patients’ brain waves, observed via electroencephalogram (EEG) sensors. These patterns are then translated into a personalized electromagnetic treatment protocol aimed at facilitating targeted neuroplasticity and enhancing patients’ recovery.

BrainQ participated in the first class of Google Developers Launchpad Studio, focused on the applications of ML in healthcare and biotech.


Yotam Drechsler is CEO at BrainQ, a MedTech startup developing AI-powered technologies for neurological recovery. https://brainqtech.com/