
Nobody wants wrong estimates in a business-critical analysis, or errors and unexpected behavior in production. So how do we avoid that when we have limited time and resources and want the most bang for our buck?
In this article we discuss why tests on the distributional properties of your results are the ones worth creating first.
Other types of testing on your data science pipelines are relevant too, like input validation or checking how transformations work on known data sets.
But the tests that are easiest to make, and that catch the most errors, are distributional tests…
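To make the idea concrete, here is a minimal sketch of a distributional test on pipeline output. The function name `check_distribution`, its parameters, and the thresholds are hypothetical choices made for illustration; they are not taken from the article. The sketch assumes the results are a list of numbers and checks three properties: the share of missing values, the mean, and the range of the values.

```python
import math
import random
import statistics


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def check_distribution(values, mean_range, max_missing_share, value_bounds):
    """Return a list of failure messages; an empty list means every check passed."""
    if not values:
        return ["no values to check"]

    failures = []
    present = [v for v in values if not is_missing(v)]

    # Check 1: not too many missing values.
    missing_share = 1 - len(present) / len(values)
    if missing_share > max_missing_share:
        failures.append(f"missing share {missing_share:.2%} exceeds {max_missing_share:.2%}")
    if not present:
        failures.append("all values are missing")
        return failures

    # Check 2: the mean lies in the range we expect from domain knowledge.
    mean = statistics.fmean(present)
    low_mean, high_mean = mean_range
    if not low_mean <= mean <= high_mean:
        failures.append(f"mean {mean:.4f} outside [{low_mean}, {high_mean}]")

    # Check 3: every value lies within hard bounds.
    low, high = value_bounds
    if min(present) < low or max(present) > high:
        failures.append(f"values outside [{low}, {high}]")

    return failures


# Hypothetical example: predicted probabilities that should average around 0.2.
rng = random.Random(0)
good = [rng.uniform(0.1, 0.3) for _ in range(1000)]
bad = [p * 100 for p in good]  # simulated bug: percentages instead of fractions

good_failures = check_distribution(good, (0.15, 0.25), 0.01, (0.0, 1.0))
bad_failures = check_distribution(bad, (0.15, 0.25), 0.01, (0.0, 1.0))
print(good_failures)
print(bad_failures)
```

A check like this knows nothing about the internals of the pipeline, which is why it is cheap to write: a unit mix-up, a bad join, or a dropped filter tends to shift the mean or push values out of bounds, whatever the underlying cause.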

Jupyter notebooks have three particularly strong benefits:
As data scientists and consultants, we discuss a lot of ideas involving data science. Ideas of the type:
Data -> Data Science -> Valuable results
Let’s look at some questions that can help you evaluate whether such an idea is a good one. Not all of these questions need to have a positive answer; there is no perfect project. It’s about managing risk versus potential, making good bets, and understanding how your work will enable better ideas in the future. These are approximately the questions I try to ask when planning large projects or evaluating new initiatives. …
ICE is a framework for selecting feature work based on 3 key dimensions, enabling teams to make better decisions about which feature work to prioritise:
Rate each feature on a scale from 1 to 10 on each dimension, then multiply the ratings to get the score.
The feature with the highest score wins. For more information, see for example here or here.
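The scoring rule above can be sketched in a few lines of Python. The excerpt does not name the three dimensions; the field names `impact`, `confidence`, and `ease` below follow the usual expansion of the ICE acronym and should be read as an assumption, as should the hypothetical backlog items and their ratings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    impact: int      # assumed dimension name, rated 1-10
    confidence: int  # assumed dimension name, rated 1-10
    ease: int        # assumed dimension name, rated 1-10

    def ice_score(self) -> int:
        """Multiply the three ratings, rejecting anything outside 1-10."""
        for rating in (self.impact, self.confidence, self.ease):
            if not 1 <= rating <= 10:
                raise ValueError(f"rating {rating} for {self.name!r} is not in 1-10")
        return self.impact * self.confidence * self.ease


def rank_features(features):
    """Return the features sorted from highest to lowest ICE score."""
    return sorted(features, key=lambda f: f.ice_score(), reverse=True)


# Hypothetical backlog, for illustration only.
backlog = [
    Feature("export to CSV", impact=4, confidence=9, ease=8),      # 4 * 9 * 8 = 288
    Feature("churn model", impact=9, confidence=4, ease=3),        # 9 * 4 * 3 = 108
    Feature("dashboard filters", impact=6, confidence=7, ease=7),  # 6 * 7 * 7 = 294
]

ranked = rank_features(backlog)
for feature in ranked:
    print(feature.name, feature.ice_score())
```

Because the ratings are multiplied rather than added, a low rating on any one dimension pulls the whole score down: the high-impact "churn model" ranks last here because its confidence and ease ratings are low.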
This scheme might seem simple, but that is also one of the upsides. Estimation is really hard, so trying…
Summary: Because most data science tasks have uncertain outcomes, it is easy to create a lot of work in progress that doesn’t go anywhere. Using Kanban enables us to keep track of that work in progress. We also find it to be a very lightweight project management tool, with minimal ceremony around estimation.
We are assuming you are already looking at an agile process methodology. As our main background is consulting, we often work on early-phase, immature problems where the level of confidence is naturally low and there are many open solution spaces to explore.
Some…
Data scientist