Creating a Data Science Practice with Impact

Michelle Keim
Michelle’s Musings
May 8, 2020 · 10 min read


“We know we need to do customer segmentation and build some models. Can you do that?” That is essentially the question with which I have started a number of jobs in my career. The first time it was a little scary, but every time it has been a fun and rewarding opportunity to create something tangible and impactful. And in every case, answering that question followed a similar journey.

The first phase of the journey, which I call “Winning and Planning”, involves finding a way to capitalize on that enthusiasm and delivering something that keeps the excitement up, though it is rarely what was initially asked for. That is because, in parallel, you need to assess the current state of things, figure out what is actually possible (considering things like data availability, cultural readiness, and adequate technical infrastructure), and start to map out what needs to be solved to reach the next phase of data science maturity.

The second phase, “Scaling the Data Science Discipline”, has typically been about beginning to execute on that plan; this involves hiring, getting more robust tools and infrastructure in place, and educating business partners and colleagues.

The third phase, “Growing Yourself”, has been about adapting and growing as a leader, because the leader’s role has to change as the organization and its data science capability grow.

For today, let’s talk more about the scaling phase.

Scaling Data Science

What has been notable is that these three phases can occur in an organization of any size and maturity and that they repeat themselves as a company grows its data science and machine learning capabilities. However, no matter where you are on that maturity life cycle, there are some themes that emerge during the “Scaling” phase. These themes are:

  1. Building teams
  2. Leading teams and individuals
  3. Putting tools and infrastructure in place
  4. Developing a Data Science culture
  5. Creating impact

Each of these themes is a blog post of its own. This post will focus on the last one: ensuring that your data science and machine learning efforts are creating impact for your company. Obviously, the ability to impact the business requires addressing some elements of the first four themes; but, regardless of the data science maturity of your organization, impact can only be realized by working on the right problems and delivering iteratively.

Working on the Right Problems

Data Science, Machine Learning, and AI have the attention of the majority of business leaders today. So, it is tempting to jump right in and start with Data Science as a solution before identifying the important problems to solve. There is value in exploring what data science can do, as it can serve as inspiration for the problems to be solved or the products to be developed. However, if that is all it is, and it isn’t ultimately aligned with a business need, its impact will be small or nonexistent.

So, how do you know what problems to solve? We’ll assume that you are starting with your company or organization’s mission, vision, and objectives (or these elements at the relevant organizational level) to drive your data science strategy.

An effective data science leader needs to think of themselves as a business leader and understand what success means for the organization, then leverage their unique skills to determine how to invest in data science.

But, how does one actually go about this?

Building a Data Science strategy usually involves many parties: business, technology, and data leadership, as well as subject matter experts. Often a list of potential projects is brought forward to be debated and negotiated. Deciding what to do might then be based on a balance between perceived value and expected effort. But how do you, as a Data Science leader, truly understand the potential value of these projects? And how do you know if you are even starting with the right list?

In my most recent role, I’ve had the opportunity to learn a lot about product development. In doing so, I have come to see a number of analogies between building product and doing data science. This is not a novel realization, and others have made similar comparisons (see the references at the end of the post for some examples). For the purpose of identifying the right data science projects, we’ll focus on a human-centered design approach.

For those unfamiliar, human-centered design is a mindset and a design framework aimed at ensuring that the products you build are actually relevant to the people they are intended to serve. It is founded on empathy and a deep understanding of the consumer, so that you are highly likely to solve the real problems they are facing.

While there are a number of variations on the human-centered design framework, it typically includes some form of these stages:

  1. Framing: In the first phase, we identify and prioritize the most important strategic problems, in alignment with the company strategy.
  2. Empathy: Next, we gain a deeper understanding of the problem to be solved and develop empathy for our customers. Through various forms of research (interviews, ethnography, observation, analysis, etc.), we identify key pain points and come to understand our users’ processes, tools, touch points, strategy, and psychology.
  3. Ideate and Prototype: Based on the problems and hypotheses found, this phase involves activities to solve the problem. We explore answers to the question “How might we…?”, develop prototypes, and iterate toward a minimum viable product.
  4. Test and Measure: Lastly, we prove out our solution and deliver continuously. We confirm whether we have achieved our intended outcome, iterate on designs, identify issues, and potentially uncover new strategic initiatives.

While couched in the context of product development, perhaps you already see the connection to delivering data science projects. Determining what to work on ought to involve Framing and Empathy. We must first learn the needs and challenges of the business, and identify the questions to answer and the people to engage. But what human-centered design shows us is that we need to go a step further. As data scientists and leaders, we need to spend time with those people, asking questions and understanding their challenges.

As I led off with, more than once (and for varying use cases) I have had stakeholders ask me to build a predictive retention model. In most of those cases, they were not wrong; however, there was much more to it. In one particular case, I believed I had done my due diligence and understood the business needs, the use cases, and what factors and data might be relevant. Consequently, I invested time in building a suite of models with good predictive performance. When I was done, the stakeholders were impressed with my technical skills but didn’t know what to do next. What I had missed was going beyond the use cases to the users. In this case, the users were the customer success managers.

And Empathy meant spending time with these folks: sitting with them and observing as they did their work, asking questions to understand what would make their jobs easier or enable them to work more effectively, and being open to possibilities.

Fortunately, after an investment in empathy, we delivered a successful solution, but it was very different from the one developed based on the “use cases” we had started with. Had we done this from the outset, the question of “what to work on” would have had a different answer.

The next two phases of human-centered design involve prototyping, testing, and measuring, which leads us nicely to the second key to impactful data science.

Delivering Iteratively

Now you know what you are going to work on, so you’re going to draft a project plan and then spend a couple of months doing some analysis, building some models, and/or developing a data-driven application. Right? Except then you show up to that meeting to deliver your results, only to learn that the problem has changed, or that you didn’t understand the most important part, or that your solution isn’t feasible for one or more reasons, or (even worse) you leave the meeting and never hear anything more about the result you delivered.

I don’t think the importance of iteration and feedback is new to anyone; but do we really do it? And do we do it effectively? We are often afraid that not having done enough, or not having all the answers, will result in rejection of the project or a blight on our capabilities.

In The Lean Startup, Eric Ries says, “It is precisely this attitude that one sees when companies launch fully formed products without prior testing. They simply couldn’t bear to test them in anything less than their full splendor.”

And it is the Lean Startup method that we will look to as a foundation for iterative delivery of data science projects. Eric Ries coined the term “Lean Startup” in 2008, but the concepts hark back to Lean Thinking, a management approach famously applied in Toyota’s factory production system as the company rose to prominence. The Lean Startup teaches us how to quickly determine whether our proposed solution is viable and how to shorten product development cycles. It revolves around a Build-Measure-Learn cycle and an approach to measuring progress called “Innovation Accounting.”

While the Build-Measure-Learn cycle’s activities occur in that order, we have to plan them in reverse. We must first figure out what we need to learn, then what we need to measure to determine whether we are learning and making progress, and lastly what we need to build to run that experiment and get that measurement.

Depending on the application, what we need to learn might be whether the solution we have proposed meets the customer’s needs, or it might be whether the technical solution we think will work is actually good enough. But what does “good enough” mean? Our questions (and consequently our measures) need to be tied to the business outcomes we are trying to achieve with this data science product. For example, if we are building a model to recommend new products to our customers on our website, we are presumably doing so with the objective of increasing sales. While the accuracy of our model will affect that, model accuracy itself is not our target.
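
To make that concrete, here is a minimal sketch of what measuring against the business outcome might look like, assuming a hypothetical A/B test in which some sessions see the recommendations and others see the existing experience (the numbers here are illustrative, not from a real experiment):

```python
import numpy as np

# Hypothetical per-session revenue from an A/B test: "control" sessions saw
# the existing experience, "treatment" sessions saw the recommendation model.
control_revenue = np.array([0.0, 12.5, 0.0, 0.0, 8.0, 0.0, 30.0, 0.0])
treatment_revenue = np.array([0.0, 14.0, 9.5, 0.0, 0.0, 22.0, 0.0, 41.0])

# The success measure is the business outcome (revenue per session),
# not the model's offline accuracy.
lift = treatment_revenue.mean() - control_revenue.mean()
relative_lift = lift / control_revenue.mean()
print(f"Incremental revenue per session: ${lift:.2f} ({relative_lift:+.1%})")
```

In practice you would also want a significance test and a guardrail metric or two, but even a simple comparison like this keeps the conversation anchored on sales rather than on offline model scores.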

Once we know what success is, we need to find the quickest route to what a solution could look like: an MVP (minimum viable product), or “skateboard.” The skateboard analogy (https://blog.crisp.se/2016/01/25/henrikkniberg/making-sense-of-mvp) is helpful to keep in mind.

The idea is that if you need a means of transportation, a car is a great solution; but it will take a long time to build, and it won’t provide value until all the requisite parts are in place. A skateboard, while lacking many of the features and benefits of a car, is quick to build and is better than having no means of transportation at all. Plus, it gives us an immediate opportunity to get feedback.

In the case of a data science solution, a skateboard might be a few lines of simple logic, a basic model, or even mocked-up outputs that allow us to get initial feedback. Our goal is not optimal performance, but data that tells us whether we are on the right track, need to pivot, or need to fail fast and move on to something else altogether. For data science, a pivot might mean finding a new data source, trying a new technical approach, re-shaping the problem we are trying to solve, or displaying or communicating the results differently. Because we have established our success measure, with each iteration we will be able to measure our progress and decide when to stop. And we will be able to do so in terms of business impact.
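
As an illustration (with made-up data and names), a skateboard for the product-recommendation example above could be as simple as recommending the most popular products a customer hasn’t bought yet — a few lines of logic you can put in front of users almost immediately to start gathering feedback:

```python
import pandas as pd

# Hypothetical purchase history: one row per (customer, product) purchase.
purchases = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3, 3, 3, 4],
    "product_id": ["A", "B", "A", "C", "B", "C", "D", "A"],
})

def skateboard_recommend(customer_id, n=3):
    """Recommend the n most popular products this customer hasn't bought yet."""
    popularity = purchases["product_id"].value_counts()  # most purchased first
    already_bought = set(
        purchases.loc[purchases["customer_id"] == customer_id, "product_id"]
    )
    return [p for p in popularity.index if p not in already_bought][:n]

print(skateboard_recommend(customer_id=1))  # e.g. ['C', 'D']
```

If customers engage with even these naive recommendations, that is evidence the concept works and an investment in a real model is justified; if they don’t, we have learned that cheaply and can pivot.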

Final Thoughts

We have just skimmed the surface of a framework for driving impact with Data Science through choosing the right things to work on and delivering them iteratively. In future posts, we’ll dig into a few tactics that we’ve found helpful, as well as some common challenges (and tips to overcome them). In the meantime, check out a few of these resources that I’ve found helpful in my own journey:

Laura Klein’s book “Build Better Products: A Modern Approach to Building Successful User-Centered Products” is a very readable introduction to human-centered design: https://www.goodreads.com/book/show/32856281-build-better-products

“The Lean Startup” by Eric Ries started the lean startup movement and is a great first read on the subject: https://www.goodreads.com/book/show/10127019-the-lean-startup

There are numerous articles on the intersection of human-centered design, lean, and data science; here are a few:

A great example of applying the lean startup strategy to data science: https://channels.theinnovationenterprise.com/articles/how-data-science-teams-can-win-by-behaving-like-lean-startups

A great session on applying lean to data science and how data science is a lot like building software: https://blog.dominodatalab.com/lean-data-science/

Some practical advice for those at the intersection of machine learning and product development: https://medium.com/google-design/human-centered-machine-learning-a770d10562cd

A nice example of how understanding the user can enable more impactful data products: https://hbr.org/2018/03/what-happens-when-data-scientists-and-designers-work-together
