7 Values for Data Scientists to Live By

Matt Dzugan
Published in project44 TechBlog
Mar 7, 2023
Photo by Giammarco Boscaro on Unsplash

Most professional data scientists I speak to agree that there is a known path to mastering modeling, statistics, and programming. The soft skills that surround the technical work, though, are not so straightforward to learn. This article highlights the 7 values our team uses to guide our thinking.

Data science practice, especially in business settings, involves a lot of ambiguous challenges. Which model do we build, what is it for, what is it really trying to optimize, and how do we prioritize it? We regularly have to make decisions when there’s not a single technical answer to a question. Problems like choosing how to prioritize, when to call something done, and how to work with others are daily tasks for data scientists.

We have statistical models, mathematical concepts, and scientific processes to help us design the models and test the analyses, but we also need frameworks that guide us through these harder decisions. For our data science team at project44, we created a set of values that we use for guidance in the parts of our work that are more than just science. This might seem like fluff, or it might appear irrelevant to grinding out models, but we think it’s meaningful for making us more effective and productive.

Why Values?

  • Values build a shared culture, which is especially important for teams with remote contributors. Values don’t replace interaction and shared work experiences, but they do help set shared expectations for how team members interact in their work lives. Because we’re not in the same room or even the same time zone very often, we need to be more explicit about how we treat each other and solve problems together. We can’t just rely on osmosis to create a positive and productive culture.
  • Values help shape coaching. Because we’ve formally articulated a list of values, our team members know the standard they’ll be held to. Nobody likes surprises when review time comes around, and people feel more empowered and have more agency when they have clear, structured guidance on how to respond to challenges or problems.
  • Values provide a model for working with other teams. When we’re collaborating with others, we can use our values as a guidepost for what we should work on and how we should approach our work. Having a framework for how we do what we do (and why) helps us explain our choices to other teams and gives us a shared language to use when negotiating priorities and goals.
  • Values help us constructively disagree. We don’t all have to be in lockstep, but we do need a way to navigate disagreement. We need to be able to make choices that not everyone agrees with, without a flame war breaking out. Having a set of shared values on which we base our arguments holds us accountable to each other.

What about when values conflict?

These values live in tension with one another. We may be curious about a bug that requires further investigation, but we’re accountable for an impending deadline. A customer may demand a specific outcome, but delivering it may require iteratively building upon simpler solutions. This is inevitable, and we will still have to make hard choices. Clarity about values doesn’t mean that we automate our decision-making. But it does mean we have transparency about which tradeoffs we are choosing when we pick a direction. If we decide that, in a particular case, we need to put an outcome-driven mindset first, then it is clear that this choice is not free of cost, and all parties involved will be able to see that.

Why bother?

When we neglect to explicitly clarify values, we assume that our expectations of a workplace are universally obvious. This is incorrect, no matter what those expectations are. Your colleagues, especially in a global business world, have different experiences from which they may have learned different ways of succeeding in the workplace and practicing data science. If we want our workplace to operate a certain way, we need to be explicit and clear about it. Otherwise, we’re just hoping things turn out well — and may end up disappointed.

The project44 Data Science Values

Accountability

We continuously monitor and remain accountable for the accuracy of our models and analyses. We also strive to be accountable to the promises we make to our peers.

Most of the time, when we think about monitoring the accuracy of models and analyses, we think about monitoring the accuracy of the predictions or ETAs our models generate. But as the saying goes, “garbage in, garbage out”: it is no secret that building high-quality models depends on training them with high-quality data. Here at project44, we think of accountability at every step of developing and maintaining our models, monitoring not only model accuracy but also the quality of the data feeding our models. We created metrics to measure the quality of different data sources and built dashboards that help us continuously monitor and act in time when something looks wrong. — Sravan
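As a rough illustration of what a data-source quality metric could look like, here is a minimal Python sketch. It is not the team's actual implementation: the completeness metric, the 0.95 threshold, and every name in it (QualityReport, check_source, carrier_feed, the record fields) are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    source: str
    completeness: float  # share of records with every required field present
    alert: bool          # True when completeness falls below the threshold


def check_source(source, records, required_fields, min_completeness=0.95):
    """Score one data source on completeness (hypothetical metric and threshold)."""
    if not records:
        # An empty feed is treated as a quality problem in its own right.
        return QualityReport(source, 0.0, True)
    complete = sum(
        all(record.get(field) is not None for field in required_fields)
        for record in records
    )
    completeness = complete / len(records)
    return QualityReport(source, completeness, completeness < min_completeness)


# Made-up sample records: one of the four is missing its location.
records = [
    {"shipment_id": "A1", "timestamp": "2023-03-01T10:00", "location": "Chicago"},
    {"shipment_id": "A2", "timestamp": "2023-03-01T11:00", "location": None},
    {"shipment_id": "A3", "timestamp": "2023-03-01T12:00", "location": "Berlin"},
    {"shipment_id": "A4", "timestamp": "2023-03-01T13:00", "location": "Krakow"},
]
report = check_source("carrier_feed", records, ["shipment_id", "timestamp", "location"])
print(report)
```

A report like this could feed a dashboard or an alerting job, so that a drop in input quality is noticed before it shows up as a drop in model accuracy.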

Scientific Thinking

We take a data-driven & measure-first approach to development. We base decisions on empirical data and experimentation, to the fullest extent possible. We test our assumptions and make decisions that respond to the facts rather than to guesses or biases.

Customer value is central to everything we do here at project44. Naturally, when we started our project to revamp a key model, our first step was to define the metrics that best capture value for our customers. We continue to experiment, and at each stage we evaluate how the change moves the needle on our metrics of interest. We prioritize approaches that help unlock value, thereby focusing our efforts on the areas with the best ROI. — Anju

Outcome-Driven Mindset

We constantly think about how this work generates value for customers. This means our solutions should be designed and optimized to improve customer outcomes, not necessarily just standard model metrics.

We find that our customers tend to use our estimated delivery dates (last mile) for two very different use cases: they need either 1) aggressive estimates to convert customers despite competition, or 2) conservative estimates to avoid late deliveries and improve customer satisfaction. So instead of providing one estimate that focuses on accuracy as the metric, we provide two estimated delivery dates, letting the customer choose the one that aligns with the outcome they are trying to achieve. — Swetha
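One way to picture the two-estimate idea is to read both dates off a distribution of historical transit times. The Python sketch below does this with a lower and a higher percentile. The article does not say how the two estimates are actually produced, so the percentile approach, the 50th/90th choice, the function names, and the sample data are all assumptions made for illustration.

```python
from datetime import date, timedelta


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list (pct is an integer 1-100)."""
    rank = max(1, -(-pct * len(sorted_values) // 100))  # ceiling division
    return sorted_values[rank - 1]


def two_delivery_estimates(ship_date, transit_days, aggressive_pct=50, conservative_pct=90):
    """Return an aggressive and a conservative delivery date (hypothetical helper)."""
    if not transit_days:
        raise ValueError("need at least one historical transit time")
    ordered = sorted(transit_days)
    return {
        "aggressive": ship_date + timedelta(days=percentile(ordered, aggressive_pct)),
        "conservative": ship_date + timedelta(days=percentile(ordered, conservative_pct)),
    }


# Made-up history of transit times, in days, for one lane.
history = [2, 3, 3, 3, 4, 4, 5, 5, 6, 9]
estimates = two_delivery_estimates(date(2023, 3, 7), history)
print(estimates["aggressive"], estimates["conservative"])
```

With this framing, a customer who cares about conversion would pick the earlier date, and a customer who cares about avoiding late deliveries would pick the later one.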

Prior to project44, I worked on a project for a utilities company to monitor key parts of their electrical grid to avoid unexpected outages. Part of the success of the project depended on surfacing metrics that grid operators were familiar with and that were accepted across the utility industry. This requirement informed our entire modeling effort and forced our team to think past standard classification and anomaly detection metrics. Most of the consumers of the models we create are humans, not machines, so interpretable outputs are key to getting engagement and suggestions. — Jason

Iterative Development

We don’t let perfection stand in the way of progress; we constantly look for avenues to improve our products in an iterative manner. This requires focus and reflection on how our current products measure up against customer outcomes.

Here at project44, our goal is to help our customers generate value from our solutions as early as we can. To achieve this, we strive to develop solutions that start unlocking value for our customers early. We define our north star metric and continue to iterate our solutions towards it. As our solutions continue to develop and mature, our customers get to experience value generation that improves over time. — Anju

At school, we have all the data and requirements provided from the first moment. In the work environment, this is not always possible. Often, solutions must be built with data that differs from the data used for real predictions, so the best way to learn about our data is to build a first version or prototype of the model, monitor how it behaves, and then iterate over time, learning more and more about what works and what doesn’t with the data. — Evelyn

Teamwork

We actively seek cooperation from other teams and support those other teams in meeting common goals. We enable our data science colleagues by challenging their ideas, sharing knowledge and pointing others in the right direction. Many hands make light work.

As data scientists, we are not necessarily experts in the subject matter when we start to tackle a problem. Experience may lead to useful features, which in turn yield better model performance. This needs to be uncovered first, though. To overcome gaps in your own knowledge, an exchange with others is part of the job, whether with SMEs to understand the data and processes or with fellow data scientists to challenge your own assumptions. This feedback loop has helped me grow tremendously. I still remember an on-site visit to one of the largest marshalling yards in Europe. After taking a brief look at their data and preparing questions and dashboards, we had the opportunity to talk with people involved in the planning and assembly of trains, wagon disposition, and other operational management. We left with insights we would not have gotten by simply sending an e-mail (and simply getting our questions answered). After this meeting, we had more certainty about the first sets of features for our model, and about other avenues to explore in the data. — Steffen

Curiosity

We investigate data, code, and processes consistently throughout the life of a project. No stone is left unturned. We challenge our preexisting assumptions and explore creative and unexpected ways of attacking problems.

In school, there are requirements and certain ways in which we are instructed to solve a problem; in a professional setting, there is more freedom to explore the appropriate solution. — Aylin

Simplicity

We strive for simplicity over complexity in design, implementation, and process. The simpler the solution, the easier it will be for our teammates and customers to understand.

At school, most of the time we want to solve problems with the newest and most complex algorithms. In the professional environment, however, it is common to look for clear and simple solutions. For some problems, data analysis and knowledge of the problem make it possible to use simple models, or even heuristics, which give us a clear solution. Complex solutions, on the other hand, have the disadvantage of losing clarity. — Evelyn

The creation of these values was truly an act of teamwork, so thanks to our whole team for pitching in. Special shout-out to Stephanie and James for writing much of the introductory material.
