Fostering excellence in data science

Ron Sielinski
Data Science at Microsoft
6 min read · Mar 17, 2020

One challenge for data science teams is the ad hoc nature of our work. Presented with one-off problems, we tend to produce one-off solutions. Every analysis is separate and self-contained, and every model is self-sufficient. But when put into a broader perspective, our work often takes on new meaning: We amplify our impact by accumulating insights over time, sharing them with others, and exploring the ideas that result.

In fact, a big part of my team’s value at Microsoft is our wealth of institutional knowledge. Over the years, we’ve answered so many questions about the business that we’re able to draw upon that experience to better frame new questions and accelerate our analyses.

There is a cost, however. To enable that broader perspective, we need to follow best practices for using source control, authoring reproducible analyses, participating in peer reviews, and more. Each of these activities takes time, which could otherwise be spent answering new questions or building new models.

So, within my org, we’ve landed on a lightweight way of reinforcing the importance of these and other best practices. We’ve defined a set of standards, based upon operational aspects of our work. In many respects, they form a capability maturity model for data science. Simply put, this model of organizational development posits a direction of steadily increasing sophistication and the capabilities that come with it. An important aspect of reaching more advanced stages of development is the ability to reliably and consistently deliver results. Standards, because they function as a set of measurable criteria, are useful in helping the organization operate on a consistent basis.

What standards do

Standards fulfill several roles. First, they allow us to focus on the capabilities most important to our success. Second, they enable us to set benchmarks. Third, they help us understand our progress. I address each of these in turn.

Focus

Selecting standards means deciding which ones to include and which to exclude. The number of standards that could be selected is practically limitless. Having too many, however, not to mention having the wrong ones, limits success by diluting organizational focus. And focus is one of the most important qualities for an organization to achieve, because it enables a team to marshal its powers around the pursuit of results that materially move the business forward, or as we say at Microsoft, achieve business impact.

For example, one of our standards is around data visualization. We know that data scientists have a wealth of options and resources available as they put together visuals to accompany their data science work. We also know that visualizations are a critical element in communicating data science outcomes. Visualizations must be clear, concise, and relevant to be understood. They must communicate a great deal of information in a compact space. They must help impart knowledge not possible with words alone. But beyond this, they represent an opportunity to put our team’s visible imprimatur on our finished work by having our data visualizations reflect our team’s branding, much as modern corporations and other entities brand their products and services. In this way, stakeholders see a mark of quality reflected in the work itself and reified by our visualization branding.
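
As a sketch of what codifying that kind of branding can look like in practice, here is a minimal example in Python, assuming a matplotlib-based workflow. The palette, fonts, and helper names are illustrative assumptions, not our actual internal templates.

```python
# Hypothetical sketch of a reusable team plotting theme (matplotlib assumed).
# Palette, fonts, and helper names are illustrative, not an internal standard.
import matplotlib.pyplot as plt
from cycler import cycler

TEAM_PALETTE = ["#0078D4", "#50E6FF", "#243A5E", "#6B2064"]  # example brand colors


def apply_team_theme():
    """Apply a consistent, branded look to all subsequent plots."""
    plt.rcParams.update({
        "axes.prop_cycle": cycler(color=TEAM_PALETTE),
        "axes.spines.top": False,
        "axes.spines.right": False,
        "axes.grid": True,
        "grid.alpha": 0.3,
        "font.family": "sans-serif",
        "figure.titlesize": 14,
    })


def branded_figure(title, source_note="Source: internal data"):
    """Create a figure with the team's standard title placement and footer note."""
    fig, ax = plt.subplots(figsize=(8, 4.5))
    fig.suptitle(title, x=0.01, ha="left")
    fig.text(0.01, 0.01, source_note, fontsize=8, alpha=0.7)
    return fig, ax


# Usage:
# apply_team_theme()
# fig, ax = branded_figure("Quarterly model adoption")
# ax.plot([1, 2, 3], [10, 14, 13])
```

Packaging choices like these in a shared module is one way a visualization standard becomes something a team member can adopt in a single line of code rather than re-create for every analysis.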

Benchmarks

Once specific standards are selected, benchmarks enable us to define what success looks like for each of them. These barometers can result in some of the most interesting discussions a team can have, because at their essence they are conversations about what separates good from great performance. And making this distinction is especially important because of the necessity of setting high — and yet achievable — goals.

Targets that are not actually achievable, or perceived as not achievable, can dampen enthusiasm and demotivate. But ones that are high and yet achievable can help the organization go beyond what its members might think is the limit of what they can accomplish, to deliver something with great impact.

For example, we ask that each team member achieve one standard per quarter. This reflects the need for focus, enabling meaningful achievement in a standard instead of mere glancing familiarity with it. We could have chosen a benchmark of only two standards per year, or of more than one per quarter. But we believe that one standard per quarter sets a high bar for achievement (four focus areas over the course of a year) while simultaneously enabling mastery.

Progress

Finally, measuring progress is essential. It not only shows us what we have actually accomplished, but also provides the basis for a feedback loop that informs the next round of selecting standards and setting benchmarks for them. Perhaps some standards are achieved and move the organization to a higher level of capability, to the point that there’s no longer a need to call them out as standards to focus on. Perhaps some are not achieved, either because the organization couldn’t bring enough resources to bear on them or because unexpected demands and other developments took precedence.

We report on progress toward standards as part of our regular business reviews. Instead of focusing on what specific individuals did or didn’t do in these forums, we look at the standards and team members in aggregate, reporting on the percentage of all team members who met at least one new standard over the course of each quarter.
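
As a concrete sketch of that roll-up, the example below computes the quarterly percentage with pandas. The table schema, names, and numbers are hypothetical, not our real tracking data.

```python
# Hypothetical sketch: percentage of team members meeting at least one
# new standard in a given quarter, assuming a simple achievements table.
import pandas as pd

# Illustrative data only; column names are assumptions, not a real schema.
achievements = pd.DataFrame({
    "member":   ["ana", "ben", "ana", "dee"],
    "standard": ["Data visualization", "Code reviews", "Reproducibility", "Peer reviews"],
    "quarter":  ["2020Q1", "2020Q1", "2020Q1", "2020Q2"],
})
team_size = 6  # total team members, including those with no achievements yet


def pct_meeting_standard(df: pd.DataFrame, quarter: str, team_size: int) -> float:
    """Share of all team members who met at least one standard that quarter."""
    achieved = df.loc[df["quarter"] == quarter, "member"].nunique()
    return 100 * achieved / team_size


print(f"{pct_meeting_standard(achievements, '2020Q1', team_size):.0f}% met a standard in 2020Q1")
```

Reporting the aggregate percentage rather than per-person results keeps the business review focused on organizational capability instead of individual scorekeeping.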

The standards we use

Instead of being mandated by leadership, our organization created its standards from the ground up. This is not the only way standards can be established. But in our case, members of the team saw that having a set of standards would help the organization achieve its objectives, and they took it upon themselves to discuss and agree on the need for standards, select meaningful ones, determine benchmarks, and measure progress. My leadership team strongly backs their work, and recently we embarked on our second generation of standards, now up to 17 from an initial set of 12, after learning from the first generation, many of whose standards carried over.

Our 17 standards currently include:

  • Analytical: Authoring or co-authoring internally published data science work.
  • Code reviews: Having one’s own code reviewed and reviewing others’ code.
  • Communication: Creating internal communication pieces or templates.
  • Continuous learning: Achieving certification.
  • Data distribution: Building and signing off on internal data contracts.
  • Data visualization: Presenting or publishing finished work with our team’s data visualization templates.
  • Democratization: Publishing or presenting internally developed data science work.
  • Hackathon: Participating in an internally sponsored hackathon.
  • Knowledge management: Owning a certain number of entries in our internal knowledge management repository.
  • Onboarding & reference: Publishing self-serve reference documentation on our internal content site.
  • Operational excellence: Keeping work progress updated on our internal work tracking platform.
  • Peer reviews: Reviewing others’ data science projects and submitting one’s own projects for peer review.
  • Privacy and compliance: Writing privacy and compliance application or reference docs on data science models.
  • Retrospective: Participating in project post-mortems.
  • Source control: Adding to our body of stored canonical data queries.
  • Reproducibility: Creating data science projects with tools that foster reproducibility.
  • User readiness: Helping stakeholders use our team’s output with demos and training.

Each standards area has its own steward, who serves as the key point of contact and spokesperson for the standard and who determines and awards recognition for meeting it. The stewards meet on a regular basis to govern and run the process as a whole, which is in turn overseen by a program manager responsible for the entire undertaking.

Team members who achieve a standard receive a badge they can affix to their laptops or use elsewhere to signify their accomplishment. These badges have proved to be popular, and collecting them has created an incentive on its own, as well as driving awareness across the team not only of the standards themselves but of the accomplishments of individuals in meeting them.

The point of it all

What have these standards done for our organization? They’ve driven engagement among our data scientists while providing an objective yardstick to measure our progress toward becoming a more capable organization. They’ve helped our data scientists work together to determine what they believe is most important to achieve in our profession. And they’ve enabled us to understand our own progress toward achieving these capabilities on an ongoing basis, to the point that we can consider them a reliable set of proficiencies we possess as an organization.

Of course, the point of the program is not simply the collection of a set of stickers for one’s laptop. Instead, it is the ongoing application of these standards in our day-to-day work that helps each of us grow as individual data scientists and helps our organization grow collectively in its data science capabilities. As we’ve already experienced over two generations, individual standards will come and go, but it is the spirit of developing, striving toward, and achieving them that we believe will help fuel our organization’s ongoing contributions to business success.
