Embedding Data Science In Cross-Functional Teams

Mario Konschake
Dec 21, 2018

TL;DR: In our machine-learning-based business, including data scientists in cross-functional product teams turned out to be very effective. However, it’s important to take into account how data science workflows can differ from traditional engineering workflows.

With the advance of Agile and the DevOps movement, cross-functional software development teams (squads) have become the de facto standard in modern organizations.
However, not much can be found on how to successfully integrate data science teams into that mix. Here is a record of what has worked for us and what did not.

The Background

Data science can be anything from analytical one-off workloads to long-lasting machine learning projects; it can sit at the fringe of a non-data-driven business or at the very core of a product. Since different environments require different approaches, I’ll first describe our particular setup.

We are in the ad tech industry, and our core products (demand-side and data-management platforms) are built on top of machine learning. Of the 20+ engineers on the team, about one third are data scientists, working with data sets of more than 100 billion rows.

While we have some ad-hoc data analysis workloads, most of the data science work we do is model building with the operational constraint of having to make 10k+ model inferences per second.

In general, we approach this work in four phases:

  1. Data exploration: Is this project even feasible? Can we find signal in the data?
  2. Prototyping: Putting a couple of cron jobs, Python scripts and SQL together for an end-to-end prototype. These prototypes are usually tested in production for evaluation purposes.
  3. Implementation: Collaboration with software and data engineers to move the prototype into “stable” production, e.g. improving performance and adding monitoring.
  4. Evaluation and continuous improvement: Integrating new data sources, model tuning, experimentation and adjustment for new business cases.
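To make the prototyping phase (step 2) more concrete, here is a minimal sketch of what such a script might look like. It is an illustration under stated assumptions, not our actual code: the `impressions` table, its columns, the sample rows and the function names are all hypothetical, and an in-memory SQLite database stands in for a production database. The "model" is just the observed click rate per segment, computed in SQL.

```python
import sqlite3


def load_events(conn):
    """Create and fill a toy events table (hypothetical schema and data)."""
    conn.execute("CREATE TABLE impressions (segment TEXT, clicked INTEGER)")
    conn.executemany(
        "INSERT INTO impressions VALUES (?, ?)",
        [
            ("sports", 1), ("sports", 0), ("sports", 0), ("sports", 0),
            ("news", 1), ("news", 1), ("news", 0), ("news", 0),
        ],
    )


def train(conn):
    """Fit the 'model': the average click rate per segment, computed in SQL."""
    rows = conn.execute(
        "SELECT segment, AVG(clicked) FROM impressions GROUP BY segment"
    ).fetchall()
    return dict(rows)


def predict(model, segment, default=0.0):
    """Look up the predicted click rate; unseen segments get the default."""
    return model.get(segment, default)


def run_once():
    """One end-to-end run: load data, train, return the model."""
    conn = sqlite3.connect(":memory:")
    try:
        load_events(conn)
        return train(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    # In a prototype, a cron job would typically invoke this script on a schedule.
    print(run_once())
```

A script like this is deliberately crude. Making it fast, monitored and reliable is the job of the implementation phase (step 3).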

Functional Data Science Team

When our team started to grow and we had to think about hierarchy for the first time, we opted to work in functional teams and created a data science team of 5 to 8 people, a data engineering team and a backend engineering team.

Having a dedicated data science team likely had a positive impact on our identity and culture as a machine learning company and helped a lot with hiring.

However, some disadvantages became apparent over time. Data scientists complained that they often felt like an annoying little sibling whenever they asked for support from data or software engineers, while the engineering teams struggled with prototypes that they did not fully understand or that were impossible to implement with sufficient performance or reliability.

Since our organization was centred around data science, we also lacked the control mechanisms often found in a product-centred organization, and data scientists sometimes got lost in rabbit holes instead of continuously delivering value to our customers.

Because we were offering a machine learning B2B product, our customers interacted primarily with the data science team, and the other teams didn’t have as much exposure to the business, which led to increasing disengagement and a lack of purpose.

Functional separation of data science from engineering teams creates long feedback cycles and common handover problems.

Cross-Functional Teams

The issues we had with the functional data science team, together with a variety of other problems, made us reconsider our approach. We eventually moved from functional to cross-functional teams (squads) and embedded 2 to 4 data scientists in each team that owned a machine learning product.

The results were stunning. We immediately saw the nature of the conversations between data scientists and other engineers change. We saw people debating passionately and constructively who rarely talked to each other before.

Data scientists profited from the chance to step in early when changes were discussed that could impact their models. Software and data engineers, in turn, began to understand the needs of the data scientists and our customers better, which led to a whole new level of engagement within the teams.

By embedding data scientists into cross-functional teams, the teams become empowered to deliver solutions autonomously and develop a strong sense of purpose.

Overall, the introduction of cross-functional teams has been a success for us. However, a few factors specific to our situation may have contributed significantly to that success:

  • All our data scientists have a T-shaped skill set and are comfortable working with the Linux shell, writing code and working with production databases, which made it easy for them to interact with software and data engineers. Almost all of the data scientists joined us straight from academia and were very open to jumping on any problem placed in front of them.
  • Although the data scientists worked in cross-functional teams, we kept a weekly data science meeting, which helped a lot with knowledge transfer and with maintaining an identity as a data scientist.
  • Even though we already have very lean processes, we allowed even more slack for data science work. That is, we sometimes had the data scientists working together with the rest of the team on a feature with short iteration cycles, while at other times we allowed them to branch out and explore a topic for a couple of weeks without demanding frequent progress reports.
  • We consistently hear from almost everyone who joins us from another company, or who has left for another one, that the collaborative atmosphere within our team is outstanding.

Conclusion

Embedding data scientists into autonomous and cross-functional teams has been an overall success for us and solved a lot of the issues we experienced with functional teams while creating almost no new problems.

However, it’s hard to say how much of our experience depends on the nature of our product, which has machine learning at its core, and on the people we have on the team. Your mileage may vary.

When embedding data scientists into cross-functional teams, it is important to respect the different nature of their work and not to impose software development processes on them without making adjustments. Maintaining a forum, such as a regular meeting where data scientists from different teams can talk and share knowledge, has also proven important.

When we were looking for information on how to embed data scientists into other teams a couple of years ago, there was very little information available. Fortunately, this has changed since.

I would like to thank Michael Kaminsky and Paul Illg for proofreading my draft and Elisabeth Bommes for letting me use a picture of one of her notebooks.
