How to build effective communication between data science and the business
Tips for communicating effectively with business stakeholders and delivering value as a data scientist
Our educational system tends to emphasise hard skills, and after coming out of university my primary focus was to learn as much as possible about data practices and techniques and to excel technically. I’m still a geek and value academic knowledge, but after working in various industries I came to realise that soft skills are critical to unlocking the value of technical skills. The investor Warren Buffett said in an interview that “the one easy way to become worth 50% more than you are now is to hone your communication skills”. Communication was a common area of improvement in my past performance reviews, and I suspect many will relate, given that LinkedIn ranked communication first in its list of skills gaps in the US. At Tide, I have been fortunate enough to work with people who helped me turn this weakness into a strength, so in this article I would like to share the main tips that helped me become a better communicator.
Communication framework
We, data people, ask business people to be more data-driven, but are we business-oriented?
When communicating in general, it’s important to start with why, as Simon Sinek said in his famous TED talk. But what is the ‘why’ when it comes to machine learning projects? As data scientists, we are often told that our job is training and validating models. However, this is really the ‘how’, not the ‘why’. So what is the ‘why’ behind data science projects? It is always related to achieving some kind of business objective. Whether it’s acquiring more customers or increasing process efficiency through automation, there is always a business objective. It’s not the model training in itself that creates value, but the use of the model. This shifts our mindset from the output to the outcome, as Joshua Seiden explains well in his book “Outcomes Over Output”.
How do we put this into practice? The pyramid principle is a great method. We start with the ‘why’, capturing what we want to achieve (e.g. optimise invoice collection strategies for our members to increase NPS). We then continue with the ‘how’, outlining the different approaches we could follow (e.g. automating invoice chasing, predicting risky invoices, recommending collections solutions). Finally, we elaborate on each solution with more detail (e.g. the expected impact, the deliverables, the project complexity, the experimentation technique). With this top-down structure, the listener grasps the relevance of every argument as quickly as possible and can actively direct the conversation if needed.
Transparency in projects
It’s not rare for models or insights to go unused and be forgotten, and in my experience, misalignment between business and data teams is one of the main reasons. A recurring mistake in data science projects is not involving the business from the ideation stage, which usually leads to a lot of rework to operationalise any models. There are several topics that a data scientist should discuss with the product team for a decisioning project:
- Definition of Success — A project should always have specific objectives that can lead to business value. Hence, probably the most important conversation the data scientist should have with the business is what success looks like for the project and how it will be measured. This should also determine the model objective and the definition of any proxy (if different from the success criteria) for the model to target. There is a great article about how to frame a machine learning problem here. After a hypothesis is tested, we might discover trade-offs in the tracked metrics, and the data scientist can help the business understand their implications.
- Operationalisation concerns — As discussed above, a model primarily generates value when it is operationalised. To avoid delays in delivery, we should ask early on: “How will the model be used?”, “How often do you need the predictions refreshed?”, “Who is going to consume this data?”. Otherwise, wrong assumptions can lead to an over-engineered architecture and the wrong training dataset.
- Datasets & interpretability — Features are attributes of entities that might help us predict a certain event. Preparing a training dataset requires a lot of domain knowledge, so it is recommended to bring in several people with deep experience in the particular domain during this phase. When machine learning is applied in risk-sensitive contexts (a regular occurrence in fintech companies like Tide), it’s useful for stakeholders to be able to agree or disagree with the model without being ML experts. In other words, to have interpretable features that a business expert can sense-check, which helps build trust between ML and the business.
- Lean practices — It’s important to understand not only what success looks like (i.e. the direction we need to go in) but also the minimum scope necessary to produce something valuable. Since people on the business side often lack a technical background, the data scientist should bring the minimum viable accuracy to the table, which can steer the effort and the choice of modelling techniques. This is the minimum accuracy the model needs to achieve for there to be a reasonable case to put it into production. It does not represent the desired accuracy, but rather a lower boundary that gives the project a go / no-go decision. In these discussions, a data scientist needs to stress to stakeholders that the chosen minimum accuracy might not be achievable with the available data and the trained model, so an acceptable time-box for this phase (usually 1–2 sprints) should also be agreed.
- Retrospective — Communication is a two-way street, and at Tide we are big fans of agile practices and feedback cycles. We invite stakeholders to retrospective meetings to reflect together on what went well or badly in a project. For example, challenging how certain blockers or limitations were handled can create new practices and tips for how the data team interacts with the product team.
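To make the minimum viable accuracy idea concrete, the go / no-go decision can be sketched as a simple threshold check. This is an illustrative sketch only: the threshold, function name and labels below are hypothetical, not from a real project.

```python
# Hypothetical go / no-go gate against an agreed minimum viable accuracy.
MIN_VIABLE_ACCURACY = 0.80  # illustrative floor agreed with stakeholders

def go_no_go(y_true, y_pred, threshold=MIN_VIABLE_ACCURACY):
    """Return ('go' | 'no-go', accuracy) based on validation results."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    return ("go" if accuracy >= threshold else "no-go"), accuracy

# 4 of 5 illustrative predictions are correct, so accuracy 0.8 meets the floor
decision, acc = go_no_go([1, 0, 1, 1, 0], [1, 0, 1, 0, 0])  # → ('go', 0.8)
```

The value of writing the check down is less the code itself than the conversation it forces: the threshold is a shared, pre-agreed number, so the go / no-go outcome is not negotiated after the fact.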
Creating a data culture
Investing time in creating a data-minded company goes a long way and enables easier communication between the data department and others at scale. Data teams were introduced relatively recently in most companies and are often not as well integrated with the rest of the business as more established functions, such as finance or legal. In addition, their practices and processes are not yet widely known. There are various ways to bridge this gap:
- The data team can collectively start to document a lot of agreed processes (e.g. how do we test a hypothesis, how do we use statistical models in risk policies) and common terminologies (e.g. a feature, a machine learning model, basic model performance metrics). In this way, a data scientist does not need to go over the same concepts several times.
- Training or “lunch and learn” events can also help with knowledge sharing (e.g. what are the lifecycle stages of a machine learning project and what is each team’s role at each stage). These usually spark a lot of interest in data science from teams that haven’t had the chance to collaborate with the data department yet.
- Blocking time for communication in the form of data Q&A sessions can also bring data teams closer to the business needs.
- Lastly, other guerrilla tactics, such as posting insights and measured impact from data deliverables in work communication platforms, can potentially break the invisible silos between us and the rest of the organisation.
Special thanks to our Director of Data, Hendrik and our VP of Credit, Amit, who helped us improve our ways of working with the business stakeholders and develop the soft skills in the data science department.