Embracing data science in business domains: an accelerator or a blocker?
Digitalization, data science, and machine learning (ML) have been hyped for the past few years, and this will continue in the coming years. The fundamental enablers of this trend include, but are not limited to, advances in connected everything, fast-growing cloud computing technologies, and maturing data science capabilities within and across industries. As a practitioner since the early days of this journey, I have been fortunate to work with different data types and technologies within different business domains. This article tries to distil some of the key learnings along the way.
Before investing in data science, the first key questions to ask are: what does data science mean within your business context, and what do you want to use it for? Do you want to use data science to improve a small (epsilon) part of the business, or do you position it as strategically important for enabling future business innovation and offerings? Do you want to hire a few super-intelligent people and let them figure out what data science can do for the business, or is there high business value at stake for which data science is the right tool? Unfortunately, there is no unified solution that fits every business, and finding out what works best within your company is a journey. That said, some key components can be great companions for successful data science projects.
Business-centric: Dream big, act small and target business impact
What motivates people most, especially highly skilled and educated people? Recent studies point to an engaging and motivating vision. Do not rush into data science solutions before understanding what the problem is and how solving it can bring value to customers and society. Take the time to break the vision down into manageable pieces together with the business owners.
Now let us suppose that digitalization is high on the agenda within your organization and that there is a critical mass of people and skills to carry out data science initiatives. The business side is also ready to take on challenges that could benefit from data science technologies. The next big step is to find the rhythm that brings business and data science together.
This sounds like a simple task, but in reality it turns out to be very hard, and many initiatives do not survive this phase. Business owners bring deep domain knowledge and want to see real business impact; data scientists tend to focus on the latest and coolest techniques that could tackle the business challenges. Each side seems to have what the other needs, but a critical piece is missing: communication and education between the two parties. Data scientists need a proper level of business knowledge, while the business side needs to understand what is under the hood of the fancy solutions in order to build trust and accountability. It is critically important that both parties commit the resources needed. One approach that has helped us a lot is the following: in the early phase of the project, the business is expected to commit resources to front-load the business context and set the scene. As the project progresses, the commitment from the data science team ramps up and the business side picks up data science knowledge. This process can iterate a couple of times before settling into an equilibrium.
Solid data foundation: designed for intelligence
Businesses are generating data, and the rate at which it is captured is growing faster than ever. Laying down a proper data foundation that makes the data usable and impactful is another big challenge. This partly explains why many data scientists need to spend 80% of their time understanding, analyzing and processing the data before reaching the modelling part. It is important that data availability, accessibility, interpretability, interoperability and reusability are placed high on the agenda to enable sustainable data science-backed solutions as well as general BI consumption. Within Grundfos, data lakes, data quality tools, data catalogues and the FAIR principles [1] are widely adopted. This, of course, cannot be done in isolation. We need a proper business process to capture the data (preferably designed with data science in mind), data pipeline monitoring capabilities to keep an eye on data quality, and feedback from data scientists and end-users on how to continuously improve the data foundation.
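To make the idea of data pipeline monitoring a little more concrete, here is a minimal sketch of a batch-level data quality check. It is only an illustration under stated assumptions: the record layout, the field names `pump_id` and `flow_rate`, the allowed range and the function names are all hypothetical and do not describe any actual Grundfos pipeline or tool.

```python
from dataclasses import dataclass, field


@dataclass
class QualityReport:
    """Simple counters describing the quality of one batch of records."""
    total: int = 0
    missing: dict = field(default_factory=dict)  # field name -> number of missing values
    out_of_range: int = 0

    @property
    def missing_rate(self):
        """Share of missing values per required field (empty if the batch is empty)."""
        if not self.total:
            return {}
        return {name: count / self.total for name, count in self.missing.items()}


def check_batch(records, required=("pump_id", "flow_rate"), flow_range=(0.0, 500.0)):
    """Count missing required fields and out-of-range flow readings in one batch.

    The field names and the range are hypothetical placeholders.
    """
    report = QualityReport(missing={name: 0 for name in required})
    low, high = flow_range
    for record in records:
        report.total += 1
        for name in required:
            if record.get(name) is None:
                report.missing[name] += 1
        flow = record.get("flow_rate")
        if flow is not None and not (low <= flow <= high):
            report.out_of_range += 1
    return report


# Example batch with one missing reading, one missing identifier and one implausible value.
batch = [
    {"pump_id": "A1", "flow_rate": 12.5},
    {"pump_id": "A2", "flow_rate": None},
    {"pump_id": None, "flow_rate": 900.0},
]
report = check_batch(batch)
print(report.missing, report.out_of_range, report.missing_rate)
```

In practice, counters like these would typically be tracked over time and fed back to the people who own the data capture process, so that quality problems are fixed at the source rather than patched in every model.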
MLOps: making it a culture
Once the data is in the hands of the data scientists or machine learning engineers, magic is expected to happen, provided there is a clear business goal and a solid data foundation. Developing data science solutions shares some similarities with software development, but it is also different. Two of the most critical differences that separate ML solutions from conventional software are that the data can change, and that models need to be retrained, maintained and governed to keep the fidelity of the solution. Thus, there is a need to move from DevOps to MLOps/AIOps [2]. This enables data science experimentation before production and treats data and model management as an integral part of the data science solution. Tools like Azure Data Factory, Azure Machine Learning Services, Azure DevOps, Databricks, MLflow, Kubeflow and GitLab can be good choices depending on the scenario.
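The point that data can change and models then need retraining can be illustrated with a small sketch. It assumes a deliberately simple mean-shift check on a single numeric feature, which is a stand-in for illustration and not a full drift detection method, nor something taken from the tools listed above. The function names, the threshold of 2.0 and the sample values are hypothetical.

```python
from statistics import mean, stdev


def mean_shift(reference, current):
    """Distance between the current and reference means, in reference standard deviations."""
    ref_std = stdev(reference)
    if ref_std == 0:
        raise ValueError("reference data has zero variance")
    return abs(mean(current) - mean(reference)) / ref_std


def needs_retraining(reference, current, threshold=2.0):
    """Flag the model for retraining when the feature mean has moved past the threshold.

    The threshold is an assumed value and would need tuning per use case.
    """
    return mean_shift(reference, current) > threshold


# Hypothetical feature values seen at training time and in two later production windows.
training_flow = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8]
stable_flow = [10.1, 9.9, 10.4, 10.0]
drifted_flow = [14.0, 15.2, 14.8, 15.5]

print(needs_retraining(training_flow, stable_flow))
print(needs_retraining(training_flow, drifted_flow))
```

A check of this kind would normally run as a scheduled step in the pipeline, with a positive result triggering a retraining job or a review rather than an automatic model swap.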
Summary: slow down to run faster
In summary, data science is gaining more trust in different business domains, although a large portion of data science projects never make it into production. We need a better understanding of, and a better process for, what it takes to drive a successful business use case that harvests the data and data science capabilities. A clear objective, measurable performance indicators, an actionable plan, good team composition, and agility in the ways of working can be great enablers. The business side needs to invest in the right skills before rushing into production. Slow down to clear technical debt as early and as much as possible. Then you will have a better foundation to deliver a scalable and manageable solution that can meet fast-changing customer demand and provide a better experience for the end-user.
By: Lishuai Jing
Senior Data Scientist, Grundfos
About the author
Lishuai Jing works with advanced statistical inference methods, machine learning and deep learning techniques, and general AI at Grundfos, the world's market-leading pump manufacturer. He holds a PhD in statistical signal processing and wireless communication from Aalborg University, Denmark. Before joining Grundfos, he was a researcher on 4G/5G and IoT communication technologies. He advocates a data- and AI-driven approach to improving business intelligence, operational efficiency, productivity and customer satisfaction.
[1] https://www.go-fair.org/fair-principles/
[2] https://github.com/microsoft/MLOps/blob/master/MLOps%20whitepaper.pdf