This is the first article in a series of three, which focus on production ML and the intersection between data science and engineering. The other two are Trawling Twitter for Trollish Tweets and Deploying an ML Model to Production using GCP and MLFlow.
One of the most exciting things in machine learning (ML) today, for me at least, is not at the bleeding-edge of deep learning or reinforcement learning. Rather it has more to do with how models are managed and how data scientists and data engineers effectively collaborate as teams. Navigating those waters will lead organisations towards a more effective and sustainable application of ML.
Sadly, there is a divide between “scientist” and “engineer”. A wall, so to speak. Andy Konwinski, Co-founder and VP of Product at Databricks, along with others, points to some key hurdles in a recent blog post about MLFlow. “Building production machine learning applications is challenging because there is no standard way to record experiments, ensure reproducible runs, and manage and deploy models,” says Databricks.
The genesis of many major challenges in applying ML today, whether technical, commercial, or societal, is the way data shifts over time, coupled with how ML artifacts are managed and used. A model can perform exceptionally well, but if the underlying data drifts and artifacts are not being used to assess performance, your model will neither generalise well nor be updated appropriately. This problem falls into a gray area that is inhabited by both data scientists and engineers.
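To make “the underlying data drifts” concrete, here is a minimal sketch of one common drift check, the Population Stability Index (PSI), using only the Python standard library. It is an illustration rather than the method used in this series: the function name, the ten-bin default, and the 0.25 alert threshold (a widely quoted rule of thumb) are all assumptions.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
) -> float:
    """Compare two samples of one feature; larger values mean more drift."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for a constant feature

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Hypothetical feature values: what the model was trained on vs. what it sees live.
training = [i / 100 for i in range(1000)]
live = [5 + i / 100 for i in range(1000)]  # same shape, shifted upwards

psi = population_stability_index(training, live)
if psi > 0.25:  # assumed rule-of-thumb threshold, tune per use case
    print(f"PSI={psi:.2f}: input data has drifted, reassess the model")
```

A check like this only has value if it runs on a schedule and its result is stored alongside the model's other artifacts, which is exactly where scientist and engineer need to meet.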
In other words, the crux of the problem is that the principles of CI/CD are missing in ML. It doesn’t matter how good a ‘black box’ model you can create: if its environment changes (the input data, for example) and the model isn’t regularly assessed against what it was built to do, it will lose its relevance and value over time. This is a hard issue to tackle because the people who feed the data in (engineers) and the people who design the model (scientists) don’t have the happiest of marriages.
There are tangible examples of this challenge. Think about all those predictions saying Hillary Clinton was going to win, amongst several other ML goofs. From a self-driving car killing a pedestrian to prejudiced AIs, there have been some large missteps, which I would argue generally have their origins in the gray area between data science and engineering.
That said, negative and positive alike, ML impacts our society. More positive, and slightly less commercial, examples include the electricityMap, which uses ML to map the environmental impact of electricity all over the world; ML in cancer research, which is helping us to detect several cancer types earlier and more accurately; and AI-driven sensors, which are helping agriculture to meet skyrocketing global demand for food.
With that in mind, it’s critical to get production ML and more specifically model management right. However, coming back to the point, data scientists and data engineers don’t always speak the same language.
It is not uncommon for a data scientist to lack an understanding of how their models should live in an environment that continuously ingests new data, integrates new code, is called by end-users, and can fail in a variety of ways from time to time (i.e. a production environment). On the other side of the divide, many data engineers do not know enough about machine learning to grasp what they are putting into production and the ramifications for the organisation.
Far too often, these two roles have operated without enough consideration for one another, despite the fact that they occupy the same space. “That’s not my job” is not the right approach. To produce something that is reliable, sustainable, and adaptable, both roles must work together more effectively.
Scaling the Wall
The first step to speaking each other’s language is to build a common vocabulary — to have some kind of standardisation of the semantics, and therefore how the challenge is, or tangential challenges are, discussed. Naturally, this is fraught with challenges — just ask several different people what a data lake is and you’re likely to get at least two different answers, if not more.
I’ve developed common reference points that I call the ProductionML Value Chain and ProductionML Framework.
We’ve broken the process of productionising ML into five overlapping concepts which are too often considered separately. Whilst it may seem that introducing a holistic framework like this would increase complexity and interdependency, in practice those complexities and interdependencies already exist, and ignoring them is just kicking the problem down the line.
By allowing for consideration of neighbouring concepts in the design of your production ML pipeline, you begin to introduce that elusive reliability, sustainability, and adaptability.
The ProductionML Value Chain is a high-level description of what is required to operate a data science and engineering team for the purpose of deploying models to end users. There is naturally a more technical and detailed view, which I call the ProductionML Framework (some might call this Continuous Intelligence).
This framework was developed after several rounds of experimentation with commercial MLOps tools, open source options, and the development of an internal PoC. It is meant to guide the future development of ProductionML projects, particularly the aspects of production ML that require input from both data scientists and engineers.
If you’re not familiar with those aspects, data science is shown in orange and data engineering / DevOps in blue.
As you can see, the “Training Performance Tracking” mechanism (e.g. MLFlow) and the Govern mechanism are centrally situated in this architecture. That is because every artifact, including metrics, parameters, and graphs, must be archived during the training and testing stages. Moreover, what is called Model Management is fundamentally tied to how the model is governed, which leverages those model artifacts.
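As a rough illustration of what “every artifact must be archived” means in practice, the sketch below writes each training run's parameters and metrics to its own directory using only the Python standard library. In a real pipeline a tracking tool such as MLFlow would play this role; the function names, the `run.json` layout, and the example values here are hypothetical stand-ins, not any tool's actual format.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path
from typing import Dict, List


def archive_run(
    root: Path,
    params: Dict[str, object],
    metrics: Dict[str, float],
) -> Path:
    """Write one training run's parameters and metrics to its own directory."""
    run_id = uuid.uuid4().hex
    run_dir = root / run_id
    run_dir.mkdir(parents=True)
    record = {
        "run_id": run_id,
        "logged_at": time.time(),
        "params": params,
        "metrics": metrics,
    }
    (run_dir / "run.json").write_text(json.dumps(record, indent=2))
    return run_dir


def load_runs(root: Path) -> List[dict]:
    """Read every archived run back, e.g. for a later governance step."""
    return [json.loads(p.read_text()) for p in sorted(root.glob("*/run.json"))]


# Example: two hypothetical training runs archived side by side.
with tempfile.TemporaryDirectory() as tmp:
    archive_root = Path(tmp)
    archive_run(archive_root, {"max_depth": 3}, {"f1": 0.81})
    archive_run(archive_root, {"max_depth": 5}, {"f1": 0.84})
    for run in load_runs(archive_root):
        print(run["run_id"], run["params"], run["metrics"])
```

The point of the design is that nothing about a run lives only in a notebook: whatever later decides which model to promote can read the same records the data scientist produced.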
The Govern mechanism combines artifacts and business rules to promote the appropriate model, or estimator to be more specific, to production while labeling others according to rules specific to the use case. This is also called model versioning, but the term ‘govern’ is used to avoid confusion with version control and emphasise the central role that the mechanism plays in overseeing model management.
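A Govern step of this kind can be sketched in a few lines. The example below is a simplified assumption of how artifacts (here, a single metric per candidate) and a business rule (a minimum score) might combine to promote one estimator and label the rest. The class name, the label names, the `f1` metric, and the 0.80 threshold are all illustrative choices, not part of any particular tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    name: str
    metrics: Dict[str, float]
    label: str = "unlabelled"


def govern(
    candidates: List[Candidate],
    metric: str = "f1",
    min_score: float = 0.80,
) -> Optional[Candidate]:
    """Label every candidate and return the one promoted to production."""
    for c in candidates:
        # Business rule: a candidate must clear the minimum score to be kept.
        passes = c.metrics.get(metric, float("-inf")) >= min_score
        c.label = "staging" if passes else "rejected"
    eligible = [c for c in candidates if c.label == "staging"]
    if not eligible:
        return None  # nothing clears the rule, so nothing is promoted
    best = max(eligible, key=lambda c: c.metrics[metric])
    best.label = "production"
    return best


# Hypothetical candidates, e.g. rebuilt from archived training runs.
candidates = [
    Candidate("logreg-v1", {"f1": 0.78}),
    Candidate("gbm-v2", {"f1": 0.86}),
    Candidate("gbm-v3", {"f1": 0.83}),
]
promoted = govern(candidates)
for c in candidates:
    print(c.name, c.label)
```

Keeping the rule in code, rather than in someone's head, is what lets both roles see and challenge why a given model is the one in production.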
A Golden Gun?
We’re all on this journey together. We’re all trying to scale the wall. There are a lot of great tools entering the market, but to date, no one has a golden gun…
MLFlow makes great strides from my perspective: it answers certain questions around model management and artifact archiving. Other products similarly address relatively specific issues, although their strengths may lie in other parts of the ProductionML Value Chain. This can be seen in Google Cloud ML Engine and AWS SageMaker. Recently, GCP made a beta version of AutoML Tables available, but even that does not deliver everything required out of the box, although it does come much closer.
With that continued disparity in mind, it is absolutely critical to have a common vocabulary and framework as a foundation between scientist and engineer.
Is the wall too tall? From my experience, the answer is no, but that’s not to say ProductionML is not complex.
This article is the first in a three-part series related to ProductionML. Stay tuned for the next two.
Obligatory James Bond Quotes
M: So if I heard correctly, Scaramanga got away — in a car that sprouted wings!
Q: Oh, that’s perfectly feasible, sir. As a matter of fact, we’re working on one now.
Perhaps that’s how you should get over that wall…