The Future of Machine Learning Will Include a Lot Less Engineering
AI is a systems engineering problem.
Building a useful machine learning product involves creating a multitude of engineering components, only a small portion of which involve ML code. A lot of the effort involved in building a production ML system goes into things like building data pipelines, configuring cloud resources, and managing a serving infrastructure.
Traditionally, research in ML has focused on creating better and better models, pushing the state of the art in fields like language modeling and image processing. Far less attention has been directed toward best practices for designing and implementing production ML applications at a systems level. Yet these systems-level design and engineering challenges are still very important: creating something useful requires more than building good models; it requires building good systems.
ML in the real world
In 2015, a team at Google [1] created the following graphic:
It shows the amount of code in real-world ML systems dedicated to modeling (the small black box) compared to the code required for the supporting infrastructure and plumbing of an ML application. [1] This graphic isn’t all that surprising. For most projects, the majority of headaches involved in building a production system don’t come from classic ML problems like over- or under-fitting, but from building enough structure around the model to allow it to work as intended.
Production ML systems
Building a production ML system comes down to building a workflow: a series of steps from data ingestion to model serving, where each step works in tandem with the others and is robust enough to function in a production environment.
The workflow starts from some data source and includes all of the steps necessary to create a model endpoint — preprocessing the input data, feature engineering, training and evaluating a model, pushing the model to a serving environment, and continuously monitoring the model endpoint in production.
The feature engineering > training > tuning part of this workflow is generally considered the ‘art’ of machine learning. For most problems, the number of possible ways to engineer features, construct a model architecture, and tune hyper-parameter values is so vast that data scientists/ML engineers rely on some mixture of intuition and experimentation. The modeling process is also the fun part of machine learning.
Modeling vs engineering
This modeling process is largely unique to each use case and problem domain. If you’re training a model to recommend content on Netflix, the modeling process will be very different from the one you’d use to build a customer-service chatbot. Not only will the format of the underlying data differ (a sparse matrix vs. raw text), but the preprocessing, model building, and tuning steps will differ as well. But while the modeling process is largely unique across use cases and problem domains, the engineering challenges are largely identical.
No matter what type of model you’re putting into production, the engineering challenges of building a production workflow around that model will be largely the same.
The homogeneity of these engineering challenges across ML domains is a big opportunity. In the future (and for the most part today), these engineering challenges will be largely automated. The process of turning a model created in a Jupyter notebook into a production ML system will continue to get much easier. Purpose-built infrastructure won’t have to be created to address each of these challenges; rather, the open-source frameworks and cloud services that data scientists/ML engineers already use will provide these solutions under the hood.
Sourcing data at scale
All production ML workflows start with a data source. The engineering challenges involved with sourcing data are usually around scale: how do you import and preprocess datasets that are too large to fit into memory, often from a variety of sources?
Open-source machine learning frameworks have largely solved this problem through the development of data loaders. These tools (including TensorFlow’s tf.data API and PyTorch’s DataLoader class) load data into memory piecemeal and can be used with datasets of virtually any size. They also offer on-the-fly feature engineering that scales to production environments.
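As an illustration, here is a minimal sketch of a tf.data input pipeline, assuming a recent TensorFlow 2.x release; the Cloud Storage path and column names are hypothetical. It streams sharded CSV files in batches and applies a feature transformation on the fly instead of loading the full dataset into memory:

```python
import tensorflow as tf

# Stream batches from sharded CSV files that never need to fit in memory at once.
dataset = tf.data.experimental.make_csv_dataset(
    file_pattern="gs://my-bucket/training-data/part-*.csv",  # hypothetical path
    batch_size=256,
    label_name="label",  # hypothetical label column
    num_epochs=1,
)

def engineer_features(features, label):
    # On-the-fly feature engineering: log-scale a skewed numeric column.
    features["log_amount"] = tf.math.log1p(tf.cast(features["amount"], tf.float32))
    return features, label

# Each batch is transformed as it is read, and the next batch is prefetched
# while the current one is consumed by the training loop.
dataset = dataset.map(engineer_features).prefetch(tf.data.AUTOTUNE)
```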
Accelerating model training
A lot of work in the ML community has gone into reducing the time required to train large models. For large training jobs, it’s common practice to distribute training operations across a group of machines (training cluster). It’s also common practice to use specialized hardware (GPUs and now TPUs) to further reduce the time required to train a model.
Traditionally, revising model code to distribute training operations across multiple machines and devices has not been straightforward. To actually see the efficiency gains from a cluster of machines and specialized hardware, the code has to split matrix operations intelligently across devices and combine the resulting parameter updates at each training step.
Modern-day tools have made this process much easier. The TensorFlow Estimator API radically simplifies the process of configuring model code to train on a distributed cluster. With the Estimator API, setting a single argument automatically distributes the training graph across multiple machines/devices.
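A minimal sketch of that single argument, assuming a TensorFlow version where tf.estimator and tf.distribute are available (Estimator support has since been deprecated); the feature column and network shape are hypothetical:

```python
import tensorflow as tf

# MirroredStrategy replicates the model across the GPUs on one machine;
# other strategies cover multi-worker clusters and TPUs.
strategy = tf.distribute.MirroredStrategy()

# Passing the strategy through RunConfig distributes training without
# changes to the model code itself.
config = tf.estimator.RunConfig(train_distribute=strategy)

estimator = tf.estimator.DNNClassifier(
    feature_columns=[tf.feature_column.numeric_column("log_amount")],
    hidden_units=[128, 64],
    n_classes=2,
    config=config,
)
```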
Tools like AI Platform Training offer on-demand resource provisioning to train a model on a distributed cluster. Multiple machines and device types (high-performance CPUs, GPU devices, TPUs) can be provisioned for a training job with a bash shell command.
Portable, scalable, and repeatable ML experiments
Creating an environment that allows for both rapid prototyping and standardized experimentation presents a litany of engineering challenges.
The process of hyper-parameter tuning (adjusting values like the learning rate or tree depth to try to lower the validation error) is not reliable unless there’s a clear way to repeat past experiments and associate model metadata (the hyper-parameter values) with an observed evaluation metric. The ability to iterate quickly and run efficient experiments requires training at scale, with support for distribution and hardware accelerators. In addition, the process of experimentation becomes unmanageable if ML code is not portable: experiments cannot be replicated by other team members/stakeholders, and models in production cannot be re-trained as new data becomes available.
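As a generic illustration of the metadata half of this problem (not any particular library’s API), here is a sketch of recording the hyper-parameter values and observed evaluation metric for each run, so past experiments can be repeated and compared; the parameter names and metric values are hypothetical:

```python
import json
import time
from pathlib import Path

def record_experiment(params: dict, metrics: dict, out_dir: str = "experiments") -> Path:
    """Persist one run's hyper-parameters and evaluation metrics so the
    experiment can be reproduced and compared against later runs."""
    run_id = time.strftime("%Y%m%d-%H%M%S")
    path = Path(out_dir) / f"run-{run_id}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"params": params, "metrics": metrics}, indent=2))
    return path

# Hypothetical values from a single training run.
record_experiment(
    params={"learning_rate": 0.1, "max_depth": 6, "num_rounds": 200},
    metrics={"validation_rmse": 4.37},
)
```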
Personally, I work on the team building containers for AI Hub, and we’re working to help solve some of these challenges. We build high-performance implementations of ML algorithms (XGBoost, ResNet, etc.) as Docker containers. The containers offer native support for AI Platform and save model metadata by default, providing a repeatable process for running experiments. The containers support distributed training and can be run with GPU or TPU devices. They’re also portable: the containers can be run anywhere, by anyone, as long as Docker is installed.
Serving infrastructure
Production ML systems require scale on both ends — scale for sourcing data and model training, as well as scale for model serving. Once a model has been trained it has to be exported to an environment where it can be used to generate inferences. Just as a consumer website needs to handle large fluctuations in web traffic, a model endpoint has to be able to handle fluctuations in prediction requests.
Cloud tools like AI Platform Prediction offer a scalable solution for model serving. The elastic nature of cloud services allows the serving infrastructure to scale up or scale down based on the number of prediction requests. These environments also allow the model to be continuously monitored, and test procedures can be written to inspect the model’s behavior while in production.
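For example, here is a minimal sketch of sending an online prediction request to a model deployed on AI Platform Prediction, assuming the google-api-python-client package is installed and application default credentials are configured; the project, model, and instance fields are hypothetical:

```python
from googleapiclient import discovery

# Build a client for the AI Platform (ml.googleapis.com) REST API.
service = discovery.build("ml", "v1")
name = "projects/my-project/models/my_model"  # hypothetical project and model

# The serving infrastructure behind this endpoint scales with request volume.
response = (
    service.projects()
    .predict(name=name, body={"instances": [{"log_amount": 3.2}]})
    .execute()
)

if "error" in response:
    raise RuntimeError(response["error"])

print(response["predictions"])
```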
The future of better ML systems
In the future, building ML products will be more fun, and these systems will work better. As automated tools around ML continue to improve, data scientists and ML engineers will get to spend more of their time building great models and less of it on the tedious but necessary tasks surrounding production ML systems.
References:
[1] D. Sculley et al., “Hidden Technical Debt in Machine Learning Systems,” Advances in Neural Information Processing Systems (NIPS), 2015.