Deploying Apache Spark Supervised Machine Learning Models to Production with MLeap

IMPACTS

Our architecture to host supervised machine learning models with MLeap benefits Kount’s external and internal customers by:

  • Reducing average response time to generate transactional predictions from 19.27ms to 7.00ms (while the 99th percentile dropped from 37ms to 16ms).
  • Increasing the scalability and reliability of the feature extraction data pipeline.
  • Enhancing certainty through model governance and lifecycle management.
  • Enabling training with hundreds of machine learning features across tens of millions of transactions.
  • Abstracting model serialization, thus enabling portability.

SOLUTION

  • Transaction predictions are load-balanced across a cluster of API servers hosting MLeap runtimes.
  • The transaction prediction process scales independently of the ML data processing pipeline.
  • API servers hosting the MLeap runtimes self-configure through auxiliary microservices and a central SQL model governance database (see the sketch after this list).
  • Spark, MLlib, YARN, and HDFS handle large data sets through distributed processing and storage.
  • MLeap integrates directly into Spark as a package and provides its own serialization mechanism, with no need for external libraries.
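
To make the self-configuration described above concrete, here is a minimal, hypothetical sketch (not Kount’s actual implementation): on startup, a prediction server asks the model governance database which approved model bundles it should serve and on which ports. The table and column names below are assumptions for illustration only.

    import sqlite3  # stand-in for the central SQL model governance database

    def fetch_serving_plan(db_path="governance.db"):
        """Ask the governance DB which approved models to load, and where."""
        conn = sqlite3.connect(db_path)
        try:
            # Hypothetical schema: one row per approved model deployment.
            rows = conn.execute(
                "SELECT model_name, bundle_path, tcp_port "
                "FROM model_deployments WHERE status = 'approved'"
            ).fetchall()
        finally:
            conn.close()
        return [{"model": m, "bundle": b, "port": p} for m, b, p in rows]

    for plan in fetch_serving_plan():
        # In the real architecture, an auxiliary microservice would now start
        # an MLeap runtime bound to plan["port"] and load plan["bundle"].
        print("would serve", plan["model"], "on port", plan["port"])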

PROBLEM

Since introducing Kount Boost Technology™ in October 2017, Kount has been researching improvements to the Boost supervised machine learning model, leveraging distributed tools such as Apache Spark and HDFS. Consequently, we faced the challenging opportunity of architecting a technical solution to bring a Spark-generated machine learning model into our production environment.

Using a SparkContext in a production environment carries large overhead and is not efficient for transactional operations. Through research into how to “productionize” Spark-generated machine learning models, we discovered MLeap, an active open source project designed for precisely this use case. Copying blatantly from MLeap’s documentation: MLeap is “a common serialization format and execution engine for machine learning pipelines. It supports Spark, Scikit-learn and Tensorflow for training pipelines and exporting them to an MLeap Bundle. Serialized pipelines (bundles) can be deserialized back into Spark, Scikit-learn, TensorFlow graphs, or an MLeap pipeline for use in a scoring engine (API Servers).”

Prior to using Spark to generate the Boost supervised machine learning model, we used Python3 with Scikit-learn. This approach worked well for our first iteration (and frankly is great for academic, research, and educational purposes), but it quickly ran into limitations: lack of portability, inability to scale, no support for multiple models, and limited model governance. Leveraging our data warehouse and Python together made the entire data pipeline fragile, which in turn made the training lifecycle lengthy and unpredictable. That first iteration set the baseline for our continuous improvement effort, which targeted the following goals:

  • “Real-time” / low latency transaction model predictions
  • “Real-time” state change detection
  • Support blue-green deployments
  • Robust machine learning model governance
  • Efficient scalability
  • Maximize reliability
  • Enhance monitoring and metrics
  • KISS principle

EVIDENCE

Response Time

Using Python3 with Scikit-learn, we gathered the following response time statistics and accompanying distribution for generating transactional predictions from our production environment on 6/4/2018, from 12am until 12pm PT: an average of 19.27ms, with a 99th percentile of 37ms.

In the new world of utilizing Apache Spark on top of HDFS to build Boost’s supervised machine learning models, complemented by MLeap to put those models into production, we designed several proof-of-concept architectures and load tested them to verify that the model prediction response times would (hopefully!) beat our first iteration’s. The fastest results were promising. In fact, they were AMAZING:

This was a significant turning point in our confidence that we could deliver supervised machine learning model predictions to our customers faster than ever before.

Using MLeap, we gathered the following response time statistics and accompanying distribution for generating transactional predictions from our production environment on 7/30/18, from 12am until 12pm PT: an average of 7.00ms, with a 99th percentile of 16ms.

SCALABILITY, RELIABILITY, AND MODEL GOVERNANCE

Upon designing the new architecture, we immediately ran into a scalability concern: the then-current MLeap serving architecture could load only one MLeap Pipeline (MLeap’s name for a “model”) into a running instance of the MLeap Runtime. On top of that, we could not configure which interface and TCP port the MLeap runtime bound to on startup. This made it difficult to host multiple models on one MLeap API server, or even to migrate from one model to another without downtime for our customers. Due to this lack of functionality, Kount submitted a pull request to make these two parameters (interface and port) configurable through environment variables, which satisfied our scalability requirements.
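
As a rough illustration of what that configurability enables, the sketch below starts one MLeap runtime per TCP port by passing the interface and port through the environment. The variable names (MLEAP_BIND_INTERFACE, MLEAP_BIND_PORT) and the jar name are hypothetical placeholders, not the actual parameters from the pull request.

    import os
    import subprocess

    def start_runtime(port, jar="mleap-serving.jar", interface="127.0.0.1"):
        """Launch one MLeap runtime bound to the given interface and port."""
        env = dict(os.environ,
                   MLEAP_BIND_INTERFACE=interface,  # hypothetical variable name
                   MLEAP_BIND_PORT=str(port))       # hypothetical variable name
        return subprocess.Popen(["java", "-jar", jar], env=env)

    # One runtime per model/port pairing, e.g. blue on 10000, green on 10001.
    runtimes = [start_runtime(port) for port in (10000, 10001)]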

With no critical path issues remaining, highlights of the resulting architecture:

  • How the “Transaction Prediction Process Instances” cluster is completely separated from the “Feature Extraction Processor Instances” cluster.
  • Multiple models can be loaded into a single “Transaction Prediction Process” instance, allowing for blue-green deployments: production traffic can be switched to a new supervised machine learning model without downtime.
  • Model governance SQL database provides authoritative development, implementation, validation, approval, modification, retirement, and inventory of machine learning models used throughout the entire ecosystem.
  • From the load balancer’s perspective, each TCP port (e.g. 10000) MUST BE loaded with the SAME model across all MLeap instances running on that TCP port. Load balancer health checks make a prediction on a test transaction, validating that backend server’s availability (a sketch of such a check follows this list).
  • The load balancer is deployed as a “sidecar proxy”: the Feature Extraction Process instance does not care where MLeap is hosted, since each transaction’s model prediction request is destined for localhost.
  • MLeap Runtimes run in a Java 8 Virtual Machine (JVM) with GC-optimized startup parameters loosely based on this article.
  • JVM metrics are gathered with Jolokia on a per-port basis.
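
To illustrate the health check mentioned above, here is a minimal, hypothetical sketch. It assumes the runtime exposes an HTTP scoring endpoint like MLeap serving’s /transform, and that test_transaction.json holds a canned LeapFrame for a known test transaction; the file name and response handling are illustrative.

    import json
    import urllib.request

    def mleap_health_check(port, frame_path="test_transaction.json", timeout=2.0):
        """POST a canned test transaction to the local MLeap runtime."""
        with open(frame_path, "rb") as f:
            body = f.read()  # serialized LeapFrame for a known test transaction
        req = urllib.request.Request(
            "http://localhost:%d/transform" % port,
            data=body,
            headers={"Content-Type": "application/json"},
        )
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                result = json.load(resp)
        except Exception:
            return False  # connection refused, timeout, or malformed response
        return "rows" in result  # a scored LeapFrame comes back with rows

    if __name__ == "__main__":
        print("port 10000 healthy:", mleap_health_check(10000))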

MLEAP INTEGRATION WITH APACHE SPARK

In the Python3 and Scikit-learn iteration, we used joblib to pickle the Python3 model object and save it to disk, making the supervised machine learning model portable. Joblib is great, as long as the system doing the exporting and the system doing the importing run the same version of joblib. This added to the complexity of upgrading Python3 module versions.
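
For illustration, a minimal sketch of that first-iteration approach; the model, file name, and training data below are hypothetical stand-ins.

    import numpy as np
    import joblib  # older scikit-learn stacks used sklearn.externals.joblib
    from sklearn.linear_model import LogisticRegression

    # Hypothetical training data standing in for extracted transaction features.
    X = np.random.rand(100, 4)
    y = np.random.randint(0, 2, size=100)
    model = LogisticRegression().fit(X, y)

    # Exporting side: pickle the fitted estimator to disk.
    joblib.dump(model, "boost_model.joblib")

    # Importing side: only safe when the joblib and scikit-learn versions match
    # the exporting system -- the version-coupling pitfall described above.
    restored = joblib.load("boost_model.joblib")
    print(restored.predict(X[:5]))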

Through the easy integration of MLeap with Apache Spark, our Data Scientists were able to create a common MLeap Bundle. Bundles “provide a common serialization format for a large set of ML feature extractors and algorithms that are able to be exported and imported across Spark, Scikit-learn, Tensorflow and MLeap” (MLeap documentation). To start a PySpark Jupyter notebook with Python3 and MLeap support for research and design purposes, it’s as simple as including the MLeap Spark package in your pyspark2 command. For example, here is how you might use that configuration with MLeap v0.12.0:
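
    # Assumes Spark 2.x on Scala 2.11; adjust the artifact to your versions.
    pyspark2 --packages ml.combust.mleap:mleap-spark_2.11:0.12.0

Once the notebook is up, a fitted Spark pipeline can be exported to an MLeap Bundle in a few lines. The following sketch builds a toy pipeline inside that pyspark2 session (where a SparkSession named spark already exists); the features, pipeline stages, and bundle path are hypothetical.

    import mleap.pyspark  # activates MLeap's Spark integration; import first
    from mleap.pyspark.spark_support import SimpleSparkSerializer  # adds serializeToBundle()
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler

    # Hypothetical two-feature frame standing in for extracted transaction features.
    df = spark.createDataFrame(
        [(0.0, 1.2, 0.0), (1.5, 0.3, 1.0), (0.7, 0.9, 0.0), (1.1, 0.2, 1.0)],
        ["f1", "f2", "label"])
    pipeline = Pipeline(stages=[
        VectorAssembler(inputCols=["f1", "f2"], outputCol="features"),
        LogisticRegression(featuresCol="features", labelCol="label"),
    ])
    fitted = pipeline.fit(df)

    # Export the fitted pipeline as an MLeap Bundle (a zip file on local disk).
    fitted.serializeToBundle("jar:file:/tmp/boost-model.zip", fitted.transform(df))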

MLEAP PITFALLS

I got your attention, right? Well, even though MLeap is relatively new on the open source block (its first release was in September 2016), the resulting product is nothing short of impressive. The only thing I would change about MLeap is to make the pronunciation more obvious… is it MLeap? M’Leap? M-Leap? MLe-ap? I guess until it’s documented, we’ll never know!

ABOUT THE AUTHOR

Noah Pritikin
Noah Pritikin is a Site Reliability Engineer for Kount.

Kount helps businesses boost sales by reducing fraud. Our all-in-one SaaS platform simplifies fraud detection by applying patented machine learning through Kount’s proprietary platform, offering maximum protection for some of the world’s best-known brands. Companies using Kount accept more orders from more people in more places than ever before. For more information about Kount, please visit www.kount.com.