Deploying Machine Learning Models: Serving AI at Scale

John Mastro
Jan 30

Our Sibyl API can serve AI predictions with a 50 millisecond response time. A developer calling the Sibyl API only has to specify an object ID (e.g., give me the prediction for person 0989k3dd84) instead of specifying all of the model’s input parameters. Sibyl supports many predictors with many versions each, and it offers data scientists a robust workflow for model deployment.
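For illustration, a call to a Sibyl-style API might look like the sketch below. The endpoint path, payload shape, and response fields are our own invention, not Sibyl's actual API; the point is that the caller passes only an entity identifier, and the service assembles the model's inputs server-side.

```python
import requests

# Hypothetical request to a Sibyl-like prediction service. The URL, JSON
# fields, and response shape are illustrative assumptions.
response = requests.post(
    "https://sibyl.internal.example/predictors/ltv/predict",
    json={"entity_id": "0989k3dd84"},
    timeout=1.0,
)
response.raise_for_status()
print(response.json())  # e.g. {"prediction": 123.4, "version": 2}
```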

Introducing Sibyl

In this (vague) illustration of what it takes to build an AI model, Sibyl is focused on the last step: deploying AI models to production.

Creating and productionizing a useful AI system involves at least three stages:

  1. Prepare the data that the AI will learn from. This is more complicated than merely pointing an AI system at your database and will likely involve a Data Scientist carefully processing the data to maximize the AI’s ability to learn from it. This is a very interesting and subtle process, but its details are out of scope for this article.
  2. Build, train, and evaluate the model. This will likely involve a Data Scientist testing different algorithms and different parameters to those algorithms, and will in fact likely be an iterative process interacting with step 1. This is, again, a fascinating topic, but it’s the next step that Sibyl focuses on and which we’ll describe in this article.
  3. Ultimately, the Data Scientist has produced a useful AI model with strong predictive power; now you need to make that model available for use at scale within your production systems. This is what Sibyl is for and what we’ll discuss for the rest of this article.
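To make the hand-off between steps 2 and 3 concrete, here is a deliberately toy sketch of a Data Scientist producing a deployable artifact. The use of scikit-learn, the file name, and the "LTV" framing are our assumptions for illustration, not details of the Sibyl system.

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Step 2: build and train a model. A toy stand-in for the real, iterative
# work of testing algorithms and parameters against carefully prepared data.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
model = LogisticRegression().fit(X, y)

# Hand-off to step 3: serialize the trained model exactly as produced, so
# the serving system can load and use it without re-implementing anything.
joblib.dump(model, "ltv_model.joblib")
```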

(What is AI?)

Examples of AI in everyday use include FaceID and content recommendations. Netflix customizes not only what movies it thinks you should see but also which cover art for a particular movie it thinks will most appeal to the viewer.

The Oxford English Dictionary defines Artificial Intelligence (AI) as “the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages”.

Artificial Intelligence is an extremely broad topic encompassing many potential approaches and applications. The one we’re interested in for the purposes of this article is what’s known as Machine Learning, which uses mathematical techniques to analyze a dataset to “learn” from it and construct a model that can be used to make predictions based on other, similar data, without being specifically programmed to do so.

Sibyl Design Goals and Successes

First, a quick terminology digression:

  1. A “model” is the actual Python object that was developed and trained by a Data Scientist and exposes a “predict” method.
  2. A “predictor instance” is the model plus related metadata and artifacts, including the preprocessor function, version/provenance information, and the data on which it was trained.
  3. A “predictor” is defined by a particular topic and may be implemented by one or more predictor instances.

So, for example, you may have an “LTV” predictor implemented by two different “predictor instances” which accept different inputs, utilize different algorithms, and were trained on different data.
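One way to picture how these three terms relate is as a pair of simple data structures. The sketch below captures the relationships just described, with field names of our own choosing; it is not Sibyl's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class PredictorInstance:
    """A model plus its related metadata and artifacts."""
    version: int
    model: Any                           # the Python object exposing .predict()
    preprocessor: Callable[[dict], Any]  # turns raw entity data into model inputs
    training_data_ref: str               # provenance: what the model was trained on

@dataclass
class Predictor:
    """A topic (e.g. "LTV") implemented by one or more instances."""
    name: str
    instances: list[PredictorInstance] = field(default_factory=list)
```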

All right, back to Sibyl. When we started designing Sibyl, we had a few high-level goals in mind:

  1. A Sibyl deployment should support many predictors, and in fact many versions of each predictor.
  2. Clients should be served by the newest version of the predictor by default, but should optionally be able to request a specific version (see the sketch after this list).
  3. Deploying a new predictor and/or predictor instance to production should not require updating or redeploying Sibyl itself, only uploading artifacts (the preprocessor function and the model itself) and metadata.
  4. The model and preprocessor should be deployed in exactly the state used and prepared by the Data Scientist, to avoid duplicating work or introducing bugs.
  5. In the most common case, requesting predictions about an existing entity within the Ro platform (e.g. a member or a treatment plan), the API should be about as fast as high-performance marketing APIs: around 50 milliseconds (0.05 seconds).
  6. When requesting a prediction in that common case, the client should be able to identify the object of interest via an identifier (e.g. a UUID used to identify it within the Ro backend) rather than assembling and calculating the model parameters itself.
  7. The production system should auto-scale to support a wide range of potential workloads without manual intervention.
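Goals 2 and 3 together suggest a metadata-driven registry: clients get the newest version unless they pin one, and registering a new version is a data change, not a code deployment. Here is a minimal sketch of that resolution logic, building on the PredictorInstance sketch above; all names are our assumptions.

```python
from typing import Optional

# Hypothetical in-memory registry: predictor name -> {version -> instance}.
# In production this would be populated from uploaded artifacts and metadata,
# so adding a predictor or version never requires redeploying the service
# itself (goal 3).
REGISTRY: dict[str, dict[int, PredictorInstance]] = {}

def resolve(name: str, version: Optional[int] = None) -> PredictorInstance:
    """Return the requested version, or the newest one by default (goal 2)."""
    versions = REGISTRY[name]
    if version is None:
        version = max(versions)  # newest registered version wins
    return versions[version]
```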

Why Build In-House?

In part, because we can do so more cost-effectively, especially given the high demand and usage we expect for this system. More importantly, because the off-the-shelf systems wouldn’t meet some of the goals we described above.

For instance, an off-the-shelf system might not support versioning models: if you upload a new version, the old one is no longer available. You can work around this via naming (put the version in the model name), but then clients who want the new version have to update their code to request it explicitly and adapt to any changes in the required inputs.
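To illustrate the difference, here is a hypothetical client of our own invention (not a real SDK), contrasting the two calling conventions:

```python
from typing import Optional

class SibylClient:
    """A stub client, just to illustrate the two calling conventions."""
    def predict(self, name: str, version: Optional[int] = None, **inputs):
        print(f"requesting {name!r}, version={version or 'newest'}")

client = SibylClient()

# Naming workaround: the version is baked into the model name, so every
# upgrade forces clients to edit this string (and adapt to changed inputs).
client.predict("ltv-v2")

# First-class versioning: clients get the newest version by default and pin
# an older one only when they explicitly want stability.
client.predict("ltv")
client.predict("ltv", version=1)
```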

Speaking of which, an off-the-shelf system will require you to pass the parameters exactly as the model expects them, which may include tens or even hundreds of values. This makes integration significantly more complicated than if the client can simply pass an identifier. You can again work around this, for example by exposing an API that returns the parameters for a given object. However, this adds some latency (your client is now making two requests, one to get the parameters and one to call the model) and at this point running the model yourself isn’t going to add much complexity. Even if your in-house service only calculates the parameters, you still have to handle scaling, versioning (whether via names or a separate concept), and so on.
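Concretely, keeping the feature lookup and the model call inside one service turns two network round trips into one. The sketch below builds on the resolve() sketch above; fetch_entity_data() is a hypothetical stand-in for a query against the platform's data store.

```python
def fetch_entity_data(entity_id: str) -> dict:
    # Hypothetical stand-in for looking up an entity's raw data by its UUID.
    return {"entity_id": entity_id, "age_days": 120, "orders": 3}

def predict_by_id(predictor_name: str, entity_id: str):
    """Serve a prediction from just an identifier, in a single request:
    fetch raw data, preprocess it, and run the model, all server-side."""
    instance = resolve(predictor_name)  # newest version unless pinned
    features = instance.preprocessor(fetch_entity_data(entity_id))
    return instance.model.predict(features)
```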

What’s Next?

In a future post, we’ll talk about how Sibyl fits into a robust Data Science workflow, which is something we’re passionate about: Data Scientists should have supporting automation that makes it easy to maintain a good workflow and handles as many of the bookkeeping, provenance-tracking, and deployment tasks as possible.
