Shipping Serverless AI to Production

Lessons learnt from integrating and deploying Serverless AI models on Google Cloud Platform and Firebase.

Jacob Richter
ShipLab
8 min read · Sep 21, 2018


Our Challenge

We recently completed a project to build a data aggregator platform that could, in quasi-realtime, monitor updates in commentary, news, regulation and laws, analyse these updates, determine their relevance, auto-summarise the changes, and categorise them against a set of approximately 100 tags.

The sources we needed to aggregate were extremely disparate and diverse. We covered everything from RSS feeds, JSON APIs and .NET generated sites, to single page applications, and of course good old HTML. The list went on, spreading over more than 2,000 distinct domains, and amounting to over 200,000 unique pages.

We’ll write a separate article soon about the fully serverless, fault-tolerant data collection process, as well as the platform architecture needed to support API ingestion and scraping-based data collection at such a large scale. We will note now, though, that without serverless this type of platform would have been nearly impossible to build at the scale we achieved, not only in such a short amount of time, but without even surpassing the Firebase Blaze free-tier limits!

The Current State of Serverless AI…

… will not necessarily be discussed here.

If you’ve opened this article to learn about the wonders of serverless, sorry, but you’ve come to the wrong place. There are already plenty of great articles across the internet discussing what serverless is and why it is awesome. This article will focus specifically on our experiences and lessons learnt from deploying AI models on Google Cloud ML Engine, integrating them with Google Cloud Functions and, ultimately, serving the analytics back to our Single Page Application (SPA) hosted on Firebase. It also touches on reducing latency by avoiding cold-starts, and offers some tips for keeping costs low on Cloud ML Engine.

Why go with Google? What about Amazon Web Services? Because, at the time of writing this article, AWS does not offer a dedicated service to build and easily deploy custom AI models in a serverless environment.

Does AWS offer AI capabilities that are easily deployable and serverless? Of course! Services like Amazon Polly, Lex, Comprehend, Rekognition are all serverless and can be integrated with Lambda functions. They don’t, however, allow full flexibility.

Does AWS offer AI capabilities that are easily deployable and fully customisable? Of course! Amazon SageMaker and their preconfigured ML AMIs on EC2 are both examples of this. SageMaker for example allows for one click deployments and serving, through APIs, to VPCs or to Lambda functions. We actually did this a few months ago in a rather interesting integration with Excel for an ML-based business process automation.

Does AWS offer AI capabilities that are easily deployable, fully customisable and serverless? Unfortunately, they do not. This left us in rather a pickle and meant that we had to look elsewhere.

While you can construct a solution on AWS Lambda using TensorFlow.js, there are multiple limitations to doing this, a pretty major one for us being the maximum storage size available to the functions. It would also require you to couple your data science team with your back-end team, with neither team able to make changes independently of the other.

And Google does offer this? Yes! Google Cloud has a range of serverless, big data and AI products in their arsenal. As it turned out, the one that we needed was Cloud ML Engine, described by Google Cloud as “a managed service that enables developers and data scientists to build and bring superior machine learning models to production”.

Integrating and Deploying the model to Google Cloud ML Engine

Now that we knew what platform we were going to use, we needed to build our models, and then integrate and deploy them to Cloud ML Engine.

For this particular use-case, familiarity and past experience led us to build our models in scikit-learn. Again, we won’t go into the finer details of building models with scikit-learn in this article, but if you are keen to know more, scikit-learn has some excellent tutorials for getting started. As the models focused on text classification, we chose sklearn’s pipeline functionality. Each pipeline had two parts: feature extraction and the main classifier. Pipelines make things easier by combining all modelling steps into a single object.
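To make the structure concrete, here is a minimal sketch of the kind of pipeline we mean. The vectoriser, classifier and variable names are illustrative placeholders rather than our production choices; the only hard requirement from Cloud ML Engine at the time was that the exported artifact be named model.joblib (or model.pkl).

```python
# A minimal sketch of a two-step text classification pipeline.
# TfidfVectorizer and LogisticRegression are illustrative choices only.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.externals import joblib  # joblib was still bundled with scikit-learn at the time

pipeline = Pipeline([
    ('features', TfidfVectorizer()),       # feature extraction
    ('classifier', LogisticRegression()),  # main classifier
])

# train_texts is a list of strings, train_labels the matching tags
pipeline.fit(train_texts, train_labels)

# Export the whole pipeline as a single artifact for Cloud ML Engine
joblib.dump(pipeline, 'model.joblib')
```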

To integrate and deploy our scikit-learn models to Cloud ML Engine we were able to follow Google’s documentation, which we found to be quite comprehensive. At this point it is also worth noting that, for local testing at least, gcloud can have difficulty locating your Python install, particularly if you’re using Anaconda. This means you may have trouble testing locally via gcloud prior to deployment.

Some additional points on using Google Cloud ML Engine

  • Aside from it being serverless, we found a massive advantage of Cloud ML Engine was that we could export and deploy the full pipeline from sklearn — there’s no need to implement feature extraction as separate cloud functions.
  • Because of the previous point, all the data science work could be done end-to-end in Python. The data scientist does not have to worry about making sure the result is perfectly replicated by the cloud function, and the backend developer does not have to make sense of how feature extraction works in order to implement it. Decoupling engineering teams in the first instance is a great way to reduce issues when you begin to scale!
  • All in all, deploying the model is simple enough for the data scientist to handle from the terminal, even one not familiar with cloud practices (the example commands after this list give a flavour). There is also the option to deploy without the CLI, for those so inclined, by deploying through the browser in the Google Cloud console.
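As a rough illustration, assuming the pipeline was exported as model.joblib: the bucket, model and version names below are placeholders, and scikit-learn serving on ML Engine was still in beta at the time, so the exact commands and flag values may differ across gcloud versions.

```sh
# Upload the exported pipeline, then create a model and a version from it
gsutil cp model.joblib gs://YOUR_BUCKET/news_classifier/

gcloud ml-engine models create news_classifier --regions us-central1

gcloud ml-engine versions create v1 \
  --model news_classifier \
  --origin gs://YOUR_BUCKET/news_classifier/ \
  --runtime-version 1.10 \
  --framework scikit-learn \
  --python-version 3.5
```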

Integrating the Google Cloud ML Engine model with our serverless back-end functions

This step was somewhat troublesome, owing partly to outdated and incomplete documentation from Google, and partly to the apparent lack of prior attempts by others, be that on Stack Overflow or other forums. Even looking through the full source of the ‘googleapis’ library yielded no fruit.

As it turns out, the solution required a combination of the googleapis package and the google-auth-library package (both found on NPM). Included below is an example class that could do the job. Keep in mind that you’ll need to make edits to this if your inputs are not strings.
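The class below is a minimal sketch of that approach. The project and model names are placeholders, and the sketch leans on googleapis’ bundled auth helper (which is google-auth-library under the hood); depending on your package versions you may need to import google-auth-library directly, and the exact call signatures may differ slightly between releases.

```javascript
// mlEngine.js: a minimal sketch of a Cloud ML Engine prediction client
const { google } = require('googleapis');

class MLEngineClient {
  constructor(project, model) {
    this.modelName = `projects/${project}/models/${model}`;
  }

  // instances: an array of strings (edit this if your model
  // expects something other than raw strings as input)
  async predict(instances) {
    // Uses the default service account credentials of the cloud function
    const authClient = await google.auth.getClient({
      scopes: ['https://www.googleapis.com/auth/cloud-platform'],
    });

    const ml = google.ml({ version: 'v1', auth: authClient });

    const response = await ml.projects.predict({
      name: this.modelName,
      requestBody: { instances },
    });

    return response.data.predictions;
  }
}

module.exports = MLEngineClient;
```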

The above can then be used across any cloud function in Google Cloud Functions. To make requests directly from the front-end, however, it needs to be wrapped in a Callable Cloud Function, as below.
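A minimal sketch of that wrapper, assuming the prediction client above lives in mlEngine.js and that the project, model and function names are placeholders:

```javascript
// index.js (Cloud Functions for Firebase)
const functions = require('firebase-functions');
const MLEngineClient = require('./mlEngine');

const classifier = new MLEngineClient('my-gcp-project', 'news_classifier');

// Callable function: invoked directly from the front-end via the Firebase SDK
exports.classifyText = functions.https.onCall(async (data, context) => {
  // In production, check context.auth and validate data.text here
  const predictions = await classifier.predict([data.text]);
  return { prediction: predictions[0] };
});
```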

In production you should obviously add better error handling and security to these functions, but for now this should be enough to get you started. Now on to the front-end!

Let’s Ship! Integrating the serverless Google Cloud ML Engine model with our front-end SPA

Instantly ship your serverless AI models into production applications using callable cloud functions (source)

We use Redux in almost all of our applications, as it is extremely powerful when your applications and teams begin to scale. Below is an example Redux action creator (we use Redux Thunk to handle promises) that quickly integrates the serverless back-end pipeline above into a front-end web application.
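A sketch of such an action creator, assuming the callable function above is deployed as classifyText; the action type names and file path are placeholders.

```javascript
// actions/classification.js
import firebase from 'firebase/app';
import 'firebase/functions';

export const CLASSIFY_REQUEST = 'CLASSIFY_REQUEST';
export const CLASSIFY_SUCCESS = 'CLASSIFY_SUCCESS';
export const CLASSIFY_FAILURE = 'CLASSIFY_FAILURE';

// Thunk action creator: calls the callable cloud function and dispatches
// the prediction (or the error) into the Redux store
export const classifyText = text => async dispatch => {
  dispatch({ type: CLASSIFY_REQUEST });
  try {
    const classify = firebase.functions().httpsCallable('classifyText');
    const result = await classify({ text });
    dispatch({ type: CLASSIFY_SUCCESS, payload: result.data });
  } catch (error) {
    dispatch({ type: CLASSIFY_FAILURE, error });
  }
};
```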

Reducing latency by avoiding cold-starts

Anyone who has worked with serverless will know how big an impact cold starts have on the user experience.

Want to create an account?
Yeah, that’ll be a five second wait.
Whoops, we lost that conversion.

Unfortunately the cold start on Cloud ML Engine is significantly worse than five seconds. Our measurements showed that our models took between 30 and 60 seconds to warm up, on occasion even surpassing the timeout of the cloud function itself. And unlike AWS Lambda, Google Cloud Functions currently has no built-in way to schedule invocations automatically.

What were we to do…

CRON jobs to the rescue! We use https://cron-job.org for a lot of our CRON automation because it is easy to set up and configure, and quite reliable. If you are after a more precise CRON job, we would recommend setting up the smallest possible Google Compute Engine or Amazon EC2 instance to invoke the function below via its URL. Get used to using CRON jobs on serverless backends: they are your friends, and they keep your functions warm and ready to go for the best user experience possible.

This is an example of a cloud function that can be called from a CRON scheduled job to keep your models warm.
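A minimal sketch, reusing the prediction client and placeholder names from earlier; the dummy input is purely illustrative.

```javascript
// keepWarm.js: hit by the external CRON job on a schedule
const functions = require('firebase-functions');
const MLEngineClient = require('./mlEngine');

const classifier = new MLEngineClient('my-gcp-project', 'news_classifier');

exports.keepModelWarm = functions.https.onRequest(async (req, res) => {
  try {
    // A throwaway prediction is enough to keep the model's nodes warm
    await classifier.predict(['warm-up request']);
    res.status(200).send('Model is warm');
  } catch (error) {
    console.error(error);
    res.status(500).send('Warm-up failed');
  }
});
```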

A warning / A few tips to keep the cost low

We unfortunately got stung here, so we feel compelled to help others avoid making the same mistake. Much like other serverless pricing models on AWS (e.g. Amazon Athena) and Google Cloud, Cloud ML Engine has a ‘minimum’ use time that is charged for each use. At 10 minutes per invocation, this minimum is actually quite substantial. So while the hourly cost of Cloud ML Engine is low ($0.056 USD per node-hour), hitting this limit 200,000 times every 30 minutes works out to be very expensive if you don’t catch it quickly.

Luckily for us we had some spare cloud credit to absorb our ‘little’ mistake. We now batch the prediction requests together every 30 minutes so that each batch is seen as only one invocation.
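The batching itself is simple once the prediction client accepts an array of instances: accumulate pending items, then send them in a single predict call. A rough sketch, reusing the functions and classifier setup from the earlier snippets, where fetchPendingItems and storeResults are placeholders for your own data layer and the schedule is driven by the same CRON approach as above:

```javascript
// classifyBatch.js: invoked every 30 minutes by the CRON job
exports.classifyBatch = functions.https.onRequest(async (req, res) => {
  const items = await fetchPendingItems(); // e.g. unclassified documents from Firestore

  if (items.length === 0) {
    return res.status(200).send('Nothing to classify');
  }

  // One predict call with many instances, rather than one call per document
  const predictions = await classifier.predict(items.map(item => item.text));

  await storeResults(items, predictions);  // persist predictions alongside the documents
  res.status(200).send(`Classified ${items.length} items`);
});
```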

With that said, even keeping the function warm through CRON job invocations every two minutes works out to be $6.72 per day (every two minutes is 720 invocations per day, each billed at the 10-minute minimum, which is 120 node-hours at $0.056). So if you have a few models, things are going to add up quite quickly.

Our advice would be:

  • In cases where you require a modest number of invocations, use the serverless approach discussed above.
  • In cases where you need millions of invocations per day, cannot bulk process (such as realtime fraud / threat detection), and cannot justify the cost above for the ease and scalability, either use dedicated cloud instances inside a VPC or integrate the models into the cloud functions themselves (with TensorFlow.js).

So what’s next?

That’s it for building, deploying and integrating Google Cloud ML Engine. Stay tuned for another article on how you can leverage serverless to build truly awesome applications for very little 💰💰💰.

As always if you have any questions please let us know by leaving your comments below.

Jacob Richter and Adi Sevak are both part of ShipLab, an Australian Product Lab who scope, develop, maintain and upgrade early stage startups’ Web Applications, and ready them for hyper-scale through scalable serverless technologies. Do you have a non-technical team who needs something built or a very-technical team that has no experience in developing, deploying and maintaining user-facing applications? We’d love to hear from you.

Jacob is a software engineer, data scientist and technology consultant. Currently the Head of Engineering at Propeller Aero.