Model Services: for the cloud, from the cloud (Or: Cloud9 + Lambda + Gateway = Data in use)

Scott Breudecheck
Sep 7, 2018 · 6 min read

Data science is only useful when it is actionable. If no one reads an analyst’s report or a recommendation engine sits untouched on an engineer’s laptop, all that was achieved was a high-cost training exercise. At a startup lacking defined specialist teams and established processes, I’ve found that the responsibility to realize impact from data science lies first with me, the data scientist. So in that light, I have been eager to see how AWS’s Lambda could be used to bring data projects into the real world.

Motivation. Or: keep going till you’re no longer blocked

A couple of months ago I came across a fantastic tutorial by Ben Weber on using AWS Lambda to create a callable model. Seriously, it’s worth a read.

But…I couldn’t get it to work. Sure, I could get the ‘hello, world’ version going. But I couldn’t deploy the model as an API. Something was wrong with my environment.

Error message #291 I came across

Every change required zipping a folder, uploading to S3, and repointing the Lambda to pick it up. It would work fine locally, in new environments, on a brand-new EC2 instance; hell, I even bit the bullet and learned Docker to get a perfect environment, but every zip -> S3 upload -> Lambda rebuild resulted in an error, and rarely the same one. I was stuck in a frustrating loop that meant spending more time clicking through multiple services than actually debugging.

Just when I had given up, I heard about AWS Cloud9: a cloud-based IDE with more features than I’m aware of. My main draw was that it can be used to build and test local versions of Lambda functions.

Basic tutorial. Or: “hello, world”

Let’s start with a simple example.

Here’s what we’ll be doing

  1. Create a Cloud9 environment
  2. Create and test a Lambda function locally in the IDE
  3. Deploy the function
  4. Add an API Gateway trigger from the Lambda console
  5. Deploy the API and hit the URL

Let’s get to it

Creating an environment from the Cloud9 console

Navigate to “Cloud9” in the console and click “Create environment”. Create an environment named “anotherLambdaTest” and leave all the settings at their defaults.

Hello, cloud9

In the right-hand tab, go to “AWS Resources” > “Local Functions” and click the λ+ button (“Create a new Lambda function”). Let’s call the function “simpleTutorial” in the application “anotherLambdaTest”, and set it up as empty-python with no trigger.

This should pop up lambda_function.py in the IDE with a barebones function. At this point, we just copy over the simple tutorial script:

def lambda_handler(event, context):
    print(event)
    result = 'Hello from ' + event['queryStringParameters']['msg']
    return { "body": result }

Save it! There’s no autosave!

Ok, so now let’s test it. No need to zip and upload and deploy. Just…click test.

Wow that was painless

A payload window pops up where you can create a custom event. The function runs from the IDE, and you get results and any print messages.
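If you want the same check outside the IDE, here’s a minimal local invocation of the handler (a sketch, assuming you run it from the folder containing lambda_function.py):

# Minimal local smoke test for the handler.
from lambda_function import lambda_handler

# API Gateway delivers query-string parameters under 'queryStringParameters'.
event = {"queryStringParameters": {"msg": "TheWeb"}}
print(lambda_handler(event, None))  # {'body': 'Hello from TheWeb'}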

Ready to make this an API? Click “Deploy the selected Lambda function”.

Et voilà! Your Lambda function exists.

Head over to the Lambda console and select your newly deployed function (it’ll be prefixed “cloud9…” and have been modified less than a minute ago). From here, it’s pretty much the same as any other Lambda: add an API Gateway trigger and configure it (create a new API, make it open, and save the Lambda), then throw a test event against it via the API Gateway test.

Naming things has never been AWS’s strong suit

Finally, hit the actual API by first deploying (“Actions” > “Deploy” > “default”) to generate your API URL. Copy that URL, append your Lambda’s name, and then add the query parameters. In my case: https://atkgc42ggg.execute-api.us-west-2.amazonaws.com/default/cloud9-anotherLambdaTest-simpleTutorial-X4VIU30USH5C?msg=TheWeb
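Or call it programmatically; a sketch using the requests library (swap in your own URL):

# Calling the deployed endpoint from Python; requires `requests` installed.
import requests

url = ("https://atkgc42ggg.execute-api.us-west-2.amazonaws.com/default/"
       "cloud9-anotherLambdaTest-simpleTutorial-X4VIU30USH5C")
resp = requests.get(url, params={"msg": "TheWeb"})
print(resp.text)  # should echo 'Hello from TheWeb'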

Deploying a model. Or: adding data “science”

Here are the steps (this should look familiar)

  1. Create a new Lambda function and application in Cloud9
  2. Install the model’s dependencies into the function folder
  3. Train and pickle the model (logit.pkl)
  4. Update lambda_function.py to load and call the model
  5. Test, deploy, and wire up API Gateway as before

Let’s get to it

Instead of building a new environment from scratch, we can just add on.

Back in Cloud9, add another Lambda function and application. Let’s call both “logitTutorial”.

Our model will need some additional modules. This is not quite as straightforward as one might like, but relative to the normal virtualenv, pip, zip, S3, deploy cycle, it’s a cakewalk. In the application folder, open a terminal and navigate to the function’s folder (~/environment/logitTutorial/).

In this newly opened terminal you can install packages by specifying the target:


python3 -m pip install --target=./ numpy pandas scipy sklearn
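To double-check that the packages actually landed in the function folder (they have to, since Cloud9 bundles that folder into the deployment zip), a quick check from the same directory:

# Run from ~/environment/logitTutorial/ so Python resolves the local copies;
# each printed path should point inside this folder.
import os
import numpy, pandas, sklearn

for mod in (numpy, pandas, sklearn):
    print(mod.__name__, os.path.dirname(mod.__file__))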

Now we need to create the model file (‘logit.pkl’). To do that, I’m just going to create a new file in the environment (anotherLambdaTest/logitTutorial/build_logit_pkl.py), copy over the code Ben Weber provided in his tutorial, and run it from Cloud9’s terminal. This will drop logit.pkl right into the same folder.

import pandas as pd
from sklearn.externals import joblib
from sklearn.linear_model import LogisticRegression

# Load the training data from Ben Weber's tutorial repo.
df = pd.read_csv(
    "https://github.com/bgweber/Twitch/raw/master/Recommendations/games-expand.csv")
y_train = df['label']
x_train = df.drop(['label'], axis=1)

# Fit a logistic regression and pickle it next to the Lambda function.
model = LogisticRegression()
model.fit(x_train, y_train)
joblib.dump(model, 'logit.pkl')
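As a quick sanity check that the artifact loads and predicts (a sketch, assuming the CSV’s feature columns are G1 through G10, as the query parameters later suggest):

# Reload the pickle and score one all-ones row, mirroring the API call later.
import pandas as pd
from sklearn.externals import joblib

clf = joblib.load('logit.pkl')
x = pd.DataFrame([[1] * 10], columns=['G%d' % i for i in range(1, 11)])
print(clf.predict_proba(x)[0][1])  # probability of the positive class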

The last Python-ic step is to update lambda_function.py with code to call the model (again, see Ben Weber’s tutorial for the full code).

from sklearn.externals import joblib
import pandas as pd

# Load the pickled model once, at cold start, so warm invocations reuse it.
model = joblib.load('logit.pkl')

def lambda_handler(event, context):
    p = event['queryStringParameters']
    print("Event params: " + str(p))
    # Turn the query params into a single-row DataFrame of features.
    x = pd.DataFrame.from_dict(p, orient='index').transpose()
    pred = model.predict_proba(x)[0][1]
    result = 'Prediction ' + str(pred)
    return { "body": result }

Then we’re ready for testing. Note: unlike in the tutorial, you must use double quotes in the test payload here.
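For reference, here’s the same test as a local invocation (the commented payload is what you’d paste into the Cloud9 test window; note the double quotes JSON requires). Run it from the function folder so logit.pkl is found:

# Equivalent Cloud9 test payload (double quotes are mandatory JSON):
#   {"queryStringParameters": {"G1": "1", "G2": "1", ..., "G10": "1"}}
from lambda_function import lambda_handler

event = {"queryStringParameters": {"G%d" % i: "1" for i in range(1, 11)}}
print(lambda_handler(event, None))  # e.g. {'body': 'Prediction 0.397...'}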

From here, we can deploy as before, then head over to Lambda console to set up this new Lambda with its API Gateway just like before.

https://kd32u9hkzg.execute-api.us-west-2.amazonaws.com/default/cloud9-logitTutorial-logitTutorial-JW9VGD8H7A4W?G1=1&G2=1&G3=1&G4=1&G5=1&G6=1&G7=1&G8=1&G9=1&G10=1

==> “Prediction 0.39762327867823355”

Conclusion. Or: let engineers work on engineering

While not entirely without frustrations, cloud-based IDE development and deployment is a powerful combination. Far less time spent setting up deploys + a tighter turnaround on testing = more time on actual data science.

Combining this with other AWS services (e.g. AWS Machine Learning and SageMaker) makes it manageable to create an endpoint that bridges a production app (built by engineers/product) and predictive analytics (built by the data team).

With no data engineers and a swamped product engineering team, being able to self-serve our work product to the rest of the team opens up huge opportunities!

To that end, we are now working on abstracting the data science features from our product. Instead of engineers coding up Ruby functions based on our models, they can simply call a versioned endpoint. Any change to the model can simply be swapped in behind the endpoint, with our air-traffic-control Lambda function re-pointing to the latest model and feeding it the latest relevant features. This frees up our engineers to focus on design and app infra work, while reducing our wait time for new models to be adopted.
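As a rough illustration of the air-traffic-control idea (not our production code; the bucket and key names here are hypothetical), the function can resolve a pointer to the latest model version at invocation time:

# Hypothetical sketch: read the current model version from a pointer object
# in S3, cache the pickle across warm invocations, then predict as before.
import boto3
import pandas as pd
from sklearn.externals import joblib

s3 = boto3.client('s3')
BUCKET = 'example-model-bucket'  # hypothetical bucket name
_model, _version = None, None

def lambda_handler(event, context):
    global _model, _version
    # 'logit/latest' holds the name of the current version, e.g. 'v12'.
    latest = s3.get_object(Bucket=BUCKET, Key='logit/latest')['Body'].read().decode().strip()
    if latest != _version:
        s3.download_file(BUCKET, 'logit/' + latest + '/logit.pkl', '/tmp/logit.pkl')
        _model, _version = joblib.load('/tmp/logit.pkl'), latest
    x = pd.DataFrame.from_dict(event['queryStringParameters'], orient='index').transpose()
    return {"body": 'Prediction ' + str(_model.predict_proba(x)[0][1])}

Swapping in a new model is then just a matter of uploading a new pickle and updating the pointer object; callers never see the change.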

It also allows wins from one project to be immediately available for research on another project. At Snapdocs we modernize mortgage closings, and recently we built automatic document classification to make finding e-signable pages fast. Using this approach, we could expand that application to finding Closing Disclosures and the mortgage details therein, surfacing relevant details for our customers and their clients even when doing a traditional signing. Improvements in the underlying model immediately flow to both projects with no major code changes required!

Snapdocs Product & Engineering Blog

Powering Homeownership

Written by Scott Breudecheck
I like data. Some people pay me to do it all day.

