Machine Learning Integration in Under 5 Minutes
Over the past year, Machine Learning at Ibotta has been going through rapid change. What used to be a primarily batch flow for getting predictions into our app is transitioning to real-time. Real-time Machine Learning allows our models to quickly change the content and behavior of our app: when a user displays a unique pattern of behavior while interacting with our app, our models can quickly adapt that user's experience.
In addition, changes made to our models are readily available in the app, allowing data scientists and developers to quickly iterate on ideas without needing to worry about a change in the integration point between Machine Learning and our platform.
The integration point between Machine Learning and platform has long been a complex and frustrating ecosystem. Once a data scientist has developed a good model, they spend the majority of their time building a data pipeline (typically with technologies such as Spark and Airflow) to batch up large amounts of data, run it through the model, and dump the predictions somewhere else. If that data then needs to be surfaced in an application, yet another pipeline must pick it up and move it into a low-latency data store such as DynamoDB.
When one of these pieces fails, it can cause nightmares for a team. If a batch prediction job fails, it can cause degraded performance or even full outages in an application. Furthermore, cascading failures can occur if other data pipelines are dependent on that data.
In addition, these pipelines can be unnecessarily expensive. For one, only a small portion of the generated data is likely to be used, meaning a large amount of compute and storage goes into creating data that may never be accessed. Moreover, monitoring and maintaining these pipelines eats up a lot of developer time, which is likely more expensive than the compute and storage themselves.
These issues can cause large delays in getting Machine Learning applications into production, and in some cases those applications never see the light of day. Fragile Spark jobs, Dynamo tables, and Airflow DAGs stand in the way of a basic use case:
Data scientists generate useful data and developers would like to use that data.
SageMaker
Enter the world of SageMaker, a fully managed AWS Machine Learning service that allows data scientists and developers to go from idea to production in no time. SageMaker allows data scientists and developers to focus on what is important to them while leaving the complex world of service and infrastructure management to AWS.
Some benefits of SageMaker include:
- The ability to wrangle data, train a model, and deploy that model for both real-time and batch predictions, all while being fully managed by AWS.
- Fully managed endpoints for making real-time predictions that serve as an integration point between Machine Learning and an application.
- The ability to use built-in or “bring your own” algorithms to train a model, with SageMaker managing things like hyperparameter tuning, distributed training, and optimization during training.
- Data scientists can use basic Python code to deploy their models to a real-time endpoint or for use in batch predictions.
- Developers can use a model for real-time predictions through a simple REST API.
We at Ibotta have only just started using SageMaker, and we have already seen big savings in developer time and far fewer deployment frustrations and integration pain points.
Creating a Real-Time Machine Learning Endpoint in 5 Minutes
Let’s run through a quick example of how we might use SageMaker to quickly create a REST endpoint that will serve real-time predictions.
Say we want to create a model that predicts whether a given search term refers to a retailer or a product. For example, when someone types “Walmart” we want to know they are referring to a retailer, while when they type “Apple iPhone” they are referring to a product.
This use case is almost impossible to solve in a batch flow, as we cannot feasibly enumerate every search term users may type. We could batch predict the top 10,000 search terms and save the predictions in a low-latency database; however, it would be far more valuable to have predictions for all search terms, especially ones the model has never seen before.
A much more scalable solution is to serve predictions in real time. This allows our model to react to text it has never seen before while avoiding the world of finicky data pipelines altogether.
Let’s first define a data contract for the API call between the application and our real-time endpoint:
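As a minimal sketch, a request/response pair for this model might look like the following (the field names here are illustrative assumptions, not our actual schema):

```python
# A hypothetical request/response pair for the search-term model.
# Field names are illustrative, not the actual contract.

example_request = {
    "search_terms": ["Walmart", "Apple iPhone"]
}

example_response = {
    "predictions": [
        {"search_term": "Walmart", "label": "retailer", "confidence": 0.97},
        {"search_term": "Apple iPhone", "label": "product", "confidence": 0.94},
    ]
}
```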
Once we have this data contract defined, we can quickly build a production endpoint that allows platform to start testing their code while data scientists start work on the model.
Start Timer
A yeoman generator that we have built in-house allows a data scientist to create the base directory structure for a new SageMaker container with a single command.
Boilerplate code that data scientists at Ibotta have refined and abstracted out (such as a Dockerfile, server, build scripts, and local testing scripts) is generated and placed inside a new directory. This lets a data scientist ignore complicated environment configuration, testing frameworks, and deployment processes so that they can focus on important business logic.
We can now implement the data contract defined above. For now, we can focus on returning dummy data that will conform to the contract.
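A minimal sketch of such a dummy service, assuming a Flask server (SageMaker containers must answer GET /ping health checks and POST /invocations inference requests on port 8080):

```python
# server.py -- a sketch of the dummy service, assuming Flask.
# SageMaker hosting requires the container to respond to
# GET /ping and POST /invocations on port 8080.
import flask

app = flask.Flask(__name__)

@app.route("/ping", methods=["GET"])
def ping():
    # Health check: SageMaker considers the container healthy on a 200.
    return flask.Response(status=200)

@app.route("/invocations", methods=["POST"])
def invocations():
    payload = flask.request.get_json(force=True)
    terms = payload.get("search_terms")
    if not isinstance(terms, list):
        # Reject requests that do not conform to the contract.
        return flask.jsonify(error="search_terms must be a list"), 400
    # Dummy predictions that conform to the contract; the real model comes later.
    predictions = [
        {"search_term": t, "label": "retailer", "confidence": 0.5} for t in terms
    ]
    return flask.jsonify(predictions=predictions)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```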
So far, all this service does is parse the incoming request, verify its format, and return a static list of predictions. While we are not yet making real-time predictions, we have implemented the defined data contract, which allows platform to start using our endpoint.
We are now ready to deploy our service. A script in the generated directory builds and pushes the Docker container to an ECR repository. This allows SageMaker to pull the container.
A few lines of Python code deploy the container to a production-ready endpoint.
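As a sketch, using the SageMaker Python SDK (the image URI, role, and endpoint name below are placeholders):

```python
# deploy.py -- a sketch of deploying the pushed container with the
# SageMaker Python SDK. Image URI, role, and names are placeholders.
import sagemaker
from sagemaker.model import Model

session = sagemaker.Session()

model = Model(
    image_uri="<account-id>.dkr.ecr.us-east-1.amazonaws.com/search-term-service:latest",
    role="<sagemaker-execution-role-arn>",
    sagemaker_session=session,
)

# deploy() creates the SageMaker model, an endpoint configuration,
# and the endpoint itself.
model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="search-term-classifier",
)
```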
Stop Timer
We now have a production-ready endpoint serving requests that adhere to the data contract we defined. Developers can use this endpoint to test their implementations while data scientists focus on building a performant model.
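For example, a developer might exercise the endpoint with boto3 (assuming the hypothetical contract and endpoint name from above):

```python
# A sketch of calling the endpoint from application code; the endpoint
# and field names match the hypothetical contract defined earlier.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="search-term-classifier",
    ContentType="application/json",
    Body=json.dumps({"search_terms": ["Walmart", "Apple iPhone"]}),
)
print(json.loads(response["Body"].read()))
```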
Once the model has been built, we can rip out the dummy code and start serving actual predictions. Whenever a data scientist makes a change to the service, only two steps are necessary to see it in production: triggering a build and redeploying. Developers will not notice any change on their end, as the service still adheres to the defined data contract.
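As a sketch, only the handler's internals need to change: load the trained model once at container startup and replace the static list with real predictions. SageMaker mounts model artifacts under /opt/ml/model; the pickle format and predict_proba interface below are assumptions.

```python
# Sketch: replace the dummy handler once a trained model exists.
# The pickle file name and predict/predict_proba interface are assumptions.
import pickle

# Loaded once, when the container starts.
with open("/opt/ml/model/model.pkl", "rb") as f:
    model = pickle.load(f)

def predict(terms):
    # Same response shape as the dummy version, now backed by the model.
    labels = model.predict(terms)
    scores = model.predict_proba(terms).max(axis=1)
    return [
        {"search_term": t, "label": label, "confidence": float(score)}
        for t, label, score in zip(terms, labels, scores)
    ]
```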
Furthermore, SageMaker handles scaling of the endpoint with minimal setup. When a container becomes unhealthy, SageMaker replaces it, and with a simple autoscaling policy attached, it spins up more instances when traffic increases and spins down unnecessary ones when traffic slows. Data scientists do not need to delve into complex technologies such as Kubernetes and Harness to host their REST APIs.
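As a sketch, attaching a target-tracking scaling policy to the endpoint takes a few boto3 calls (the instance limits and target value below are illustrative; AllTraffic is the default variant name):

```python
# Sketch: target-tracking autoscaling for the endpoint via
# Application Auto Scaling. Limits and target value are illustrative.
import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/search-term-classifier/variant/AllTraffic"

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

autoscaling.put_scaling_policy(
    PolicyName="search-term-classifier-scaling",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 1000.0,  # invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
)
```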
This offers a ton of advantages over the old paradigm. The biggest is the reduction in development time spent on unnecessary infrastructure (i.e., Airflow and Spark jobs). Once the service is up and running, data scientists can focus on building their model, which should be their primary focus.
This also allows data scientists and platform engineers to work in parallel. Platform engineers do not need to wait for a data scientist to produce a full scale data product as they already have an endpoint to work with. Changes can be made on either end without affecting the other.
Conclusion
SageMaker’s unique features allow a team to deploy a production ready Machine Learning endpoint in minimal time. This enables both data scientists and developers to focus on where their skillsets shine, while not having to worry about building and maintaining complex and fragile data pipelines.
These advantages allow for a much more agile approach to developing Machine Learning applications. Since developers and data scientists can work in parallel without stepping on each other’s toes, new ideas can be quickly implemented, deployed, tested, and iterated on. Gone are the days of six month development cycles!
If these kinds of projects and challenges sound interesting to you, Ibotta is hiring! Check out our jobs page for more information.