Continuous Integration with AWS Sagemaker and CircleCI
At AdHawk, we use AWS Sagemaker for training and deploying the data science models that send customers tips, or “insights,” on optimizing their web presence. Company-wide, we strive to follow a continuous integration/continuous delivery model of software development, so it was natural for us to bring that same rigor to our data science team.
The pattern we follow is very similar to the approach illustrated by AWS. The difference is that, rather than coupling ourselves entirely to AWS cloud-native services, we opted to leverage our existing CircleCI setup. More importantly, we introduced the concept of environments so we can deploy to, and fall back from, different versions.
So, without further ado, here are the elements of our data science CI process.
Docker “Golden” Image
We use a single Docker image as our build environment, both for interacting with the Python Sagemaker CLI and for deploying to an nginx server for local testing. This lets us lock down versions and guarantees a common environment for local testing and for deployment to AWS.
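As a rough illustration of the local-testing half, the snippet below pokes a container started from that image, assuming it follows the standard SageMaker serving contract (nginx answering /ping and /invocations on port 8080); the image name and payload are made up for illustration.

```python
# Sketch of a local smoke test against the serving container, assuming the
# standard SageMaker container contract (/ping and /invocations on port 8080).
# Start the container first, e.g.:
#   docker run --rm -p 8080:8080 adhawk/insights-model:local serve
import json

import requests

LOCAL_ENDPOINT = "http://localhost:8080"


def test_local_container():
    # Health check: the container should answer /ping with HTTP 200 once the model loads.
    assert requests.get(f"{LOCAL_ENDPOINT}/ping").status_code == 200

    # Inference check: POST a sample payload to /invocations and make sure it parses.
    # The feature names and values here are hypothetical.
    payload = {"features": [0.3, 0.7, 1.2]}
    response = requests.post(
        f"{LOCAL_ENDPOINT}/invocations",
        data=json.dumps(payload),
        headers={"Content-Type": "application/json"},
    )
    assert response.status_code == 200
    print("prediction:", response.json())


if __name__ == "__main__":
    test_local_container()
```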
Handling Multiple Environments and Artifact Versioning
Since Sagemaker is a managed service that doesn’t give us infrastructure boundaries (separate VPCs, regions, or availability zones) to separate environments with, we delineate environments by naming convention. We currently maintain Staging and Production environments, each with its own parallel set of Docker images, models, and Sagemaker endpoints. Our commits in GitHub serve as the source of truth for versioning, so we suffix our image and model names with the Git commit hash and suffix the staging endpoint with “-staging.”
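To keep those conventions consistent across CI jobs, a small helper along these lines can derive every name from the commit hash; the account ID, region, and repository below are placeholders, not our real values.

```python
# Sketch of the naming convention: names are derived from the Git commit hash,
# and staging resources get a "-staging" suffix. Account ID, region, repository,
# and base names are placeholders.
import subprocess

ACCOUNT_ID = "123456789012"          # placeholder AWS account
REGION = "us-east-1"                 # placeholder region
ECR_REPO = "adhawk/insights-models"  # placeholder ECR repository


def git_commit_hash() -> str:
    """Short hash of the commit being built (CircleCI also exposes this as CIRCLE_SHA1)."""
    return subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], text=True
    ).strip()


def image_uri(commit: str) -> str:
    return f"{ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com/{ECR_REPO}:{commit}"


def model_name(base: str, commit: str) -> str:
    return f"{base}-{commit}"


def endpoint_name(base: str, environment: str) -> str:
    # Production endpoints keep the bare name; staging gets the "-staging" suffix.
    return base if environment == "production" else f"{base}-staging"


if __name__ == "__main__":
    commit = git_commit_hash()
    print(image_uri(commit))
    print(model_name("insights", commit))
    print(endpoint_name("insights", "staging"))
```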
CircleCI Deployment Steps
Upon a feature being accepted and merged into master:
- Determine what models were modified in the latest commit
- Bake each modified model into a Docker image and push it to Amazon’s Elastic Container Registry (ECR), with the commit hash as part of the image name
- Make a call to Sagemaker to train the modified model and store the training output (see the sketch after this list)
- Deploy a web endpoint that serves inferences from the trained model
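A condensed sketch of the train-and-deploy steps, assuming the SageMaker Python SDK (v2 argument names); the IAM role, S3 paths, instance types, and names are placeholders rather than our actual configuration:

```python
# Sketch of the "train then deploy" CI steps using the SageMaker Python SDK (v2).
# The IAM role, S3 paths, instance types, and naming values are placeholders.
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
ROLE = "arn:aws:iam::123456789012:role/sagemaker-execution-role"  # placeholder role


def train_and_deploy(image_uri: str, model_name: str, endpoint_name: str):
    # Train with the freshly pushed image; SageMaker stores the model artifact
    # under output_path when the training job finishes.
    estimator = Estimator(
        image_uri=image_uri,
        role=ROLE,
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path="s3://adhawk-ml-artifacts/models",  # placeholder bucket
        base_job_name=model_name,
        sagemaker_session=session,
    )
    estimator.fit({"train": "s3://adhawk-ml-data/train"})  # placeholder data path

    # Deploy a web endpoint that serves inferences from the trained model.
    estimator.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.large",
        endpoint_name=endpoint_name,
    )


if __name__ == "__main__":
    commit = "abc1234"  # in CI this comes from the commit hash
    train_and_deploy(
        image_uri=f"123456789012.dkr.ecr.us-east-1.amazonaws.com/adhawk/insights-models:{commit}",
        model_name=f"insights-{commit}",
        endpoint_name="insights-staging",
    )
```

Note that re-deploying to an endpoint name that already exists requires an endpoint update rather than a create; that handling is omitted from the sketch.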
Promote to Production (upon a master commit being tagged):
- Find the image in ECR that corresponds to the tagged commit
- Add the tag “latest” to that image (see the sketch below)
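The promotion itself doesn’t need to pull the image at all; copying its manifest to a new tag through the ECR API is enough. A sketch with boto3, where the repository name and commit hash are placeholders:

```python
# Sketch of promoting an image to production by re-tagging it in ECR with boto3.
# The repository name is a placeholder; the commit hash comes from the tagged commit.
import boto3

ecr = boto3.client("ecr")
ECR_REPO = "adhawk/insights-models"  # placeholder repository


def promote_to_latest(commit: str):
    # Look up the manifest of the image that was built for this commit.
    response = ecr.batch_get_image(
        repositoryName=ECR_REPO,
        imageIds=[{"imageTag": commit}],
    )
    manifest = response["images"][0]["imageManifest"]

    # Point the "latest" tag at the same manifest. If "latest" already refers to
    # this exact image, ECR raises ImageAlreadyExistsException, which is safe to ignore.
    try:
        ecr.put_image(
            repositoryName=ECR_REPO,
            imageManifest=manifest,
            imageTag="latest",
        )
    except ecr.exceptions.ImageAlreadyExistsException:
        pass


if __name__ == "__main__":
    promote_to_latest("abc1234")  # placeholder commit hash
```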
Areas for Improvement
Testing is conspicuously missing from our pipeline; it is currently done only locally during model development. The reason is the non-deterministic output of model training, which is inherent to inference-making and makes exact assertions against expected results difficult. One workaround might be to support deterministic sampling of the training data so that test output can be compared against expected results.
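As a hypothetical illustration of that workaround, the sampling step could accept a fixed seed so tests see the same training subset (and therefore comparable output) on every run:

```python
# Hypothetical sketch: seed-controlled sampling so tests get a reproducible
# training subset. Function and parameter names are illustrative only.
from typing import Optional

import pandas as pd


def sample_training_data(
    df: pd.DataFrame, fraction: float = 0.1, seed: Optional[int] = None
) -> pd.DataFrame:
    """Sample training rows; tests pass a fixed seed for reproducible results."""
    return df.sample(frac=fraction, random_state=seed)


# In a test, a fixed seed makes the sample (and downstream assertions) repeatable:
#   sample = sample_training_data(raw_df, fraction=0.2, seed=42)
```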
Conclusion
With this pipeline in place, the data science team’s deployment tools and processes are aligned with those of the rest of the engineering team. Moreover, having multiple environments and syncing our deployed assets with version control means data science endpoints can be matched to the version and/or environment of the consuming application. All of this allows us to rapidly and consistently iterate on and serve our many data science models.
Purdy Picture (Flow Diagram of the whole process):