How to automatically deploy an ML classifier to the cloud with AWS CDK

Christian Schäfer
Mar 15 · 10 min read

In this step-by-step guide you will learn how to set up a basic deployment pipeline for a machine learning classifier using Amazon’s cloud development kit (CDK). The pipeline is going to include all stages from continuous code integration up to serving the model in a virtual private cloud.

Let’s get started and think about the ingredients we need for this project:

  1. Repository for our code → GitHub
  2. Classifier → Sklearn model
  3. CI/CD pipeline → AWS CodePipeline
  4. Web framework → FastAPI
  5. Serving platform → AWS Elastic Container Service (ECS) with Fargate

The solutions above are picked out of many alternatives mainly due to their simplicity and ease of extension and scaling. Especially with Fargate it is very straightforward to scale up your service horizontally in case you want to deploy heavier machine learning models with much longer latencies. We use similar infrastructure architectures to deploy our services at Axel Springer Ideas Engineering, one example being our neural speech synthesis used by the WELT website.

Our application is going to be a basic FastAPI server that downloads a trained text classifier from an S3 bucket and serves it via a simple API. Below is a diagram with the deployment infrastructure:

Deployment infrastructure of our production pipeline

Essentially, the deployment happens in two stages:

  1. CodePipeline clones the GitHub repository, builds a docker image for the app and uploads it to the Elastic Container Registry (ECR).
  2. An Elastic Container Service (ECS) cluster downloads the image from ECR and runs it in a container via Fargate. The cluster runs our app in a Virtual Private Cloud (VPC) and exposes it to the internet via a Load balancer.

Luckily we can use CDK to integrate all the above infrastructure into our project, most of it as plain Python code. For this demonstration I have prepared all the necessary code in this GitHub repo. You can simply fork it if you want to try out the deployment, and use it as a baseline to build your own, more advanced deployable machine learning application.

This is the structure of the repository:
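The layout is roughly as follows (the top-level file names are partly assumptions based on the description below):

```
ClassifierAWS/
├── classifier/       # training and inference logic
├── infrastructure/   # CDK stacks (CI/CD, networking, serving)
├── templates/        # HTML template for FastAPI
├── app.py            # FastAPI entry point (name assumed)
├── buildspec.yml     # build steps for CodePipeline
├── cdk.json          # CDK context
└── Dockerfile
```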

There are three main subfolders:

classifier: Contains the logic for training the classifier and applying it for inference.

infrastructure: Contains the whole deployment infrastructure mainly written in Python for CDK. The infrastructure is composed of three Stacks that will spin up a bunch of AWS resources interacting with each other, for instance CodePipeline, Elastic Container instances, S3 Buckets.

templates: Contains the HTML template for FastAPI to render our internet-facing website.

The top-level file will be our entry point to run the FastAPI app that loads and serves our classifier.

Step 1: Train a classifier

Let us build a simple Sklearn classifier on the readily available 20-newsgroup data using all the default settings. We use the out-of-the-box tf-idf vectorizer plus logistic regression:
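A minimal training sketch could look as follows. To keep it self-contained, a tiny inline corpus stands in for the 20-newsgroup data here; in the real script you would fetch the dataset via sklearn.datasets.fetch_20newsgroups instead.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny stand-in corpus; swap in fetch_20newsgroups() for the real data.
texts = [
    'God is love and faith', 'The church service is on Sunday',
    'The GPU renders 3D graphics', 'My monitor resolution is too low',
]
labels = [0, 0, 1, 1]  # 0: soc.religion.christian, 1: comp.graphics

# Out-of-the-box tf-idf vectorizer plus logistic regression
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Save the trained model; it will later be uploaded to S3
with open('/tmp/classifier.pkl', 'wb') as f:
    pickle.dump(model, f)
```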


Since we are good engineers, we wrap the model plus classification categories into a more generic TextClassifier class that can easily be saved and loaded:


The classifier outputs the prediction probabilities for each class as a dictionary. Let's run the training!

The output of the training script above should look like this:

Great, the probability distribution is about what we would expect for ‘May god bless you.’, so the training went fine. The trained classifier is saved under /tmp/classifier.pkl. It would not take much effort to build a better model, but this one is good enough for our purpose.

Finally, let us store the model in an S3 bucket that the deployment pipeline will later access. First, we manually create a new bucket on AWS S3 with the name classifier-serving-model-bucket and upload the /tmp/classifier.pkl file to it.
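If you prefer the command line, the same can be done with the AWS CLI (assuming your credentials are configured):

```
aws s3 mb s3://classifier-serving-model-bucket
aws s3 cp /tmp/classifier.pkl s3://classifier-serving-model-bucket/classifier.pkl
```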

Step 2: Serve the Classifier via FastAPI

To make the classifier available via an API, let us write a small FastAPI application that downloads the model from the S3 bucket and serves it at the endpoint /classify:


When the module is initialised, it downloads the model from the S3 bucket and loads it into memory. Then it spins up a uvicorn server that renders a simple HTML template with a form, allowing you to insert text and receive the classification result. Let's try it out by starting the server locally.

Once the server is up, you can go to http://localhost:80/classify where the response should look like this:

This might not be the prettiest interface, but hey, it's classifying text for us!

Step 3: Dockerize the Application

Great, so we tested the application locally, which is probably the most fun part, but now let us think about how to serve it from AWS. For this purpose we will use the AWS Elastic Container Registry (ECR), which stores our docker image and makes it accessible to other AWS services such as the Elastic Container Service (ECS). Hence, we dockerize the application using a basic Dockerfile:
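A minimal Dockerfile could look like this (the base image and file names are assumptions; the actual file is in the repo):

```dockerfile
FROM python:3.8-slim

WORKDIR /app

# Install Python dependencies first to leverage the docker layer cache
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code and HTML templates
COPY . .

# Serve the FastAPI app on port 80
EXPOSE 80
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "80"]
```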

Let’s try to build the docker image and run it locally:
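For example, along these lines (the image tag classifier-app is arbitrary; mounting ~/.aws makes your local credentials visible inside the container):

```
docker build -t classifier-app .
docker run -p 80:80 -v ~/.aws:/root/.aws \
    -e MODEL_BUCKET_NAME=classifier-serving-model-bucket classifier-app
```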

In the code snippet above we need to provide our AWS credentials to access the bucket. The credentials are usually stored at ~/.aws/credentials. You can again go to http://localhost:80/classify to verify that the container is up and exposed. Now our dockerized FastAPI app is ready to be deployed!

Step 4: Set up a Github Connection

Before we can start building our deployment infrastructure, we need to set up a GitHub connection in the AWS console so that the pipeline can automatically clone our repository. Go to the AWS console and create a new GitHub connection with the name github-classifier-connection. It should look like this:

When creating the connection, the console should forward you to your GitHub account, where you can manage the connection app. Make sure you grant access rights for cloning the ClassifierAWS repo from your fork. You can always go to Settings → Applications in your GitHub account to view and configure the freshly created AWS Connector:

Once the connection is set up (which should take you less than 5 minutes), we save its Amazon Resource Name (ARN) for later, as we are simply going to hardcode it into the repo. There are surely safer and more elegant ways of providing it to the deployment pipeline, e.g. using the AWS Secrets Manager, but for the sake of simplicity let's just do it the simple way. The ARN is displayed under your AWS connections, as shown above.

Step 5: Create the CDK Infrastructure

So now we are ready to create the infrastructure code for CDK, which is comprised of three main stacks: cicd_stack, networking_stack, serving_stack.

Each stack is a subclass of aws_cdk.core.Stack and will be instantiated in the main application that serves as an entry point for CDK to synthesize your CloudFormation template. We need to provide some global information such as your AWS account id, the above-mentioned GitHub connection ARN, and the model bucket name. These go into the context file cdk.json below (replace XXX). For simplicity we hardcode the values here, which is of course not recommended for security reasons. If you want to build your own app, you could store and retrieve the most sensitive information via the AWS Secrets Manager.
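A cdk.json along these lines would do the job (treat the entry-point path and the exact context key names as assumptions; they depend on how the stacks read their context):

```json
{
  "app": "python3 infrastructure/app.py",
  "context": {
    "account": "XXX",
    "region": "XXX",
    "github_connection_arn": "XXX",
    "model_bucket_name": "classifier-serving-model-bucket"
  }
}
```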


Now let’s quickly go through the stacks and explain what they do, without recreating the full code.

CI/CD Stack

The CI/CD stack will create an AWS CodePipeline with two stages. The first stage clones the Github repository into an S3 bucket and the second stage invokes the docker build command followed by an upload of the resulting image to the AWS Elastic Container Registry (ECR).

CodePipeline of the project on the AWS console

Optionally, one could add a third (deployment) stage that explicitly updates the running service cluster whenever a new image is pushed. However, this would make the CI/CD stack dependent on the serving stack and create a bit of overhead for the initial deployment, hence we omitted it for this example.

All necessary build steps are defined in the file ClassifierAWS/buildspec.yml, which looks something like this (abbreviated):

version: 0.2
phases:
  install:
    runtime-versions:
      python: 3.8
  pre_build:
    commands:
      - $(aws ecr get-login --no-include-email)
  build:
    commands:
      - docker build -t $IMAGE_URI -f Dockerfile .
      - docker tag $IMAGE_URI $REPOSITORY_URI:latest
      # optional: run Python tests here
  post_build:
    commands:
      - docker push "${REPOSITORY_URI}:latest"
      - docker push "$IMAGE_URI"

Note that the target repository URI of the Elastic Container Registry is provided by the environment, which is set in the CI/CD stack of our infrastructure. Do not forget the aws ecr get-login line, which is required to authenticate the docker push command against the ECR instance. More buildspec examples can be found in the AWS documentation.

Networking Stack

Our newly created AWS resources live in a so-called Virtual Private Cloud (VPC) that wires them together. We just spin up a standard VPC with default settings and add a gateway endpoint so the network can reach the S3 service for reading from and writing to buckets.

Serving Stack

This stack defines the resources that will run our app. It creates a cluster via the AWS Elastic Container Service (ECS) that runs a Docker container with our image from the build stage. The actual computation resources are then allocated via the serverless compute engine Fargate, which removes the need to administrate a real server instance and is easy to scale. Defining the cluster and Fargate service in the class ServingStack(core.Stack) is pretty straightforward (see the file under ClassifierAWS/infrastructure/). Our desired computation capacity goes into the Fargate task definition; we use the minimum resources of 512 MB of memory and 256 CPU units.

Then we simply add the ECR repository with our docker image to the task definition. In CDK code this is done by passing the repository: ecr.Repository variable that was created in the CI/CD stack:

environment = {
    'MODEL_BUCKET_NAME': shared_context['model_bucket_name']}
app_container = self.task_definition.add_container(
    'app_container',
    image=ecs.ContainerImage.from_ecr_repository(repository),
    environment=environment)

Note that we provide the environment variable MODEL_BUCKET_NAME (= classifier-serving-model-bucket), which our app uses on startup to pull the classifier model from the corresponding S3 bucket. Our app will be run by a FargateService that is fronted by an ApplicationLoadBalancer.

Step 6: Synthesize and Deploy!

Assuming that the infrastructure code is finished, we can compile it and list the resulting stacks:
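Assuming the AWS CDK CLI is installed (e.g. via npm install -g aws-cdk), listing the stacks is a single command, run from the directory containing cdk.json:

```
cdk ls
```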

The output should show the ids of your stacks: classifier-cicd-stack, classifier-networking-stack and classifier-serving-stack. Now we are able to synthesize the CloudFormation template and deploy our application!
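Synthesizing and deploying then boils down to the standard CDK commands (using the stack names from above; you can also deploy them one by one):

```
cdk synth
cdk deploy classifier-cicd-stack classifier-networking-stack classifier-serving-stack
```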

That’s it! Your resources should spin up and run. Go to the AWS console and check whether the respective instances pop up (CodePipeline, ECR, ECS). If you can’t find an instance, go to CloudTrail to debug the formation of your template.

Now let’s try out our classifier online. First, we need to find out the address of the load balancer: go to the EC2 service and then to Load balancers, where your single load balancer instance should be shown together with its DNS name:

Simply copy-paste the DNS into your browser (dns/classify) and voilà:

Finally, don’t forget to destroy the stacks once you don’t need them anymore:
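The teardown mirrors the deployment, again using the stack names from cdk ls:

```
cdk destroy classifier-serving-stack classifier-networking-stack classifier-cicd-stack
```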

You may also want to delete the retained ECR repository in the console, as CDK cannot automatically delete it for now. It is always recommended to sporadically check your AWS account for retained resources and clean them up.


Congratulations, you have successfully deployed a classifier to the cloud with a single command! Actually, you not only deployed it but also created all the necessary resources attached to it, including a full code integration pipeline that runs its stages on every code push. You can modify the repository to build your own deployable application, but you will probably need to consider some scaling options and think about how to set up your server to handle a large number of requests. Moreover, if you want to serve very large machine learning models (talking deep learning here), you will probably want a more sophisticated architecture that can deal with the model’s long inference latencies, e.g. by using a queueing system and concurrent prediction workers. All of this can be implemented with CDK as well, but that is probably the topic of another blog post.


Thanks to my team members Stefan Bieler and Tom Rusko who set up the deployment infrastructure for our common AI projects at Axel Springer and helped me to get started on the topic. In the above article I tried to distill some of their knowledge for machine learning engineers seeking to broaden their dev-ops experience.

Axel Springer Tech
