Building reliable machine learning pipelines with AWS Sagemaker and Comet

Cecelia Shao
Jul 31 · 7 min read

This tutorial is Part II of a series. See Part I here.

Successfully executing machine learning at scale involves building reliable feedback loops around your models. As your pipeline grows, you will reach a point where your data can no longer fit in memory on a single machine, and your training processes will have to run in a distributed way. Regular retraining and hyperparameter optimization will become necessary as new data becomes available and your underlying feature distributions change.

There is also the added complexity of trying new modeling approaches on your data and communicating the results across teams and other stakeholders.

In this blog post, we will illustrate how to use AWS Sagemaker and Comet to simplify the process of monitoring and improving your training pipeline.

Data complexity + model needs are growing

The challenges involved in creating functional feedback loops extend beyond data concerns like tracking changing data distributions to issues at the model level. Robust models require frequent retraining so that hyperparameters are optimal in the face of new, incoming data.

With each iteration, it becomes harder to manage subsets and variations of your data and models. Keeping track of which model iteration ran on which dataset is key to reproducibility.

Take that complexity + multiply it by team complexity

Managing these complex pipelines can become even more difficult in team settings. Data scientists will often store results from several models in log files, making it impossible to reproduce models or communicate results effectively.

Traditional software development tools are not optimized for the iterative nature of machine learning or the scale of data machine learning requires. The lack of tools and processes to make collaboration easy across members of the same data science teams (and across functions like engineering) has led to dramatically slower iteration cycles. Organizations also constantly suffer the pain of slow on-boarding times for new employees and bear the risk of employees churning along with their work and proprietary knowledge.


This tutorial covers how to integrate Comet with AWS Sagemaker’s Tensorflow Estimator API. We will adapt an example that runs the Resnet model on the CIFAR10 dataset with Tensorflow.

Instead of logging experiment metrics to Tensorboard, we’re going to log them to Comet.

This allows us to keep track of various hyperparameter configurations, metrics, and code across different training runs.

  • AWS SageMaker provides convenient and reliable infrastructure to train and deploy machine learning models.
  • Comet automatically tracks and monitors machine learning experiments and models.

You can check out Comet here and learn more about AWS Sagemaker here.

Environment Setup

When using AWS Sagemaker, your account comes with multiple pre-installed virtual environments that contain Jupyter kernels and popular Python packages such as scikit-learn, Pandas, NumPy, TensorFlow, and MXNet.

  1. Create a Sagemaker account. You’ll be guided through all the steps for getting your credentials to submit a job to the training cluster here.

2. Set up your Comet account here. Once you log in, we’ll take you to the default project, where you’ll see the Quickstart Guide that provides your Project API Key.

3. Create a Sagemaker notebook instance, and start a new terminal from this instance.

4. Open up a new Terminal instance (using the New dropdown). Using the command line, activate the tensorflow_p36 virtual environment

$ source activate tensorflow_p36

Using the terminal, clone the Comet Sagemaker example into your Sagemaker instance

$ git clone && cd comet-sagemaker && pip install -r requirements.txt

Integrating Sagemaker Estimator API with Comet

The Sagemaker Estimator API uses a Tensorflow (TF) EstimatorSpec to create a model and train it on data stored in AWS S3. The EstimatorSpec is a collection of operations that define the model training process (the model architecture, how the data is preprocessed, what metrics are tracked, etc.).

  1. In order to get Comet to work with the TF EstimatorSpec, we will have to create a training hook that wraps our Comet Experiment object in a TF SessionRunHook object. We will adapt code from the tf.train.LoggingTensorHook to log metrics to Comet. You can find a reference implementation here.
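The hook pattern described above can be sketched without TensorFlow installed. This is a minimal, framework-free illustration and not the reference implementation linked above: `CometMetricHook`, `RecordingExperiment`, and `simulate_training` are hypothetical names, and the stand-in experiment only mimics a `log_metric(name, value, step=...)` method. In TensorFlow 1.x the hook would subclass `tf.train.SessionRunHook`, and `after_run` would read the evaluated values from `run_values.results` rather than receiving a plain dict.

```python
class CometMetricHook:
    """Framework-free stand-in for a TF SessionRunHook that forwards metrics.

    Assumption: `experiment` exposes log_metric(name, value, step=...).
    """

    def __init__(self, experiment, tensors, every_n_iter=100):
        if every_n_iter <= 0:
            raise ValueError("every_n_iter must be positive")
        self.experiment = experiment
        self.tensors = dict(tensors)  # metric name -> tensor (or tensor name)
        self.every_n_iter = every_n_iter
        self._iter_count = 0

    def begin(self):
        # Called once before training starts.
        self._iter_count = 0

    def before_run(self, run_context):
        # Tell the session which extra tensors to evaluate on this step.
        return self.tensors

    def after_run(self, run_context, run_values):
        # Simplification: run_values is a dict of metric name -> value.
        if self._iter_count % self.every_n_iter == 0:
            for name, value in run_values.items():
                self.experiment.log_metric(name, float(value), step=self._iter_count)
        self._iter_count += 1


class RecordingExperiment:
    """Hypothetical stand-in for a Comet Experiment; it only records calls."""

    def __init__(self):
        self.logged = []

    def log_metric(self, name, value, step=None):
        self.logged.append((name, value, step))


def simulate_training(hook, losses):
    """Drive the hook the way a training session would, one step per loss."""
    hook.begin()
    for loss in losses:
        hook.before_run(None)
        hook.after_run(None, {"loss": loss})
```

Logging only every `every_n_iter` steps mirrors the throttling idea in `tf.train.LoggingTensorHook` and keeps the number of logging calls small during long training runs.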

Initializing your Project

  1. Add your API key, project name, and workspace name to the Experiment object in CometSessionHook. You can find the Experiment object here.
  2. We initialize the CometSessionHook object in the model_fn function within the file. The model_fn function defines the tensors and parameters we would like to track.
  3. The hyperparameters for this experiment are set in, but feel free to adjust these default values. The hyperparameters for this model include: learning rate, batch size, momentum, num_data_batches, weight decay.
  4. Run python or ! python if running from within a notebook.
  5. You will receive a message in the Sagemaker Notebook output that your experiment is live in Comet, with a direct URL to the experiment.
  6. You can change the parameters in the file and rerun the script.
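One way to organize the reruns described in the steps above is to keep the hyperparameters in a single dictionary and derive each run's configuration from it. The sketch below is an assumption about how this could be structured, not code from the tutorial's repository: the helper names `with_overrides` and `grid` are hypothetical, and the default values are placeholders rather than the ones set in the original script.

```python
import itertools

# Placeholder defaults: the tutorial names these hyperparameters, but the
# values below are illustrative assumptions, not the script's real defaults.
DEFAULT_HPARAMS = {
    "learning_rate": 0.1,
    "batch_size": 128,
    "momentum": 0.9,
    "num_data_batches": 5,
    "weight_decay": 2e-4,
}


def with_overrides(defaults, **overrides):
    """Return a copy of `defaults` with some values replaced."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        # Catch typos early instead of silently adding a new key.
        raise KeyError(f"unknown hyperparameters: {sorted(unknown)}")
    merged = dict(defaults)
    merged.update(overrides)
    return merged


def grid(defaults, **choices):
    """Yield one full configuration per combination of the given choices."""
    names = sorted(choices)
    for values in itertools.product(*(choices[name] for name in names)):
        yield with_overrides(defaults, **dict(zip(names, values)))


if __name__ == "__main__":
    for config in grid(DEFAULT_HPARAMS, batch_size=[64, 128], learning_rate=[0.1, 0.01]):
        print(config)
```

Each resulting dictionary could then be handed to the training script and, assuming a Comet Experiment object is available, recorded with its `log_parameters` method so that every run's configuration appears alongside its metrics.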

Monitoring experiments in Comet

  1. Once you run the script, you’ll be able to see your different model runs in Comet through the direct URL. As an example for this tutorial, we have created a Comet project that you can view at
  2. Let’s see how we can use Comet to get a better understanding of our model.
  • First, let’s sort our models by accuracy by clicking on the accuracy column in our experiments table. These columns are populated based on the hyperparameters provided by the user, or automatically inferred from command line arguments in the code. They can also be customized to track the relevant parameters and metrics for your model using the Customize Columns option.
  • Now let’s add a few visualizations to our project so that we can see how our hyperparameters are affecting our experiment.
  • Then let’s use a line chart to visualize our learning curves, a bar chart to track our final accuracy scores across experiments, and a parallel coordinates chart to visualize which parts of our hyperparameter space are producing the best results.
  • We can filter our experiments using Comet’s query builder, so that we only analyze the models that meet our threshold criteria. For example, perhaps you only want to see experiments above a certain accuracy rate.
  • We can see that our charts all change based on the filters applied to the experiments.
  • Our parallel coordinates chart shows us the parts of the parameter space that we have explored, as well as where we have gaps. In this example, we see that a larger Resnet Size and Batch Size produce a more accurate model.

However, this best-performing experiment was also allowed to run much longer than the others. To get a better estimate of model performance, we can exclude it from our charts by hiding it in the experiment table.

  • Our charts table updates to reflect our Experiment Table, and we can now see that given a similar training time, a larger Resnet Size does not necessarily produce the best results.
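The sorting, threshold filtering, and exclusion of long-running experiments done in the Comet UI above follow a simple logic that can be sketched in plain Python. This is only an illustration of that logic, not Comet's query builder: the function name `best_runs`, the record fields, and every number in `RUNS` are made up for the example and are not results from this tutorial.

```python
# Made-up experiment records for illustration only.
RUNS = [
    {"name": "a", "resnet_size": 32, "batch_size": 128, "accuracy": 0.81, "duration_s": 1800},
    {"name": "b", "resnet_size": 50, "batch_size": 256, "accuracy": 0.90, "duration_s": 7200},
    {"name": "c", "resnet_size": 50, "batch_size": 128, "accuracy": 0.78, "duration_s": 1900},
    {"name": "d", "resnet_size": 18, "batch_size": 64, "accuracy": 0.62, "duration_s": 1700},
]


def best_runs(runs, min_accuracy, max_duration_s=None):
    """Keep runs above an accuracy threshold, best first.

    If max_duration_s is given, runs that trained longer are dropped so that
    the remaining runs are compared at a similar training time.
    """
    kept = [
        run
        for run in runs
        if run["accuracy"] >= min_accuracy
        and (max_duration_s is None or run["duration_s"] <= max_duration_s)
    ]
    return sorted(kept, key=lambda run: run["accuracy"], reverse=True)


if __name__ == "__main__":
    print([run["name"] for run in best_runs(RUNS, 0.75)])
    print([run["name"] for run in best_runs(RUNS, 0.75, max_duration_s=2000)])
```

In this made-up data, capping the training time removes the top run and changes which configuration looks best, which is the same kind of effect observed when hiding the long-running experiment in the table.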

We can use these insights to continue iterating on our model design until we have satisfied our requirements. Comet allows you to create visualizations like bar charts and line plots, along with parallel coordinates charts, to track your experiments. These experiment-level and project-level visualizations help you quickly identify your best-performing models and understand your parameter space.

If you’d like to share your results publicly, you can generate a link through the Project’s Share button. Alternatively, you can also directly share experiments with collaborators by adding people as collaborators in your Workspace Settings.

Tutorial Summary

You have learned how to prepare and train a Resnet model using Comet and AWS Sagemaker’s Tensorflow Estimator API. To summarize the tutorial highlights, we:

  • Trained a Resnet model on the CIFAR-10 dataset in an Amazon Sagemaker notebook instance using one of Sagemaker’s pre-installed virtual environments
  • Explored a different iteration of the training experiment where we varied the hyperparameters
  • Used Comet to automatically capture our model’s various hyperparameter configurations, metrics, and code across different training runs.

Sign up for a free trial of Comet here.
