Keras in the cloud with Amazon SageMaker

Amazon SageMaker is a cloud service providing the ability to build, train and deploy Machine Learning models. It aims to simplify the way developers and data scientists use Machine Learning by covering the entire workflow from creation to deployment, including tuning and optimization.

Amazon SageMaker provides many tools for Machine Learning, including implementations of well-known ML algorithms (K-Means, Random Cut Forest, …). Here, however, we focus on Keras and on building our own Deep Learning models. SageMaker ships a wrapper around TensorFlow which handles building, training, deploying and monitoring TensorFlow models. What matters is that Keras is the high-level API of TensorFlow, which means Keras models are TensorFlow models. Thanks to that, we can use SageMaker with Keras and benefit from the extra TensorFlow tooling built by Amazon.

The objective of this article is to show how to create and train a simple convolutional network with Keras that classifies an image into one of two classes: cat or dog.

Before we start, some useful details and links:

We are going to do everything locally, and use SageMaker’s Python SDK to send our work to Amazon. Amazon SageMaker also gives access to “Notebook Instances”, which are essentially hosted Jupyter Notebooks.

SageMaker application “lifecycle”

Everything presented here could also be run in one of these Notebooks; I simply prefer working from a local Python script. Amazon S3 is a storage service which will hold the datasets and the model data.

There are basically four “classical” steps: model creation, data gathering, training and deployment.

A GitHub repository is given at the end of this article; it contains the code for everything described here.

The entry script

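SageMaker needs a script which contains four functions, covering the following tasks:

  • Model creation (keras_model_fn, returning the compiled model)
  • Training and evaluation input (train_input_fn and eval_input_fn, providing the training and evaluation data)
  • Serving input (serving_input_fn, the placeholder for the input data)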

Let’s start with the model. The function must be named keras_model_fn and takes the hyperparameters (which you will have to provide later) as its argument. It returns the compiled Keras model, built in the usual Keras way.
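A minimal sketch of what such a function could look like is given below. The architecture, the 128×128 input size, the "learning_rate" hyperparameter key and the constant names are illustrative assumptions, not the exact model of the original script. The first layer is named "inputs" because, as far as we know, the TensorFlow wrapper of that time expects the feature name given to the estimator to be the first layer's name followed by "_input".

```python
# Image geometry and class count: illustrative values (assumptions).
HEIGHT, WIDTH, DEPTH = 128, 128, 3
NUM_CLASSES = 2  # cat, dog
# Feature name seen by the estimator: first layer name ("inputs") + "_input".
INPUT_TENSOR_NAME = "inputs_input"


def keras_model_fn(hyperparameters):
    """Build and compile a small convolutional classifier (hypothetical architecture)."""
    # Imported here so the file can be loaded on a machine without TensorFlow;
    # top-level imports work just as well inside the training container.
    from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.optimizers import Adam

    model = Sequential([
        Conv2D(32, (3, 3), activation="relu",
               input_shape=(HEIGHT, WIDTH, DEPTH), name="inputs"),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation="relu"),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(64, activation="relu"),
        Dense(NUM_CLASSES, activation="softmax"),  # one score per class
    ])
    # The learning rate is passed positionally because the keyword is `lr`
    # in older Keras releases and `learning_rate` in newer ones.
    learning_rate = hyperparameters.get("learning_rate", 0.001)
    model.compile(optimizer=Adam(learning_rate),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The softmax output with two units matches the two-value result read back from the endpoint at the end of this article.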

SageMaker will call this function to instantiate your model. When training starts, it will need your input data: the images and their labels. To get them, it calls two functions named train_input_fn and eval_input_fn, which take the training data directory and the hyperparameters as arguments. In these functions you are just asked to return a batch of images and labels. The data directories are the ones you specify yourself; we come back to them later in this article.

The serving_input_fn function simply describes how the deployed model expects to be provided with data: here, a TensorFlow placeholder.
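As a sketch only, such a function could be written as follows with the TensorFlow 1.x API (in TensorFlow 2 the placeholder lives under tf.compat.v1). The image size and the INPUT_TENSOR_NAME value are assumptions that must match whatever the model function uses.

```python
HEIGHT, WIDTH, DEPTH = 128, 128, 3     # illustrative image size (assumption)
INPUT_TENSOR_NAME = "inputs_input"     # assumed: first layer name + "_input"


def serving_input_fn(hyperparameters):
    """Declare the tensor that the served model receives at inference time."""
    import tensorflow as tf  # the TensorFlow 1.x API is used below

    # A batch of images of unknown size: [batch, height, width, channels].
    images = tf.placeholder(tf.float32, shape=[None, HEIGHT, WIDTH, DEPTH])
    inputs = {INPUT_TENSOR_NAME: images}
    # Features and receiver tensors are identical: raw tensors go straight in.
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)
```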

Be careful: for both train_input_fn and eval_input_fn, the first argument must be named “training_dir”.

Because the task itself is the same in both cases, let’s move the body of these two functions into a single shared helper.

To make things easier, we use the ImageDataGenerator tool provided by Keras. Given a data source, it creates a generator of new images that are randomly modified to make a richer dataset. For instance, we can zoom, flip, shear, crop, … our images by a random factor (please see the Keras documentation for more details).

We configure our generator to read its data from a directory (method flow_from_directory): the directory provided by SageMaker. It is actually an Amazon S3 bucket where images are split into one directory per main task (training, testing) and, inside each task directory, sorted into one directory per class (cat, dog). For our use case, this means the bucket is structured roughly as shown in the image below:

Structure of our dataset, in the S3 Bucket.

This way the ImageDataGenerator is able to understand how your data and classes are organized, and automatically generates matching (image, label) tuples. As a result, all we have to do is lay out the data folder correctly.

The ImageDataGenerator can also take data from in-memory arrays (method flow) or from pandas DataFrames (method flow_from_dataframe), which can themselves be loaded from formats like csv, xlsx, ….

All that remains now is to launch the training phase. But before that, we will quickly show how to upload the data to Amazon S3.

The Data

Amazon S3 is the storage service of Amazon Web Services. Data is organized into root folders called “buckets”. Once uploaded, it can be used by any other AWS service and, in our case, by SageMaker.

We are using a dataset of cats and dogs from a Microsoft-sponsored Kaggle competition that took place in 2013. Starting from the dataset’s zip file, we write a Python script to load, format and export it to Amazon S3. The dataset is made of approximately 25000 images. Because we do not need that many images, and because they are uploaded as individual files (flow_from_directory reads one file per image), we are going to use only 1000 images per class for training, and 200 for evaluation.
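The preparation step can be sketched with the standard library alone. The snippet below assumes the extracted archive contains one sub-folder per class (here "cat" and "dog"); that layout, the function names and the "train"/"test" output folders are hypothetical and should be adapted to the archive you actually downloaded.

```python
import random
import shutil
import zipfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def extract_archive(zip_path, target_dir):
    """Unzip the downloaded dataset into `target_dir`."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)


def split_dataset(source_dir, dest_dir, classes=("cat", "dog"),
                  n_train=1000, n_eval=200, seed=0):
    """Copy a random subset of each class into <dest>/train and <dest>/test."""
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {}
    for cls in classes:
        files = sorted(p for p in (source_dir / cls).iterdir()
                       if p.suffix.lower() in IMAGE_SUFFIXES)
        rng.shuffle(files)
        # Disjoint slices: evaluation images are never used for training.
        splits = {"train": files[:n_train],
                  "test": files[n_train:n_train + n_eval]}
        for split, selected in splits.items():
            target = dest_dir / split / cls
            target.mkdir(parents=True, exist_ok=True)
            for image in selected:
                shutil.copy2(image, target / image.name)
            counts[(split, cls)] = len(selected)
    return counts
```

The returned dictionary gives the number of files copied per (split, class) pair, which is a cheap way to check the result before uploading anything.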

At this point the dataset is unzipped and restructured, ready to be uploaded to Amazon S3. We now have to create a bucket and send everything to it. We are using the AWS CLI. It is also possible from the Python API, but you would have to upload everything file by file, which is tedious. Fortunately, the CLI’s upload command has a recursive option for directories, which makes things easier.

Now the bucket contains the dataset and everything is ready for training!
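To stay in Python, the two CLI calls (bucket creation with "aws s3 mb", recursive upload with "aws s3 cp --recursive") can be wrapped as in the sketch below. The bucket name, the "data" prefix and the helper names are placeholders; the AWS CLI is assumed to be installed and configured with your credentials.

```python
import subprocess


def s3_upload_commands(local_dir, bucket, prefix="data"):
    """Return the two AWS CLI invocations: create the bucket, then copy recursively."""
    return [
        ["aws", "s3", "mb", f"s3://{bucket}"],
        ["aws", "s3", "cp", str(local_dir), f"s3://{bucket}/{prefix}", "--recursive"],
    ]


def upload_dataset(local_dir, bucket, prefix="data", dry_run=True):
    """Print the commands (dry run) or execute them through the AWS CLI."""
    for command in s3_upload_commands(local_dir, bucket, prefix):
        if dry_run:
            print(" ".join(command))  # only show what would be executed
        else:
            subprocess.run(command, check=True)


# Hypothetical bucket name; pass dry_run=False to really upload.
upload_dataset("data", "my-cats-dogs-bucket")
```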

The training

Through SageMaker, Amazon provides a wrapper around TensorFlow that aims to simplify training and evaluation. The wrapper creates an Estimator (the main object used to start a training sequence on SageMaker, whatever the framework) specialized for TensorFlow, and then launches the whole process as a new training job. It requires the entry script we created earlier (the merge of the first three gists of this article), the hyperparameters, and some configuration details such as the IAM role, to which you must grant full access to Amazon SageMaker. Most of the time, this SageMaker role is created together with the Notebook in the AWS console. We are not using a Notebook here, so you might need to create the role yourself. It is this role which is required here.

The job can now be seen in the SageMaker console, in the “Training Jobs” tab. Because nothing was specified concerning temporary data and outputs, SageMaker will create a new “temporary” S3 bucket where it stores checkpoints during training, and exports the model and weights once finished. The name of the bucket is printed in the console output (in our case, the local console output), and you can see and access it in the S3 console.

Once the training is done, the job is automatically listed among the completed jobs in SageMaker and results in a model accessible under the “inference” tab. Now that the model is prepared and trained, the last remaining step is to serve it so that it can be used for predictions.
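A hedged sketch of this step is shown below. The parameter names follow version 1.x of the SageMaker Python SDK, which this article relies on; to our knowledge, version 2.x renamed the instance parameters to instance_count and instance_type and dropped this way of writing the entry script. The script name, step counts, hyperparameter values and instance type are illustrative assumptions.

```python
def training_job_settings(role_arn, entry_point="cats_dogs_entry.py"):
    """Keyword arguments for the SDK 1.x TensorFlow estimator (illustrative values)."""
    return {
        "entry_point": entry_point,        # script holding the four functions
        "role": role_arn,                  # IAM role with SageMaker access
        "training_steps": 1000,
        "evaluation_steps": 100,
        "hyperparameters": {"learning_rate": 0.001, "batch_size": 32},
        "train_instance_count": 1,
        "train_instance_type": "ml.p2.xlarge",
    }


def launch_training(role_arn, data_s3_uri):
    """Start a training job on the data stored under `data_s3_uri`."""
    from sagemaker.tensorflow import TensorFlow  # needs the sagemaker package

    estimator = TensorFlow(**training_job_settings(role_arn))
    estimator.fit(data_s3_uri)  # waits for the job to finish by default
    return estimator
```

Calling launch_training with your role ARN and the S3 location of the uploaded dataset (for instance the bucket and prefix used during the upload) would start the job; the hyperparameters dictionary is the one handed to the functions of the entry script.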

The deployment

The final step is pretty straightforward. We just need to tell the estimator to deploy the model. We store the result of the “deploy” method, which is an object holding the reference to the endpoint.

The predictor variable now holds an object able to access the endpoint and to send requests to it. Amazon advises using the client SDKs (in JavaScript, Python, …) because they make sending data to endpoints, and receiving results from them, much easier.

Now we test our endpoint by sending a simple cat image from our dataset (one that was not used for training).
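In code, this amounts to a single call on the estimator. The wrapper below is only a sketch: the function name and the default instance type are our own choices, and the estimator is assumed to be one that has already been trained.

```python
def deploy_model(estimator, instance_type="ml.m4.xlarge", instance_count=1):
    """Create an endpoint serving the trained model and return its predictor."""
    return estimator.deploy(initial_instance_count=instance_count,
                            instance_type=instance_type)
```

A call such as `predictor = deploy_model(estimator)` takes a few minutes, since SageMaker has to provision the instance behind the endpoint.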

Such magnificence

The results are returned as JSON, under the name of the last layer. Because we did not give it a specific name, it was auto-generated as “dense_2”. The “float_val” field holds the result, which indicates that the model classifies our image as a cat with a score of 100% (the associations between labels and classes were generated by the ImageDataGenerator, which takes folders in sorted order: first “cat”, then “dog”).

When the endpoint is not needed anymore, it can be removed manually using SageMaker’s web interface (i.e. by clicking on buttons), or with Python by using the “delete_endpoint” method with the endpoint as argument.
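Reading the answer back can be sketched as follows. The exact nesting of the response is an assumption: the helper accepts the layer entry either at the top level or under an "outputs" key, and the example response is hand-written for illustration rather than copied from a real endpoint.

```python
def read_prediction(response, output_name="dense_2", classes=("cat", "dog")):
    """Return (label, score) for the highest-scoring class in an endpoint response."""
    # Accept the layer entry either at the top level or under an "outputs" key.
    outputs = response.get("outputs", response)
    scores = outputs[output_name]["float_val"]
    # Class order follows the sorted folder names used by the generator.
    best = max(range(len(scores)), key=lambda i: scores[i])
    return classes[best], scores[best]


# Hand-written example response (not real endpoint output).
example = {"outputs": {"dense_2": {"float_val": [1.0, 0.0]}}}
print(read_prediction(example))  # ('cat', 1.0)
```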
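The Python route can be sketched like this. The wrapper name is ours; Session.delete_endpoint is the SDK call we rely on, and in SDK 1.x the endpoint name should be available as the predictor's endpoint attribute (treat that attribute name as an assumption to check against your SDK version).

```python
def remove_endpoint(endpoint_name, session=None):
    """Delete a SageMaker endpoint by name so it stops being billed."""
    if session is None:
        import sagemaker  # needs the sagemaker package and AWS credentials
        session = sagemaker.Session()
    session.delete_endpoint(endpoint_name)
```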


We saw how to create, train and deploy a Keras model in the cloud with Amazon SageMaker. Here we used a “classical” model whose main objective was to run a simple inference on an input. If you want to create a more specific model or train it in a particular way, you will have to use a “Script mode” training script, which requires diving deeper into SageMaker’s documentation and examples.

The sources can be found at the following GitHub repository:

The GitHub repository contains approximately the same content, organized into individual tasks and files for easier use.


This article is an enhanced “summary”, with feedback and bonus “how-to”, of the official Amazon examples repository.


Click here to have a look at what we do at BeTomorrow.
