How Will You (Not) Use AWS SageMaker Jobs — Part One

Anastasia Lebedeva · Published in DataSentics · May 3, 2020

We have to give the SageMaker service credit: it steadily enhances its capabilities and releases new features. One of them is SageMaker jobs, which provide a way to process data and to train and evaluate models using algorithms provided by SageMaker or custom ones.

AWS SageMaker always uses Docker containers when running jobs. While the service provides pre-built Docker images for its built-in algorithms, users can use custom Docker images to define and provision the job runtime. That last statement sounds general, so you may wonder: “Does that mean I can do anything I want in SageMaker as long as it is wrapped in a Docker container?” Not exactly. In this blog series, we introduce some of the features and requirements of SageMaker jobs.

In this post, the first in the series, we take a look at batch processing using two different types of SageMaker jobs. In the second post, we deep-dive into the SageMaker hosting service, which is tightly coupled with SageMaker jobs. Note that throughout the series we focus on the so-called “bring your own Docker” use case.

Batch Processing Using SageMaker Jobs

Batch processing using SageMaker jobs is absolutely possible and incredibly convenient. The job types capable of this kind of task manage at least three things for you (see the sketch after this list):

  • A virtual machine with Docker installed, so the user only needs to specify the required EC2 instance type in the job definition. The VM lives only as long as the job does. Note that in practice even a trivial job takes at least three to four minutes (compared to forty to fifty seconds on ECS).
  • Pulling a Docker image from ECR and launching a Docker container. Note that other registries are not supported yet.
  • Mounting data between the container and S3. This covers input data as well as the output of the algorithm. Note that other storage services are not supported yet.
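
To make the three points above concrete, here is a minimal sketch of a job definition using boto3, taking the processing job (introduced below) as an example. The job name, IAM role, ECR image URI, and S3 bucket are placeholders, not values from this post.

```python
import boto3

sm = boto3.client("sagemaker")

# Hypothetical names: replace the account, role, image, and bucket with your own.
sm.create_processing_job(
    ProcessingJobName="my-batch-job",
    RoleArn="arn:aws:iam::123456789012:role/MySageMakerRole",
    # 1) The EC2 instance type the short-lived VM will use
    ProcessingResources={
        "ClusterConfig": {
            "InstanceCount": 1,
            "InstanceType": "ml.m5.xlarge",
            "VolumeSizeInGB": 30,
        }
    },
    # 2) The custom Docker image, pulled from ECR
    AppSpecification={
        "ImageUri": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-image:latest"
    },
    # 3) Data mounted between S3 and the container
    ProcessingInputs=[
        {
            "InputName": "input",
            "S3Input": {
                "S3Uri": "s3://my-bucket/input/",
                "LocalPath": "/opt/ml/processing/input",
                "S3DataType": "S3Prefix",
                "S3InputMode": "File",
            },
        }
    ],
    ProcessingOutputConfig={
        "Outputs": [
            {
                "OutputName": "output",
                "S3Output": {
                    "S3Uri": "s3://my-bucket/output/",
                    "LocalPath": "/opt/ml/processing/output",
                    "S3UploadMode": "EndOfJob",
                },
            }
        ]
    },
)
```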

Even though there is a dedicated job type for the task, the Batch Transform job, I would suggest taking a look at the so-called Processing job as well. Let’s compare the capabilities and requirements of the two.

Processing Job

The processing job is meant to “analyze data and evaluate machine learning models”. It imposes only the following two requirements, which makes it a perfect candidate for the task:

  • The program running in the container has to exit with a zero exit code on success and with a non-zero one on failure
  • Data have to be mounted under the /opt/ml/processing directory within the container (the exact paths are specified in the job definition)

Note that this job type supports only single-container applications.
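
A minimal entrypoint satisfying both requirements might look like the following sketch. The script is hypothetical (it would be baked into the custom image), and the sub-paths under /opt/ml/processing are assumed to match the LocalPath values in the job definition above.

```python
import sys
from pathlib import Path

# These paths must match the LocalPath values in the job definition.
INPUT_DIR = Path("/opt/ml/processing/input")
OUTPUT_DIR = Path("/opt/ml/processing/output")


def main() -> int:
    try:
        OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
        for source in INPUT_DIR.glob("*.csv"):
            # Trivial "processing": uppercase every line of every input file.
            processed = source.read_text().upper()
            (OUTPUT_DIR / source.name).write_text(processed)
        return 0  # zero exit code -> SageMaker marks the job as Completed
    except Exception as exc:  # broad catch only to keep the sketch short
        print(f"Processing failed: {exc}", file=sys.stderr)
        return 1  # non-zero exit code -> SageMaker marks the job as Failed


if __name__ == "__main__":
    sys.exit(main())
```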

Batch Transform Job

According to the official documentation, this type of job is perfectly suitable for the task. What the documentation does not emphasize, though, is that it places quite a few requirements on the application:

  • It imposes all the requirements of the processing job
  • It has to be defined by a model instance, which, in turn, is defined by a Docker image and optional model artifacts
  • There has to be an executable script called serve, available in the system path or located in the working directory
  • SageMaker then runs the container using docker run image serve (see the documentation)

The list does not seem extensive, but these requirements restrict the environment and the scripts executed within it, making this job type much less suitable for a general application.
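
For comparison, here is a hedged sketch of the same batch task expressed as a batch transform job with boto3. Again, the model name, job name, role, image, and bucket are placeholders, and the image is assumed to already contain the serve executable required above.

```python
import boto3

sm = boto3.client("sagemaker")

# The model instance wraps the Docker image (and optional model artifacts).
sm.create_model(
    ModelName="my-model",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/MySageMakerRole",
    PrimaryContainer={
        "Image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-image:latest",
        # "ModelDataUrl": "s3://my-bucket/model/model.tar.gz",  # optional artifacts
    },
)

# The transform job references the model rather than the image directly.
sm.create_transform_job(
    TransformJobName="my-transform-job",
    ModelName="my-model",
    TransformInput={
        "DataSource": {
            "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": "s3://my-bucket/input/"}
        },
        "ContentType": "text/csv",
        "SplitType": "Line",
    },
    TransformOutput={"S3OutputPath": "s3://my-bucket/output/"},
    TransformResources={"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
)
```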

Comparison

While admitting that the processing job is more flexible, let's take a look at some of the batch transform job features that the processing job misses:

  • It is visible in the SageMaker console, while the processing job is not
  • It has more complete SDK support; for example, the processing job SDK misses a watcher (see the sketch after this list)
  • It can be integrated into a Step Functions pipeline, a feature the processing job type does not have yet
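
As an illustration of the watcher point, boto3 ships a waiter for batch transform jobs; the job name below is the hypothetical one from the earlier sketch.

```python
import boto3

sm = boto3.client("sagemaker")

# Blocks until the transform job reaches a terminal state, polling its status.
waiter = sm.get_waiter("transform_job_completed_or_stopped")
waiter.wait(TransformJobName="my-transform-job")
```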

Verdict

Both job types are great for batch-like tasks in which data and other artifacts are stored within AWS. Batch transform, however, imposes more requirements on the application. Those requirements matter less if you are already using other SageMaker capabilities, e.g. training jobs. Apart from that, the batch transform job type seems more mature than the processing job and hence more convenient to operate.

Thank you for reading up to this point. If you find this post interesting, you may also like “How will you (not) use AWS SageMaker Jobs — Part Two: Model hosting using SageMaker” — the next post in the series.

As always, if you have any further questions or suggestions, feel free to leave a comment. Also, if you have a topic in mind that you would like us to cover in future posts, let us know.
