Train and Deploy the Mighty BERT based NLP models using FastBert and Amazon SageMaker

Kaushal Trivedi
8 min read · Sep 12, 2019


FastBert — The story so far…

In my earlier introduction to FastBert, I described it as a library that allows developers and data scientists to train and deploy BERT-based models for NLP tasks, beginning with text classification. The scope of BERT-based (read: Transformer-based) models has widened a bit since I wrote that blog and now includes BERT, XLNet, RoBERTa, DistilBERT and a few more.

I am happy to report that, with lots of support from Hugging Face, FastBert now supports all of the above model architectures, and with a couple of changes to the input parameters you can try each of them on your custom datasets. With the current pace of research in Transformer-based models, I expect the list of architectures to grow rapidly in the coming days/weeks/months, and I hope to support all or most of them.

BERT meets Amazon SageMaker

One of the key necessities in training BERT-based models is access to GPUs: the more the better. I have personally been fortunate to have access to multiple GPUs to experiment with different Transformer architectures and parameters, but I am sure this is one of the major issues for the research and developer community. A single-GPU AWS p3.2xlarge EC2 instance costs about $80 a day, and a multi-GPU AWS p3.8xlarge EC2 instance will set you back $320 a day. One has to be incredibly disciplined about switching off the virtual machines when they are not in use in order to avoid a shock bill. Another issue with the virtual machine approach is that you have limited scope for testing different hyper-parameters or BERT architectures in parallel, as you are constrained by the number of GPUs available in each virtual machine.

What about Inference?

Training is just half of the job. Once the model is trained to your satisfaction, you would like a simple way to deploy it in a highly scalable, available and secure environment behind a REST API endpoint. Developers and data scientists will agree with me that this step is generally ignored by most academic researchers, yet in industry this step is what counts the most.

Amazon SageMaker

In Amazon’s own words:

Amazon SageMaker provides every developer and data scientist with the ability to build, train, and deploy machine learning models quickly. Amazon SageMaker is a fully-managed service that covers the entire machine learning workflow to label and prepare your data, choose an algorithm, train the model, tune and optimize it for deployment, make predictions, and take action. Your models get to production faster with much less effort and lower cost.

and I must say that I tend to agree for the most part.

FastBert includes support for training BERT models on Amazon SageMaker. With FastBert on SageMaker, you only pay for the time (in seconds) your experiment actually spends executing the training loop. Once the training epochs are complete, the training resources are automatically released and your trained model artefacts are securely stored in an S3 bucket in your AWS account, ready to be deployed behind a RESTful endpoint.

In this blog, I will describe how to train and deploy BERT based models using FastBert on Amazon SageMaker.

The AWS components used here are:

Elastic Container Registry (ECR) Image

In order to use FastBert with SageMaker, we have to package the library, training code and pretrained weights together as a Docker image stored in Amazon Elastic Container Registry (ECR). We will use the same image to hold both the training and inference code for FastBert.

S3 Bucket

The S3 bucket holds the training and validation data and other config files. The data in the S3 bucket can be encrypted using AWS KMS.

The S3 bucket also holds the output of the training job: the trained model artefacts, log files and TensorBoard output.

SageMaker Training Job

To train a model in SageMaker, you will need to create a training job. The training job includes the following:

  1. Reference to S3 bucket training location (input bucket)
  2. Reference to the S3 bucket to store trained model artefacts (output bucket)
  3. Reference to the AWS ECR image that holds our FastBert library and training code

The training job is also passed the ML compute resources, i.e. the type of instance used for the training job (p3.2xlarge, p3.8xlarge, etc.). The compute resources are managed by SageMaker. The training job also receives the defined model hyperparameters.

After you create the training job, Amazon SageMaker launches the ML compute instances and uses the training code and the training dataset to train the model. It saves the resulting model artifacts and other output in the S3 bucket you specified for that purpose.

SageMaker Endpoint

SageMaker provides a model hosting service to deploy the trained model behind an HTTPS endpoint for inference. The output of the SageMaker training job is used to create a so-called SageMaker model. By creating a model, you tell Amazon SageMaker where it can find the model components: the S3 path where the model artifacts are stored and the Docker registry path for the image that contains the inference code.

When hosting models in production, you can configure the endpoint to elastically scale the deployed ML compute instances. For each production variant, you specify the number of ML compute instances that you want to deploy. When you specify two or more instances, Amazon SageMaker launches them in multiple Availability Zones. This ensures continuous availability. Amazon SageMaker manages deploying the instances.

How does this work?

Prerequisites

  1. Install Docker on your computer.
  2. Create an AWS Account.
  3. Install and configure AWS CLI on your computer.

Create the FastBert ECR image

In order to use BERT-based transformer model architectures with fast-bert, we need to provide the custom algorithm code to SageMaker. This is done in the shape of a Docker image stored in Amazon Elastic Container Registry (ECR). The image is created using the Dockerfile contained in the fast-bert repository.

  1. Clone the fast-bert repository on your local machine using git clone https://github.com/kaushaltrivedi/fast-bert.git
  2. Navigate to the container folder of the fast-bert repository.
  3. Run the script build_and_push.sh. On successful execution of the script, you will have a Docker image named sagemaker-bert in your AWS account. The script also prepackages some of the most frequently used pre-trained weights into the Docker image. This is particularly useful if you decide to run SageMaker training jobs in network isolation mode or within a VPC without an internet gateway. Feel free to update this script for your own purposes.

This docker image can be used to train and deploy any number of models that are supported by the fast-bert library. At this point you can use the AWS Console to create a training job. But I have created a “helper” Jupyter notebook for uploading data and config files to S3 bucket, creating a training job, and then deploying the model as a SageMaker endpoint.

Note that this SageMaker notebook doesn't need any GPUs. It can also be executed on your local machine or a low-cost virtual machine. The training and inference will be delegated to the managed Amazon SageMaker instances.

SageMaker Helper Notebook

Import the necessary libraries.
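
The original notebook embeds the imports as a gist; a minimal sketch using the SageMaker Python SDK looks roughly like this (the role lookup assumes you are running inside a SageMaker notebook instance, otherwise pass an explicit IAM role ARN):

```python
import json
from pathlib import Path

import boto3
import sagemaker
from sagemaker import get_execution_role

session = sagemaker.Session()
role = get_execution_role()          # inside a SageMaker notebook; otherwise use a role ARN
bucket = session.default_bucket()    # or your own S3 bucket name
```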

Set up the paths for your local data locations. The data and label files must already be stored in the DATA_PATH location. We will create the training_config.json file shortly.
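
A sketch of the path setup, assuming a simple local folder layout and illustrative file names:

```python
# Local folder layout (placeholder paths). DATA_PATH must already contain
# the train/validation CSVs and the labels file; training_config.json will
# be written to CONFIG_PATH below.
DATA_PATH = Path("./data")
CONFIG_PATH = Path("./config")
CONFIG_PATH.mkdir(exist_ok=True)

# Illustrative file names -- adjust to your own dataset.
train_file = "train.csv"
val_file = "val.csv"
label_file = "labels.csv"
```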

Hyper-parameters and Training configuration

I have split the parameters required by SageMaker into hyper-parameters and general configuration parameters. Hyper-parameters are passed directly to the SageMaker training job and can be tuned to optimise the model.

The general parameters that cannot be tuned by SageMaker are stored in training_config.json and provided to SageMaker through the S3 bucket.
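
A sketch of what the hyper-parameter dictionary might look like; the key names and values below are illustrative and should be checked against the sample notebook in the fast-bert repo:

```python
# Hyper-parameters passed directly to the SageMaker training job.
hyperparameters = {
    "epochs": 4,
    "lr": 8e-5,
    "max_seq_length": 512,
    "train_batch_size": 16,
    "lr_schedule": "warmup_cosine",
    "optimizer_type": "adamw",
}
```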

These are the parameters used by the databunch and learner objects. This particular example is for a multi-label scenario, hence the label_col list is serialised as a string; I hope to improve this in the future. For multi-class text classification, label_col will simply be the name of the label column.

As you will notice, we also save the training_config object to a file at the CONFIG_PATH location.
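
A sketch of the training config for a multi-label example, with the label_col list serialised as a string; the keys, column names and labels below are illustrative:

```python
# General (non-tunable) parameters consumed by the databunch and learner
# objects. For this multi-label example the label columns are serialised
# as a string; for multi-class classification label_col would simply be
# the name of the single label column.
training_config = {
    "run_text": "multilabel-demo",
    "model_type": "bert",
    "model_name": "bert-base-uncased",
    "train_file": train_file,
    "val_file": val_file,
    "label_file": label_file,
    "text_col": "text",
    "label_col": '["label_a", "label_b", "label_c"]',
    "multi_label": True,
    "fp16": True,
}

# Save the config at CONFIG_PATH so it can be uploaded to S3 with the data.
with open(CONFIG_PATH / "training_config.json", "w") as f:
    json.dump(training_config, f, indent=2)
```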

Upload data and config to S3 bucket
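
A minimal sketch of the upload step using the SageMaker session helper; the key prefixes are placeholders:

```python
# Upload the data and config folders to S3 and define an output location.
s3_data_path = session.upload_data(str(DATA_PATH), bucket=bucket, key_prefix="fast-bert/data")
s3_config_path = session.upload_data(str(CONFIG_PATH), bucket=bucket, key_prefix="fast-bert/config")
s3_output_path = f"s3://{bucket}/fast-bert/output"
```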

Create an estimator object and start training
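
A sketch of the estimator setup, assuming the sagemaker-bert image built earlier and SageMaker Python SDK v2 argument names (older SDK versions use image_name, train_instance_count and train_instance_type); the channel names passed to fit must match what the training script inside the container expects:

```python
# Build the ECR image URI for the sagemaker-bert image pushed earlier.
account = boto3.client("sts").get_caller_identity()["Account"]
region = session.boto_region_name
image_uri = f"{account}.dkr.ecr.{region}.amazonaws.com/sagemaker-bert:latest"

estimator = sagemaker.estimator.Estimator(
    image_uri,
    role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",   # GPU instance for training
    output_path=s3_output_path,
    hyperparameters=hyperparameters,
    sagemaker_session=session,
)

# Start the training job; channel names are illustrative.
estimator.fit({"training": s3_data_path, "config": s3_config_path})
```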

At this point SageMaker will create the training instance using the Docker image that you have provided. It will then download the data and config files from S3 bucket to the SageMaker instance and start the training job.

The fit function calls the Amazon SageMaker CreateTrainingJob API to start model training. It uses the configuration you provided to create the estimator and the specified input training data to build the CreateTrainingJob request to Amazon SageMaker.

You should see logs similar to the following, which keep you informed about the status of the training job. The logs are displayed in the notebook and are also available in AWS CloudWatch Logs for future reference.

2019-08-27 10:15:06 Starting - Starting the training job...
2019-08-27 10:15:08 Starting - Launching requested ML instances......
2019-08-27 10:16:08 Starting - Preparing the instances for training...
2019-08-27 10:17:05 Downloading - Downloading input data...
2019-08-27 10:17:11 Training - Downloading the training image............
2019-08-27 10:19:19 Training - Training image download completed. Training in progress.

You can also see the training job details in AWS console.

Once the training job is complete, the trained model and all the accompanying files such as config file, tokenizer vocabulary and labels.csv are zipped and stored in the S3 bucket specified in the estimator object’s output_path parameter.

You can call the deploy() method to host the model using the Amazon SageMaker hosting services.
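
A minimal sketch of the deploy call on the estimator; instance type and count are examples:

```python
# Deploy the trained model behind an HTTPS endpoint. A CPU instance such
# as ml.m5.large is usually enough for inference.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
)
```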

Voila!!! You now have an active model endpoint that you can invoke to get real-time inference. You can use the AWS SDK for any of the major supported platforms and call the InvokeEndpoint API to get inferences.
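
A sketch of calling the endpoint with boto3; the JSON request and response format here is an assumption and depends on the inference code packaged in the sagemaker-bert image:

```python
# Invoke the endpoint with the AWS SDK (boto3). The payload format is an
# assumption -- adapt it to the inference code in your image.
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName=predictor.endpoint_name,   # predictor.endpoint in older SDK versions
    ContentType="application/json",
    Body=json.dumps({"text": "This is a sample sentence to classify."}),
)
print(response["Body"].read().decode("utf-8"))
```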

As you can see from the examples above, we have used different types of instances for training and hosting. For training, we use an instance with multiple GPUs. For hosting the model for inference, however, you can use a cheaper instance such as ml.m5.large, which is optimised for general compute and does not contain any expensive GPUs.

The complete notebook is available in the fast-bert GitHub repo.

Conclusion and next steps

Hopefully this story will help you leverage the power of Amazon SageMaker to train and deploy BERT based models on your own data using the fast-bert library.

Amazon SageMaker abstracts away the complexities of maintaining secure and expensive GPU-powered virtual machines for the training phase, and also simplifies the process of deploying the model to production.

You will be able to customise most of the fast-bert parameters through the hyper-parameters and the training config file, and at the same time build sophisticated training and hosting production workflows.

Some of the next steps would be to use additional SageMaker features such as hyper-parameter tuning, elastic inference, batch inference and more.

I would love to hear your suggestions on further improvements and also welcome your code contribution to the fast-bert github repo.


Kaushal Trivedi

Chief Architect & Technologist, AI & Machine Learning, Co-founder at utterworks