Operate Large-volume Machine Learning Models on AWS

Hiroya Kato · Published in Analytics Vidhya · Feb 29, 2020
(Modified bodybuilding-gym-with-strong-man photo created by freepik)

This article introduces a model case of an AWS serverless architecture for serving large machine learning model files (hereinafter “large-volume models” or simply “ML models”) and shows how to build it.

This article is intended for readers who want to:

  • Operate large-volume models in new services
  • Operate ML services at low cost
  • Separate the development areas of ML model developers and model users
  • Get a BERT model into operation quickly

Consider an AWS serverless architecture with large-volume models

Can’t use Lambda

If you were asked to design a “web service that uses ML models internally” on AWS, what system architecture would you choose? Most people would probably first think of an architecture that runs the ML models inside a Lambda function.

Fig. 1 - System architecture using Lambda

For example, in the Fig. 1 architecture, a Lambda function that bundles the ML models is invoked via API Gateway from a client such as a web page hosted on S3. As a similar proposal, an official AWS hands-on material (in Japanese) also introduces a Lambda architecture for using models deployed with SageMaker.

However, the size of packages deployed to Lambda is limited, so large-volume models cannot be included in the package. As an alternative, you could store large-volume models in S3 and load them from Lambda, but Lambda functions also have limited local storage (~512 MB), so large-volume models cannot be loaded that way either.

In short, Lambda cannot handle large-volume models at the time of writing (this may change in the future).

Well then, what AWS system architecture is suitable for an API that handles large-volume models? There are several options, but I recommend combining AWS Elastic Beanstalk (EB) and Amazon Elastic File System (EFS).

Use EB and EFS

First, EB is a service that builds and manages EC2 instances with the settings needed for server operation (load balancer, Auto Scaling, etc.), based on a deployment package prepared by the user and a few parameters. Deployed packages are stored in S3, and you can easily roll back to a previous state using those packages. This service is recommended when you want to build and operate a server easily, leaving the complex settings to AWS.

Next, EFS is a service that builds and manages storage whose files can be shared over a network, like an NFS server. For example, if you mount a shared file storage built with an EFS file system on an EC2 instance, you can use it just like local storage on that instance.

That is to say, using EB and EFS, you can quickly get an EC2 instance with large-capacity external storage that behaves like a local disk.

Fig. 2 - System architecture using EB and EFS

With the architecture shown in Fig. 2 (instead of Fig. 1), large-volume models stored in the EFS shared file storage can be read directly by the API server running on the EC2 instance.

Why use EFS?

This article introduces an AWS system architecture that uses EFS as local storage, but you can also use Amazon Elastic Block Store (EBS) for a similar role.

However, EFS allows simultaneous access from multiple EC2 instances, while EBS only allows access from a single instance. With EBS, the models must be deployed separately to each instance that uses them, which complicates model management.

In contrast, EFS makes model management easier: model developers only need to deploy models to one fixed storage, and model users only need to access the models in that storage.

Based on the above, and considering real business operation where the same model is accessed from multiple instances, I recommend EFS.

Create an AWS serverless API with a large-volume model

Now, let’s build an API that uses a large-volume model with the EB and EFS architecture introduced in the previous section. This time, we will create an “API that calculates and outputs the similarity between input sentences using a BERT model”.
(This hands-on uses free resources available online, but it will probably exceed the AWS Free Tier. Be careful.)

The steps are as follows:

  1. Prepare a large-volume model
  2. Build the API locally
  3. Deploy the files except the large-volume model to an EC2 instance created by EB
  4. Deploy the large-volume model to the EFS shared file storage mounted on the EC2 instance created by EB

Step 1. Prepare a large-volume model

Prepare the BERT model data. This time we use the original BERT model files, which can be obtained from the GitHub repository.

Download one of the models (e.g., BERT-Base, Multilingual Cased) and unzip it.
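
For example, for BERT-Base, Multilingual Cased, the download and extraction look roughly like this (the URL is the one listed in the repository README at the time of writing; check there for the current link):

$ wget https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip
$ unzip multi_cased_L-12_H-768_A-12.zip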

Step 2. Build the API locally

Using the downloaded model files, build an API that calculates similarity values between sentences. I prepared sample code based on the example in the original GitHub repository, so get it as a clone or a zip.

The prepared sample has the following directory structure.

Fig. 3 - Sample code directory structure

In this directory, let’s set up a local server.

First, place the downloaded model files in the ./efs directory as follows.

Fig. 4 - efs directory structure where the BERT model files are placed

Once they are placed, install the packages listed in ./requirements.txt and run ./application.py to launch the local Flask server.
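
For reference, assuming a standard Python environment, that boils down to:

$ pip install -r requirements.txt
$ python application.py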

Next, execute the following curl command against the Flask server.

$ curl -X POST -H "Content-Type: application/json" \
  -d '{"target":"JSON", "texts":["text", "format", "data"]}' \
  http://127.0.0.1:5000/sim

The above command sends {"target": "JSON", "texts": ["text", "format", "data"]} to the /sim route via the POST method.

Executing this command returns the following response from the Flask server.

Fig. 5 - Response to POST request by the curl command

The response includes the target text (context.target) and the similarity values (context.sims) of each text (context.texts) against the target. If you receive this response, the code on the Flask server is running without problems.
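
The exact numbers depend on the model, but the response body has roughly the following shape (the similarity values here are illustrative):

{
  "context": {
    "target": "JSON",
    "texts": ["text", "format", "data"],
    "sims": [0.91, 0.78, 0.83]
  }
}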

By the way, a detailed explanation is beyond the purpose of this article, so I will only explain the code that shows the general flow of processing from request to response.

Here is the structure of the part of ./application.py that controls HTTP requests in Flask. The snippet below is a minimal sketch: the class name SimilarityModel and its methods are illustrative stand-ins, and the actual sample code differs in detail.
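
import numpy as np
from flask import Flask, request, jsonify


class SimilarityModel:
    """Converts sentences into [CLS] embeddings and compares them."""

    def embed(self, text):
        # Placeholder: the real code feeds `text` through the BERT model
        # stored under ./efs and returns its [CLS] token vector.
        rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
        return rng.standard_normal(768)

    def similarity(self, payload):
        target_vec = self.embed(payload["target"])
        sims = []
        for text in payload["texts"]:
            vec = self.embed(text)
            # Cosine similarity between the two [CLS] vectors.
            sims.append(float(np.dot(target_vec, vec)
                              / (np.linalg.norm(target_vec) * np.linalg.norm(vec))))
        return {"target": payload["target"], "texts": payload["texts"], "sims": sims}


# First half: the instance object that embeds sentences and computes similarities.
application = Flask(__name__)
model = SimilarityModel()

# Second half: route POST requests at /sim to a lambda function using the model.
application.add_url_rule(
    "/sim",
    "sim",
    lambda: jsonify({"context": model.similarity(request.get_json())}),
    methods=["POST"],
)

if __name__ == "__main__":
    application.run(host="127.0.0.1", port=5000)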

In the above, the instance object defined in the first half is used by the lambda function that is routed to an endpoint in the second half.

The instance object in the first half converts received sentences into embedded expressions (numeric vectors) and calculates similarity values from them. Note that the [CLS] token is treated as the embedded expression of the whole sentence, and the similarity measure is cosine similarity.

The routing in the second half is based on Flask’s add_url_rule method. It is set up so that the lambda function fires when a POST request arrives at the /sim route. If you want to learn more about the add_url_rule method, see the official documentation.
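
For reference, add_url_rule is the mechanism that Flask’s @route decorator uses under the hood, so the routing above is equivalent to the more common decorator form (using the illustrative names from the sketch above):

@application.route("/sim", methods=["POST"])
def sim():
    return jsonify({"context": model.similarity(request.get_json())})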

Step 3. Deploy the files except the large-volume model to an EC2 instance created by EB

Deploy everything except the large-volume model files under the efs directory to the EC2 instance created by EB.

Before deploying to the EC2 instance, change the model paths for explanatory convenience. Change the paths written in ./ert_script/modeling.py as follows.

It is hard to notice at first glance, but the change is simply from relative paths to absolute paths.
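
The change has roughly the following shape (the variable name and model directory here are illustrative):

# Before: resolved relative to the server process's working directory
BERT_DIR = "./efs/multi_cased_L-12_H-768_A-12"
# After: an absolute path that points at the mounted /efs storage
BERT_DIR = "/efs/multi_cased_L-12_H-768_A-12"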

Now for the main part of Step 3: the process up to deploying the files is described in the official AWS developer guide, so please refer to that.

Note that you need to install the EB CLI beforehand; see the dedicated developer guide for those steps as well.
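
For reference, the rough flow with the EB CLI looks like the following (the application and environment names are examples):

$ pip install awsebcli
$ eb init -p python-3.6 [app name (ex. bert-sim-api)]
$ eb create [environment name (ex. bert-sim-api-env)]
$ eb deploy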

Step 4. Deploy the large-volume model to the EFS shared file storage mounted on the EC2 instance created by EB

Mount an EFS shared file storage on the EC2 instance created by EB, and deploy the BERT model files in the efs directory to that storage.

First, follow the official developer guide to create an EFS file system.

After creating the EFS file system, open the modal from the link shown in Fig. 6, and follow its instructions to mount the shared file storage on the EC2 instance.

Fig. 6 - Modal link with mounting instructions for EC2 instances (UI as of 2020/2/24)

Before that, you need to add a rule to the security group that allows inbound traffic on the NFS port (port 2049) so that the EC2 instance can connect to the EFS file storage.

Fig. 7 - Security group to assign to EC2 instance

Also, add a rule so that you can connect to the EC2 instance via SSH from your local PC: allow inbound traffic from My IP on the SSH port (port 22).
(If you have never created an EC2 key pair, follow the guide before setting up the SSH rule.)

Fig. 8 - Security group assigned to EFS mount target
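
For reference, the same rules can also be added with the AWS CLI (the security group IDs and IP address are placeholders):

$ aws ec2 authorize-security-group-ingress \
    --group-id [EFS mount target SG (ex. sg-****)] \
    --protocol tcp --port 2049 \
    --source-group [EC2 instance SG (ex. sg-****)]
$ aws ec2 authorize-security-group-ingress \
    --group-id [EC2 instance SG (ex. sg-****)] \
    --protocol tcp --port 22 \
    --cidr [My IP (ex. xxx.xxx.xxx.xxx/32)]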

Now, back to the modal guide shown in Fig. 6. The procedure, with some supplementary notes, is as follows.

1) Connect to the EC2 instance via SSH

→ It is recommended to use the eb ssh command to connect easily.

2) Create an empty directory for mounting

→ Execute sudo mkdir /efs on the connected EC2 instance, and change the directory permissions with sudo chmod 755 /efs.

3) Install the EFS mount helper

→ Install it by executing sudo yum install -y amazon-efs-utils.

4) Mount with the EFS mount helper

→ Mount by executing sudo mount -t efs fs-********:/ /efs (replace fs-******** with your file system ID).

After completing steps 1)–4), execute the df -h command in the SSH session to check whether the mount succeeded. If you find a large-capacity disk mounted on /efs, this procedure is complete.
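
The output should include a line roughly like the following (the filesystem name depends on how it was mounted; EFS reports an extremely large capacity):

Filesystem                                  Size  Used Avail Use% Mounted on
fs-********.efs.[region].amazonaws.com:/    8.0E     0  8.0E   0% /efs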

Please note that EC2 instances may be rebooted due to AWS maintenance and the like. As it stands, you would have to remount after every reboot; setting up automatic mounting avoids this. See the official developer guide for details.
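
For reference, automatic mounting is configured by adding a line like the following to /etc/fstab (the file system ID is a placeholder):

fs-********:/ /efs efs defaults,_netdev 0 0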

When you’re done, log out with the exit command.

Finally, deploy the BERT model to the mounted shared file storage. Use the scp command to copy the entire efs directory containing the BERT model files.

$ scp -r \
  -i [private key path set by key pair (ex. ~/.ssh/***.pem)] \
  . ec2-user@[public IP (ex. xxx.xxx.xxx.xxx)]:/efs

For your information, the private key path and public IP are displayed on the command line right after executing the eb ssh command used earlier. Follow the logs and copy them.

That’s all! This completes the “API that calculates and outputs the similarity between input sentences using a BERT model” based on the Fig. 2 architecture.

Execution test

Let’s execute the curl command against the deployed API server.

$ curl -X POST -H "Content-Type: application/json" \
  -d '{"target":"JSON", "texts":["text", "format", "data"]}' \
  http://app-name.***.region.elasticbeanstalk.com/sim

(The endpoint URL can be obtained from the EB app management screen.)

After execution, you should get the same response as in Fig. 5.

In conclusion

In this article, I presented a serverless architecture for running a large-volume model on AWS, together with a hands-on example of building it.

For the hands-on, I introduced the procedure for developing an “API that calculates and outputs the similarity between input sentences using a BERT model” as a concrete example, based on practical experience.

Of course, there are various other implementation methods besides the one shown in this example. Try the implementation that works best for you.

Thank you!
