Deploying an MLflow Model in SageMaker

Jagane Sundar
InfinStor
Jan 17, 2022

MLflow’s capabilities were recently augmented with the ability to deploy MLflow models to SageMaker. The process is relatively easy, but it has some gotchas that tripped me up. This article is a step-by-step guide to deploying an MLflow model in SageMaker.

Pre-requisites:

Configure your MLflow environment variables. For example, with our InfinStor free MLflow service, we set the MLFLOW_TRACKING_URI environment variable to infinstor://mlflow.free.infinstor.com/ and MLFLOW_EXPERIMENT_ID to a suitable experiment ID, for example 2. If your MLflow service requires authentication, you must also be logged in to the service. For example, the following command logs in to our free MLflow service:

python -m infinstor_mlflow_plugin.login
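The same two settings can also be applied from inside a Python session, before any MLflow call is made. This is a minimal sketch that uses the tracking URI and experiment ID quoted above; the helper name configure_mlflow_env is made up for illustration and is not part of MLflow.

```python
import os


def configure_mlflow_env(tracking_uri, experiment_id):
    """Set the two environment variables named in this article."""
    os.environ["MLFLOW_TRACKING_URI"] = tracking_uri
    # Environment variables are always strings, so convert the ID.
    os.environ["MLFLOW_EXPERIMENT_ID"] = str(experiment_id)
    names = ("MLFLOW_TRACKING_URI", "MLFLOW_EXPERIMENT_ID")
    return {name: os.environ[name] for name in names}


# Values from this article; substitute your own service and experiment.
settings = configure_mlflow_env("infinstor://mlflow.free.infinstor.com/", 2)
print(settings)
```

Setting the variables this way only affects the current process and the commands it launches, so the shell that runs the mlflow CLI still needs them exported separately.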

Further, AWS credentials in ~/.aws/credentials must be set up correctly.
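A quick sanity check for the credentials file can save a failed deployment later. The sketch below only confirms that the file has a profile with both keys present; it does not verify that AWS accepts them. The function name has_aws_profile and the choice of the default profile are my assumptions.

```python
import configparser
import os


def has_aws_profile(path="~/.aws/credentials", profile="default"):
    """Return True if the credentials file defines both keys for the profile."""
    parser = configparser.ConfigParser()
    # read() silently skips files that do not exist, leaving the parser empty.
    parser.read(os.path.expanduser(path))
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    return "aws_access_key_id" in section and "aws_secret_access_key" in section


print(has_aws_profile())
```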

Model

I have a transformers model that I fine-tuned in MLflow run 7-16420961991120000000062; this is the model that I want to deploy using SageMaker. First, let’s make sure that this model can be served locally. The command that I use to do this is:

mlflow models serve -m runs:/7-16420961991120000000062/model

Here is the response:

mlflow models serve -m runs:/7-16420961991120000000062/model
2022-01-17 09:33:21,209-9792 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2022/01/17 09:33:22 INFO mlflow.models.cli: Selected backend for flavor 'python_function'
2022/01/17 09:33:54 INFO mlflow.utils.conda: === Creating conda environment mlflow-751ecc3c22368dcf8c4f69f17296ddee1b3a0657 ===
2022/01/17 09:35:13 INFO mlflow.pyfunc.backend: === Running command 'source /home/jagane/anaconda3/bin/../etc/profile.d/conda.sh && conda activate mlflow-751ecc3c22368dcf8c4f69f17296ddee1b3a0657 1>&2 && gunicorn --timeout=60 -b 127.0.0.1:5000 -w 1 ${GUNICORN_CMD_ARGS} -- mlflow.pyfunc.scoring_server.wsgi:app'
[2022-01-17 09:35:13 -0800] [10277] [INFO] Starting gunicorn 20.1.0
[2022-01-17 09:35:13 -0800] [10277] [INFO] Listening at: http://127.0.0.1:5000 (10277)
[2022-01-17 09:35:13 -0800] [10277] [INFO] Using worker: sync
[2022-01-17 09:35:13 -0800] [10284] [INFO] Booting worker with pid: 10284

You can now invoke this local deployment of the model using curl as follows:

$ curl -X POST -H "Content-Type:application/json; format=pandas-split" --data '{"columns":["text"],"data":[["This is lousy weather"], ["This is great weather"]]}' http://127.0.0.1:5000/invocations
[{"text": "This is lousy weather", "label": 0.0, "score": 0.6562787294387817}, {"text": "This is great weather", "label": 4.0, "score": 0.508899450302124}]
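The same local request can be made from Python, which avoids hand-writing the JSON. This is a sketch using only the standard library; it assumes the server started above is still listening on 127.0.0.1:5000, and the function names build_pandas_split_payload and score_local are hypothetical.

```python
import json
import urllib.request


def build_pandas_split_payload(column, texts):
    """Build the JSON body that goes with format=pandas-split."""
    return json.dumps({"columns": [column], "data": [[t] for t in texts]})


def score_local(texts, url="http://127.0.0.1:5000/invocations"):
    """POST the texts to the locally served model and return the parsed reply."""
    request = urllib.request.Request(
        url,
        data=build_pandas_split_payload("text", texts).encode("utf-8"),
        headers={"Content-Type": "application/json; format=pandas-split"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Printing the payload needs no running server; score_local() does.
payload = build_pandas_split_payload(
    "text", ["This is lousy weather", "This is great weather"])
print(payload)
```

Calling score_local(["This is lousy weather", "This is great weather"]) while the server is up should return the same list of dictionaries that curl printed.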

Required AWS Resources

The following three AWS resources are required for deploying an MLflow model into your default VPC using SageMaker.

  1. A suitable container image stored in Amazon Elastic Container Registry (ECR)
  2. An S3 bucket
  3. An execution role for SageMaker

Required Resource 1: Create a suitable container and push it to AWS ECR

Run the following command from the CLI:

mlflow sagemaker build-and-push-container --build --push -c for-sagemaker-deployment

If you now go to the AWS console -> Amazon Container Services -> Repositories, you will see a repository named for-sagemaker-deployment. Click on the repository to see the image, which is tagged with the MLflow version that built it. In my case, I was using MLflow 1.22.0, so the tag is 1.22.0.

Screen capture of the AWS ECR page with a suitable container listed
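The deploy command later in this article needs the full URL of this image. As a small sketch, the URL can be assembled from the account ID, region, repository name, and tag; the helper name ecr_image_url is mine, and the values are the ones used in this article.

```python
def ecr_image_url(account_id, region, repository, tag):
    """Compose an ECR image URL: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


image_url = ecr_image_url(
    "612652722220", "us-east-1", "for-sagemaker-deployment", "1.22.0")
print(image_url)
# prints 612652722220.dkr.ecr.us-east-1.amazonaws.com/for-sagemaker-deployment:1.22.0
```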

Required Resource 2: Create a bucket for SageMaker to use

Use the AWS Console or the CLI to create a bucket. In this example, I create a bucket called jaganes-sagemaker.
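If you would rather script this step, boto3 can create the bucket. The sketch below takes the S3 client as a parameter, for example boto3.client('s3', region_name='us-east-1'); the helper names are hypothetical, and the region is an assumption taken from the deploy command used later.

```python
def create_bucket_kwargs(bucket, region):
    """Build the keyword arguments for the S3 create_bucket call."""
    kwargs = {"Bucket": bucket}
    # us-east-1 is the default location and must not be passed as a
    # LocationConstraint; every other region has to be named explicitly.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(s3_client, bucket, region):
    """Create the bucket using an S3 client such as boto3.client('s3')."""
    return s3_client.create_bucket(**create_bucket_kwargs(bucket, region))


print(create_bucket_kwargs("jaganes-sagemaker", "us-east-1"))
```

Bucket names are global across AWS, so a name that works for me will not be available to you.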

Required Resource 3: Create an Execution Role for SageMaker to assume

In this step, we are going to create an execution role for SageMaker to assume. This role gives SageMaker permission to access SageMaker resources, S3, CloudWatch, and a few other necessary services in our account. Go to the AWS Console -> IAM -> Roles page. Click on ‘Create Role’ and choose AWS Service, SageMaker in the ‘Select type of trusted entity’ page. Here is a screen capture of the same:

Choose the SageMaker AWS Service as the trusted entity

Next, in the permissions policies page, choose ‘AmazonSageMakerFullAccess’ permissions, as shown below:

Choose AmazonSageMakerFullAccess permissions

Finally, add tags if you need them and provide a name for the role. In this example, I used the name role-for-mlflow-sagemaker-deploy.
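The console steps above correspond to two IAM API calls: create a role that trusts the SageMaker service, then attach the AmazonSageMakerFullAccess managed policy. Here is a sketch of that, assuming an IAM client such as boto3.client('iam') is passed in; the function names are hypothetical and this is not what the console literally runs.

```python
import json

SAGEMAKER_FULL_ACCESS_ARN = "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"


def sagemaker_trust_policy():
    """Trust policy that lets the SageMaker service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def create_execution_role(iam_client, role_name):
    """Create the role, attach the managed policy, and return the role ARN."""
    role = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(sagemaker_trust_policy()),
    )
    iam_client.attach_role_policy(
        RoleName=role_name, PolicyArn=SAGEMAKER_FULL_ACCESS_ARN)
    return role["Role"]["Arn"]


print(json.dumps(sagemaker_trust_policy(), indent=2))
```

The returned ARN is the value that the deploy command expects for its execution role.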

Deploy Model

With the three resources created above, we are ready to deploy the model. Here is the command line I used for this:

$ mlflow sagemaker deploy --app-name xformer -m runs:/7-16420961991120000000062/model --execution-role-arn arn:aws:iam::612652722220:role/role-for-mlflow-sagemaker-deploy --bucket jaganes-sagemaker --image-url 612652722220.dkr.ecr.us-east-1.amazonaws.com/for-sagemaker-deployment:1.22.0 --region-name us-east-1 --mode create --instance-type ml.m5.large --instance-count 1 --flavor python_function

Parameters used in the above call:

  1. app-name: This name, xformer in my example, is what we will use when invoking the model in the next step
  2. -m: The model URI. For a model logged by an MLflow run, the format is runs:/<run-id>/model, where model is the artifact path the model was logged under
  3. execution-role-arn: The ARN of the execution role created earlier
  4. bucket: The bucket that we created earlier
  5. image-url: The URL of the container image created earlier using the command mlflow sagemaker build-and-push-container
  6. region-name: AWS region for the model deployment
  7. mode: We use create here. The other supported modes are add and replace; removing a deployment is done with the separate mlflow sagemaker delete command
  8. instance-type: Must be a SageMaker ML instance type, such as ml.m5.large
  9. instance-count: Number of instances to create
  10. flavor: MLflow model flavor to deploy, python_function here
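Since the command line is long and easy to mistype, it can help to assemble it in a script. The sketch below builds the argument list from the parameters listed above and prints it; it uses only the flags shown in this article, and the function name build_deploy_command and its defaults are my own choices.

```python
import shlex


def build_deploy_command(app_name, run_id, role_arn, bucket, image_url,
                         region="us-east-1", instance_type="ml.m5.large",
                         instance_count=1):
    """Return the mlflow sagemaker deploy command as an argument list."""
    return [
        "mlflow", "sagemaker", "deploy",
        "--app-name", app_name,
        "-m", f"runs:/{run_id}/model",
        "--execution-role-arn", role_arn,
        "--bucket", bucket,
        "--image-url", image_url,
        "--region-name", region,
        "--mode", "create",
        "--instance-type", instance_type,
        "--instance-count", str(instance_count),
        "--flavor", "python_function",
    ]


command = build_deploy_command(
    app_name="xformer",
    run_id="7-16420961991120000000062",
    role_arn="arn:aws:iam::612652722220:role/role-for-mlflow-sagemaker-deploy",
    bucket="jaganes-sagemaker",
    image_url="612652722220.dkr.ecr.us-east-1.amazonaws.com/for-sagemaker-deployment:1.22.0",
)
# shlex.join quotes arguments where needed, so the output can be pasted into a shell.
print(shlex.join(command))
```

To run the deployment from the script instead of pasting the output, pass the list to subprocess.run(command, check=True).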

Invoke

Here is a small Python script that I used to invoke this newly deployed model:

import boto3
import json

# The endpoint name is the value passed to --app-name at deploy time
endpoint = 'xformer'
client = boto3.client('sagemaker-runtime')

# Same pandas-split JSON payload as in the local curl test
body = ('{"columns":["text"],"data":[["This is terrible weather"],'
        '["This is great weather"]]}')
response = client.invoke_endpoint(EndpointName=endpoint,
                                  ContentType='application/json; format=pandas-split',
                                  Body=body)
print(str(json.loads(response['Body'].read())))

And the response I got is:

[{'text': 'This is terrible weather', 'label': 0.0, 'score': 0.8280780911445618}, {'text': 'This is great weather', 'label': 4.0, 'score': 0.5088995099067688}]
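For repeated use, the invocation can be wrapped in a function that builds the request body with json.dumps instead of a hand-written string. This is a sketch under the same assumptions as the script above; the function name classify is hypothetical, and the client is expected to be boto3.client('sagemaker-runtime').

```python
import json


def classify(runtime_client, endpoint, texts):
    """Send texts to the deployed endpoint and return the parsed predictions."""
    body = json.dumps({"columns": ["text"], "data": [[t] for t in texts]})
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint,
        ContentType="application/json; format=pandas-split",
        Body=body,
    )
    # The response body is a stream, so it has to be read before decoding.
    return json.loads(response["Body"].read())
```

With a real client, classify(client, 'xformer', ['This is terrible weather', 'This is great weather']) should return the same list of dictionaries shown above.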

That’s all folks!
