Deploy Stable Diffusion on Amazon SageMaker Endpoint

Baichuan Sun
7 min read · Nov 2, 2022


Stable Diffusion just crossed 1,000,000 downloads on the Hugging Face hub!

— Clem Delangue, Co-founder & CEO at Hugging Face

28-Oct-2022

In this post, we introduce a convenient process for deploying a Stable Diffusion model to an Amazon SageMaker Endpoint, giving you a secure, robust and scalable deployment solution for production environments.

All the code is available on GitHub.

Download Stable Diffusion Model Weights

Prerequisite

  1. Prepare at least 60 GB of disk space. Consider Amazon Elastic File System (EFS): https://aws.amazon.com/efs/. If you use Amazon SageMaker Studio or a Notebook instance, you have access to EFS by default.
  2. Before starting, make sure your AWS account has sufficient quota for the Amazon SageMaker Endpoint instance type of your choice, e.g. ml.g4dn.xlarge.
  3. Install git-lfs as described at: https://git-lfs.github.com/
  4. Create a Hugging Face account, if you haven't already, at: https://huggingface.co/
  5. Navigate to the Hugging Face model card at: https://huggingface.co/runwayml/stable-diffusion-v1-5 and accept the terms and conditions.
  6. (optional) Navigate to the Hugging Face model card at: https://huggingface.co/runwayml/stable-diffusion-inpainting and accept the terms and conditions.
  7. Create an Amazon S3 bucket (a short boto3 sketch follows this list).
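
If you prefer to create the bucket from code rather than the console, here is a minimal boto3 sketch; the bucket name is a placeholder, and outside us-east-1 you must also pass a location constraint:

import boto3

# Hypothetical bucket name -- replace with a globally unique one
s3 = boto3.client("s3")
s3.create_bucket(Bucket="your-stable-diffusion-bucket")
# outside us-east-1, add: CreateBucketConfiguration={"LocationConstraint": "<your-region>"}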

Clone HuggingFace Stable Diffusion Text-to-Image and Image-to-Image Repository

# on Amazon EFS  
mkdir diffusion && cd diffusion
git lfs install --skip-smudge
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
cd stable-diffusion-v1-5
git lfs pull
rm -rf .git
rm -rf .gitattributes
cd ..

These commands download the model weights and remove the repository history, which is not needed and would only waste disk space and network traffic later.
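
As an optional sanity check that the weights actually landed on disk, here is a small Python sketch, assuming the directory name used above:

from pathlib import Path

# Sum the size of all files under the cloned repository (path is an assumption)
repo = Path("stable-diffusion-v1-5")
size_gb = sum(f.stat().st_size for f in repo.rglob("*") if f.is_file()) / 1e9
print(f"{repo}: {size_gb:.1f} GB")  # expect tens of GB once git lfs pull completes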

(optional) Clone HuggingFace Stable Diffusion Inpainting Repository

git clone https://huggingface.co/runwayml/stable-diffusion-inpainting 
cd stable-diffusion-inpainting
git lfs pull
git lfs install --force
rm -rf .git
rm -rf .gitattributes
cd ..

Deployment on Amazon SageMaker

Boilerplate

pip install boto3
pip install sagemaker

Assume IAM Role

import sagemaker
import boto3

sess = sagemaker.Session()
# sagemaker session bucket -> used for uploading data, models and logs
# sagemaker will automatically create this bucket if it does not exist
sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

Make sure the IAM Role that you are assuming has the following managed policies attached (see the sketch after this list):

  • AmazonS3FullAccess
  • AmazonSageMakerFullAccess
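
If the role is missing either policy, you can attach them programmatically; a minimal boto3 sketch, assuming the role name sagemaker_execution_role used in the snippet above and that your own credentials allow iam:AttachRolePolicy:

import boto3

iam = boto3.client("iam")
# Attach the two AWS managed policies to the execution role (role name is an assumption)
for policy_arn in [
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
]:
    iam.attach_role_policy(RoleName="sagemaker_execution_role", PolicyArn=policy_arn)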

Path to S3

S3_BUCKET = "YOUR_S3_BUCKET_NAME"
MODEL_ID_TEXT2IMAGE = "stable-diffusion-text-to-image"
MODEL_ID_IMAGE2IMAGE = "stable-diffusion-image-to-image"
MODEL_ID_INPAINT = "stable-diffusion-inpainting"

s3_location_t2i = f"s3://{S3_BUCKET}/custom_inference/{MODEL_ID_TEXT2IMAGE}/model.tar.gz"
s3_location_i2i = f"s3://{S3_BUCKET}/custom_inference/{MODEL_ID_IMAGE2IMAGE}/model.tar.gz"
s3_location_inpaint = f"s3://{S3_BUCKET}/custom_inference/{MODEL_ID_INPAINT}/model.tar.gz"

Deployment Helper Function

from sagemaker.huggingface.model import HuggingFaceModel


def deploy_huggingface_sagemaker(model_s3_location, role):
    # create Hugging Face Model Class
    huggingface_model = HuggingFaceModel(
        model_data=model_s3_location,  # path to your model & script
        role=role,                     # iam role with permissions for Endpoint
        transformers_version="4.17",   # transformers version used
        pytorch_version="1.10",        # pytorch version used
        py_version="py38",             # python version used
    )

    # deploy the endpoint
    predictor = huggingface_model.deploy(
        initial_instance_count=1, instance_type="ml.g4dn.xlarge"
    )
    return predictor.endpoint_name

Text-to-Image Model

In the cloned stable-diffusion-v1-5 repository directory, create a folder code/ and add the following two files:

requirements.txt

numpy==1.23.4
torch==1.12.1
diffusers==0.6.0
transformers==4.23.1
spacy==3.4.2

inference.py

from diffusers import StableDiffusionPipeline
import torch
import base64
import numpy as np


def process_data(data: dict) -> dict:
    return {
        "prompt": [data.pop("prompt", data)] * min(data.pop("number", 2), 5),
        "guidance_scale": data.pop("guidance_scale", 7.5),
        "num_inference_steps": min(data.pop("num_inference_steps", 50), 50),
        "height": 512,
        "width": 512,
    }


def model_fn(model_dir: str):
    t2i_pipe = StableDiffusionPipeline.from_pretrained(
        model_dir,
    )
    if torch.cuda.is_available():
        t2i_pipe = t2i_pipe.to("cuda")

    t2i_pipe.enable_attention_slicing()
    return t2i_pipe


def predict_fn(data: dict, hgf_pipe) -> dict:

    with torch.autocast("cuda"):
        images = hgf_pipe(**process_data(data))["images"]

    # return dictionary, which will be json serializable
    return {
        "images": [
            base64.b64encode(np.array(image).astype(np.uint8)).decode("utf-8")
            for image in images
        ]
    }
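
Before packaging the model, you can optionally smoke-test the handler functions locally on a GPU machine; a minimal sketch, assuming the cloned weights sit in ./stable-diffusion-v1-5 next to this script and the pinned packages are installed:

# Hypothetical local smoke test for the handler above (needs a CUDA GPU)
from inference import model_fn, predict_fn

pipe = model_fn("./stable-diffusion-v1-5")   # path to the cloned weights is an assumption
out = predict_fn({"prompt": "a red bicycle", "number": 1}, pipe)
print(len(out["images"]))                    # expect 1 base64-encoded 512x512x3 image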

Folder structure

.
├── code
│   ├── inference.py
│   └── requirements.txt
├── feature_extractor
│   └── preprocessor_config.json
├── model_index.json
├── model.tar.gz
├── README.md
├── safety_checker
│   ├── config.json
│   └── pytorch_model.bin
├── scheduler
│   └── scheduler_config.json
├── text_encoder
│   ├── config.json
│   └── pytorch_model.bin
├── tokenizer
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── unet
│   ├── config.json
│   └── diffusion_pytorch_model.bin
├── v1-5-pruned.ckpt
├── v1-5-pruned-emaonly.ckpt
├── v1-inference.yaml
└── vae
    ├── config.json
    └── diffusion_pytorch_model.bin

Now compress the repository contents into a tarball and upload it to S3 via the CLI:

cd stable-diffusion-v1-5
tar zcvf model.tar.gz *
aws s3 cp model.tar.gz YOUR_S3_BUCKET_DIRECTORY
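
The destination here should match the s3_location_t2i defined earlier, since that is the path passed to the deployment helper. To confirm the archive is laid out correctly (code/ and the model sub-folders at the root), a small Python sketch:

import tarfile

# Print the first few archive entries; code/ and folders such as unet/ should be at the root
with tarfile.open("model.tar.gz") as tar:
    for name in tar.getnames()[:10]:
        print(name)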

Then we can deploy the model via:

deploy_huggingface_sagemaker(YOUR_S3_BUCKET_DIRECTORY, role)

You can check the progress in the AWS Console under `SageMaker > Inference > Endpoints`. Record the returned endpoint name; you will use it for inference later.
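
If you prefer to check from code instead of the console, the endpoint status can be polled with the SageMaker API; a minimal sketch, where the endpoint name is whatever the helper returned:

import boto3

sm = boto3.client("sagemaker")
# Status moves from "Creating" to "InService" once the endpoint is ready
status = sm.describe_endpoint(EndpointName="your-endpoint-name")["EndpointStatus"]
print(status)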

Invoke Endpoint

import json

import boto3

request_body = {
    "prompt": "ancient chinese garden, overlooking full moon, ethereal colors, trending on artstation",
    "number": 3,
    "num_inference_steps": 50,
}

# Serialize data for endpoint
payload = json.dumps(request_body)

client = boto3.client("sagemaker-runtime")
response = client.invoke_endpoint(
    EndpointName="endpoint name returned in the previous step",
    ContentType="application/json",
    Body=payload,
)
res = response["Body"].read()

Then decode and visualise them:

import base64
import json

import matplotlib.pyplot as plt
import numpy as np

result = json.loads(res)  # the endpoint returns a JSON body
for img_encoded in result["images"]:
    pred_decoded_byte = base64.decodebytes(bytes(img_encoded, encoding="utf-8"))
    pred_decoded = np.reshape(
        np.frombuffer(pred_decoded_byte, dtype=np.uint8),
        (512, 512, 3),
    )
    plt.imshow(pred_decoded)
    plt.axis("off")
    plt.show()
Text Prompt: “ancient chinese garden, overlooking full moon, ethereal colors, trending on artstation”. 44.8s
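
If you also want to keep the generated images on disk rather than only displaying them, each decoded array can be written out with PIL; a small optional sketch (the filename pattern is arbitrary, and result is the parsed response from above):

import base64

import numpy as np
from PIL import Image

# Save each decoded (512, 512, 3) array as a PNG
for i, img_encoded in enumerate(result["images"]):
    arr = np.reshape(
        np.frombuffer(base64.decodebytes(img_encoded.encode("utf-8")), dtype=np.uint8),
        (512, 512, 3),
    )
    Image.fromarray(arr).save(f"text2image_{i}.png")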

Other Model Capabilities

Image-to-Image Model

Repeat the above steps, but replace the content of inference.py with:

from diffusers import StableDiffusionImg2ImgPipeline
import torch
import base64
import numpy as np
from PIL import Image

height, width = 512, 512


def process_data(data: dict) -> dict:
    global height, width
    height = data.pop("height", 512)
    width = data.pop("width", 512)

    init_image_decoded = np.reshape(
        np.frombuffer(
            base64.decodebytes(bytes(data.pop("init_image"), encoding="utf-8")),
            dtype=np.uint8,
        ),
        (height, width, 3),
    )

    return {
        "prompt": [data.pop("prompt", data)] * min(data.pop("number", 2), 5),
        "init_image": Image.fromarray(init_image_decoded),
        "strength": data.pop("strength", 0.75),
        "guidance_scale": data.pop("guidance_scale", 7.5),
        "num_inference_steps": min(data.pop("num_inference_steps", 50), 50),
        "height": height,
        "width": width,
    }


def model_fn(model_dir: str):
    i2i_pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        model_dir,
    )
    if torch.cuda.is_available():
        i2i_pipe = i2i_pipe.to("cuda")

    i2i_pipe.enable_attention_slicing()
    return i2i_pipe


def predict_fn(data: dict, hgf_pipe) -> dict:

    with torch.autocast("cuda"):
        images = hgf_pipe(**process_data(data))["images"]

    # return dictionary, which will be json serializable
    return {
        "images": [
            base64.b64encode(np.array(image).astype(np.uint8)).decode("utf-8")
            for image in images
        ],
        "height": height,
        "width": width,
    }

Invoke it via:

import base64
import json

import boto3
import numpy as np
from PIL import Image

init_image = Image.open("sketch-mountains-input.jpg").convert("RGB")

request_body = {
    "prompt": "A fantasy landscape, trending on artstation",
    "init_image": base64.b64encode(np.array(init_image).astype(np.uint8)).decode("utf-8"),
    "height": init_image.size[1],
    "width": init_image.size[0],
    "number": 2,
    "num_inference_steps": 40,
}

# Serialize data for endpoint
payload = json.dumps(request_body)

client = boto3.client("sagemaker-runtime")
response = client.invoke_endpoint(
    EndpointName="endpoint name returned in the previous step",
    ContentType="application/json",
    Body=payload,
)
res = response["Body"].read()

Then decode and visualise them:

import base64
import json

import matplotlib.pyplot as plt
import numpy as np

result = json.loads(res)
for img_encoded in result["images"]:
    pred_decoded_byte = base64.decodebytes(bytes(img_encoded, encoding="utf-8"))
    pred_decoded = np.reshape(
        np.frombuffer(pred_decoded_byte, dtype=np.uint8),
        (result["height"], result["width"], 3),
    )
    plt.imshow(pred_decoded)
    plt.axis("off")
    plt.show()
Text Prompt: “A fantasy landscape, trending on artstation”. 54.7s

Image In-painting

Repeat the above steps in the `stable-diffusion-inpainting` repository, replacing the content of inference.py with:

from diffusers import StableDiffusionInpaintPipeline
import torch
import base64
import numpy as np
from PIL import Image

height, width = 512, 512


def process_data(data: dict) -> dict:
    global height, width
    height = data.pop("height", 512)
    width = data.pop("width", 512)

    init_image_decoded = np.reshape(
        np.frombuffer(
            base64.decodebytes(bytes(data.pop("image"), encoding="utf-8")),
            dtype=np.uint8,
        ),
        (height, width, 3),
    )

    mask_image_decoded = np.reshape(
        np.frombuffer(
            base64.decodebytes(bytes(data.pop("mask_image"), encoding="utf-8")),
            dtype=np.uint8,
        ),
        (height, width, 3),
    )

    return {
        "prompt": data.pop("prompt", data),
        "image": Image.fromarray(init_image_decoded),
        "mask_image": Image.fromarray(mask_image_decoded),
        "strength": data.pop("strength", 0.75),
        "guidance_scale": data.pop("guidance_scale", 7.5),
        "num_inference_steps": min(data.pop("num_inference_steps", 50), 50),
        "height": height,
        "width": width,
    }


def model_fn(model_dir: str):
    inp_pipe = StableDiffusionInpaintPipeline.from_pretrained(
        model_dir,
    )
    if torch.cuda.is_available():
        inp_pipe = inp_pipe.to("cuda")

    inp_pipe.enable_attention_slicing()
    return inp_pipe


def predict_fn(data: dict, hgf_pipe) -> dict:

    with torch.autocast("cuda"):
        images = hgf_pipe(**process_data(data))["images"]

    # return dictionary, which will be json serializable
    return {
        "images": [
            base64.b64encode(np.array(image).astype(np.uint8)).decode("utf-8")
            for image in images
        ],
        "height": height,
        "width": width,
    }
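
If you do not already have a mask file, a simple rectangular mask can be built with PIL; a hedged sketch (for this inpainting model, white marks the region to repaint and black the region to keep; the box coordinates and output filename below are arbitrary):

from PIL import Image, ImageDraw

init_image = Image.open("overture-creations-5sI6fQgYIuo.png").convert("RGB")

# Black mask = keep; paint a white rectangle over the area to be regenerated
mask_image = Image.new("RGB", init_image.size, "black")
ImageDraw.Draw(mask_image).rectangle((128, 128, 384, 384), fill="white")
mask_image.save("my_mask.png")  # hypothetical filename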

Invoke the Endpoint:

import base64
import json

import boto3
import numpy as np
from PIL import Image

init_image = Image.open("overture-creations-5sI6fQgYIuo.png").convert("RGB")
mask_image = Image.open("overture-creations-5sI6fQgYIuo_mask.png").convert("RGB")

request_body = {
    "prompt": "a cute cat lying on a park bench",
    "image": base64.b64encode(np.array(init_image).astype(np.uint8)).decode("utf-8"),
    "mask_image": base64.b64encode(np.array(mask_image).astype(np.uint8)).decode("utf-8"),
    "height": init_image.size[1],
    "width": init_image.size[0],
    "num_inference_steps": 50,
}

# Serialize data for endpoint
payload = json.dumps(request_body)

client = boto3.client("sagemaker-runtime")
response = client.invoke_endpoint(
    EndpointName="endpoint name returned in the previous step",
    ContentType="application/json",
    Body=payload,
)
res = response["Body"].read()

Then decode and visualise them:

import base64
import json

import matplotlib.pyplot as plt
import numpy as np

result = json.loads(res)
for img_encoded in result["images"]:
    pred_decoded_byte = base64.decodebytes(bytes(img_encoded, encoding="utf-8"))
    pred_decoded = np.reshape(
        np.frombuffer(pred_decoded_byte, dtype=np.uint8),
        (result["height"], result["width"], 3),
    )
    plt.imshow(pred_decoded)
    plt.axis("off")
    plt.show()
Prompt: “a cute cat lying on a park bench”
Text Prompt: “a cute cat lying on a park bench”, 14.6s

Clean up

Make sure that you delete the following resources to prevent any additional charges (a boto3 sketch follows the list):

  • Amazon SageMaker endpoint.
  • Amazon SageMaker endpoint configuration.
  • Amazon SageMaker model.
  • Amazon S3 buckets.
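
A minimal boto3 sketch for the first three items, assuming you still have the endpoint name recorded earlier; the S3 bucket can then be emptied and deleted from the console or the AWS CLI:

import boto3

sm = boto3.client("sagemaker")
endpoint_name = "your-endpoint-name"  # the name recorded after deployment

# Discover the endpoint config and the model(s) behind the endpoint, then delete all three
config_name = sm.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
config = sm.describe_endpoint_config(EndpointConfigName=config_name)

sm.delete_endpoint(EndpointName=endpoint_name)
sm.delete_endpoint_config(EndpointConfigName=config_name)
for variant in config["ProductionVariants"]:
    sm.delete_model(ModelName=variant["ModelName"])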

Reference

  1. https://github.com/huggingface/diffusers
  2. https://github.com/aws-samples/amazon-sagemaker-image-based-transformers-examples
  3. https://huggingface.co/docs/sagemaker/inference#deploy-a-transformers-model-trained-in-sagemaker
  4. https://huggingface.co/blog/deploy-hugging-face-models-easily-with-amazon-sagemaker
  5. https://github.com/huggingface/notebooks/blob/main/sagemaker/10_deploy_model_from_s3/deploy_transformer_model_from_s3.ipynb
  6. https://github.com/huggingface/notebooks/blob/main/sagemaker/17_custom_inference_script/sagemaker-notebook.ipynb
