Deploy Stable Diffusion on Amazon SageMaker Endpoint

Baichuan Sun
7 min read · Nov 2, 2022


Stable Diffusion just crossed 1,000,000 downloads on the Hugging Face hub!

— Clem Delangue, Co-founder & CEO at Hugging Face

28-Oct-2022

In this post, we introduce a convenient process for deploying a Stable Diffusion model to an Amazon SageMaker Endpoint, giving you a secure, robust and scalable deployment solution for production environments.

All the code is available on GitHub.

Download Stable Diffusion Model Weights

Prerequisite

  1. Prepare at least 60 GB of disk space. Consider Amazon Elastic File System (EFS): https://aws.amazon.com/efs/. If you use Amazon SageMaker Studio or a Notebook instance, you have access to EFS by default.
  2. Before starting, make sure your AWS account has sufficient quota for the Amazon SageMaker Endpoint instance type of your choice, e.g. ml.g4dn.xlarge.
  3. Install git-lfs as described at: https://git-lfs.github.com/
  4. Create a Hugging Face account, if you haven't already, at: https://huggingface.co/
  5. Navigate to the Hugging Face model card at: https://huggingface.co/runwayml/stable-diffusion-v1-5 and accept the terms and conditions.
  6. (optional) Navigate to the Hugging Face model card at: https://huggingface.co/runwayml/stable-diffusion-inpainting and accept the terms and conditions.
  7. Create an Amazon S3 bucket (a short boto3 sketch follows this list).
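
If you prefer to create the bucket from code rather than the console, here is a minimal boto3 sketch; the bucket name is a placeholder, and outside us-east-1 you must also pass a location constraint:

import boto3

# Hypothetical bucket name -- replace with a globally unique one
s3 = boto3.client("s3")
s3.create_bucket(Bucket="your-stable-diffusion-bucket")
# outside us-east-1, add: CreateBucketConfiguration={"LocationConstraint": "<your-region>"}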

Clone HuggingFace Stable Diffusion Text-to-Image and Image-to-Image Repository

# on Amazon EFS  
mkdir diffusion && cd diffusion
git lfs install --skip-smudge
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
cd stable-diffusion-v1-5
git lfs pull
rm -rf .git
rm -rf .gitattributes
cd ..

These commands download the model weights and remove the repository history, which is not needed and would only waste disk space and network traffic later.
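
As an optional sanity check that the weights actually landed on disk, here is a small Python sketch, assuming the directory name used above:

from pathlib import Path

# Sum the size of all files under the cloned repository (path is an assumption)
repo = Path("stable-diffusion-v1-5")
size_gb = sum(f.stat().st_size for f in repo.rglob("*") if f.is_file()) / 1e9
print(f"{repo}: {size_gb:.1f} GB")  # expect tens of GB once git lfs pull completes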

(optional) Clone HuggingFace Stable Diffusion Inpainting Repository

git clone https://huggingface.co/runwayml/stable-diffusion-inpainting 
cd stable-diffusion-inpainting
git lfs pull
git lfs install --force
rm -rf .git
rm -rf .gitattributes
cd ..

Deployment on Amazon SageMaker

Boilerplate

pip install boto3
pip install sagemaker

Assume IAM Role

import sagemaker
import boto3

sess = sagemaker.Session()
# sagemaker session bucket -> used for uploading data, models and logs
# sagemaker will automatically create this bucket if it does not exist
sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

Make sure the IAM Role that you are assuming has the following managed policies attached (see the sketch after this list):

  • AmazonS3FullAccess
  • AmazonSageMakerFullAccess
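
If the role is missing either policy, you can attach them programmatically; a minimal boto3 sketch, assuming the role name sagemaker_execution_role used in the snippet above and that your own credentials allow iam:AttachRolePolicy:

import boto3

iam = boto3.client("iam")
# Attach the two AWS managed policies to the execution role (role name is an assumption)
for policy_arn in [
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
]:
    iam.attach_role_policy(RoleName="sagemaker_execution_role", PolicyArn=policy_arn)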

Path to S3

S3_BUCKET = "YOUR_S3_BUCKET_NAME"
MODEL_ID_TEXT2IMAGE = "stable-diffusion-text-to-image"
MODEL_ID_IMAGE2IMAGE = "stable-diffusion-image-to-image"
MODEL_ID_INPAINT = "stable-diffusion-inpainting"

s3_location_t2i = f"s3://{S3_BUCKET}/custom_inference/{MODEL_ID_TEXT2IMAGE}/model.tar.gz"
s3_location_i2i = f"s3://{S3_BUCKET}/custom_inference/{MODEL_ID_IMAGE2IMAGE}/model.tar.gz"
s3_location_inpaint = f"s3://{S3_BUCKET}/custom_inference/{MODEL_ID_INPAINT}/model.tar.gz"

Deployment Helper Function

from sagemaker.huggingface.model import HuggingFaceModel


def deploy_huggingface_sagemaker(model_s3_location, role):
    # create Hugging Face Model Class
    huggingface_model = HuggingFaceModel(
        model_data=model_s3_location,  # path to your model & script
        role=role,                     # iam role with permissions for Endpoint
        transformers_version="4.17",   # transformers version used
        pytorch_version="1.10",        # pytorch version used
        py_version="py38",             # python version used
    )

    # deploy the endpoint
    predictor = huggingface_model.deploy(
        initial_instance_count=1, instance_type="ml.g4dn.xlarge"
    )
    return predictor.endpoint_name

Text-to-Image Model

In the cloned stable-diffusion-v1-5 repository directory, create a folder code/ and add the following two files:

requirements.txt

numpy==1.23.4
torch==1.12.1
diffusers==0.6.0
transformers==4.23.1
spacy==3.4.2

inference.py

from diffusers import StableDiffusionPipeline
import torch
import base64
import numpy as np


def process_data(data: dict) -> dict:
    return {
        "prompt": [data.pop("prompt", data)] * min(data.pop("number", 2), 5),
        "guidance_scale": data.pop("guidance_scale", 7.5),
        "num_inference_steps": min(data.pop("num_inference_steps", 50), 50),
        "height": 512,
        "width": 512,
    }


def model_fn(model_dir: str):
    t2i_pipe = StableDiffusionPipeline.from_pretrained(
        model_dir,
    )
    if torch.cuda.is_available():
        t2i_pipe = t2i_pipe.to("cuda")

    t2i_pipe.enable_attention_slicing()
    return t2i_pipe


def predict_fn(data: dict, hgf_pipe) -> dict:

    with torch.autocast("cuda"):
        images = hgf_pipe(**process_data(data))["images"]

    # return dictionary, which will be json serializable
    return {
        "images": [
            base64.b64encode(np.array(image).astype(np.uint8)).decode("utf-8")
            for image in images
        ]
    }
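
Before packaging the model, you can optionally smoke-test the handler functions locally on a GPU machine; a minimal sketch, assuming the cloned weights sit in ./stable-diffusion-v1-5 next to this script and the pinned packages are installed:

# Hypothetical local smoke test for the handler above (needs a CUDA GPU)
from inference import model_fn, predict_fn

pipe = model_fn("./stable-diffusion-v1-5")   # path to the cloned weights is an assumption
out = predict_fn({"prompt": "a red bicycle", "number": 1}, pipe)
print(len(out["images"]))                    # expect 1 base64-encoded 512x512x3 image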

Folder structure

.
├── code
│   ├── inference.py
│   └── requirements.txt
├── feature_extractor
│   └── preprocessor_config.json
├── model_index.json
├── model.tar.gz
├── README.md
├── safety_checker
│   ├── config.json
│   └── pytorch_model.bin
├── scheduler
│   └── scheduler_config.json
├── text_encoder
│   ├── config.json
│   └── pytorch_model.bin
├── tokenizer
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── unet
│   ├── config.json
│   └── diffusion_pytorch_model.bin
├── v1-5-pruned.ckpt
├── v1-5-pruned-emaonly.ckpt
├── v1-inference.yaml
└── vae
    ├── config.json
    └── diffusion_pytorch_model.bin

Now compress the repository contents into a tarball and upload it to S3 via the CLI:

cd stable-diffusion-v1-5
tar zcvf model.tar.gz *
aws s3 cp model.tar.gz YOUR_S3_BUCKET_DIRECTORY
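
The destination here should match the s3_location_t2i defined earlier, since that is the path passed to the deployment helper. To confirm the archive is laid out correctly (code/ and the model sub-folders at the root), a small Python sketch:

import tarfile

# Print the first few archive entries; code/ and folders such as unet/ should be at the root
with tarfile.open("model.tar.gz") as tar:
    for name in tar.getnames()[:10]:
        print(name)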

Then we can deploy the model via:

deploy_huggingface_sagemaker(YOUR_S3_BUCKET_DIRECTORY, role)

You can check the progress in the AWS Console under `SageMaker > Inference > Endpoints`. Record the returned endpoint name; you will use it for inference later.
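
If you prefer to check from code instead of the console, the endpoint status can be polled with the SageMaker API; a minimal sketch, where the endpoint name is whatever the helper returned:

import boto3

sm = boto3.client("sagemaker")
# Status moves from "Creating" to "InService" once the endpoint is ready
status = sm.describe_endpoint(EndpointName="your-endpoint-name")["EndpointStatus"]
print(status)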

Invoke Endpoint

import json

import boto3

request_body = {
    "prompt": "ancient chinese garden, overlooking full moon, ethereal colors, trending on artstation",
    "number": 3,
    "num_inference_steps": 50,
}

# Serialize data for endpoint
payload = json.dumps(request_body)

client = boto3.client("sagemaker-runtime")
response = client.invoke_endpoint(
    EndpointName="endpoint name returned in the previous step",
    ContentType="application/json",
    Body=payload,
)
res = response["Body"].read()

Then decode and visualise them:

import base64
import json

import matplotlib.pyplot as plt
import numpy as np

result = json.loads(res)  # the endpoint returns a JSON body
for img_encoded in result["images"]:
    pred_decoded_byte = base64.decodebytes(bytes(img_encoded, encoding="utf-8"))
    pred_decoded = np.reshape(
        np.frombuffer(pred_decoded_byte, dtype=np.uint8),
        (512, 512, 3),
    )
    plt.imshow(pred_decoded)
    plt.axis("off")
    plt.show()
Text Prompt: “ancient chinese garden, overlooking full moon, ethereal colors, trending on artstation”. 44.8s
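
If you also want to keep the generated images on disk rather than only displaying them, each decoded array can be written out with PIL; a small optional sketch (the filename pattern is arbitrary, and result is the parsed response from above):

import base64

import numpy as np
from PIL import Image

# Save each decoded (512, 512, 3) array as a PNG
for i, img_encoded in enumerate(result["images"]):
    arr = np.reshape(
        np.frombuffer(base64.decodebytes(img_encoded.encode("utf-8")), dtype=np.uint8),
        (512, 512, 3),
    )
    Image.fromarray(arr).save(f"text2image_{i}.png")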

Other Model Capabilities

Image-to-Image Model

Repeat the above steps, but replace the content of inference.py with:

from diffusers import StableDiffusionImg2ImgPipeline
import torch
import base64
import numpy as np
from PIL import Image

height, width = 512, 512


def process_data(data: dict) -> dict:
    global height, width
    height = data.pop("height", 512)
    width = data.pop("width", 512)

    init_image_decoded = np.reshape(
        np.frombuffer(
            base64.decodebytes(bytes(data.pop("init_image"), encoding="utf-8")),
            dtype=np.uint8,
        ),
        (height, width, 3),
    )

    return {
        "prompt": [data.pop("prompt", data)] * min(data.pop("number", 2), 5),
        "init_image": Image.fromarray(init_image_decoded),
        "strength": data.pop("strength", 0.75),
        "guidance_scale": data.pop("guidance_scale", 7.5),
        "num_inference_steps": min(data.pop("num_inference_steps", 50), 50),
        "height": height,
        "width": width,
    }


def model_fn(model_dir: str):
    i2i_pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        model_dir,
    )
    if torch.cuda.is_available():
        i2i_pipe = i2i_pipe.to("cuda")

    i2i_pipe.enable_attention_slicing()
    return i2i_pipe


def predict_fn(data: dict, hgf_pipe) -> dict:

    with torch.autocast("cuda"):
        images = hgf_pipe(**process_data(data))["images"]

    # return dictionary, which will be json serializable
    return {
        "images": [
            base64.b64encode(np.array(image).astype(np.uint8)).decode("utf-8")
            for image in images
        ],
        "height": height,
        "width": width,
    }

Invoke it via:

import base64
import json

import boto3
import numpy as np
from PIL import Image

init_image = Image.open("sketch-mountains-input.jpg").convert("RGB")

request_body = {
    "prompt": "A fantasy landscape, trending on artstation",
    "init_image": base64.b64encode(np.array(init_image).astype(np.uint8)).decode("utf-8"),
    "height": init_image.size[1],
    "width": init_image.size[0],
    "number": 2,
    "num_inference_steps": 40,
}

# Serialize data for endpoint
payload = json.dumps(request_body)

client = boto3.client("sagemaker-runtime")
response = client.invoke_endpoint(
    EndpointName="endpoint name returned in the previous step",
    ContentType="application/json",
    Body=payload,
)
res = response["Body"].read()

Then decode and visualise them:

import base64
import json

import matplotlib.pyplot as plt
import numpy as np

result = json.loads(res)
for img_encoded in result["images"]:
    pred_decoded_byte = base64.decodebytes(bytes(img_encoded, encoding="utf-8"))
    pred_decoded = np.reshape(
        np.frombuffer(pred_decoded_byte, dtype=np.uint8),
        (result["height"], result["width"], 3),
    )
    plt.imshow(pred_decoded)
    plt.axis("off")
    plt.show()
Text Prompt: “A fantasy landscape, trending on artstation”. 54.7s

Image In-painting

Repeat the above steps in the `stable-diffusion-inpainting` repository, replacing the content of inference.py with:

from diffusers import StableDiffusionInpaintPipeline
import torch
import base64
import numpy as np
from PIL import Image

height, width = 512, 512


def process_data(data: dict) -> dict:
    global height, width
    height = data.pop("height", 512)
    width = data.pop("width", 512)

    init_image_decoded = np.reshape(
        np.frombuffer(
            base64.decodebytes(bytes(data.pop("image"), encoding="utf-8")),
            dtype=np.uint8,
        ),
        (height, width, 3),
    )

    mask_image_decoded = np.reshape(
        np.frombuffer(
            base64.decodebytes(bytes(data.pop("mask_image"), encoding="utf-8")),
            dtype=np.uint8,
        ),
        (height, width, 3),
    )

    return {
        "prompt": data.pop("prompt", data),
        "image": Image.fromarray(init_image_decoded),
        "mask_image": Image.fromarray(mask_image_decoded),
        "strength": data.pop("strength", 0.75),
        "guidance_scale": data.pop("guidance_scale", 7.5),
        "num_inference_steps": min(data.pop("num_inference_steps", 50), 50),
        "height": height,
        "width": width,
    }


def model_fn(model_dir: str):
    inp_pipe = StableDiffusionInpaintPipeline.from_pretrained(
        model_dir,
    )
    if torch.cuda.is_available():
        inp_pipe = inp_pipe.to("cuda")

    inp_pipe.enable_attention_slicing()
    return inp_pipe


def predict_fn(data: dict, hgf_pipe) -> dict:

    with torch.autocast("cuda"):
        images = hgf_pipe(**process_data(data))["images"]

    # return dictionary, which will be json serializable
    return {
        "images": [
            base64.b64encode(np.array(image).astype(np.uint8)).decode("utf-8")
            for image in images
        ],
        "height": height,
        "width": width,
    }
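
If you do not already have a mask file, a simple rectangular mask can be built with PIL; a hedged sketch (for this inpainting model, white marks the region to repaint and black the region to keep; the box coordinates and output filename below are arbitrary):

from PIL import Image, ImageDraw

init_image = Image.open("overture-creations-5sI6fQgYIuo.png").convert("RGB")

# Black mask = keep; paint a white rectangle over the area to be regenerated
mask_image = Image.new("RGB", init_image.size, "black")
ImageDraw.Draw(mask_image).rectangle((128, 128, 384, 384), fill="white")
mask_image.save("my_mask.png")  # hypothetical filename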

Invoke the Endpoint:

import base64
import json

import boto3
import numpy as np
from PIL import Image

init_image = Image.open("overture-creations-5sI6fQgYIuo.png").convert("RGB")
mask_image = Image.open("overture-creations-5sI6fQgYIuo_mask.png").convert("RGB")

request_body = {
    "prompt": "a cute cat lying on a park bench",
    "image": base64.b64encode(np.array(init_image).astype(np.uint8)).decode("utf-8"),
    "mask_image": base64.b64encode(np.array(mask_image).astype(np.uint8)).decode("utf-8"),
    "height": init_image.size[1],
    "width": init_image.size[0],
    "num_inference_steps": 50,
}

# Serialize data for endpoint
payload = json.dumps(request_body)

client = boto3.client("sagemaker-runtime")
response = client.invoke_endpoint(
    EndpointName="endpoint name returned in the previous step",
    ContentType="application/json",
    Body=payload,
)
res = response["Body"].read()

Then decode and visualise them:

import base64
import json

import matplotlib.pyplot as plt
import numpy as np

result = json.loads(res)
for img_encoded in result["images"]:
    pred_decoded_byte = base64.decodebytes(bytes(img_encoded, encoding="utf-8"))
    pred_decoded = np.reshape(
        np.frombuffer(pred_decoded_byte, dtype=np.uint8),
        (result["height"], result["width"], 3),
    )
    plt.imshow(pred_decoded)
    plt.axis("off")
    plt.show()
Prompt: “a cute cat lying on a park bench”
Text Prompt: “a cute cat lying on a park bench”, 14.6s

Clean up

Make sure that you delete the following resources to prevent any additional charges (a boto3 sketch follows the list):

  • Amazon SageMaker endpoint.
  • Amazon SageMaker endpoint configuration.
  • Amazon SageMaker model.
  • Amazon S3 buckets.
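
A minimal boto3 sketch for the first three items, assuming you still have the endpoint name recorded earlier; the S3 bucket can then be emptied and deleted from the console or the AWS CLI:

import boto3

sm = boto3.client("sagemaker")
endpoint_name = "your-endpoint-name"  # the name recorded after deployment

# Discover the endpoint config and the model(s) behind the endpoint, then delete all three
config_name = sm.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
config = sm.describe_endpoint_config(EndpointConfigName=config_name)

sm.delete_endpoint(EndpointName=endpoint_name)
sm.delete_endpoint_config(EndpointConfigName=config_name)
for variant in config["ProductionVariants"]:
    sm.delete_model(ModelName=variant["ModelName"])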

Reference

  1. https://github.com/huggingface/diffusers
  2. https://github.com/aws-samples/amazon-sagemaker-image-based-transformers-examples
  3. https://huggingface.co/docs/sagemaker/inference#deploy-a-transformers-model-trained-in-sagemaker
  4. https://huggingface.co/blog/deploy-hugging-face-models-easily-with-amazon-sagemaker
  5. https://github.com/huggingface/notebooks/blob/main/sagemaker/10_deploy_model_from_s3/deploy_transformer_model_from_s3.ipynb
  6. https://github.com/huggingface/notebooks/blob/main/sagemaker/17_custom_inference_script/sagemaker-notebook.ipynb
