Deploy Stable Diffusion on Amazon SageMaker Endpoint
Stable Diffusion just crossed 1,000,000 downloads on the Hugging Face hub!
— Clem Delangue, Co-founder & CEO at Hugging Face
28-Oct-2022
In this post, we introduce a convenient process for deploying the Stable Diffusion model to an Amazon SageMaker endpoint, giving a secure, robust and scalable deployment in a production environment.
All the code is available on GitHub.
Download Stable Diffusion Model Weights
Prerequisite
- Prepare at least 60 GB of disk space. Consider Amazon Elastic File System (EFS): https://aws.amazon.com/efs/. If you use Amazon SageMaker Studio, your home directory is already on EFS; on a SageMaker Notebook instance, make sure the attached storage volume is large enough.
- Before starting, make sure your AWS account has sufficient Amazon SageMaker endpoint quota for the instance type of your choice, e.g. ml.g4dn.xlarge.
- Install git-lfs as described at: https://git-lfs.github.com/
- Create a HuggingFace account at https://huggingface.co/ if you haven’t already.
- Navigate to the HuggingFace model card at https://huggingface.co/runwayml/stable-diffusion-v1-5 and accept the terms and conditions.
- (optional) Navigate to the HuggingFace model card at https://huggingface.co/runwayml/stable-diffusion-inpainting and accept the terms and conditions.
- Create an Amazon S3 bucket.
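A quick way to confirm the 60 GB requirement before downloading is a free-space check. This is a minimal sketch using only Python's standard library; the function name `has_enough_disk` is hypothetical and not part of any AWS tooling.

```python
import shutil

# The prerequisites ask for at least 60 GB of free disk space.
REQUIRED_GB = 60


def has_enough_disk(path: str = ".", required_gb: float = REQUIRED_GB) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


if __name__ == "__main__":
    usage = shutil.disk_usage(".")
    print(f"free: {usage.free / 1024**3:.1f} GiB, enough: {has_enough_disk('.')}")
```

On EFS, which grows elastically, the reported free space is typically very large, so the check mainly matters on fixed-size volumes such as a notebook instance's storage.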
Clone HuggingFace Stable Diffusion Text-to-Image and Image-to-Image Repository
# on Amazon EFS
mkdir diffusion && cd diffusion
git lfs install --skip-smudge
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
cd stable-diffusion-v1-5
git lfs pull
rm -rf .git
rm -rf .gitattributes
cd ..
These commands download the model weights and delete the Git history, which is not needed for deployment, to save disk space and network traffic later.
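If `git lfs pull` fails or is skipped, the weight files remain small text pointers and the endpoint fails later when loading them. A minimal sketch to catch that early, assuming the standard Git LFS pointer format (files that begin with `version https://git-lfs.github.com/spec/v1`); `find_lfs_pointers` is a hypothetical helper:

```python
from pathlib import Path
from typing import List

# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(repo_dir: str, max_pointer_size: int = 1024) -> List[str]:
    """List files under repo_dir that are still LFS pointers instead of real weights."""
    root = Path(repo_dir)
    pointers = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        # skip git metadata, directories, and files too large to be a pointer
        if ".git" in rel.parts or not path.is_file():
            continue
        if path.stat().st_size > max_pointer_size:
            continue
        with path.open("rb") as f:
            if f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                pointers.append(rel.as_posix())
    return sorted(pointers)


if __name__ == "__main__":
    leftover = find_lfs_pointers("stable-diffusion-v1-5")
    print("LFS pointers left:", leftover or "none")
```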
(optional) Clone HuggingFace Stable Diffusion Inpainting Repository
git clone https://huggingface.co/runwayml/stable-diffusion-inpainting
cd stable-diffusion-inpainting
git lfs pull
git lfs install --force
rm -rf .git
rm -rf .gitattributes
cd ..
Deployment on Amazon SageMaker
Boilerplate
pip install boto3
pip install sagemaker
Assume IAM Role
import sagemaker
import boto3
sess = sagemaker.Session()
# sagemaker session bucket -> used for uploading data, models and logs
# sagemaker will automatically create this bucket if it does not exist
sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()
try:
    role = sagemaker.get_execution_role()
except ValueError:
    # outside SageMaker, look up the execution role by name
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]
sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)
print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")
Make sure the IAM role you are assuming has the following policies attached:
- AmazonS3FullAccess
- AmazonSageMakerFullAccess
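To check this from code, the sketch below lists the managed policies attached to the role and reports which of the two are missing. It assumes the role ARN obtained above and permission to call `iam:ListAttachedRolePolicies`; it ignores inline policies and other policies that might grant equivalent permissions. All helper names are hypothetical.

```python
from typing import Iterable, List

REQUIRED_POLICIES = ("AmazonS3FullAccess", "AmazonSageMakerFullAccess")


def role_name_from_arn(role_arn: str) -> str:
    # arn:aws:iam::<account>:role/<optional path>/<role name> -> <role name>
    return role_arn.rsplit("/", 1)[-1]


def missing_policies(attached: Iterable[str], required: Iterable[str] = REQUIRED_POLICIES) -> List[str]:
    attached_set = set(attached)
    return [name for name in required if name not in attached_set]


def attached_policy_names(role_arn: str) -> List[str]:
    import boto3  # imported lazily so the pure helpers above work without AWS credentials

    iam = boto3.client("iam")
    paginator = iam.get_paginator("list_attached_role_policies")
    names: List[str] = []
    for page in paginator.paginate(RoleName=role_name_from_arn(role_arn)):
        names.extend(p["PolicyName"] for p in page["AttachedPolicies"])
    return names


# Usage:
# print(missing_policies(attached_policy_names(role)))
```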
Path to S3
S3_BUCKET = "YOUR_S3_BUCKET_NAME"
MODEL_ID_TEXT2IMAGE = "stable-diffusion-text-to-image"
MODEL_ID_IMAGE2IMAGE = "stable-diffusion-image-to-image"
MODEL_ID_INPAINT = "stable-diffusion-inpainting"
s3_location_t2i = f"s3://{S3_BUCKET}/custom_inference/{MODEL_ID_TEXT2IMAGE}/model.tar.gz"
s3_location_i2i = f"s3://{S3_BUCKET}/custom_inference/{MODEL_ID_IMAGE2IMAGE}/model.tar.gz"
s3_location_inpaint = f"s3://{S3_BUCKET}/custom_inference/{MODEL_ID_INPAINT}/model.tar.gz"
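The three locations share one layout, so they can be generated from the model ID and, when needed, split back into bucket and key (the form boto3's `upload_file` takes). A minimal sketch; `model_s3_uri` and `split_s3_uri` are hypothetical helpers:

```python
from typing import Tuple


def model_s3_uri(bucket: str, model_id: str) -> str:
    # same layout as the s3_location_* variables: s3://<bucket>/custom_inference/<model id>/model.tar.gz
    return f"s3://{bucket}/custom_inference/{model_id}/model.tar.gz"


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key/parts' into ('bucket', 'key/parts')."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs both a bucket and a key: {uri}")
    return bucket, key
```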
Deployment Helper Function
from sagemaker.huggingface.model import HuggingFaceModel
def deploy_huggingface_sagemaker(model_s3_location, role):
    # create Hugging Face Model Class
    huggingface_model = HuggingFaceModel(
        model_data=model_s3_location,  # path to your model & script
        role=role,  # iam role with permissions for Endpoint
        transformers_version="4.17",  # transformers version used
        pytorch_version="1.10",  # pytorch version used
        py_version="py38",  # python version used
    )
    # deploy the model to a real-time endpoint
    predictor = huggingface_model.deploy(
        initial_instance_count=1, instance_type="ml.g4dn.xlarge"
    )
    return predictor.endpoint_name
Text-to-Image Model
In the cloned Stable Diffusion repository directory, create a folder named code/
containing two files:
requirements.txt
numpy==1.23.4
torch==1.12.1
diffusers==0.6.0
transformers==4.23.1
spacy==3.4.2
inference.py
from diffusers import StableDiffusionPipeline
import torch
import base64
import numpy as np
def process_data(data: dict) -> dict:
    # 2 images per request by default (at most 5), at most 50 denoising steps
    return {
        "prompt": [data.pop("prompt", data)] * min(data.pop("number", 2), 5),
        "guidance_scale": data.pop("guidance_scale", 7.5),
        "num_inference_steps": min(data.pop("num_inference_steps", 50), 50),
        "height": 512,
        "width": 512,
    }
def model_fn(model_dir: str):
    # load the pipeline from the unpacked model.tar.gz
    t2i_pipe = StableDiffusionPipeline.from_pretrained(
        model_dir,
    )
    if torch.cuda.is_available():
        t2i_pipe = t2i_pipe.to("cuda")
    # compute attention in slices to reduce GPU memory use
    t2i_pipe.enable_attention_slicing()
    return t2i_pipe
def predict_fn(data: dict, hgf_pipe) -> dict:
    with torch.autocast("cuda"):
        images = hgf_pipe(**process_data(data))["images"]
    # return dictionary, which will be json serializable
    return {
        "images": [
            base64.b64encode(np.array(image).astype(np.uint8)).decode("utf-8")
            for image in images
        ]
    }
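predict_fn returns raw RGB pixels rather than a compressed image format, so responses are large. Assuming SageMaker's real-time invocation payload limit of about 6 MB, the back-of-the-envelope sketch below estimates the response size; it suggests (the post does not say so) why capping `number` at 5 matters at 512x512. The function names are hypothetical.

```python
def base64_length(n_bytes: int) -> int:
    # base64 emits 4 characters per 3 input bytes, padding the last group
    return 4 * ((n_bytes + 2) // 3)


def estimate_response_bytes(number: int, height: int = 512, width: int = 512) -> int:
    raw_image_bytes = height * width * 3  # uint8 RGB pixels from np.array(image)
    # each image becomes a JSON string: payload plus quotes and a ", " separator (approximate)
    per_image = base64_length(raw_image_bytes) + 4
    return number * per_image + len('{"images": []}')


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, f"{estimate_response_bytes(n) / 1e6:.2f} MB")
```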
Folder structure
.
├── code
│   ├── inference.py
│   └── requirements.txt
├── feature_extractor
│   └── preprocessor_config.json
├── model_index.json
├── model.tar.gz
├── README.md
├── safety_checker
│   ├── config.json
│   └── pytorch_model.bin
├── scheduler
│   └── scheduler_config.json
├── text_encoder
│   ├── config.json
│   └── pytorch_model.bin
├── tokenizer
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── unet
│   ├── config.json
│   └── diffusion_pytorch_model.bin
├── v1-5-pruned.ckpt
├── v1-5-pruned-emaonly.ckpt
├── v1-inference.yaml
└── vae
    ├── config.json
    └── diffusion_pytorch_model.bin
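Before compressing, it can help to confirm that the handler files sit in the `code/` folder at the top of the model directory, where the Hugging Face inference container looks for them, and that the diffusers component configs are present. A minimal sketch; `REQUIRED_FILES` is an assumed subset of the tree above, not an official manifest, and `missing_files` is hypothetical:

```python
from pathlib import Path
from typing import Iterable, List

# Assumed minimal subset of the folder structure shown above.
REQUIRED_FILES = (
    "code/inference.py",
    "code/requirements.txt",
    "model_index.json",
    "scheduler/scheduler_config.json",
    "text_encoder/config.json",
    "tokenizer/vocab.json",
    "unet/config.json",
    "vae/config.json",
)


def missing_files(model_dir: str, required: Iterable[str] = REQUIRED_FILES) -> List[str]:
    """Return the required relative paths that are not regular files under model_dir."""
    root = Path(model_dir)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    print(missing_files("stable-diffusion-v1-5") or "layout looks complete")
```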
Now compress the contents of the repository directory (not the directory itself) and upload the archive to S3 via the AWS CLI:
cd stable-diffusion-v1-5
tar zcvf model.tar.gz *
aws s3 cp model.tar.gz YOUR_S3_BUCKET_DIRECTORY
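The same archive can be built from Python with the standard `tarfile` module. This sketch places the directory's contents at the archive root, mirroring `tar zcvf model.tar.gz *`. As an optional assumption not in the original, it skips the `.ckpt` files: `from_pretrained` reads the per-component folders, so the single-file checkpoints (several GB each) are presumably not needed. `build_model_tar` is a hypothetical name.

```python
import tarfile
from pathlib import Path
from typing import Iterable, List


def build_model_tar(model_dir: str, output_path: str, exclude_suffixes: Iterable[str] = (".ckpt",)) -> List[str]:
    """Pack the contents of model_dir (not the folder itself) into a gzipped tarball."""
    root = Path(model_dir).resolve()
    out = Path(output_path).resolve()
    skip = tuple(exclude_suffixes)
    added: List[str] = []
    with tarfile.open(out, "w:gz") as tar:
        for child in sorted(root.iterdir()):
            # skip the archive itself, dotfiles (the shell glob `*` skips them too) and excluded suffixes
            if child == out or child.name.startswith(".") or child.name.endswith(skip):
                continue
            # arcname keeps entries at the archive root, e.g. code/inference.py
            tar.add(child, arcname=child.name)
            added.append(child.name)
    return added
```

Upload the result with `aws s3 cp` as above or with boto3's `upload_file(Filename, Bucket, Key)`. Pass `exclude_suffixes=()` to keep the checkpoints in the archive.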
Then we can deploy the model via:
deploy_huggingface_sagemaker(YOUR_S3_BUCKET_DIRECTORY, role)
You can check the progress in the AWS Console under `SageMaker > Inference > Endpoints`. Record the returned endpoint name; you will use it for inference later.
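`deploy()` waits for the endpoint by default, but if you deploy from one session and invoke from another, you can block on the endpoint status with boto3's `endpoint_in_service` waiter. A minimal sketch; `wait_for_endpoint` is a hypothetical name, the polling values are assumptions, and the client can be injected so the function can be exercised without AWS:

```python
def wait_for_endpoint(endpoint_name: str, sm_client=None, delay: int = 30, max_attempts: int = 60) -> str:
    """Block until the endpoint is InService and return its status."""
    if sm_client is None:
        import boto3  # lazy import so a stub client can be passed in instead

        sm_client = boto3.client("sagemaker")
    waiter = sm_client.get_waiter("endpoint_in_service")
    # raises botocore's WaiterError if the endpoint fails or the attempts run out
    waiter.wait(EndpointName=endpoint_name, WaiterConfig={"Delay": delay, "MaxAttempts": max_attempts})
    return sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
```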
Invoke Endpoint
import json
import boto3
request_body = {
    "prompt": "ancient chinese garden, overlooking full moon, ethereal colors, trending on artstation",
    "number": 3,
    "num_inference_steps": 50,
}
# Serialize data for endpoint
payload = json.dumps(request_body)
client = boto3.client("sagemaker-runtime")
response = client.invoke_endpoint(
    EndpointName="endpoint name returned in the previous step",
    ContentType="application/json",
    Body=payload,
)
res = response["Body"].read()
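The invocation can be wrapped in a small reusable function that serializes the request and parses the JSON response. A sketch assuming the endpoint returns JSON, as the handler above does; `invoke_json` is hypothetical and accepts an injected client so it can be exercised without AWS:

```python
import json


def invoke_json(endpoint_name: str, body: dict, runtime_client=None) -> dict:
    """Send `body` as JSON to a SageMaker endpoint and parse the JSON response."""
    if runtime_client is None:
        import boto3  # lazy import so a stub client can be injected

        runtime_client = boto3.client("sagemaker-runtime")
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(body),
    )
    # Body is a streaming object; read() returns the raw bytes
    return json.loads(response["Body"].read())
```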
Then decode and visualise them:
import base64
import json
import matplotlib.pyplot as plt
import numpy as np
for img_encoded in json.loads(res)["images"]:
    # each image is the raw RGB uint8 pixel buffer, base64-encoded
    pred_decoded_byte = base64.decodebytes(bytes(img_encoded, encoding="utf-8"))
    pred_decoded = np.reshape(
        np.frombuffer(pred_decoded_byte, dtype=np.uint8),
        (512, 512, 3),
    )
    plt.imshow(pred_decoded)
    plt.axis("off")
    plt.show()
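The decoding step can be factored into a helper that returns NumPy arrays instead of plotting them; it also covers the image-to-image and inpainting handlers below, which return `height` and `width` alongside the images. A minimal sketch; `decode_images` is a hypothetical name:

```python
import base64
import json
from typing import List

import numpy as np


def decode_images(res: bytes, default_height: int = 512, default_width: int = 512) -> List[np.ndarray]:
    """Turn the endpoint's JSON response into a list of (height, width, 3) uint8 arrays."""
    payload = json.loads(res)
    # text-to-image responses carry no size (fixed at 512x512); the other handlers include it
    height = payload.get("height", default_height)
    width = payload.get("width", default_width)
    return [
        np.frombuffer(base64.b64decode(encoded), dtype=np.uint8).reshape(height, width, 3)
        for encoded in payload["images"]
    ]
```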
Other Model Capabilities
Image-to-Image Model
Repeat the steps above, replacing the content of inference.py with:
from diffusers import StableDiffusionImg2ImgPipeline
import torch
import base64
import numpy as np
from PIL import Image
height, width = 512, 512
def process_data(data: dict) -> dict:
    global height, width
    height = data.pop("height", 512)
    width = data.pop("width", 512)
    init_image_decoded = np.reshape(
        np.frombuffer(
            base64.decodebytes(bytes(data.pop("init_image"), encoding="utf-8")),
            dtype=np.uint8,
        ),
        (height, width, 3),
    )
    return {
        "prompt": [data.pop("prompt", data)] * min(data.pop("number", 2), 5),
        "init_image": Image.fromarray(init_image_decoded),
        "strength": data.pop("strength", 0.75),
        "guidance_scale": data.pop("guidance_scale", 7.5),
        "num_inference_steps": min(data.pop("num_inference_steps", 50), 50),
        "height": height,
        "width": width,
    }
def model_fn(model_dir: str):
    i2i_pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        model_dir,
    )
    if torch.cuda.is_available():
        i2i_pipe = i2i_pipe.to("cuda")
    i2i_pipe.enable_attention_slicing()
    return i2i_pipe
def predict_fn(data: dict, hgf_pipe) -> dict:
    with torch.autocast("cuda"):
        images = hgf_pipe(**process_data(data))["images"]
    # return dictionary, which will be json serializable
    return {
        "images": [
            base64.b64encode(np.array(image).astype(np.uint8)).decode("utf-8")
            for image in images
        ],
        "height": height,
        "width": width,
    }
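The handler echoes back the request's `height` and `width`, so the client's reshape only works if the generated images have exactly those dimensions. Assuming the pinned diffusers image-to-image preprocessing rounds the init image down to a multiple of 32 (as releases of that period did; verify against your version), sending an image that is already a multiple of 32 avoids a mismatch. A minimal sketch; `snap_size` is a hypothetical helper:

```python
from typing import Tuple


def snap_size(width: int, height: int, multiple: int = 32) -> Tuple[int, int]:
    """Round (width, height) down to the nearest multiple, e.g. (770, 513) -> (768, 512)."""
    if width < multiple or height < multiple:
        raise ValueError(f"image must be at least {multiple}x{multiple} pixels")
    return width - width % multiple, height - height % multiple
```

On the client, `init_image = init_image.resize(snap_size(*init_image.size))` before encoding keeps the request size, the response size and the generated images consistent, since PIL's `size` and `resize` both use `(width, height)` order.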
Invoke it via:
import base64
import json
import boto3
import numpy as np
from PIL import Image
init_image = Image.open("sketch-mountains-input.jpg").convert("RGB")
request_body = {
    "prompt": "A fantasy landscape, trending on artstation",
    "init_image": base64.b64encode(
        np.array(init_image).astype(np.uint8)
    ).decode("utf-8"),
    "height": init_image.size[1],
    "width": init_image.size[0],
    "number": 2,
    "num_inference_steps": 40,
}
# Serialize data for endpoint
payload = json.dumps(request_body)
client = boto3.client("sagemaker-runtime")
response = client.invoke_endpoint(
    EndpointName="endpoint name returned in the previous step",
    ContentType="application/json",
    Body=payload,
)
res = response["Body"].read()
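Because the handler reshapes the decoded bytes using the `height` and `width` from the request, those fields must match the array that was encoded. A sketch that derives all three from a single array so they cannot drift apart; `encode_image_fields` is hypothetical, and `field` would be `"image"` for the inpainting handler:

```python
import base64

import numpy as np


def encode_image_fields(pixels: np.ndarray, field: str = "init_image") -> dict:
    """Build the image entries of a request body from an (H, W, 3) RGB array."""
    arr = np.ascontiguousarray(pixels, dtype=np.uint8)
    if arr.ndim != 3 or arr.shape[2] != 3:
        raise ValueError(f"expected shape (height, width, 3), got {arr.shape}")
    height, width = arr.shape[:2]
    return {
        field: base64.b64encode(arr.tobytes()).decode("utf-8"),
        "height": int(height),
        "width": int(width),
    }
```

Usage: `request_body = {"prompt": "A fantasy landscape, trending on artstation", **encode_image_fields(np.array(init_image))}`.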
Then decode and visualise them:
import base64
import json
import matplotlib.pyplot as plt
import numpy as np
result = json.loads(res)
for img_encoded in result["images"]:
    pred_decoded_byte = base64.decodebytes(bytes(img_encoded, encoding="utf-8"))
    pred_decoded = np.reshape(
        np.frombuffer(pred_decoded_byte, dtype=np.uint8),
        (result["height"], result["width"], 3),
    )
    plt.imshow(pred_decoded)
    plt.axis("off")
    plt.show()
Image In-painting
Repeat the steps above in the `stable-diffusion-inpainting` repository, replacing the content of inference.py with:
from diffusers import StableDiffusionInpaintPipeline
import torch
import base64
import numpy as np
from PIL import Image
height, width = 512, 512
def process_data(data: dict) -> dict:
    global height, width
    height = data.pop("height", 512)
    width = data.pop("width", 512)
    init_image_decoded = np.reshape(
        np.frombuffer(
            base64.decodebytes(bytes(data.pop("image"), encoding="utf-8")),
            dtype=np.uint8,
        ),
        (height, width, 3),
    )
    mask_image_decoded = np.reshape(
        np.frombuffer(
            base64.decodebytes(bytes(data.pop("mask_image"), encoding="utf-8")),
            dtype=np.uint8,
        ),
        (height, width, 3),
    )
    return {
        "prompt": data.pop("prompt", data),
        "image": Image.fromarray(init_image_decoded),
        "mask_image": Image.fromarray(mask_image_decoded),
        "strength": data.pop("strength", 0.75),
        "guidance_scale": data.pop("guidance_scale", 7.5),
        "num_inference_steps": min(data.pop("num_inference_steps", 50), 50),
        "height": height,
        "width": width,
    }
def model_fn(model_dir: str):
    inp_pipe = StableDiffusionInpaintPipeline.from_pretrained(
        model_dir,
    )
    if torch.cuda.is_available():
        inp_pipe = inp_pipe.to("cuda")
    inp_pipe.enable_attention_slicing()
    return inp_pipe
def predict_fn(data: dict, hgf_pipe) -> dict:
    with torch.autocast("cuda"):
        images = hgf_pipe(**process_data(data))["images"]
    # return dictionary, which will be json serializable
    return {
        "images": [
            base64.b64encode(np.array(image).astype(np.uint8)).decode("utf-8")
            for image in images
        ],
        "height": height,
        "width": width,
    }
Invoke the Endpoint:
import base64
import json
import boto3
import numpy as np
from PIL import Image
init_image = Image.open("overture-creations-5sI6fQgYIuo.png").convert("RGB")
mask_image = Image.open("overture-creations-5sI6fQgYIuo_mask.png").convert("RGB")
request_body = {
    "prompt": "a cute cat lying on a park bench",
    "image": base64.b64encode(
        np.array(init_image).astype(np.uint8)).decode("utf-8"),
    "mask_image": base64.b64encode(
        np.array(mask_image).astype(np.uint8)).decode("utf-8"),
    "height": init_image.size[1],
    "width": init_image.size[0],
    "num_inference_steps": 50,
}
# Serialize data for endpoint
payload = json.dumps(request_body)
client = boto3.client("sagemaker-runtime")
response = client.invoke_endpoint(
    EndpointName="endpoint name returned in the previous step",
    ContentType="application/json",
    Body=payload,
)
res = response["Body"].read()
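If you do not have a mask file, one can be built directly. By the diffusers inpainting convention, white pixels are repainted and black pixels are kept; since the handler reshapes the mask to `(height, width, 3)`, this sketch produces a 3-channel uint8 array of the same size as the input image. `rectangle_mask` is a hypothetical helper:

```python
import numpy as np


def rectangle_mask(height: int, width: int, top: int, left: int, box_height: int, box_width: int) -> np.ndarray:
    """RGB mask: white (255) where the model should repaint, black (0) where the image is kept."""
    mask = np.zeros((height, width, 3), dtype=np.uint8)
    mask[top:top + box_height, left:left + box_width] = 255
    return mask
```

Encode it exactly like the mask file above, e.g. `base64.b64encode(rectangle_mask(512, 512, 200, 150, 200, 200)).decode("utf-8")` as the `"mask_image"` value.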
Then decode and visualise them:
import base64
import json
import matplotlib.pyplot as plt
import numpy as np
result = json.loads(res)
for img_encoded in result["images"]:
    pred_decoded_byte = base64.decodebytes(bytes(img_encoded, encoding="utf-8"))
    pred_decoded = np.reshape(
        np.frombuffer(pred_decoded_byte, dtype=np.uint8),
        (result["height"], result["width"], 3),
    )
    plt.imshow(pred_decoded)
    plt.axis("off")
    plt.show()
Clean up
Make sure that you delete the following resources to prevent any additional charges:
- Amazon SageMaker endpoint.
- Amazon SageMaker endpoint configuration.
- Amazon SageMaker model.
- Amazon S3 buckets.
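The first three resources are linked: the endpoint names its endpoint configuration, which in turn names the model(s). A sketch that follows those links with boto3 and deletes them in order, assuming the endpoint name recorded earlier; `delete_endpoint_resources` is hypothetical, and the S3 bucket is deliberately left for you to empty and delete yourself:

```python
def delete_endpoint_resources(endpoint_name: str, sm_client=None) -> dict:
    """Delete an endpoint, its endpoint configuration and the models it references."""
    if sm_client is None:
        import boto3  # lazy import so a stub client can be injected

        sm_client = boto3.client("sagemaker")
    # look up the linked resources before deleting anything
    config_name = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for name in model_names:
        sm_client.delete_model(ModelName=name)
    return {"endpoint": endpoint_name, "endpoint_config": config_name, "models": model_names}
```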
Reference
- https://github.com/huggingface/diffusers
- https://github.com/aws-samples/amazon-sagemaker-image-based-transformers-examples
- https://huggingface.co/docs/sagemaker/inference#deploy-a-transformers-model-trained-in-sagemaker
- https://huggingface.co/blog/deploy-hugging-face-models-easily-with-amazon-sagemaker
- https://github.com/huggingface/notebooks/blob/main/sagemaker/10_deploy_model_from_s3/deploy_transformer_model_from_s3.ipynb
- https://github.com/huggingface/notebooks/blob/main/sagemaker/17_custom_inference_script/sagemaker-notebook.ipynb