AI-Powered Image Editing with Inpainting Using SAM and Stable Diffusion

Seulgie Han
4 min read · Aug 25, 2024


Introduction

In the ever-evolving field of artificial intelligence, image editing has seen remarkable advancements. With the rise of sophisticated tools like the Segment Anything Model (SAM) and Stable Diffusion, AI-driven photo editing has become more accessible and powerful. This blog post delves into a project where these technologies are harnessed to create a web app that can swap out backgrounds in images. By leveraging SAM for segmentation and Stable Diffusion for inpainting, we can seamlessly replace parts of an image with AI-generated content, all through a simple text prompt. This was part of the Generative AI Udacity Nanodegree program.

Project Overview

The goal of this project was to build a web app that allows users to swap the background of a subject in an image with another background generated by Stable Diffusion based on a text description. This involved several key steps:

  1. Loading and Configuring the SAM Model: I used SAM to generate segmentation masks that define the area of the image to be replaced.
  2. Inpainting with Stable Diffusion: With the mask in hand, I applied Stable Diffusion to fill in the masked area with new content generated from a user-provided text prompt.

Step 1: Loading and Configuring the SAM Model

I began by loading a pre-trained SAM model from Facebook/Meta’s repository. This model is designed to segment objects in images, which is crucial for identifying the area we want to replace. Here’s the code snippet for loading the SAM model:

from transformers import SamModel, SamProcessor

# Load the SAM model and its matching processor
model = SamModel.from_pretrained("facebook/sam-vit-base").to("cuda")
processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
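
Note that the snippets in this post assume a CUDA-capable GPU. If you want to follow along on a machine without one, a simple fallback works (you would also need to swap the other hard-coded "cuda" calls below for the same device variable):

import torch
from transformers import SamModel, SamProcessor

# Fall back to CPU when no GPU is available (a portability tweak, not part of the original project code)
device = "cuda" if torch.cuda.is_available() else "cpu"

model = SamModel.from_pretrained("facebook/sam-vit-base").to(device)
processor = SamProcessor.from_pretrained("facebook/sam-vit-base")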

After loading the model, I wrote a function to generate a segmentation mask from the input image. This mask would later be used to isolate the area of the image that needs inpainting.

import numpy as np

def mask_to_rgb(mask):
    # Convert a binary mask to a transparent RGBA overlay (green where mask == 1)
    bg_transparent = np.zeros(mask.shape + (4,), dtype=np.uint8)
    bg_transparent[mask == 1] = [0, 255, 0, 127]  # Green with transparency
    return bg_transparent

def get_processed_inputs(image, input_points):
    # Use the processor to generate the right inputs for SAM
    inputs = processor(image,
                       input_points=input_points,
                       return_tensors="pt").to("cuda")

    # Call SAM
    outputs = model(**inputs)

    # Post-process the outputs of SAM to obtain the masks
    masks = processor.image_processor.post_process_masks(
        outputs.pred_masks.cpu(),
        inputs["original_sizes"].cpu(),
        inputs["reshaped_input_sizes"].cpu()
    )

    # Select the mask with the highest IoU score
    # (other criteria could also be used)
    best_mask = masks[0][0][outputs.iou_scores.argmax()]

    # NOTE: we invert the mask with the ~ operator,
    # which makes it more convenient to infill the background
    return ~best_mask.cpu().numpy()

Step 2: Applying Inpainting with Stable Diffusion

Once the mask was generated, the next step was to use Stable Diffusion for inpainting, the process of filling in the masked area with new content. I used the diffusers library, specifically its AutoPipelineForInpainting class, to handle this.

import torch
from diffusers import AutoPipelineForInpainting

# Load the inpainting pipeline
pipeline = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16).to("cuda")

# Optimize the pipeline for the hardware
pipeline.enable_model_cpu_offload()

This code sets up the inpainting pipeline, loading a pre-trained model that is specifically designed for this task. By using the segmentation mask from SAM, we can direct Stable Diffusion to generate new content only in the areas we want to replace.
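
One detail worth making explicit: because get_processed_inputs inverts the mask, the array it returns marks the background as True, and diffusers inpainting pipelines repaint the white (True) region of the mask image while leaving the black region untouched. Here is a tiny sketch of that conversion, using a toy mask in place of the real one:

import numpy as np
from PIL import Image

# Toy stand-in for the mask returned by get_processed_inputs:
# True marks the background region we want Stable Diffusion to replace
mask = np.zeros((512, 512), dtype=bool)
mask[:, 256:] = True  # pretend the right half of the image is background

# The pipeline repaints the white (True) pixels of this mask image
mask_image = Image.fromarray(mask)
print(mask_image.mode, mask_image.size)  # '1' (binary image), (512, 512)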

Step 3: Executing the Inpainting Process

Finally, I combined the segmentation mask and the inpainting pipeline to swap out the background. I fed an image of a car into the system, generated a segmentation mask for the car, and then replaced the background with AI-generated content.

from PIL import Image

raw_image = Image.open("car.png").convert("RGB").resize((512, 512))

# These are the coordinates of two points on the car
input_points = [[[150, 170], [300, 250]]]

mask = get_processed_inputs(raw_image, input_points)
Image.fromarray(mask_to_rgb(mask)).resize((128, 128))

# Load the AutoPipelineForInpainting pipeline
pipeline = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16).to("cuda")

# This will make it more efficient on our hardware
pipeline.enable_model_cpu_offload()

This creates the masked image and loads the AutoPipelineForInpainting pipeline. Now, let's walk through the inpainting step in detail.

def inpaint(raw_image, input_mask, prompt, negative_prompt=None, seed=74294536, cfgs=7):
    # Convert the boolean mask into a PIL image the pipeline understands
    mask_image = Image.fromarray(input_mask)

    # Fix the random seed so results are reproducible
    rand_gen = torch.manual_seed(seed)

    image = pipeline(
        prompt=prompt,
        negative_prompt=negative_prompt,
        image=raw_image,
        mask_image=mask_image,
        generator=rand_gen,
        guidance_scale=cfgs
    ).images[0]

    return image
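
With everything in place, a call looks roughly like this. The prompt and negative prompt below are only placeholder examples, not the exact text used for the car image:

# Placeholder prompts: swap in whatever background description you like
prompt = "a car on a coastal road at sunset, photorealistic, high detail"
negative_prompt = "low quality, blurry, distorted"

# Replace the background of the car image using the SAM mask from earlier
new_image = inpaint(raw_image, mask, prompt, negative_prompt)
new_image.resize((512, 512))

Because the seed is fixed by default, re-running the same prompt reproduces the same background; pass a different seed to explore alternatives.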

Conclusion

The result is an interactive web app: a user uploads an image, marks the subject with a couple of points, and describes the new background with a text prompt.

This project highlights the power of AI in the field of image editing. By combining SAM for accurate segmentation and Stable Diffusion for creative inpainting, we built a tool that can seamlessly replace backgrounds in images. As AI continues to evolve, we can expect even more impressive capabilities in image editing and beyond. This project is just the beginning of what’s possible when you combine the right models and techniques in creative ways.
