Is it a Volcano, or is it a Seashore? I say it’s a Volcano, and AI says it’s a Seashore. What do you think?
Generate images to fool AI!
A hands-on tutorial on using Stable Diffusion to generate images that can fool ResNet-50! (But won’t fool you!)
Introduction
Deep Neural Networks (DNNs) are vulnerable to various adversarial attacks: an attack can succeed by changing a single pixel [1] or by subtly modifying every pixel [2] in the image. In this context, let me show a novel approach that uses Diffusion Models, a type of Generative AI, to create images that can fool such DNNs [3]. This is a paradigm shift in creating adversarial images, because it requires no manipulation of an existing image.
In this blog post, I walk through how to use this approach with Stable Diffusion to create an adversarial image that fools ResNet-50.
Python Code using PyTorch, Diffusers, and CMA libraries
First, let’s initialize the Stable Diffusion model. I will generate the images with SDXL-Turbo [4] for this tutorial.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Optional: compile the UNet for faster repeated generation
pipe.unet = torch.compile(
    pipe.unet,
    mode="reduce-overhead",
    fullgraph=True,
)
Let’s initialize the random seed, i.e., the latent vector, for Stable Diffusion. This initial seed is the random noise that the diffusion model denoises during sampling.
# Latent shape for a 512x512 image: 4 channels at 64x64 resolution
initial_seed = torch.randn(1, 4, 64, 64, dtype=torch.float16, device="cuda")
Let’s see how the generated image of a “Volcano” looks using this initial seed.
image = pipe(
    ['A ultra realistic photo of a volcano'],
    latents=initial_seed,
    num_inference_steps=1,
    guidance_scale=0.0,
)['images']
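The pipeline returns a list of PIL images, so we can take a quick look at the first (and only) one. A minimal sketch; the filename is just an example:

image[0].save("volcano_initial_seed.png")  # or image[0].show() to open it in a viewer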
Let’s initialize ResNet-50 [5] and see whether it can classify the generated image. Here, we use the standard Torchvision ResNet-50 (the IMAGENET1K_V2 weights), which reaches over 80% top-1 accuracy on the ImageNet-1k dataset.
import torch.nn.functional as F
from torchvision import models

weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()  # resize, center-crop, and ImageNet-normalize the PIL image
batch = torch.stack([preprocess(img) for img in image])
logits = F.softmax(model(batch), dim=1)
prediction = logits.argmax(dim=1)
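To read the prediction as a human-readable label, the Torchvision weights object ships the ImageNet-1k class names in its metadata. A small sketch using the weights object defined above:

categories = weights.meta["categories"]  # the 1,000 ImageNet-1k class names
idx = prediction.item()
print(f"Predicted: {categories[idx]} ({logits[0, idx].item():.2%})")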
We can now set up our CMA-ES optimizer [6] to optimize the initial seed. To keep the optimized vector from drifting too far, I use the L-infinity norm to restrict the optimization to an ε-ball; here, I explicitly use ε = 0.5. This means each value in the optimized seed can vary by ±0.5 from the corresponding value in the initial seed. The authors report that a larger ε generally gives a higher attack success rate but produces poorer image quality.
import cma
import numpy as np

eps = 0.5
x0 = initial_seed.flatten().float().cpu().numpy()  # CMA-ES works on a flat 1-D vector
opts = cma.CMAOptions()
opts.set("bounds", [x0 - eps, x0 + eps])
optimizer = cma.CMAEvolutionStrategy(x0, 0.5, opts)
Let’s optimize the initial seed for 100 generations and find an image that can fool ResNet-50! Since CMA-ES minimizes its fitness value, we use ResNet-50’s confidence in the “Volcano” class as the fitness: driving it down pushes the classifier away from the correct label.
for i in range(100):
    solutions = optimizer.ask()  # sample a population of candidate seed vectors
    seeds = torch.tensor(np.asarray(solutions), dtype=torch.float16, device="cuda")
    seeds = seeds.reshape(-1, 4, 64, 64)  # restore the latent shape expected by the pipeline
    generated_images = pipe(
        ['A ultra realistic photo of a volcano'] * len(solutions),
        latents=seeds,
        num_inference_steps=1,
        guidance_scale=0.0,
    )['images']
    batch = torch.stack([preprocess(img) for img in generated_images])
    with torch.no_grad():
        logits = F.softmax(model(batch), dim=1)
    # Confidence on the "Volcano" class (980 is the ImageNet-1k index for volcano);
    # CMA-ES minimizes the fitness, so it drives this confidence down
    fitness = logits[:, 980].cpu().numpy()
    optimizer.tell(solutions, fitness.tolist())
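Once the loop finishes, the best seed found so far can be pulled out of the optimizer and used to regenerate the adversarial image. A minimal sketch, reusing the pipeline, model, preprocessing, and class names from above:

best_seed = torch.tensor(optimizer.result.xbest, dtype=torch.float16, device="cuda")
best_seed = best_seed.reshape(1, 4, 64, 64)
adv_image = pipe(
    ['A ultra realistic photo of a volcano'],
    latents=best_seed,
    num_inference_steps=1,
    guidance_scale=0.0,
)['images'][0]
adv_logits = F.softmax(model(preprocess(adv_image).unsqueeze(0)), dim=1)
print(f"ResNet-50 now predicts: {categories[adv_logits.argmax(dim=1).item()]}")
adv_image.save("volcano_adversarial.png")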
After optimization, the CMA-ES finds a seed vector to create this image! It looks like a “Volcano,” but somehow, ResNet-50 thinks it is a “Seashore.”
What do you think? Is it a Volcano, or is it a Seashore?
Conclusion
This tutorial used EvoSeed [3], a novel approach for creating photo-realistic images that can fool deep neural networks!
Full Code at https://github.com/shashankkotyan/EvoSeed/
Read Full Article at https://arxiv.org/abs/2402.04699
References
[1] Su, J., Vargas, D. V., & Sakurai, K. (2019). One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation, 23(5), 828–841.
[2] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083.
[3] Kotyan, S., Mao, P., & Vargas, D. V. (2024). Breaking Free: How to Hack Safety Guardrails in Black-Box Diffusion Models! arXiv preprint arXiv:2402.04699.
[4] Sauer, A., Lorenz, D., Blattmann, A., & Rombach, R. (2023). Adversarial diffusion distillation. arXiv preprint arXiv:2311.17042.
[5] Vryniotis, V. (2021). How to train state-of-the-art models using torchvision’s latest primitives. PyTorch Blog.
[6] Hansen, N., Müller, S. D., & Koumoutsakos, P. (2003). Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evolutionary computation, 11(1), 1–18.