Is it a Volcano, or is it a Seashore? I say it’s a Volcano, and AI says it’s a Seashore. What do you think?
Generate images to fool AI!
A hands-on tutorial on using Stable Diffusion to generate images that can fool ResNet-50! (But won’t fool you!)
Introduction
Deep Neural Networks (DNNs) are vulnerable to various adversarial attacks: an attack can succeed by changing a single pixel [1] or by subtly modifying every pixel [2] in the image. In this context, let me show a novel approach that uses Diffusion Models, a type of Generative AI, to create images that can fool such DNNs [3]. This is a paradigm shift in creating adversarial images, because it requires no manipulation of an existing image.
In this blog post, I walk through how to use this approach with Stable Diffusion to create an adversarial image that fools ResNet-50.
Python Code using PyTorch, Diffusers, and CMA libraries
First, let’s initialize the Stable Diffusion model. I will generate the images with SDXL-Turbo [4] for this tutorial.
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Optional: compile the UNet for faster repeated generation
pipe.unet = torch.compile(
    pipe.unet,
    mode="reduce-overhead",
    fullgraph=True,
)
Let’s initialize the random seed, i.e., the latent vector, for Stable Diffusion. This initial seed is the random noise that the diffusion model denoises during sampling.
# Latent shape for a 512x512 image: 4 channels at 64x64 resolution
initial_seed = torch.randn(1, 4, 64, 64, dtype=torch.float16, device="cuda")
Let’s see how the generated image of a “Volcano” looks using this initial seed.
image = pipe(
    ['A ultra realistic photo of a volcano'],
    latents=initial_seed,
    num_inference_steps=1,
    guidance_scale=0.0,
)['images']
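The pipeline returns a list of PIL images, so we can take a quick look at the first (and only) one. A minimal sketch; the filename is just an example:

image[0].save("volcano_initial_seed.png")  # or image[0].show() to open it in a viewer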
Let’s initialize ResNet-50 [5] and see whether it can classify the generated image. Here, we use the standard Torchvision ResNet-50 (the IMAGENET1K_V2 weights), which reaches over 80% top-1 accuracy on the ImageNet-1k dataset.
import torch.nn.functional as F
from torchvision import models

weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()  # resize, center-crop, and ImageNet-normalize the PIL image
batch = torch.stack([preprocess(img) for img in image])
logits = F.softmax(model(batch), dim=1)
prediction = logits.argmax(dim=1)
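To read the prediction as a human-readable label, the Torchvision weights object ships the ImageNet-1k class names in its metadata. A small sketch using the weights object defined above:

categories = weights.meta["categories"]  # the 1,000 ImageNet-1k class names
idx = prediction.item()
print(f"Predicted: {categories[idx]} ({logits[0, idx].item():.2%})")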
We can now set up our CMA-ES optimizer [6] to optimize the initial seed. To keep the optimized vector from drifting too far, I use the L-infinity norm to restrict the optimization to an ε-ball; here, I explicitly use ε = 0.5. This means each value in the optimized seed can vary by ±0.5 from the corresponding value in the initial seed. The authors report that a larger ε generally gives a higher attack success rate but produces poorer image quality.
import cma
import numpy as np

eps = 0.5
x0 = initial_seed.flatten().float().cpu().numpy()  # CMA-ES works on a flat 1-D vector
opts = cma.CMAOptions()
opts.set("bounds", [x0 - eps, x0 + eps])
optimizer = cma.CMAEvolutionStrategy(x0, 0.5, opts)
Let’s optimize the initial seed for 100 generations and find an image that can fool ResNet-50! Since CMA-ES minimizes its fitness value, we use ResNet-50’s confidence in the “Volcano” class as the fitness: driving it down pushes the classifier away from the correct label.
for i in range(100):
    solutions = optimizer.ask()  # sample a population of candidate seed vectors
    seeds = torch.tensor(np.asarray(solutions), dtype=torch.float16, device="cuda")
    seeds = seeds.reshape(-1, 4, 64, 64)  # restore the latent shape expected by the pipeline
    generated_images = pipe(
        ['A ultra realistic photo of a volcano'] * len(solutions),
        latents=seeds,
        num_inference_steps=1,
        guidance_scale=0.0,
    )['images']
    batch = torch.stack([preprocess(img) for img in generated_images])
    with torch.no_grad():
        logits = F.softmax(model(batch), dim=1)
    # Confidence on the "Volcano" class (980 is the ImageNet-1k index for volcano);
    # CMA-ES minimizes the fitness, so it drives this confidence down
    fitness = logits[:, 980].cpu().numpy()
    optimizer.tell(solutions, fitness.tolist())
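Once the loop finishes, the best seed found so far can be pulled out of the optimizer and used to regenerate the adversarial image. A minimal sketch, reusing the pipeline, model, preprocessing, and class names from above:

best_seed = torch.tensor(optimizer.result.xbest, dtype=torch.float16, device="cuda")
best_seed = best_seed.reshape(1, 4, 64, 64)
adv_image = pipe(
    ['A ultra realistic photo of a volcano'],
    latents=best_seed,
    num_inference_steps=1,
    guidance_scale=0.0,
)['images'][0]
adv_logits = F.softmax(model(preprocess(adv_image).unsqueeze(0)), dim=1)
print(f"ResNet-50 now predicts: {categories[adv_logits.argmax(dim=1).item()]}")
adv_image.save("volcano_adversarial.png")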
After optimization, the CMA-ES finds a seed vector to create this image! It looks like a “Volcano,” but somehow, ResNet-50 thinks it is a “Seashore.”
What do you think? Is it a Volcano, or is it a Seashore?
Conclusion
This tutorial used EvoSeed [3], a novel approach for creating photo-realistic images that can fool deep neural networks!
Full Code at https://github.com/shashankkotyan/EvoSeed/
Read Full Article at https://arxiv.org/abs/2402.04699
References
[1] Su, J., Vargas, D. V., & Sakurai, K. (2019). One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation, 23(5), 828–841.
[2] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083.
[3] Kotyan, S., Mao, P., & Vargas, D. V. (2024). Breaking Free: How to Hack Safety Guardrails in Black-Box Diffusion Models! arXiv preprint arXiv:2402.04699.
[4] Sauer, A., Lorenz, D., Blattmann, A., & Rombach, R. (2023). Adversarial diffusion distillation. arXiv preprint arXiv:2311.17042.
[5] Vryniotis, V. (2021). How to train state-of-the-art models using torchvision’s latest primitives. PyTorch Blog.
[6] Hansen, N., Müller, S. D., & Koumoutsakos, P. (2003). Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evolutionary computation, 11(1), 1–18.