AI Generated QR codes with ControlNet, HuggingFace and Google Colab

Sami Maameri
13 min read · Aug 6, 2023

AI generated QR codes are a new concept which I think will become mainstream very soon. Why would a restaurant, coffee shop or retail store want to use a boring old QR Code, when they could use a captivating, branded, AI generated one?

In this article we will explore some of the latest news and work around AI generated QR Codes, and then use an open source model from HuggingFace to generate our own QR codes inside a Google Colab notebook!

Disclaimer: Not all the QR Codes generated from the Colab, including some of the ones shown below, work properly, for reasons we explore a little later in the article. Some of them will work. In any case, the aim of this article is not solely to create fully functioning QR Codes, but also to help get you started along that journey, and to capture your imagination with some of the awe-inspiring things going on in this space.

Contents

The Beginning and Now

Creating our own AI QR Code Generator in Colab

The Future

The Beginning and Now

It all started on Monday, June 5th, 2023, when a Redditor shared a bunch of AI generated QR code images he created, which captivated the community. The post racked up 7.5K upvotes on Reddit, and the images started doing the rounds on social media.

Here are some of those QR Codes. They actually work, and can be scanned using your phone's camera.

They are created using Stable Diffusion and ControlNet. Stable Diffusion is the popular open source text-to-image generator. ControlNet is a neural network that controls a pretrained image diffusion model (e.g. Stable Diffusion). Its function is to allow the input of a conditioning image, which can then be used to manipulate the image generation.

Basically it allows us to control the output image by using another image as the control. In this case, we use the QR Code as the control image, and the text-to-image generation is built around our control image. This allows the creation of some amazingly creative QR Codes.

The original creators of the Reddit post have gone on to create a website where you can create your own AI generated codes. You can also check out their Space on HuggingFace to try generating some of your own QR codes.

Dion Timmer has also generously created a model repository on HuggingFace which allows you to create your own QR Codes, as well as the source code to start interacting with the model. In the Dion Timmer model, you can use not only a text prompt, but also another image, to help guide the generation of your final QR code. The great thing about that is that you might have a logo, or brand sign, you would like to incorporate into your QR image, and this makes that possible, by merging the QR code with the image you provide.

Later in this article, we will create a Google Colab notebook, based on the Dion Timmer model, to start generating our own QR codes!

I also discovered the QR Code Monster website, which has some really amazing QR Code images. They have also open sourced the initial models they used for generating these QR Code monsters on HuggingFace. You can generate your own QR Code monsters on their website too.

They have a nice thread on Twitter about how they were inspired by the original Reddit post and went on to start training their own control nets to generate funky looking QR codes. The source code for the first version of their model is available open source on HuggingFace, which means you can start using it yourself to generate your own spooky looking QR codes.

This is an image they shared from some of their earlier attempts at generating these spine chilling QR codes.

Someone on HuggingFace asked them how to start training their own models and control nets. A very fair question indeed! They suggested starting with the following docs, so do look into that also if the idea sounds interesting.

https://github.com/lllyasviel/ControlNet/blob/main/docs/train.md

Just out of interest, if you also want to find some non-AI generated artistic QR codes, check out these guys: https://art-qrcode.com. They are pretty cool.

Since publishing this post, there have been some neat developments in the field. Hassan El Mghari and Kevin Hou have created an open source web application using Vercel, Next.js and Tailwind that allows you to generate artistic QR Codes. The link to the open source repository is here.

Open Source Web Application for generating QR Codes by Hassan El Mghari and Kevin Hou

In order to generate the QR Codes, it uses AI models provided by Replicate, which let you generate the QR codes simply by making a call to their API. Here is an example of one of the QR Code models on there. Replicate essentially provides a number of different AI models you can access over an API, making it super easy to build applications around these AI models.
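If you want a feel for how that works, Replicate's Python client lets you call a hosted model in a few lines. Below is a minimal sketch; the model identifier and input field names are hypothetical placeholders, so substitute the real ones from the model's page on Replicate.

import replicate  # pip install replicate; requires the REPLICATE_API_TOKEN environment variable

# Hypothetical model id and input fields: copy the real ones from the
# model's page on Replicate
output = replicate.run(
    "some-user/qr-code-model:version-hash",
    input={
        "prompt": "a samurai side profile, realistic",
        "url": "https://example.com",
    },
)
print(output)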

Creating our own AI QR Code Generator in Colab

So, let’s get started creating our own QR codes. Here is the link to the Google Colab I created to start doing just that.

It is essentially the same as the one from the Dion Timmer repository on HuggingFace, except I tweaked a few of the variable names and added some more comments to make it easier to read. Do copy the Colab and give it a few runs yourself. The images it creates are super cool.

Here are the steps to get it running once you have opened the Colab.

First we need to set up the Colab runtime to use a GPU, as a lot of the commands used to generate the images need to run on a GPU. To do that, click the Runtime option in the main menu, and click on Change runtime type.

Once the dialog opens, select the T4 GPU option in the Hardware accelerator section. Then click Save on the dialog.
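Once the runtime restarts, you can sanity check from a code cell that the GPU is actually visible. This is just a quick check, not part of the original Colab:

import torch

# Should print True and the GPU name (e.g. Tesla T4) if the runtime is configured correctly
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))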

Next, we need to set up our QR Code image and base image.

To get a QR Code, you can generate one yourself, for example on qrfy.com, or just Google for one and use whatever you find, or you can use the one I have copied here. Full disclosure, it links to my Twitter profile, to make it easy for you to go there and smash me a follow :)
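If you would rather generate the QR code locally, the Python qrcode library makes it easy. Here is a minimal sketch (assuming pip install "qrcode[pil]"; the URL is just an example, and border=0 drops the white quiet zone, something I come back to below):

import qrcode

qr = qrcode.QRCode(
    # High error correction gives the AI more room to restyle the code
    error_correction=qrcode.constants.ERROR_CORRECT_H,
    # Drop the white quiet zone around the code
    border=0,
)
qr.add_data("https://example.com")  # example URL; point it wherever you like
qr.make(fit=True)
qr.make_image(fill_color="black", back_color="white").save("qr_code_no_bg.png")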

Next, you need a base image, which will serve as a sort of palette to build the image around. I did most of my tests using the samurai image below, but feel free to use whatever image you would like the final QR code to look like. Make sure the image is square though, otherwise the script will generate an error and you won't be able to process the image. So crop the image to a square, if needed.
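If you do need to crop, a quick centre crop with Pillow does the trick. A small sketch, assuming the file is named samurai.png:

from PIL import Image

img = Image.open("samurai.png")
side = min(img.size)  # length of the shorter edge
left = (img.width - side) // 2
top = (img.height - side) // 2
img.crop((left, top, left + side, top + side)).save("samurai.png")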

Now, we need to upload these images to the Colab so that they can be used by our script. To do that, click on the folder icon in the left hand menu, and then click on the upload icon. Then proceed to upload both images. That will put them inside the /content folder. I named my images qr_code_no_bg.png and samurai.png.
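Alternatively, you can upload from a code cell using Colab's files helper, which writes the files to the current working directory (/content by default):

from google.colab import files

# Opens a file picker in the notebook; the chosen files land in /content
uploaded = files.upload()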

Whatever you name the images, make sure to update the lines in the Colab that specify the image paths.

qr_code_image = load_image("/content/qr_code_no_bg.png")
base_image = load_image("/content/samurai.png")

For the QR code image I removed the white border before uploading it, because I preferred the outcome when the AI rendered image was just across the QR code itself, instead of going across the border also.

And that should be it!

Run the first part of the Colab that does all the pip installs, and then run the second part of the code blocks, which runs the actual code. The first run will take longer than usual, because it needs to download all of the model files from the HuggingFace repository.
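For reference, the install cell boils down to something like the following. This is based on the libraries the script imports, so the exact package list in the Colab may differ slightly:

# diffusers for the pipeline, transformers and accelerate for model
# loading, torch for the GPU tensor work
!pip -q install diffusers transformers accelerate torch xformers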

Note that the output images do not always work properly as QR Codes. One of the main downsides of the model I found is that it usually renders the QR code part in white or light colours, which camera readers cannot seem to read. I found that darker, landscape type pictures get the QR code to come out in a darker colour, which works better, for example this one. But unfortunately it does seem to bias towards lighter QR codes. One trick could be to try inverting the image colours once you have generated it, and see if that helps.
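Inverting is a one-liner with Pillow. A quick sketch, assuming the generated image was saved as output.png:

from PIL import Image, ImageOps

# Invert the colours so a light QR pattern becomes a dark one
generated = Image.open("output.png").convert("RGB")
ImageOps.invert(generated).save("output_inverted.png")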

Note that for the prompt part I had “a samurai side profile, realistic, 8K, fantasy”. In other words, my text prompt lined up exactly with the base image I provided. I felt that provided the best results with this model, but you can experiment with changing the prompt and pictures to see what the outcome is like.

The main variables to control and play around with when running the image generation with the Dion Timmer model are guidance_scale, controlnet_conditioning_scale and strength. I played around with varying these from small to large values, and this is how I found they affected the pictures.

guidance_scale: The higher the value, the sharper the final image, including the QR Code and base image. Below is an idea of what it looks like when varying the value between 0 and 100, where the middle image is at the near-optimum value of 20.

varying the guidance_scale between 0–100

controlnet_conditioning_scale: This tweaks how strongly the QR image comes through versus the base image. In the images below you can see the results when varying the value between 0–5. At low values, the QR code does not come through at all. At high values, all you can see is the QR code. This field seems to have the ability to remove the base image nearly completely; with the other fields, the base image always comes through at least a little, even at higher values.

varying the controlnet_conditioning_scale between 0–5

strength: This also controls how strongly the QR code comes through over the base image. At low values the QR code barely comes through. At high values it is much more dominant.

varying the strength between 0.1–3

If you would like to see a more detailed breakdown of the image outcomes step by step as I incremented these values, you can check out this presentation link, where I included around 10 images at different values per field, just to get a better idea of how these changes affect the image across the entire range of values.

The text prompt is important also. For example, if I replace "a samurai side profile, realistic" with "a barbie side profile, realistic", the generated image looks like this. So it is still using the profile of the samurai from the base image, but kind of converting it into a more Barbie type figure.

So I think this image gives a good example of how the base image, QR code, and prompt all work together to bring together the final image.

Let’s explore the Google Colab script a little.

import torch
from diffusers import StableDiffusionControlNetImg2ImgPipeline, ControlNetModel, DDIMScheduler

The StableDiffusionControlNetImg2ImgPipeline, ControlNetModel and DDIMScheduler are all part of the HuggingFace diffusers library. As per the HuggingFace docs:

Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you’re looking for a simple inference solution or want to train your own diffusion model, 🤗 Diffusers is a modular toolbox that supports both.

So the diffusers library is our go-to place for working with diffusion models, and using them to generate things like images and audio.

Here, we are creating a ControlNetModel using the “DionTimmer/controlnet_qrcode-control_v11p_sd21” model from HuggingFace as our source. This will literally cause the script to go to that model repository on HuggingFace and pull all the model files from there, in order to generate our images.

controlnet = ControlNetModel.from_pretrained(
    "DionTimmer/controlnet_qrcode-control_v11p_sd21",
    torch_dtype=torch.float16
)

Then we create our pipeline using our control net model. Pipelines are higher-level wrapper classes we can use to run our inference. As per the HuggingFace docs,

A pipeline is an end-to-end class that provides a quick and easy way to use a diffusion system for inference by bundling independently trained models and schedulers together.

Inference just means running our tasks against an AI model. So in this case it means generating our images with our control net model. For ChatGPT, inference would be the process of asking it questions and getting back its responses.

pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    controlnet=controlnet,
    safety_checker=None,
    torch_dtype=torch.float16
)

This method just does some resizing to get the images to the same size. It will fail if the base image provided is not at least approximately square, so make sure to use square images only for the base image, cropping if needed.

def resize_for_condition_image(input_image: Image, resolution: int):
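    # Body sketched from the sample code in the model repository: scale the
    # shorter side to the target resolution, then round both dimensions to
    # multiples of 64, which the Stable Diffusion UNet expects
    input_image = input_image.convert("RGB")
    W, H = input_image.size
    k = float(resolution) / min(H, W)
    W = int(round(W * k / 64.0)) * 64
    H = int(round(H * k / 64.0)) * 64
    return input_image.resize((W, H), resample=Image.LANCZOS)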

And finally, we have the inference call against the pipeline. This is the pipeline for our control net model, and where we can enter the prompt, as well as control the guidance_scale, controlnet_conditioning_scale and strength parameters that decide the balance of whether our image looks more like our QR Code, or more like our base image and prompt. The balance we are looking for is to get the QR code looking as much like our base image and prompt as possible, while still being able to function as a QR Code.

image = pipe(
    prompt="a samurai side profile, realistic, 8K, fantasy",
    negative_prompt="ugly, disfigured, low quality, blurry, nsfw",
    image=base_image,
    control_image=qr_code_image,
    width=768,
    height=768,
    guidance_scale=20,
    controlnet_conditioning_scale=2.0,
    generator=generator,
    strength=0.85,
    num_inference_steps=150
)
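Note that the call returns a pipeline output object rather than a raw image; the generated PIL image is the first entry in its images list. The generator argument is a seeded torch generator created earlier in the Colab, which keeps runs reproducible. Saving the result looks like this (the file name is just an example):

# The generated PIL image is the first entry in the output's .images list
image.images[0].save("/content/qr_art.png")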

The Future

I am struggling to see a world where all QR codes do not end up looking like this, especially when it comes to more brand and entertainment type uses for things like menus, restaurants, and retail outlets.

It becomes another opportunity to promote your brand, by embedding a logo, image, or slogan inside the QR code. Early adopters will have the advantage of being able to amaze customers with these artistic creations, as it will be the first time many of them have seen such QR codes.

The magic sauce in all of this seems to be how the diffusion models and control nets are fine-tuned. The first ones from the Redditor captured our imagination, and they seem to be biased towards anime-style image outputs. The images generated by the QR Code Monster team are also captivating, and they seem to be training their models towards these fantastic, monster-like creations.

But it does seem that each model creates a certain style of QR code. So there are also opportunities for people to fine-tune these models in certain ways, in order to create unique and differentiated QR codes.

Not all the QR Codes we generated with our model work, but it is still early days. The models, including the open source ones, will improve over time.

And then there will be the building of actual web applications and apps that make these features accessible to the public. Web applications would be more in my domain, so I might even try to get a QR code app up some day. But you now know as much as I do, so beat me to it if you can!

Edit: I did actually end up making a QR Code site :) You can check it out here on qrious.art. Let me know your thoughts!

Hope you enjoyed the read! Please do give me a follow on Medium if you did, and feel free to follow me on Twitter, where I post daily updates on the things I am working on and looking into. Cheers!
