Fun with QR Code generation using Stable Diffusion

Dhruba Pujary
CJ Express Tech (TILDI)
14 min read · Jun 22, 2023
A large single-layer pile of white and black very small smooth rice, ((top view)).

Images with Quick Response (QR) codes embedded can be generated using Stable Diffusion with ControlNet. We will look at how to generate such QR codes (and sample only the good ones). If you are unfamiliar with Stable Diffusion, here is my previous post. Now, let's go over the basics of QR codes and ControlNet before we proceed to the fun part.

Background:

QR code:

QR code was developed in the 1990s as a way to provide more information than a standard barcode. Unlike barcodes, which require a beam of light to bounce off parallel lines, QR codes can be scanned digitally by devices such as mobile phones. A QR code consists of black squares arranged in a square grid on a white background, including some fiducial markers, which can be read by an imaging device such as a camera and processed using Reed–Solomon error correction until the image can be appropriately interpreted. The required data is then extracted from patterns that are present in both the horizontal and vertical components of the image. (Source: Wiki)

QR codes are robust in their ability to sustain "damage": they continue to function when part of the QR code image is obscured, defaced, or removed. There are four error correction levels used for QR codes:

  • Level L — up to 7% damage
  • Level M — up to 15% damage
  • Level Q — up to 25% damage
  • Level H — up to 30% damage

Each level adds a different amount of backup data, depending on how much error correction may be required because of damage. (Source: Detailed blog)
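
If you would rather generate the code locally than through a website, here is a minimal sketch using the qrcode Python package (my choice of library; any generator that supports Level H works):

```python
import qrcode

# Level H tolerates up to ~30% damage, leaving the most room
# for Stable Diffusion to "damage" the code artistically.
qr = qrcode.QRCode(
    error_correction=qrcode.constants.ERROR_CORRECT_H,
    box_size=10,  # pixels per module
    border=4,     # quiet zone, in modules
)
qr.add_data("https://try.this.out")
qr.make(fit=True)
qr.make_image(fill_color="black", back_color="white").save("qr.png")
```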

ControlNet:

ControlNet is an end-to-end neural architecture that controls large image diffusion models, such as Stable Diffusion, by learning task-specific input conditions. It addresses the problem that training such large diffusion models from scratch is infeasible when data availability or computation is limited. Moreover, some tasks, such as depth-to-image or pose-to-human, essentially require interpreting raw inputs into object-level or scene-level understanding, making hand-crafted procedural methods less feasible.

Stable diffusion (left) and ControlNet (right) (Source)

The image above shows the differences between a diffusion block and a ControlNet block. ControlNet clones the weights of a large diffusion model into a "trainable copy" and a "locked copy". The locked copy preserves the network capability learned from billions of images, while the trainable copy is trained on task-specific datasets to learn conditional control. The trainable and locked copies are connected through a special type of convolutional layer called "zero convolution", whose weights progressively grow from zeros to optimized parameters during training.
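
To make the zero convolution idea concrete, here is a highly simplified PyTorch sketch (my own illustration, not the paper's actual implementation): the 1x1 convolutions start at exactly zero, so at the beginning of training the trainable copy contributes nothing and the locked model's behavior is preserved.

```python
import torch.nn as nn

def zero_conv(channels: int) -> nn.Conv2d:
    # 1x1 convolution whose weights and bias start at exactly zero.
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

class ControlledBlock(nn.Module):
    """One ControlNet-style block: locked copy + trainable copy + zero convs."""

    def __init__(self, locked: nn.Module, trainable: nn.Module, channels: int):
        super().__init__()
        self.locked = locked        # frozen; keeps knowledge from billions of images
        self.trainable = trainable  # clone trained on the task-specific dataset
        self.zero_in = zero_conv(channels)
        self.zero_out = zero_conv(channels)
        for p in self.locked.parameters():
            p.requires_grad_(False)

    def forward(self, x, condition):
        # At initialization zero_out emits zeros, so this reduces to self.locked(x).
        control = self.trainable(x + self.zero_in(condition))
        return self.locked(x) + self.zero_out(control)
```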

Setup:

  1. First, we need a QR code image. Use any online webpage to generate the QR code. The only requirement is to generate it with the highest error tolerance, i.e. Level H. Here is one such website for QR generation: Link
  2. Git repos:
    a. We will be using the widely popular Automatic1111 Stable Diffusion WebUI for generation.
    b. In addition, we will be using the ControlNet extension to add ControlNet functionality to the WebUI. Please follow the instructions on their respective pages for installation.
  3. We will also need a few trained models to start. Download the models and place them in: <install path>/stable-diffusion-webui/models/ControlNet
    a. huggingface: control_v11f1e_sd15_tile.pth
    b. huggingface: control_v1p_sd15_brightness (file named as: diffusion_pytorch_model.safetensors)
  4. You may need to restart the UI for the new models to appear.
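
If you prefer to fetch the models from a script, here is a sketch using huggingface_hub (the repo ids match the links above; adjust the destination path to your own install):

```python
import shutil
from pathlib import Path
from huggingface_hub import hf_hub_download

# Adjust to your own <install path>.
dest = Path("stable-diffusion-webui/models/ControlNet")
dest.mkdir(parents=True, exist_ok=True)

tile = hf_hub_download(repo_id="lllyasviel/ControlNet-v1-1",
                       filename="control_v11f1e_sd15_tile.pth")
shutil.copy(tile, dest / "control_v11f1e_sd15_tile.pth")

brightness = hf_hub_download(repo_id="ioclab/control_v1p_sd15_brightness",
                             filename="diffusion_pytorch_model.safetensors")
# Rename so the model is identifiable in the WebUI dropdown.
shutil.copy(brightness, dest / "control_v1p_sd15_brightness.safetensors")
```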

Toy example:

We will be using the txt2img tab for generation. This samples the latent vector from random noise and conditions it on the prompt to generate the final image. The procedure differs from img2img generation, where the structural information in a given image is slowly destroyed into random noise, and the denoising process then reverses from there to generate the image conditioned on the prompt.
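
For intuition, here is a rough sketch of the two modes using the diffusers library, as a stand-in for what the WebUI does internally (the model id and filenames are assumptions on my part):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "runwayml/stable-diffusion-v1-5"

# txt2img: start from pure random noise, conditioned only on the prompt.
txt2img = StableDiffusionPipeline.from_pretrained(model_id).to(device)
image = txt2img("traditional chinese ink painting of flowers").images[0]

# img2img: partially noise an existing image, then denoise it back.
# strength=1.0 destroys the input completely; lower values keep its structure.
img2img = StableDiffusionImg2ImgPipeline.from_pretrained(model_id).to(device)
init = Image.open("qr.png").convert("RGB").resize((512, 512))
image = img2img("flowers and branches", image=init, strength=0.9).images[0]
```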

We will present a toy example to show the workflow:

1. Enter the positive and negative prompts for the desirable output. Try to be as descriptive as possible. Here is a guide for reference: prompt-guide. Check Lexica art for inspiration.

2. Set the sampling method to DPM++ 2M Karras

3. Keep the CFG scale to 7

4. Set the seed to -1

5. Next, expand the ControlNet dropdown to enable two units.
a. If you don't see more than one unit, open the Settings tab, navigate to the ControlNet settings in the sidebar, and increase the multi-ControlNet slider.
b. Apply the changes. Restart the UI if the changes do not take effect.
c. Upload the QR code into both units.

6. For ControlNet unit 0, use the following settings:
a. Preprocessor: inpaint_global_harmonious
b. Model: control_v1p_sd15_brightness[5f6aa6ed] (the hash could be different depending on the updated model)
c. Control weight: 0.35
d. Starting Control Step: 0
e. Ending Control Step: 1
f. Control Mode: Balanced
g. Resize Mode: Crop and resize
h. Rest of the settings as default

7. For ControlNet unit 1, use the following settings:
a. Preprocessor: inpaint_global_harmonious
b. Model: control_v11f1e_sd15_tile[a371b31b] (the hash could be different depending on the updated model)
c. Control weight: 0.5
d. Starting Control Step: 0.4
e. Ending Control Step: 0.7
f. Control Mode: Balanced
g. Resize Mode: Crop and resize
h. Rest of the settings as default

8. Click generate

9. If the image produced is not satisfactory, check the tips section below on how to change the parameters.

10. The generated image might not be a working QR code. In that case, use a photo editing tool such as Photoshop or Photopea to overlay the QR code on top of the generated image. (Details in the post-processing section below.)
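
If you prefer scripting to clicking through the UI, the WebUI exposes an HTTP API when launched with the --api flag. Below is a sketch that mirrors the settings above; the ControlNet field names follow the extension's API at the time of writing, so treat this as a starting point rather than a definitive implementation:

```python
import base64
import requests

URL = "http://127.0.0.1:7860"  # WebUI started with the --api flag

with open("qr.png", "rb") as f:  # the Level H QR code from the setup section
    qr_b64 = base64.b64encode(f.read()).decode()

def unit(model, weight, start, end):
    # One ControlNet unit, mirroring the UI settings in steps 6 and 7.
    return {
        "input_image": qr_b64,
        "module": "inpaint_global_harmonious",
        "model": model,
        "weight": weight,
        "guidance_start": start,
        "guidance_end": end,
        "control_mode": "Balanced",
        "resize_mode": "Crop and Resize",
    }

payload = {
    "prompt": "Cyborg in war, (detailed, hyper-detailed:2), cinematic pose",
    "negative_prompt": "ugly, blurry, lowres, watermark",
    "sampler_name": "DPM++ 2M Karras",
    "cfg_scale": 7,
    "seed": -1,
    "steps": 24,
    "width": 512,
    "height": 512,
    "alwayson_scripts": {"controlnet": {"args": [
        unit("control_v1p_sd15_brightness [5f6aa6ed]", 0.35, 0.0, 1.0),
        unit("control_v11f1e_sd15_tile [a371b31b]", 0.5, 0.4, 0.7),
    ]}},
}

resp = requests.post(f"{URL}/sdapi/v1/txt2img", json=payload).json()
with open("out.png", "wb") as f:
    f.write(base64.b64decode(resp["images"][0]))
```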

Params:

Now that we have “hopefully” generated a decent-looking QR code, let us understand what each of the parameters means and how it changes the generation process.

  1. Prompts: Positive prompts are used by the Stable Diffusion model to condition on and generate images matching the given prompts. When paired with negative prompts, the denoising process steers the diffusion away from the negative prompts. Hence, negative prompts are as important as positive prompts in steering the generation toward desirable outputs.
  2. Sampling methods: A sampling method estimates the noise and removes it at each step of the denoising process to eventually obtain a clear image. There are different methods for the denoising process which mainly differ in speed, accuracy, and reproducibility. Here is a detailed guide for understanding all the available options: Link
  3. Classifier-Free Guidance scale (CFG): This scale determines the importance of the prompts in the generated image. A lower number means the prompts are less important, whereas a higher number means the image should correspond strongly to the prompts.
  4. Seed: For generation/sampling, random numbers are used by default. Setting it to -1 means a new random seed is used on every trial.
  5. ControlUnits: ControlUnits are ControlNet models that influence the generation according to their given parameters, such as control weight, start, and end step, control modes, etc.
    a. Control weight: This determines how much influence the ControlNet module should have in the guided generation. A lower value is less important, and vice versa.
    b. Start and End step: Start and end steps denote the time steps between which the ControlNet model can influence the reverse diffusion process. 0–1 means the ControlNet model keeps sending guidance signals from start to end. A narrower range between 0 and 1 means ControlNet actively engages in the denoising process only during those steps. Note the 0–1 range is a fraction of the total steps and is converted to step indices during execution (see the sketch after this list).
    c. Control Modes: Balanced mode means the ControlNet model and the prompts have almost equal influence on the generated image. The other two modes, as their names suggest, weight one side more heavily.
    d. Resize Mode: For a given output size, the reference image can be adapted by just resizing, by resizing and cropping, or by resizing and filling.
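
For intuition on the start/end steps, here is a hypothetical helper (the WebUI's exact rounding may differ) showing how the 0–1 range maps to concrete step indices:

```python
# Hypothetical helper: map the 0-1 guidance range onto concrete step indices.
def active_steps(total_steps: int, start: float, end: float) -> range:
    return range(round(start * total_steps), round(end * total_steps))

# With 24 sampling steps and starting/ending = (0.4, 0.7),
# the ControlNet unit is active roughly on steps 10..16.
print(list(active_steps(24, 0.4, 0.7)))
```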

Examples and settings:

Here is a list of images generated by other explorers, along with their workflows:
1. Workflow link

2. Workflow link

3. Workflow link

4. Workflow link

My experiments: Try pointing your phone camera at each image; zooming in and out may sometimes help. If the scanner shows try.this.out, the QR code is working.
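
You can also verify candidates programmatically instead of with a phone camera. A quick check using OpenCV's built-in detector (the filename is a placeholder):

```python
import cv2

img = cv2.imread("generated_qr.png")
data, points, _ = cv2.QRCodeDetector().detectAndDecode(img)
print("decoded:", data if data else "<not a working QR code>")
```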

1. Img2img
Positive prompt: (featuring flowers and branches), solid pattern, natural, colorful, face

Negative prompt: ugly, blurry, obfuscated, circular, porcelain, ((broken))

Params: Steps: 100, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 743309854, Size: 512x512, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 1, ControlNet: "preprocessor: tile_resample, model: control_v11f1e_sd15_tile [a371b31b], weight: 1.35, starting/ending: (0.3, 0.88), resize mode: Resize and Fill, pixel perfect: True, control mode: Balanced, preprocessor params: (-1, 1, -1)", Version: v1.3.2

2. Img2img
Positive prompt: Green and yellow tile design for advertisement

Negative prompt: ugly, blurry, obfuscated, pixelated, dirty

Params: Steps: 123, Sampler: DPM++ 2M SDE Karras, CFG scale: 7, Seed: 2517964293, Size: 512x512, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 1, ControlNet 0: "preprocessor: tile_resample, model: control_v11f1e_sd15_tile [a371b31b], weight: 1.2, starting/ending: (0.32, 0.76), resize mode: Resize and Fill, pixel perfect: True, control mode: Balanced, preprocessor params: (-1, 1, -1)", Version: v1.3.2

3. Img2img
Positive prompt: cyborg, ((masterpiece),(best quality),(ultra-detailed), ((full body:1.2)), 1 female, solo, hood up, upper body, mask, 1 girl, female focus, black gloves, cloak, long sleeves

Negative prompt: paintings, sketches, (worst quality:2), (low quality:2), (normal quality:2), lowres, normal quality, ((monochrome)), ((grayscale)), skin spots, acnes, skin blemishes, age spot, glans, nsfw, nipples, (((necklace))), (worst quality, low quality:1.2), watermark, username, signature, text, multiple breasts, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, bad feet, single color, ((((ugly)))), (((duplicate))), ((morbid)), ((mutilated)), (((tranny))), (((trans))), (((trannsexual))), (hermaphrodite), extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), (((disfigured))), (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), (missing legs), (((extra arms))), (((extra legs))), mutated hands,(fused fingers), (too many fingers), (((long neck))), (bad body perspect:1.1)

Params: Steps: 113, Sampler: DPM++ 2M SDE Karras, CFG scale: 11, Seed: 3289247253, Size: 512x512, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 1, ControlNet 0: "preprocessor: depth_midas, model: control_sd15_depth [fef5e48e], weight: 1, starting/ending: (0, 1), resize mode: Crop and Resize, pixel perfect: False, control mode: Balanced, preprocessor params: (512, -1, -1)", ControlNet 1: "preprocessor: tile_resample, model: control_v11f1e_sd15_tile [a371b31b], weight: 1.2, starting/ending: (0.27, 1), resize mode: Crop and Resize, pixel perfect: False, control mode: Balanced, preprocessor params: (-1, 1, -1)", Version: v1.3.2

4. Img2img
Positive prompt: A large single-layer pile of white and black very small smooth rice, ((top view))

Negative prompt: poor quality, ugly, blurry, boring, text, blurry, pixelated, username, worst quality, (((watermark))), ((signature)), face, worst quality, painting, unrealistic, few, countable

Params: Steps: 116, Sampler: DPM++ 2M SDE Karras, CFG scale: 7, Seed: 2019563010, Size: 512x512, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 1, ControlNet 0: "preprocessor: depth_midas, model: control_sd15_depth [fef5e48e], weight: 1, starting/ending: (0, 0.95), resize mode: Resize and Fill, pixel perfect: False, control mode: Balanced, preprocessor params: (512, 1, -1)", ControlNet 1: "preprocessor: tile_resample, model: control_v11f1e_sd15_tile [a371b31b], weight: 1.2, starting/ending: (0.25, 1), resize mode: Crop and Resize, pixel perfect: False, control mode: Balanced, preprocessor params: (512, 1, -1)", Version: v1.3.2

5. Img2img
Positive prompt: ((featuring flowers and branches)), chinese pattern, natural, negative space, traditional chinese ink painting

Negative prompt: ugly, blurry, obfuscated, circular, porcelain

Params: Steps: 85, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 743309854, Size: 512x512, Model hash: a504b5b137, Model: anything-v4.5-vae-swapped, Denoising strength: 1, ControlNet 0: “preprocessor: tile_resample, model: control_v11f1e_sd15_tile [a371b31b], weight: 1.2, starting/ending: (0.35, 0.85), resize mode: Resize and Fill, pixel perfect: True, control mode: Balanced, preprocessor params: (512, 1, -1)”, Lora hashes: “shukezouma_v1_1: 494301de3d6e”, Version: v1.3.2

6. Txt2img
Positive prompt: Cyborg in war, (detailed, hyper-detailed:2 ), (cinematic pose), centered, fire blazing, (ultra-realistic), detail metal armor with gunshot wounds

Negative prompt: paintings, sketches, (worst quality:2), (low quality:2), (normal quality:2), lowres, normal quality, ((monochrome)), ((grayscale)), skin spots, acnes, skin blemishes, age spot, glans, nsfw, nipples, (((necklace))), (worst quality, low quality:1.2), watermark, username, signature, text, multiple breasts, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, bad feet, single color, ((((ugly)))), (((duplicate))), ((morbid)), ((mutilated)), (((tranny))), (((trans))), (((trannsexual))), (hermaphrodite), extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), (((disfigured))), (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), (missing legs), (((extra arms))), (((extra legs))), mutated hands,(fused fingers), (too many fingers), (((long neck))), (bad body perspect:1.1)

Params: Steps: 24, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3014335276, Size: 512x512, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, ControlNet 0: "preprocessor: inpaint_global_harmonious, model: control_v1p_sd15_brightness [5f6aa6ed], weight: 0.35, starting/ending: (0, 1), resize mode: Crop and Resize, pixel perfect: False, control mode: Balanced, preprocessor params: (-1, -1, -1)", ControlNet 1: "preprocessor: inpaint_global_harmonious, model: control_v11f1e_sd15_tile [a371b31b], weight: 0.5, starting/ending: (0.4, 0.7), resize mode: Crop and Resize, pixel perfect: False, control mode: Balanced, preprocessor params: (-1, -1, -1)", Version: v1.3.2

7. txt2img
Positive prompt: female sorcerer in a cloak armored with magical symbols, (intricate details, hyper detailed:1.2 ), cinematic shot, vignette, centered

Negative prompt: paintings, sketches, (worst quality:2), (low quality:2), (normal quality:2), lowres, normal quality, ((monochrome)), ((grayscale)), skin spots, acnes, skin blemishes, age spot, glans, nsfw, nipples, (((necklace))), (worst quality, low quality:1.2), watermark, username, signature, text, multiple breasts, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, bad feet, single color, ((((ugly)))), (((duplicate))), ((morbid)), ((mutilated)), (((tranny))), (((trans))), (((trannsexual))), (hermaphrodite), extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), (((disfigured))), (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), (missing legs), (((extra arms))), (((extra legs))), mutated hands,(fused fingers), (too many fingers), (((long neck))), (bad body perspect:1.1)

Params: Steps: 24, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 2535223341, Face restoration: CodeFormer, Size: 512x512, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, ControlNet 0: "preprocessor: inpaint_global_harmonious, model: control_v1p_sd15_brightness [5f6aa6ed], weight: 0.35, starting/ending: (0, 1), resize mode: Crop and Resize, pixel perfect: False, control mode: Balanced, preprocessor params: (-1, -1, -1)", ControlNet 1: "preprocessor: inpaint_global_harmonious, model: control_v11f1e_sd15_tile [a371b31b], weight: 0.6, starting/ending: (0.35, 0.7), resize mode: Crop and Resize, pixel perfect: False, control mode: Balanced, preprocessor params: (-1, -1, -1)", Version: v1.3.2

Tips and tricks:

Generating aesthetically pleasing images with working QR codes is mostly a trial-and-error game. Not every generated image will look good or be a working QR code. Use the following as a starting point and let your intuition guide you; better parameter settings may well exist.

Txt2img:

  1. Seed:
    a. Use a seed of -1 to generate random images.
    b. Reuse the previous seed by clicking the ♻️ icon when experimenting with other parameters from the same starting point.
  2. Prompts:
    a. Use negative descriptions from other generations (internet) and keep adding more iteratively to avoid specific features during the generation.
    b. To emphasize a particular word or group of words, wrap it in one or more pairs of parentheses, e.g. (good), (((best quality))). (Only on the Automatic1111 web UI.)
    c. Another notation is the word or words followed by a colon and a weight, usually above 1.1, inside parentheses, e.g. (best quality:2). (Automatic1111 web UI.)
  3. Sampling methods:
    a. Use this article for reference: Link, and try more than one.
    b. DPM++ 2M Karras, DPM++ 2M SDE Karras, Euler a seemed to work in my case.
  4. CFG scale:
    a. Any value between 6 to 15 should work.
    b. If more emphasis on the prompts is required, increase the number, and vice versa.
  5. ControlUnits:
    a. The choice of control weight influences how strongly the unit shapes the generation. For a strong influence, any number higher than 1 is usually sufficient.
    b. The control weight can be complemented by the start and end time steps of the control unit’s activation. The first 20–30% of the time steps are crucial for shaping the base features of the image, whereas the last 20–30% of the time steps are for finer details.
    c. Each control unit will have a different influence on the image.
    d. Using preprocessor inpaint_global_harmonious and model control_v1p_sd15_brightness with weight 0.35, applied from start to end as unit 0, appears to be beneficial if nothing else is in use.
    e. Multiple variations of task-specific control can be used, such as depth map, pose, segmentation, line, or tile, each with its own trained model weights. Each of these control types can influence image generation based on the reference image. For example, the image on the left is used as a reference image with the depth control unit, followed by a QR control unit. Note this is an img2img generation, so the initial image is the same as the reference image.
    f. Preprocessor and model choice: Use the corresponding preprocessor from the selection list, and click the preview 💥 icon to see what the output looks like after preprocessing. If the preview does not change, the preprocessor will likely have no effect on the final generation. Try playing around!
Reference image (left) and generated QR image (right) using depth map Control unit.

Img2img: Along with the above suggestions, here are a few things that change for img2img:

  1. Steps: For img2img generation, around 80–120 steps are usually required. (I have not experimented with lower values for other samplers, so feel free to ignore this suggestion.)
  2. Denoising value: Setting the denoising value to 1 completely obliterates the original image. Keeping it between 0.8 and 0.99, or lower, may help produce an image that resembles the original, since denoising then starts from an intermediate latent representation.
  3. If the QR code is used as both the initial and the reference image, a higher control weight is generally needed, i.e. in the 1.1–1.5 range or above.

Post processing: A post-processing step may help to improve QR quality by overlaying the QR code on top of the generated image.

  1. Upload both images to Photoshop (drag and drop) into a single frame, with the generated image first as the background, followed by the QR code.

On the right, you can see the generated image added as background, followed by QR as a layer.

2. Resize the QR code to fit the generated image.

3. Use the opacity slider to control how much of the QR code should be visible. For more options, try the different blending modes in the right-click menu of the QR image layer.

4. After obtaining the desired image, use rasterize followed by merge down in the right-click menu of the QR image layer.

5. Save the image! Done 🎉
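
If you prefer to script this step instead of using Photoshop, here is a minimal sketch of the same overlay using Pillow (the filenames and opacity value are placeholders to tune by eye):

```python
from PIL import Image

# Blend the QR code over the generated image at partial opacity,
# mimicking the Photoshop opacity-layer step above.
art = Image.open("generated.png").convert("RGBA")
qr = Image.open("qr.png").convert("RGBA").resize(art.size)

opacity = 0.3  # how visible the raw QR modules should be
Image.blend(art, qr, opacity).convert("RGB").save("final.png")
```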

References:

  1. https://www.reddit.com/r/StableDiffusion/comments/141hg9x/controlnet_for_qr_code/
  2. https://en.wikipedia.org/wiki/QR_code
  3. https://blog.qrstuff.com/general/qr-code-error-correction
  4. https://arxiv.org/abs/2302.05543
  5. https://github.com/AUTOMATIC1111/stable-diffusion-webui
  6. https://github.com/Mikubill/sd-webui-controlnet
  7. https://huggingface.co/lllyasviel/ControlNet-v1-1/tree/main
  8. https://huggingface.co/ioclab/control_v1p_sd15_brightness/tree/main
  9. https://stable-diffusion-art.com/prompt-guide/
  10. https://lexica.art/
  11. https://civitai.com/articles/376/how-to-create-functioning-qr-codes-using-stable-diffusion-tile-method
  12. https://www.reddit.com/r/StableDiffusion/comments/143u5x6/my_second_attempt_on_qr_code_finally_did_it/
  13. https://www.reddit.com/r/StableDiffusion/comments/1437qvl/a_qr_code_controlnet_txt2image_workflow/
  14. https://stable-diffusion-art.com/qr-code/
