Stable Diffusion Character Creation

Antonio Ciolino
Nov 16, 2022 · 7 min read

--

A workflow for (Automatic1111) character creation

There are a lot of wonderful works that we’ve seen online, as well as examples of how Stable Diffusion can create works of art. And, while DALL-E 2 has its own (admittedly very good and fast) “Outpainting” canvas, not everyone can afford to use it for the few hundred generations these projects can take; Stable Diffusion (coupled with a cloud provider, in my case) is an excellent solution with minimal cost.

Wonderful images, weak effect

What I’ve found challenging to find, though, are examples of building characters. Images that show whole bodies tend to be excellent…except for the face.

Thimble, Aleria’s friend, with a few facial issues

Facial profiles generally look fantastic, but don’t give a very complete feel for the person.

Thimble, above, profile.

From the above two images you can get a feel for the character, but not a full expression. How do they stand? Body type, style, mode of dress? It feels lacking, and with the tools that exist, it seems like it can be better. Let’s see what we can do.

Note that this workflow below is what works for me; you’ll want to modify some settings and parameters as you go through your own.

Aleria, Priestess Healer

Let’s generate Thimble’s friend, Aleria, based on the below backstory.

Our friend Thimble, above, has another member of the party, a Cleric named Aleria who is a priestess healer. She and her party are well known, and as such they’ve been invited to the Court for a celebration.

Below is the image Stable Diffusion generated, stock model version 1.5, for Aleria. (I should have generated a background for her, but did not in this case.)

Stock Aleria Image, base of all work

Values and notes on Aleria’s initial image

Since we know what we want, and we are being specific, the CFG default of 7–7.5 is a little low. I started with 7 for the first image below. (Bump it to 12–15 for later generations.)

Use the Batch Size to generate several renderings of your target character; not all outputs are going to be great. Set it to 8 the first time through. It’s going to stay there for a while, so if you have a slow machine, set it to 4 and hope you get a lucky seed.

Speaking of seeds, once you select the character, save the seed value. It’s typically in the text under the image. You’ll need that for the last step in our process. If you forget, don’t worry, as all PNG files store your prompt, negative prompt, seed and more information inside the PNG file itself; you can read it from the file with a PNG reader or put the image back into Automatic1111 to read it. This seed was 1212835026.
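
If you ever lose the on-screen text, that embedded metadata can be pulled back out with a few lines of standard-library Python. This is a minimal sketch that walks the PNG chunk layout looking for the `tEXt` entry keyed `parameters`, which is where Automatic1111 stores the prompt, negative prompt, and seed; the keyword name is an assumption based on recent builds of the web UI.

```python
import struct
import zlib

def read_png_parameters(data: bytes):
    """Scan raw PNG bytes for the tEXt 'parameters' chunk that
    Automatic1111 embeds (prompt, negative prompt, seed, etc.)."""
    assert data[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG file"
    pos = 8
    while pos + 8 <= len(data):
        # Each chunk: 4-byte length, 4-byte type, payload, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            if keyword == b"parameters":
                return text.decode("latin-1")
        pos += 8 + length + 4  # skip past payload and CRC
    return None
```

The same information is what Automatic1111’s “PNG Info” tab reads when you drop the image back into the UI.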

Select your preferred sampler. I’m using DPM++ 2M Karras, as it allows for great generation in 30 steps. More than that with this sampler is a waste of GPU time.

Finally, I used the following prompt and negative prompt.

  • Prompt: (steel armor), 24 year old caucasian human woman. Kind facial expression, ((Greenish eyes)), porcelain smooth skin, (((wispy))) blonde hair. ((thick eyebrows)), strong brow, dainty nose, lush lips
  • Negative Prompt: helmet, deformed head, deformed feet, ((deformed hands)), out of frame, black and white, game, text, b&w, cartoon, bad art, poorly drawn, blurry, disfigured, deformed, extra limbs, extra heads, noise, out of frame, extra hand, extra fingers
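
If you’re running on a cloud box, the same settings can also be driven headlessly through the web UI’s built-in API (start Automatic1111 with the `--api` flag). This is a sketch, not a definitive recipe: the field names below come from the `/sdapi/v1/txt2img` endpoint in recent builds and may drift between versions, and the prompts are abbreviated.

```python
import json
from urllib import request

# The settings from this section: DPM++ 2M Karras, 30 steps,
# CFG 7 for the first pass, batch size 8.
payload = {
    "prompt": "(steel armor), 24 year old caucasian human woman, ...",
    "negative_prompt": "helmet, deformed head, deformed feet, ...",
    "sampler_name": "DPM++ 2M Karras",
    "steps": 30,
    "cfg_scale": 7,
    "batch_size": 8,
    "seed": -1,   # -1 = random; pin it once you find a keeper
    "width": 512,
    "height": 512,
}

def txt2img(payload, base_url="http://127.0.0.1:7860"):
    """POST the payload to a local Automatic1111 instance (assumed URL)."""
    req = request.Request(
        base_url + "/sdapi/v1/txt2img",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())  # {"images": [<base64 PNGs>], ...}
```

The response images come back base64-encoded, one per batch slot, so a batch of 8 gives you eight candidates to pick through, just as in the UI.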

After a few renders, the sixth result was a good image. (Note that means her true seed value is 1212835026+6)
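
The arithmetic here relies on a convention worth spelling out: within a batch, Automatic1111 typically assigns each image the batch seed plus its position, so any single image’s seed can be recomputed later. A tiny sketch of that rule (the 0-based indexing is my assumption about the convention):

```python
def batch_seeds(batch_seed, batch_size):
    """Per-image seeds for one batch: Automatic1111 typically gives
    image i (0-based) the seed batch_seed + i."""
    return [batch_seed + i for i in range(batch_size)]

# The seeds for Aleria's batch of 8, starting from 1212835026:
seeds = batch_seeds(1212835026, 8)
```

Under this convention, the article’s “+6” offset picks out `seeds[6]`; if your build counts from the first image as +0, double-check which render you actually kept.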

Anyone who wants to choose a different race, please do so! The results should be “similar enough” to do the work; be as specific as possible.

There are a few issues with the framing, but nothing problematic.

Growing the image

As we want more of the whole picture, we need to add the rest of her body. Now, she’s slightly pitched and a little out of frame, but overall that won’t affect the point of this generation.

To make “more” of her appear effectively, we need to switch to the SD 1.5 Inpainting model. This could be done without the Inpainting model, but that model does a much better job of understanding what is being asked; it can take the image and the image mask to create the result. Details of this are an exercise for the reader.

In Automatic1111, once the model is switched, use the buttons under your preferred image to “send to inpaint”, and wait for the image to load.

Inpaint has a huge number of settings, but the most important are the prompt, raising the CFG value to 12–15, and enabling inpainting at full resolution. The padding can take any value, so some experimentation there is still needed. Set Masked Content to ORIGINAL.
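
For reference, these UI knobs have counterparts in the `/sdapi/v1/img2img` JSON body if you script the workflow. The field names below are assumptions drawn from recent Automatic1111 builds; treat them as a sketch to check against your own version.

```python
# Rough mapping of the inpaint settings above onto Automatic1111's
# img2img API fields (names may drift between web UI versions):
inpaint_settings = {
    "cfg_scale": 13,                 # bumped into the 12-15 range
    "inpainting_fill": 1,            # Masked Content: 0=fill, 1=original
    "inpaint_full_res": True,        # inpaint at full resolution
    "inpaint_full_res_padding": 32,  # the padding value to experiment with
    "mask_blur": 4,                  # UI default; tune to taste
    "denoising_strength": 0.75,      # UI default for inpainting
}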

Finally, at the bottom, there is a menu for scripts; choose Outpainting mk2.

You’ll want 512 pixels, with a blur or overlap of about 32, and choose only one of the directions; in this case, BOTTOM.

Add to the text prompt what you want to add to your image. Keep your negative prompt and carry it over!

  • Prompt: Lower portion of woman’s armor, well worn, holding holy symbol of goddess

Keep your batch size at 8 and render a few images. Find the one you like.

Unfortunately, the PNG metadata doesn’t get carried over by Outpainting mk2, so your settings can’t be recovered from subsequent images. Keep that first image!

Aleria, holding a holy sash for Court

Once you’ve found a good image, send it to inpainting again. If you haven’t run out of RAM, and the image isn’t too big (yet), it will render.

Change your prompt to the next section of the picture, and if the batch of 8 doesn’t look good, go ahead and change the seed randomly. In my case, the seed worked well for the next image.

  • Prompt: armored woman standing, waist and upper thighs clad in armor and filigree
  • Prompt: armored woman standing, thighs and legs clad in armor and filigree

Usually, by this point legs get messed up. And that extra hand…we’ll have to get rid of that! But let’s get the rest of the legs first. Two more passes to get legs longer, as well as a little more “courtly”.

And finally, the bane of generation. Feet. Feet never come out easily, and typically take a long time to get the right “fit”. If you are doing feet, expect this to take longer than you want, with many batches of passes.

Note that DALLE-2 does a great job on footwear.

  • Prompt: armored woman standing, filigree armored footwear.

I do realize there’s not a lot of heels in armored suits.

Finally, we do have that extra hand. We can inpaint with the masking tool to “Erase” the hand and replace it.

Deselect Outpainting mk2 from the scripts menu; it’s done its job of extending the canvas downward. Use the editor to simply erase the hand and some area around it, and generate a new set of images. Pick through to find what you like.

  • Prompt: fabric covering

Clean up the head

Remember that grey? Well, now it’s come back to haunt us. And the inpainting trick above doesn’t work. I’m not sure if it’s because the mask color is special, or if there’s some other trickiness to the hair, but in general, setting Masked Content to ORIGINAL never seemed to result in a change.

So, for the grey I had to switch masked content to FILL instead of ORIGINAL. (I also had a mask blur of zero, but I’m not sure that mattered.)
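
In API terms, the ORIGINAL-to-FILL switch is a one-integer change. The option codes below are my assumption about how recent Automatic1111 builds encode the Masked Content radio buttons; verify against your version before relying on them.

```python
# Masked Content options as the img2img API appears to encode them
# (assumed from recent Automatic1111 builds):
MASKED_CONTENT = {"fill": 0, "original": 1, "latent noise": 2, "latent nothing": 3}

# The settings that finally fixed the grey in the hair:
hair_fix = {
    "inpainting_fill": MASKED_CONTENT["fill"],  # FILL instead of ORIGINAL
    "mask_blur": 0,                             # zero blur, possibly irrelevant
}
```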

So I learned this: fix the initial image early if you can, because by now, the images are larger, slower, and typically need to be dragged into the inpaint window to work.

  • Prompt: woman in armor, standing, Outdoors in lush tree area.

Final pass!

Once the image is built up, do a final pass on it to even out any lines, soften tone, and give it a good once over.

Using inpainting, set your Denoise to 0.3, put in the original prompt from the first step, and generate a few images. You’ll notice very subtle shifts in each, so pick the one that preserves what’s important to you. In my case, the face and boots were important.
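
Scripted, the cleanup pass is just the same img2img call with the denoising strength turned down. A small sketch of that idea, with `denoising_strength` as the assumed API field name and a hypothetical helper:

```python
def final_pass(payload, denoise=0.3):
    """Return a copy of an img2img payload tuned for the cleanup pass:
    low denoising preserves the composition while evening out seams
    and softening tone."""
    return {**payload, "denoising_strength": denoise, "seed": -1}

# Reuse the original first-step prompt with the higher CFG:
cleanup = final_pass({
    "prompt": "(steel armor), 24 year old caucasian human woman, ...",
    "cfg_scale": 13,
})
```

At 0.3 the model only nudges the image, which is why each render in the batch differs so subtly from the last.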

Closing

This same technique can be used to add above the head, such as hats or skylines. Left and Right would only be useful in the initial image generation; later uses of it cause strange results and should be avoided.

Good luck to you on your adventures in Stable Diffusion!

--


Antonio Ciolino

Technology Developer. Fantasy Gamer. Experience Creator.