Writing prompts for Stable Diffusion: tools and optimization

Daria Wind · Published in PHYGITAL · 7 min read · Jun 13, 2023

Neural networks and text2image models are becoming common tools for work and art, and many specialists want to try them in action. But one obstacle stands in the way of widespread adoption: how do you tell a neural network what you want to get? In this article we will cover the most accessible ways to write a text prompt for Stable Diffusion and touch on how you can quickly get wow-results.

We understand that MidJourney and other text2image tools can produce better results much faster, but we focus on Stable Diffusion because this neural network has a great open-source community, which means a huge number of extensions, models and other resources. We think that although prompt engineering is a skill to master, writing prompts can be a simple process.

A prompt is a short text, usually 200–300 characters long, structured in a way the neural network can understand. Its length is often measured in tokens; roughly speaking, one token is a word or a short phrase.

A prompt always contains a subject (what exactly we want to see in the image), what that subject is doing and / or its environment, and other keywords. In more advanced prompts you will sometimes also see the type of the desired result at the very beginning: a picture, painting, photograph, sketch or 3D render, for instance.
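To make this structure concrete, here is a minimal sketch using the open-source diffusers library. The model id, the CUDA GPU assumption and the keywords are all illustrative, not the only valid choices:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a base Stable Diffusion checkpoint (the model id is an example;
# any SD 1.x checkpoint works the same way). Assumes a CUDA GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The prompt follows the structure described above:
# result type, then subject, then action / environment, then extra keywords.
prompt = (
    "digital painting, "                      # type of result
    "portrait of an old lighthouse keeper, "  # subject
    "standing on a cliff at sunset, "         # action / environment
    "intricate details, soft lighting"        # extra keywords
)

image = pipe(prompt).images[0]
image.save("result.png")
```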

One important thing to remember is that the closer a word is to the beginning of the prompt, the more influence it has. In our Prompt Guide we experimented with keyword placement and punctuation, and also showed how the most popular keywords affect the result. You can view it for free here.

While writing prompts, we recommend describing what you want to see in as much detail as possible. For example, if you simply write ‘an elf’, the neural network will have a lot of room for ‘imagination’ and will likely produce a low-quality result or one that doesn’t match your wishes at all.

That’s why, for generating images with people, we recommend always describing the context, clothes or pose, especially when it comes to a specific character that has many interpretations across cultures.

That’s where you need many keywords to guide the neural network in the right direction. Basically, ask yourself: how would I describe what I want to see? Is it a portrait? A concept sheet? A sketch? A photorealistic concept?
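For example, here is how the vague ‘elf’ prompt from above could be fleshed out (the added keywords are one possible direction, not a canonical recipe):

```
vague:    an elf
detailed: fantasy concept art, portrait of a young elf ranger with
          silver hair, wearing leather armor, forest background,
          detailed face, soft light
```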

Ways of writing prompts

To find the right keywords you can choose one of three common ways:

  • write prompts yourself, finding keywords from references and using trial and error,
  • use prompt builders,
  • use prompt optimization tools.

If you have chosen the first option, you may find the following resources very useful: the Stable Diffusion Prompt Book, Stable Diffusion Modifier Studies, SD Artists Collection, SD Artists Style Studies, SD Artist Studies and the recently released Stable Diffusion Cheat-Sheet.

These resources list many modifiers you can try in your prompts to find the options that best suit your ideas.

We also highly recommend lexica.art, a website that works like a Google search for Stable Diffusion. Type in any word (anything from a character or an object to a style) and you will get a collection of generations with detailed information about each image, including the prompt, seed settings and resolution. Think of it as a Pinterest for AI.

We also made a small collection of prompts for characters, locations and objects, which you can use for inspiration.

However, if you don’t have much time or don’t want to spend it trying out different keywords, you can use prompt builders. A prompt builder is usually a website where you assemble a prompt from blocks that represent keywords. The most popular one is Promptomania.

Seeing each keyword next to its visualization lets you build a prompt much faster and easier and reuse it in further generations. Promptomania’s builder is also available for other text2image models: MidJourney and DALL-E 2.

Another way of writing prompts is to use prompt optimization tools. Here we want to emphasize how easy it is to get a good prompt with minimal effort.

Generations based on one word

The first tool we want to discuss is Prompt Extend. It extends a short prompt by appending the most relevant keywords to it.

This option is perfect for those who are just starting to work with text2image models: you get the most suitable modifiers tailored to your short prompt, without having to study numerous reference pages.
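Prompt Extend is also available as a model checkpoint, so you can call it from Python with the transformers library. A sketch, assuming the daspartho/prompt-extend checkpoint on the Hugging Face Hub is the one behind the tool (check the tool’s page for the exact model):

```python
from transformers import pipeline

# Assumption: Prompt Extend's underlying model is published on the
# Hugging Face Hub as 'daspartho/prompt-extend'.
extender = pipeline("text-generation", model="daspartho/prompt-extend")

short_prompt = "a castle on a hill"
extended = extender(short_prompt, max_length=60)[0]["generated_text"]
print(extended)
# Typically appends style modifiers, e.g.
# "a castle on a hill, highly detailed, digital painting, artstation, ..."
```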

A similar idea is used when working with ChatGPT as a Stable Diffusion prompt helper, but it’s not as straightforward. To get ChatGPT to generate prompts, you first have to set the right context: tell it how prompts are built, describe what a prompt means and what you want as a result, and give examples of prompts that work well. Only then can it generate prompts. For this exact reason we can’t recommend ChatGPT to beginners: it requires certain knowledge and skill, so we suggest it as a tool for advanced users.
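As a sketch, here is the kind of context you might set through the OpenAI Python library (pre-1.0 API, current at the time of writing); the system message is our own illustration, not Phygital+’s exact setup:

```python
import openai  # assumes openai<1.0 and an API key in OPENAI_API_KEY

# One possible context message teaching the model the prompt structure.
system = (
    "You write prompts for Stable Diffusion. A prompt is a comma-separated "
    "list of keywords: first the type of image, then the subject, then the "
    "environment and style. Example: 'digital painting, portrait of a "
    "knight, castle courtyard, intricate armor, dramatic lighting'."
)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "A prompt for a cozy winter cabin"},
    ],
)
print(response.choices[0].message.content)
```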

Nevertheless, in Phygital+ we made it possible for everyone to use ChatGPT: in our interface, all you need to do is write a brief prompt (as you would in Prompt Extend), press the Magic Wand and wait. It will suggest a relevant prompt right in the SD text field (watch the video for a more in-depth tutorial).

Another development of ours is Artistic Mode. Hidden from the user, it improves the written prompt, resulting in better images. It has four modes: general, portrait, full-body character and landscapes.

With custom models (Stable Diffusion checkpoints fine-tuned on a particular style) you can also get good results much quicker. They produce decent-looking results even with short prompts, especially models such as Lyriel, DreamShaper or RevAnimated. In Phygital+ we have more than 80 styles available.
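If you run Stable Diffusion yourself, loading a custom checkpoint looks the same as loading the base model. A sketch, assuming DreamShaper is mirrored on the Hugging Face Hub as Lykon/DreamShaper (check the model card for the exact id):

```python
import torch
from diffusers import StableDiffusionPipeline

# Assumption: 'Lykon/DreamShaper' is the Hub id of the DreamShaper
# checkpoint; any style-tuned SD 1.x checkpoint loads the same way.
pipe = StableDiffusionPipeline.from_pretrained(
    "Lykon/DreamShaper", torch_dtype=torch.float16
).to("cuda")

# A short prompt is often enough with a style-tuned model.
image = pipe("portrait of a sorceress, candlelight").images[0]
image.save("dreamshaper.png")
```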

It’s also important to remember the negative prompt: a short text that tells the AI what it should NOT generate in the final image. This field usually lists common generation artifacts such as deformed hands, a second person in the frame, noise, bad quality, etc.

By using a negative prompt you can improve your generations and achieve good-looking results with fewer artifacts. Lengthy negative prompts don’t always guarantee a great image, so we recommend picking several of the most commonly used keywords: grumpy, ugly, cropped, blurry, noisy, oversaturated, deformed, extra fingers, extra legs, extra limbs, out of frame, cut off, weird, bad proportions, low quality, low resolution, text, watermark, signature.
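In the diffusers API this is a single extra parameter; a minimal sketch (the model id and keywords are examples):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

negative = (
    "deformed, extra fingers, extra limbs, out of frame, cut off, "
    "bad proportions, low quality, blurry, watermark, text, signature"
)

# negative_prompt steers the sampler away from these concepts.
image = pipe(
    "digital painting, portrait of an elf ranger, forest background",
    negative_prompt=negative,
).images[0]
image.save("with_negative.png")
```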

Another tool for working with prompts and searching for ideas is image-to-text prompting. It’s a great and useful tool if you have a visual reference and want to generate a similar image in Stable Diffusion.
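As one way to do this locally, here is a sketch using BLIP captioning via transformers. We use it as a simple stand-in; dedicated tools (such as CLIP Interrogator) also add style keywords on top of the plain caption:

```python
from transformers import pipeline

# Assumption: BLIP captioning as a stand-in for an image-to-text prompt
# tool; 'reference.jpg' is a placeholder for your own reference image.
captioner = pipeline(
    "image-to-text", model="Salesforce/blip-image-captioning-base"
)

caption = captioner("reference.jpg")[0]["generated_text"]
print(caption)  # use the caption as the starting point of your prompt
```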

We have collected all the tools and resources described here in one collection in our AI Library.

If you don’t know where to start with writing prompts, we recommend beginning with the most popular tools: Lexica, ChatGPT or Prompt Extend, all of which you can try in Phygital+ for free.
