Fooocus — The Easy(ier) Desktop Open-Source Image Generator
A Step-By-Step Installation Guide and Overview
In this article:
- Why use more than one image generation tool?
- Benefits of open-source models.
- Why consider Fooocus?
- Step-by-Step installation.
- Comparing images generated by Fooocus, DALL-E 3, and Midjourney.
- Advanced features.
- Applying styles, with sample images for select styles.
- Editing/Altering images and using images as prompts, with inpainting and outpainting.
By now, most people are familiar with at least one image-generating diffusion model. But one really isn’t enough. Just as with LLMs, it is always good to have more than one at your disposal, simply because results differ. The more you use one model, the more you realize that it has certain tendencies, and its images often share a similar look. Generally, this can be overcome by varying your prompts (not always asking the same way) and adjusting the model’s settings, but sometimes it’s just easier to switch to a different diffusion model for a contrasting result.
Open-Source (Free)
An open-source model is a great additional image generator to have in your back pocket. It is free to use. Once installed, it doesn’t require an internet connection. And nothing leaves your computer, so there are no privacy concerns.
Hocus Pocus by Fooocus
(Extra points if you got that 1970s musical reference.) Arguably, the three main players in the text-to-image generation arena are Stable Diffusion via Automatic 1111 (pronounced: automatic eleven eleven), Midjourney, and, more recently, DALL-E 3. Each has its strengths and weaknesses.
Midjourney is expensive (at $100/yr.) and can be difficult to use, given its Discord interface, unique prompt formatting, and limited control over iterations and consistency. It does, however, produce great images if you are skilled in its use, but the steep learning curve is undeniable.
DALL-E 3 (the newcomer) is relatively simple to use: you can prompt in natural language, it has better luck producing text, and it is easier to iterate on and update a generated image. The cost is included with a ChatGPT Plus account or a Microsoft account (through Copilot). The one drawback is that image quality can be lacking, and images often have a similar look.
Automatic 1111 (Stable Diffusion) is arguably the most capable of the text-to-image generative tools available. It is free. It can be run on your desktop. And it provides the most control over images and iterations. The downside? A substantial learning curve. If you really want to master Automatic 1111 and all its capabilities and features… it is quite a lot of work. And with such an active community, the effort is never finished. It is a commitment of time and energy that most of us simply don’t have.
Enter Fooocus
Fooocus is a free, open-source, locally run, text-to-image generative tool. It works much like Automatic 1111 but without the complexity and steep learning curve. Although it doesn’t have all the bells and whistles of its big brother, it produces great images and provides quite a bit of control. It is also VERY fast.
Fooocus has been able to shed complexity by fine-tuning its defaults toward realistic images. There are three models to choose from: a general-purpose model, an anime-tuned model, and a model tuned specifically for realistic images.
Step-By-Step Installation
You can download Fooocus from its GitHub page here: https://github.com/lllyasviel/Fooocus#download
- The download file size is a manageable 1.96 GB, but the extracted folder will be 5.45 GB, and if you opt to install both the general-purpose model and the realistic model (which I would recommend), the total space required is 24 GB. So be sure your computer has the space to accommodate it.
- Select a folder where you would like to download and install the program. You can create a “Fooocus” folder in a location of your choice and download there.
- Highlight the downloaded zipped folder (#1) and select “Extract all” (#2). Just extract it into the same folder as the zipped folder (see image below).
A new extracted folder will appear (#1). You can now delete the zipped folder (#2) (see image below).
- Double-click “run.bat”. The first time you open any of the three models’ .bat files (“run.bat”, “run_anime.bat”, or “run_realistic.bat”), that model will download and install. This can take a while, since the model files are several gigabytes. Once a model is installed, subsequent launches are nearly instantaneous.
- Before a model installs, Windows will warn you that the app is unrecognized. This happens just once, when you install a model.
Disclaimer: I scanned the files for viruses and have been using Fooocus for months without incident. BUT… use your own judgement. I am not a security expert. So, if you proceed, you do so at your own risk!
- To proceed, click the “More info” link in the upper left corner of the window (#1). This reveals a “Run anyway” button (#2). Clicking it starts the installation process in a Windows command prompt (#3).
Once the installation is complete, a local browser tab will open. This is your Fooocus interface. Launching Fooocus afterward works exactly the same way (double-clicking the .bat file of the model you want to work with), but it skips the installation process, and Fooocus opens almost instantly.
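If you like to double-check things before and after installing, here is a small Python sketch using only the standard library. It checks whether the drive you picked has the roughly 24 GB mentioned above, and whether anything is answering at the local address the Fooocus browser tab points to. The address 127.0.0.1:7865 is an assumption on my part; the command-prompt window shows the actual URL Fooocus is using, so adjust the port if yours differs.

```python
import shutil
import socket

# Figure from the installation step: extracted app plus the general-purpose
# and realistic models comes to about 24 GB (this will change between releases).
REQUIRED_GB = 24


def has_enough_space(folder: str, required_gb: float = REQUIRED_GB) -> bool:
    """True if the drive holding `folder` has at least `required_gb` free.

    Treats 1 GB as 1024**3 bytes, the way Windows Explorer reports sizes.
    """
    return shutil.disk_usage(folder).free >= required_gb * 1024**3


def is_ui_running(host: str = "127.0.0.1", port: int = 7865, timeout: float = 1.0) -> bool:
    """True if something accepts TCP connections at host:port.

    ASSUMPTION: 127.0.0.1:7865 is Fooocus' usual local address; the
    command-prompt window prints the real URL, so pass that port if it differs.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused or timed out: nothing is listening there.
        return False


if __name__ == "__main__":
    print("Enough free space here:", has_enough_space("."))
    print("Fooocus UI answering:", is_ui_running())
```

Run it from the folder you chose for Fooocus. Note that the second check only tells you that *something* is listening on that port, not that it is Fooocus.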
Comparison Images:
Here are some sample images from Fooocus (the general model = “run.bat”), Midjourney, and DALL-E 3 (via ChatGPT Plus). In each instance, the exact same prompt was used. I didn’t attempt to manipulate the images in any way.
To use Fooocus without any of its features (I’ll get into features later), simply enter the prompt in the box at the bottom where it says “Type prompt here.” (#1) and hit the “Generate” button to the right (#2).
Prompt #1:
beautiful landscape with a lake in the foreground, with a dock, and mountains in the background, in the early morning with sunrise and mist on the lake.
Note: All three do a pretty good job. A close look at the DALL-E 3 image shows that it leans toward an illustrative style, but overall the results are comparable.
Prompt #2:
an office conference room with a number of white-collar workers sitting around a conference table with a manager talking in front of a whiteboard and leading the group in some learning.
Note: Here we see a general weakness of diffusion models when it comes to multiple people. Generally speaking, hands, feet and faces are problematic if you look closely. Midjourney did the best. Fooocus had some issues on close inspection. Some of these might be rectified with Fooocus’ negative prompt feature, but I’m comparing apples to apples here and doing nothing to improve any of the images. DALL-E 3 had problems not only with the hands but also with the feet of the closest man, who has his back to us. Additionally, its image is much less realistic.
Prompt #3:
A studio photographic image of a female executive.
Note: Here Fooocus and Midjourney predictably did a great job, with very realistic images. DALL-E 3’s image was quite a bit less realistic, which in my experience tends to be the case with DALL-E 3.
Prompt #4:
A female elf in the woods, with blue braided hair and a broad smile.
Note: Here is where I think you can see Fooocus shine. An elf is a mythical being, so the model is not replicating a human but a very specific variation on a human-like creature. The Fooocus image is quite representative of what we might expect if elves existed. DALL-E 3 exaggerates to an extreme degree, and Midjourney simply created a woman who appears more Viking than elf, with no pointy ears and only a hint of blue in her hair.
Fooocus Performs
As you can see from these simple examples… Fooocus is a very viable tool to have in your toolkit. It is easy to use, free, and does a great job.
Advanced Features
Fooocus also has quite a few features which let you manipulate images. I will go over a few here and point out the links to additional documentation, if you wish to take a deeper dive.
- Checking the box beside “Advanced” at the bottom of the window, under the prompt section (#1), opens a panel on the right side of the window.
- Here you see several sections. At the top are four tabs: “Settings”, “Style”, “Model”, and “Advanced” (#2). For this article, I’ll only touch on the first two, but if you go to the Advanced tab, you will see a Document link that takes you to Fooocus’ detailed explanations of its features.
- The next section down is the Performance area (#3), where you can select “Speed”, “Quality”, or “Extreme Speed”. “Speed” is the default setting, and it is the one I used for the images above, but generally I prefer quality over speed. Fooocus is fast anyway… so I’m happy to wait an extra second or two for a better image.
- Next is the “Aspect Ratio” section (#4). Here you can elect to have your image rendered in various shapes of portrait, square or landscape/banner.
- The “Image Number” section (#5) allows you to render multiple images in a single generation. Fooocus defaults to two images; however, you can opt for more samples if you like. Just remember that the more images rendered at one time, the longer the generation process takes.
- Finally, there is the “Negative Prompt” area (#6). Here, you can list things you DON’T want to see in the image: for instance, too many people, or fingers and hands that aren’t rendered properly. In the provided field you can add terms like “extra fingers”, “deformed hands”, “poor quality”, etc. Remember, you are listing things you don’t want to see. Generally, I leave this area blank unless the model keeps rendering something problematic.
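To get a feel for how the Performance and Image Number settings trade off, here is a back-of-the-envelope Python sketch. The step counts per preset (30 for Speed, 60 for Quality, 8 for Extreme Speed) reflect my understanding of Fooocus 2.1 and should be treated as assumptions, and the seconds-per-step figure is a made-up placeholder that depends entirely on your GPU.

```python
# ASSUMPTION: sampling steps per Performance preset, as I understand
# Fooocus 2.1; check the Fooocus documentation for your version.
STEPS = {"Speed": 30, "Quality": 60, "Extreme Speed": 8}


def estimate_seconds(performance: str, image_number: int, seconds_per_step: float) -> float:
    """Rough generation time: steps per image x number of images x time per step.

    seconds_per_step is a placeholder you would measure on your own GPU.
    Model loading and other overhead are ignored.
    """
    if image_number < 1:
        raise ValueError("image_number must be at least 1")
    return STEPS[performance] * image_number * seconds_per_step


for preset in STEPS:
    # 0.1 s/step is a made-up figure, purely for illustration.
    print(f"{preset}: ~{estimate_seconds(preset, 2, 0.1):.1f} s for 2 images")
```

The point is the shape rather than the numbers: under these assumptions, Quality doubles the sampling work compared to Speed, and every extra image adds the same amount again.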
Next is the “Style” tab at the top of the right-hand side of the window (#1 in the image below). Here is a list of styles that can be applied to the images you create. There are an eye-watering 214 styles to choose from, and you can mix and match them to create any number of combined styles.
Styles
What happens when you select a style? Cleverly, Fooocus adds some prompt language behind the scenes to help bring out the associated style. You just add your normal text-to-image prompt and then select a style which you would like applied. This is optional but a great way to get what you want from the rendered image.
- By default, Fooocus selects three of its own styles (#2). This is fine for normal image generation.
- If you wish to try out other styles, I suggest unchecking all boxes except the style you are interested in. You can then add more styles to tweak the images further. Starting with too many styles selected can produce unexpected results.
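To make the “behind the scenes” part concrete, here is a simplified Python sketch of how a style template can wrap your prompt. It assumes each style is a template with a `{prompt}` placeholder plus its own negative prompt; the two styles below are hypothetical examples I made up, and this illustrates the idea rather than reproducing Fooocus’ actual code.

```python
# HYPOTHETICAL style definitions: a template with a {prompt} placeholder
# plus a style-specific negative prompt. Names and wording are made up.
STYLES = {
    "Watercolor": {
        "prompt": "watercolor painting of {prompt}, soft washes, paper texture",
        "negative_prompt": "photo, photorealistic",
    },
    "Cinematic": {
        "prompt": "cinematic still of {prompt}, shallow depth of field, film grain",
        "negative_prompt": "cartoon, illustration",
    },
}


def apply_styles(prompt: str, negative: str, selected: list[str]) -> tuple[str, str]:
    """Wrap the user's prompt in each selected style and merge the negatives."""
    # Each style rewrites the prompt through its own template.
    positives = [STYLES[name]["prompt"].replace("{prompt}", prompt) for name in selected]
    if not positives:
        positives = [prompt]  # no style selected: use the prompt as typed
    # The user's negative prompt (if any) comes first, then each style's negatives.
    negatives = [negative] if negative else []
    negatives += [STYLES[name]["negative_prompt"] for name in selected]
    return ", ".join(positives), ", ".join(negatives)


positive, negative = apply_styles(
    "a female elf in the woods, with blue braided hair and a broad smile",
    "extra fingers",
    ["Watercolor"],
)
print(positive)
print(negative)
```

It also hints at why stacking many styles gets unpredictable: every selected style adds its own wording around your prompt, and those additions can pull the image in different directions.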
The following images were created by using the same elf image prompt I used previously but adding a style from the style list. That is the only change I’ve made. You can see the results are quite different based on the selected style.
As you can see… the possibilities are limitless! And for a free local image diffusion model… it can’t be beat for simplicity of use.
Editing/Altering Images and Using Images as Prompts
Finally, let’s delve into how you can easily tweak images.
- First, check the box beside “Input Image” in the lower left of the Fooocus window (#1). This opens a section at the bottom of the window that lets you alter images or use images to influence new ones. There are three tabs: “Upscale or Variation”, “Image Prompt”, and “Inpaint or Outpaint” (#2).
- With the “Upscale or Variation” tab selected, this new section has a place where you can drop or upload an image (#3); to the right, you can choose to vary the image’s appearance (#5) or upscale it (#6). There is also a link to detailed documentation.
These options are pretty self-explanatory, so I won’t go into detail. In short, you can drag an image here and opt to change its appearance subtly or strongly, or you can upscale an image that is too small for your needs.
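If you are upscaling to hit a specific size (for a print or a banner, say), a little arithmetic tells you which option to pick. This Python sketch assumes the upscale options multiply both sides by 1.5 or 2, going by their labels as I understand them (“Upscale (1.5x)”, “Upscale (2x)”); the 1152 × 896 input is just an example size.

```python
from typing import Optional

# ASSUMPTION: the upscale options scale both sides by these factors.
UPSCALE_FACTORS = (1.5, 2.0)


def upscaled_size(width: int, height: int, factor: float) -> tuple[int, int]:
    """Output size after upscaling, rounded to whole pixels."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return round(width * factor), round(height * factor)


def pick_upscale(width: int, height: int, target_long_side: int,
                 factors: tuple[float, ...] = UPSCALE_FACTORS) -> Optional[float]:
    """Smallest available factor whose long side reaches the target, else None."""
    for factor in sorted(factors):
        if max(upscaled_size(width, height, factor)) >= target_long_side:
            return factor
    return None  # even the largest option falls short of the target


print(upscaled_size(1152, 896, 1.5))
print(pick_upscale(1152, 896, 1600))
```

For example, an 1152 × 896 image needs 1.5x to reach 1600 pixels on its long side (1728 × 1344), but 2x to reach 2000.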
The next tab over is “Image Prompt” (#1). This tab lets you use an image in place of a text prompt. Checking the “Advanced” box at the very bottom (#2) opens additional fields for image merging, such as FaceSwap, and for adjusting the relative influence of the provided images. This is beyond the scope of this article; to find out more, select the documentation link at the bottom of the Fooocus window.
Inpaint and Outpaint
Here are two super useful tools available in Fooocus. Inpainting allows you to make changes to a generated image (for example, changing the type of necklace in an image, or removing the necklace entirely). Outpainting allows you to expand a generated image; some image manipulation services call this “Zoom Out” or “Uncrop”.
I’ll provide samples of both inpainting and outpainting below.
Outpaint
Below is an example of vertical outpainting.
- First, select the “Inpaint or Outpaint” tab (#1).
- Next, drag an image into the image field below the tab (#2). Here, I have dragged the image of the earlier created elf into the field.
- You can now select how you would like to expand the image. In this instance, I have selected to expand the image of the elf downward, to include more of the body (#3).
- Note that no text prompt is required in the prompt field (#4).
- Now, to generate your new expanded samples, select the “Generate” button (#5).
- You can see the resulting two expanded samples (#6). Notice how the new images show more of the arms, while keeping the rest of the image unchanged.
Now let’s look at the results of a horizontal outpainting generation. In this instance, I selected “Left” and “Right” under “Outpaint Direction” at the bottom (#1) to expand the image horizontally with more background, turning it into a landscape/banner image.
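Conceptually, outpainting enlarges the canvas in the directions you choose and asks the model to fill only the new strip, keeping the original pixels. The Python sketch below computes that layout. The 30% expansion per direction is a hypothetical value for illustration, not necessarily what Fooocus uses, and the direction names simply mirror the “Outpaint Direction” options.

```python
from dataclasses import dataclass

VALID_DIRECTIONS = {"Left", "Right", "Top", "Bottom"}


@dataclass(frozen=True)
class Box:
    """Where the original image sits on the enlarged canvas (right/bottom exclusive)."""
    left: int
    top: int
    right: int
    bottom: int


def outpaint_layout(width: int, height: int, directions: list[str],
                    expand_percent: int = 30) -> tuple[int, int, Box]:
    """Return (new_width, new_height, original_box) for an outpaint.

    expand_percent is a HYPOTHETICAL amount added per chosen direction,
    as a percentage of the original width (Left/Right) or height (Top/Bottom).
    """
    unknown = set(directions) - VALID_DIRECTIONS
    if unknown:
        raise ValueError(f"unknown direction(s): {sorted(unknown)}")
    # Integer arithmetic avoids floating-point rounding surprises.
    pad_x = width * expand_percent // 100
    pad_y = height * expand_percent // 100
    left = pad_x if "Left" in directions else 0
    right = pad_x if "Right" in directions else 0
    top = pad_y if "Top" in directions else 0
    bottom = pad_y if "Bottom" in directions else 0
    box = Box(left, top, left + width, top + height)
    return width + left + right, height + top + bottom, box


def needs_generation(x: int, y: int, box: Box) -> bool:
    """True for canvas pixels outside the original image, i.e. the area to fill."""
    return not (box.left <= x < box.right and box.top <= y < box.bottom)


print(outpaint_layout(1000, 800, ["Bottom"]))         # vertical, like the first example
print(outpaint_layout(1000, 800, ["Left", "Right"]))  # horizontal, like the banner
```

With these assumptions, a 1000 × 800 image outpainted downward becomes a 1000 × 1040 canvas, and only the bottom 240-pixel strip is generated.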
Inpaint
Finally, I’ll briefly touch on inpainting. Inpainting is the process of changing some feature of a generated image. Here, once again, I have used the image of the elf, which I generated earlier.
Removing features…
- Using the same elf image, I applied the masking tool (labeled #1 below, you can see the circular brush and the icon for increasing or decreasing its size).
- I’ve masked the elf’s necklace with the intention of removing it. Notice that I did not need to be exact in the masking process (#2).
- There are icons for “undoing” (#3) and “erasing” (#4) the mask I created.
- Select the “Generate” button after you have finished masking, and see the result in the upper section. Fooocus has skillfully removed the necklace. NOTICE: To remove the necklace, I did not put anything in the text prompt field. Leaving it empty tells Fooocus that I want the masked item removed rather than changed.
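Under the hood, the circular brush produces a mask: a grid marking which pixels the model should regenerate and which it should leave alone. Here is a minimal Python sketch of a circular-brush mask built from a few brush dabs (center and radius), assuming a simple true/false mask; Fooocus’ real mask format may differ, and the dab coordinates in the usage line are hypothetical.

```python
def circle_mask(width: int, height: int,
                dabs: list[tuple[int, int, int]]) -> list[list[bool]]:
    """Build a True/False mask from circular brush dabs.

    Each dab is (center_x, center_y, radius) in pixels. True means
    "regenerate this pixel"; False means "keep the original pixel".
    Dabs that run past the image edge are clipped.
    """
    mask = [[False] * width for _ in range(height)]
    for cx, cy, r in dabs:
        # Only scan the bounding square of the circle, clipped to the image.
        for y in range(max(0, cy - r), min(height, cy + r + 1)):
            for x in range(max(0, cx - r), min(width, cx + r + 1)):
                if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                    mask[y][x] = True  # overlapping dabs simply merge
    return mask


def masked_fraction(mask: list[list[bool]]) -> float:
    """Share of the image that will be regenerated."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


# Two rough, overlapping dabs over a (hypothetical) necklace position.
necklace = circle_mask(512, 512, [(240, 330, 20), (270, 335, 20)])
print(f"{masked_fraction(necklace):.1%} of the image will be regenerated")
```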
Changing features…
To change a feature, simply mask the area you want to change (like above) and then describe the change in the text prompt area (#1). In the below image, I have requested that the necklace be changed from the metal necklace to a necklace made of roses (#2).
You can even change expressions…
As you can see… for a free open-source image generating tool, Fooocus has a lot to offer. It has become my go-to image generation tool because of its ease of use, exceptional image quality, speed, and flexibility.
Give Fooocus a try. I think you’ll be pleased with the results…
I hope you have enjoyed this article and found it useful. If you made it down to the bottom here, please like and follow. It helps get me motivated to do more articles like these. Thanks!