I Used Stable Diffusion to Create an Instagram Post

Sep 25, 2022

#FoundersDiary #MarketingTech

I have been exploring how text-to-image models like Stable Diffusion could help in the creation of marketing and advertising visual assets. So, I thought of giving it a go by manually trying to re-create an Instagram post with the help of Stable Diffusion. I chose the following marketing asset for a friend’s startup — Just Dabao — which helps prevent food waste by offering fresh excess food from restaurants at high discounts:

Instagram post that I used as inspiration

APPROACH TO THE EXPERIMENT

I broke down my creation approach into 4 different steps:

1. Generating the background (with a color scheme that works with the company logo)
2. Generating the small food plates
3. Generating the graphic of the earth celebrating
4. Combining assets in an editing tool and adding text

I used DreamStudio to generate the images. It is a tool that lets you use Stable Diffusion and offers 200 free image generations. I generated 2 images per prompt. I also got the brand guide from my friend, so I had access to their exact colors.

Below, I share unfiltered prompts and the corresponding 2 images that I got for each prompt.

STEP#1: Generate the Background

Prompt#1: blank visual with a background color that works with this color scheme #F5AF2B and #007749 (brand colors from the logo of the company).

Prompt#2: Absolutely plain background in faded yellow color

Prompt#3: Uniform, plain, no design background in very light yellow color

Observation: With this prompt, I finally got a simple image that I could use as a background. Not ideal in terms of color, but it will do for now. Funnily, for the same prompt, it also generated the image on the right. I have little idea why this happened.
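A side note on the background: a flat color does not strictly need a model. The sketch below is not what I did in this experiment; it only shows how a very light tint of the brand yellow from Prompt#1 could be computed and turned into a plain image using nothing but the Python standard library. The function names, the 0.8 tint amount, and the choice of the PPM file format are assumptions made for illustration.

```python
def hex_to_rgb(code):
    """Convert a '#RRGGBB' string to an (r, g, b) tuple of ints in 0..255."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def tint(rgb, amount):
    """Blend a color toward white: amount=0 keeps it, amount=1 gives white."""
    return tuple(round(c + (255 - c) * amount) for c in rgb)


def plain_ppm_bytes(rgb, size):
    """Build a size x size single-color image as binary PPM (P6) data."""
    header = f"P6 {size} {size} 255\n".encode("ascii")
    return header + bytes(rgb) * (size * size)


brand_yellow = hex_to_rgb("#F5AF2B")   # brand color quoted in Prompt#1
background = tint(brand_yellow, 0.8)   # 0.8 is an arbitrary "very light" choice
data = plain_ppm_bytes(background, 64) # small size here; use e.g. 1080 for a post
```

Writing `data` to a file opened in binary mode gives an image that most editors can open or convert. The trade-off is obvious: this is predictable but offers none of the happy accidents that a generated background can.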

OKAY, so now we have a workable background. Let’s move on to step 2 — generating the images of the small food plates.

STEP#2: Generate the Small Food Plates

Prompt#4: image of an appetizing dessert kept on a white plate viewed from the top

Observation: the 2nd one looks reasonable and usable.

Prompt#5: image of a plate with barbecue chicken, rice and condiments, viewed from the top

Observation: I noticed a couple of times that it almost always cuts off a part of the plate in the generated image. So, next time I gave a prompt to show the “entire plate.”

Prompt#6: a circular image of an entire plate with barbecue chicken, rice and salad, viewed from the top

Observation: The 2nd one seems good and usable.

Prompt#7: a circular image of a fancy and delicate yellow cake viewed from the top

Observation: The 1st one seems good and usable.

Prompt#8: image of a bowl filled with almond bites viewed from the top

Observation: I made a mistake here. I asked for almond bites when what I really wanted was almond butter energy bites. Being specific is critical, but it is a challenge when you don’t know exactly what you want. I had to search Google to learn that what I was looking for is called energy bites.

Prompt#9: entire image of a bowl filled with almond butter energy bites viewed from the top

Observation: Much better now. Both look good.

Prompt#10: entire image of a plate filled with strudels wrapped in a bamboo stick isometric view from the top

Observation: I was looking for something totally different, but this looks like sophisticated and exotic food, so it works for me.
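Looking back at this step, several prompts reused the same framing words: a shape ("circular"), a request for the "entire" dish, and "viewed from the top." A minimal sketch of how that pattern could be captured in a small helper follows. The function `build_prompt` and its parameters are hypothetical names of my own, not part of DreamStudio or Stable Diffusion, and I typed all prompts by hand in the experiment.

```python
def build_prompt(subject, container="plate", shape=None, whole=True,
                 view="viewed from the top"):
    """Assemble a food-image prompt from reusable framing phrases."""
    parts = []
    # Optional shape hint, e.g. "circular", placed at the start of the prompt.
    parts.append(f"a {shape} image of" if shape else "image of")
    # Asking for the "entire" dish was my attempt to stop the plate being cut off.
    parts.append("an entire" if whole else "a")
    parts.append(f"{container} with {subject},")
    parts.append(view)
    return " ".join(parts)


# Reproduces the wording of Prompt#6.
prompt_6 = build_prompt("barbecue chicken, rice and salad", shape="circular")

# A variation for a different dish and container.
bites = build_prompt("almond butter energy bites", container="bowl")
```

A helper like this keeps the framing consistent across dishes, so only the subject changes from one prompt to the next. It does not guarantee consistent images, since the same prompt can still produce very different results.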

STEP#3: Generate the Graphic of the Earth Celebrating

This one was pretty challenging.

Prompt#11: graphical 2d lively animated image of earth

Observation: Need to make it more cartoony

Prompt#12: graphical 2d lively animated cartoon image of earth

Prompt#13: graphical 2d cartoon image of earth with 2 hands and a birthday cap on top of it

Prompt#14: graphical 2d cartoon image of a character in the shape of earth; it has 2 hands and a birthday cap on top

Prompt#15: a cartoon image of earth showing that it is happy and celebrating

Observation: No excitement from the earth, but it’s okay.

Prompt#16: happy earth cartoon celebrating birthday

After 6 attempts, I gave up on the idea of recreating the visual in the original post and decided to pick the one that could convey the same sentiment (rather than the same image). I kind of liked the 2nd image in Prompt#14, so I decided to use that for the final version.

STEP#4: Combine Assets in an Editing Tool and Add Text

Output#1: Using all the raw assets

Observation: The square shape of the images is a deal breaker for making anything pretty, so I decided to add another step: “background removal.”

Output#2: Using assets with backgrounds removed

I used www.remove.bg to remove the background of the assets and it looked much better.
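For the round plates viewed from the top, a simpler geometric trick could also work in principle: keep only the pixels inside a circle and make everything else transparent. The sketch below is an assumption-laden illustration, not what remove.bg does and not what I used. It assumes the plate is centered and roughly fills the square image, and `circular_alpha_mask` is a hypothetical helper name.

```python
def circular_alpha_mask(size, margin=0):
    """Return a size x size grid of alpha values.

    Pixels inside a centered circle get 255 (opaque); the rest get 0
    (transparent). `margin` shrinks the circle radius by that many pixels.
    """
    center = (size - 1) / 2      # center of the pixel grid
    radius = size / 2 - margin
    mask = []
    for y in range(size):
        row = []
        for x in range(size):
            inside = (x - center) ** 2 + (y - center) ** 2 <= radius ** 2
            row.append(255 if inside else 0)
        mask.append(row)
    return mask


mask = circular_alpha_mask(512)  # 512 is an assumed image size for this example
```

In an image editor or imaging library, a grid like this would be applied as the alpha channel of the square asset. It would not help with irregular shapes such as the earth character, where a proper background removal tool is still needed.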

Side By Side Comparison

Not a terrible side-by-side comparison, right?

KEY TAKEAWAYS

A. Innovate rather than recreate

In this exercise, I was trying to re-create an existing asset. That asset is a template, which the company’s marketer re-uses for 2 out of the 3 weekly posts. However, the true power of generative AI models will show when marketers have to generate unique assets regularly. Use these models to innovate and to scale asset creation rather than to replace the current “templatized world.”

B. Unpredictability is both a feature and a bug

With generative AI, it’s hard to know in advance what the model will produce. This can help with creativity and inspiration, and surface ideas that marketers may not have thought about. On the other hand, it can be frustrating to get to something that you already have a clear image of in your head. For example, the first step of this experiment was frustrating because the model wouldn’t give me a simple, plain background.

C. Asset editing and positioning

If the marketing asset comprises multiple image concepts (e.g., food plates, the earth, etc.), generating each image is just step 1. Editing and positioning all of the pieces appropriately is another challenge. In an ideal world, the AI tool would take the marketing goal, the brand guide, and the campaign’s main messages and, based on these, auto-generate an asset that has everything figured out.

The shape of the assets matters a lot. Currently, the models just spit out squares or rectangles, so there is a need for image editing and background removal.