An AI now generates 100% of our social media images. Here’s a use case.

Yegor Walowski
5 min read · Sep 2, 2022

--

As a startup founder, I keep up with the most interesting developments in many areas, including deep tech. A couple of months back I was browsing micro-documentaries on YouTube. I am a huge fan of the genre and watch videos like this all the time. Surprisingly, simple short pieces of content can lead you to interesting technical topics you can explore later.

One of the videos that surprised me was a Vox story about AI art. Watching it, I kept wanting to shout, "Wait, it is actually happening." The second I finished the video I said to myself, "We should try this."

Quick note: I am a co-founder and CEO of an early-stage startup called Fintellect. We help IT agencies make sense of their finances using the power of open banking and no-code templates for financial routines. Since our brand maintains an "intelligent supercomputer" tone of voice, it felt fitting to embed AI-generated content in our social media communication.

We’re a small team of six, and I keep the approach lean. That means hiring a social media designer, even a part-time one, would not fit our financial projections. So getting an AI to do this job wasn’t only a matter of hype; it was also a resource optimization issue.

If you want a good text-to-image AI to generate your social media visuals, you basically have two products to pick from.

Looking for a way to start, I applied on both websites. Midjourney approved my request much earlier, so I went with it.

The app itself lives in Discord, which is a great way to test an MVP. It goes like this: you get your invite, sign in to Discord, read the intro, and navigate to the channels where people are generating images. Seeing how others play around with their wording helps you get better at prompt engineering.
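For reference, you generate images through the Midjourney bot's /imagine slash command: you type it in one of the public channels followed by your prompt, and the bot replies with a 2×2 grid of four candidate images, each with buttons underneath for variations (V1–V4) and upscales (U1–U4). One of my early test inputs looked like this:

```
/imagine prompt: Navajo death metal album cover
```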

Midjourney Discord

After the first 100+ test inputs with crazy mashups like "Ukiyo-e SpongeBob drawing" or "Navajo death metal album cover", I was trained enough to test the AI on our business needs.

We will soon be looking for a QA engineer, so I wanted to use that for the test imagery. QA engineers are very important for early-stage products: their work determines whether the product ships bug-free and how successful the public launch will be.

Playing around with text-to-image technology, I developed my own flow for getting relevant images (sketched in code right after this list):

  • Select wording
    By looking at the initial output, it is usually easy to tell whether the style of the image is right or whether additional filters should be added to the text input.
  • Select output composition
    When the prompt is abstract enough (for instance, when you don't specify the alignment of elements with words like "in front", "next to", "standing", "floating"), the AI can get quite creative with the composition and the elements it visualizes. The task is to pick the proper composition among the outputs.
  • Select relevant imagery
    Once you have a rough vision of the end result, the task is to select the right variation and upscale it to the maximum.
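To make the flow concrete, here is a minimal sketch in Python of how I think of a prompt as composable parts. It is purely illustrative: the function and parameter names are mine, and only the /imagine prefix is actual Midjourney syntax.

```python
# Minimal sketch: a prompt as three composable levers, matching the flow above.
# build_prompt and its parameters are illustrative names, not official syntax;
# only the "/imagine prompt:" prefix is the actual Discord command.

def build_prompt(subject: str, composition: str, style_filters: list[str]) -> str:
    """Join the subject, composition hints, and style filters into one command."""
    parts = [subject, composition, *style_filters]
    return "/imagine prompt: " + ", ".join(p for p in parts if p)

print(build_prompt(
    subject="a poster with a computer developer testing software on a retro-futuristic computer",
    composition="in a modern minimalistic office interior",
    style_filters=["extreme details", "cinematic", "studio lighting", "poster art"],
))
# -> /imagine prompt: a poster with a computer developer testing software on a
#    retro-futuristic computer, in a modern minimalistic office interior,
#    extreme details, cinematic, studio lighting, poster art
```

Tweaking one lever at a time (wording, then composition, then the final pick) is what keeps the iteration loop fast.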

To generate the image for the QA hire, I put this phrase into the engine:

A poster with a computer developer testing the software on the retro-futuristic computer in modern minimalistic office interior, extreme details, cinematic, studio lighting, poster art

Initial output results

The output variations seemed fine in terms of style, so I hit V4 to get more composition variations of the fourth image.

Variations of one of the variants

The composition seemed fine as well. Then I needed to pick a favorite image to upscale and use for the final visual, so I ran the operation several times before getting the best end outputs.

The end images, upscaled

I still needed to place the images into our Figma template with the text to see which one looked better for posting. And so I did.

Social media visual variations

Since we have a blurred space at the bottom of our template, I decided that #2 would be the best option to go with.

Summary: this test task was accomplished in 10 minutes. I spent $30 on the unlimited image-generation package.

Conclusion

Text-to-image AI does work for business tasks in their simpler forms. However, the complete replacement of people in creative disciplines is not happening yet. You still need somebody with a strong vocabulary and a sense of visual aesthetics to play around with the tools for a good end result.

Existing solutions are revolutionary, but they do not write the prompts for you, nor do they invent visual styles from scratch. You can apply filters like Disney cartoons, Cyberpunk, Salvador Dalí, etc., and the mashups can be great, but you are operating within the available dataset, which is a product of the real world.

Bootstrapping image generation is fun, and I would say it is far superior to picking stock footage. It is not yet possible for video or motion graphics, but I suppose that is a matter of time. I would also really love an option to embed existing objects and combine them with AI-generated scenes; that will probably become possible in future releases.

Definitely worth trying.
