Intro to Generative AI — Images (Stable Diffusion, Midjourney, DALL-E)

Tyler Nee
The Lab @ Apply Digital
6 min read · Dec 1, 2022

Generative AI is one of the current zeitgeists of disruptive innovation. Thanks to recent advances in AI research and development, AI-generated content has been flooding social feeds, and we now find ourselves in an AI summer. In contrast to the notion of a crypto winter, an AI summer refers to a period of heightened interest, development, and funding in the space. In short, Generative AI is the application of machine learning models that produce predictive outputs from inputs such as written text prompts, images, and video.

There are currently five main branches of Generative AI: images, audio, text, code, and video. In this article, we’ll focus on images, the leading players in this subcategory, and what the next 12–18 months could look like for companies looking to navigate this new technology.

Here are some examples of Generative AI:

The following images have been created using OpenAI’s DALL-E text-to-image model. You can try it out for yourself here.

If I’m a cartoon artist and need inspiration, I can write a prompt such as “cartoon dogs floating in space” to generate this image.
If I’m a travel blogger and need stock images to add to my articles, I can write a prompt such as “wind turbines located at the top of a cliff overlooking a vast ocean” to generate this image.

This is a remarkable evolution in AI technology, and we have yet to unlock its full potential. So, where did this all begin?

The Timeline

In 2017, Google Brain began working on predictive text generation via Large Language Models (LLMs) to tackle the problem of automatic language translation while preserving context. Other companies, such as OpenAI, gradually adopted and built upon this work, leading to breakthroughs like OpenAI’s natural language generator, GPT-3, released in Jun. 2020, and its first text-to-image generator, DALL-E, in Jan. 2021. The latest advancement in generative AI has been the development of latent diffusion models in Dec. 2021. This discovery has helped companies generate higher-resolution images, which has led us to the current AI gold rush. Fast forward to the summer of 2022: DALL-E 2, Midjourney, and Stable Diffusion were released, and the tech world was taken by storm.

Source: https://octoml.ai/blog/from-gans-to-stable-diffusion-the-history-hype-and-promise-of-generative-ai

Now that we understand a bit of the history, let’s dive into the leading players within Generative AI images: Stable Diffusion, Midjourney, and DALL-E.

Stable Diffusion

Stable Diffusion has become a leader in AI image generation in recent months. As previously mentioned, the recent breakthrough in Generative AI resulted from the development of latent diffusion models, and Stable Diffusion is no different: it is an open-source latent diffusion model developed by a company called Stability AI and its collaborators CompVis (LMU Munich), Runway, EleutherAI, and LAION. The company was founded in 2019 by Emad Mostaque, an ex-hedge-fund manager turned tech entrepreneur. In Oct. 2022, Stability AI announced its $101M seed round, valuing the company at $1B. Stability AI generates revenue by providing AI consulting services as well as through its digital service, DreamStudio, which offers access to its image generation tool without the need to install software, acquire technical knowledge, or invest in heavy computing resources.

Midjourney

Next up, with what also feels like a purposely confusing naming convention, is Midjourney: the company behind a proprietary latent diffusion product that is also called Midjourney. Images generated by Midjourney tend to be more artistic in style and often look like paintings. The self-funded company was founded by David Holz (who also co-founded Leap Motion). What sets Midjourney apart from products such as Stable Diffusion and DALL-E is that it uses the popular social platform Discord as the input tool for creating images: users join the Midjourney Discord channels and interact with the tool through bot commands such as /imagine. This may prove to be a make-or-break strategy compared to Stable Diffusion’s and DALL-E’s browser-based products, as Discord is a contemporary method of interaction that appeals to younger audiences.

DALL-E

Its name a portmanteau of surrealist painter Salvador Dalí and WALL-E, the 2008 Pixar movie about a lonely robot in a dystopian future, DALL-E proves to be “painting” a path toward a similar future. OpenAI, the company behind DALL-E, was co-founded by Elon Musk and Sam Altman, with a notable investment of $1B from Microsoft. DALL-E, released in January 2021, was an early take on image generation that leveraged a modified version of GPT-3, OpenAI’s generative text model. In April 2022, DALL-E 2 launched, pairing a diffusion model with OpenAI’s CLIP image-text model to produce far sharper results. And in November 2022, OpenAI released a DALL-E 2 API, enabling developers to integrate the tool into their applications. Big tech companies such as Microsoft have already begun integrating the API into their products with the launch of the “Canva-killer” app, Designer, and an Image Creator tool for Bing and Microsoft Edge.
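The request shape of the DALL-E 2 API is simple enough to sketch with the Python standard library alone. The endpoint and JSON fields below follow OpenAI’s published REST interface at the time of the API’s release; the prompt, helper names, and output size are illustrative, not part of the official docs:

```python
import json
import os
import urllib.request

# OpenAI's image-generation endpoint (as documented at the Nov. 2022 API launch).
API_URL = "https://api.openai.com/v1/images/generations"


def build_request(prompt: str, n: int = 1, size: str = "1024x1024") -> dict:
    """Assemble the JSON body the API expects: a text prompt, the number of
    images to generate, and the output resolution."""
    return {"prompt": prompt, "n": n, "size": size}


def generate(prompt: str) -> list:
    """Send the request. Requires an OPENAI_API_KEY environment variable and
    returns the list of generated-image entries (URLs) from the response."""
    body = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]


# Build (but do not send) a request for one of the article's example prompts.
payload = build_request("wind turbines located at the top of a cliff overlooking a vast ocean")
```

A product team could wrap a call like this behind its own UI, which is exactly the kind of integration Microsoft’s Designer and Image Creator illustrate.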

Here are some comparisons:

Source: https://www.marktechpost.com/2022/11/14/how-do-dall%C2%B7e-2-stable-diffusion-and-midjourney-work/

What can we expect to see next?

The next 12 to 18 months will be a creative renaissance for content generation as creators and companies adopt these tools. One powerful aspect of generative AI models is that, once trained, they can be tweaked and fine-tuned. Take this text-to-Pokemon image model, for example, where you type in a prompt and a Pokemon-like image is generated. Setting aside the potential for copyright infringement, companies such as Nintendo and Game Freak could benefit from this as well. The Pokemon Co., for instance, has shown signs of creative fatigue in its recent games; when Pokemon Sun and Moon was released in 2016, The Verge wrote, “these new Pokemon all look terrible.” A tool like text-to-Pokemon would enable The Pokemon Co. to rapidly ideate thousands of new Pokemon when designing new game themes and adding more parameters.

If you’re a brand with lots of IP, you can start fine-tuning these models to create new concept designs or mash-ups. Companies like Nintendo, Disney, and Netflix will look to leverage this new technology to supercharge their creativity. Furthermore, design products with generative AI features have already started to appear, such as Canva’s Image Generator, Figma’s Ando design co-pilot, and Microsoft’s Designer, signaling that the design industry believes these tools should be embraced rather than scorned. Although most of these tools are still in their early stages, it won’t be long before they become part of our everyday workflows and industry best practices as the technology matures.

Apply Digital solves complex problems with well-executed digital solutions for some of the most respected brands in the world. We help our clients gain a competitive advantage and delight users with expertly executed digital solutions. Learn more about our services at www.applydigital.com.
