Blooming of Generative Models ❁

Anas Aberchih
6 min read · Apr 13, 2024


If you’re still amazed at how certain AI tools create highly convincing results, from text and images to music and videos, so convincing that even we get fooled about what is reality and what is AI-generated, you might be wondering what these tools are and how they do it.

Well, they’re called Generative models, and rest assured, today we’re diving a bit into what makes these models so good at generating stuff!

Note: I’ll be talking purely about generating images of flowers, but the same concepts apply to creating videos, text, and audio.

Introduction

Generative models are a captivating branch of machine learning that focuses on understanding data patterns in order to create entirely new data that resembles the data it learned from.

We’re used to seeing models that work with existing data: models that recognize different types of data (say, flowers) and predict or classify them. Generative models instead learn the underlying structure and probability distribution of that data, allowing them to generate entirely new examples that follow the same patterns (i.e. brand-new flowers).

It’s like teaching the model to be creative and “dream up” data of its own that resembles the data we want.

What lets these kinds of models generate new data is their capacity to store these patterns and probabilities in something called a “Latent Space”.

[Image: a representation of the latent space]

Latent space

Also called a Lower Dimensional Latent Space, this is a lower-dimensional representation of higher-dimensional data. Don’t get scared; it’s a fairly abstract concept, hence the difficulty in grasping it.

Every model has a latent space. This space is created and filled with lower-dimensional representations of the data our model trained on. We can imagine it as a vague place where points are clustered and grouped by their type, genre, and similarity.

Latent space acts as a compressed representation of the training data. It captures the essential features and relationships within the data in a lower-dimensional space compared to the original data format. This allows the model to store the most relevant information about the data in a more efficient way.
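The compression described above can be sketched with something as simple as PCA, a linear stand-in for the non-linear encoders real generative models learn. Everything below is synthetic and purely illustrative: each row of random numbers pretends to be a tiny 64-pixel “flower image”.

```python
# Minimal sketch: compressing data into a low-dimensional "latent space"
# using PCA. Real models learn a non-linear version of this mapping.
import numpy as np

rng = np.random.default_rng(0)

# Pretend each row is a tiny 64-pixel "flower image".
images = rng.normal(size=(200, 64))

# Center the data and find the top-2 principal directions.
mean = images.mean(axis=0)
centered = images - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
components = vt[:2]                      # 2 latent dimensions

# Encode: project each 64-D image down to a 2-D latent code.
latent = centered @ components.T         # shape (200, 2)

# Decode: map the 2-D codes back to 64-D approximations.
reconstructed = latent @ components + mean

print(latent.shape)                      # far smaller than (200, 64)
```

The latent codes keep the directions of greatest variation, which is exactly the “summarize the book in a few key sentences” idea: most of the detail is dropped, the essential structure survives.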

Think of it as summarizing a book’s main ideas in a few key sentences.

[Image: quick sketch of a latent space]

Latent space is not exclusive to generative models

When we tell our model to generate a new flower, it navigates its latent space, pinpointing the flower-like data, and from that subspace it pulls the necessary structure and essential building blocks to assemble an entirely new flower.

Nearby points in the latent space produce similar, but never identical, results. This gives our models a sense of continuity in their outputs.

This continuity is crucial and extremely useful: it’s the reason we don’t get the same result twice from the same prompt, which makes these models practical for generating large amounts of content without changing the prompt each time.
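That continuity can be demonstrated with a toy “decoder”: here it’s just a hypothetical fixed linear map followed by tanh, standing in for a trained network. Decoding a latent point and a small step away from it produces outputs that differ, but only slightly.

```python
# Sketch of latent-space continuity: nearby latent points decode to
# similar (but never identical) outputs. The "decoder" is a random
# placeholder for a trained network.
import numpy as np

rng = np.random.default_rng(42)
decoder_weights = rng.normal(size=(2, 64))    # 2-D latent -> 64-D output

def decode(z):
    return np.tanh(z @ decoder_weights)       # squash into an image-like range

z = np.array([0.5, -0.3])                     # a point in latent space
z_nearby = z + rng.normal(scale=0.01, size=2) # a tiny step away

a, b = decode(z), decode(z_nearby)
print(np.array_equal(a, b))                   # the outputs are not identical...
print(np.abs(a - b).max())                    # ...but the difference is tiny
```

Sampling points from the same neighborhood over and over is, in miniature, why one prompt can yield endless distinct variations.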

We can say that the latent space is how these models “understand”. A model structures its latent space to match the desired output (flowers, in our case): by looking at a huge number of pictures of flowers, it extracts some of the general structure of flower pictures.

Now that you understand where generative models pull their data from, you’re more than ready to understand how new data is generated; the hard part is already done!

Decoding to generate (VAEs)

To generate new data, the model takes advantage of the latent space.

Imagine a painter who has studied countless paintings of flowers, their petals, colors, and textures. Now, to create a new painting, he doesn’t simply copy an existing one. Instead, he draws inspiration from his internalized knowledge (latent space) and uses his artistic skills (decoder) to create a new, unique painting.

The same goes for our model: its latent space already holds numerous points similar to the result it wants to output, so it takes a random sample of those points, representing new combinations of learned features, and that is why we get a new result each time.

The part that performs this step is called a “Decoder”: a neural network that is fed the selected points from the latent space (the compressed representation) and translates them back into the original data format, giving us the output we want.

This generative model architecture, built around an Encoder-Decoder pair, is called a “Variational Autoencoder” (VAE).
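The full VAE pipeline, encode to a distribution over latent space, sample a point, decode it back, can be sketched in a few lines. All the weights below are random placeholders (a real VAE learns them from data), and the “reparameterization” step is the standard trick VAEs use to sample while keeping training differentiable.

```python
# Minimal forward-pass sketch of a VAE (no training loop).
# Weights are random placeholders for illustration only.
import numpy as np

rng = np.random.default_rng(0)
IMG, LATENT = 64, 2

w_mu     = rng.normal(scale=0.1, size=(IMG, LATENT))
w_logvar = rng.normal(scale=0.1, size=(IMG, LATENT))
w_decode = rng.normal(scale=0.1, size=(LATENT, IMG))

def encode(x):
    """Map an image to the mean and log-variance of its latent code."""
    return x @ w_mu, x @ w_logvar

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps (the reparameterization trick)."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    """Map a latent code back to image space."""
    return np.tanh(z @ w_decode)

x = rng.normal(size=IMG)            # one synthetic "flower image"
mu, logvar = encode(x)
z = reparameterize(mu, logvar)      # a point in the latent space
x_hat = decode(z)                   # the generated/reconstructed image

print(z.shape, x_hat.shape)         # 2-D latent code, 64-D output
```

Notice that the encoder outputs a distribution rather than a single point; sampling from it is exactly what gives the decoder a slightly different latent input, and thus a slightly different flower, every time.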

VAEs are awesome, but they’re not the only architecture in existence! Some generative models use a different approach altogether, ditching the decoder for something else. Let’s explore one of those next.

Generative Adversarial Networks (GANs)

[Image: representation of a generator and a discriminator]

GANs are a type of generative neural network that enables us to generate realistic images (or other outputs).

It has two essential components:

Discriminator

Its role is to predict whether an image is real or AI-generated, i.e. to distinguish the real data from the fake.

Generator

This is the part responsible for generating images.

When the discriminator makes a wrong prediction, it receives a penalty (essentially a punishment that pushes it to do better on the next test), while the generator tries to fool the discriminator into believing that its generated photos are real.

The interesting thing about this network is that this rivalry pushes the generator to produce more and more realistic images, while the discriminator improves its ability to distinguish generated photos from real ones.

The model keeps looping through this cycle until the discriminator consistently outputs 0.5, meaning it can no longer tell whether a photo is real or fake, which indicates that the generator has learned to produce realistic images.
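The adversarial loop just described can be shrunk down to a toy example: real data is drawn from a normal distribution centered at 3, the generator is a single parameter that shifts noise, and the discriminator is a two-parameter logistic classifier. Both sides update against each other with hand-derived gradients; this is a sketch of the dynamic, not a production GAN.

```python
# Toy GAN in plain numpy: the generator learns to shift noise so its
# samples match real data drawn from N(3, 1).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

theta = 0.0        # generator parameter: fake = noise + theta
a, b = 0.1, 0.0    # discriminator parameters: D(x) = sigmoid(a*x + b)
lr = 0.05

for step in range(2000):
    real = rng.normal(loc=3.0, size=32)
    fake = rng.normal(size=32) + theta

    # --- Discriminator update: push D(real) -> 1 and D(fake) -> 0 ---
    d_real, d_fake = sigmoid(a * real + b), sigmoid(a * fake + b)
    a += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    b += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # --- Generator update: fool the discriminator, push D(fake) -> 1 ---
    d_fake = sigmoid(a * fake + b)
    theta += lr * np.mean((1 - d_fake) * a)

# The generator's shift should end up near the real data's mean of 3.
print(theta)
```

The same tug-of-war plays out in image GANs, just with millions of parameters on each side instead of three.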

Conclusion

Just imagine a world where the line between human creativity and machine-generated content blurs even further. Imagine AI composing symphonies that match the greats, or crafting hyper-realistic films indistinguishable from reality. Generative models are pushing the boundaries of what’s possible, not just in terms of creating new content, but also in challenging our understanding of what it means to be creative. As these models continue to evolve, the future holds exciting possibilities for artistic expression, scientific discovery, and perhaps even a redefined relationship between humans and the machines we create.

If you got this far, mark your presence and be proud of yourself; go out and bloom into existence, because you’ve just understood one of the hardest concepts I’ve encountered in my Data Science journey so far. Bravo!


Anas Aberchih

Master's student in Data science. Active member of the CS community, constantly seeking to expand my knowledge through conferences, workshops, and hackathons.