Generative modeling: The overview

Setting the tone for GANs

Mehul Gupta
Data Science in your pocket
7 min read · Jul 13, 2021


Image source: https://www.oreilly.com/library/view/generative-deep-learning/9781492041931/

GANs have taken the world by storm. The arrival of models like DALL-E (which generates images from any fantasy text), the rise of deepfakes, and apps like FaceApp practically forced me to jump into generative modeling & at least get a taste of it. So, this time I may be penning down my longest series yet, trying to unveil every possibility with GANs, starting today. Taking a step back, though, I will start off with the basics of generative modeling.

To begin with, based on how a model learns, we have 2 types of models: discriminative & generative.

Discriminative: Such models try to learn only the distinguishing features that separate the different classes present in the training data; hence they are mostly classification algorithms like SVMs, Random Forests, etc. They try to estimate P(y|x), i.e. the probability of a label ‘y’ given features ‘x’.

For example, if we need to classify horses & zebras in a given image dataset, a discriminative model will try to learn features that distinguish the two (say, the stripes on a zebra) but never understand what a zebra or a horse actually looks like.

Generative: Such models, rather than figuring out the unique features that distinguish the different classes, try to learn the entities as a whole, i.e. what a horse/zebra looks like, rather than just the discriminating features. Such models actually understand the data & provide one very big advantage over discriminative models: data generation.

Generative models try to estimate

  • If unsupervised: the probability of observing sample ‘x’ in a given dataset X, i.e. p(x)
  • If supervised: the probability of observing sample ‘x’ given a label ‘y’, i.e. p(x|y) (see the sketch after this list)
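To make the difference concrete, here is a minimal toy sketch (pure numpy; every name & number below is made up for illustration). We fit p(x|y) for each class, which is enough both to classify through Bayes’ rule & to generate brand-new samples; a purely discriminative model would hand us only P(y|x):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data: one feature x for two classes y in {0, 1}
x0 = rng.normal(0.0, 1.0, 500)   # class 0 (say, horses)
x1 = rng.normal(3.0, 1.0, 500)   # class 1 (say, zebras)

# Generative view: learn p(x|y) by fitting one Gaussian per class
mu0, sd0 = x0.mean(), x0.std()
mu1, sd1 = x1.mean(), x1.std()

def gaussian_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def p_y1_given_x(x):
    # Bayes' rule with equal priors: the discriminative quantity P(y|x)
    p0, p1 = gaussian_pdf(x, mu0, sd0), gaussian_pdf(x, mu1, sd1)
    return p1 / (p0 + p1)

print(p_y1_given_x(1.5))         # classification: is x=1.5 a zebra?
print(rng.normal(mu1, sd1, 3))   # generation: 3 brand-new 'zebra' samples
```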

A few key features to know about Generative models

Estimate the Distribution

The aim is to determine the distribution from which the training samples were generated. If such a distribution is estimated, generating numerous samples becomes very easy (recall how easily we generate samples from a normal distribution once the mean & std are given). By estimating the distribution, we mean estimating its parameters (like the mean & std for a Normal distribution), as sketched below.
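A quick sketch, assuming (purely for illustration) that the data really is Normal: estimate the parameters from the training data, & sampling any number of new points is then one line.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=10_000)  # stand-in for training data X

# "Estimating the distribution" = estimating its parameters from the data
mu_hat, sigma_hat = data.mean(), data.std()

# Once the distribution is known, generating numerous samples is trivial
new_samples = rng.normal(mu_hat, sigma_hat, size=1_000_000)
```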

Stochastic nature

Now that’s a tough pill to swallow. By stochastic, we mean an element of randomness. Such algorithms are stochastic in nature & hence produce different outputs for the same input. This property is necessary because if the model were deterministic (a fixed output for each input), it might just learn a formula & its generative powers would be exhaustible, i.e. once the unique inputs run out, the unique outputs run out too. That would be a big blocker when we wish to generate millions of images from a sample dataset. A minimal sketch follows.
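Here is a toy stand-in for a stochastic generator (not a real model, just the idea): the same call, with no change in input, keeps producing different outputs because fresh random noise is injected every time.

```python
import numpy as np

rng = np.random.default_rng()

def generate(label="horse", latent_dim=8):
    # Fresh random noise every call; a real model would decode this
    # noise into an image conditioned on the label
    z = rng.normal(size=latent_dim)
    return label, z.round(2)

print(generate("horse"))  # same input...
print(generate("horse"))  # ...different output every time
```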

Difficult to evaluate

That’s true. Compared to a discriminative model, where we have numerous metrics like accuracy, F1-score, etc., determining the quality of an artificially generated sample is tough. There is no True or False, only good or bad. Judging the quality of a generated sample is largely qualitative.

Dependence on Latent representation for complex data

A latent (hidden) representation is a (mostly) low-dimensional representation of complex, higher-dimensional data. It plays a pivotal role in data generation, as only the important features of the data are preserved in the latent space, & from these the original object can be approximately traced back. Stochastic elements are then added, keeping the generated image similar to the theme/content of dataset X but not an exact copy of the training data.

Assume we try creating a latent space for cylinders. When representing these cylinders in a lower-dimensional ‘latent space’, the latent space won’t remember every small detail but only the major ones that define the basic characteristics of a cylinder, like height, radius, etc., giving an idea of what a general cylinder looks like. Using this base knowledge preserved in the latent space & small, artificially added details (say the texture or colour of the cylinder), new samples can be generated that were never seen but appear true to the dataset (see the toy sketch below).

The greyed images are the available samples & the dotted structures are possible structures the latent space may produce for some given input, never seen in the training data.
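A toy sketch of the cylinder example (the encode/decode functions are hypothetical stand-ins for a real encoder/decoder): only height & radius survive in the latent space, & the stochastic details are bolted on at generation time.

```python
import numpy as np

rng = np.random.default_rng()

def encode(cylinder):
    # Only the defining traits survive in the latent space
    return np.array([cylinder["height"], cylinder["radius"]])

def decode(z):
    # Reconstruct the basics from the latent code, then add stochastic
    # details (colour, texture) that were never stored in the latent space
    return {
        "height": z[0],
        "radius": z[1],
        "colour": rng.choice(["red", "blue", "grey"]),
        "texture": rng.choice(["matte", "glossy"]),
    }

z = encode({"height": 10.0, "radius": 2.0})
print(decode(z))  # a 'new' cylinder: true to the theme, not an exact copy
```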

But why does the world focus mainly on discriminative modeling?

Because it fits well with real-world problems. Problems rarely need to know the distribution from which the samples were generated in order to create artificial samples, & hence ‘classification’ goes hand in hand with real-world issues. Consider the below problems:

  • You get some medical images to detect cancer. Here, you would wish your model to classify/detect cancerous tumors rather than generate more such images. Learning the distribution doesn’t make sense here !!
  • To mark an email as spam, the model needs to classify spam vs. not-spam content rather than understand the distribution from which these emails were generated.

But recent advancements in GANs and text generation have breathed new life into generative modeling, which is on the rise now & finding many real-world applications. Still, if you are pretty early in your career, generative modeling can be ignored for now.

What does a general generative model framework look like?

We have a dataset X

We assume this X has been sampled from some distribution P_data

A generative model P_model will now try to mimic P_data, generating similar samples. The goal is to train P_model to be as close to P_data as possible, so that more samples of the same type can be generated using P_model. One toy way to measure that closeness is sketched below.
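One way to make “as close to P_data as possible” tangible (a toy sketch; the Gaussian choice & all numbers are just for illustration): score a candidate P_model by the average log-probability it assigns to held-out samples drawn from P_data.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
held_out = rng.normal(2.0, 1.5, 1_000)   # fresh samples from the true P_data

def closeness(mean, std):
    # Higher average log-probability on held-out data = closer to P_data
    return norm.logpdf(held_out, loc=mean, scale=std).mean()

print(closeness(2.0, 1.5))   # a good P_model scores high
print(closeness(0.0, 1.5))   # a poorly fitted P_model scores lower
```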

But when do we believe P_model looks good?

When P_model generates similar samples (& not the same data seen in dataset X), the ‘theme’ of the samples remains the same, but the other features change considerably. So if ‘X’ has facial data, P_model, to be successful, should generate facial images such that:

The samples aren’t a copy-paste from P_data (different faces not seen in the dataset)

The samples do resemble faces (and not horses !!)

The below example should set up an ideal case for us

Image source: https://www.oreilly.com/library/view/generative-deep-learning/9781492041931/

Consider the above world map image. Assume the ‘black dots’ are the training dataset for which we wish to build a generative model. Some key features of the training data:

Each point is on a ‘land’ region & not on the ocean parts of the world

There appears to be no specific choice of land; the points are pretty evenly distributed over all continents

Assume we trained 3 different generative models ‘a’, ‘b’ & ‘c’, and each generated one point. Let’s observe them:

‘a’ produced ‘A’. But this isn’t an ideal generator, as it produced a point in the ocean part of the world (observe A in the image). It couldn’t capture the most significant feature of the training data: the point should be on land !!

‘b’ produced ‘B’. Again, this looks like a failure, as the model appears to be an overfit: it produced a point almost identical to an existing one (very close to an existing black dot).

‘c’ produced ‘C’. This looks great, as the point is on land & not very similar to the already existing data. Hence, ‘c’ is the ideal generator. A small sketch of this judgment follows.
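Here is a toy sketch of that judgment (is_land() & the training points below are hypothetical stand-ins, not real map data): model ‘a’ fails the land check, ‘b’ fails the novelty check & ‘c’ passes both.

```python
import numpy as np

def judge(point, train_points, is_land, min_dist=0.5):
    """Judge a generated point the way we judged models 'a', 'b' & 'c'."""
    if not is_land(point):                     # like 'A': in the ocean
        return "bad: point in the ocean (missed the key feature)"
    nearest = np.linalg.norm(train_points - point, axis=1).min()
    if nearest < min_dist:                     # like 'B': a near-copy
        return "bad: near-copy of a training point (overfit)"
    return "good: on land & genuinely new"     # like 'C'

train = np.array([[10.0, 20.0], [30.0, 40.0]])  # made-up training dots
on_land = lambda p: True                        # hypothetical land-mask lookup
print(judge(np.array([10.1, 20.1]), train, on_land))  # near-copy -> 'bad'
```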

Before wrapping up, let’s clear up a few common concepts that I will be referring to in my coming posts:

  • Sample space: The pool of values from which samples in the training dataset ‘X’ can take their values. For example, in the above world map scenario, the collection of all points on ‘land’ constitutes the sample space.
  • Probability density function (pdf): The function that maps each point of the sample space to the probability of observing it.

For example, any point on ‘water’ should have probability = 0, whereas on land it should be a constant ‘k’.

The probabilities over all samples in a sample space should add up to 1 (pretty obvious). The P_data & P_model we discussed above are also pdfs. The only difference is that P_data is the true pdf of dataset X, & there can be only one such pdf, while P_model is something we are trying to estimate, & many P_models can be fit for one P_data. We must find a way to get the best P_model estimating P_data. The toy sketch below makes the land example concrete.
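A toy sketch of the land pdf (the ‘world’ here is just a random Boolean grid standing in for a real land mask): density 0 on water, a constant k on land, chosen so everything adds up to 1.

```python
import numpy as np

rng = np.random.default_rng(7)
land = rng.random((100, 100)) < 0.3   # toy mask: True = land, False = water

# Uniform pdf over land: 0 on water, constant k on land,
# with k chosen so the total probability is 1
k = 1.0 / land.sum()
pdf = np.where(land, k, 0.0)
print(pdf.sum())   # 1.0: probabilities over the sample space add up to 1
```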

  • Parametric modeling: Representing a distribution using a set of parameters is called parametric modeling. For example:

A. For a Normal distribution, if the mean & std are known, samples from the distribution can easily be generated; hence the mean & std represent an entire normal distribution N

B. The distribution of all triangles that can be formed is governed entirely by the 3 coordinates of the vertices. Hence, any sample can be represented as (A, B, C), where A, B & C are the vertex coordinates. A minimal sketch of example A follows.
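A minimal sketch of example A (just a plain dataclass, nothing library-specific): two numbers are enough to describe, & to sample from, an entire Normal distribution.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class NormalParams:
    mean: float   # just two numbers...
    std: float    # ...describe the whole distribution

    def sample(self, n):
        return np.random.default_rng().normal(self.mean, self.std, size=n)

n_dist = NormalParams(mean=0.0, std=1.0)
print(n_dist.sample(5))   # fresh samples straight from the parameters
```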

  • Likelihood: The likelihood of parameters θ of a distribution, given a sample dataset X, is the probability of observing X assuming θ parameterizes the distribution. So, for a single point ‘y’ from X, it is the probability of observing ‘y’ given θ.

For example, if y=0.5 & we assume the distribution to be Normal, the likelihood of mean=1 & std=2 equals the probability of observing y=0.5 when mean=1 & std=2. When we have multiple samples y1, y2, y3, …, the likelihood is the product p(y1|θ) × p(y2|θ) × p(y3|θ) × … given the mean & std.

  • Maximum likelihood: Finding the optimal parameters θ that maximize the probability of observing the sample dataset X is called maximum likelihood estimation. As the name suggests, given samples y1, y2, y3, …, the optimal parameters are those that maximize the product p(y1|θ) × p(y2|θ) × p(y3|θ) × … over all possible parameters (not the max of the individual probabilities). A sketch follows.
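A minimal sketch of maximum likelihood for the Normal example (a brute-force grid search, purely for illustration; scipy.stats.norm supplies the pdf). We sum log-probabilities instead of multiplying raw probabilities, which is the same maximization but numerically stable.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=2.0, size=1_000)   # dataset X

def log_likelihood(mean, std):
    # log of p(y1|θ) * p(y2|θ) * ... = sum of log p(yi|θ)
    return norm.logpdf(samples, loc=mean, scale=std).sum()

# Brute-force search over candidate parameters θ = (mean, std)
candidates = [(m, s) for m in np.linspace(-2, 4, 61)
                     for s in np.linspace(0.5, 4.0, 36)]
best = max(candidates, key=lambda t: log_likelihood(*t))
print(best)                            # ≈ (1.0, 2.0)
print(samples.mean(), samples.std())   # the closed-form MLE agrees
```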

We will be using the above concepts in my next post, where I will continue with how Naive Bayes acts as a generative model & why it can’t be used for image/text generation.
