GANs (Generative Adversarial Networks)

Rishab Das
Published in The Deep Hub
24 min read · Jul 12, 2024

Today, I woke up and decided that I was going to write an article about GANs. I am vaguely familiar with the concept, but I don’t want to be vaguely familiar with it. I believe these are the things you can generate things with, or possibly do other tasks with, but I am not sure. But I want to use these to do something cool. Something I can use to show that AI and machine learning are cool. Maybe this can do something related to the stock market, I don’t know, but let’s go on an adventure and find out.

GANs stand for Generative Adversarial Networks. I am not going to be typing out Generative Adversarial Networks the entire time, so just remember GANs. So what can we do with these GANs? What are they? How do they work? What is the genius math behind these things? What other questions will I have when I get more answers? Let us begin!!

GANs

What are GANs?

The obvious answer to this question is that they are neural networks. But what makes them special? The thing that makes them special is their generative power. That’s why it’s in the name. This whole generative power thing boils down to a fundamental machine learning concept: supervised vs. unsupervised learning. This, I know very well. Supervised learning is a learning process in which the model is trained with the outputs. It’s learning on something. Unsupervised learning doesn’t have the categories or the outputs. It doesn’t really have the self-evaluation mechanism that supervised learning models have. So what do GANs have to do with this? Well, it’s actually very fascinating. GANs are models meant for unsupervised learning. I believe this means things like generating pictures and videos, but you still need to train these things on something, right? What GANs do is generate their own new data to train on. I put that in bold because wow is that cool!! And then they train by classifying inputs as either fake (generated) or real (from the actual data). Now this is getting a little confusing to me, so I will clear it up for myself in the next paragraph.

There are two sub-models in a GAN. The first one is the generator and the second one is called the discriminator model. This is one of the parts I forgot to mention in the paragraph above. The whole premise of the model (GAN) is that it can take input data, figure out the relationships within the data, and output new (generated) things based on the data it was trained on and learned from. The way it trains is by using the generator model to generate new examples (new data) and then the discriminator model, which discriminates between (tries to figure out whether) the data is fake or real. The models are trained adversarially (and a phrase that keeps popping up is zero-sum game). That’s where that phrase comes in. I think this means they are put against each other, like they are in a game, and each one has to outdo the other. Or, actually, the generator needs to make good enough examples that the discriminator is only 50/50 at discriminating. Ah, that makes sense. The generator just has to fool the discriminator; then it means the generator is doing a good enough job, and that means it can generate new things that seem close to the real thing. Does that make sense?

Simply put: two models, one that generates fake data and one that tries to discriminate between real and fake data, are trained in this zero-sum-game style (I will explain that later), and the goal is for the generator to fool the discriminator.

The Generator Model

As stated many times before, the generator model generates fake data to try and fool the discriminator model. This is the model that does the generating; this is the model that gives the output in the future, after all the training is done (what I mean by the previous two sentences is that it’s the model that makes the cool new images). But how does it actually work, what’s the magnificent, glorious, fascinating math behind it? Let us dive in.

The input to the generator model is a random vector from a random noise sample. Wow, that sounds complicated, so let’s break it down. What is the random noise sample? The random noise sample comes from a statistical distribution; one you may have heard of is the Gaussian distribution. It doesn’t have to be Gaussian (a uniform distribution is another common choice), it just needs to be something simple we can sample from. We take a random vector from it, meaning we take a few random numbers (a few numbers make up a vector) from the Gaussian distribution (the random noise sample; it could be any distribution, we are just using the popular one).
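Here is a minimal sketch of that sampling step, assuming TensorFlow (the library used later in this article); the length of 100 is just a common choice that happens to match the generator built below.

import tensorflow as tf

# Draw one random noise vector of length 100 from a standard Gaussian
# (mean 0, standard deviation 1).
z = tf.random.normal([1, 100])

print(z.shape)                                                 # (1, 100)
print(float(tf.reduce_mean(z)), float(tf.math.reduce_std(z)))  # roughly 0 and 1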

Now the generator model is a model at the end of the day, so it has a second part of course: the input goes into the neural network that comprises the second part, but this network is interesting. Instead of breaking down an image, or the input, it builds it up. How does it do this, you ask? Well, it depends on the type of GAN you are trying to create, but let’s do an example of an image GAN (a very popular type of GAN). The random number jumble (the vector) is passed into the first few dense layers; what these layers do is take the random noise (the vector) and transform it into a structure that is easier to grow from. This structure is then kind of blown up (dimensions are added to it) by a transposed convolutional layer. Instead of breaking dimensions down (a normal convolutional layer), it blows dimensions up. These blown-up dimensions begin to represent an image; they begin to show something close to a real image. The final output layer makes it the same height, width, color scale, and all that as an image (or whatever real data you have). Simple as that. As it trains more and more, and gets feedback from the discriminator, its blowing up (its upsampling) gets more accurate.

Now of course there is a loss function. The generator’s job is to minimize the loss function (less loss means better results, just think about it for a second); remember, it’s trying to fool the discriminator (which itself has a loss function it is trying to optimize in the opposite direction). But what you see below isn’t readable on its own, so I will make it readable.

Loss function for the generator model:

L_G = -\mathbb{E}_{z \sim p_z(z)}[\log D(G(z))]

So, we have L_G, which is the loss value for the generator model. Then we have this negative E with a subscript of z ∼ p_z(z). What does that mean? Well, z is a noise vector, and the E notation means the expectation: the average of the log probabilities over all the noise samples. Log probabilities are just probabilities on a logarithmic scale, and then we average them, because remember, z is random sampling, so there are a lot of them. But what the heck is the log probability of? It’s the log(D(G(z))) business. The log puts the probability in log form, the D is the discriminator judging the generated sample G(z), and the generator G is creating something from z. So, if you were to put this whole thing in less gibberish, it would be:

Less gibberish: the generator loss is the negative of the average, over noise samples z, of the log of the probability the discriminator assigns to G(z) being real.

It’s that simple. It’s just trying to reduce the negative log probability (that’s what the negative sign means), and therefore it is the same as maximizing the normal log probability. If you reduce how much doubt you have, you increase how much confidence you have. See, the confidence is the generator trying to fool the discriminator. Now let’s talk about the discriminator.
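To make that concrete, here is a quick worked example with made-up numbers. If the discriminator is barely fooled, say D(G(z)) = 0.1, the generator’s term is -\log(0.1) \approx 2.3 (a big loss). If it is mostly fooled, say D(G(z)) = 0.9, the term is -\log(0.9) \approx 0.105 (a small loss). Minimizing L_G therefore pushes the generator toward samples the discriminator rates as real.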

The Discriminator

Now, the basic principle of the discriminator, as I have already told you, is that its job is to decide between fake and real data; the fake data is generated by the generator model. It also gives feedback to the generator model, as we saw in the generator model equation: that log(D(G(z))) is the discriminator giving a log probability back to the generator model. But of course, there is a whole intricate mathematical world behind this discriminator model, so let’s dive in.

Similar to the generator model (actually to any model), there is an input layer. The input layer receives data that could be either fake or real. Also similar to the generator model, there are a bunch of convolutional layers, but they are not transposed layers; they work normally and do everything they are supposed to do. They break down the image (in this example we are, again, using an image GAN) and extract features from it. The output layer outputs a single scalar value, to be more mathy, which represents how likely the input is to be real. It’s very simple; now let’s look at the loss function equation.

Loss function for the discriminator model:

L_D = -\mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] - \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

WOW, that is a lot, but if you look closely, and read closely about the generator model, much of this is similar. You can even see reminiscence of the generator loss function. Instead of having random noise in the beginning, you have the actual data (x drawn from the real data distribution). And when you do have noise, it’s almost the exact same expression, just with log(1 - D(G(z))) instead of log(D(G(z))). I think this is because the discriminator wants the probability it assigns to real inputs to be high and the probability it assigns to generated inputs to be low. So the equation has two terms (terms just means parts). The real data term doesn’t involve the generator at all, because it’s real data; the discriminator just wants D(x) to be close to 1. The second term, the one this whole explanation is about, takes the generator’s output, evaluates it, and then flips the probability around: the 1 - D(G(z)) is just switching the probability around. Think about it like this (I may have confused you more):

D(G(z)) is how confident the discriminator is that G(z) is real. 1 - D(G(z)) is how confident it is that G(z) is fake. It just switches the probability around. This is then averaged with the whole E letter thingy, and then the whole equation looks very similar to the generator model equation. But those two parts, the 1 - D(G(z)) term and the real data term, are two small but different things. (I am very sorry if I confused you more, but try and figure it out for yourself if you still are confused, that’s what I did. I might be wrong, but at least I gave it a shot.)
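Again, a quick worked example with made-up numbers. Suppose the discriminator gives a real image D(x) = 0.9 and a generated image D(G(z)) = 0.1. Then L_D = -\log(0.9) - \log(1 - 0.1) \approx 0.105 + 0.105 = 0.21, a small loss, because it is doing its job well. If instead it were fooled and gave D(G(z)) = 0.9, the second term would become -\log(0.1) \approx 2.3 and the loss would jump.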

Simplicity

Because the whole loss thing is so complex, sometimes you will see a simpler equation: a combined loss for the whole GAN. I like this equation, but for those who (like me) want to know exactly how something works (like down to the atom), the other equations are better. But here is something easier on the eyes:

Combined (easy on the eyes) loss:

L = L_D + L_G

Simply, it’s just the loss of the discriminator plus the loss of the generator model. The whole goal of training is to update the generator by minimizing L_G and to update the discriminator by minimizing L_D (which is the same thing as maximizing how well it tells real from fake). Increase deceptiveness and increase confidence.

Zero-Sum Game

I have been seeing this word (or group of words) pop up, and I have been wondering what they mean. The generator and the discriminator train in a zero-sum game. I think this means they train until their total failures add up to zero. Or it has to do with their losses or I’m completely wrong.

Zero-sum game has to do with the word “adversarial”, like, they train adversarially. This whole thing means that one model’s success comes at the other model’s expense. I think the model that is intended to succeed is the generator model. The model whose expense that success comes at is the discriminator (tell me if I am wrong). These two models have an adversarial relationship, meaning they are in direct competition with each other: the generator tries to fool the discriminator, and the discriminator tries to classify between fake and real and give feedback to the generator model. The loss functions are the numerical representation of this. The generator’s loss function is trying to increase the confidence with which the fake data is classified (or discriminated) as real. The discriminator’s loss function is trying to make it really, really confident at detecting whether something is fake or not. So we have this battle. Another way to put the loss function number battle: the generator’s loss function is designed to improve the realism of the generated data, while the discriminator’s loss function is designed to penalize the generator’s attempts to fool it.

Now, of course, there is a mathematical representation of it. The following equation is actually very simple, if you paid attention to the explanations of the other equations. In the equation, you can see the battle taking place. It is kind of cool: a mathematical battle between loss functions. And the generator is supposed to come out on top.

The math battle of the GAN (the minimax value function):

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

You can see the equation’s goal is min over G and max over D (the generator minimizes the value, the discriminator maximizes it), and the two expressions that make up the giant equation are just the pieces we already saw in the generator and discriminator loss functions.

Now, I asked ChatGPT to explain zero-sum training, and it gave a response based on game theory. Another cool theory, so let me try and explain it. In game theory, a zero-sum game is a situation in which one player’s gain is exactly balanced by the losses of the other players. Here, any improvement in the discriminator’s ability to spot fakes is a loss for the generator, and vice versa. At the ideal end state, the generator is so good that the discriminator can’t do better than a coin flip: it outputs about 0.5 for everything.
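There is a neat result from the original GAN paper (Goodfellow et al., 2014) that makes this concrete. For a fixed generator, the best possible discriminator is

D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}

where p_g is the distribution of the generator’s samples. When the generator perfectly matches the real data (p_g = p_{data}), this works out to exactly 1/2, which is the 50/50 coin-flip situation described above.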

Project Time

Now, I hope I have explained the GAN to you all very well. I have tried many, many times to do this, but I had never understood a GAN and its internal workings. So now, let us do a project. I am going to try to do some image generation with a GAN, or something with an image and a GAN. I want to know how the transposed convolutional layer works, the math behind it. But I will get to coding first; then, when I hit the transposed convolutional layer stuff, I will explain it to you (again, to the best of my ability). Let’s go!!

Data

So what we are going to use is the MNIST dataset; the notebook I am learning from is from TensorFlow, and can be accessed here. The generator model will generate images resembling the MNIST dataset. I will try to see, and explain to you, how random noise is used. I think, from my visualization in my head, the random noise (which comes from a statistical distribution) is plotted on a black and white scale; that’s why pixel values usually get scaled down from their raw 0 to 255 range. I think that’s why the plot looks the way it does: it’s in black and white, and at first it’s kind of randomly plotting things, and towards the end it’s plotting less noise and more accuracy (or the values in the noise are mathematically manipulated to become more accurate).

So let’s load the dataset and work on it, I will show you the code, and then explain every line.

import glob
import imageio
import matplotlib.pyplot as plt
import numpy as np
import os
import PIL
import tensorflow as tf
from tensorflow.keras import layers
import time

from IPython import display


# Loading Data
(train_images, train_labels), (_, _) = tf.keras.datasets.mnist.load_data()



# Preprocessing Images
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1).astype("float32")
train_images = (train_images - 127.5) / 127.5 # Normalizes images to [-1,1]

# Making Dataset
BUFFER_SIZE = 60000
BATCH_SIZE = 256

train_dataset = tf.data.Dataset.from_tensor_slices(train_images).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)

So, let us begin to explain this code. We import everything, simple. We then get our data, which we split into train images and train labels; each label is associated with a certain image. The most important line of code (in my opinion) is the preprocessing (I guess lines, plural). Basically, those lines make the shape consistent and easily changeable: 28 x 28 x 1 is what an MNIST image is supposed to be (28 x 28 pixels, one grayscale channel). We also change it to float32; I do think this is for faster training (depending of course on the specs of what you are training on). Then we normalize the pixel values to be between -1 and 1. I had previously thought this would be done by dividing by 255, but that would put the values between 0 and 1, and we don’t want that. But why? Because -1 to 1 matches the tanh activation on the generator’s output layer, so real images and generated images live on the same scale. Anyways, then we make a full train dataset. Before that though, what do buffer size and batch size mean? Batch size I can explain: we feed the data in batches of 256 images (powers of 2 like 256 tend to play nicely with the hardware), and with 60,000 images that works out to about 235 batches per epoch. Buffer size is the number of elements the shuffle buffer holds (hence it’s passed to the shuffle function); setting it to 60,000 means the whole dataset gets shuffled. The whole thing becomes a tf.data.Dataset object, which TensorFlow deals with better than pandas DataFrames and the like.
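If you want to sanity check those numbers yourself, a quick check (using the variables defined above) looks like this:

# Quick sanity check on the preprocessing above
print(train_images.shape)                      # (60000, 28, 28, 1)
print(train_images.min(), train_images.max())  # -1.0 1.0

# 60000 images / 256 per batch = 234.375, so 235 batches (the last one is partial)
print(tf.data.experimental.cardinality(train_dataset).numpy())  # 235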

Models

Generator

It looks like the code is in functions, probably for reusability. Below is the code for the function. There is a lot of LeakyReLU() and BatchNormalization, and also, we finally have gotten to the Conv2DTranspose layer, yay!!! I will explain all three of those since I am not too familiar with them. I have heard of LeakyReLU, but I don’t know the math behind it, and I have heard of BatchNormalization.

# Making generator
def generator_model():
    model = tf.keras.Sequential()

    # Project the 100-dim noise vector into a 7*7*256 block to grow from
    model.add(tf.keras.layers.Dense(7*7*256, use_bias=False, input_shape=(100,)))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU())

    model.add(tf.keras.layers.Reshape((7, 7, 256)))
    assert model.output_shape == (None, 7, 7, 256)

    # Upsample: stride 1 keeps 7x7, only the depth changes
    model.add(tf.keras.layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False))
    assert model.output_shape == (None, 7, 7, 128)
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU())

    # Stride 2 doubles the spatial size: 7x7 -> 14x14
    model.add(tf.keras.layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False))
    assert model.output_shape == (None, 14, 14, 64)
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.LeakyReLU())

    # Final upsample to a 28x28x1 image; tanh keeps pixel values in [-1, 1]
    model.add(tf.keras.layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh'))
    assert model.output_shape == (None, 28, 28, 1)

    return model

Some of these parts I know, and some of them, mathematically, I am not too familiar with: LeakyReLU (let alone ReLU), batch normalization, and what all of this stride stuff means (I know what it means conceptually, but what does it mean mathematically?). Let us begin.

LeakyReLU

Let’s address LeakyReLU first; I think this will be the least complex of the confusing parts. ReLU stands for Rectified Linear Unit, and it’s really, really simple. Whatever input is positive (say x >= 0), the output is that x. So, if we were to write it as an if statement (pseudocode style), it would be: if the input x is greater than or equal to zero, the output is x; if the input is less than zero, the output is zero. Technically you could say that if x equals zero the output is zero, and mathematically 0 fits into the first rule anyway. Here is the equation for ReLU.

ReLU equation:

\text{ReLU}(x) = \max(0, x)

As you can see, it takes the max value between 0 and x, and that’s where the whole “if x is blah then blah” comes from. Just think about it for a second and trust me, it will make sense. So what is LeakyReLU? What I am familiar with is that it “leaks” just a little bit. Why is that beneficial? I didn’t know, but let’s find out. Apparently, with the normal ReLU function there is a dead neuron problem, and this makes sense: if the input values to the ReLU function are all negative, then it will always output zero and the neuron stops contributing anything. LeakyReLU has a little leak value called alpha (α), and it basically just lets a little bit through. When x is less than or equal to zero, x is multiplied by α, and that much of the negative value is let through to the next layer. Does that make sense (I hope it does)? Tell me if I am wrong. It will clear up when you look at the equation.

Equation for LeakyReLU:

\text{LeakyReLU}(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{if } x \le 0 \end{cases}

You can see that the conditions are very similar to the conditions for normal ReLU (I didn’t put the conditions for normal ReLU in an image), but there is an alpha leak, as I like to call it. If you notice, in the code there is no alpha value specified; the Keras default for tf.keras.layers.LeakyReLU is alpha = 0.3 (other libraries often default to smaller values like 0.01). Alpha needs to be less than 1 so the negative values get shrunk rather than amplified, otherwise the model would have a harder time interpreting them. This does help with the dead neuron problem, but it may also make the model a little more complex and take more time to train, because the negatives aren’t simply turned into zeros. I wondered whether normal ReLU even changes anything, since once you get rid of the negative values you only have positive values left. From what I am reading, it doesn’t directly change the weights, but it plays a role in how the model learns: ReLU decides which activations get passed on, and that works in tandem with backpropagation. Let me know if I am wrong, but let’s move on.
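Here is a tiny sketch of what those activations actually do to a few numbers (the input values are made up for illustration):

import tensorflow as tf

x = tf.constant([-2.0, -0.5, 0.0, 0.5, 2.0])

relu = tf.keras.layers.ReLU()
leaky = tf.keras.layers.LeakyReLU()            # alpha = 0.3 by default
leaky_small = tf.keras.layers.LeakyReLU(0.01)  # a much smaller leak, for comparison

print(relu(x).numpy())         # [ 0.    0.    0.    0.5   2.  ]  negatives become zero
print(leaky(x).numpy())        # [-0.6  -0.15  0.    0.5   2.  ]  negatives shrunk by 0.3
print(leaky_small(x).numpy())  # [-0.02 -0.005 0.    0.5   2.  ]  negatives shrunk by 0.01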

Batch Normalization

Batch normalization is what I thought it was: it basically makes training faster and more stable. I will give a simpler example and then the more complex answer. If you have any experience with machine learning basics, you will most likely have come across StandardScaler in Python; the whole job of that scaling method is to make the mean 0 and the standard deviation 1. This is simple to think about and harder to do mathematically, but for now we will leave the math alone, kind of. Batch normalization does the same thing, just per batch inside the network. This helps for many, many reasons. For example, the distribution of a layer’s inputs can keep changing during training (it would look weird if you plotted it as a histogram over time), and that makes training confusing and complex; batch normalization prevents that. It also makes training faster, which makes sense: small, centered numbers keep the optimization better behaved. It also helps with regularization, which can be thought of as limiting the wildness of the model’s predictions.

The mean and variance of each batch are calculated using the following equations, and then, using the mean and variance, you can normalize each individual value.

Mean:

\mu_B = \frac{1}{m} \sum_{i=1}^{m} x_i

You take the sum of the x values and divide by m, the number of examples in the batch (the batch normalization paper uses m instead of n for the batch size). This is just a (very) fancy way of writing an average.

Variance:

\sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2

This is the average of the squared distance between each x and the batch mean; you can see that in place of x_i you now have x_i minus the average of the whole batch (the batch is the B). Strictly speaking this is the variance; the standard deviation is its square root. Makes sense? Then you have the equation that normalizes each individual x value using the mean and variance.

Standardization (normalization) equation:

\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}

You can see the mean and the square root of the variance being used, plus this ε thingy. What is that, you ask? It is a small constant added to avoid dividing by zero; it’s smart to have it when the variance is very low, because you always need a positive denominator. How big is this epsilon (that’s the Greek letter)? Very small: the Keras default is 0.001, and other implementations commonly use values around 10^-5.

After you do this, there is a scaling and shifting step. Forcing everything to have a mean of 0 and a standard deviation of 1 changes the distribution drastically, and to keep that from limiting what the layer can represent, you rescale it and shift it by learnable amounts, which lets the network recover the original distribution if that happens to be what works best.

Scale and shift:

y_i = \gamma \hat{x}_i + \beta

It kind of looks like a linear equation, lol.

You can see gamma being multiplied by x-hat (which is the new, normalized value), and then beta adds some amount back to it. This stuff just gives the neural network more flexibility and makes it so that it can (if needed) undo the normalization. While it is learning, these gammas and betas are updated; you can think of them as weights for the specific task of normalization. Gamma scales it and beta shifts it. That was a longer explanation than I thought I would need, but now let us move on to Conv2DTranspose, which is what I am most excited for.
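Here is a minimal sketch of that whole recipe on a tiny made-up batch of four numbers (ignoring the moving averages a real BatchNormalization layer also keeps for inference):

import tensorflow as tf

x = tf.constant([1.0, 2.0, 3.0, 4.0])        # a tiny made-up "batch"

mean = tf.reduce_mean(x)                      # mu_B = 2.5
var = tf.reduce_mean(tf.square(x - mean))     # sigma_B^2 = 1.25
eps = 1e-3                                    # the Keras default epsilon
x_hat = (x - mean) / tf.sqrt(var + eps)       # normalize

gamma, beta = 1.0, 0.0                        # learnable scale and shift (their starting values)
y = gamma * x_hat + beta

print(x_hat.numpy())  # roughly [-1.34, -0.45, 0.45, 1.34]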

Conv2DTranspose

A nice new term I have learned while looking up this Conv2DTranspose layer (I will just call it the transposed conv layer because the full name is too much to type) is “upsampling”. I think of upsampling as sampling (extracting information) but in reverse: it doesn’t extract information by making the image smaller (that’s what a normal Conv2D layer does). I wouldn’t even say this extracts information; I would say it makes more information (and then of course the next layer takes in more information). It spatially expands the input (meaning its height and width grow) and then convolves over it with a learnable set of parameters (parameters that learn) so that it can generate more accurate images later on.

Stride is the increment the filter moves by. What I want you to imagine is a picture; a picture has pixels, and pixels are squares (obviously). The filter is square too, and what it does is go over each pixel (you can set the filter size so it covers more squares at once) and extract information. In a normal Conv2D layer, this extracted information is what makes up the next layer’s input, and the next layer does the same thing. The next layer, though, is going over a smaller, coarser image, since it has fewer pixels (the filter extracts the mathematically “most important” information from the pixels it is filtering over). Does that make sense?

Padding. I will try to explain this simply too. You can think of a pixel as a number, and the filter has to start somewhere, right? Intuitively the filter starts on the pixel at the first index (i = 0, because programming is like that). But what if your filter (whose size you can set) is 2x2? You would need to add padding around the image so that you can start at the first pixel and continue over the rest without the window running off the edge. Padding adds a border of zeros around the image so that the filter can sit over the first true pixel and still have numbers to fill its whole window (I believe that padding='same' means zeros all around). Imagine a picture with a frame, and the frame is made of zeros; then imagine someone wants to start at the very first spot of the actual picture. The frame gives the filter some safe space to operate in. I think you get it now; if you don’t, let me know by commenting!!

In a transposed convolution, the stride is what increases the spatial dimensions, and the padding ('same') makes sure the output size is exactly the input size times the stride, with zeros filling in the gaps. That is why, when you look at the code (where it checks the shape using assert), you can see that when the stride is 1 the image dimensions stay the same (we are only changing the depth, for example from 256 to 128, to whatever the layer specifies), and when the stride is 2 the dimensions double, going from 7 to 14. The (5,5) is the filter size, and that’s what is going over the image.
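A quick sketch to confirm that shape rule (the layers here are stand-ins with the same settings as the generator’s, applied to a made-up feature map):

import tensorflow as tf

x = tf.random.normal([1, 7, 7, 256])   # a made-up feature map shaped like the generator's

same_size = tf.keras.layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same')
doubled = tf.keras.layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same')

print(same_size(x).shape)            # (1, 7, 7, 128)   stride 1: size stays, depth changes
print(doubled(same_size(x)).shape)   # (1, 14, 14, 64)  stride 2: 7 doubles to 14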

I hope all of this makes sense. I didn’t do the best job explaining it, but I think you can kind of piece together what the code is doing; if you can’t, comment or email me. P.S. use_bias=False just means these layers skip the bias term entirely, which is fine here because the BatchNormalization right after has its own learnable shift (beta) that does the same job.

Discriminator Model Time

Before we begin, the TensorFlow tutorial shows an example of generating an image from random noise. The noise is generated from a distribution of random numbers: tf.random.normal draws from a Gaussian distribution with mean 0 and standard deviation 1 by default.

generator = generator_model()

noise = tf.random.normal([1, 100])
generated_image = generator(noise, training=False)

plt.imshow(generated_image[0, :, :, 0], cmap='gray')

Here we can see everything I have explained in action. The generator model is upsampling the noise, and this noise is coming from, guess what, a normal (Gaussian) distribution. For those of you who live under a rock or think this is boring, the Gaussian is the bell curve. Training is set to False, and you can see the output image is very, very bad, but if you train it, it will get better. For that we need the discriminator to tell it how it is doing.

Random Noise Image

Now lets code the feedback model.

def discriminator_model():
    model = tf.keras.Sequential()

    # Downsample the 28x28x1 image with strided convolutions
    model.add(layers.Conv2D(64, (5, 5), strides=(2, 2), padding='same', input_shape=[28, 28, 1]))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))

    model.add(layers.Conv2D(128, (5, 5), strides=(2, 2), padding='same'))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))

    # Flatten to a vector and output a single real-vs-fake logit
    model.add(layers.Flatten())
    model.add(layers.Dense(1))

    return model

The discriminator model is simply a model that decides between yes and no, so the output dimension is obviously 1. The Flatten layer turns the 2D feature maps into a one-dimensional vector so the final Dense layer can take them without any random shape errors. The rest of the layers kind of do the opposite of the generator model: they downsample the image to figure out whether it is real or fake. We can even test this out by plugging the generated image into the discriminator model and seeing what it decides.

discriminator = discriminator_model()
decision = discriminator(generated_image)
print(decision)

Its output: tf.Tensor([[-0.00393217]], shape=(1, 1), dtype=float32)

So you can see the output is a single number close to zero. This number is a raw logit, not a probability yet; a logit near zero corresponds to a probability near 0.5, so the discriminator really can’t decide between fake and real. That makes sense, because it hasn’t been trained on anything yet. We need to put this in a training loop and see how it performs when actually given the task of classifying real vs. fake. Before we do that, we need to define the loss functions and the optimizers.
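If you want to see that logit as a probability (this is just an illustrative check; the training code never does this explicitly because the loss handles it):

# The Dense(1) output is a raw logit; squashing it through a sigmoid gives a probability.
# A logit near 0 lands near 0.5, i.e. "no idea".
prob = tf.sigmoid(decision)
print(prob.numpy())  # roughly [[0.499]] for the untrained discriminator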

Loss Functions + Optimizers

We will first define our loss function. Since we have a binary problem, it makes sense that our loss is BinaryCrossentropy (with from_logits=True, since the discriminator outputs raw logits). Here is the code:

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_output, fake_output):
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    total_loss = real_loss + fake_loss
    return total_loss

Here, real_output is the discriminator’s logits for a batch of real images and fake_output is its logits for a batch of generated images. We encourage the real outputs to be classified as real by comparing them against a tensor full of ones (tf.ones_like builds a matrix of ones with the same shape), and we punish the fake outputs by comparing them against a tensor full of zeros. Then we add the two, and that’s the total loss.
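A tiny illustration of how those targets get built (the logit values here are made up):

fake_logits = tf.constant([[-2.0], [0.5]])                # made-up discriminator outputs

print(tf.ones_like(fake_logits).numpy())    # [[1.], [1.]]  the "real" targets
print(tf.zeros_like(fake_logits).numpy())   # [[0.], [0.]]  the "fake" targets

# Scalar loss for treating these logits as generated ("fake") images:
print(cross_entropy(tf.zeros_like(fake_logits), fake_logits).numpy())  # about 0.55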

Then the generator model loss:

def generator_loss(fake_output):
    return cross_entropy(tf.ones_like(fake_output), fake_output)

We want the fake output to be classified as real, so we encourage it to be closer to one by comparing it against a tensor of ones the same shape as fake_output. This is encouraging deceptiveness: the closer the discriminator’s outputs on fake images are to one, the lower the generator’s loss, and the better the images. Makes sense? Now we define our optimizers (Adam for both models) and then we train in a loop.

generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)

Training

So we define our important training variables and then we have our training loop. The training loop is doing what any training loop would do, just with slightly unusual syntax. We will be done after this. The end of this code doesn’t work for some reason, but I think that has something to do with the image shape (that’s what the error says), and the original notebook has checkpoints and image-saving helpers that I don’t need or want to include here. I understand the basic and the complex concepts, so we will wrap up after this so the article doesn’t get too long.

EPOCHS = 50        # number of passes over the dataset (the tutorial uses 50)
noise_dim = 100    # must match the generator's input_shape=(100,)

@tf.function
def train_step(images):
    noise = tf.random.normal([BATCH_SIZE, noise_dim])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)

        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))
This is where the learning happens: the GradientTape records the forward pass, then we compute the gradients of each loss with respect to each model’s trainable variables (the weights), and the optimizers apply them, so both models get updated from the same batch.

Then we just train (this is where the error comes in)

def train(dataset, epochs):
    for epoch in range(epochs):
        start = time.time()

        for image_batch in dataset:
            train_step(image_batch)


train(train_dataset, EPOCHS)

And in the notebook (which I followed pretty much to a tee), the generator can generate numbers pretty well. And with that, this project is done. I don’t want to make this article too long, so I will finish quickly.

Conclusion

GANs are fascinating. I learned a lot, and I now know the process for making a GAN. You make a TensorFlow dataset with the real images, then you use a generator model to turn random noise into fake images, and then you pit those fake images against the real images in a training loop with a discriminator. Inside the loop you calculate the losses and update the weights so that the generator can make better images.

My questions:

  • What do you need to do for more complex images?
  • Who had the time to discover the math behind this?
  • Can you do it with audio and other things?

My answer to the first question is that you would probably need to upsample and downsample even more (the dimensions would be a lot bigger). For the second question, that would be Ian Goodfellow and his collaborators, who introduced GANs in 2014. And for the third question, yes, I think you can do this with any type of data; you can break down audio with Conv1D, so it would make sense that you could build it back up the same way. Anyways, thank you for reading. If anything is wrong, please comment, and if you feel like it, share it and follow me for more. Thanks!!!!
