GAN (Python)

Aysen Çeliktaş · Published in Become Better · May 5, 2024

GAN (Generative Adversarial Network) is growing in popularity day by day. An overview of how this architecture is used in medical imaging, alongside applications such as data augmentation, image reconstruction, and voice conversion, can be found here. Also, read here for the general concepts before diving into the deep learning pool.

[created by the author in Canva]

GAN is an artificial intelligence algorithm belonging to the unsupervised machine learning class. It was introduced in 2014 by Ian Goodfellow and his colleagues [1]. It is built on two networks called the generator and the discriminator.

· Generator: the part of the network where synthetic data is produced. Like an artist, it creates meaningful data out of noise, steered toward what is desired.

· Discriminator: like a critic, it evaluates the images coming from the generator and judges how close they are to the real data. The closer the generator's output gets to the real data, the more successful the system is.

The system computes a loss from how similar the generator's output is to the real data, and this loss is fed back so that the model moves closer to the desired result. At each iteration, the discriminator is updated to push the value it assigns to real images toward 1 and the value it assigns to fake images toward 0. Conversely, as the probability the discriminator assigns to fake images being real approaches 1, the generator's loss approaches 0. The closer the generator's output comes to reality, the weaker the discriminator's ability to separate fake from real images becomes.
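To make the label convention concrete, here is a minimal NumPy sketch (the probability values are illustrative, not from the original) of how the two losses pull in opposite directions:

import numpy as np

def bce(y_true, y_pred, eps=1e-7):
    # binary cross-entropy, the loss used for both networks in this article
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# d_out_real / d_out_fake: discriminator probabilities (illustrative values)
d_out_real = np.array([0.9, 0.8])   # the discriminator pushes these toward 1
d_out_fake = np.array([0.2, 0.1])   # ... and these toward 0

# discriminator loss: real images labeled 1, fake images labeled 0
d_loss = bce(np.ones(2), d_out_real) + bce(np.zeros(2), d_out_fake)

# generator loss: it wants the discriminator to output 1 for its fakes,
# so from the generator's point of view fakes are labeled 1
g_loss = bce(np.ones(2), d_out_fake)

print(d_loss, g_loss)  # as d_out_fake approaches 1, g_loss approaches 0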

[created by the author in Canva]

One of the concepts that needs technical attention here is the tensor. To perform operations between (dense) layers in deep learning, the data must be expressed as tensors. Chollet explains the concepts needed at the beginner level very well in his book “Deep Learning with Python” [2]. The layers act like filters that successively distill the data, with various methods applied at each filtering step, until the desired output emerges. Tensors, in turn, are the representation of data suitable for deep learning; they can be thought of as a kind of data container. Grouped by their number of dimensions:

· scalar = 0D tensor

· vector (array of numbers) = 1D tensor

· matrix (array of vectors) = 2D tensor

· array of matrices = 3D tensor

If you want to examine the basic properties of a tensor, you can check its rank (i.e. the number of axes) with “ndim”, its shape, such as the (x, y, z) extents of a 3-dimensional tensor, with “shape”, and its data type, such as uint8 or float32, with “dtype”.
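For example, with a NumPy array (TensorFlow tensors expose the same attributes):

import numpy as np

x = np.zeros((3, 4, 5), dtype=np.float32)  # a 3D tensor
print(x.ndim)   # 3 -> number of axes (the tensor's rank)
print(x.shape)  # (3, 4, 5)
print(x.dtype)  # float32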

When running deep learning in the Python environment, TensorFlow, one of Google's core libraries, can be used. TensorFlow places the color depth last in the dimension order. As an example, suppose the image data consists of 128 grayscale images of size 256*256. Since these images are grayscale, the color depth is 1; if the images were RGB, a color depth of 3 would be used instead. Expressing the shape of the grayscale batch as (samples, height, width, channels) gives (128, 256, 256, 1) [2].

[created by the author in Canva]

So how would tensors work for video data? If a video is treated as a sequence of numerically expressed images and each image is taken as a frame, we end up with a 5-dimensional tensor, expressed as (samples, frames, height, width, channels).
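A quick NumPy check of these shapes (the sizes are illustrative):

import numpy as np

# 128 grayscale 256x256 images: (samples, height, width, channels)
gray_batch = np.zeros((128, 256, 256, 1), dtype=np.uint8)

# 4 video clips of 60 RGB frames each: (samples, frames, height, width, channels)
videos = np.zeros((4, 60, 144, 256, 3), dtype=np.uint8)

print(gray_batch.ndim, gray_batch.shape)  # 4 (128, 256, 256, 1)
print(videos.ndim, videos.shape)          # 5 (4, 60, 144, 256, 3)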

In the rest of the article, a DCGAN (Deep Convolutional Generative Adversarial Network) application in the Python environment is presented. Also, if you want to investigate generative networks, do not stop at GANs; look at other generative model families such as the VAE (Variational AutoEncoder) [2, p. 319].

Python

As we always emphasize, we first need to understand the data and the problem in order to approach the solution correctly. Here we will use the fashion_MNIST dataset, which contains 60,000 training images and 10,000 test images, each 28*28 (784 pixels). It has 10 classes: t-shirt, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot. The pixel values of each image range from 0 to 255. Using the pandas library, the pixel values can be inspected on a class basis; the .csv file can be accessed here. After reading the data, we look at the head and then observe the distribution of a single image's values in a histogram.

import pandas as pd
import matplotlib.pyplot as plt

train_data = pd.read_csv('/content/drive/MyDrive/GAN/fashion-mnist_train.csv')
train_data.head()
[from author’s notebook]
image_data = train_data.iloc[1]  # one row: the class label followed by 784 pixel values

if image_data is not None:
    data_array = image_data.to_numpy()
    plt.hist(data_array)
    plt.xlabel("Value")
    plt.ylabel("Frequency")
    plt.title("Histogram of Row Data")
    plt.show()
else:
    print("Error: Row not found.")
[from author’s notebook]

Keras was used to load the dataset. Since the pixel values lie in the range (0, 255), they were normalized to the range (-1, 1), matching the tanh output of the generator described below. The number of samples, the height*width measurements, and the overall shape of the data were examined via 'shape'.

import numpy as np
from tensorflow.keras.datasets import fashion_mnist

(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
print(x_train.max(), x_train.min())
[from author’s notebook]
x_train = (x_train.astype(np.float32) - 127.5) / 127.5  # pixel values 0-255 mapped to (-1, 1)
print(x_train.max(), x_train.min())
[from author’s notebook]
print("Eğitim Verisi Şekli:", x_train.shape, "Test Verisi Şekli:", x_test.shape)
[from author’s notebook]
plt.figure(figsize=(8,8))
for i in range(25):
    plt.subplot(5, 5, i+1)  # 5x5 grid for the 25 samples
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i], cmap=plt.cm.binary)

plt.show()
[from author’s notebook]

The shape of our data is (60000, 28, 28) for the training set and (10000, 28, 28) for the test set. At the input layer, where deep learning starts, the data must be flattened before being fed into this architecture. This provides the dimensional compatibility the dense layers need and makes the data easier to process, whether it arrives as a 2D, 3D, or 4D tensor. Flattening keeps every pixel value but discards the 2-D spatial arrangement, so some structural information is lost; on the other hand, it keeps this dense architecture simple and easy to train.

x_train = x_train.reshape(x_train.shape[0], x_train.shape[1]*x_train.shape[2])  # (60000, 28, 28) -> (60000, 784)
print(x_train.shape)
[from author’s notebook]

Here, a simple architecture was created. For better results, the layers can be deepened and updated with different functions. Let's build the GAN architecture by defining the generator and the discriminator in turn.

· Using tanh instead of sigmoid for the final activation of the generator is a well-known trick in GANs. It keeps the output in the (-1, 1) range instead of (0, 1); besides producing a more symmetric output, it can also make training more stable.

· A Gaussian distribution can be used when sampling points from the latent space; the noise fed into the generator is drawn this way.

· LeakyReLU can be used instead of ReLU as the activation function in the layers. It provides a small slope for negative inputs, which reduces the danger of gradients shrinking or vanishing toward the deeper layers, and this widened activation adds flexibility to the model. (All three tips are sketched in the code right after this list.)
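As a minimal sketch of these three tips in Keras (this is not the architecture used below; the layer sizes and the alpha value are illustrative choices):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LeakyReLU

# hypothetical generator variant: LeakyReLU activations and a tanh output
alt_generator = Sequential([
    Dense(512, input_dim=100),
    LeakyReLU(alpha=0.2),           # small slope for negative inputs
    Dense(784, activation="tanh"),  # output in (-1, 1) instead of (0, 1)
])

# sample the latent space from a Gaussian distribution
noise = np.random.normal(0, 1, (16, 100))
fake = alt_generator.predict(noise)
print(fake.shape)  # (16, 784)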

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, ReLU, Dropout
from tensorflow.keras.optimizers import Adam

def create_generator():

    generator = Sequential()
    generator.add(Dense(units=512, input_dim=100))  # latent vector of size 100
    generator.add(ReLU())

    generator.add(Dense(units=512))
    generator.add(ReLU())

    generator.add(Dense(units=1024))
    generator.add(ReLU())

    generator.add(Dense(units=784, activation="tanh"))  # flattened 28*28 output in (-1, 1)

    generator.compile(loss="binary_crossentropy",
                      optimizer=Adam(learning_rate=0.0001, beta_1=0.5))

    return generator

g = create_generator()
g.summary()
[from author’s notebook]
def create_discriminator():

    discriminator = Sequential()
    discriminator.add(Dense(units=1024, input_dim=784))  # flattened 28*28 image as input
    discriminator.add(ReLU())
    discriminator.add(Dropout(0.4))

    discriminator.add(Dense(units=512))
    discriminator.add(ReLU())
    discriminator.add(Dropout(0.4))

    discriminator.add(Dense(units=256))
    discriminator.add(ReLU())

    discriminator.add(Dense(units=1, activation="sigmoid"))  # probability that the input is real

    discriminator.compile(loss="binary_crossentropy",
                          optimizer=Adam(learning_rate=0.0001, beta_1=0.5))

    return discriminator

d = create_discriminator()
d.summary()
[from author’s notebook]
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input

def create_gan(discriminator, generator):

    # freeze the discriminator while the combined model trains the generator
    discriminator.trainable = False
    gan_input = Input(shape=(100,))

    x = generator(gan_input)
    gan_output = discriminator(x)

    gan = Model(inputs=gan_input, outputs=gan_output)
    gan.compile(loss="binary_crossentropy", optimizer="adam")

    return gan

gan = create_gan(d,g)
gan.summary()
[from author’s notebook]

In deeply stacked layers, the key issue is how well gradients propagate backward. The following choices help:

· More suitable activation functions for the layers

· More suitable weight-initialization schemes

· Better optimization methods, such as RMSProp and Adam
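As a hedged sketch (the layer sizes, the alpha value, and the learning rate are illustrative, not from the original), all three remedies can appear in a single model definition:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LeakyReLU
from tensorflow.keras.optimizers import RMSprop

model = Sequential([
    Dense(512, input_dim=784,
          kernel_initializer="he_normal"),  # initialization suited to ReLU-family activations
    LeakyReLU(alpha=0.2),                   # keeps gradients alive for negative inputs
    Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy",
              optimizer=RMSprop(learning_rate=0.0001))  # RMSProp here; Adam is used in this article

With the generator, discriminator, and combined model defined above, the training loop below alternates updates between the discriminator and the GAN.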

epochs = 50       # example values, not from the original notebook
batch_size = 128

for e in range(epochs):
    for _ in range(batch_size):  # update steps per epoch, as written in the notebook

        # sample Gaussian noise and generate fake images from it
        noise = np.random.normal(0, 1, [batch_size, 100])
        generated_images = g.predict(noise)

        # draw a random batch of real images
        image_batch = x_train[np.random.randint(low=0, high=x_train.shape[0], size=batch_size)]

        # train the discriminator: real images labeled 1, fakes labeled 0
        x = np.concatenate([image_batch, generated_images])
        y_dis = np.zeros(batch_size*2)
        y_dis[:batch_size] = 1

        d.trainable = True
        d.train_on_batch(x, y_dis)

        # train the generator through the frozen discriminator:
        # its fakes are labeled 1 so it is rewarded for fooling the critic
        noise = np.random.normal(0, 1, [batch_size, 100])
        y_gen = np.ones(batch_size)

        d.trainable = False
        gan.train_on_batch(noise, y_gen)

    print("epochs:", e)

Now, after saving the model to a Drive path, the results can be viewed. After this simple application, you can obtain more accurate results by developing your own architecture: deepen the layers, increase the epoch count, and retrain the model with different functions.

gan.save('/content/drive/MyDrive/...')  # model variable assumed; the original leaves it blank

fig, axe = plt.subplots(5, 5)  # grid size assumed; the original leaves it blank
fig.suptitle("Actual Images")
idx = 0
for i in range(5):
    for j in range(5):
        axe[i, j].imshow(x_train[idx].reshape(28,28), cmap='gray')
        idx += 10
[from author’s notebook]
plt.imshow(noise, cmap='gray')
plt.title('How the noise looks')
[from author’s notebook]
fig, axe = plt.subplots(5, 5)  # grid size assumed; the original leaves it blank
fig.suptitle('Generated Images from Noise using DCGANs')
idx = 0
for i in range(5):
    for j in range(5):
        axe[i, j].imshow(generated_images[idx].reshape(28,28), cmap='gray')
        idx += 3
[from author’s notebook]

References

[1] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, et al., "Generative Adversarial Nets," Advances in Neural Information Processing Systems 27 (2014).

[2] F. Chollet, Deep Learning with Python, 2nd ed., Buzdağı Publishing, 2021.
