Train a NN with Keras to fit the Predator-Prey cycle using a GAN architecture.

Sahar Millis
Oct 20, 2020 · 9 min read


In this post I'll train a neural network to fit a Predator-Prey cycle dataset using a GAN architecture (discriminator & generator), running on the GPU.

Let’s make it quick and to the point. You can find all the core code here and the final code here (with a lot of improvements).

GAN Architecture

A generative adversarial network is a class of machine learning frameworks designed by Ian Goodfellow and his colleagues in 2014. Two neural networks contest with each other in a game. Given a training set, this technique learns to generate new data with the same statistics as the training set.
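In the classic formulation, the generator G maps a noise vector z to a sample and the discriminator D tries to tell real samples from generated ones; the original paper frames this as the minimax game min_G max_D E_x[log D(x)] + E_z[log(1 − D(G(z)))]. The rest of this post builds exactly these two players in Keras.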

Imports

import os
import random

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from tqdm.notebook import tqdm # from tqdm import tqdm_notebook as tqdm

import tensorflow as tf
from tensorflow.keras.models import Model,Sequential
from tensorflow.keras.backend import sin,mean
from tensorflow.keras.layers import Input, Reshape, LeakyReLU, Activation, Dropout, Flatten
from tensorflow.keras.layers import Dense, BatchNormalization, Conv1D

from tensorflow.keras.optimizers import Adam, SGD, RMSprop
from tensorflow.keras.callbacks import TensorBoard

Hyper-Parameters

b = 1        # prey birth rate
h = 0.005    # predation (hunting) rate
ϵ = 0.8      # fraction of kills that produce a new predator
d = 0.6      # predator death rate

steps = 50000              # number of simulation steps
XY = np.empty((2, steps))
XY[:,0] = 50, 100          # initial prey / predator populations
dt = 0.001                 # time resolution of the output grid

Dataset (you can skip)

This time I’ll create the cycle dataset using my own generative functions that simulate the P-P cycle in nature.

def get_rates(x, y, b, h, ϵ, d):
    return np.array([
        b*x,          # prey born
        (1-ϵ)*h*x*y,  # prey killed, no predator born
        ϵ*h*x*y,      # prey killed, predator born
        d*y,          # predator killed
    ])



def draw_time(rates):
    # time to the next reaction ~ Exponential(total rate)
    # assert rates.sum() > 0, rates
    return np.random.exponential(1/rates.sum())

def draw_reaction(rates):
    # pick which reaction fires, proportionally to its rate
    rates /= rates.sum()
    return np.random.multinomial(1, rates).argmax()

updates = np.array([
[ 1, 0], # prey born
[-1, 0], # prey killed, no predator born
[-1, 1], # prey killed, predator born
[ 0,-1], # predator killed
])


def gillespie_step(x, y, b, h, ϵ, d):
    rates = get_rates(x, y, b, h, ϵ, d)
    Δt = draw_time(rates)
    ri = draw_reaction(rates)
    Δx, Δy = updates[ri]
    return Δt, Δx, Δy


def gillespie_ssa(b, h, ϵ, d, times_, t0=0, x0=50, y0=100,
                  t_steps=steps, tmax=steps*dt, padding=False):
    xx = np.full(steps, 0, dtype='int')
    yy = np.full(steps, 0, dtype='int')

    t = 0
    i = 0
    xx[0] = x0
    yy[0] = y0

    while t < tmax and yy[i] != 0:
        if times_[i] <= t:
            # advance to the next point on the output time grid
            i += 1
            xx[i] = xx[i-1]
            yy[i] = yy[i-1]
        else:
            # update: draw the next reaction and apply it
            Δt, Δx, Δy = gillespie_step(xx[i], yy[i], b, h, ϵ, d)
            t = t + Δt
            xx[i] = xx[i] + Δx
            yy[i] = yy[i] + Δy

    if i <= steps and not padding:
        return np.array([times_[:i+1], xx[:i+1], yy[:i+1]])
    else:
        return np.array([times_, xx, yy])

times = np.linspace(0, steps*dt, steps)
t,x,y = gillespie_ssa(b, h, ϵ, d,times_=times)


x_max,y_max = x.max(),y.max()
x_,y_ = x / x_max, y / y_max

# Sample function
def sample_data_pp(num=1000):
    indices = np.sort(np.random.choice(len(x_), num, replace=False))
    return np.array([np.array(x_[indices]), np.array(y_[indices])])

Let’s check the sampling function for both Predator and Prey:

ax = pd.DataFrame(np.transpose(sample_data_pp())).plot()

Let me focus only on the Prey cycle:

t = sample_data_pp(num=1000)
plt.plot(np.transpose(t[0]));

Now you can start with the actual GANs code…

Generator

def create_G(num=100):
    G_in = Input(shape=(15,))
    x = Reshape((-1,1))(G_in)
    x = Conv1D(2,3,activation=LeakyReLU())(x)
    x = BatchNormalization()(x)
    x = Dropout(0.5)(x)
    x = Conv1D(4,3,activation=LeakyReLU())(x)
    x = BatchNormalization()(x)
    x = Dropout(0.5)(x)
    x = Conv1D(8,3,activation=LeakyReLU())(x)
    x = BatchNormalization()(x)
    x = Dropout(0.5)(x)
    x = Conv1D(16,3,activation=LeakyReLU())(x)
    x = BatchNormalization()(x)
    x = Dropout(0.5)(x)
    x = Conv1D(32,3,activation=LeakyReLU())(x)
    x = BatchNormalization()(x)
    x = Dropout(0.5)(x)
    x = Conv1D(64,3,activation=LeakyReLU())(x)
    x = BatchNormalization()(x)
    x = Dropout(0.5)(x)
    x = Conv1D(100,3,activation='tanh')(x)
    x = Flatten()(x)
    G = Model(G_in,x,name='Generator')
    G.compile(loss='binary_crossentropy',optimizer=Adam(learning_rate=0.001))
    return G

G = create_G()
G.summary()

Model: "Generator"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 15)] 0
_________________________________________________________________
reshape (Reshape) (None, 15, 1) 0
_________________________________________________________________
conv1d (Conv1D) (None, 13, 2) 8
_________________________________________________________________
batch_normalization (BatchNo (None, 13, 2) 8
_________________________________________________________________
dropout (Dropout) (None, 13, 2) 0
_________________________________________________________________
conv1d_1 (Conv1D) (None, 11, 4) 28
_________________________________________________________________
batch_normalization_1 (Batch (None, 11, 4) 16
_________________________________________________________________
dropout_1 (Dropout) (None, 11, 4) 0
_________________________________________________________________
conv1d_2 (Conv1D) (None, 9, 8) 104
_________________________________________________________________
batch_normalization_2 (Batch (None, 9, 8) 32
_________________________________________________________________
dropout_2 (Dropout) (None, 9, 8) 0
_________________________________________________________________
conv1d_3 (Conv1D) (None, 7, 16) 400
_________________________________________________________________
batch_normalization_3 (Batch (None, 7, 16) 64
_________________________________________________________________
dropout_3 (Dropout) (None, 7, 16) 0
_________________________________________________________________
conv1d_4 (Conv1D) (None, 5, 32) 1568
_________________________________________________________________
batch_normalization_4 (Batch (None, 5, 32) 128
_________________________________________________________________
dropout_4 (Dropout) (None, 5, 32) 0
_________________________________________________________________
conv1d_5 (Conv1D) (None, 3, 64) 6208
_________________________________________________________________
batch_normalization_5 (Batch (None, 3, 64) 256
_________________________________________________________________
dropout_5 (Dropout) (None, 3, 64) 0
_________________________________________________________________
conv1d_6 (Conv1D) (None, 1, 100) 19300
_________________________________________________________________
flatten (Flatten) (None, 100) 0
=================================================================
Total params: 28,120
Trainable params: 27,868
Non-trainable params: 252

I like to add the InputLayer and print the summary so I can quickly check that everything is OK up to this point.

Discriminator

def create_D(num=100):
    D_in = Input(shape=(num,))
    x = Reshape((-1,1))(D_in)
    x = Conv1D(num//2,3,activation=LeakyReLU())(x)   # integer division so the filter count is an int
    x = Conv1D(num//10,3,activation=LeakyReLU())(x)
    x = Conv1D(num//50,3,activation=LeakyReLU())(x)
    x = Flatten()(x)
    x = Dense(128,activation=LeakyReLU())(x)
    x = Dropout(0.5)(x)
    x = Dense(64,activation=LeakyReLU())(x)
    x = Dropout(0.5)(x)
    x = Dense(32,activation=LeakyReLU())(x)
    x = Dropout(0.5)(x)
    x = Dense(8,activation=LeakyReLU())(x)
    x = Dense(2,activation='sigmoid')(x)
    D = Model(D_in,x,name='Discriminator')
    D.compile(loss='binary_crossentropy',optimizer=RMSprop(learning_rate=0.003))
    return D

D = create_D()
D.summary()

Model: "Discriminator"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 100)] 0
_________________________________________________________________
reshape_1 (Reshape) (None, 100, 1) 0
_________________________________________________________________
conv1d_7 (Conv1D) (None, 98, 50) 200
_________________________________________________________________
conv1d_8 (Conv1D) (None, 96, 10) 1510
_________________________________________________________________
conv1d_9 (Conv1D) (None, 94, 2) 62
_________________________________________________________________
flatten_1 (Flatten) (None, 188) 0
_________________________________________________________________
dense (Dense) (None, 128) 24192
_________________________________________________________________
dropout_6 (Dropout) (None, 128) 0
_________________________________________________________________
dense_1 (Dense) (None, 64) 8256
_________________________________________________________________
dropout_7 (Dropout) (None, 64) 0
_________________________________________________________________
dense_2 (Dense) (None, 32) 2080
_________________________________________________________________
dropout_8 (Dropout) (None, 32) 0
_________________________________________________________________
dense_3 (Dense) (None, 8) 264
_________________________________________________________________
dense_4 (Dense) (None, 2) 18
=================================================================
Total params: 36,582
Trainable params: 36,582
Non-trainable params: 0

Same here … Just a bit less complex.

GAN

Here you can see an example of how different Keras and PyTorch are: the concept is the same, but it almost feels like a completely different architecture.

def create_GAN(G,D,num=100):
    D.trainable = False
    GAN_in = Input(shape=(15,))
    x = G(GAN_in)
    x = D(x)
    model = Model(GAN_in,x,name='GAN')
    model.compile(loss='binary_crossentropy', optimizer=G.optimizer)
    return model

GAN = create_GAN(G,D)
GAN.summary()

Model: "GAN"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, 15)] 0
_________________________________________________________________
Generator (Functional) (None, 100) 28120
_________________________________________________________________
Discriminator (Functional) (None, 2) 36582
=================================================================
Total params: 64,702
Trainable params: 27,868
Non-trainable params: 36,834

Sanity Check:

Due to the complexity of the model, let me make sure everything is working as expected:

data_temp = sample_data_pp(num=15)
print(data_temp.shape,G(data_temp).shape,D(G(data_temp)).shape,GAN(data_temp).shape)

(2, 15) (2, 100) (2, 2) (2, 2)

Training

noise_size = 15

# set the discriminator's layers on/off for training
def trainable_D(D,trainable):
    D.trainable = trainable
    for layer in D.layers:
        layer.trainable = trainable


def train_D(GAN,G,D,on_pray=True):
    # sample size (batch)
    batch_size = 15

    # set trainability
    trainable_D(D,True)

    # create noise
    noise = np.abs(np.random.randn(batch_size,noise_size))

    # fake
    x_fake = G.predict(noise)
    y_fake = np.zeros((batch_size,2))  # could use 0.1 instead of 0 (label smoothing) to help avoid mode collapse
    y_fake[:,0] = 1                    # could use 0.9 instead of 1
    loss_fake = D.train_on_batch(x_fake,y_fake)

    # real - PREY (on_pray=True uses the prey series, otherwise the predator series)
    if on_pray:
        x_real = np.array([sample_data_pp(num=100)[0] for i in range(batch_size)])
    else:
        x_real = np.array([sample_data_pp(num=100)[1] for i in range(batch_size)])

    y_real = np.zeros((batch_size,2))  # could use 0.1 instead of 0 (label smoothing) to help avoid mode collapse
    y_real[:,1] = 1                    # could use 0.9 instead of 1
    loss_real = D.train_on_batch(x_real,y_real)

    return loss_fake,loss_real


def train_G(GAN,G,D):
    # sample size (batch)
    batch_size = 15

    # set trainability
    G.trainable = True
    trainable_D(D,False)

    # create noise
    noise = np.abs(np.random.randn(batch_size,noise_size))

    # fake labeled as real, so G is pushed to fool D
    # x_fake = G.predict(noise)
    y_fake = np.zeros((batch_size,2))
    y_fake[:,1] = 1
    loss = GAN.train_on_batch(noise,y_fake)

    return loss

Another Sanity Check:

train_D(GAN,G,D), train_G(GAN,G,D)

((0.6930726766586304, 0.7439238429069519), 0.6872183680534363)

Ok, it works :))

Training GAN

def train(GAN, G, D, epochs=100, n_samples=100, verbose=True):
    d_loss_fake = []
    d_loss_real = []
    g_loss = []
    e_range = range(epochs)

    if verbose:
        e_range = tqdm(e_range)

    for epoch in e_range:
        loss_fake,loss_real = train_D(GAN,G,D)
        d_loss_fake.append(loss_fake)
        d_loss_real.append(loss_real)

        loss = train_G(GAN,G,D)
        g_loss.append(loss)

        if verbose and ((epoch+1)%(epochs//10)==0):
            print("Epoch #{}:\t Generative Loss: {}, \t Discriminative Loss fake: {}, Discriminative Loss real: {}".format(epoch+1, g_loss[-1], d_loss_fake[-1], d_loss_real[-1]))
    return g_loss,d_loss_fake,d_loss_real

g_loss,d_loss_fake,d_loss_real = train(GAN, G, D, verbose=True)


Epoch #10: Generative Loss: 2.36905574798584, Discriminative Loss fake: 0.015760576352477074, Discriminative Loss real: 0.0186001006513834
Epoch #20: Generative Loss: 13.932963371276855, Discriminative Loss fake: 0.003945493139326572, Discriminative Loss real: 1.4132818250800483e-05
Epoch #30: Generative Loss: 20.495332717895508, Discriminative Loss fake: 0.004247874021530151, Discriminative Loss real: 0.00016064960800576955
Epoch #40: Generative Loss: 18.873523712158203, Discriminative Loss fake: 0.0483543686568737, Discriminative Loss real: 0.001071239123120904
Epoch #50: Generative Loss: 16.03032875061035, Discriminative Loss fake: 0.017826693132519722, Discriminative Loss real: 0.08348666876554489
Epoch #60: Generative Loss: 6.583922386169434, Discriminative Loss fake: 0.04926323890686035, Discriminative Loss real: 0.15698513388633728
Epoch #70: Generative Loss: 3.1525802612304688, Discriminative Loss fake: 0.008090116083621979, Discriminative Loss real: 0.6023349165916443
Epoch #80: Generative Loss: 0.24656127393245697, Discriminative Loss fake: 0.010115372948348522, Discriminative Loss real: 7.273652590811253e-05
Epoch #90: Generative Loss: 2.347670555114746, Discriminative Loss fake: 0.004298840183764696, Discriminative Loss real: 0.000157357266289182
Epoch #100: Generative Loss: 1.5693128108978271, Discriminative Loss fake: 0.0020043672993779182, Discriminative Loss real: 0.10965212434530258

Here is a result from a Google Colab notebook run with 50K epochs.
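That figure isn't reproduced here, but you can plot the curves returned by train() yourself; a minimal sketch using the g_loss, d_loss_fake and d_loss_real lists from above:

# Plot the generator and discriminator losses collected by train()
plt.figure(figsize=(10,5))
plt.plot(g_loss, label='G loss')
plt.plot(d_loss_fake, label='D loss (fake)')
plt.plot(d_loss_real, label='D loss (real)')
plt.xlabel('epoch')
plt.ylabel('binary cross-entropy')
plt.legend()
plt.show()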

Generate

Now you can use the generator model to generate more data.

batch_size = 1

# create noise
noise = np.abs(np.random.randn(batch_size,noise_size))

# fake
x_fake = G.predict(noise)
y_fake = D.predict(x_fake)

# real
x_real = np.array([sample_data_pp(num=100)[0] for i in range(batch_size)])
y_real = D.predict(x_real)

plt.figure(figsize=(10,5))
plt.plot(np.transpose(x_fake*x_max),label='Fake')
plt.plot(np.transpose(x_real*x_max),label='Real')
plt.legend();
plt.show();

It still needs some work, but it's a pretty good fit :)

Further Work:

You should get familiar with these techniques; they come in handy when training GANs:

  • Feature matching
    Feature matching suggests optimizing the discriminator to check whether the generator's output matches the expected statistics of the real samples.
    In that scenario, the new loss function can be any statistic computed on the features, for example the mean or the median.
  • Historical Averaging
    This additional term penalizes the update when the parameters Θ change too dramatically over time.
    It's basically a moving average of the parameter change between steps i and i-1, taken with an L2 norm and used as a regularizer on the generator's loss.
    This will smooth out the noise and reduce the generator's variance.
  • Minibatch Discrimination
    With minibatch discrimination, the discriminator can digest the relationships between the training points in one batch, instead of processing each point independently.
    Within one minibatch, we approximate the closeness between every pair of samples and summarize each data point by how close it is to the other samples in the same batch.
  • Adding Noise
    The idea is to "spread out" the distributions and create a higher chance for the two probability distributions to overlap.
    For example, one solution is to add continuous noise to the real examples fed to the discriminator D.
  • G's L1
    Use L1 regularization (a weight penalty) on G's weights.
  • Gradient Computation Ratio
    Compute the gradients of G & D at different ratios.
    For example, for every training step of G, train D 5 times (see the sketch after this list).
  • Smoothing Target
    When feeding the discriminator, instead of providing hard 1 and 0 labels, use softened values such as 0.9 and 0.1 (see the sketch after this list).
    This has been shown to reduce the networks' vulnerability.
  • Label Representation
    Train the model on a different representation of the labels.
    For example, in this case, a Fourier transform of the signal.
  • Network Params
    Play with all the parameters of the networks.
    For example, the learning rate can have a huge effect, and the "golden rule" of G: 0.0001 and D: 0.0004 is just a starting point.
  • Skip-z Connection
    Feed the noise vector into additional layers, not just the first one.
    Conceptually similar to attention in NLP.
  • Better Metric of Distribution Similarity
    The loss function of the vanilla GAN fails to provide a meaningful signal when the two distributions are disjoint.
    For example, instead of the BCE loss, use a Wasserstein loss (see the Wasserstein sketch after this list).
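To make two of these concrete (target smoothing and the D:G training ratio), here is a minimal sketch of how the training loop above could be adapted; the 5:1 ratio and the 0.9/0.1 targets are example values, not tuned settings:

# Hypothetical variant of the loop in train(): several D updates per G update,
# plus softened targets instead of hard 0/1 labels.
d_steps_per_g = 5  # example ratio only

for _ in range(100):
    for _ in range(d_steps_per_g):
        train_D(GAN, G, D)   # as defined above (hard 0/1 targets)
    train_G(GAN, G, D)

# Inside train_D, target smoothing would replace the hard labels with, e.g.:
# y_real = np.full((batch_size, 2), 0.1); y_real[:, 1] = 0.9
# y_fake = np.full((batch_size, 2), 0.1); y_fake[:, 0] = 0.9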
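And for the Wasserstein option, a minimal sketch of the usual Keras recipe, assuming you rebuild D as a critic with a single linear output (no sigmoid) and label real/fake batches with +1/-1:

from tensorflow.keras import backend as K

def wasserstein_loss(y_true, y_pred):
    # with y_true = +1 for real and -1 for fake, minimizing this
    # approximates the Wasserstein critic objective
    return K.mean(y_true * y_pred)

# critic.compile(loss=wasserstein_loss, optimizer=RMSprop(learning_rate=0.00005))
# After each critic update, clip its weights to keep it roughly 1-Lipschitz:
# for w in critic.trainable_weights:
#     w.assign(tf.clip_by_value(w, -0.01, 0.01))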

Closing Thoughts

This post was about implementing a GAN in Keras while using more complex D & G architectures.

If you wish, you could try a couple of quick things to check whether it performs better:

  • Better Metric of Distribution Similarity: Wasserstein loss
  • Clipping values
  • Gradient Computation Ratio: 1:5
  • Smoothing Target technique: 0.9 & 0.1 instead of 1 & 0

HAVE FUN … :))
