Quick Reference for Neural Networks & Keras

Bill Yu
5 min read · May 14, 2019

This post is aimed at a conceptual review of Neural Networks and Keras. To learn more about how Neural Networks and Deep Learning are set up, the Neural Network Playground provides a good visualization of classification and regression with adjustable parameter settings. The summaries below are credited to J Beightol.

Preview of the TensorFlow Playground

Differences between single layer and multi-layer perceptrons

Perceptron schematic overview
  • Single-layer perceptrons have no hidden layer: just input features feeding a single activation function whose output can be interpreted as the prediction, much like most of the models we have covered so far.
  • Multi-layer perceptrons have one or more hidden layers where the data is transformed and new interaction terms are created, which are then used to predict the output or fed into additional layers (see the sketch below).
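As a minimal Keras sketch of the difference (the layer sizes, input_dim, and activations are illustrative placeholders, not from the original post):

from keras.models import Sequential
from keras import layers

# Single-layer perceptron: inputs feed one activation/output directly
single = Sequential()
single.add(layers.Dense(1, input_dim=10, activation='sigmoid'))

# Multi-layer perceptron: a hidden layer transforms the data into
# new interaction terms before the output layer
multi = Sequential()
multi.add(layers.Dense(16, input_dim=10, activation='relu'))  # hidden layer
multi.add(layers.Dense(1, activation='sigmoid'))              # output layer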

Define epoch, learning rate, activation function, hidden layer, neuron, weights & bias terms.

Neural Network from Scratch
  • Epoch: one full pass of the training data forward and backward through the network: 1. Feed the data through the network. 2. Calculate the errors. 3. Propagate backwards, adjusting the network to better fit the data based on those errors.
  • Hidden Layer: one or more neurons that sit between our input layer (original features) and our output layer (predictive function) and are responsible for transforming the data.
  • Weights: values applied to features as they flow through the network and contribute to each interaction. Weights are randomly initialized and adjusted via gradient descent on our loss function. They work much like coefficients in a linear or logistic regression: each weight multiplies its input value, controlling how much that feature contributes to the interaction term produced by the activation function.
  • Bias Terms: constant values added to each neuron. These are adjusted the same way the weights are, but they stand on their own and do not weight any other value.
output = activation(Σ(inputs × weights) + bias)  →  y_pred = ReLU(XW + b)
  • Activation Function: a function on the neuron that takes the sum of all weighted features plus the bias term and returns a new value, which is used as a feature in the next layer. Note: softmax is used for multi-class classification problems.
Activation Function
  • Neuron: the combination of weights, a bias term, and an activation function. NNs can have several hidden layers, and each hidden layer can have several neurons. Each neuron has its own unique set of weights and bias term, and potentially a unique activation function, and arrives at a single value that represents some combination of all the features fed in.
  • Learning Rate: how large the steps are that we take when moving along our loss function to increase/decrease weights and reduce total loss (the same idea as in gradient descent).
  • Batch: often we cannot forward- and back-propagate our entire data set at once, so we divide it into batches that flow through chunk by chunk.
  • Iteration: the process of a single batch flowing forward and backward through the network.
from keras.models import Sequential
from keras import layers

# Number of features
input_dim = X_train.shape[1]

model = Sequential()

# hidden layers (only the first layer needs input_dim)
model.add(layers.Dense(300, input_dim=input_dim, activation='relu'))
model.add(layers.Dense(100, activation='relu'))
model.add(layers.Dense(30, activation='relu'))

# output layer: Dense should be 1 for the binary output
model.add(layers.Dense(1, activation='sigmoid'))
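To connect the terms above to the Keras API, a minimal compile-and-fit sketch might look like the following (the optimizer, learning rate, epochs, batch_size, and y_train are illustrative assumptions, not from the original post):

from keras.optimizers import Adam

# learning rate controls the step size of each weight update
model.compile(optimizer=Adam(lr=0.001), loss='binary_crossentropy', metrics=['accuracy'])

# one epoch = the full training set forward- and back-propagated once;
# the data is split into batches of 32, and each batch is one iteration
model.fit(X_train, y_train, epochs=20, batch_size=32, validation_split=0.2)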

Define Forward and Back Propagation.

Forward propagation is the process of data flowing through the network in order to arrive at a predicted value.
Backward propagation is the act of optimizing a network by working backwards through it: starting with the calculated errors, identifying the points in the network (weights or nodes) that contribute most to the error, and adjusting them accordingly.

Forward and backward propagation
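A minimal NumPy sketch of one forward and one backward pass for a single sigmoid neuron (the data, learning rate, and variable names are illustrative):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([[0.5, 1.0], [1.5, -0.5]])  # two samples, two features
y = np.array([[1.0], [0.0]])             # targets
W = np.random.randn(2, 1) * 0.1          # randomly initialized weights
b = np.zeros(1)                          # bias term
lr = 0.1                                 # learning rate

# forward propagation: data flows through the network to a prediction
y_pred = sigmoid(X @ W + b)

# backward propagation: work back from the error to adjust weights and bias
error = y_pred - y                       # prediction error
grad = error * y_pred * (1 - y_pred)     # chain rule through the sigmoid
W -= lr * (X.T @ grad)                   # adjust weights
b -= lr * grad.sum(axis=0)               # adjust bias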

Explain how L1/L2, dropout, and early stopping regularization work and implement these methods in Keras

Regularization

  • Incorporating either LASSO (L1) or Ridge (L2) regularization into back propagation is as simple as adding the regularization term to the loss function.
  • L1 typically isn’t used in Neural Networks.
  • L2 is much more popular and is sometimes referred to as “weight decay” (see the Keras example below).
Regularization penalty
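In Keras, the penalty can be added per layer through kernel_regularizer (the layer size and penalty strength below are illustrative):

from keras import regularizers

# L2 ("weight decay") penalty added to this layer's weights in the loss
model.add(layers.Dense(100, activation='relu',
                       kernel_regularizer=regularizers.l2(0.01)))
# an L1 (LASSO-style) penalty would use regularizers.l1(0.01) instead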

Dropout

  • The process of randomly removing (dropping) nodes during training epochs to see how that affects the model. A Neural Network with densely connected layers has a tendency to overfit, so we can use dropout to better understand whether nodes should be removed or kept. (Similar in spirit to how Random Forests reduce variance and overfitting.)
  • Dropouts can occur at any epoch, and the model’s “constant fear” of losing a node at any time prevents over-adjusting a weight within an epoch (see the Keras example below).
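In Keras, dropout is its own layer placed after the layer whose outputs should be randomly dropped (the 0.2 rate and layer size are illustrative):

# randomly drops 20% of the previous layer's outputs on each training pass
model.add(layers.Dense(100, activation='relu'))
model.add(layers.Dropout(0.2))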

Early Stopping

  • Monitors how much the loss function is changing and stops the algorithm once the loss starts to go back up. This is done under the assumption that the first minimum reached is the global minimum (see the Keras callback example below).
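In Keras, this is handled with the EarlyStopping callback, typically monitoring validation loss (the monitor, patience, and fit arguments below are illustrative):

from keras.callbacks import EarlyStopping

# stop training once validation loss stops improving for 3 epochs in a row
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
model.fit(X_train, y_train, epochs=100, batch_size=32,
          validation_split=0.2, callbacks=[early_stop])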


Bill Yu

Bill is studying Data Science and Machine Learning to solve business problems.