Abstraction, rules and beauty in the code to build a model.

My first steps into the world of A.I.

Part 3: What is a model?

Medmain
7 min read · Oct 10, 2018

Intro

In the last part, I gained some basic insight into the vocabulary used in Machine Learning and studied a little about the "training" process. I learned that Neural Networks form a structured part of Deep Learning, built around the idea of stacking neural layers together so that sets of data are filtered and processed multiple times.

Diagram from Quora

I also talked about Keras, a high-level API which is useful for constructing models for Deep Learning.

Keras is built on the belief that user-friendliness and modularity are what make a good Deep Learning library; it lets you experiment and prototype ideas conveniently.

Because Keras is so easy to use, it is backed by many big companies such as Google, Microsoft, Nvidia, and Amazon. It also ships inside TensorFlow, so installing TensorFlow gives you Keras automatically.

Logo from Official Keras Documentation

So, I think it is a good time for me to make a model.

But how should I start? Learning more about models, obviously!

Background

Models describe the way neural layers are organized: different orderings of the same set of neural layers are considered different models, and they can produce different results too!

Although the term Model is sometimes used interchangeably with Neural Network, I prefer Neural Network for the concept, since it was inspired by Neuroscience and functions somewhat like a human brain passing signals. However, when talking about the specific details of a network, it makes more sense for me to imagine it as a model with an organized structure, since that translates more naturally into an algorithm for a programmer.

Image from Listen Notes

Each neural layer is made up of nodes, sometimes also called neurons or perceptrons after their counterparts in neuroscience.

The role of each node is simple: it consumes some weighted values as input and passes them through a transfer function, or activation function. The output the function produces from that weighted input is then passed on to the nodes in the next layer.
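As a rough sketch of this idea (my own illustration, not from any library, with made-up input values and weights), a single node with a sigmoid activation can be written in a few lines of NumPy:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def node_output(inputs, weights, bias):
    # Weighted sum of the inputs, then the activation function.
    z = np.dot(inputs, weights) + bias
    return sigmoid(z)

# Hypothetical values: three inputs feeding one node.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
out = node_output(x, w, bias=0.0)  # a value between 0 and 1
```

The node itself has no idea what the numbers mean; it just computes a weighted sum and applies its activation function.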

Definitions

Inside a complete model of a general Neural Network, there are three types of layers: Input Layer, Hidden Layer, and Output Layer.

Diagram from Quora

An Input Layer is where the model receives raw input, as the name suggests, and passes that input on to the next layers.

Hidden Layers are where the input is processed and then passed on to the next layer.

Output Layers produce the final processed values, which are used as the model's result.
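To see the three layer types working together numerically, here is a small sketch of my own (the weights and inputs are made up, and ReLU stands in as a typical hidden-layer activation):

```python
import numpy as np

def relu(z):
    # Common hidden-layer activation: zero out negative values.
    return np.maximum(0.0, z)

# Made-up weights: 2 inputs -> 3 hidden nodes -> 1 output node.
w_hidden = np.array([[0.2, -0.5, 0.1],
                     [0.7, 0.3, -0.4]])
w_output = np.array([[0.6], [-0.1], [0.8]])

x = np.array([1.0, 2.0])    # input layer: the raw input itself
h = relu(x @ w_hidden)      # hidden layer: weighted sum + activation
y = h @ w_output            # output layer: the final result
```

The input layer does no computation of its own; the hidden layer transforms the data, and the output layer reduces it to the result we care about.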

There are many properties used to describe the structure of a model:

Size is the total number of nodes in the whole model.

Width (and, for image data, height) describes the dimensions of a specific layer; for a simple dense layer, the width is just its number of nodes.

Depth is the number of layers in the model.
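To make these terms concrete, here is a small illustration of my own, describing a hypothetical network by its layer widths:

```python
# A hypothetical network: 4 input nodes, two hidden layers of 8, 1 output node.
layer_widths = [4, 8, 8, 1]

size = sum(layer_widths)     # total number of nodes in the whole model
depth = len(layer_widths)    # number of layers in the model
widest = max(layer_widths)   # width of the largest layer

print(size, depth, widest)   # 21 4 8
```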

Diagram from Stanford Course CS231n

Simple, right?

Aside from these basic terms, there are also some very loosely defined terms that describe other states of a model.

Capacity refers to the type or structure of functions that a model configuration can learn. The term derives from the formal notion of "representational capacity": a model with higher capacity can learn more complex relationships, while a low-capacity model can only capture limited ones. There is currently no unit for measuring capacity. In summary, it describes the limitations and boundaries of the model itself.

Architecture is the specific arrangement of the layers and nodes in the network, such as how one node is positioned in relation to another. Although the term is not often used, it is a good way to describe alternate versions of one general model type. For example, the LSTM model has many variations, and they are all different takes on the same basic architecture.

Okay, maybe not that simple.

In practice, it is much less about the vocabulary and more about the data. As a strong visual learner, I find it particularly helpful to imagine the movement of information: how it flows, transforms, and is computed.

Flow chart obtained from Gogul Ilango

This flow can be seen as a procedure with steps. Of these, five are more central than the rest, and they form the core of the process.

Define Network

Keras offers two ways to build a model: the Sequential model and the functional Model API.

model = Sequential()

Or

model = Model(inputs=inputs, outputs=predictions)

While Sequential is the better choice for most densely connected networks, Keras also lets you create more complex and customized models, such as multi-output models, with its functional API.

As part of the network definition, there is also a need to define the input, hidden, and output layers. An example may look like:

model.add(layers.Dense(1, activation="sigmoid"))

The first layer (input layer) of a model should also declare the dimension and shape of the input, for example through an input_shape argument. Note that the first argument of Dense is the number of units in the layer (its width), not the input shape; the example above adds a dense layer with a single unit and a sigmoid activation function.

At the end of this stage, use:

model.summary()

to review the architecture of the model.

Compile Network

This stage chooses the optimizer algorithm, the loss function, and the metrics to track. In Keras, it can be done with one function:

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"])

This compiles the model with the Adam optimizer, uses binary cross-entropy to calculate the loss, and tracks the model's accuracy as a metric.

Fit Network

In this stage, all the previously defined parameters are put to use and actually run on the model: it iterates over all the input data and adjusts its weights according to the previously defined algorithms.

Similar to compiling, this can also be done in one function. The parameters to specify are the training data and labels, the number of epochs, and the batch size. Optionally, you can also provide a validation set.

model.fit(
    train_x, train_y,
    epochs=2,
    batch_size=500,
    validation_data=(test_x, test_y))

Commonly, the return value of fit is stored in a variable, since it is a History object. Its History.history attribute can be used to inspect, display, and analyze the fitting process.
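A sketch of what inspecting that attribute might look like; the numbers below are made up, since History.history is simply a dict mapping each metric name to a per-epoch list of values:

```python
# Hypothetical contents of history.history after two epochs of training
# with validation data, mimicking what Keras records.
history_dict = {
    "loss": [0.693, 0.542],
    "accuracy": [0.51, 0.68],
    "val_loss": [0.688, 0.560],
    "val_accuracy": [0.50, 0.65],
}

# Check whether the training loss went down between the first and last epoch.
improved = history_dict["loss"][-1] < history_dict["loss"][0]
print(improved)  # True
```

Plotting these lists epoch by epoch is a common way to spot overfitting, for example when val_loss starts rising while loss keeps falling.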

Evaluate Network

This stage assumes there is a labeled test set, so we can see how well the model performs on new data.

Again, Keras makes evaluation a very easy task with just one function.

model.evaluate(
    test_x, test_y,
    batch_size=32)

If the model is trained correctly, the evaluation results should be consistent with its predictions; this is one stage of the process with real potential for debugging.

Make Predictions (using the Network)

The model is fed unlabeled test data and produces a result based on everything it has learned. This is the last step in the procedure and can also be done with a single function, which takes input data x and produces a predicted value y.

y = model.predict(x, batch_size=None, verbose=0, steps=None)

There is really no way to tell whether this value is correct, since it is a prediction. Much like not knowing how long I'll live, the only way to learn the true value is to wait for the prediction to succeed or fail. Unlike evaluation, this stage will not tell you how accurate your result is, since none of the test data are labeled. Perhaps when a prediction fails you can measure how close it came to success, by your own definition of success.

“This is the point of no return, that dramatic moment where we make the moral decision of whether to trust the machine based on all the numbers it crunched, or to make our own best guesses.” ~Inspired by This Post

But that's another topic all in itself, considering the ethical outcome was never the programmer's job anyway.

What really matters here is that predict returns a NumPy array of predictions, or a list of NumPy arrays if the model has multiple outputs.
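For a binary classifier like the sigmoid example earlier, a common follow-up step (my own sketch, with made-up numbers standing in for the returned array) is to threshold those probabilities into hard class labels:

```python
import numpy as np

# Made-up stand-in for the array predict() would return:
# one probability per sample from a sigmoid output layer.
probabilities = np.array([[0.91], [0.13], [0.56], [0.42]])

# Convert probabilities into hard 0/1 class labels at a 0.5 cutoff.
labels = (probabilities > 0.5).astype(int).ravel()
print(labels)  # [1 0 1 0]
```

The 0.5 cutoff is only a convention; a different threshold trades false positives against false negatives.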

I feel like shortening this into an acronym will help me remember it a little better, so let’s go with combining the first letters of each step.

By personal preference, I propose that this flow be named DeCoFEM.

I believe forming a standard set of procedural steps will contribute to my own understanding and practice of building models. It also serves as a nice reference point.

The next time I need to remember the structure of a model, I can just #DoDeCoFEM.

Diagram from Deep Learning Garden

Please correct me if something like this already exists and I'm just being extra. Also, I am aware this is not the only flow that has been described, which is why I will also look into alternative ways of structuring it.

That’s it for this part, please look forward to the next!


Medmain

Startup Company based in Fukuoka, Japan. We strive to become the medical domain of the world. Learn more about us at https://medmain.com