DeepClassifyML Week 2 Part 2


This post is part of the ‘Hasura Internship’ series and covers what Git is. We also start to design and implement a Neural Network that predicts which students get accepted to a university based on their grades. Also check out my previous posts (Part 1, Part 2, Part 3) for the app idea and some Computer Vision and Neural Network basics.

Git is a version control system used to manage a project, or a set of files, and to track changes in them over time. Git was developed by Linus Torvalds. It lets you revert files or the entire project back to a previous state, compare changes over time, see who last modified something that might be causing a problem, find out who introduced an issue and when, and more. Using Git or any other version control system also means that if you screw things up or lose files, you can usually recover them. A file in Git can be in one of 3 main states: committed, modified, and staged. Committed means the data is safely stored in your local database. Modified means you have changed the file but have not committed it to the database yet. Staged means you have marked a modified file in its current version to go into your next commit snapshot.

Git stores this information in a data structure called a repository. A git repository contains the following:

  • A set of commit objects.
  • A set of references to commit objects, called heads.

The repository is stored in the same directory as the project itself, in a subdirectory called .git. It is important to understand that:

  • There is only one .git directory, in the root directory of the project.
  • The repository is stored in files alongside the project. Git does not require a central server repository.

A commit object contains three things:

  • A set of files that defines the state of the project at a given point in time.
  • References to parent commit objects.
  • A SHA-1 name, a 40-character string that uniquely identifies the commit object.
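The three parts of a commit object can be sketched in a few lines of Python. This is only a toy model, not how Git stores objects on disk: the function make_commit, the dictionary layout and the file names are all made up for illustration, and real Git hashes its own binary object format. The one thing it has in common with Git is that a SHA-1 digest written in hexadecimal is 40 characters long.

```python
import hashlib

def make_commit(snapshot, parents, message):
    """Build a toy commit: a snapshot of files, parent names, and a SHA-1 name."""
    # Serialise the content deterministically so equal content gives equal names.
    payload = repr((sorted(snapshot.items()), parents, message)).encode("utf-8")
    name = hashlib.sha1(payload).hexdigest()  # 40 hexadecimal characters
    return {
        "name": name,
        "snapshot": dict(snapshot),
        "parents": list(parents),
        "message": message,
    }

first = make_commit({"notes.txt": "hello"}, [], "initial commit")
second = make_commit({"notes.txt": "hello world"}, [first["name"]], "update notes")

# A head is simply a named reference to a commit object.
heads = {"master": second["name"]}

print(len(second["name"]))                    # 40
print(second["parents"] == [first["name"]])   # True
```

Because the second commit stores the name of the first one as its parent, following the parent references from a head walks back through the project history.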

The basic Git workflow can be described in 3 steps:

  • Modify/update/change the files in your project repository.
  • Stage the modified files.
  • Commit the staged files and store them permanently to your Git directory.

If you modify a file but do not stage it (with git add), the commit will contain the previous version of that file, from before your modifications. Your changes are not lost: the modified file stays in your working directory and is still marked as modified.
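The behaviour described above can be imitated with a small Python model. Everything here (working_dir, staging_area, commits, and the stage and commit helpers) is hypothetical and only mimics the idea of the three areas. It is not Git's implementation.

```python
# Toy model of Git's three areas; all names here are hypothetical.
working_dir = {"grades.csv": "v1"}   # files as they are on disk
staging_area = {}                     # what will go into the next commit
commits = []                          # committed snapshots, oldest first

def stage(filename):
    """Copy the current version of a file into the staging area (like `git add`)."""
    staging_area[filename] = working_dir[filename]

def commit():
    """Store a snapshot: the previous commit plus whatever is staged (like `git commit`)."""
    snapshot = dict(commits[-1]) if commits else {}
    snapshot.update(staging_area)
    commits.append(snapshot)
    staging_area.clear()

stage("grades.csv")
commit()                                 # first commit holds grades.csv "v1"

working_dir["grades.csv"] = "v2"         # modified, but NOT staged
working_dir["readme.txt"] = "intro"      # a new file that we do stage
stage("readme.txt")
commit()

print(commits[-1]["grades.csv"])         # v1 (the unstaged change was not committed)
print(working_dir["grades.csv"])         # v2 (the change is still in the working directory)
```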

This excellent blog post covers everything you need to know about Git in detail. Hubspot’s post on Git and Github is also a great introduction for beginners.

Now over to Machine Learning. Last time we began with basic mathematics using NumPy. Let's look at how to apply it to a problem.

The way in which a neural network learns from data can be divided into 3 steps :

  1. We present the neural network with training examples, each consisting of a pattern of activities for the input units together with the desired pattern of activities for the output units.
  2. We determine how closely the actual output of the network matches the desired output.
  3. We change the weight of each connection so that the network produces a better approximation of the desired output.
A 3 layer Neural Network

This is a 3 layer Neural Network. Each layer has individual blocks which we call neurones, and these are the basic units of a neural network. Each one looks at input data and decides how to categorise that data. The arrows represent connections and can be imagined as the ‘weights’.

Suppose we want to find out if a student gets into a university given his/her grades in 4 subjects. We have the scores of past students and also know whether they were admitted or not. We now want to predict what happens when a new student comes in. The grades are the inputs to our network and are known as ‘features’. The output node in the output layer will give us either ‘accepted’ or ‘rejected’ and is called the target variable. So to summarise, the input data (grades) are fed into a network of interconnected nodes. In this example, if the combined inputs pass a certain threshold, we output a ‘yes’ and the student gains admission into the university. In other words, the network decides whether a student’s grades are high enough to be accepted to the university.
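As a rough sketch of the threshold idea, here is the simplest version I can think of: add up the grades and compare the total with a cut-off. The function name admit, the grades and the threshold of 300 are all made-up values for illustration, and every subject counts equally here.

```python
def admit(grades, threshold):
    """Return 'accepted' if the combined grades pass the threshold, else 'rejected'."""
    combined = sum(grades)  # simplest possible combination: every subject counts equally
    return "accepted" if combined >= threshold else "rejected"

print(admit([80, 75, 90, 85], threshold=300))  # accepted (330 >= 300)
print(admit([50, 60, 40, 45], threshold=300))  # rejected (195 < 300)
```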

Now you might be wondering how the network knows which subjects are more important in making this acceptance decision. In the beginning, when we initialise the neural network, we don’t know which one will be most important. This is handled using ‘weights’. Each input to the network has an associated weight that represents its importance, and these weights are determined during the learning process of a neural network. We denote the weights as ‘w’ and the inputs (subject grades) as ‘s’. The network multiplies each input by its weight and sums the results, in a process known as Linear Combination. The equation would be as follows:

$$x = w_1 \cdot s_1 + w_2 \cdot s_2 + w_3 \cdot s_3 + w_4 \cdot s_4$$

In this example we have just 4 inputs. In general we may have many subjects, or other features such as co-curricular activities. Suppose we have m different inputs (features) and we label them $s_1, s_2, \ldots, s_m$. Let's also say that the weight corresponding to $s_1$ is $w_1$ and so on. In that case, we would express the linear combination succinctly as:

$$x = w_1 \cdot s_1 + w_2 \cdot s_2 + w_3 \cdot s_3 + w_4 \cdot s_4 + \cdots + w_m \cdot s_m$$

or

$$x = \sum_{i=1}^{m} w_i \cdot s_i$$

Sigma (∑) is used to represent summation. It means we evaluate the expression to its right once for each value of i and add up the results. Here the limits 1 and m mean that i runs over all values from 1 to m.

The equation above means:

  • We start at subject number 1, i.e. i=1
  • Evaluate w_1⋅s_1 and remember the result
  • Move to i=2
  • Evaluate w_2⋅s_2 and add the result to w_1⋅s_1
  • Repeat the process till i=m, where m is the number of inputs (subjects).

(Note: The limits are often left out, so the summation from 1 to m can be written simply as $\sum_i w_i \cdot s_i$.)
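The steps above translate almost word for word into a Python loop. The weights and grades below are illustrative numbers only. Note that Python counts from 0, so i runs from 0 to m-1 instead of 1 to m.

```python
weights = [0.1, 0.8, 0.02, -0.7]   # illustrative weights, one per subject
grades = [50, 60, 10, 45]          # illustrative grades in the 4 subjects

x = 0.0
for i in range(len(grades)):       # visit every input once
    x += weights[i] * grades[i]    # add w_i * s_i to the running total

# The same sum written in one line.
x_short = sum(w * s for w, s in zip(weights, grades))

print(round(x, 6))                 # 21.7
```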

Remember Activation Functions from Part 2? They are functions that decide, given the inputs into a neurone, what its output should be. Because the activation function determines the actual output, the outputs of neurones are often referred to as “activations”.

Now the output from the input layer (the summation) is turned into a final output, i.e. ‘accepted’ or ‘rejected’, by feeding the linear combination into an activation function. Some common activation functions are the sigmoid, tanh, softmax and relu functions. We will be using the sigmoid activation function, given by:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
Sigmoid Function

The sigmoid function is bounded between 0 and 1, so its output can be read as a probability of success: values close to 1 mean ‘accepted’ and values close to 0 mean ‘rejected’.
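To get a feel for this, here is a quick sketch that evaluates the sigmoid at a few sample points, using only Python's standard math module. The points -5, 0 and 5 are arbitrary choices.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

for x in (-5, 0, 5):
    print(x, round(sigmoid(x), 4))
# -5 0.0067
# 0 0.5
# 5 0.9933
```

Large negative inputs land near 0, large positive inputs land near 1, and an input of exactly 0 gives 0.5.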

We also add a term called ‘bias’ to the linear combination. A bias, represented in equations as b, allows us to shift the activation function to the left or right, which may be critical for successful learning. Like the weights, the bias is not known in advance, so it is also updated during training. So the input layer output becomes:

$$x = \sum_{i=1}^{m} w_i \cdot s_i + b$$
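The shifting effect of the bias is easiest to see with a single input. In the sketch below, neuron is a hypothetical helper with one input s, one weight w and one bias b. The sigmoid outputs 0.5 when w⋅s + b = 0, so with w = 1 the midpoint sits at s = -b.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron(s, w=1.0, b=0.0):
    """One input, one weight, one bias (a hypothetical helper for illustration)."""
    return sigmoid(w * s + b)

print(neuron(0.0, b=0.0))              # 0.5: with no bias the midpoint sits at s = 0
print(neuron(2.0, b=-2.0))             # 0.5: a bias of -2 moves the midpoint to s = 2
print(round(neuron(0.0, b=-2.0), 4))   # 0.1192: the same input now gives a lower output
```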

Let’s use NumPy to calculate the output of our Neural Network with 4 input nodes (subjects) and one output node (‘accepted’ or ‘rejected’) with a sigmoid activation function. We will divide this task into 3 steps:

  • Calculate the linear combination (input layer’s output).
  • Apply the activation function.
  • Calculate the output

To calculate the linear combination, or weighted sum, we will use NumPy’s dot product function, which multiplies the two arrays element by element and sums the results.

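import numpy as np

def sigmoid(x):
    # Squash any real number into the range (0, 1)
    return 1 / (1 + np.exp(-x))  # np.exp is numpy's exponential function

inputs = np.array([50, 60, 10, 45])          # grades in the 4 subjects
weights = np.array([0.1, 0.8, 0.02, -0.7])   # one weight per subject
bias = -0.1

# Linear combination plus bias, passed through the activation function
output = sigmoid(np.dot(weights, inputs) + bias)
print('Output: {}'.format(output))  # Output: approximately 0.999999999584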
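The same calculation can be split into the three steps listed earlier and applied to several students at once. This is only a sketch. The second student's grades and the 0.5 cut-off used to turn a probability into a decision are my own assumptions, not something the network has learned.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

weights = np.array([0.1, 0.8, 0.02, -0.7])
bias = -0.1

# Hypothetical grades: one row per student, one column per subject.
students = np.array([
    [50, 60, 10, 45],
    [20, 10, 90, 80],
])

# Step 1: linear combination. np.dot multiplies element-wise and sums, row by row.
linear = np.dot(students, weights) + bias
assert np.allclose(linear, (students * weights).sum(axis=1) + bias)

# Step 2: apply the activation function.
probabilities = sigmoid(linear)

# Step 3: calculate the output, using 0.5 as an assumed cut-off.
decisions = np.where(probabilities >= 0.5, "accepted", "rejected")
print(decisions)  # ['accepted' 'rejected']
```

Stacking the students into a 2-D array lets one call to np.dot compute the linear combination for every row, which avoids writing an explicit loop over students.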

This operation of a neural network is called ‘Forward Propagation’. It turns the inputs into a prediction and lets us observe how well our model is performing. The next step is to update the weights so that the predictions improve. This is known as ‘Backward Propagation’ and is the stage where the actual learning happens. We learn the weights from data and then use them to make predictions.

In the next post we will discuss ‘Backward Propagation’ and ‘gradient descent’, the backbone and one of the most important concepts of Machine Learning.

Edit: You can reach me at akshaybhatia10@gmail.com