Learning From Scratch — My Introduction to Machine Learning

Caleb Devon
10 min read · Jul 29, 2022


Welcome to my first Medium blog post, and the first in a series of posts in which I will document the personal projects related to my study of Data Science and Machine Learning.


Table of Contents

  • About Me
  • Purpose and Goals
  • Project Overview
  • Design and Creation
  • Testing and Results
  • Conclusion

About Me

Before I begin going in-depth about this project, a little about myself. I am an Electrical and Computer Engineering student at the University of Texas at Austin. At the time of writing (July 2022), I am preparing to start my sophomore year and am pursuing a Bachelor’s degree. I plan to take a technical pathway in data science and build my career in machine learning.

My interest in this field goes back to my freshman year of high school, when I first came across the idea of Neural Networks; I was amazed that there was a way to break one of the most fundamental rules of Computer Science — “the machine is only as smart as the programmer.” Since then I have spent a great deal of time and effort researching and creating these models, each one a small step more advanced than the last.

Purpose and Goals

My main purpose for writing this series of blog posts is to keep a written record of the projects I have made and to show off the work I have put into this passion. I hope to document my journey in this field from start to finish, including that which takes place before I can truly call this my career.

My primary goal is to use this blog to demonstrate to potential recruiters my work and my level of understanding in this field. In the past, it has been difficult to prove my experience in interviews because of how hard it is to describe personal projects in a professional setting, and I plan to use this series as a detailed explanation of my work. I hope that readers of this post learn something new and perhaps even feel inspired to begin a journey similar to my own.

The project I will be detailing in this post is the first project I completed related to machine learning, written during my sophomore year of high school. Another of my goals is that, as this series progresses, the quality and “professionality” of the code increases along with the complexity of the overall projects.

Project Overview

The project I am showcasing in this post is a Single Layer Perceptron. This project, along with all future projects I showcase, is written using only built-in libraries, meaning no established deep learning libraries. This series of projects is also written in Java, rather than the popular choice for data science: Python.

Why? — First I want to address the elephant in the room: Java for Machine Learning?
As criticized as the language is in general, and especially for machine learning, I chose to write these projects in Java simply because I enjoy the language. Java was the first language I learned and, as such, has always felt the most intuitive to me. Furthermore, my restriction against established libraries means that the biggest advantage of using Python in the first place (its ecosystem of deep learning libraries) is irrelevant here.
Second — Why am I putting a restriction on external libraries?
By writing these projects from scratch I not only learn the importance and abilities of machine learning but also the intricacies of how models work.

Info on the Model — The Single Layer Perceptron (SLP) is among the most basic models in machine learning. The name is perhaps slightly misleading: the single-layer perceptron possesses not one but two layers, an input layer and an output layer, though it has none of the hidden layers found in the more advanced multilayer version. Additionally, in most cases, the activation function is simplified to what is essentially a discrete rounding function.

Perceptron Model Visualization, Source — https://www.javatpoint.com/single-layer-perceptron-in-tensorflow
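To make the “discrete rounding” idea concrete, here is a minimal sketch (not code from the project itself) of such a threshold activation, assuming a cutoff of 0.5 to match the rounding used in the implementation below:

// Minimal sketch of a threshold ("discrete rounding") activation.
// A weighted sum of 0.5 or more becomes 1, anything smaller becomes 0,
// matching the behavior of the Math.round() call used later in this post
// for non-negative sums.
static int stepActivation(double weightedSum) {
    return weightedSum >= 0.5 ? 1 : 0;
}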

Design and Creation

As hinted at earlier in this post, this project was made very early in my learning of programming, and its code includes multiple poor decisions, most notably the design of the major math-heavy class with only static variables and methods. That being said, I do believe it is important to reflect on the progress made throughout the projects I complete.

Because of the SLP’s simplicity (and the rather significant lack of detailed, organized code), it is likewise simple to explain. To start, the following variables are declared to store the state of the network.

static int[] inputs = new int[2];
static double[] weights = new double[2];
static int error = 0;
static int target = 0;
static int output = 0;
// Referenced by the methods below: the learning rate constant (set to 0.2 in
// this project) and a running total of absolute error used for tracking.
static final double LEARNING_RATE = 0.2;
static double avgError = 0;
  • The inputs are set by the user to run a test case. The set size of 2 represents the total number of inputs in the given function; in this project I use 2 because the goal is to use the SLP to solve 2-input logic gates.
  • The weights are network parameters that are adjusted over time to increase the effectiveness of the operation. The set size of 2 represents the number of connections needed between layers; since this network consists of a 2-node input layer and a 1-node output layer, the total number of connections needed is 2.
  • The error represents the difference between the target value and the calculated output value.
  • The target is a known value and is the desired result from a given set of inputs. The target value, sometimes called the “label,” is a major defining factor of supervised learning, in which training involves revealing the correct answer after the network makes a guess.
  • Lastly, the output is a value calculated by the passing forward of inputs into the output layer. Since this network requires only 1 output, this is saved to a variable, however, if the number of outputs were to increase this would be expressed as an array.

Upon creation, all weights are randomized; this process is rather simple:

public static void initializeWeights() {
    for (int x = 0; x < weights.length; x++) {
        weights[x] = Math.random();
    }
}

After the inputs and target are initialized by the user, the forward pass is ready to occur. This is the process of moving input values through a network to reach an output.
In this network, since there are only 2 input nodes connected to 1 output node, the forward pass is a simple expression:
o = (i₁ * w₁) + (i₂ * w₂)
Where “o” is the output, iₙ is the input at index n, and wₙ is the weight at index n.

To achieve this, all that is needed is a simple summation loop adding each input multiplied by its respective weight.

public static void determineOutput() {
    double out = 0.00;
    // Weighted sum of the inputs
    for (int x = 0; x < inputs.length; x++) {
        out += inputs[x] * weights[x];
    }
    // Round the sum to a discrete 0 or 1
    output = (int) Math.round(out);
    determineError();
}

As hinted at by the call to determineError() within this function, the next step is to calculate the error. Since there is no non-linear activation function and the network consists of only 2 layers, the error is simply the target minus the output.

public static void determineError() {
    error = target - output;
    // Accumulate absolute error so average error can be tracked over many examples
    avgError += Math.abs(error);
}

Lastly, the only thing left to do is use the calculated error to adjust the weights, which is done by analyzing how much each weight contributes to the overall error. This is done by multiplying the value from the input layer by the error of the output layer. This product is then multiplied by a constant value known as the Learning Rate to adjust the weights.
This Learning Rate is typically a small constant (set to 0.2 in this example) which defines how fast the network will learn. Of course, it’s easy to think that setting this value high will reduce training time. However, too large a learning rate causes the weight updates to overshoot and decreases the quality of the final product. Overall, it is a delicate balancing act between training time and the final result.

To adjust the weights, another simple loop is run to complete the calculation:

public static void adjustWeights() {
    // Each weight moves in proportion to its input, the error, and the learning rate
    for (int x = 0; x < weights.length; x++) {
        weights[x] += inputs[x] * error * LEARNING_RATE;
    }
}

With these basic processes, it is possible to run an algorithm that can learn basic relationships. For a more detailed look at the code, the repository can be found here.
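To tie these pieces together, the following is a rough sketch of a training loop driving the methods above; the train method, its dataset parameters, and the epoch structure are illustrative assumptions rather than the repository’s exact code:

// Illustrative training driver (a sketch, not the repository's exact code).
// Each epoch presents every row of a truth table, runs the forward pass,
// and nudges the weights using the resulting error.
public static void train(int[][] dataset, int[] targets, int epochs) {
    initializeWeights();
    for (int epoch = 0; epoch < epochs; epoch++) {
        avgError = 0; // reset the running error total each epoch
        for (int row = 0; row < dataset.length; row++) {
            inputs = dataset[row];   // load one training example
            target = targets[row];   // and its known label
            determineOutput();       // forward pass (also computes the error)
            adjustWeights();         // update the weights using that error
        }
    }
}

Because every field in the class is static, the loop simply overwrites the shared inputs and target variables before each forward pass.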

Testing and Results

To show the abilities (and limitations) of the SLP model, I ran 2 separate datasets. The first one is a logical OR gate.
This refers to an electrical component that takes in 2 digital values (1 or 0) and outputs a single digital value based on these inputs.
The specifics of the logical OR gate are fairly self-explanatory: as long as one input OR the other is true (1), the output will be true; otherwise the output will be false (0).
To visualize the relationship, the truth table for the OR gate is below.

[ Inputs  ][ Outputs ]
[ 0 | 0 ][ 0 ]
[ 0 | 1 ][ 1 ]
[ 1 | 0 ][ 1 ]
[ 1 | 1 ][ 1 ]
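Encoded for the training sketch shown earlier, the OR truth table would look something like this (again, illustrative names rather than the repository’s exact code):

int[][] orInputs  = { {0, 0}, {0, 1}, {1, 0}, {1, 1} };
int[]   orTargets = { 0, 1, 1, 1 };
train(orInputs, orTargets, 100); // epoch count chosen arbitrarily for illustration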

After running the algorithm multiple times, it is clear that the OR gate is no problem for the Single Layer Perceptron. Results from one of the datasets are shown below.

Single Layer Perceptron solving an OR gate

The second dataset I ran similarly uses another logic gate; however, the results vary greatly from the OR gate. The gate used for this test is the XOR gate, the “exclusive or.” The idea is similar to the OR gate, except that the 1-and-1 case now outputs false, hence the designation “exclusive.”
The XOR truth table is shown below.

[ Inputs  ][ Outputs ]
[ 0 | 0 ][ 0 ]
[ 0 | 1 ][ 1 ]
[ 1 | 0 ][ 1 ]
[ 1 | 1 ][ 0 ]

Running this dataset through the SLP, we see the following result:

SLP is unable to solve the XOR Logic Gate

Viewing the results, it can be seen that the Single Layer Perceptron is never able to solve the XOR gate like it could with the OR gate.

But Why? — Outputs generated by Neural Networks are based on decision boundaries that are drawn by connection weights. In a model like the Single Layer Perceptron, the lack of any hidden layers, combined with the fact that the network only uses linear functions means that these decision boundaries can only be linear functions — straight lines. This means any function that requires a nonlinear decision boundary cannot be achieved by the SLP.
This doesn’t present any problems in the case of the OR gate; however, it is a major issue when attempting to solve XOR.
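This failure can also be shown directly from this project’s forward pass, where the output rounds up to 1 whenever the weighted sum reaches 0.5. Satisfying the XOR table would require all of the following at once:

(0, 1) → 1 requires w₂ ≥ 0.5
(1, 0) → 1 requires w₁ ≥ 0.5
(1, 1) → 0 requires w₁ + w₂ < 0.5

The first two conditions force w₁ + w₂ ≥ 1, which contradicts the third, so no pair of weights can reproduce all four rows of the XOR table.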

To visualize this, the truth table for the OR gate is formatted onto a graph below, where blue dots represent true (1) results and red represents false (0).

Graphical Truth table for OR gate

In this example, it is very simple to draw a linear decision boundary separating the red dot from the blue ones, which can be done like so:

Graphical OR gate with 1 possible Linear boundary

This example is just one of many possible decision boundaries. Put simply, the OR gate has no trouble being solved by a system that can only create linear functions.
Analyzing the XOR gate in the same way, the following graph emerges:

Graphical Truth table for XOR gate

Unlike the OR gate, there is no linear decision boundary that can be drawn to properly separate all of the outputs. This is the reason the SLP fails to categorize the XOR truth table. To solve this issue, a nonlinear activation function must be introduced to allow networks to form nonlinear decision boundaries. In addition, hidden layers must be included to allow more complex solutions to be formed.

This problem is not new in the field of Machine Learning and is appropriately named the XOR problem. For further reading on this issue, I recommend the Towards Data Science article How Neural Networks Solve the XOR Problem by Aniruddha Karajgi.

Conclusion

Overall, this project was a great starting place for me to immerse myself in the world of Machine Learning. It taught me a great deal about the inner workings of the math behind the Perceptron, which will be built upon in later projects as I showcase more complex models.

The next project I will showcase is the natural step up from the Single Layer Perceptron, the Multilayer Perceptron. This model is designed to overcome the shortcomings of the SLP by introducing exactly what has been noted as absent here: hidden layers and nonlinear activation functions.

For several years I had planned to start this series of blog posts but never took the time to learn how. Despite how long it has taken, I am very excited to finally start writing about the projects I have put a great deal of work into.

With all that said, thank you for reading, I look forward to writing more posts in the near future!
