I read an entire Machine Learning textbook in an hour: This is what I learned.
Using big data to do big learning.
Hey guys, this is Frank Liu (Majoring in Interpretive Dance and Antarctic Food Science) from the Data Science Club at NYU and I’ll be introducing some basics of machine learning today! We’ll code everything from scratch and cover the fundamentals to get you started with machine learning, specifically image recognition.
The Task: Cat or Bunny?
How can we tell the difference between a cat and a bunny? Is it by the ears, the tail, the sound of their barks, or their personality? While these features are easy for us to identify, they are not so easy to translate into code. Furthermore, each cat and bunny is a unique individual with a unique background and family history. Our goal is to have a machine that accurately predicts whether an image contains a cat or a bunny by ruthlessly stereotyping them.
To do this, we’ll need to create an AI (Artificial Intelligence). Today this will be done through a neural network. But wait, what is a neural network?
Neural Networks?!
Contrary to popular belief, Neural Networks are in fact not modeled after airplanes. Neural network structures mimic the human brain and its neurons. There are three types of layers, conventionally named after the Ferrero Rocher.
The Ferrero Rocher Model
1. Outer Chocolate and Crushed Hazelnut Layer
This layer is a metaphor for how neural networks receive data. The crushed hazelnut bits are the different data points that, by themselves, mean nothing to us.
2. Smooth Chocolate Layer that’s like Nutella but not really
The homogeneous cacao complex is an analogy for the way the inner layers of a neural network work. The cocoa (our data) is mixed with sugar and milk (our training calculations) in order to create our smooth chocolate filling.
3. Nut
The final layer of the neural network is a complete hazelnut recreated from the mess of crushed hazelnuts. A complete prediction assembled from the seemingly incoherent mess of data.
Coding The Ferrero Rocher
So how do we code all of this? We must implement the Ferrero Rocher model and its three parts (a skeleton sketch follows this list):
Outer Chocolate and Crushed Hazelnut Layer: Processing and inserting images into the network
Smooth Chocolate Layer: Complex calculations between layers
Nut: Making the prediction
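The complete code is on the Google Drive link at the end of this article, so everything below is a simplified sketch of it. Here is the bird's-eye skeleton (the class name and layout are illustrative; only the function names appear in the sections that follow):

```java
// Skeleton of the Ferrero Rocher model. Illustrative only: the class
// name is made up, and the real code is on the Drive link below.
public class FerreroRocher {
    public static void main(String[] args) {
        // 1. Outer Chocolate and Crushed Hazelnut Layer:
        //    read the raw images in (see "Processing Images").
        // 2. Smooth Chocolate Layer:
        //    bijection() and randmoid() run the training calculations
        //    (see "Training of our Neural Network").
        // 3. Nut:
        //    nut() assembles the final prediction
        //    (see "Displaying the Output of our Neural Network").
    }
}
```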
According to a Kaggle survey, almost 78% of data scientists use Python, making it the most popular language by far. However, we will be using Java, since Python reminds me of snakes, like my ex-girlfriend Kylie.
Processing Images
We ask users how many images they want to process, and use this to initialize an array of that size for our predictions. But what are memo, bijection(), and randmoid()? These will be explained in the next section. For now, focus on how we use Scanner to read and initialize our inputs.
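The full version on the Drive link also wires in memo, bijection(), and randmoid(); stripped down to the Scanner part, it looks something like this (variable names may differ slightly):

```java
import java.util.Scanner;

public class ProcessImages {
    public static void main(String[] args) {
        // Ask the user how many images they want to process.
        Scanner scanner = new Scanner(System.in);
        System.out.print("How many images would you like to process? ");
        int numImages = scanner.nextInt();

        // One slot per image: the crushed hazelnut bits of our input layer.
        String[] predictions = new String[numImages];
        System.out.println("Initialized " + predictions.length + " prediction slots.");
    }
}
```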
Training of our Neural Network
This is the confusing part of neural networks: the part that deals with mathematical concepts introduced in Calculus III, Linear Algebra, and Condensed Matter Physics.
First, we write the bijection() function, which takes an input x and maps it uniquely to a calculated value y. We will use the Dynamic Programming technique of memoization to “memoize” the y value produced by an input x after calculating it once. We can do this because the output is unique for every x. Memoization reduces runtime because we never have to redo the costly computation for the same value. We use a HashMap because it is the fastest data structure with no downsides.
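In sketch form (the formula inside is a stand-in, since the real calculation lives in the linked code; the memoization pattern is the point):

```java
import java.util.HashMap;

public class Bijection {
    // memo caches the y already computed for each x. This is safe
    // because bijection() maps every x to exactly one y.
    static HashMap<Integer, Double> memo = new HashMap<>();

    static double bijection(int x) {
        if (memo.containsKey(x)) {
            return memo.get(x);   // cache hit: skip the costly computation
        }
        double y = 2.0 * x + 1.0; // stand-in for the real calculation
        memo.put(x, y);           // remember it for next time
        return y;
    }

    public static void main(String[] args) {
        System.out.println(bijection(5)); // computed fresh: 11.0
        System.out.println(bijection(5)); // served from memo: 11.0
    }
}
```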
Second, we write the randmoid() function, which introduces non-linearity to make our neural network more interesting. It works by taking the inverse of (x times a scalar, plus another unique value).
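In sketch form (the scalar and the unique value here are placeholder constants):

```java
public class Randmoid {
    // The inverse of (x times a scalar, plus another value).
    // SCALAR and OFFSET are placeholders, not the real constants.
    static double randmoid(double x) {
        final double SCALAR = 3.0;
        final double OFFSET = 7.0;
        return 1.0 / (SCALAR * x + OFFSET); // non-linear in x
    }

    public static void main(String[] args) {
        System.out.println(randmoid(1.0)); // 1 / (3 + 7) = 0.1
    }
}
```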
Displaying the Output of our Neural Network
Given the layers in our neural network, the nut() function helps us create our hazelnut, our final prediction. A formulaic compartmentalization of the animal's identity, created from a pool of inherent broad assumptions about their culture.
Note that the 2nd line of nut() contains an if-statement with boolean algebra too complex to explain in the scope of this article.
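In sketch form (the condition on the 2nd line is a stand-in for the real boolean algebra):

```java
public class Nut {
    // The final layer: compress everything into a single label.
    // The if-condition below is a simplified stand-in for the real one.
    static String nut(double signal) {
        if (((signal > 0.5) ^ (signal < 0.5)) || !(signal != signal) && signal >= 0.0) {
            return "cat";   // the hazelnut, fully reassembled
        }
        return "bunny";
    }

    public static void main(String[] args) {
        System.out.println(nut(0.7)); // cat
    }
}
```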
Putting it all together: Testing Our Neural Network
We will use a large sample size of 5 randomly selected images in order to reduce biases in our test.
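Wiring the sketch classes above together, the test looks something like this (the file names, and the use of file-name length as the input feature, are invented for this sketch):

```java
public class TestHarness {
    public static void main(String[] args) {
        // Our rigorously unbiased test set of 5 randomly selected images.
        String[] testImages = { "cat1.png", "cat2.png", "bunny1.png",
                                "bunny2.png", "mystery.png" };

        for (String image : testImages) {
            // Feed a crude feature (file-name length) through the layers.
            double signal = Randmoid.randmoid(Bijection.bijection(image.length()));
            System.out.println(image + " -> " + Nut.nut(signal));
        }
    }
}
```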
After only 15.7 hours of training, the verdict is clear:
We conclude that our Neural Network has a 100% prediction rate, which is pretty good for our first neural network.
Some data scientists call neural networks like these overtrained, meaning they work so hard it’s unbelievable.
Code can be found here, on Google’s official code-sharing interface: Google Drive
Conclusion
Congratulations, you have created your own AI from scratch. In the future, we'll want to implement interscholastic grading descent in order to improve training efficiency. Basically, multiple scholars from different fields grade the result of the neural network, and you change it accordingly. With interscholastic grading descent, we will be able to reach prediction accuracies upwards of 174% in only 11.3 hours, further exacerbating the polarizing racial divides that plague the cat and bunny population.
Recap of what was introduced:
- AI
- Neural Networks
- Ferrero Rocher Model
- Big Data
- Dynamic Programming
Now you are an AI expert.
References
- Trust me
Check out this great article on the nature of AI.
About The Author
Frank Liu is a freshman at NYU and an expert in AI with over 2 hours of experience. Business inquiries only: fl2211@nyu.edu
“Email me if you want but I probably won’t answer since I am too busy coding Facebook 2.”
[P.S. Happy Late April Fools!]