Machine Learning is for Humans

How to do it and teach it too

Elissa Levy
Educate.
15 min read · Sep 8, 2021


A machine learning algorithm was trained to recognize headphones and a water bottle, which it correctly identified. It couldn’t recognize my face, because it wasn’t trained to do so.
Output from a test run on Google’s Teachable Machine

Note: this post was written jointly by Greg Benedis-Grab and Elissa Levy, to show how we teach the math of machine learning to high schoolers and to high school teachers.

Introduction

We (Greg and Elissa) met while teaching coding to high school students with Upperline Code. About a year later, Greg was developing workshops on computational thinking for STEMteachersNYC and Elissa was studying the math behind machine learning. Neither of us had found materials online that were sufficiently complex to be meaningful, but also written comprehensibly for a high school (or general public) audience. So we came together and created a teacher workshop on machine learning. We both share our experiences teaching and learning on Medium, and we thought since we ran a workshop together it might make sense to blog together. This post is an integration of our blogging styles and is posted on both of our blogs. To see more of what we’ve written, check out Greg Benedis-Grab’s page and Elissa Levy’s page.

What is machine learning?

Machine learning (ML), a subset of artificial intelligence, involves computer algorithms that are repeatedly adjusted based on large input datasets. Our lives are affected by machine learning algorithms in myriad, often imperceptible ways: movie recommendations, text auto-complete, fraud detection, facial recognition, medicine, and more. Machine learning is all over the media: it’s a hot topic (and hot-button topic!) these days. Programs abound for college students and adults who want to write ML algorithms. As we both teach high school students, we are always trying to figure out what is worth knowing and what topics are most relevant to our students. Given how prevalent machine learning is in the world around us and the injustice it can perpetuate if not used carefully, this important topic needs to be unpacked and better understood by everyone. And it is indeed quite unpack-able: machine learning concepts rely only on high-school-level algebra.

The first slide of our teacher workshop materials stated:

You (teacher, student, citizen) have the ability, the right, and some obligation to understand the math behind the algorithms that affect our lives. It is more knowable than the public discourse implies.

So let’s learn some machine learning!

Getting a taste: the black box

In math and science, a “black box” typically refers to something we can’t “see” inside, but whose behavior we can learn about by putting things into it and looking at what comes out. It’s similar to the concept of a function. Before we walk through the math of machine learning (that is, before we open this black box), let’s get a feel for what goes in and what comes out.

The best tool we’ve found for a fast, interactive intro is Google’s Teachable Machine. Click “image project” > “standard image model” and then upload photos from two classes (categories). Elissa took some photos of her headphones and some other photos of her water bottle. She trained the algorithm and then showed it some images, and the algorithm did a great job predicting the class each image belonged to! Note that the algorithm was uncertain when there were neither headphones nor a water bottle in the frame, because we had not created a class for objects that were neither headphones nor water bottles.

Training data used in Google’s Teachable Machine
Testing the algorithm on Google’s Teachable Machine

Playing with this project and others on the Teachable Machine site gives you a feel for the power and the limitations of machine learning. Take some time to explore the site and make your own image classifier. In our workshop, the participants explored the tools and noticed patterns. Exploring the black box is a key part of the inquiry process, because it enables us to build intuition and play before trying to develop a model of what’s happening. (Analogously, in NGSS-driven science classes, students explore phenomena first, and then figure out the explanation.) So now, let’s tackle the math behind ML.

Doing the actual math

We have both toyed with the math of machine learning using computer code, mostly in Python. Machine learning relies on megabytes of data and billions of calculations, which of course are much more easily handled by computers than by humans. While you need programming experience to write ML code, you don’t need any coding experience to understand it — only high school algebra. Let’s open the black box with an example.

Imagine you’re in charge of the algorithm that turns the street lamps on your block on and off. You’ve been asked to make sure the light is off during the day, and on at night. You have access to a light sensor, which records values between 0 and 100. You install the sensor nearby and record its measurements at random times over the next few weeks, whenever you happen to walk by: you can play with the data here. If you could program a computer to light the street lamp based on the output of the sensor, what rule would you use? (Think before reading further.)

When we ask students or teachers this question, there are generally two categories of response: (a) partition the data with a cut-off value, and (b) set specific values to “on” and specific values to “off.” Can you think of the pros and cons of each answer?
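To make the two kinds of answers concrete, here is a sketch of each in JavaScript (the threshold and the recorded readings are invented for illustration):

```js
// Approach (a): partition the data with a cut-off value.
// A low reading means it's dark out, so the lamp should be on.
function lampOnCutoff(sensorValue) {
  const THRESHOLD = 50; // an invented cut-off
  return sensorValue < THRESHOLD;
}

// Approach (b): memorize which specific readings mean "on."
const onReadings = new Set([3, 7, 12, 18, 25]); // invented nighttime readings
function lampOnLookup(sensorValue) {
  return onReadings.has(sensorValue);
}
```

Rule (a) gives an answer for readings it has never seen before; rule (b) matches the recorded data perfectly but has nothing sensible to say about a new value, which previews the overfitting issue below.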

This is machine learning: how do you take an input (light sensor value) and turn it into a decision (lamp on or off)? In this case we have only one input. The power of machine learning comes when there are multiple inputs. So let’s add another.

You may have already realized that a single sensor is going to be an imperfect measurement of whether it’s daytime. You could get a value that is incorrectly low if (for example) a bird sat on the sensor. You could get a value that is incorrectly high if (for example) a car’s headlights shone on the sensor at night. You don’t want to “overfit” your data; there will inevitably be values that your algorithm incorrectly predicts. Answer (b) above is an example of overfitting. In general, the influence of outlier data decreases with more inputs, but it’s still critical to avoid overfitting.

Fortunately for you, in this hypothetical scenario, there was another light sensor just a block away, recording data every time you pressed the “record” button on your first sensor. How lucky! If you plot the data from sensor 1 on the horizontal axis and the data from sensor 2 on the vertical axis, and you use the color of the dot to represent daytime versus nighttime, a clear pattern emerges.

Note: if this example were real, you would expect the values of sensor 1 and sensor 2 to be correlated — because they’re both measuring the amount of light on the same street. Our teacher workshop participants rightly expected the data to lie mostly on a straight, positively sloped line, instead of being scattered all around. For this scenario, we asked teachers to suspend their disbelief and imagine the two sensors really were uncorrelated. In ML, the more uncorrelated (but still generally accurate) our inputs are, the more accurate our predictions will be. We’re working on a better example for the next time we run this workshop.

The points that are in red and blue are our training data. Looking at this plot, if someone were to give you a new point (not in the training data set), would you be able to tell if the point should be red (daytime) or blue (nighttime)? For example, what color would you make the point (70, 10)? What about (90, 90)? What rule are you using to make this determination?

It’s relatively intuitive that you can draw a partition to separate the blue points from the red points. You can draw it by hand. If you’re familiar with Desmos, open this plot and see if you can graph what you drew.

Let’s draw a line for now, to separate the two categories. (We’ll come back later to the meaning of other curves we might draw.) Let’s write the equation of this line to match the form ax + by - c = 0. A straightforward way to do this is to choose two points on the line and then find the equation of the line.

The slope of this line is (40 − 100)/(80 − 40) = −3/2. The y-intercept is found using y = (−3/2)x + b with one of the points plugged in: 40 = (−3/2)(80) + b → b = 40 + 120 = 160. The line is y = (−3/2)x + 160. Moving all the terms to the same side, we have (3/2)x + y − 160 = 0.

Let’s think about the physical significance of this equation. If the red dots were all on the right and the blue dots on the left, then the line would be much steeper. The coefficient in front of the y would be close to 0, which is another way of saying that the data from sensor 2 is irrelevant. If instead the red dots were all on top and blue on the bottom, then the line would be flat. The coefficient in front of the x would be close to 0, meaning that sensor 1’s data is useless in determining day versus night. Instead, we have an in-between case where both sensors’ inputs are useful, though the effect of sensor 1 is slightly stronger (coefficient 3/2 is bigger than 1).

The -160 tells us how much the line is shifted away from (0,0). In machine learning, this is called a bias.

Neural net for our streetlight example, with no hidden layers

We just constructed our first neural net! The green circles are our input nodes, in this case the values of sensor 1 and sensor 2. The gray square is our bias, which is −160. If we were given a new value, say sensor 1 = 70 and sensor 2 = 10, then we would compute (3/2)(70) + (1)(10) − 160 = −45. Because −45 is less than 0, our point lies below our line, and so it must be a nighttime value.
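In code, this single-node network is just a weighted sum and a sign check. Here is a minimal JavaScript version using the weights from the line we found above:

```js
// Weights and bias taken from the line (3/2)x + y - 160 = 0
const w1 = 3 / 2;   // weight on sensor 1
const w2 = 1;       // weight on sensor 2
const bias = -160;

// Positive sum means the point is above the line (daytime); negative means below (nighttime).
function classify(sensor1, sensor2) {
  const sum = w1 * sensor1 + w2 * sensor2 + bias;
  return sum > 0 ? "day" : "night";
}

classify(70, 10); // 105 + 10 - 160 = -45, so "night"
classify(90, 90); // 135 + 90 - 160 = 65, so "day"
```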

Now, what if the partition between our two categories weren’t a line? If for some reason our sensor data worked out like the image below, then we might want to partition with a circle or an ellipse. In this case, we can’t do our calculation the same way, because the equation we’ve been using is the equation of a line. To get more sophisticated, we need to add what are called “hidden layers.”

The following diagram was inspired by Matt Mazur’s description, which also walks through a more detailed calculation.

For each layer of the calculation, we take the input value from each node, multiply it by a weight, sum the results, and add a bias. If you have hidden layers, a given sensor may be weighted differently for each node in that layer. The more nodes you have in a hidden layer, and the more hidden layers you have in your network, the more detailed your partition between categories will end up being. To see how well a set of calculations can match a given data set, play around with the TensorFlow Playground. We will return to this tool later in the post.
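To make the layer-by-layer arithmetic concrete, here is a minimal JavaScript forward pass through one hidden layer with two nodes. All the weights and biases are arbitrary placeholders, and we include the nonlinear “activation” function (here a sigmoid) that real networks apply at each node; without that nonlinearity, stacked layers could still only draw straight-line partitions:

```js
// One hidden layer with two nodes; every number here is an arbitrary placeholder.
const hidden = [
  { weights: [1.5, 1.0], bias: -160 },
  { weights: [-0.5, 2.0], bias: 30 },
];
const output = { weights: [1.0, -1.0], bias: 0 };

// A common activation function; squashes any sum into the range (0, 1).
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

function forward(inputs) {
  // Each hidden node: weighted sum of the inputs, plus its bias, then the activation.
  const hiddenOut = hidden.map((node) =>
    sigmoid(node.weights[0] * inputs[0] + node.weights[1] * inputs[1] + node.bias)
  );
  // The output node does the same thing to the hidden nodes' outputs.
  return sigmoid(
    output.weights[0] * hiddenOut[0] + output.weights[1] * hiddenOut[1] + output.bias
  );
}

forward([70, 10]); // a number between 0 and 1: the network's prediction
```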

When the pros use machine learning algorithms, it’s rarely the case that they can look at the data set and “see” the partition between categories like we did with the streetlight example. In most cases, there are many more than two inputs (“sensors”). Perhaps we are trying to predict the creditworthiness of a given individual, so our inputs include their age, education, income, marital status, spending history, etc. Or perhaps we are writing a facial recognition algorithm, and our inputs are the red, green, and blue values of each pixel in an image. Remember that each input is a separate axis; after three inputs we’re really at a loss when it comes to visualizing it. Instead, we have the machine iterate towards weights that work. The algorithm is pre-seeded with random weights. Each known value is run through the system, and the weights are tweaked to make the algorithm’s output for that particular input match the known output. It takes a lot of time to calculate this by hand, but it’s a valuable activity. If you’re interested, Matt Mazur’s blog post will walk you through the steps. When this tweaking process is done with thousands of discrete inputs, the weights generally plateau at stable values that enable accurate categorization of future inputs. This process of updating weights is called backpropagation.
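To give a flavor of this tweaking without the calculus, here is the simplest version of the idea, the classic perceptron update rule, applied to our two-sensor example. (Full backpropagation, as in Matt Mazur’s post, uses derivatives to push corrections back through hidden layers; this sketch handles only our no-hidden-layer network.)

```js
// Pre-seed with random weights and bias.
let w1 = Math.random(), w2 = Math.random(), bias = Math.random();
const learningRate = 0.01;

// trainingData: array of { s1, s2, label } with label 1 for day, 0 for night.
function trainOnce(trainingData) {
  for (const { s1, s2, label } of trainingData) {
    const prediction = w1 * s1 + w2 * s2 + bias > 0 ? 1 : 0;
    const error = label - prediction; // -1, 0, or +1
    // Nudge each weight in the direction that reduces the error.
    w1 += learningRate * error * s1;
    w2 += learningRate * error * s2;
    bias += learningRate * error;
  }
}
```

Run trainOnce over the data enough times and the weights plateau at a line much like the one we drew by hand.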

Beginner-friendly ML in p5.js

In our workshop, we showed a way to implement machine learning in a beginner-friendly coding environment called p5.js. The code in this section was developed by Greg, based on the work of Dan Shiffman, who has a wonderful video series on machine learning in p5 that explains the concepts in detail and provides numerous examples.

The ml5 library is a JavaScript package designed to be used with p5.js. There is a lot of complexity in the underlying code, including TensorFlow, a standard package for implementing many flavors of machine learning, and tensorflow.js, which does the many tricks needed to make all of this run in the browser. The bottom line is that with ml5 we can create a machine learning model, feed it data, train it, and then use it to make predictions. So in a way we are returning to the black box, now that we have more insight into what is happening inside it. In theory, these calculations could be done by hand, but it would take ridiculously long; this activity further builds our intuition about how machine learning works, while relying on the computer to do the calculations that would take too long by hand.

The code to instantiate the ml5 model is fairly simple. The easiest way to include the library is to add the following line to your HTML code:
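(This is the include suggested in the ml5 documentation; the version tag may change.)

```html
<script src="https://unpkg.com/ml5@latest/dist/ml5.min.js"></script>
```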

Then you need to create the ml5.neuralNetwork object which I have called brain.
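Following the ml5 documentation and Dan Shiffman’s examples, that setup looks something like this (Greg’s actual sketch may differ in details):

```js
let brain;

function setup() {
  createCanvas(400, 400);
  const options = {
    inputs: ['x', 'y'],     // the two inputs: mouse coordinates
    outputs: ['label'],     // the category we want predicted
    task: 'classification',
    debug: true,            // shows the training-progress panel during training
  };
  brain = ml5.neuralNetwork(options);
}
```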

In this case, we will be creating something sort of silly to hopefully gain further insight into machine learning. Many web pages implement hover functionality, where the formatting of a button changes when you mouse over it: it may change shade, or the text color might shift. This functionality is efficiently performed algebraically, meaning you use the (x, y) coordinates of the mouse to determine whether you are hovering over the button. In the case of a rectangle, you might use a series of conditionals. For example:
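(The button’s position and size here are invented for illustration.)

```js
// Invented button geometry: top-left corner (bx, by), width bw, height bh
const bx = 100, by = 150, bw = 120, bh = 40;

function draw() {
  if (mouseX > bx && mouseX < bx + bw && mouseY > by && mouseY < by + bh) {
    fill('lightblue'); // hover formatting
  } else {
    fill('white');     // normal formatting
  }
  rect(bx, by, bw, bh);
}
```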

However, what if we wanted to use a machine learning approach to this problem? It may not be a useful example in the real world, but it illuminates the topic.

Instead of telling the computer the conditions of the hover as we did in the above code, let’s provide the computer a series of examples, meaning a series of coordinate pairs. For each pair, we will also tell the computer whether the hover condition should display.

For example:
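(The coordinates and labels below are invented; in ml5, each example becomes a call to addData.)

```js
// The point (120, 170) is over the button, so the hover state should show;
// the point (350, 20) is not.
brain.addData({ x: 120, y: 170 }, { label: 'hover' });
brain.addData({ x: 350, y: 20 }, { label: 'no-hover' });
```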

Then, from this data, the model will be trained through backpropagation, revising the coefficients with each point. Let’s get this program working. First we need a dropdown menu to select the color of the point we are adding to our training data.
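p5.js has a createSelect() element for exactly this; a minimal version, added to the setup() from above, might be:

```js
let colorPicker;

function setup() {
  // ...canvas and neural network setup from above...
  colorPicker = createSelect();
  colorPicker.option('red');
  colorPicker.option('blue');
}
```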

Then we need some code to add the point to the model.
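A sketch of that step: the mouse position is the input, the dropdown’s current value is the label, and we also keep a plain array of points so we can draw them later (the array name is ours):

```js
const trainingPoints = [];

function mousePressed() {
  const inputs = { x: mouseX, y: mouseY };
  const target = { label: colorPicker.value() };
  brain.addData(inputs, target); // feed the example to the model
  trainingPoints.push({ x: mouseX, y: mouseY, label: target.label }); // remember it for drawing
}
```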

We also want the training points to show up on the screen. For that I used the following
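(A sketch of the idea, inside draw(); the dot size is arbitrary.)

```js
function draw() {
  background(220);
  for (const pt of trainingPoints) {
    fill(pt.label);         // 'red' and 'blue' are valid p5 color names
    circle(pt.x, pt.y, 10);
  }
}
```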

So with this implemented we can click some points on the screen with different colors.

Next, we need to train our model. For that, we create another button to train.
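Using p5’s createButton, roughly:

```js
let trained = false;

function setup() {
  // ...earlier setup code...
  const trainButton = createButton('train');
  trainButton.mousePressed(() => {
    brain.normalizeData();              // scale the inputs before training
    brain.train({ epochs: 50 }, () => { // 50 epochs is an arbitrary choice
      trained = true;
    });
  });
}
```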

The epochs option specifies how many times all the training data is run through the backpropagation step. When you run the training, you get a popup panel that shows the progress.

After the training is complete we can use the model to predict what the color should be at various locations on the canvas. To make things simple we will just change the background color of the screen based on the model’s prediction at the given mouseX, mouseY location. This is how you implement that.
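Here is the shape of that code; ml5’s classify() is asynchronous, so the prediction arrives in a callback:

```js
function draw() {
  if (trained) {
    brain.classify({ x: mouseX, y: mouseY }, (error, results) => {
      if (!error) {
        // results is sorted by confidence; the top label is 'red' or 'blue'
        background(results[0].label);
      }
    });
  }
  // ...then draw the training points on top, as before...
}
```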

The approximate circle that I tried to draw had a few flaws. I should have used more blue points as some of the gaps confused the model in those areas. However, it still worked reasonably well. If you play with it for a while you can get more intuition into the working of neural networks. You can see the corner cases where the model makes predictions that might not match the intuition of a human looking at the screen. This hands-on experience leads to a richer understanding.

Check out Greg’s p5.js sketch! If you are comfortable with p5.js go ahead and remix this work and see how you can modify it.

The TensorFlow Playground app, which we referenced earlier in this article, is a sleeker version of Greg’s simple tool that gives you more control over the number of neurons and layers in the network.

TensorFlow Playground app

This web app also provides more quantitative information such as the coefficients for each of the neurons.

Why did we do all this?

Our audience consists of high school teachers and students who will probably never design or implement a machine learning algorithm for general use. But our premise remains as articulated earlier, that everyone has the ability, the right, and some obligation to understand the math behind the machine learning algorithms that affect our lives. Once we understand the basics of how ML works, we can question its outcomes — particularly the outcomes that are unjust.

In general, machine learning algorithms are good at categorizing future data in the same way that the training data was categorized. For example, one machine learning algorithm did a great job distinguishing between images of criminals and non-criminals, but it turned out that what the algorithm was actually doing was distinguishing whether the person was smiling. (Criminals generally don’t smile in their mug shots.) What makes this dangerous is that we often don’t notice what the algorithm is actually selecting for, because the weights in the calculations don’t have intuitive meaning to humans. The 3Blue1Brown series on neural networks, built around a digit classifier (video 1, video 2, and video 3), shows how an ML algorithm can be developed to identify a handwritten digit (0 through 9). The videos show how the weights in the algorithm aren’t necessarily doing the same things your brain would do to figure out which digit you’re looking at. Because we can’t readily interpret the steps of the algorithm, it’s harder to identify cases where its predictions might not match our expectations.

Machine learning is not going away. It’s too efficient to use computers instead of people to make the billions of decisions that get made daily in our international economy. Fortunately, advocacy is increasing to ensure these algorithms are designed responsibly. The most popular book we know on the topic is Weapons of Math Destruction — definitely a must-read. Interesting, relevant articles include Texas (Houston) teachers unfairly fired because of AI, Makeup to fool facial recognition, AI sending people to jail and getting it wrong, and Unethical AI chatbot in South Korea. We recommend you follow Joy Buolamwini and Jordan Harrod. For some lessons, check out MIT case studies and Ethical CS reflection modules.

Conclusion

Making sense of complex things is fun, and it makes the world more knowable. This can be empowering for students. The ethics challenges of machine learning are real, and we need to empower our students to ask the questions that will hold institutions accountable. Our goal is to help teachers and students alike make meaning and take ownership of the math that powers these algorithms. We will keep iterating and updating, and we look forward to running our workshop again.
