In Algorithms We Trust
Machine Learning, deep neural networks, general artificial intelligence, a series of tubes!!! Oh buzz words, how casually you get thrown around in the valley and beyond. It seems that every system, product, or app is trying to integrate some form of machine learning or artificial intelligence into their tech, or at least be able to include the terms in their marketing materials. Like so many other people getting into tech, I’ve heard a lot of these terms thrown around and personally not been sure what artificial intelligence meant beyond HAL9000 or the terminator. I’ve also, at times, been too afraid to ask for fear of looking dumb and/or getting a three-hour symposium on mathematics that are way over my head. Thankfully as I am advancing along my path into software engineering I have the opportunity to get past those negative/unhelpful feelings and learn so much more about a variety of topics including Artificial Intelligence.
One of the concepts that they teach at Holberton is the Feynman Technique, which is the idea that if you have truly learned something you should be able to explain it to someone else in simple and concise terms. With that in mind, I would like to take a high level look into just what machine learning is and explain it to the best of my ability to someone that is not already in the field, perhaps not even in STEM. To kick things off let me reassure you that the idea of a terminator or a matrix-like intelligence is still a long way off. The concept of Artificial General Intelligence or AGI was first theorized at the beginning of AI research in the mid-1950s. Even at the time with technology that today we think of as ridiculously antiquated, like mainframe computers taking up whole rooms and running on magnetic tapes, a few researchers predicted that true artificial intelligence was only 20 to 30 years away. Today’s experts concede we are still decades away from a computer that could be considered to be thinking and learning on a human level.
“On a human level? But I thought computers were way faster and smarter than us. Just look at all the games they have won lately like AlphaGo, DOTA, or Jeopardy.”
Well yes, computers are very good, and very fast at specific things, but they are very bad at generalized intelligence which is what we humans have. You as an individual can pick up a tool you have never held before and with a bit of input from your senses and a bit of experimentation you can probably figure out what that tool is used for. Compare this to a computer that still can’t beat captchas where the letters and numbers are too squiggly or perhaps inverted.
So if general artificial intelligence is still years and years off, then what are we talking about? Well, a subset of artificial intelligence research known as machine learning. Machine learning in its most basic form is much like traditional computer science in that you are using software engineering to create a system that accomplishes a specific task very well, though we will see that it gets very complicated quickly. When we are talking about machine learning we are talking about the algorithms that are constantly tracking you and trying to in some way influence you or improve your life. What you see on social media? Served up by machine learning algorithms to keep you scrolling. What’s up next on Netflix? That’s right, brought to you by machine learning. Did you google a product? Be prepared for cookies to chase you across the web and shopping sites to target you with complementary products that others’ similar preferences have trained the system to recommend. Thanks machine learning…
So the tl;dr so far is that AI as the movies show it is Artificial General Intelligence, which is built to think and learn just like us though with computer speed and efficiency, but this is at least decades off. Machine learning is the creation of algorithms that can learn or find patterns but are mostly tied to very specific tasks. So then, how does machine learning work? Well, let’s take a look at the three most common flavors of machine learning: supervised, unsupervised, and reinforcement.
The basis of supervised learning is training your machine learning algorithm to map input data to desired outputs, using data that has already been labeled by humans. An example of this could be the classic dog/not dog computer vision app. To train your algorithm, you give it ten million images that humans have labeled as either dog or not dog. Over time it should develop the ability to understand the fundamental features of a dog so that when you feed in novel data such as images of a cat, a parakeet, or even Snoop Dogg, it will correctly identify that none of those are in fact dogs. The first key is that you need a LOT of labeled data to get started, as your algorithm will need to pore over large sets of data to learn. If you have used a computer in the last ten years, and of course that is a silly hypothetical as you are reading this on Medium, you’ve surely answered a captcha test involving pictures, and that means you have helped label data by telling the system which pictures contain cars, or people, or clouds, etc. If you’ve noticed an uptick in the questions for buses, stop lights/signs, and cars, well, you can thank self-driving vehicle research for that, and they should thank you for helping improve vehicle safety and intelligence.
So supervised learning is training a system on labeled data but how do we go about doing that? Let’s go over a few approaches to supervised learning:
Linear Regression, in its simplest form, is the best-fit line. If you take a look at the scatter plot below you’ll see there is a best-fit line which is drawn as close as possible to all points on the graph. The distance between where a point of data is, and where it would fall if it were directly on our best-fit line, is known as the error. When we draw this line we want to minimize error. We can also see a positive correlation (the line running up and to the right), a negative correlation (the line running down and to the right), or some other non-correlation such as a flat or vertical line.
Linear regression is one of the simplest and most widely used supervised machine learning algorithms. It is a great way to predict an answer based on input data. For instance, if we know what the independent value of a new data point is, we can make an educated guess about what its dependent variable value will be based on our linear regression line. Linear regression can feature a single best-fit line or multiple lines for more complex data. The line or lines also don’t need to be straight but can be curved via algebraic formulas to find the very best fit. One important thing to note when we talk about finding the ‘best fit’ is the problem of overfitting. Overfitting occurs when a model is tuned too specifically to a particular data set. This means that the model is no longer generalizable: it does well on the data it was trained on but poorly on new data.
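To make the best-fit line a little more concrete, here is a minimal sketch in plain Python. It uses ordinary least squares, which is one common way to minimize the error described above. The function names and the hours-studied data are made up for illustration; they don’t come from any particular library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is how much x and y vary together, divided by how much x varies.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Educated guess for the dependent value at a new x."""
    return slope * x + intercept


# Made-up example data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 68, 71]

slope, intercept = fit_line(hours, scores)
# The error for each point: actual value minus where the line says it should be.
errors = [y - predict(slope, intercept, x) for x, y in zip(hours, scores)]
guess_for_six_hours = predict(slope, intercept, 6)
```

The positive slope here is the "up and to the right" line from the description above, and `guess_for_six_hours` is the kind of educated guess the line lets us make for a point we have never seen.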
Decision Trees, again considering the simplest form, are a flow chart of options that lead to binary true/false or yes/no conclusions. If you have ever used a dating site, or done an online personality quiz, you can think of how that flow might look as each question branches and your answers get you closer to a definitive match. This sort of model is best used for classification or predicting answers based on set inputs. You can think of each decision that has options as branches, and the conclusion it reaches when there are no more options as leaves. When you have a very complicated model that combines many trees, each with its own options and starting points, it is often called a forest.
For our final example of a supervised learning algorithm, let’s look at the K-nearest neighbor setup. Much like the decision tree model, K-nearest neighbor is best employed for classification. The K-nearest neighbor algorithm takes a newly plotted point on a graph and k, an integer for how many of the closest points you want to include. Imagine you have two groupings of data representing voters. There are left-leaning and right-leaning voters grouped in their respective camps along an x-axis of political viewpoint. When we plot a new point towards the middle, meaning a more or less centrist voter, we still want to know which way they may lean if we are canvassing for a political party. To do this we take k=5 and find the 5 closest existing points of data to our new voter data point. If three points are registered democrats, and only two are republicans, our model will label our newly plotted voter as a likely democrat. As with our other examples, this is a very simplistic representation of what this algorithm can do and it is only working with two parameters, but it demonstrates a use case.
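Here is a tiny sketch of that flow chart idea in Python. Fair warning: this tree is written by hand, with questions I made up, whereas a real decision tree algorithm would pick its questions by studying the labeled data. It is only meant to show branches and leaves.

```python
# Each branch is a dict asking a yes/no question; each leaf is a plain string.
# The questions and labels below are invented for illustration.
tree = {
    "question": "has_fur",
    "yes": {
        "question": "barks",
        "yes": "dog",
        "no": "not dog",
    },
    "no": "not dog",
}


def classify(node, answers):
    """Follow yes/no branches until we land on a leaf."""
    while isinstance(node, dict):
        branch = "yes" if answers[node["question"]] else "no"
        node = node[branch]
    return node


result = classify(tree, {"has_fur": True, "barks": True})  # -> "dog"
```

Notice that an animal without fur never gets asked whether it barks. The tree only asks the questions on the path it actually walks.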
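The voter example can be sketched in a few lines of Python. The positions on the political axis and the function name are my own invention, chosen so that the five nearest neighbors of a centrist voter split three to two, just like the description above.

```python
from collections import Counter


def knn_predict(labeled_points, new_x, k=5):
    """Label a new point by majority vote among its k closest neighbors."""
    # Sort existing voters by distance to the new voter and keep the k closest.
    nearest = sorted(labeled_points, key=lambda p: abs(p[0] - new_x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up data: (position on a left/right axis, registered party).
voters = [
    (-0.9, "democrat"), (-0.7, "democrat"), (-0.4, "democrat"),
    (-0.2, "democrat"), (-0.1, "democrat"),
    (0.15, "republican"), (0.3, "republican"),
    (0.6, "republican"), (0.8, "republican"),
]

centrist = knn_predict(voters, 0.0, k=5)  # -> "democrat" (3 votes to 2)
```

Picking an odd k is a common trick with two categories, since it rules out a tied vote.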
How machines learn
So we’ve just covered a few examples of supervised models and if you want to learn more, this is a great intro with examples of the algos as well as some python code for setting them up:
But everything we have covered so far at least appears to be something that a human could have just coded out, why go to all the trouble of machine learning if you can just write out the answers yourself? Well, the examples we have gone over are very very simple representations of what algorithms are capable of. If we consider something like your taste in media, be it movies, books, or music things get very complicated very fast. Imagine a graph that tries to plot the most recent 1000 songs you’ve listened to via the streaming service of your choice. Songs are represented by genre, beats per minute, tags for mood, and how popular the song has been with other users this week. That right there is a 5-dimensional graph and I have no idea how I would wrap my squishy meat brain around building that, but to a machine learning model that has been trained on all that data, it’s no problem.
This leads us to our discussion of the ‘black box’ of machine learning. When it comes to the level of complication that most serious machine learning systems are dealing with, the engineers that work on the systems understand the inputs and the outputs but the ‘how did we get here’ is often beyond human comprehension. Any sensible person may ask, how in the world could that possibly work? How could we not know how our programs are running? This is a completely reasonable question if you, like me, are just getting into software and your life is often subsumed by knowing every excruciating detail of how your code works in order to track down bugs. What helped me in understanding the black box of machine learning, and how it can come to pass that we don’t know what is going on under the hood, is thinking of these algorithms in terms of biological evolution.
When we build machine learning systems, often we are not directly building the bots themselves but rather we are building a system that will create prototypes from a basic model and then institute random changes to create an army of versions. Each time our creator bot spawns this mass of versions from our origin bot they are fed the training data, which you will recall is labeled already, and they are given an opportunity to ‘learn’ by being tested on how well they sort the labeled data. In order for them to be tested and graded we also must build out a tester bot. The version bots that perform the best on the test go back to the creator bot, where they become the prototypes for the next generation, which will be spawned with random changes in their decision-making code. The rest will be recycled. This process is repeated much like genetic evolution: random changes in the code, testing to see what is an improvement, high performers passing on their code to the next generation, and the cycle continues. A truly excellent short video on this was produced by CGP Grey:
Alright, so if all of that was supervised learning, then what is unsupervised learning? Well, it’s building models that will take in unlabeled data and draw conclusions or make inferences from patterns that the algorithm finds itself. From a more complicated viewpoint, we can say that unsupervised learning is the process of mining data for patterns. The key is that the data is unlabeled, so we are using our machine learning models to look for patterns that we are unaware of. If we think back to our 5-dimensional representation of the last 1000 songs you listened to on Youspotunes, we can think of that in terms of supplying data to a recommendation engine. Our recommendation engine may be an unsupervised model that is built to look for the patterns in your musical taste and compare your data to millions of other users to recommend the best new tracks to you and others like you. A model like this one will be given massive amounts of information about what tracks you listened to and their characteristics, how long you listened, how many repeats, how often, etc. By finding intricate patterns that are beyond our understanding it will generate playlists that should appeal to you. Another example is the algorithmic trading of stocks, which accounts for the majority of stock trading today. Systems that consume terabytes of stock information daily make predictions about the market billions of times a day and trade on tiny fractions of a cent to rapidly turn a profit.
Unsupervised learning can seem like even more of a black box than supervised, as the machine is finding the patterns all on its own and we rarely understand them without a deep dive into the data and results. But much like pattern recognition allowed us to assign stories to the stars and begin to grasp a sense of time and travel, perhaps the unsupervised learning models will help us unlock a deeper understanding of patterns we never knew existed. In that spirit, I hope you have already submitted your participation in our decennial American big data project, aka the census. That data will be pored over by algorithms for the next several years looking for patterns in economic changes, the mobility of people, population makeups, etc. as we work to forge our national path forward.
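One classic way to find groups in unlabeled data is k-means clustering. I am swapping it in here purely as an illustration, since I have no idea what any real recommendation engine runs. The sketch below works on a single made-up feature (beats per minute) and nobody tells it which songs are ‘slow’ or ‘fast’; it just notices that the numbers fall into two clumps.

```python
def kmeans_1d(values, k=2, steps=10):
    """Group numbers into k clusters (k >= 2) with a basic k-means loop."""
    # Start the centers spread evenly between the smallest and largest value.
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(steps):
        # Step 1: assign every value to its nearest center.
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Step 2: move each center to the average of its group.
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers, groups


# Made-up, unlabeled data: beats per minute for eight songs.
bpm = [62, 68, 70, 74, 120, 124, 128, 132]
centers, groups = kmeans_1d(bpm, k=2)
# centers -> [68.5, 126.0]: a mellow cluster and an upbeat cluster.
```

With five features per song instead of one the distance calculation changes, but the assign-then-average loop stays the same.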
Finally, we have reinforcement learning and the building of neural networks for deep learning. This branch of machine learning is focused on making models that more closely represent how we as humans take in information, process it, reinforce learning from rewards, and come to conclusions. While this is much closer to Artificial Intelligence we are still working with limited scope and not talking about creating AGI.
So what is a neural network? Well, in short it is building a network of nodes that process information. A typical neural network will have one or more input nodes that feed data into the network, one or more sets of ‘hidden’ nodes that assign weights to the inputs and then add bias to their outputs which feed the next layer, and finally one or more output nodes that receive the processed data and make inferences or decisions. To better explain this, imagine what it would take to build a neural network that could drive a car in a parking lot. For simplicity’s sake let’s say that lot is just a testing ground so nothing else is moving. There are no out-of-control shopping carts or oblivious pedestrians, just aisles of cars and a path to navigate up and down the aisle until we reach the desired point. In fact, the parking lot is just a maze that we must navigate. For this example, we would only really need to look left, right, and forward, as the 5 point vector representation in the above image represents.
These five distance readings, each telling us how far we are from an object, are the five input nodes in our basic neural network, and from them we can calculate whether we need to go hard left, forward left, straight, forward right, or hard right. The hidden layer will receive data from each of our distance readings and assign a weight to each based on the job of the hidden node, each of which represents an option in our direction of travel. Each hidden node also has a bias value it will add on to the sum of the inputs * weights. Finally, all the data from the hidden nodes is fed to our output node, which calculates which way we need to steer the car.
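Here is roughly what that looks like as code: a minimal sketch of one pass through the network, assuming hand-picked weights and biases that I made up rather than learned ones. To keep it short it skips the activation functions a real network would apply, and the output step simply picks whichever hidden node scored highest.

```python
DIRECTIONS = ["hard left", "forward left", "straight",
              "forward right", "hard right"]

# One row of weights per hidden node, one weight per distance reading.
# These numbers are invented: each node mostly cares about open space
# in its own direction and a little about the directions next to it.
WEIGHTS = [
    [1.0, 0.5, 0.0, 0.0, 0.0],
    [0.5, 1.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 1.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 1.0, 0.5],
    [0.0, 0.0, 0.0, 0.5, 1.0],
]
BIASES = [0.0, 0.0, 0.5, 0.0, 0.0]  # a slight preference for going straight


def hidden_layer(distances, weights, biases):
    # Each hidden node: sum of inputs * weights, plus that node's bias.
    return [sum(d * w for d, w in zip(distances, node_weights)) + b
            for node_weights, b in zip(weights, biases)]


def steer(distances, weights=WEIGHTS, biases=BIASES):
    scores = hidden_layer(distances, weights, biases)
    # Output node: choose the direction whose hidden node scored highest.
    return DIRECTIONS[scores.index(max(scores))]


# Distances to the nearest object, ordered hard left through hard right.
turn = steer([9, 1, 1, 1, 1])  # wide open on the far left -> "hard left"
```

All of the ‘knowledge’ lives in the `WEIGHTS` and `BIASES` numbers, which is why training a network comes down to finding good values for them.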
So much like our supervised learning origin bots needing to go through successive generations to improve, most neural networks don’t hit things out of the park on their first try. They also need to improve over time and that means iterating on the weights and biases that the hidden nodes utilize. When it comes to improving a neural network we could use what’s called backpropagation, which is the process of comparing the network’s output to the correct answer and feeding the error back through the network, from the output toward the input, adjusting the weights and biases along the way to reduce future errors.
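Full backpropagation walks the error back through every layer, which is more than I can fit in a short example. The sketch below shows the core adjustment step for just a single node with one weight and one bias, assuming a squared-error measure and a made-up learning rate. The training samples follow the rule y = 2x + 1, which the node is never told directly.

```python
def train(samples, learning_rate=0.05, epochs=500):
    """Nudge one weight and one bias to shrink the prediction error."""
    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            prediction = weight * x + bias
            error = prediction - target
            # For squared error (0.5 * error**2), the slope with respect to
            # the weight is error * x, and with respect to the bias is error.
            # Step each value a little in the direction that reduces error.
            weight -= learning_rate * error * x
            bias -= learning_rate * error
    return weight, bias


# Made-up labeled samples that follow y = 2x + 1.
samples = [(0, 1), (1, 3), (2, 5), (3, 7)]
weight, bias = train(samples)
# weight ends up very close to 2 and bias very close to 1.
```

In a network with hidden layers, the same idea applies to every weight and bias; backpropagation is the bookkeeping that works out how much of the final error each one is responsible for.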
For a great intro to developing a neural network for a game check out Jabrils series:
Ok, last but probably foremost in terms of importance, we need to talk about the bias elephant in the room. No, not the Republican Party, bias in machine learning. With machine learning models becoming a larger portion of the systems that have a profound impact on our lives, it is important for our society to reckon with its biases and make sure that we do not encode them into systems that are making important macro choices for us. Quick side note here: there is mathematical bias, and then there is societal bias. Mathematical bias is the issue of data being misunderstood or misrepresented so that all values are impacted and the results erroneous. A textbook definition would be: “A systematic (built-in) error which makes all values wrong by a certain amount.” Societal bias is what you think of when I mention bias, in that some group is under- or over-represented to the detriment of another. Societal bias in engineering is rarely the result of hand-wringing malevolence; it is far more likely to be a sin of omission where a homogenous or near-homogenous team just did not consider the ‘other’ when designing their product or system.
Most of the examples that we discussed so far have been rather innocuous, things like recommendation engines for music or movies, simple decision trees, or a basic self-driving car, but what about the big things like resource allocation, facial recognition, medical research, or even criminal justice? Well here is where things get heavy. While there will always be those that will debate this, I think there is little doubt that our societies have biases and that all women to some degree and all people of color to a high degree face systemic bias. While in the most extreme forms these biases are clearly displayed in the waving of Confederate flags, misogynistic comments, or use of racial epithets, the more subtle and insidious forms are things like micro-aggressions, suppression of upward mobility through lack of opportunity, lack of school funding, lack of credit, etc. While it is unlikely you’ll come across a team of Nazi-engineers trying to make a truly evil machine learning model, it has already been shown that current biases in things like policing get directly molded into policing technology. This piece from ProPublica is a good examination of how our current opinions on who is likely to be a repeat criminal, and thus deserving of higher bail and a longer sentence is informed by, race, class, education level, neighborhood or residence, and other factors.
On a spring afternoon in 2014, Brisha Borden was running late to pick up her god-sister from school when she spotted an…
A Model is only as good as its data
Another good example of bias that is easily baked into a system with no ill intention from the engineers has to do with the data that we train with. Go google ‘grandma’ in Google Images right now, I’ll wait. So how many faces of color did you see? When I did this test on July 5th, 2020, I checked the first 50 results and there were 46 white faces, 2 black, and 2 Asian. The second image was from an article about an older Japanese grandma’s birthday, which was nice both as a feel-good piece and for being the second result, but it was not until the 35th result that I saw the first black face, and then not again until result 45. Not great when you consider I am in the United States and a full 13% of our population is African American, 26% LatinX, and 8% Asian. In 50 photos this would mean I should have seen at least 12 LatinX abuelas whereas I found none. Extrapolate this to millions and millions of data points and it is easy to see where any non-white face could be treated as novel data. It’s no wonder that when researcher Joy Buolamwini was working on bias in this field the models she worked with could not even recognize her black face. At a much worse level, the lack of proper training and misapplication of facial recognition technology can lead to dangerous results for people of color, like a man in Detroit who was recently wrongly arrested based on a failed facial recognition system that misidentified him based on his DMV photo and a CCTV image from a robbery.
Bias is real in our lives and our society. Systemic racism is real and it doesn’t take outward violent racist acts of terror to systemically disadvantage a minority group. We are a deeply flawed people and we are on the verge of massive technological leaps that will shape our lives and reinforce the models we build, so we must get them right. This means having more voices present in the design and implementation of machine learning models so that the tough questions can be asked and points raised that a homogenous team may never consider. It is so crucial that we broaden the pool of people in STEM and tech so that we can avoid building deeply biased systems that are potentially dystopian in nature. Whether it’s fairly allocating resources, colonizing additional celestial bodies, or building a new artificial intelligence, we must be our very best and inclusive selves or we will only perpetuate the problems and biases of humanity to the extreme.
Now, I opened this section with the adage that a model is only as good as its data, and that can be true, but it is not the full story. While we should absolutely consider our datasets when building ML systems, we won’t always have great data, and so we must work to fight the bias of incomplete data with better models.
Whew, that was a lot to cover and it barely scratches the surface of this topic. You could, and many people do, spend your whole life learning about AI/ML and still not get through half of it. There is so much going on in this growing field of technology but we really need to step forward with a firm footing in ethical representation and the just use of machine learning. We need a more inclusive tech industry that is supported by a better more equitable national education system. We need to continue to educate people that we are not building HAL9000 but rather the systems that work behind the scenes to serve you with information and choices, and that those systems are self-reinforcing so it is possible to end up in an echo chamber/feedback loop that it is up to the individual to break out of. Finally, we often don’t fully understand how our own systems reach their conclusions and that is important to know as we plot our path forward. While computers are not nearly as broadly smart as us, they are extremely focused and fast, and the level of complexity we are working with is already beyond much of our understanding. There is so much more to learn in this field but we must never trade efficiency for ethics, and we always must be prepared to step back and ask how we can make a system better for everyone.
Thanks for reading!