The conception, and even implementation, of artificial intelligence (A.I.) is by no means a recent invention, yet it is, by all means, a modern phenomenon. Conscious machines proliferate in popular entertainment of all forms, and the implementations of A.I. powering everyday software and technologies today presage an exciting future of advanced image processing, self-driving cars, and so much more.
How does artificial intelligence work? How is software capable of interpreting, acting on, and predicting human behavior? Quite simply, how do machines learn?
In this article, I provide a beginner’s guide to the world of A.I. More specifically, I introduce its method of implementation, machine learning. I give an overview of machine learning and a sampling of its approaches. In the end, I list resources for learning more and getting started in machine learning.
ARTIFICIAL INTELLIGENCE — TWO TYPES
The idea of artificial intelligence is relatively familiar to many, but popular representations are actually misleading. In practice, there exist two different types of A.I. — strong and narrow.
STRONG — LIKE HALO’S CORTANA
Strong artificial intelligence mimics the human brain. The term is strictly theoretical; there has never been a real implementation of strong A.I. Nevertheless, strong A.I. is perhaps the most familiar type, thanks to popular entertainment.
Master Chief’s partner Cortana from the Halo video game series is one such representation: a virtual, intelligent piece of software with the full range of human logic, emotion, and self-consciousness. If you’re not familiar with Cortana, think of your favorite sci-fi medium (the friendly droids of Star Wars, the self-determining androids of HBO’s Westworld; the possibilities are endless) and you can consider yourself familiar with a representation of strong A.I.
NARROW — MORE LIKE MICROSOFT’S CORTANA
In reality, modern technology is only capable of narrow artificial intelligence: specific implementations of a particular human cognitive ability. Any software or program in existence today is really an implementation of narrow A.I.
The real Cortana, as in Microsoft’s Windows helper program named after the same Halo character, is one such example. Microsoft’s Cortana serves a single, intelligent function — it receives a command from a Windows user and assists that user in executing a particular feature of the operating system. Beyond Cortana, however, Facebook’s face-tagging, Netflix’s recommender system, and Google’s speech-to-text exemplify just a small number of modern narrow A.I. implementations. The device you’re reading this article on right now includes numerous narrow A.I. features.
Note that in each of the above examples, there is a constant format. Face-tagging receives an input, a photo, and identifies any faces in that photo. Recommender systems look at a single block of data, played and searched media information, to make a recommendation. Speech-to-text receives one input, voiced words, and outputs their textual format. This input-output, information-classification format forms the basis of the method behind narrow artificial intelligence — machine learning.
MACHINE LEARNING — HOW WE “COOK UP” A.I.
You can think about machine learning as you would cooking. When nighttime comes around and you get hungry, you have a goal — eat dinner. Initially, this goal is too vague, so you have to specify it. What do you want to eat for dinner? Let’s say that you decide to eat pasta.
Great, so you’ve taken a goal consisting of a multitude of options, dinner, and chosen to achieve a specific, narrow implementation of it — pasta. Now, you need to go about actually acquiring the pasta. How are you going to do so? You need a method. To make meals, we use the method of cooking.
Machine learning is to artificial intelligence as cooking is to meals. Just as we would not describe cooking as meals, machine learning cannot be described as artificial intelligence. Instead, machine learning is the method we use to go about building, or “cooking up,” A.I.
In general, machine learning is the process of using existing data to make decisions. More specifically, machine learning is the training of models using statistical analysis to take data and produce a particular, corresponding output.
The goal of machine learning is to produce models. To build a model, you start by supplying initial data to a statistical algorithm. This data is referred to as training data, and the process of supplying it is referred to as fitting the model. Once a model has been fit with training data, it can make predictions on future data based on the information it already knows.
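To make the fit-then-predict pattern concrete, here is a toy sketch in Python. The class and its API are hypothetical (not from any real library), but they mirror the idea: fitting stores what the model learns from training data, and predicting applies that knowledge to new input.

```python
from collections import Counter

class MajorityClassModel:
    """Toy model: predicts whatever label was most common in the training data."""

    def fit(self, labels):
        # "fitting" = learning from the training data and storing the result
        self.most_common = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self):
        # "predicting" = applying what was learned to produce an output
        return self.most_common

model = MajorityClassModel().fit(["spam", "ham", "spam"])
print(model.predict())  # -> spam
```

Real models are, of course, far more sophisticated, but they follow this same two-step rhythm: fit on training data, then predict on data the model has never seen.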
Machine learning succeeds when it produces models that are relevant and useful for humans. There are four significant types of machine learning:
SUPERVISED— LIKE LEARNING LANGUAGES
Supervised learning relies on labeled, or classified, information. When an input is given to a model trained with supervised learning, it comes with a set of specific, defined qualities. As the model builds up a database of such qualified data, it becomes able to classify future input based on the information it already has access to. When the predicted output is a category, this analysis is referred to as classification; when it is a continuous value, it is referred to as regression.
The method is not so different from humans’ understanding of language. Each language has its own phonetics and spelling. As we learn words from different languages, we learn to detect these differences. Then, upon hearing a new word, we can draw on our known vocabulary to qualify it. We’re not always correct; for instance, we might mistake a Portuguese word we do not know for Spanish. Yet, after making that mistake, we know from then on that the word is Portuguese; in other words, that information is classified and added to our “language” database.
This analogy exemplifies the very idea behind supervised learning — the more information a model is provided, the more information it knows, and the better it is able to qualify and predict future input.
UNSUPERVISED — LIKE ASSEMBLING A JIGSAW PUZZLE
In contrast to supervised learning, unsupervised learning, as you might be able to guess, relies on unlabeled data. Unsupervised models are provided an entire dataset of unclassified information with the simple instruction of finding patterns, referred to as clusters.
Think of unsupervised learning models as a jigsaw puzzle. When you begin working on a jigsaw puzzle, you dump out the entire contents of the box. Initially, this is just a big mess of unidentifiable pieces. To assemble the puzzle, you build groups of pieces based on particular features (edge pieces, corner pieces, matching shapes or colors) that help you connect them. This focus on features to create clusters is analogous to dimensionality reduction in unsupervised learning models, the technique of filtering out irrelevant data.
It is important to note that the goal of unsupervised learning is not necessarily to complete the puzzle, but simply to determine the patterns that permit classification of each piece. The idea is that such clusters reveal useful insights for analyzing a given dataset.
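The clustering idea can be sketched in a few lines of Python. This is a toy, one-dimensional version of the standard k-means algorithm (the data points are made up): it repeatedly assigns each point to its nearest cluster center, then moves each center to the mean of its cluster.

```python
def kmeans_1d(points, k, iterations=10):
    centers = points[:k]  # naive initialization: use the first k points
    for _ in range(iterations):
        # assignment step: put each point in the cluster of its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # update step: move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans_1d([1.0, 2.0, 10.0, 11.0], k=2)
print(sorted(centers))  # -> [1.5, 10.5]
```

Note that no labels were ever provided: the model discovered on its own that the data falls into a "low" group and a "high" group.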
SEMI-SUPERVISED — JUST LIKE IT SOUNDS
Semi-supervised machine learning combines methods of both supervised and unsupervised learning. This is useful for applications such as web content classification and image and speech analysis where large amounts of data need to be clustered before being classified.
REINFORCEMENT— LIKE PLAYING A VIDEO GAME
Reinforcement learning strives to create self-sustaining models that improve over numerous interactions with a given set of data. Models use an exploration/exploitation technique, essentially a try-and-see approach. For each action the model performs, it receives a corresponding positive or negative reward signal. This reward signal is recorded, so that the next time the model interacts with the data, it knows which actions are preferable to others.
Reinforcement learning is best understood and visualized as a video game. Say that you are playing a platforming game such as Super Mario. The first time you attempt a level, you are unfamiliar with its layout. It might take some time to figure out how to proceed, and Mario may even die from an ill-timed fall or hit.
The second time you play that level, you have been exposed to the layout. You may still make a mistake or die, but you will probably complete it a little faster. The third time, even faster. By the 20th time, you are completing the level pretty quickly. By the 100th time, you are flying through — you know exactly what to expect and what to do. Similarly, reinforcement models learn the optimal way to interact with a given dataset based on many failed previous iterations.
The video game visualization is more than just a helpful analogy: reinforcement learning models are directly utilized in video games for non-player character behavior. They have also been used to create intelligent board-game-playing systems, such as the famous AlphaGo computer program.
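The try-and-see loop described above can be sketched as a tiny "multi-armed bandit" problem in Python (the reward values below are made up): the model records the reward signal for each action and increasingly exploits whichever action has paid off best so far.

```python
import random

def run_bandit(true_rewards, steps=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    estimates = [0.0] * len(true_rewards)  # learned value of each action
    counts = [0] * len(true_rewards)
    for _ in range(steps):
        if rng.random() < epsilon:         # explore: try a random action
            action = rng.randrange(len(true_rewards))
        else:                              # exploit: best-looking action so far
            action = max(range(len(true_rewards)), key=lambda i: estimates[i])
        reward = true_rewards[action] + rng.gauss(0, 0.1)  # noisy reward signal
        counts[action] += 1
        # record the reward by updating a running average for this action
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

estimates = run_bandit([0.2, 0.8, 0.5])
print(max(range(3), key=lambda i: estimates[i]))  # index of the best-looking action
```

Just like Mario's hundredth run through a level, after enough interactions the model knows exactly which action to take: its estimate for the second action (true reward 0.8) ends up highest.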
ALGORITHMS — THE RECIPES
Now that I’ve introduced what machine learning is and the four significant types, let’s delve into the actual details behind training the models described above.
Allow me to re-introduce my cooking metaphor. Recall that we started with a broad, general goal — eating dinner — before specifying it down to a particular method — cooking pasta. We’re still not quite ready to make pasta at this point. Simply knowing that we need to cook is not enough. To actually cook the pasta, we need a recipe.
In machine learning, recipes can be thought of as algorithms. Data scientists use algorithms drawn from statistics and mathematics to implement machine learning models. There are many algorithms (we’re talking about the entirety of statistical academia here) with new ones being formulated every day, but the following are a few of the most relevant to machine learning methods.
LINEAR REGRESSION — DOES THIS AFFECT THAT?
- Type: Supervised
- Goal: Correlation
Linear regression attempts to determine correlation among variables. Linear regression models receive information, or an independent variable, and determine how correlated that information is with a particular classification, or dependent variable. Correlations can be calculated between just a single independent and dependent variable (simple linear regression), or several (multiple linear regression).
Each independent variable is given a weight, with more heavily weighted variables representing more significant predictors of a particular classification. Once a significant correlation among variables has been established, a model trained on linear regression can make predictions on future inputs according to these weighted features.
Have you ever wondered how two things relate? For instance, how the number of hours spent studying might influence performance on a test? Or how the number of cups of coffee drunk in a day affects alertness? Such cause-and-effect questions form much of research and are exactly suited to testing with linear regression, making it one of the most common and relevant algorithms.
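As a sketch, here is simple linear regression in plain Python, using the standard least-squares formulas on a made-up hours-studied vs. test-score dataset:

```python
def fit_line(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # least-squares slope: covariance of x and y divided by variance of x
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

hours = [1, 2, 3, 4, 5]        # independent variable: hours studied
scores = [52, 58, 65, 70, 77]  # dependent variable: test scores (made up)
slope, intercept = fit_line(hours, scores)
print(round(slope, 1), round(intercept, 1))  # -> 6.2 45.8
print(round(6 * slope + intercept, 1))       # predicted score for 6 hours -> 83.0
```

The slope is the weight on the independent variable: here, each additional hour of studying is associated with roughly 6.2 more points, and once fit, the model can predict scores for inputs it has never seen.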
K-NEAREST NEIGHBORS — IF IT LOOKS AND SMELLS LIKE ONE…
- Type: Supervised
- Goal: Classification
K-nearest neighbors (k-NN) algorithms classify input based on similarities to existing information. When a k-NN model receives data, it groups each piece based on how closely it matches the data surrounding it, or its “neighbors.”
The “k” in k-NN is a variable representing the number of neighbors to base classifications on. For instance, in 1-NN methods, data is classified based on the single closest neighboring data. In 5-NN methods, data is classified based on its five nearest neighbors — the classification is made according to the majority features of those five neighbors (note that since classifications are based on majority-rule, odd-numbered methods are preferred).
Think of k-NN methods like a map. If you were trying to determine which hemisphere of the globe a country is located in, you could look to its bordering, or neighboring, countries. For example, with a 1-NN method, you could determine that the United States is in the northwestern hemisphere because one of its closest neighbors, Mexico, is also in the northwestern hemisphere.
Note, however, that this 1-NN method would not always work. Venezuela, for example, which is also in the northwestern hemisphere, borders Brazil, which is (mostly) in the southwestern hemisphere. Thus, for this model, we might want to raise our k variable to 3: by comparing Venezuela not just to Brazil but also to its neighbors Colombia and Guyana, you could successfully determine that it lies in the northwestern hemisphere.
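Here is a bare-bones k-NN classifier in Python (the 2-D points and labels are made up): a new point is classified by majority vote among its k closest labeled neighbors.

```python
from collections import Counter
import math

def knn_classify(training, point, k=3):
    # training: list of ((x, y), label) pairs
    by_distance = sorted(training, key=lambda item: math.dist(item[0], point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]  # majority label among the k nearest

training = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
            ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_classify(training, (2, 2), k=3))  # -> A
print(knn_classify(training, (8, 7), k=3))  # -> B
```

Just as with the hemisphere example, the unknown point takes on the label of the group it sits closest to, and using k=3 rather than k=1 makes the vote more robust to one misleading neighbor.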
DECISION TREES — IF THIS THEN THAT
- Type: Supervised
- Goal: Prediction/Classification
Decision trees, another supervised learning algorithm, determine the value of an input based on a cascading set of conditionals. A decision tree algorithm takes a dataset and splits it into different paths according to related attributes.
The first attribute test represents the base of the tree, with each cascading path representing a branch. Each possible final outcome, or point at the bottom where a prediction is made, is referred to as a leaf.
Having established a tree of attribute tests, a decision tree model can take any input and make a prediction of its value by running it through the paths. For instance, in the example pictured above, the model receives an initial input, outlook, and can then make a statement on whether that outlook will result in a humid, overcast, or windy day.
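A decision tree's cascading conditionals map naturally onto nested if/else statements. Here is a hand-written sketch in Python, using hypothetical weather rules in the spirit of the classic "play tennis" example (this tree is written by hand for illustration; a real decision tree algorithm would learn the splits from data):

```python
def predict_play(outlook, humidity, windy):
    # root node tests the outlook attribute; each branch cascades further
    if outlook == "sunny":
        return "don't play" if humidity == "high" else "play"
    elif outlook == "overcast":
        return "play"                    # leaf: overcast days are always a go
    else:                                # rainy
        return "don't play" if windy else "play"

print(predict_play("sunny", "high", False))    # -> don't play
print(predict_play("overcast", "high", True))  # -> play
print(predict_play("rainy", "normal", True))   # -> don't play
```

Each `if` is an attribute test (a node), each branch is a path down the tree, and each `return` is a leaf where the prediction is made.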
DEEP LEARNING — VIRTUAL SYNAPSES
- Type: Semi-supervised
- Goal: Prediction/Classification
The hot-button technique in machine learning at the moment, and the closest (although still quite far off) we’ve gotten to strong A.I., deep learning algorithms mimic the human brain’s system of neurons to transform input.
Just as the human brain sends electrical signals from neuron to neuron to perform particular actions, deep learning networks consist of many different layers of processing units. The input starts at the first layer and is modified by each successive layer until it is finally processed and executed at the end.
This data transformation model can handle complex inputs and is thus on the forefront of such machine learning technology as speech- and image-recognition. Take speech-handling, for instance. Remember the language-recognition example from linear regression? A neural network could consist of several layers of such models — one that recognizes the language, one that interprets the words, one that contextualizes the words with each other, etc. — before finally understanding the statement voiced and performing a corresponding action.
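To illustrate the layer-by-layer transformation, here is a minimal two-layer forward pass in plain Python with hand-picked weights (purely illustrative; real networks learn their weights from data). These particular weights happen to compute the XOR of two binary inputs:

```python
import math

def layer(inputs, weights, biases):
    # each unit: weighted sum of its inputs plus a bias, squashed by a sigmoid
    return [1 / (1 + math.exp(-(sum(w * x for w, x in zip(ws, inputs)) + b)))
            for ws, b in zip(weights, biases)]

# forward pass for the input pair (1, 0)
hidden = layer([1.0, 0.0],                  # input layer values
               [[6.0, 6.0], [-4.0, -4.0]],  # hidden-layer weights
               [-3.0, 6.0])                 # hidden-layer biases
output = layer(hidden, [[8.0, 8.0]], [-12.0])
print(output[0] > 0.5)  # -> True (XOR of 1 and 0 is true)
```

The input enters the first layer, each hidden unit transforms it, and the result flows into the next layer until a final output emerges, exactly the neuron-to-neuron relay described above, just with two tiny layers instead of many deep ones.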
Machine Learning Diagram:
BIAS — MACHINES ARE ONLY HUMAN CREATIONS
Machine learning methods are producing incredible technology promising a future ripe with possibilities, and maybe, possibly, one day, successful implementation of strong artificial intelligence.
Still, as the field of machine learning advances, it is important to note its current drawbacks, notably bias. Studies show that bias persists among current implementations of machine learning algorithms. For example, a 2014 examination of LinkedIn revealed that its job-recommendation algorithms were less likely to advertise high-paying jobs to women than to men. Another study, conducted in 2017 on a risk assessment program used to classify US prisoners, revealed that black defendants were erroneously labeled as likely to re-offend at twice the rate of white defendants.
With advanced machine learning methods, it can be easy to forget that artificial intelligence is not objective: it is produced by humans. Moving forward, it is crucial that the software development and machine learning fields increase representation and employ a diverse workforce. The 2018 Stack Overflow Developer Survey revealed that the current software development industry is highly skewed demographically, with about 93% of respondents being men, 74% being of white or European descent, and 93% identifying as heterosexual.
Reduction of bias in machine learning is a complex issue rooted in a long human history of discrimination, but it starts with proper representation in the workforce.
GETTING STARTED — RESOURCES

So, how can you get started with machine learning? First and foremost, you’ll need a strong foundation in math.
Machine learning methods are truly statistical and mathematical methods. Here are some resources to start becoming familiar with linear algebra and the fundamental statistical algorithms:
- The Mathematics of Machine Learning
- A Gentle Introduction to Linear Algebra
- The Hitchhiker’s Guide to Machine Learning in Python
- Which machine learning algorithm should I use?
As you begin to feel familiar with the statistics and math behind machine learning, you’ll need to decide which programming language to use. Here are the four most popular languages in machine learning, in order from most to least popular:
- Python — A versatile programming language offering multiple machine learning packages, including TensorFlow, scikit-learn, and pandas.
- Java — Often used in enterprise programming and front-end desktop applications; not recommended for first-time programmers.
- R — Used primarily for statistical computing, especially in academia.
- C++ — Optimal for game or robot applications.
Finally, here are a few resources to actually go about beginning to code with machine learning algorithms:
- Udemy— Machine Learning A-Z
- Coursera — Stanford Machine Learning
- Kaggle Machine Learning Tutorial
- And for a more expansive list: Best Machine Learning Resources for Getting Started
Now, go out and change the world! Just, as you help us get closer to implementations of strong A.I., try to avoid a Westworld-type situation, please? 😅