A Brief Introduction to Machine Learning Methods

Partap Singh · Published in Geek Culture · 6 min read · Oct 7, 2021

“If you can’t explain it to a six year old, you don’t understand it yourself.”
Albert Einstein

Photo Credit: Greg Rakozy on Unsplash

This past spring, I was fortunate to take a Machine Learning class through Stanford taught by Charlie Flanagan, a Data Scientist @ Google. The mission of the class was to “lower the barrier to use ML techniques in everyday business settings” and it certainly delivered on that. He explains, “As artificial intelligence technologies proliferate, they are becoming a priority for businesses that want to maintain a competitive edge. Tools and techniques to unlock the power of business data have become extremely accessible to all.” In that same spirit, I have provided a brief recap of the class below. Let’s begin…

Linear regression is perhaps the simplest and best-known approach to machine learning (ML). It is a way of mathematically sorting out which variables have an impact on a particular outcome. It answers questions such as: If I change “x,” how does that impact “y”? Which factors matter most and which can we ignore? How do those factors interact with each other? One challenge: overfitting. One solution: dividing your dataset into parts. First, there is the training dataset, the sample of data used to fit the model. Next, the validation dataset, the sample used to provide an unbiased evaluation of a model fit on the training dataset while tuning the model. Lastly, the test dataset, the sample used to provide an unbiased evaluation of the final model. When it comes to regression, sometimes we need to scale the output in a way that makes the most sense (i.e. the answer should be limited to Yes (1), No (0), or something in between). In these cases, we use a Logistic regression model to translate the data points to the appropriate scale. When there are many variables, we can fit the coefficients with an iterative optimization procedure called Gradient Descent, which starts with random values for each coefficient and repeatedly adjusts them to minimize the error of the model on the training set.
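For the curious, here is a minimal sketch of that workflow in Python using scikit-learn. The data is synthetic and the 60/20/20 split is just one common choice, so treat it as an illustration rather than a recipe.

```python
# Minimal sketch: train/validation/test split plus a logistic regression model.
# The data is synthetic; real projects would load an actual dataset.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                  # three input features ("x")
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # binary outcome ("y": Yes=1 / No=0)

# First split off the test set, then carve a validation set out of the remainder.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

model = LogisticRegression()   # coefficients are fit by an iterative optimizer
model.fit(X_train, y_train)    # learn from the training set only

# Tune against the validation set; report final performance once on the test set.
print("validation accuracy:", accuracy_score(y_val, model.predict(X_val)))
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```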

Next, on to tree-based models. These are more common than you might think and are used every day to determine things such as who gets approved for a car or home loan, through a series of cascading questions with binary outcomes (much like a tree branching). Random forests, or random decision forests, operate by constructing a multitude of decision trees at training time and outputting the average prediction of the individual trees (or their majority vote, for classification). Each tree in the forest is built from a random sample of the training data, and at each node a random subset of features is considered for splitting. Random forests can be quite fast and efficient and are able to handle missing data. However, they usually cannot predict beyond the range of the training data and tend to overfit datasets that are particularly noisy. Any ML model’s performance will also decline gradually over time, so models need to be retrained periodically.
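As a rough illustration (not part of the class materials), here is what a random forest might look like in scikit-learn. The “loan application” features and the approval rule below are invented purely for the example.

```python
# Toy random forest for a made-up loan-approval problem.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
# Hypothetical applicant features: income, credit score, debt-to-income ratio.
X = np.column_stack([
    rng.normal(60, 15, 500),    # income (thousands)
    rng.normal(680, 50, 500),   # credit score
    rng.uniform(0, 0.6, 500),   # debt-to-income ratio
])
y = ((X[:, 1] > 650) & (X[:, 2] < 0.4)).astype(int)  # 1 = approved (toy rule)

# 100 trees, each trained on a bootstrap sample with random feature subsets at each split.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

applicant = [[55, 700, 0.25]]  # a new application to score
print("approve?", forest.predict(applicant)[0])
print("feature importances:", forest.feature_importances_)
```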

Then, we covered Natural Language Processing (NLP). The ultimate objective of NLP is to read, decipher, understand, and make sense of human language in a manner that is valuable. Today, this technology is used in everything from speech recognition to chatbots. Google has a tool called AutoML Natural Language that uses machine learning to analyze the structure and meaning of the written word. It can train a custom machine learning model to classify documents, extract information, or even understand the sentiment of authors (still a work in progress). We also experimented with GPT-3 (Generative Pre-trained Transformer 3) from OpenAI, an AI research organization co-founded by Sam Altman and Elon Musk (who later resigned due to concerns about the technology’s safety). GPT-3 generates text using algorithms pre-trained by crawling through the vastness of the internet, including the entire text of Wikipedia. If you ask it a question, you get a detailed answer. It can even write a poem if you ask nicely…
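AutoML Natural Language and GPT-3 are managed services, but the underlying idea of training a model to classify text can be sketched in a few lines with plain scikit-learn. The tiny labeled dataset below is made up for illustration and is nowhere near what a real system would need.

```python
# Toy text classification (sentiment) sketch; a hand-rolled illustration of the
# idea, not Google's AutoML Natural Language or GPT-3.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "I loved this class, it was fantastic",
    "What a great and useful course",
    "This was a waste of time",
    "Terrible experience, very disappointing",
]
labels = [1, 1, 0, 0]   # 1 = positive sentiment, 0 = negative

# Turn words into numeric features, then fit a simple classifier on top.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["loved it, fantastic course"]))   # likely: [1]
```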

Named entity recognition (NER) is an information extraction technique that automatically identifies named entities in a text and classifies them into predefined categories. Entities can be names of people, organizations, locations, times, quantities, monetary values, percentages, and more. NER improves the speed and relevance of search results and recommendations by summarizing descriptive text, reviews, and discussions. Word embeddings are a form of word representation that bridges the human understanding of language to that of a machine and are used by many of the apps we use day to day, such as DoorDash and Airbnb. We also covered recommender systems, such as those used to suggest movies or dating profiles based upon patterns in user behavior (i.e. similar swipees are clustered together). The input is a user query and the output is a probability vector with size equal to the number of items, representing the probability the user will interact with each item; for example, the probability they will swipe on a dating profile or select a movie. One famous example is Netflix, which years ago ran a competition offering a $1 million prize for anyone who could make its recommendation engine 10% more accurate, using a dataset with over 100 million ratings of 17,770 movies from 480,189 customers. Ironically, the Netflix recommendation engine was already so accurate that they didn’t even use the winning code, given that it “did not seem to justify the engineering effort needed to bring them into a production environment.”
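As a toy illustration of that query-to-probability-vector idea, here is what the scoring step of a recommender might look like. The embeddings are random stand-ins rather than anything a trained model would produce.

```python
# Toy recommender scoring step: a user query scored against every item,
# producing a probability vector over the catalogue.
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 5, 8
item_embeddings = rng.normal(size=(n_items, dim))   # one vector per movie/profile
user_query = rng.normal(size=dim)                   # embedding of the user's query/history

scores = item_embeddings @ user_query               # similarity of the user to each item
probs = np.exp(scores) / np.exp(scores).sum()       # softmax: probability vector over items

print("interaction probabilities:", np.round(probs, 3))
print("top recommendation: item", int(np.argmax(probs)))
```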

Lastly, we covered one of the most exciting topics of all: Neural Networks. Essentially, they are AI systems based on simulating connected “neural units,” loosely modeling the way neurons interact in the brain. Since the 1940s, scientists have studied computational models inspired by neural connections, but they have gained popularity more recently as computer processing power has increased drastically. AI practitioners refer to these techniques as “deep learning,” since neural networks have many (“deep”) layers of simulated interconnected neurons. On the more sophisticated end, there are Convolutional neural networks (CNNs), in which the connections between neural layers are inspired by the organization of the visual cortex, the portion of the brain that processes images; this makes them well suited for perceptual tasks. We had a guest speaker from Alphabet’s autonomous driving arm, Waymo, show us how this works in real time. Another technique, known as Reinforcement learning, trains systems by allocating virtual “rewards” or “punishments,” essentially learning by trial and error. Google DeepMind uses reinforcement learning to develop systems that can play games, in some cases better than even the most acclaimed human players. As daunting as it seems, it all comes down to finding a function to fit the data. Tools like Google’s TensorFlow have also made neural networks far more accessible, with little coding required, and there has recently been a surge in low-code/no-code AI models. There are also Generative adversarial networks (GANs), the “black magic” behind Deepfakes, which can learn to mimic various distributions of data (for example text, speech, and images) and are therefore both powerful and potentially dangerous.
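As a rough example of how little code this can take, here is a tiny Keras network fit to synthetic data. The layer sizes, loss, and number of epochs are arbitrary choices for the illustration, not a recommendation.

```python
# Minimal neural network sketch in TensorFlow/Keras on synthetic data.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4)).astype("float32")
y = (X[:, 0] * X[:, 1] > 0).astype("float32")   # a non-linear toy target

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),                        # four input features
    tf.keras.layers.Dense(16, activation="relu"),      # hidden layer of simulated neurons
    tf.keras.layers.Dense(16, activation="relu"),      # a second ("deep") layer
    tf.keras.layers.Dense(1, activation="sigmoid"),    # probability of class 1
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)    # fit the function to the data

print("training accuracy:", model.evaluate(X, y, verbose=0)[1])
```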

Machine learning is not all a bed of roses and models are not inherently objective. We train models by feeding them a data set of training examples, and human involvement in the curation of this data can make models susceptible to bias. Therefore, when building models, it’s important to be aware of common human biases that can manifest in your data, so you can take proactive steps to mitigate their effects. This is why the field of AI ethics is so important today. We are still in the early stages of realizing what AI/ML truly has to offer. Like any tool, the most important part is how we use it!

Thank you for reading this class summary! To learn more, I recommend checking out the fall catalogue at Stanford for upcoming class offerings.

Photo Credit: Stanford University


Partap is currently at MIT and is fascinated by technological innovation.