Our AI Jargon File

Published in Source Institute, May 17, 2017

“You know, I couldn’t do it. I couldn’t reduce it to the freshman level. That means we really don’t understand it.” — Richard Feynman

We thought we’d share the jargon file we’ve started while making an AI course at Source:

Machine Learning: Algorithms that can learn and make decisions without being given a set of rules. Instead, machine learning algorithms learn by example.

Deep Learning: Machine learning algorithms, built from neural networks with many layers, which work out on their own what’s important for getting to their end goal. They decide which questions to ask of the data, and how much weight each answer gets.

Supervised Learning: You have to tell the algorithm what the right answer is for each example so it can learn, like when you mark an email as spam.

Unsupervised Learning: Nobody tells the algorithm what a right answer is. It looks for patterns and structure in unlabeled data on its own. This works for things like anomaly detection, automatically grouping things, or removing background noise.

Latent variable: Something you can’t measure directly but which influences the data you can measure, so it has to be inferred from that data.

Explainable Algorithms: Allow you to work backwards and see how the system came up with a conclusion

Unexplainable Algorithms: Don’t allow you to work backwards and see how they came up with a conclusion

Generative Algorithms: focus on learning what the data in each category looks like, well enough that they could generate new examples of it

Discriminative Algorithms: focus on finding the boundaries between categories of data

Neural Networks: Algorithms loosely inspired by the neuron structure in the human brain. Each layer of neurons makes a decision based on criteria it has learned to treat as important, then passes its result on to the next layer. After going through many of these layers, the network spits out an answer built from the many decisions made along the way.

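Convolutional Neural Networks (CNN): Neural networks that slide small filters across their input to pick up local patterns like edges and textures. They were designed with images in mind, which is still where they are used most.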

Backpropagation: working backwards through the layers of a Neural Network when you get a wrong answer and assigning portions of blame to the decisions made by each layer so they can do better next time.
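As a rough illustration of assigning blame, here is a minimal sketch for a made-up network with just two weights and no activation function. The names (forward, gradients, the learning rate of 0.1) and the numbers are our own assumptions for the example, not part of any library.

```python
# A toy "network" with two layers of one weight each:
# x -> h = w1 * x -> y = w2 * h
def forward(x, w1, w2):
    h = w1 * x
    y = w2 * h
    return h, y

def loss(x, target, w1, w2):
    _, y = forward(x, w1, w2)
    return (y - target) ** 2  # squared error

def gradients(x, target, w1, w2):
    """Work backwards from the error, using the chain rule layer by layer."""
    h, y = forward(x, w1, w2)
    d_y = 2 * (y - target)  # how much the loss changes with the output
    d_w2 = d_y * h          # blame assigned to the second layer's weight
    d_h = d_y * w2          # blame passed back to the first layer's output
    d_w1 = d_h * x          # blame assigned to the first layer's weight
    return d_w1, d_w2

x, target = 2.0, 1.0
w1, w2 = 0.5, 0.5
loss_before = loss(x, target, w1, w2)

d_w1, d_w2 = gradients(x, target, w1, w2)
learning_rate = 0.1  # hypothetical step size
w1 -= learning_rate * d_w1
w2 -= learning_rate * d_w2
loss_after = loss(x, target, w1, w2)

print(loss_before, loss_after)  # the loss should shrink after one step
```

Real networks do the same thing with matrices of weights and an activation function between layers, but the backwards flow of blame is the same idea.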

K-Nearest Neighbors (KNN): A simple classification algorithm that decides which group new data belongs in by looking at the K closest known points and which groups they belong in.
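A minimal sketch of the idea in plain Python, assuming points are tuples of numbers and closeness means straight-line distance. The function name knn_classify and the sample data are made up for illustration.

```python
import math
from collections import Counter

def knn_classify(points, labels, query, k=3):
    """Return the most common label among the k points closest to query."""
    by_distance = sorted(
        (math.dist(point, query), label) for point, label in zip(points, labels)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical data: two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_classify(points, labels, (2, 2)))
print(knn_classify(points, labels, (7, 7)))
```

Note that there is no training step here: the algorithm just keeps the labeled points around and measures distances when a new point arrives.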

Multinomial: When something has more than 2 possible outcomes, like rolling a die as opposed to flipping a coin. The theory around this lets you calculate the probability of each possible mix of outcomes over a number of repeated tries.

80/20 Rule: Rule of thumb to have a ratio of 80% training data to 20% testing data

Training data: data that’s collected and organised so an algorithm can learn to do something

Testing data: data held back from an algorithm during training so you can try the algorithm on it afterwards and see how accurately it works
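The 80/20 split can be sketched in a few lines. This is one simple way to do it, assuming your examples fit in a list. The function name split_train_test and the fixed seed are our own choices for the example.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows, then hold back test_fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    test_size = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - test_size
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))  # stand-in for 100 labeled examples
training, testing = split_train_test(rows)
print(len(training), len(testing))
```

Shuffling first matters: if the data is sorted (by date, say), taking the last 20% as-is would test the algorithm on a different kind of data than it trained on.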

Overfitting: When your algorithm gets really good at deciding things based on training data but not good at applying that to new data. Kind of like someone who is awesome at university but can’t do anything in real life.

Covariate Shift: Covariates are the things you measure as input in order to predict output. Covariate shift is when the distribution of those inputs changes over time, so the data your algorithm sees is no longer like the data it was trained on.

Internal Covariate Shift: Covariate shift inside a neural network. As training adjusts the earlier layers, the inputs arriving at the later layers keep changing, so each layer is chasing a moving target and training slows down or freaks out.

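Vanishing Gradient Problem: When the error signal shrinks as it is passed backwards through the layers of a neural network, often because functions like Sigmoid squash their output into a narrow range. The early layers can’t learn because the changes they make barely have an impact on the output.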
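One way to get a feel for the problem is to look at the slope of the sigmoid function, which is never steeper than 0.25. The sketch below is a deliberate simplification: it assumes every layer uses sigmoid, every weight is 1, and every layer sees the same input, so the learning signal is just the slope multiplied by itself once per layer. The function names are ours.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_slope(x):
    """Derivative of sigmoid at x. Its largest value is 0.25, at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

def signal_after_layers(num_layers, x=0.0):
    """Simplified size of the learning signal after passing back through num_layers."""
    return sigmoid_slope(x) ** num_layers

for layers in (1, 5, 10):
    print(layers, signal_after_layers(layers))
```

Even in this best case for sigmoid, ten layers cut the signal to under a millionth of its original size.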

RELU (Rectified Linear Unit): A function that helps with the Vanishing Gradient Problem by saying that everything below 0 becomes 0 and everything above 0 passes through unchanged, with no upper limit on the output.

Sigmoid: A function that squashes any input into an output between 0 and 1.

TanH: A function that squashes any input into an output between -1 and 1.
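All three are short enough to write out. A minimal sketch using only Python's math module, applied to a single number at a time (real libraries apply them to whole arrays):

```python
import math

def relu(x):
    """Everything below 0 becomes 0, everything else passes through."""
    return max(0.0, x)

def sigmoid(x):
    """Squashes x into the range between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Squashes x into the range between -1 and 1."""
    return math.tanh(x)

for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(x, relu(x), sigmoid(x), tanh(x))
```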

Bayes Probability Theorem: A statistical formula for updating a probability in light of new evidence. For example, it might rain on 10% of the days of the year where you live, but the chance of rain is higher if you know it’s cloudy and cold.
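To make the rain example concrete, here is a small sketch. The 10% figure comes from the example above. The other two numbers (how often it is cloudy on rainy days and on dry days) are invented purely for illustration.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    evidence = likelihood * prior + likelihood_if_not * (1.0 - prior)
    return likelihood * prior / evidence

p_rain = 0.10                # it rains on 10% of days
p_cloudy_given_rain = 0.90   # hypothetical: 90% of rainy days are cloudy
p_cloudy_given_dry = 0.30    # hypothetical: 30% of dry days are cloudy

p_rain_given_cloudy = posterior(p_rain, p_cloudy_given_rain, p_cloudy_given_dry)
print(round(p_rain_given_cloudy, 2))  # 0.25
```

With these made-up numbers, seeing clouds raises the chance of rain from 10% to about 25%.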

Naïve Bayes Classifier Algorithm: Classifies content, often text, using the Bayes Probability Theorem. It is “naïve” because it treats every feature (such as each word) as independent of the others. It learns from labeled examples how likely each word is to appear in each category. This is how many spam filters work.
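A toy spam filter along these lines might look like the sketch below. It assumes messages are split on spaces, and it uses add-one smoothing so that a word never seen in a category doesn't zero out the whole score. The function names and the four training messages are made up.

```python
import math
from collections import Counter

def train(messages):
    """messages is a list of (text, label). Count words and messages per label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Pick the label with the highest log-probability for the words in text."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_messages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_messages)  # prior
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing; each word is treated as independent.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

messages = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("lunch meeting at noon", "ham"),
    ("see you at the meeting", "ham"),
]
word_counts, label_counts = train(messages)
print(classify("free money", word_counts, label_counts))
print(classify("meeting at noon", word_counts, label_counts))
```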

K Means Clustering Algorithm: Takes a set of points and divides them into however many groups you tell it to (that’s the K), based on how close their properties are to each other.
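A bare-bones sketch of the loop: assign each point to its nearest group centre, move each centre to the average of its points, repeat. Starting from the first K points is a naive choice made here to keep the example repeatable. Real implementations pick starting centres more carefully.

```python
import math

def k_means(points, k, iterations=10):
    """Return (centres, clusters) after a fixed number of assign-and-update rounds."""
    centres = list(points[:k])  # naive starting centres: the first k points
    clusters = []
    for _ in range(iterations):
        # Step 1: put each point in the cluster of its nearest centre.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centres[i]))
            clusters[nearest].append(point)
        # Step 2: move each centre to the average of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centres[i] = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
    return centres, clusters

# Hypothetical data: two obvious clumps.
points = [(1, 1), (9, 9), (1, 2), (8, 9), (2, 1), (9, 8)]
centres, clusters = k_means(points, k=2)
print(clusters)
```

Unlike K-Nearest Neighbors, nothing here is labeled in advance: the groups come out of the data.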

Ontology: A map of concepts which shows their relationship to one another.

Broad AI: AI which can perform a broad range of tasks.

Narrow AI: AI which is designed to perform a very specific kind of task.

Strong AI: AI which genuinely reasons the way a human would.

Weak AI: AI which produces the right answer a human would produce but gets there in its own machine way.

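Decision Trees: A tree-shaped set of decisions. In decision analysis, each branch is given a value and a probability. You multiply these together, add them up for each choice, and pick the choice which provides the most value. In machine learning, each branch is instead a question about the data, and following the answers down to the bottom of the tree gives you a prediction.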
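The value-and-probability arithmetic can be sketched in a few lines. The two choices and all the numbers below are invented for the example.

```python
def expected_value(branches):
    """Sum of probability * value over the branches of one choice."""
    return sum(probability * value for probability, value in branches)

# Hypothetical decision: each choice has branches of (probability, value).
choices = {
    "launch now": [(0.6, 100), (0.4, -50)],
    "wait": [(0.9, 30), (0.1, 0)],
}

for name, branches in choices.items():
    print(name, expected_value(branches))

best_choice = max(choices, key=lambda name: expected_value(choices[name]))
print(best_choice)
```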

Random Forest Algorithms: Create tons of decision trees, each trained on a different random sample of the data, then combine their answers (by vote or by average) to come to a conclusion.

Real-Time Data: Data which is fed into an algorithm as it’s generated.

Historical Data: Data which was created in the past and is then fed into an algorithm.

Off the Shelf: Describes an AI tool which can be purchased and used as is.

Natural Language Processing: The process of algorithms understanding and analyzing human language, whether speech or text.

Term Frequency–Inverse Document Frequency (tf–idf): A method used in natural language processing which scores how important a word is to a document. Words score highly when they appear often in that document but rarely in the rest of the collection.
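There are several variants of the formula. The sketch below uses one of the plainest (term frequency times the log of total documents over documents containing the word), with text split on spaces. The function name tf_idf and the three-sentence corpus are made up for illustration.

```python
import math

def tf_idf(word, document, corpus):
    """Score how important word is to document, relative to the whole corpus."""
    words = document.lower().split()
    term_frequency = words.count(word) / len(words)
    containing = sum(1 for doc in corpus if word in doc.lower().split())
    if containing == 0:
        return 0.0  # the word appears nowhere, so there is nothing to score
    inverse_document_frequency = math.log(len(corpus) / containing)
    return term_frequency * inverse_document_frequency

corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]

for word in ("the", "cat", "mat"):
    print(word, tf_idf(word, corpus[0], corpus))
```

In this corpus "the" appears in every document, so it scores zero, while "mat" appears in only one and scores highest.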

How are we doing? Got some jargon you think should be in here? Comment and let us know.

Written by Eric David Halsey