Deep Learning: Introductory One-Liner Glossary

Published in fuse.ai, Mar 21, 2019

Deep Learning explained in single sentences!

Supervised Learning

Supervised learning is used to train a model when we have a set of features X and targets Y, so that the model can then predict target outputs for new features x_test.

There are two types of supervised learning problems: classification and regression.

Unsupervised Learning

Unsupervised learning is used when we do not have targets in the training dataset but want to automatically find patterns and relationships in an unlabeled dataset. It could be something as simple as grouping data points into clusters.

Semi-Supervised Learning

Semi-supervised learning is a class of supervised learning tasks and techniques that also make use of unlabeled data for training — typically, a small amount of labeled data with a large amount of unlabeled data.

Artificial Neural Networks (ANN)

A computing system loosely modeled on the human brain and nervous system. It consists of a number of input nodes (the input data) whose values are passed through weighted hidden nodes with non-linear activations to the output node(s).
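The flow from input nodes through weighted hidden nodes to an output node can be sketched in a few lines. This is a minimal illustration, assuming a single hidden layer with a tanh activation; the function name `forward` and all weight values are hypothetical.

```python
import math


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass through one hidden layer with tanh activation."""
    hidden = []
    for weights, bias in zip(hidden_weights, hidden_biases):
        z = sum(w * x for w, x in zip(weights, inputs)) + bias  # weighted sum
        hidden.append(math.tanh(z))                             # non-linear activation
    # The output node is a weighted sum of the hidden activations.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical weights for a network with 2 inputs, 2 hidden nodes, 1 output.
prediction = forward(
    inputs=[1.0, 2.0],
    hidden_weights=[[0.5, -0.5], [0.3, 0.8]],
    hidden_biases=[0.0, -1.0],
    output_weights=[1.0, -2.0],
    output_bias=0.1,
)
print(prediction)
```

In a trained network the weights would be learned from data rather than written by hand.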

Deep Learning

Using neural network architectures with multiple hidden layers of neurons to build generative as well as predictive models.

Reinforcement Learning

Learning the behaviors that maximize a reward through trial and error.

Classification

To predict a categorical response (e.g. Yes or No? Blue, Green or Red?)

Regression

To predict a continuous response (e.g. Price, Sales).

Clustering

Unsupervised grouping of data into baskets.

Model

A model is a structure trained using machine learning algorithms that stores a generalized, internal representation of the data for description or prediction.

Attribute

A quality describing an observation (e.g. Color, Size, Weight).

Feature

A feature includes an attribute + value (e.g. Color is an attribute. “Color is blue” is a feature).

Feature Vector (X)

A list of features describing an instance or observation.

Instance or Observation

A data point, row, or observation containing feature(s).

Data Cleaning

Improving the quality of the data by modifying its form or content, for example by filling in NULL values or fixing values that are incorrect.

Accuracy and Error Rate

Accuracy is the percentage of correct predictions made by the model, i.e. #CorrectPredictions / Total#Predictions. The error rate is its complement, i.e. 1 - accuracy.

Specificity

In binary classification (Yes/No), specificity is how accurately the model classifies actual No as No, i.e. #CorrectNo / Total#ActualNo.

Precision

In binary classification (Yes/No), precision is how accurate the positive predictions are, i.e. #CorrectYes / Total#YesPredicted.

Recall or Sensitivity

In binary classification (Yes/No), recall is how accurately the model classifies actual Yes as Yes, i.e. #CorrectYes / Total#ActualYes.

Confusion Matrix

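A table that describes the performance of a classification model by comparing its predicted classes with the actual classes. For example, for a model that predicts whether patients have diabetes: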

True Positives: We correctly predicted that they do have diabetes
True Negatives: We correctly predicted that they don’t have diabetes
False Positives: We incorrectly predicted that they do have diabetes (Type I error)
False Negatives: We incorrectly predicted that they don’t have diabetes (Type II error)
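The four counts above are enough to compute the accuracy, specificity, precision and recall defined earlier. A minimal sketch, assuming made-up counts; the function name `classification_metrics` and the numbers are illustrative only.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,    # correct predictions / all predictions
        "error_rate": (fp + fn) / total,  # wrong predictions / all predictions
        "specificity": tn / (tn + fp),    # correct No / actual No
        "precision": tp / (tp + fp),      # correct Yes / predicted Yes
        "recall": tp / (tp + fn),         # correct Yes / actual Yes
    }


# Hypothetical counts for a diabetes classifier (illustrative only).
metrics = classification_metrics(tp=40, tn=50, fp=5, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The sketch does not guard against empty denominators, which would need handling if a class never occurs or is never predicted.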

Over-fitting

Occurs when a model learns the training data so well that it also fits the minute details and noise in the dataset, which do not generalize to new data.

Hint: When a model aces the training set but fails on the validation or test set, it is over-fitting.

Under-fitting

Under-fitting occurs when your model over-generalizes and fails to incorporate relevant variations in the data that would give the model more predictive power.

Hint: Generally when a model performs poorly on both training and test sets, it is a case of under-fitting.

Bias-Variance Trade-off

A statistical property of predictive models: a model with lower bias in parameter estimation tends to have higher variance of the parameter estimates across samples, and vice versa.

Bias: What is the average difference between your predictions and the correct value for a particular observation?
Variance: How tightly packed are your predictions for a particular observation relative to each other?
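The two questions above translate directly into two numbers. A rough sketch, assuming we have several predictions for one observation (for example from models trained on different samples); the function name and the prediction values are hypothetical.

```python
import statistics


def bias_and_variance(predictions, true_value):
    """Bias and variance of a set of predictions for a single observation."""
    bias = statistics.mean(predictions) - true_value  # average offset from the truth
    variance = statistics.pvariance(predictions)      # spread of the predictions
    return bias, variance


# Hypothetical predictions for an observation whose correct value is 10.0.
print(bias_and_variance([8.0, 8.0, 8.0], 10.0))   # consistently off: bias, no variance
print(bias_and_variance([9.0, 11.0], 10.0))       # right on average, but spread out
```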

Training Set

A set of examples (X, Y) used for training the model.

Validation Set

A mini-test set that provides feedback during training on how well the current weights generalize beyond the training set. It is kept separate to avoid “contaminating” the model with information about the test set.

Test Set

A set of examples used at the end of training and validation to assess the predictive power of your model, i.e. how well it generalizes to unseen data.
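One common way to obtain the three sets is to shuffle the data once and slice it. The sketch below assumes a 70/15/15 split, which is a convention rather than a rule; the function name `split_dataset` and the toy data are hypothetical.

```python
import random


def split_dataset(examples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle examples and split them into training, validation and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test_set = shuffled[:n_test]
    val_set = shuffled[n_test:n_test + n_val]
    train_set = shuffled[n_test + n_val:]
    return train_set, val_set, test_set


# Toy dataset of 100 (x, y) pairs.
data = [(x, 2 * x) for x in range(100)]
train_set, val_set, test_set = split_dataset(data)
print(len(train_set), len(val_set), len(test_set))
```

Each example lands in exactly one set, so the test set stays unseen until the final evaluation.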

Backpropagation

An algorithm that computes the partial derivatives of the cost function with respect to any weight or bias in the network being trained.
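For a single neuron the idea reduces to the chain rule. The following sketch assumes a sigmoid neuron with a squared-error cost on one example, and checks the analytic derivatives against a finite-difference estimate; all names and numbers are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    """Squared error of a single sigmoid neuron on one example."""
    return 0.5 * (sigmoid(w * x + b) - y) ** 2


def gradients(w, b, x, y):
    """Apply the chain rule backwards: loss -> activation -> weighted sum."""
    a = sigmoid(w * x + b)         # forward pass
    dloss_da = a - y               # d(loss)/d(activation)
    da_dz = a * (1.0 - a)          # derivative of the sigmoid
    dloss_dz = dloss_da * da_dz
    return dloss_dz * x, dloss_dz  # d(loss)/dw, d(loss)/db


w, b, x, y = 0.5, -0.2, 1.5, 1.0
dw, db = gradients(w, b, x, y)

# Central finite differences as an independent check of the derivatives.
eps = 1e-6
dw_numeric = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
db_numeric = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
print(dw, dw_numeric)
print(db, db_numeric)
```

In a multi-layer network the same chain-rule step is repeated layer by layer, from the output back to the input.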

Loss

The sum of the errors made on each example in the training or validation set, usually the negative log-likelihood for classification and the residual sum of squares for regression.
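Both losses mentioned above are short sums. A minimal sketch, assuming binary labels (0 or 1) with predicted probabilities for the classification case; the function names and sample values are hypothetical.

```python
import math


def negative_log_likelihood(true_labels, predicted_probs):
    """Sum of -log(probability assigned to the true class) over binary examples."""
    total = 0.0
    for y, p in zip(true_labels, predicted_probs):
        total += -math.log(p if y == 1 else 1.0 - p)
    return total


def residual_sum_of_squares(targets, predictions):
    """Sum of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions))


# predicted_probs holds the predicted probability of class 1 for each example.
print(negative_log_likelihood([1, 0, 1], [0.9, 0.2, 0.6]))
print(residual_sum_of_squares([3.0, 5.0], [2.5, 5.5]))
```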

Epoch

One epoch is when the entire dataset is passed forward and backward through the neural network exactly once.

Batch

A subset of the training examples that is passed through the network together. The batch size is the total number of training examples present in a single batch.

Iteration

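One iteration is a single forward and backward pass over one batch. The number of iterations per epoch is the number of batches needed to complete one epoch.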
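The relationship between epoch, batch and iteration is simple arithmetic. A small sketch, assuming a dataset of 1000 examples and a batch size of 64 (both numbers are made up for illustration):

```python
import math


def iterations_per_epoch(num_examples, batch_size):
    """Number of batches needed to pass over the whole dataset once."""
    return math.ceil(num_examples / batch_size)


def batches(examples, batch_size):
    """Yield consecutive batches; the last one may be smaller."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


dataset = list(range(1000))
all_batches = list(batches(dataset, 64))
print(iterations_per_epoch(len(dataset), batch_size=64))  # 16
print(len(all_batches), len(all_batches[-1]))             # 16 40
```

Here one epoch takes 16 iterations: 15 full batches of 64 examples and a final batch of 40.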

Learning Rate

A hyper-parameter that determines how much, and therefore how fast, we adjust the weights of our network with respect to the loss gradient.
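The effect of the learning rate is easy to see on a toy problem. The sketch below assumes plain gradient descent on the one-dimensional function f(w) = (w - 3)^2, whose minimum is at w = 3; the function and the three rates are chosen only for illustration.

```python
def gradient_descent(learning_rate, steps=50, w=0.0):
    """Minimize f(w) = (w - 3)^2 with plain gradient descent."""
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)        # df/dw
        w = w - learning_rate * grad  # step against the gradient
    return w


for lr in (0.01, 0.1, 1.1):
    print(lr, gradient_descent(lr))
```

With the smallest rate the weight is still far from 3 after 50 steps, the middle rate converges closely, and the largest rate overshoots further on every step and diverges.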

Regularization

Penalizing the loss function by adding an L1 norm (LASSO) and/or L2 norm (Ridge) term on the weights, to keep the weights from growing too large, which causes over-fitting.
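As a rough sketch, the penalty is just an extra term added to the data loss. The strengths `l1` and `l2` are hyper-parameters, and the names and values below are hypothetical. Note that the Ridge penalty is conventionally the squared L2 norm.

```python
def l1_penalty(weights):
    """Sum of absolute weights (LASSO-style penalty)."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Sum of squared weights (Ridge-style penalty, i.e. the squared L2 norm)."""
    return sum(w * w for w in weights)


def regularized_loss(data_loss, weights, l1=0.0, l2=0.0):
    """Data loss plus weighted L1 and/or L2 penalties on the weights."""
    return data_loss + l1 * l1_penalty(weights) + l2 * l2_penalty(weights)


weights = [0.5, -2.0, 3.0]
print(regularized_loss(1.0, weights, l1=0.1))
print(regularized_loss(1.0, weights, l2=0.1))
```

Larger weights increase the penalty, so minimizing the regularized loss pushes the optimizer toward smaller weights.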

Normalization

Changing the values of numeric columns in the dataset to use a common scale, for example by subtracting the mean and dividing by the standard deviation (computed per batch when done inside a network).
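Subtracting the mean and dividing by the standard deviation can be sketched as follows. The column name and values are made up, and the sketch assumes the column is not constant (a zero standard deviation would need special handling).

```python
import statistics


def standardize(values):
    """Rescale values to zero mean and unit standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]


heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
scaled = standardize(heights_cm)
print(scaled)
```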

Model-tuning

Selecting hyper-parameters to generate a model that best suits the objectives.

Activation Function

A function, usually non-linear, that defines the output of a node in a neural network given its weighted input.
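Three widely used activation functions are sigmoid, tanh and ReLU. They are not named in the text above and are shown here only as familiar examples:

```python
import math


def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Passes positive values through and clips negative values to zero."""
    return max(0.0, z)


for z in (-2.0, 0.0, 2.0):
    # math.tanh squashes its input into the range (-1, 1).
    print(z, sigmoid(z), math.tanh(z), relu(z))
```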

Batch Normalisation

Normalizing the output of the previous activation layer, which allows each layer of the network to learn a little more independently of the other layers.

Embedding

A learned representation for text or image or other types of input in the form of a vector, where data that have the same meaning have a similar representation and are close to each other in the vector space.
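Closeness in the vector space is often measured with cosine similarity. The sketch below uses hand-written 3-dimensional vectors purely for illustration; real embeddings are learned and usually have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings, not taken from any trained model.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.95],
}
print(cosine_similarity(embeddings["cat"], embeddings["kitten"]))
print(cosine_similarity(embeddings["cat"], embeddings["car"]))
```

With these made-up vectors, "cat" comes out much closer to "kitten" than to "car".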

Convolutional Neural Networks (CNNs)

A space-invariant (shift-invariant) ANN consisting of convolutional layers that apply a convolution operation to their input and pass the result to the next layer.

Convolution in CNN

A variation of the mathematical convolution operation: as the kernel is moved across the input matrix, it returns at each position the weighted sum of the input elements that fall under the kernel.
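The sliding weighted sum can be written out directly. This is a minimal sketch assuming no padding and a stride of 1; as in most deep learning libraries, the kernel is not flipped, which is what makes it a variation of the mathematical operation. The input and kernel values are arbitrary.

```python
def conv2d(image, kernel):
    """Sliding-window weighted sum with no padding and stride 1."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            total = 0
            # Multiply each kernel weight by the input element beneath it.
            for di in range(k_rows):
                for dj in range(k_cols):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
kernel = [
    [1, 0],
    [0, -1],
]
print(conv2d(image, kernel))  # [[-4, -4], [-4, -4]]
```

In a CNN the kernel weights are learned during training rather than fixed by hand.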

Recurrent Neural Networks (RNNs)

A class of ANN in which the connections between nodes form a directed graph along a temporal sequence and the nodes keep an internal state, which allows the network to process entire sequences of data rather than only single data points.

Long Short-Term Memory (LSTM)

An RNN composed of cells that have memory and three gates (input gate, output gate and forget gate) that regulate the flow of information into and out of the cell.
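To make the role of the three gates concrete, here is a sketch of one time step of an LSTM cell in its common formulation, simplified so that every quantity is a single number instead of a vector. The parameter names and values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell; p maps parameter names to numbers."""
    forget = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])       # keep old memory?
    input_ = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])       # admit new info?
    output = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])       # expose memory?
    candidate = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # proposed update
    c = forget * c_prev + input_ * candidate  # new cell state (the memory)
    h = output * math.tanh(c)                 # new hidden state (the output)
    return h, c


# Hypothetical parameter values, chosen only for illustration.
params = {
    "wf": 0.5, "uf": 0.1, "bf": 1.0,
    "wi": 0.6, "ui": 0.2, "bi": 0.0,
    "wo": 0.4, "uo": 0.3, "bo": 0.0,
    "wg": 0.9, "ug": 0.5, "bg": 0.0,
}
h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:
    h, c = lstm_step(x, h, c, params)
    print(h, c)
```

When the forget gate is close to 1 and the input gate is close to 0, the cell state is carried forward almost unchanged, which is how the cell retains information over many time steps.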