Deep Learning: Introductory One-Liner Glossary
Deep Learning explained in single sentences!
Supervised Learning
Supervised learning is used to train a model when we have a set of features X and targets Y, so that the model can predict target outputs for new features x_test.
There are two types of supervised learning problems: classification and regression.
Unsupervised Learning
Unsupervised learning is used when we don’t have targets in the training dataset, but want to automatically find patterns and relationships in an unlabeled dataset. It could be something as simple as grouping data points into clusters.
Semi-Supervised Learning
Semi-supervised learning is a class of supervised learning tasks and techniques that also make use of unlabeled data for training — typically, a small amount of labeled data with a large amount of unlabeled data.
Artificial Neural Networks (ANN)
A computing system loosely modeled on the human brain and nervous system. Input data enters at the input nodes and is passed through weighted hidden nodes with non-linear activations to the output node(s).
Deep Learning
Using neural network architectures with multiple hidden layers of neurons to build generative as well as predictive models.
Reinforcement Learning
Learning the behaviors that maximize a reward through trial and error.
Classification
To predict a categorical response (e.g. Yes or No? Blue, Green or Red?)
Regression
To predict a continuous response (e.g. Price, Sales).
Clustering
Unsupervised grouping of data into baskets.
Model
A model is a structure, trained using machine learning algorithms, that stores a generalized internal representation of the data for description or prediction.
Attribute
A quality describing an observation (e.g. Color, Size, Weight).
Feature
A feature includes an attribute + value (e.g. Color is an attribute. “Color is blue” is a feature).
Feature Vector (X)
A list of features describing an instance or observation.
Instance or Observation
A data point, row, or observation containing feature(s).
Data Cleaning
Improving the quality of the data by modifying its form or content, for example by filling in NULL values or fixing values that are incorrect.
Accuracy or Error Rate
Accuracy is the percentage of correct predictions made by the model, i.e. #CorrectPredictions / Total#Predictions. The error rate is its complement, i.e. 1 - accuracy.
Specificity
In binary classification (Yes/No), specificity is how accurately the model classifies Actual No as No, i.e. #CorrectNo / Total#ActualNo.
Precision
In binary classification (Yes/No), precision is how accurate the positive predictions are, i.e. #CorrectYes / Total#YesPredicted.
Recall or Sensitivity
In binary classification (Yes/No), recall is how accurately the model classifies Actual Yes as Yes, i.e. #CorrectYes / Total#ActualYes.
Confusion Matrix
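A table that describes the performance of a classification model by counting each combination of actual and predicted class. For a binary model that predicts whether patients have diabetes: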
- True Positives: We correctly predicted that they do have diabetes
- True Negatives: We correctly predicted that they don’t have diabetes
- False Positives: We incorrectly predicted that they do have diabetes (Type I error)
- False Negatives: We incorrectly predicted that they don’t have diabetes (Type II error)
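The four counts above, together with the accuracy, specificity, precision and recall definitions earlier, can be sketched in a few lines of Python. This is only an illustration: the labels and predictions are made-up data (1 = Yes, 0 = No), and `confusion_counts` is a hypothetical helper name, not a library function.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = Yes and 0 = No."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives (Type I)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives (Type II)
    return tp, tn, fp, fn


# Made-up example data, not taken from any real dataset.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct predictions / all predictions
specificity = tn / (tn + fp)                # correct No / actual No
precision = tp / (tp + fp)                  # correct Yes / predicted Yes
recall = tp / (tp + fn)                     # correct Yes / actual Yes

print(tp, tn, fp, fn)
print(accuracy, specificity, precision, recall)
```

Note that the ratios divide by zero when a denominator is empty (for example, no positive predictions at all); real code would need to guard against that.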
Over-fitting
Occurs when a model learns the training data so well, including its minute details and noise, that it fails to generalize to new data.
Hint: When a model aces the training set but does poorly on held-out data (the validation or test set), it is over-fitting.
Under-fitting
Under-fitting occurs when your model over-generalizes and fails to incorporate relevant variations in the data that would give the model more predictive power.
Hint: Generally when a model performs poorly on both training and test sets, it is a case of under-fitting.
Bias-Variance Trade-off
A statistical property of predictive models: a model with lower bias in parameter estimation tends to have higher variance of the parameter estimates across samples, and vice versa.
- Bias: What is the average difference between your predictions and the correct value for a particular observation?
- Variance: How tightly packed are your predictions for a particular observation relative to each other?
Training Set
A set of examples (X, Y) used for training the model.
Validation Set
A mini-test set that provides feedback during training on how well the current weights generalize beyond the training set. It is kept separate from the test set to avoid “contaminating” the model with information about the test set.
Test Set
A set of examples used at the end of training and validation to assess the predictive power of your model. It’s also done to test the generalizability of the model on unseen data.
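As a rough illustration of how the training, validation and test sets relate, the sketch below shuffles a toy dataset and cuts it into three disjoint parts. The 60/20/20 proportions, the fixed seed and the helper name `split_dataset` are assumptions chosen for the example, not a recommendation.

```python
import random


def split_dataset(examples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle a copy of `examples` and cut it into train/validation/test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable

    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)

    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Toy (X, Y) pairs invented for the example.
examples = [(x, 2 * x) for x in range(10)]
train, val, test = split_dataset(examples)

print(len(train), len(val), len(test))
```

The three lists never share an example, which is what keeps the test set unseen until the final evaluation.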
Backpropagation
An algorithm that computes the partial derivatives of the cost function with respect to any weight or bias in the network being trained.
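For intuition, here is a minimal sketch of the chain rule behind backpropagation on the smallest possible "network": one sigmoid neuron with a squared-error cost. The choice of cost, the starting values and the function names are assumptions made for the example; the finite-difference check at the end is just a sanity test of the analytic gradients.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, b, x, y):
    """One sigmoid neuron with cost C = 0.5 * (a - y)^2, where a = sigmoid(w*x + b)."""
    a = sigmoid(w * x + b)
    return a, 0.5 * (a - y) ** 2


def backward(w, b, x, y):
    """Chain rule: dC/dw = dC/da * da/dz * dz/dw, and dC/db = dC/da * da/dz."""
    a, _ = forward(w, b, x, y)
    dC_da = a - y           # derivative of the squared-error cost
    da_dz = a * (1.0 - a)   # derivative of the sigmoid
    return dC_da * da_dz * x, dC_da * da_dz


# Arbitrary illustrative values.
w, b, x, y = 0.5, -0.3, 2.0, 1.0
grad_w, grad_b = backward(w, b, x, y)

# Numerical check with central finite differences.
eps = 1e-6
num_w = (forward(w + eps, b, x, y)[1] - forward(w - eps, b, x, y)[1]) / (2 * eps)
num_b = (forward(w, b + eps, x, y)[1] - forward(w, b - eps, x, y)[1]) / (2 * eps)

print(grad_w, num_w)
print(grad_b, num_b)
```

In a multi-layer network the same chain rule is applied layer by layer, reusing the intermediate derivatives from the layer after it.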
Loss
Summation of the errors made for each example in training or validation sets, usually negative log-likelihood and residual sum of squares for classification and regression respectively.
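The two losses named above can be written out directly. In this sketch the inputs are invented numbers: for the negative log-likelihood, each entry is assumed to be the probability the model assigned to the correct class of one example.

```python
import math


def negative_log_likelihood(probs_of_true_class):
    """Sum of -log(p) over examples, where p is the probability given to the true class."""
    return -sum(math.log(p) for p in probs_of_true_class)


def residual_sum_of_squares(y_true, y_pred):
    """Sum of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


# Made-up values for illustration.
nll = negative_log_likelihood([0.9, 0.8, 0.5])
rss = residual_sum_of_squares([3.0, 5.0, 7.0], [2.5, 5.0, 8.0])

print(nll)
print(rss)
```

Both losses are zero only for perfect predictions and grow as predictions get worse, which is what makes them usable as training objectives.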
Epoch
One epoch is when an entire dataset is passed both forward and backward through the neural network only once.
Batch
A subset of the training examples that is passed through the network together. The batch size is the total number of training examples present in a single batch.
Iteration
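One iteration is one forward and backward pass over a single batch. The number of iterations per epoch is the number of batches needed to complete one epoch.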
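The arithmetic linking epoch, batch and iteration can be sketched as follows. The dataset size of 2000 and batch size of 500 are example numbers, and the helper names are hypothetical.

```python
import math


def iterations_per_epoch(num_examples, batch_size):
    """Number of batches needed to see every example once (last batch may be smaller)."""
    return math.ceil(num_examples / batch_size)


def batches(examples, batch_size):
    """Yield consecutive slices of `examples` of at most `batch_size` items."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


examples = list(range(2000))  # stand-in for 2000 training examples
batch_size = 500

n_iter = iterations_per_epoch(len(examples), batch_size)
n_batches = len(list(batches(examples, batch_size)))

print(n_iter, n_batches)
```

With 2000 examples and a batch size of 500, one epoch takes 4 iterations; with 2001 examples it would take 5, the last batch holding a single example.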
Learning Rate
A hyper-parameter that determines how much, and therefore how fast, we adjust the weights of our network with respect to the loss gradient.
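To see the effect of the learning rate, the sketch below runs plain gradient descent on a toy one-parameter loss, (w - 3)^2, whose minimum is at w = 3. The loss function, the step count and the three learning rates are assumptions picked for the demonstration.

```python
def gradient_descent(learning_rate, steps=50, w=0.0):
    """Minimize the toy loss (w - 3)^2 with a fixed learning rate."""
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)         # derivative of (w - 3)^2
        w = w - learning_rate * grad   # step against the gradient
    return w


w_small = gradient_descent(0.01)  # too small: still far from 3 after 50 steps
w_good = gradient_descent(0.1)    # reasonable: very close to 3
w_large = gradient_descent(1.1)   # too large: each step overshoots and diverges

print(w_small, w_good, w_large)
```

On this toy problem a small rate converges slowly, a moderate rate converges quickly, and a rate above 1.0 makes every update overshoot the minimum by more than the previous error.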
Regularization
Penalizing the loss function by adding the L1 norm (LASSO) and/or the L2 norm (Ridge) of the weights, which keeps the weights from growing so large that they cause over-fitting.
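A minimal sketch of adding such penalties to a loss is shown below. The weights, the data loss of 1.25 and the penalty strengths `l1` and `l2` are invented for the example; note that Ridge is conventionally written with the squared L2 norm, which is what `l2_penalty` computes.

```python
def l1_penalty(weights):
    """L1 norm of the weights (LASSO-style penalty)."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Squared L2 norm of the weights (Ridge-style penalty)."""
    return sum(w * w for w in weights)


def regularized_loss(data_loss, weights, l1=0.0, l2=0.0):
    """Data loss plus weighted L1 and L2 penalties; l1 and l2 are hyper-parameters."""
    return data_loss + l1 * l1_penalty(weights) + l2 * l2_penalty(weights)


# Made-up values for illustration.
weights = [0.5, -2.0, 3.0]
total = regularized_loss(1.25, weights, l1=0.1, l2=0.01)

print(total)
```

Because large weights now raise the loss, the optimizer is pushed toward smaller weights unless a large weight clearly pays for itself in reduced data loss.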
Normalization
Rescaling the values of numeric columns in the dataset to a common scale, for example by subtracting the batch mean and dividing by the batch standard deviation.
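The mean-and-standard-deviation rescaling described above can be sketched with the standard library alone. Here the whole list plays the role of the batch, the column values are invented, and the population standard deviation is an assumed choice.

```python
import statistics


def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # would be 0 for a constant column; not handled here
    return [(v - mean) / std for v in values]


# Made-up numeric column.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
scaled = standardize(heights_cm)

print(scaled)
```

After rescaling, the column has mean 0 and standard deviation 1, so columns measured in different units become directly comparable.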
Model-tuning
Selecting hyper-parameters to generate a model that best suits the objectives.
Activation Function
A function, usually non-linear, that defines the output of a node in a neural network given that node's weighted input.
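Three commonly used activation functions are sketched below for a single scalar input. The selection (ReLU, sigmoid, tanh) is an illustrative choice; the glossary itself does not name specific functions.

```python
import math


def relu(z):
    """Rectified linear unit: passes positive inputs through, clips negatives to 0."""
    return max(0.0, z)


def sigmoid(z):
    """Squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Squashes any real input into the range (-1, 1)."""
    return math.tanh(z)


for z in (-2.0, 0.0, 2.0):
    print(z, relu(z), sigmoid(z), tanh(z))
```

Without a non-linearity of this kind, stacking layers would collapse into a single linear transformation.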
Batch Normalisation
Normalizing the output of the previous activation layer, which allows each layer of a network to learn a little more independently of the other layers.
Embedding
A learned representation for text or image or other types of input in the form of a vector, where data that have the same meaning have a similar representation and are close to each other in the vector space.
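"Close to each other in the vector space" is often measured with cosine similarity. In the sketch below the three vectors are hand-written stand-ins, not learned embeddings, and the words attached to them are purely illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional "embeddings", invented for the example.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

sim_cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
sim_cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])

print(sim_cat_dog, sim_cat_car)
```

With these made-up vectors, "cat" and "dog" point in nearly the same direction while "cat" and "car" do not, which is the behaviour a trained embedding is meant to produce for related and unrelated inputs.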
Convolutional Neural Networks (CNNs)
Space or shift invariant ANN, which consists of convolutional layers that apply a convolution operation to the input, passing the result to the next layer.
Convolution in CNN
A variation of the mathematical convolution operation: as the kernel is moved across the input matrix, each output value is the weighted sum of the input elements currently under the kernel, with the kernel entries as the weights.
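The sliding weighted sum can be sketched in plain Python. This version assumes stride 1 and no padding, and it does not flip the kernel, which is the usual convention in CNN libraries; the 3x3 input and 2x2 kernel are made-up values.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the weighted sums."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1

    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            # Weighted sum of the input patch currently under the kernel.
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Made-up input and kernel.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
kernel = [
    [1, 0],
    [0, -1],
]

result = conv2d_valid(image, kernel)
print(result)
```

Each output entry here is the top-left value of a 2x2 patch minus its bottom-right value, so a 3x3 input yields a 2x2 output.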
Recurrent Neural Networks (RNNs)
A class of ANN in which connections between nodes form a directed graph along a temporal sequence and carry an internal state, which allows the network to process entire sequences of data rather than only single data points.
Long Short-Term Memory (LSTM)
An RNN composed of cells that have memory and three gates (input gate, output gate and forget gate) that regulate the flow of information into and out of the cell.
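How the three gates regulate the cell can be sketched with a scalar LSTM step, using the standard formulation (sigmoid gates, tanh candidate). The parameter values and the `params` layout below are hypothetical; a real LSTM uses weight matrices and vectors rather than single numbers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell; params maps a name to (w_x, w_h, bias)."""
    def unit(name, activation):
        w_x, w_h, bias = params[name]
        return activation(w_x * x + w_h * h_prev + bias)

    f = unit("forget", sigmoid)       # how much of the old cell memory to keep
    i = unit("input", sigmoid)        # how much of the new candidate to write
    o = unit("output", sigmoid)       # how much of the memory to expose
    g = unit("candidate", math.tanh)  # candidate value to write into memory

    c = f * c_prev + i * g            # updated cell memory
    h = o * math.tanh(c)              # updated hidden state (cell output)
    return h, c


# Hypothetical parameters, chosen only for the example.
params = {
    "forget": (0.5, 0.1, 0.0),
    "input": (0.6, -0.2, 0.0),
    "output": (0.4, 0.3, 0.0),
    "candidate": (0.9, 0.5, 0.0),
}

h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:           # a short made-up input sequence
    h, c = lstm_step(x, h, c, params)

print(h, c)
```

The cell memory `c` is carried from step to step and only changed through the forget and input gates, which is what lets the cell retain information over long sequences.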