Multilayer Perceptron (MLP) vs Convolutional Neural Network in Deep Learning

Uniqtech
Data Science Bootcamp
3 min read · Dec 22, 2018


Multilayer perceptrons are sometimes colloquially referred to as “vanilla” neural networks, especially when they have a single hidden layer. — MLP Wikipedia

Udacity Deep Learning Nanodegree students might encounter a lesson called MLP. In the video, the instructor explains that an MLP works well on MNIST, a simpler and more straightforward dataset, but lags behind CNNs in real-world computer vision applications, specifically image classification. It was the vanilla neural network in use before all the fancy architectures such as CNNs and LSTMs came along. Here are some detailed notes on why and how they differ. What is fully connected? What is not fully connected?

A multilayer perceptron (MLP) is a class of feedforward artificial neural network. An MLP consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer. Except for the input nodes, each node is a neuron that uses a nonlinear activation function. An MLP is trained with a supervised learning technique called backpropagation. Its multiple layers and nonlinear activations distinguish the MLP from a linear perceptron: it can distinguish data that is not linearly separable.
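To make the definition concrete, here is a minimal NumPy sketch of a forward pass through a one-hidden-layer MLP. The layer sizes and initialization are illustrative assumptions, not taken from any particular lesson:

```python
import numpy as np

def relu(x):
    # nonlinear activation applied at the hidden layer
    return np.maximum(0, x)

def softmax(x):
    # converts output scores into class probabilities
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)

# Arbitrary sizes: 784 inputs (a flattened 28x28 image), 64 hidden units, 10 classes
W1, b1 = rng.normal(size=(64, 784)) * 0.01, np.zeros(64)
W2, b2 = rng.normal(size=(10, 64)) * 0.01, np.zeros(10)

x = rng.normal(size=784)   # a flattened input vector
h = relu(W1 @ x + b1)      # hidden layer: affine transform + nonlinearity
y = softmax(W2 @ h + b2)   # output layer: class probabilities
```

Backpropagation then adjusts W1, b1, W2, and b2 to reduce the loss between y and the true label.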

“MLP” is not to be confused with “NLP”, which refers to natural language processing.
Multilayer perceptron Wikipedia page

Multilayer Perceptron (MLP): formerly applied in computer vision, now succeeded by the Convolutional Neural Network (CNN). The MLP is now deemed insufficient for modern, advanced computer vision tasks. It has the characteristic of fully connected layers, where each perceptron in one layer is connected to every perceptron in the next. One disadvantage is that the total number of parameters can grow very large (the number of perceptrons in layer 1 multiplied by the number in layer 2, multiplied by the number in layer 3, and so on; see the sketch below). This is inefficient because of the redundancy in such high dimensions. Another disadvantage is that it disregards spatial information: it takes flattened vectors as inputs. Even so, a lightweight MLP (2-3 layers) can easily achieve high accuracy on the MNIST dataset.
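To see that growth concretely, here is a minimal Keras sketch (the 512-unit layer widths are illustrative assumptions). Its summary shows how quickly fully connected weights add up, even for small 28x28 MNIST images:

```python
from tensorflow import keras

mlp = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),                        # spatial structure is thrown away
    keras.layers.Dense(512, activation="relu"),    # 784 * 512 + 512 = 401,920 params
    keras.layers.Dense(512, activation="relu"),    # 512 * 512 + 512 = 262,656 params
    keras.layers.Dense(10, activation="softmax"),  # 512 * 10  + 10  =   5,130 params
])
mlp.summary()  # roughly 670k parameters for a tiny grayscale image
```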

Convolutional Neural Network (CNN): the incumbent, the current favorite among computer vision algorithms and winner of multiple ImageNet competitions. A CNN accounts for local connectivity: each filter is panned across the entire image according to its size and stride, which allows the filter to find and match patterns no matter where a pattern is located in a given image. The weights are smaller and shared, so a CNN is less wasteful, easier to train, and more effective than an MLP. It can also go deeper. Its layers are sparsely (partially) connected rather than fully connected: every node does not connect to every other node. It takes matrices as well as vectors as inputs. A parameter-count contrast with the MLP above follows below.
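For contrast, here is a minimal CNN sketch in Keras (layer sizes again illustrative). Each Conv2D filter carries only a small, shared set of weights, so the convolutional layers stay cheap:

```python
from tensorflow import keras

cnn = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                             # matrix input; spatial layout kept
    keras.layers.Conv2D(32, kernel_size=3, activation="relu"),  # 3*3*1*32 + 32 = 320 params
    keras.layers.MaxPooling2D(),
    keras.layers.Conv2D(64, kernel_size=3, activation="relu"),  # 3*3*32*64 + 64 = 18,496 params
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])
cnn.summary()  # far fewer parameters than the fully connected MLP above
```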

The panning of filters (you can set the stride and filter size) in a CNN essentially enables parameter sharing, or weight sharing: a filter learns to look for one specific pattern and is location invariant, meaning it can find that pattern anywhere in an image. This is very useful for object detection, since patterns can be discovered in more than one part of the image. A quick sketch of one consequence follows below.
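One consequence of that weight sharing, shown in a short sketch (an illustration, not code from the original article): a Conv2D layer's parameter count depends only on the filter size and channel counts, never on the image size:

```python
from tensorflow import keras

small = keras.Sequential([keras.Input(shape=(28, 28, 1)),
                          keras.layers.Conv2D(32, kernel_size=3)])
large = keras.Sequential([keras.Input(shape=(224, 224, 1)),
                          keras.layers.Conv2D(32, kernel_size=3)])

# Both models hold 3*3*1*32 + 32 = 320 parameters: the same filters are
# panned across the whole image, wherever the pattern happens to appear.
print(small.count_params(), large.count_params())  # 320 320
```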

Disadvantages of the MLP include too many parameters, because it is fully connected. For the first layer alone, the weight count is the flattened input size (width × height × depth) multiplied by the number of hidden units; for a 28 × 28 grayscale MNIST image feeding 512 hidden units, that is already 784 × 512 ≈ 401k weights. Each node is connected to every node in the next layer in a very dense web, resulting in redundancy and inefficiency.

MLP in Keras: TensorFlow exposes the high-level Keras API to give developers an easy-to-use deep learning framework. Here’s how an MLP can be implemented in Keras.

Multilayer Perceptron implementation in Keras
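A minimal sketch of such an implementation, trained on MNIST (the article's original embedded snippet is not reproduced here, so the hidden width, dropout rate, and epoch count below are assumptions):

```python
from tensorflow import keras

# Load and normalize MNIST (28x28 grayscale digits, 10 classes)
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.Flatten(),                        # MLPs take flattened vectors as input
    keras.layers.Dense(512, activation="relu"),
    keras.layers.Dropout(0.2),                     # light regularization
    keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5, validation_split=0.1)
model.evaluate(x_test, y_test)  # a lightweight 2-3 layer MLP scores well here
```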

There was a point in time when the MLP was the state of the art in neural networks. As neural network architectures grow more complex, deeper, and continue to evolve, the MLP looks increasingly simple and vanilla.
