Applications of Deep Learning (CNN) in Big Data Analytics
Artificial Intelligence has seen many technological advances in recent years. Social apps such as Prisma and DeepArt.io transform your photos into works of art in the style of famous paintings. We are seeing autonomous drones and advances in medical image analysis, such as systems that read MRIs and CT scans rapidly and, in some cases, more accurately than radiologists, helping to diagnose cancer earlier and with less invasive procedures. Navigation in self-driving cars is also improving: driverless cars now use sensors and onboard analytics to recognize traffic signals and obstacles and to react to them swiftly. Google had two AI projects underway before 2012; today, it is pursuing more than 1,000.
What made these advances possible, and what is Deep Learning? All of these advances were made possible by a family of Artificial Intelligence techniques popularly known as Deep Learning. Artificial Intelligence is any technique that enables computers to mimic human intelligence, using logic, rules and machine learning. Machine Learning is a subset of AI that uses a suite of algorithms to learn from data and improve decision making. Deep Learning is a subset of Machine Learning in which software trains itself to perform tasks by exposing deep (multi-layered) neural networks to vast amounts of data.
The science behind deep learning was discovered more than 50 years ago, so why is it only now starting to transform the world? Between 2010 and 2015, the amount of data available in the digital world increased 10-fold, and it is still growing at a tremendous pace; over the same period, storage costs fell by approximately 70%. The Big Data movement has brought a flood of data: images, text, transactions and more. GPUs, which make parallel processing faster, cheaper and more powerful, have also become widely available.
Convolutional Neural Network
Convolutional Neural Networks (CNNs) are a category of neural networks that have proven effective in areas such as computer vision, speech recognition and natural language processing. They are especially effective in image recognition and classification, with applications ranging from face recognition to powering vision in robots and self-driving cars.
A CNN is built from three main types of layers:
- Convolutional Layer
- Pooling Layer
- Fully Connected Layer
An image is represented as a matrix of pixel values, which range from 0 to 255 for a standard 8-bit image. Typically, 0 is black, 255 is white, and the values in between are shades of gray. An image from a standard digital camera has red, green and blue channels, which we can think of as three 2D matrices stacked together (RGB). A grayscale image, on the other hand, has just one channel.
Convolutional layers are the core building blocks of CNN architectures, and CNNs derive their name from the “convolution” operator. The primary purpose of convolution in a CNN is to extract features from the input image, so the convolution operation acts as the feature detector of a CNN.
We slide a small matrix over the original image 1 pixel at a time (this step size is called the stride). At every position, the two matrices are multiplied element-wise and the products are added up to give a single value, which forms one element of the output matrix. The small matrix (5×5 in this example) is called a ‘filter’, and the output matrix formed by sliding the filter over the input image is called a ‘Feature Map’ or ‘Activation Map’. Different filters produce different Feature Maps for the same input image: by changing the values in the filter matrix, we obtain filters that detect different features of the input image, such as edges or curves. In practice, a CNN learns the values of these filters on its own during training. With more filters, more features are extracted from the input image, and the CNN generally gets better at recognizing patterns in unseen images.
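The sliding-window computation described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than how a real framework implements it: the 5×5 binary image, the 3×3 filter and the function name `convolve2d` are assumptions chosen to keep the numbers easy to check by hand, and, as in most deep learning libraries, the filter is not flipped (strictly speaking this is cross-correlation).

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = (len(image) - k_rows) // stride + 1
    out_cols = (len(image[0]) - k_cols) // stride + 1
    feature_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            # Element-wise multiply the filter with the window, then sum.
            total = 0
            for a in range(k_rows):
                for b in range(k_cols):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Illustrative (assumed) input image and filter.
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

A 3×3 filter sliding over a 5×5 image with a stride of 1 fits in 3×3 positions, which is why the feature map is smaller than the input. A larger stride shrinks it further.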
Another important layer in a CNN is the Pooling Layer. Pooling layers are commonly placed between successive convolutional layers. In this layer, we reduce the dimensionality of each feature map while retaining the most important information from the input image. This operation is also called “downsampling”. Pooling can be done in several ways, including Max Pooling and Average Pooling.
In this example, we used Max Pooling: we defined a window of size 2x2 and took the largest element of the feature map within that window. Instead of taking the maximum, we could take the average of the values in the window, which is called Average Pooling. In practice, Max Pooling has been found to work better in most cases. Either way, the dimensionality of the feature map is reduced. As the pooling operation is applied to every feature map generated in the Convolutional Layer, we get the same number of feature maps in the Pooling Layer, but with reduced dimensions. The main advantages of using the Pooling/Downsampling operation are:
- The dimensions of feature maps will be smaller.
- Fewer parameters in the layers that follow, and therefore less computation in the network.
- The network becomes invariant to small changes or transformations in the input image, which also helps to control overfitting.
- The size of the data representation shrinks progressively through the network.
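As a rough sketch of the pooling step, the function below applies a 2x2 window to a small feature map. The 4×4 values, the stride of 2 (non-overlapping windows) and the name `pool2d` are assumptions made for illustration only.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Downsample `feature_map` with a size x size window."""
    if mode == "max":
        reduce_fn = max
    else:  # "average"
        reduce_fn = lambda window: sum(window) / len(window)

    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            # Collect the values that fall inside the current window.
            window = [feature_map[i + a][j + b]
                      for a in range(size) for b in range(size)]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


# Illustrative (assumed) 4x4 feature map.
feature_map = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]

max_pooled = pool2d(feature_map)                  # [[6, 8], [3, 4]]
avg_pooled = pool2d(feature_map, mode="average")  # [[3.25, 5.25], [2.0, 2.0]]
```

The 4×4 map becomes 2×2 in both cases, so every later layer has a quarter as many values to process.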
A Fully Connected Layer is used to compute the class scores that form the output of the network. The dimensions of the output volume are (1 x 1 x N), where N is the number of output classes. Every neuron in this layer is connected to every neuron in the previous layer. Some CNN architectures have multiple Fully Connected Layers at the end. So, the Convolutional and Pooling layers extract high-level features from the input image, and the Fully Connected layer uses these features to classify the image into one of the classes defined by the training dataset.
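To make the class-score computation concrete, here is a minimal sketch of a fully connected layer with N = 3 output classes. The input features, weights and biases are made-up numbers (a trained network would learn the weights), and the softmax step that turns scores into probabilities is a common choice that we assume here; the text above does not specify it.

```python
import math


def fully_connected(features, weights, biases):
    """One score per class: a weighted sum over every input feature plus a bias."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed values: a flattened 2x2 pooled feature map and hand-picked weights.
features = [6.0, 8.0, 3.0, 4.0]
weights = [
    [0.1, 0.0, -0.1, 0.0],    # class 0
    [0.0, 0.1, 0.0, -0.1],    # class 1
    [-0.1, -0.1, 0.1, 0.1],   # class 2
]
biases = [0.0, 0.0, 0.0]

scores = fully_connected(features, weights, biases)
probabilities = softmax(scores)
predicted_class = max(range(len(scores)), key=scores.__getitem__)
```

Each row of `weights` connects one output neuron to all four inputs, which is what "fully connected" means; `predicted_class` is simply the index of the highest score.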
In general, if more convolution steps are present in the CNN Architecture, more complicated features will be learnt by the CNN. For example, a CNN may learn to detect edges from raw pixels in the first layer, then use the edges to detect simple shapes in the second layer, and then use these shapes to determine high-level features, such as facial shapes in higher layers.
Let’s see how a CNN is used in three applications: Image Style Transfer, Medical Image Analysis and Music Recommendation.
Image Style Transfer
We extract low-level features such as colour, texture and visual patterns from one image, the style image (S), and apply them to the more semantic, higher-level features, such as a face, of another image, the content image (C), to arrive at the style-transferred image (X). Image style transfer is based on the popular paper “A Neural Algorithm of Artistic Style” (Gatys et al., 2015), which explains how an image can be rendered in an artistic style. In particular, the paper poses style transfer as an optimisation problem. In the image below, for instance, the blue-and-white brush strokes are considered to be the “style” while the face in the photograph is considered to be the “content”.
The key notion behind implementing style transfer is to define a loss function and try to minimize that loss. We need to conserve the “content” of the original image, while simultaneously adopting the “style” of the reference image.
Loss function: X* = argmin_X ( α · L_content(C, X) + β · L_style(S, X) )
We try to minimize the above loss function, where α and β are the weights for content and style respectively. When the content and style images are passed through a CNN, the network encodes perceptual and semantic information about them. The lower layers of the CNN preserve almost the exact pixel values of the original image, while the higher layers capture its high-level content.
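The weighted loss can be sketched as follows. This is a simplified illustration under several assumptions: the feature maps are tiny hand-written lists (channels × positions) rather than real CNN activations, the same features are used for both terms, and the normalization constants from the paper are omitted. Gatys et al. measure style through Gram matrices (correlations between channels), which is what `gram_matrix` computes; the function names themselves are ours.

```python
def gram_matrix(features):
    """Channel-by-channel correlations; `features` is a list of channels."""
    return [[sum(a * b for a, b in zip(ch_i, ch_j)) for ch_j in features]
            for ch_i in features]


def mse(a, b):
    """Mean squared difference between two equally shaped 2D lists."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)


def total_loss(content_feats, style_feats, generated_feats, alpha=1.0, beta=1.0):
    """alpha * L_content(C, X) + beta * L_style(S, X)."""
    content_loss = mse(content_feats, generated_feats)
    style_loss = mse(gram_matrix(style_feats), gram_matrix(generated_feats))
    return alpha * content_loss + beta * style_loss


# Assumed toy activations: 2 channels, 3 spatial positions each.
C = [[1, 0, 2], [0, 1, 1]]   # features of the content image
S = [[0, 1, 0], [2, 0, 1]]   # features of the style image
X = [row[:] for row in C]    # start the generated image from the content

loss = total_loss(C, S, X)   # content term is 0, style term is 8.25
```

In an actual implementation, X is updated by gradient descent until this loss stops decreasing, and the ratio of α to β controls how strongly the style overrides the content.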
Medical Image Analysis
Deep Learning and CNNs are used in various areas of medicine and healthcare. In radiology, they are used for classification and automated segmentation of lesions. In pathology, they are used to classify the type of a patient’s cancer. In ophthalmology, they are used to classify the type and severity of eye disease in diabetic retinopathy.
While a radiologist might see thousands of images in a lifetime, a computer can be shown millions. Unlike traditional computer-aided detection (CAD), deep learning networks can look for many diseases at once, providing insights for treatment planning and disease monitoring. Classifying medical images is a problem that computers can handle well because they can go through far more data than a human ever could. The main advantage of using Deep Learning techniques for medical image analysis is not only greater accuracy and faster analysis, but also the democratization of services: as the technology becomes standard, eventually every patient will benefit from it. A startup called Enlitic is using deep learning algorithms to analyze radiographs and CT and MRI scans. Companies like Merck and Atomwise are also trying to use deep learning for drug discovery.
Let’s see how Deep Learning and CNNs are used in Diabetic Retinopathy. Diabetic Retinopathy is an eye disease caused by diabetes that can lead to loss of vision. This condition accounts for 12% of all new cases of blindness in the US. Manual examination of these cases is very time-consuming and requires medical specialists. Interpreting retinal fundus photographs requires specialized training, and in many areas of the world there aren’t enough qualified doctors to screen them. But now, with the Deep Learning algorithms developed and the computational resources available, we are able to go through thousands of images quickly and detect this eye disease.
The data consists of high-resolution colour retinal fundus images belonging to five classes (shown above). The images are scaled down to a fixed resolution to form a standardized dataset and then given as input to a CNN. Deep Learning with CNNs has been found to improve accuracy considerably over earlier machine learning classifiers such as SVMs.
Music Recommendation
Traditionally, music streaming companies like Spotify relied mostly on collaborative filtering approaches for recommendations. The idea of collaborative filtering is to determine users’ preferences from historical data. For example, if two users listen to the same set of songs, it can be assumed that their tastes are probably similar. Conversely, if two songs are listened to by the same group of users, the songs are probably similar. Streaming services exploit this information to make recommendations.
Collaborative filtering methods do not use any information about the songs themselves, and this is one of their disadvantages. Because they rely on historical data, popular or frequently played songs are much easier to recommend than new or unpopular ones, simply because more data is available for them. If there is no historical data to analyze, the approach breaks down. This is called the cold-start problem. If we want to be able to recommend new music with no previous data, we need other methods or algorithms.
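The idea, and the cold-start problem that comes with it, can be shown with a toy sketch. The listening histories, song names and the use of Jaccard overlap as the similarity measure are all assumptions made for illustration; production recommenders typically use matrix factorization on much larger data.

```python
# Hypothetical listening histories: user -> set of songs played.
listens = {
    "alice": {"song_a", "song_b", "song_c"},
    "bob": {"song_a", "song_b"},
    "carol": {"song_c", "song_d"},
}


def listeners_of(song):
    return {user for user, songs in listens.items() if song in songs}


def song_similarity(song_x, song_y):
    """Jaccard overlap of the two songs' audiences (0.0 if nobody played either)."""
    audience_x, audience_y = listeners_of(song_x), listeners_of(song_y)
    union = audience_x | audience_y
    return len(audience_x & audience_y) / len(union) if union else 0.0


def recommend(user, catalog):
    """Rank unheard songs by their total similarity to the songs the user played."""
    heard = listens[user]
    scores = {song: sum(song_similarity(song, h) for h in heard)
              for song in catalog if song not in heard}
    return sorted(scores, key=lambda song: (-scores[song], song))


catalog = ["song_a", "song_b", "song_c", "song_d", "new_song"]
ranking = recommend("bob", catalog)

# Cold start: a song nobody has played yet has zero similarity to everything.
cold_start_scores = [song_similarity("new_song", s) for s in catalog[:4]]
```

Here `song_c` is ranked first for `bob` because `alice`, who shares his taste, also played it, while `new_song` can never score above zero no matter how good it is.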
This is where Deep Learning techniques come in; they are now being used for recommendations at Spotify. The audio signals are converted into the frequency domain using Discrete Fourier Transforms, which gives a simple and compact representation of the data, and the result is exported as what are called spectrograms. Below is the spectrogram of part of a song, covering about 20 seconds of audio. The x-axis is time and the y-axis represents frequency.
To train a model on this data, the images must have equal dimensions, so they are split so that each image represents just a few seconds of audio. Once the dataset is split, we have tens of thousands of images, each labelled with the music genre it represents. We can now train a deep Convolutional Neural Network to classify these samples of audio data and use the model to classify a new song that it has never seen.
Because of the slicing, we cannot predict the class of a song in one go. We have to slice the new song, predict a class for every slice, and then combine the predictions with some kind of voting to decide the class of the unseen song.
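A spectrogram can be computed by cutting the signal into short frames and taking the Discrete Fourier Transform of each one. The sketch below does this in plain Python on a synthetic 64 Hz tone; the sample rate, frame size and hop length are assumed values, and a real pipeline would use an FFT library and usually a window function, both of which are left out here for brevity.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude of each non-negative frequency bin of one frame (naive DFT)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectrogram(signal, frame_size=64, hop=32):
    """Rows are time frames, columns are frequency bins."""
    return [dft_magnitudes(signal[start:start + frame_size])
            for start in range(0, len(signal) - frame_size + 1, hop)]


# Assumed test signal: one second of a pure 64 Hz sine sampled at 512 Hz.
sample_rate = 512
signal = [math.sin(2 * math.pi * 64 * t / sample_rate) for t in range(sample_rate)]

spec = spectrogram(signal)

# Each bin spans sample_rate / frame_size = 8 Hz, so 64 Hz lands in bin 8.
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
```

Plotting `spec` with time on the x-axis and frequency bin on the y-axis would give a single horizontal line at bin 8; a real song produces a much richer picture, which is what the CNN is trained on.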
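The slice-then-vote step might look like the sketch below. The `toy_classifier` function is a hypothetical stand-in for the trained CNN (it just thresholds the average value of a slice), and the genre names and slice length are likewise made up; only the slicing and majority-vote logic is the point.

```python
from collections import Counter


def slice_frames(spec, slice_len):
    """Cut a spectrogram (list of time frames) into equal slices; drop the remainder."""
    return [spec[i:i + slice_len]
            for i in range(0, len(spec) - slice_len + 1, slice_len)]


def predict_song_genre(spec, classify_slice, slice_len):
    """Classify every slice, then return the genre with the most votes."""
    slices = slice_frames(spec, slice_len)
    if not slices:
        raise ValueError("song is shorter than one slice")
    votes = Counter(classify_slice(s) for s in slices)
    return votes.most_common(1)[0][0]


def toy_classifier(song_slice):
    """Hypothetical stand-in for the trained CNN."""
    values = [v for frame in song_slice for v in frame]
    return "rock" if sum(values) / len(values) > 0.5 else "jazz"


# Assumed spectrogram: 10 time frames with a single frequency bin each.
song = [[0.9], [0.8], [0.7], [0.9], [0.1], [0.2], [0.8], [0.9], [0.9], [0.3]]

genre = predict_song_genre(song, toy_classifier, slice_len=2)  # 4 votes rock, 1 jazz
```

Majority voting is only one option; averaging the per-slice class probabilities before picking the top class is a common alternative.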
Conclusion
- Deep Learning is a valuable tool for Big Data Analytics, which involves analysing very large collections of raw data that are generally unlabeled
- Deep Learning helps to automatically extract complex data representations from large volumes of unlabeled data
- Deep Learning is a field with intense computational requirements
- To experiment effectively and efficiently with Deep Learning, we need a GPU or a cloud service such as AWS, GCP or FloydHub
References
• Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. “A neural algorithm of artistic style.” arXiv preprint arXiv:1508.06576 (2015).
• Simonyan, Karen, and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition.” arXiv preprint arXiv:1409.1556 (2014).
• Chollet, François. Deep Learning with Python (1st ed.). Manning Publications (2017).