Business applications of Deep Neural Networks

Arnau Jiménez
Machine Learning Reply DACH
11 min read · Dec 15, 2022

Some words about Neural Networks

Machine learning is all about learning patterns from data: if your data contains a pattern, the right algorithm will likely figure it out, and you will be able to generalize to new cases the algorithm has not seen before. In machine learning we have several algorithms: random forests, support vector machines, logistic regression, etc. Depending on the kind of data you have, specific algorithms have been shown to perform better than others, so we do not have a universal algorithm that can be applied to every kind of data you can think of. Today we are going to talk about neural networks and the kinds of business cases (and kinds of data) for which they have been shown to be particularly useful. As a rule of thumb, I would say that when the structure of the data is really complex, neural networks tend to outperform other machine learning methods.

So first of all, what is a neural network and how does it differ from other algorithms? Well, neural networks, as well as other algorithms like decision trees, are known to be universal function approximators. In layman's terms this simply means that if your algorithm has enough free parameters, it should be able to map any input to any output, i.e. capture any kind of pattern, given that you have enough data to train the algorithm.

A neural network is a succession of one or more layers, where each layer takes the output of the previous one as its input and passes its own output on to the next.

Architecture of a vanilla one-layer neural network. Image taken from Wikipedia.

The image above displays the architecture of a really simple one-layer neural network, so let me illustrate it with an example. Let's assume that the two initial green nodes contain the height and weight of a person. We pass this information to the middle layer of blue nodes, where some mathematical operations take place, and the results are passed on to the final yellow node. In the yellow node we get a number, and we can then say, for example, that if the number is bigger than some threshold x, our algorithm predicts that the person is overweight. Of course this is a silly example, because we could calculate this directly with a simple fraction (weight divided by height squared, the body mass index), but I think it illustrates well the mechanics of how a neural network operates.
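
To make the mechanics concrete, here is a minimal sketch of that forward pass in NumPy. The weight values below are made up purely for illustration; in a real network they would be learned from data:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Input: height in meters and weight in kg (the two green nodes).
x = np.array([1.75, 92.0])

# Hypothetical weights and biases; in practice these are learned.
W1 = np.array([[ 0.8, -0.5],
               [-0.3,  0.9],
               [ 0.1,  0.4]])     # 3 hidden neurons (blue), 2 inputs each
b1 = np.array([0.1, -0.2, 0.05])
W2 = np.array([0.6, -0.4, 0.7])   # 1 output neuron (yellow), 3 inputs
b2 = -0.1

hidden = sigmoid(W1 @ x + b1)     # the "mathematical operations" in the middle layer
output = sigmoid(W2 @ hidden + b2)

# Threshold the output number to get a prediction.
print("overweight" if output > 0.5 else "not overweight")
```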

The architecture is up to you, i.e. the number of neurons in each layer and how the layers are connected. Each of these connections carries a weight, and the weights are the free parameters that you are going to optimize to get the approximator you want. How you choose your architecture will of course have a huge impact on the performance of your network. In the following sections we will talk about a couple of the must-know deep learning architectures as of 2022.

Why Deep Neural Networks?

The adjective “deep” in deep neural networks comes from the fact that a neural network can have several layers, each taking the output of the previous one and passing its own output to the next. Neural networks with several layers, in contrast to one-layer neural networks, are referred to as deep neural networks.

In principle, one-layer neural networks are already universal function approximators, so why do we talk about deep neural networks at all? Well, it turns out that deep neural networks have empirically proven to generalize better and to be easier to train than shallow, one-layer neural networks. This seems to be related to the fact that each layer detects a set of features and the next layer uses those detected features to build even higher-level features. A well-known example that illustrates this are Convolutional Neural Networks (or CNNs for short). In a CNN the first layers detect simple patterns, like edges, the following layers detect where those edges cross, and the final layers detect whether that corresponds to, say, the picture of a table.

Hopefully the following image helps you understand the explanation in the previous paragraph better:

Features per layer. Image from NVIDIA.

Convolutional networks and computer vision

In 2012 the CNN named AlexNet won the ImageNet challenge, dropping the classification error from 26% to 15%. Since then, the same competition has been won by other CNNs year after year, and convolutional neural networks have become the state of the art for many computer vision tasks like image classification and image segmentation.

The key ingredient of these kinds of networks is that every layer uses an operation called convolution. Convolutions were around long before deep learning; in fact, the use of convolution as a mathematical operation dates back to the 18th century and, of course, it had to be discovered by one of the greatest mathematicians of all time, if not the greatest: Leonhard Euler.

Why are convolutions so important for image processing? Because convolutions display a set of symmetries that we expect our algorithm to respect. In particular, convolutions are translation equivariant in both dimensions of the image (horizontal and vertical): shifting the input and then convolving gives the same result as convolving and then shifting the output. So instead of expecting our algorithm to learn those symmetries, we impose them in the initial architecture. This is an example of what the machine learning community calls an inductive bias. Furthermore, convolutions reuse the same parameters across the whole image, making the neural network much more lightweight and easier to train than a fully connected vanilla neural network built for the same purpose.

In some sense we are creating a shortcut here: if we started with a fully connected neural network and trained it for long enough, we would expect it to eventually discover these symmetries and create such layers itself. Instead, we help the neural network and start from an already “advanced” stage of its training.
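
As a quick illustration of the translation equivariance mentioned above, here is a small NumPy/SciPy sketch. The edge-detection kernel is just an example, and circular boundaries are used so the two orders of operations match exactly:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
image = rng.random((8, 8))

# A simple vertical edge-detection kernel (just an example).
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

def conv(img):
    # 'wrap' boundary makes the shift/convolution commutation exact.
    return convolve2d(img, kernel, mode="same", boundary="wrap")

shifted_then_convolved = conv(np.roll(image, shift=2, axis=1))
convolved_then_shifted = np.roll(conv(image), shift=2, axis=1)

# Translation equivariance: the two results coincide.
print(np.allclose(shifted_then_convolved, convolved_then_shifted))  # True
```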

Again, if you want to learn more about the specifics of CNNs, there are plenty of amazing resources out there, and it would be a waste of time to repeat them in yet another Medium post. If you are interested in them, I recommend this great article about CNNs: An Intuitive Explanation of Convolutional Neural Networks.

These are some of the applications of convolutional neural networks in computer vision:

  • Medical imaging: Time is short, and specialized physicians (e.g., radiologists) often cannot keep up with patient demand. This results in long waiting times, which prevents patients from getting an early diagnosis. CNNs cannot offer a full solution, but they can significantly decrease the time to diagnosis and the human resources needed to get there. An important problem faced by CNNs in medical imaging is their lack of explainability. As you have probably heard, neural networks act pretty much like a black box: they can be trained, but it is usually almost impossible to translate into a human-understandable process how the network decides to classify an image as, say, a car and not a truck. As a doctor, you would like to be sure of the result before you prescribe any kind of potentially harmful drug to a healthy individual.
  • Self-driving cars: Self-driving cars need to be able to distinguish several objects in their surroundings: for safety reasons, we need to know where the road is, whether there are cars around, and whether a human being is close by. Not just that, we also need to be able to interpret the signs on the street. Are the lights red? Is it allowed to go faster than 120 km/h? To solve all these problems, self-driving cars use an ensemble of algorithms that helps them make decisions.
  • Document identification/validation: You might be running a company where you want to classify documents or even validate national IDs or passports. In that case, CNNs might be a good choice; a minimal image classification sketch follows this list.
  • Face ID: Apple started using CNNs for face identification as early as iOS 10. Moreover, we can also use these networks for other biometric data like fingerprints. As you can see, if you want a reliable way to identify your users, CNNs can do the work for you.
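
If you want to try image classification yourself, here is a minimal sketch using a CNN pretrained on ImageNet via torchvision. The model choice and the image path are placeholders, and the weights argument assumes a reasonably recent torchvision version:

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Load a CNN pretrained on ImageNet; any torchvision classifier works here.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

# Standard ImageNet preprocessing.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("example.jpg").convert("RGB")  # placeholder path
batch = preprocess(image).unsqueeze(0)            # add a batch dimension

with torch.no_grad():
    logits = model(batch)
    predicted_class = logits.argmax(dim=1).item()

print(predicted_class)  # index into the 1000 ImageNet classes
```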

GPT-3, Language models and transformers

The famous paper Attention Is All You Need revolutionized the world of Natural Language Processing (or NLP for short) by presenting a new architecture, the so-called transformer. Transformers are sequence-to-sequence models, and this fits language naturally, as a sentence is nothing more than a sequence of words. The best introduction to transformers I have seen is by far Jay Alammar's article The Illustrated Transformer, so if you want to go into more detail I encourage you to take a look at it.
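
To give a flavor of what happens inside, here is a bare-bones NumPy sketch of the scaled dot-product attention from that paper. The toy matrices are random; a real transformer learns the query, key and value projections from data:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Each output row is a weighted average of the value vectors V,
    # with weights given by how well the query matches each key.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

# Toy example: a "sentence" of 4 tokens, each embedded in 8 dimensions.
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))

print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```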

Transformers have become the state of the art for NLP tasks, with incredible results in translating languages, writing essays, summarizing text, answering questions and much more. Just go to the OpenAI Playground at https://beta.openai.com/examples and test the models with some simple Q&A or text generation. If you don't feel like setting things up for the OpenAI Playground, you can start playing directly with OpenAI's new AI chat: https://chat.openai.com/
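
If you would rather experiment locally, one option (my suggestion, not something the OpenAI Playground requires) is the Hugging Face transformers library, which lets you run a small pretrained transformer in a few lines:

```python
from transformers import pipeline

# GPT-2 is a small, freely available transformer language model.
generator = pipeline("text-generation", model="gpt2")

result = generator("Deep neural networks are useful for",
                   max_length=30, num_return_sequences=1)
print(result[0]["generated_text"])
```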

But the self-attention mechanism that characterizes the transformer architecture seems to be quite universal: in recent years a lot of computer vision papers have been using transformers, challenging the previously state-of-the-art CNN architectures. Here are a couple more examples where transformers are shining, this time for their huge potential to revolutionize the pharma industry:

  • Drug discovery: NVIDIA and AstraZeneca collaborated to create an algorithm based on the transformer architecture for predicting how chemicals will react with each other and for generating new molecular structures. Here is the original article from NVIDIA: Drug Discovery Gets Jolt of AI via NVIDIA Collaborations with AstraZeneca, U of Florida Health.
  • AlphaFold: When the AlphaFold paper first appeared, it shocked the entire scientific community; even Forbes published an article with the header: AlphaFold Is The Most Important Achievement In AI — Ever. What AlphaFold does is determine the 3D structure of a protein from its amino acid sequence. A protein's shape determines what it does and how it works, and this has a direct impact on inventing new medications that target those proteins. AlphaFold is able to predict the structures of nearly all catalogued proteins known to science, and this is huge: as of 2019, only a small fraction of them were known, and it took a vast amount of human resources just to figure out one more. Here is the original article in the journal Nature: Highly accurate protein structure prediction with AlphaFold.

Cooling servers and deep reinforcement learning

In contrast with most algorithms in machine learning, in reinforcement learning we train an algorithm whose most characteristic trait is that it is able to make decisions on its own. The simplest example is playing a game of chess: in reinforcement learning you train a so-called agent which, depending on the distribution of pieces on the board and possibly on the previous moves of the opponent, decides which piece to move. Of course, we train the agent to maximize the likelihood of winning the game. In general there is always a goal we want to achieve, and success or failure depends on the strategy, or set of actions, taken by the trained agent.

In general, reinforcement learning does not necessarily need to be coupled with deep learning, but, as explained in the introduction, deep learning comes in handy when the complexity of the problem is high. Therefore, for many real-life applications like self-driving cars, reinforcement learning takes advantage of deep learning for better performance.
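
To give you a feeling for the agent/reward loop, here is a minimal tabular Q-learning sketch. Note this is classic RL rather than deep RL, and the one-dimensional corridor environment is invented purely for illustration:

```python
import numpy as np

# A toy corridor: states 0..4, reward only for reaching state 4.
n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions)) # value of each action in each state
alpha, gamma, epsilon = 0.1, 0.9, 0.3
rng = np.random.default_rng(0)

for episode in range(500):
    state = 0
    while state != n_states - 1:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(Q[state].argmax())
        next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # Q-learning update: move the estimate toward reward + discounted future value.
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print(Q.argmax(axis=1))  # prefers action 1 ("right") in every non-terminal state
```

Deep reinforcement learning replaces the Q table with a neural network, which is what makes high-complexity problems like the ones below tractable.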

DeepMind and Google have been using deep reinforcement learning to optimize the cooling strategy of their servers. Until recently, the data center team was in charge of continuously monitoring the servers and deciding what to do once they warmed up past a certain threshold. With the new AI algorithm, the team was not just able to reduce the energy consumption of the cooling system but also discovered cooling strategies they were not even aware of! If you want to know more about this amazing application of RL, take a look at the original paper from DeepMind: Controlling Commercial Cooling Systems Using Reinforcement Learning.

Needless to say, reinforcement learning fits perfectly in applications like autonomous driving, as the car (the agent) needs to constantly assess its next decision. Is it going to stop, accelerate, turn right or left?

Another amazing application, again from DeepMind, is chip design with deep reinforcement learning. In this application DeepMind trained an RL agent to design chips suited specifically for ML purposes, reducing the chip design cycle time dramatically.

Synthetic data, Art and generative models

Until now we have talked about models that take as input the type of data we want to analyze and output a result, be it a classification, a regression, a segmentation, etc. These are known as Discriminative Models, in contrast to the models we will talk about in this section, the so-called Generative Models.

Generative models work a bit the other way around: we start by sampling from some random probability distribution (usually a Gaussian) and as output we get a data point that should be really “similar” to our training data. I think the most well-known example is creating realistic faces using a Generative Adversarial Network (or GAN). Take a look at the following picture from the StyleGAN repository:

All these faces do not belong to any real person. Photo taken from original repo: https://github.com/NVlabs/stylegan

Maybe you can spot that some of the images seem a bit odd, but for others you really cannot tell that they were produced by an algorithm and that the face does not belong to a real person.

This is what generative models do: you train them on a type of data, in this case faces of people, and at the end of the day they are able to produce samples so similar to the training data that it is hard to tell the difference.

Several algorithms belong to the category of generative models; the most well-known are GANs, VAEs, Normalizing Flows and Diffusion Models (I recommend this article for the mathematically inclined reader). The important thing to remember is that these algorithms are able to create realistic, yet synthetic, data.
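
To make the adversarial idea behind GANs concrete, here is a minimal PyTorch sketch of one training step. The toy dimensions, network sizes and the Gaussian stand-in for "real" data are all invented for illustration: a generator maps noise to samples, a discriminator tries to tell them from real data, and each is trained against the other.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 2   # toy sizes; real GANs are much larger

generator = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                          nn.Linear(64, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(),
                              nn.Linear(64, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

real = torch.randn(32, data_dim) + 3.0   # stand-in for real training data

# Discriminator step: real samples should score 1, generated samples 0.
noise = torch.randn(32, latent_dim)
fake = generator(noise).detach()
d_loss = bce(discriminator(real), torch.ones(32, 1)) + \
         bce(discriminator(fake), torch.zeros(32, 1))
d_opt.zero_grad(); d_loss.backward(); d_opt.step()

# Generator step: try to make the discriminator score the fakes as 1.
noise = torch.randn(32, latent_dim)
g_loss = bce(discriminator(generator(noise)), torch.ones(32, 1))
g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```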

But why would synthetic data creation be useful? In Europe, GDPR regulations can be quite strict for certain data, and data related to health in general tends to be highly protected. Therefore, healthcare and pharma are two of the sectors that can benefit the most from synthetic data. As long as the privacy of the users is guaranteed, you can share synthetic, artificial data that contains the correlations and characteristics that enable researchers to speed up research or drug discovery, for example. There are already companies offering such services; Gretel, for example, offers a service to create synthetic data. Of course, the more complex the data structure, the harder it is to replicate. If you want to create synthetic data from genomics, that can be quite a challenge, as it requires figuring out realistic combinations of the DNA bases A, C, G and T in sequences that vary in length from hundreds to millions.

Another example is art: I invite you to try the following models that create images from text: https://replicate.com/collections/text-to-image. Pretty cool, huh?
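
If you would rather run text-to-image on your own machine, one option (my suggestion; the linked page runs the models for you) is the diffusers library, which wraps pretrained pipelines like Stable Diffusion. Note this sketch assumes a CUDA GPU:

```python
import torch
from diffusers import StableDiffusionPipeline

# Download a pretrained text-to-image pipeline (several GB on first run).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe("an astronaut riding a horse, oil painting").images[0]
image.save("astronaut.png")
```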
