A non-computer-science graduate's approach to understanding deep learning.

If you are new to deep learning, machine learning, or artificial intelligence, this is for you.

Tobiloba Adejumo
Dataly.ai

--

Welcome on board! 😃

I must say that I'm very happy you have made this decision: to proceed with understanding deep learning despite not being a computer science graduate.

I understand that most materials on this topic online can seem complex or fluffy because the authors often expect you to have a basic knowledge of calculus or linear algebra. Because of this, I wrote this article to be approachable and digestible, while trying not to stray from the original contextual meaning of the lingo used in artificial intelligence.

Why is there so much noise about Artificial Intelligence?

Artificial intelligence is the driving force of the fourth industrial revolution, which sets out to dissolve the barriers between man and machine. What better way can this be achieved than through artificial intelligence: allowing machines to make decisions and learn of their own volition? Africa as a continent seems laid back in these industrial revolutions, having barely caught up with the rest of the world during the past three.

The question on your mind should now be: why is this so with Africa?

Opetunde Adepoju explains this in detail in her article.

When did Artificial Intelligence start?

Artificial Intelligence (AI) started in the 1950s. Herbert Simon, Cliff Shaw, and Allen Newell conducted experiments in writing programs to replicate human cognition and thinking processes. The experiments led to a program called Logic Theorist, the first artificial intelligence program ever written. Logic Theorist was a program, written in IPL, that was used to prove mathematical theorems.

IPL is an acronym for Information Processing Language. It is a computer language designed for artificial intelligence programming.

You can imagine the stress they had to go through just to create such a program: a program, or an “AI”, that most people outside academia were not interested in.

John McCarthy coined the term “artificial intelligence” in his 1955 proposal for the 1956 Dartmouth Conference, the first artificial intelligence conference. He is referred to as the father of AI because of this.

Years later, they wrote a more powerful program known as the General Problem Solver (GPS) using the same IPL. GPS could solve a variety of puzzles using an empirical approach, i.e., trial and error. This was not so impressive, as GPS could not learn and could not rewrite its own code; it could only follow code explicitly written by the programmer.

In 1960, John McCarthy combined ideas from IPL with mathematical logic and developed a new programming language known as LISP, which became the standard AI language in the United States.

What is Artificial Intelligence?

Artificial intelligence is a branch of computer science that tries to create machines that act intelligently in response to their environment. This can be done using a variety of techniques:

  1. Explicit Programming: Explicit programming involves programming a computer to execute a task step by step. This is how artificial intelligence was first implemented, with varying levels of success, as seen in Logic Theorist and the General Problem Solver. This method has a flaw: it only works for problems with a well-ordered arrangement of instructions that a machine can follow, and it works well only in controlled environments. It doesn't handle novel situations very well, as it requires the programmer to rewrite the code explicitly.
  2. Expert System: An expert system imitates the decision-making ability of a human expert. It comprises a knowledge base and an inference engine: a database of facts and rules, with an inference engine used to deduce new facts from the known facts and rules. The inference engine simply does the reasoning in the expert system; it applies logical rules to the knowledge base to derive the required knowledge as the expert system runs. Expert systems have had limited success in various decision-support areas such as medical diagnostics, logistics, and business applications.
  3. Machine Learning: The ability of machines to teach themselves how to solve problems when given only the data. This involves teaching a machine how to solve a problem by applying algorithms, without programming it explicitly how to do so. By explicitly, I mean that the rules are not defined by the programmer. The machine is taught to identify statistical patterns in data: it is exposed to existing data so that it can make the right decisions when given new data. Machine learning creates a function that maps an input to an output. In other words, machine learning is the application of statistics to the problems of artificial intelligence. Machine learning has numerous techniques: algorithms for classification, regression, clustering, and many more. Traditional machine learning tools include decision tree classifiers, support vector machines, k-means clustering, and shallow neural networks. One approach to machine learning is the neural network: a series of algorithms that recognize statistical patterns in data by mimicking the way the human brain operates. The earliest neural networks were called perceptrons. (A small code sketch follows this list.)
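
To make the idea concrete, below is a minimal sketch of a machine learning a pattern purely from data, using the scikit-learn library (my choice of tool; the article names decision trees but no particular library, and the data is invented for illustration).

    # A decision tree learns the mapping from inputs to outputs by itself;
    # no if/else rules are written by the programmer.
    from sklearn.tree import DecisionTreeClassifier

    # Existing data: each row is [height_cm, weight_kg]; the labels are
    # made up purely for illustration.
    X_train = [[150, 50], [160, 60], [180, 85], [190, 95]]
    y_train = ["small", "small", "large", "large"]

    model = DecisionTreeClassifier()
    model.fit(X_train, y_train)        # identify statistical patterns in the data

    # Given new data, the learned function maps an input to an output.
    print(model.predict([[175, 80]]))  # expected: ['large']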

What is an Artificial Neural Network?

Model Of An Artificial Neuron

An artificial neural network is a machine learning algorithm based on a vague or crude approximation of the biological neural network in a brain. A neural network consists of neurons. A neuron consists of inputs (x1, x2, …, xm), weights (wk1, wk2, …, wkm), a summation of the inputs multiplied by the weights (net), an activation function (f), and an output. The weights simulate the biological synaptic strengths in a natural neuron. A neuron also has an additional weighted input called the bias.

Connecting a series of artificial neurons into a network, we get an artificial neural network. It comprises a node for each neuron and an edge for each connection.

There are various types of activation functions, used for various applications based on their pros and cons. Think of the activation function as our little way of making sure the output values stay within a particular range, e.g., 0 to +1 or -1 to +1. Increasing the bias shifts the activation function to the left, while decreasing the bias shifts it to the right. The weights of the neuron can also be referred to as parameters.
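
Here is a minimal sketch of the neuron above in Python. The input values, weights, and bias are invented for illustration, and the sigmoid is just one possible choice for the activation function f.

    import numpy as np

    x = np.array([0.5, 0.3, 0.2])  # inputs x1, x2, x3
    w = np.array([0.4, 0.7, 0.2])  # weights w1, w2, w3
    b = 0.1                        # bias: an additional weighted input

    net = np.dot(w, x) + b         # summation of inputs multiplied by weights

    def sigmoid(z):
        # squashes any value into the range 0 to +1
        return 1.0 / (1.0 + np.exp(-z))

    y = sigmoid(net)               # the neuron's output
    print(y)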

To train the weights of the neural network to make accurate predictions, we need a fair amount of mathematics, but there are two main steps we need to be aware of:

  1. Forward Propagation: In forward propagation, we use the network with its current parameters to compute a prediction for each example in our training dataset.
  2. Backward Propagation: The prediction error is used to update the weights of the connections between neurons to help the network make better predictions in the future. The complex calculation happens here. We use a technique called gradient descent to decide whether to increase or decrease each individual connection's weight, and we use something called the learning rate to determine how much to increase or decrease the weights during each training step. Essentially, we increase the strength of the connections that assisted in predicting correct answers and decrease the strength of the connections that led to incorrect predictions. This process is repeated for each sample in the training dataset, and the whole dataset is repeated until the weights of the network become stable. (A training sketch follows this list.)
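
The toy sketch below shows both steps on a single neuron. It is a simplification under my own assumptions (invented data, squared error, a sigmoid activation), not a full implementation of backpropagation through many layers.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Tiny invented training set: two examples with two inputs each.
    X = np.array([[0.0, 1.0], [1.0, 0.0]])
    targets = np.array([1.0, 0.0])

    w = np.zeros(2)
    b = 0.0
    learning_rate = 0.5  # how much to adjust the weights at each step

    for step in range(1000):
        for x, t in zip(X, targets):
            y = sigmoid(np.dot(w, x) + b)    # forward propagation: predict
            # backward propagation: gradient of the squared error (y - t)^2
            # with respect to the weights, via the chain rule
            grad = (y - t) * y * (1.0 - y)
            w -= learning_rate * grad * x    # strengthen/weaken connections
            b -= learning_rate * grad

    print(sigmoid(X @ w + b))  # predictions move toward [1, 0]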

What kinds of data structures are used in machine learning?

Because we are working with computers rather than brains, we make use of computationally efficient data structures such as vectors, matrices, and tensors to represent the network of nodes and edges. A vector is a one-dimensional array of values. A matrix is a two-dimensional array of values. A tensor is an array with an arbitrary number of dimensions, such as a third-order tensor, a fourth-order tensor, etc.
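
In NumPy (a common choice, though the article names no particular library), these three structures look like this:

    import numpy as np

    vector = np.array([1, 2, 3])         # 1 dimension
    matrix = np.array([[1, 2], [3, 4]])  # 2 dimensions
    tensor = np.zeros((2, 3, 4))         # 3 dimensions: a third-order tensor

    print(vector.ndim, matrix.ndim, tensor.ndim)  # 1 2 3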

It is high time we defined deep learning…

Deep learning is a form of artificial intelligence that uses a powerful type of machine learning called an artificial neural network, with multiple hidden layers that learn hierarchical representations of the underlying data in order to make predictions given new data.

Deep learning is a type of machine learning, which is in turn a type of artificial intelligence. There are several ways to create artificial intelligence; machine learning is one of them. There are various types of machine learning; deep learning is one specific type.

A deep neural network is a neural network with more than one hidden layer. Adding more hidden layers allows the network to model progressively more complex functions.
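
As a rough sketch of what "more than one hidden layer" means, here is a forward pass through a small deep network in Python. The weights are random placeholders rather than learned values, and ReLU is one common choice of activation.

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # hidden layer 1
    W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)  # hidden layer 2
    W3, b3 = rng.normal(size=(1, 4)), np.zeros(1)  # output layer

    x = np.array([0.5, 0.1, 0.9])  # input
    h1 = relu(W1 @ x + b1)         # first hidden layer: simple features
    h2 = relu(W2 @ h1 + b2)        # second hidden layer: compositions of them
    y = W3 @ h2 + b3               # output
    print(y)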

Process of detecting human faces…

Consider this: if we want to teach a neural network how to detect human faces, first, in the input layer, we would feed a set of labeled images of human faces into the network in order to teach it what different faces look like. The first hidden layer of the network will learn to detect geometric primitives, for example horizontal lines, vertical lines, and diagonal lines. The middle hidden layer will learn to detect more complex facial features such as eyes, a nose, or a mouth. The final hidden layer will learn to detect the general pattern for entire faces, and the output layer will learn to detect the most abstract representation of a person, for example the name of the person being recognized. Each subsequent layer learns to extract more complex features from the preceding layer; as a result, each additional layer detects more abstract representations than the previous one. This is what is referred to as learning hierarchical representations of the underlying data. The hierarchy of details starts from the low-level details and goes up to the high-level abstractions. A deep neural network learns to model this compositional hierarchy in order to make predictions. Keep in mind that this description is a vast oversimplification of a real-world deep neural network for image processing.

Deep Learning Techniques

These are techniques that allow deep learning to solve a variety of problems.

  1. Fully Connected Feedforward Neural Network: The standard network architecture used in most basic neural network applications. Each neuron in the preceding layer is connected to every neuron in the subsequent layer. Feedforward simply means that there are no cycles or loops in the connections.
  2. Convolutional Neural Network (CNN): A network architecture that works well for images, audio, and video, designed for specific tasks like image classification. Unlike a fully connected neural network, a CNN uses sparsely connected convolution layers, which perform image processing on their inputs. In addition, CNNs contain downsampling layers called pooling layers to further reduce the number of neurons necessary in subsequent layers of the network. A CNN then contains one or more fully connected layers to connect the pooling layers to the output layer. Convolution is a technique that allows us to extract visual features from an image in small chunks using a filter or kernel. Pooling, also known as subsampling or downsampling, reduces the number of neurons in the convolution layer while still retaining the most important information. CNN applications include image recognition, image processing, image segmentation, video analysis, and language processing. (See the CNN sketch after this list.)
  3. Recurrent Neural Network (RNN): A network that works well for processing sequences of data over time; it operates effectively on sequences of variable input length. The known problems of RNNs are vanishing and exploding gradients. Two of its variants are gated RNNs and Long Short-Term Memory RNNs, also known as LSTMs. Both variants use a form of memory to help make predictions in sequences over time. Applications of RNNs include natural language processing, speech recognition, language translation, conversation modelling, image captioning, and visual question answering.
  4. Generative Adversarial Network (GAN): A technique where we place two opposing neural networks in competition with one another in order to improve each other's performance. It combines two deep neural networks: a generator network and a discriminator network. The generator network produces synthetic data, and the discriminator network tries to detect whether the data it is seeing is real or fake. Applications of generative adversarial networks include image generation, image enhancement, text generation, speech synthesis, drug discovery, and more.
  5. Reinforcement Learning: A technique for providing reward signals when multiple steps are necessary to achieve a goal. It involves an agent interacting with an environment. The agent observes the state of the environment, and the environment's state can be modified by the agent's actions. The agent receives a reward signal whenever it achieves a goal of some kind, and its objective is to learn how to interact with its environment in such a way that it achieves its goals. For example, a car can be our agent and the world our environment. Deep reinforcement learning is the application of reinforcement learning to train deep neural networks. Examples of deep reinforcement learning applications are games (board games like chess and video games), autonomous vehicles (self-driving cars and autonomous drones), robotics (including teaching robots how to walk and how to perform manual tasks), and management and financial tasks.
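
As an illustration of technique 2 above, here is a minimal CNN sketch using the Keras library (an assumption on my part: the article names no library, and the layer sizes are illustrative rather than tuned).

    from tensorflow.keras import layers, models

    model = models.Sequential([
        # convolution extracts visual features in small chunks using filters
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
        # pooling downsamples while retaining the most important information
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        # fully connected layers connect the pooled features to the output
        layers.Dense(64, activation="relu"),
        layers.Dense(10, activation="softmax"),  # e.g. 10 image classes
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    model.summary()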

Applications of Deep Learning

  1. Tabular data: classification, regression, clustering, and anomaly detection.
  2. Text: document classification, natural language processing, and sentiment analysis.

Different Kinds of Activation Functions

  1. The Linear Function: A straight line that multiplies the input by a constant value. The other functions listed below are called non-linear because their output is not a linear multiple of the input. (A code sketch of all four follows this list.)
  2. The Sigmoid Function: Also called a logistic function. It is an S-shaped curve ranging from 0 to 1.
  3. The Hyperbolic Tangent Function: Also called the tanh function; an S-shaped curve ranging from -1 to +1.
  4. The Rectified Linear Unit (ReLU) Function: A piecewise function that outputs 0 if the input is below a threshold (usually 0) and a linear multiple of the input above it.
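
Here is a small Python sketch of all four functions; the formulas are the standard ones, and the sample inputs are invented for illustration.

    import numpy as np

    def linear(z, c=1.0):
        return c * z                     # a constant multiple of the input

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))  # S-shaped, ranges from 0 to 1

    def tanh(z):
        return np.tanh(z)                # S-shaped, ranges from -1 to +1

    def relu(z):
        return np.maximum(0.0, z)        # 0 below zero, the input itself above

    z = np.array([-2.0, 0.0, 2.0])
    for f in (linear, sigmoid, tanh, relu):
        print(f.__name__, f(z))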

References

  1. Newell, A., Shaw, J. C. and Simon, H. A., Empirical explorations with the logic theory machine: a case study in heuristics, in Computers and Thought, Feigenbaum, E. A. and Feldman, J. (Eds.), McGraw Hill, New York, 1963.
  2. Shannon, C. E., Programming a computer for playing chess, Philosophical Magazine, Series 7, 41, 256–275, 1950.
  3. Newell, A., Shaw, J. C. and Simon, H. A., A variety of intelligent learning in a general problem solver, in Self Organising Systems, Yovits, M. C. and Cameron, S. (Eds.), Pergamon Press, New York, 1960.
  4. McCarthy, J., Recursive functions of symbolic expressions and their computation by machine, Communications of the ACM, 7, 184–195, 1960.
  5. Weizenbaum, J., ELIZA — A computer program for study of natural language communication between man and machine, Communications of the ACM, 9(1), 36–44, 1966.
  6. Minsky, M., A framework for representing knowledge, in The Psychology of Computer Vision, Winston, P. H. (Ed.), McGraw Hill, New York, 1975.
  7. Bobrow, D. G., Natural language input for a computer problem solving system, in Semantic Information Processing, Minsky, M. (Ed.), MIT Press, Cambridge, 1968.
  8. Winograd, T., Understanding Natural Language, Academic Press, New York, 1972.
  9. Holland, J. H., Adaptation in Natural and Artificial Systems, The University of Michigan Press, Ann Arbor, 1975.
  10. Renze, M., Deep Learning: The Big Picture.
  11. Goldberg, D. E., Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley Publishing Co., Reading, Mass., 1989.
  12. Selfridge, O. G., Pandemonium: a paradigm for learning, in Proc. Symposium on Mechanisation of Thought Processes, Balke, D. and Uttley, A. (Eds.), H. M. Stationery Office, London, 1959.
  13. Minsky, M. and Papert, S., Perceptrons, MIT Press, Cambridge, Mass., 1972.
  14. Hewitt, C., PLANNER: A language for proving theorems in robots, Proc. IJCAI, 2, 1971.
  15. Feigenbaum, E. A., The art of artificial intelligence: themes and case studies in knowledge engineering, Proc. IJCAI, 5, 1977.
  16. Newell, A. and Simon, H. A., Computer science as empirical enquiry: symbols and search, Communications of the ACM, 19(3), 1976.
  17. Shortliffe, E. H., Computer-Based Medical Consultations: MYCIN, Elsevier, New York, 1976.
  18. Hayes-Roth, F. and Lesser V. R., Focus of attention in the HEARSAY-II system, Proc. IJCAI, 5, 1977.
  19. Engelberger, J. F., Robotics in Practice, Kogan Page, London, 1980.
  20. Smith, R. G., Mitchell, T. M., Chestek, R. A. and Buchanan, B. G., A model for learning systems, Proc. IJCAI, 5, 1977.
  21. Lindsay, R. K., Buchanan, B. G., Feigenbaum, E. A. and Lederberg, J., Applications of Artificial Intelligence for Organic Chemistry: The DENDRAL Project, McGraw Hill, New York, 1980.
  22. Schank, R. C. and Abelson, R. P., Scripts, Plans, Goals and Understanding, Erlbaum, Hillsdale, N.J., 1977.
  23. Takeda, H., Veerkamp, P., Tomiyama, T. and Yoshikawa, H., Modeling design processes, AI Magazine, Winter 1990, 37–48, 1990.
  24. Green, M. (Ed.), Knowledge Aided Design, Academic Press, London, 1993.
  25. https://www.codementor.io/james_aka_yale/a-gentle-introduction-to-neural-networks-for-machine-learning-hkijvz7lp

--


Tobiloba Adejumo
Dataly.ai

Interested in biomarker development, software development and AI, as well as psychology, history, philosophy, and relationships. Website: tobilobaadejumo.com