Deep Learning for Beginners

Aysen Çeliktaş
Aug 5, 2023


In my previous articles, “Supervised Machine Learning for Beginners” and “Unsupervised Machine Learning for Beginners”, I talked briefly about machine learning. Here, I would like to introduce deep learning in a simple way. An overview of Generative Adversarial Networks (GANs), a deep learning architecture used for a variety of purposes such as data augmentation, combining different modalities, image reconstruction, and segmentation, implemented in a Python environment, can be found at the following link. Its use in the field of medical imaging can be explored here.

[created by the author in Canva]

“Neural networks solve problems using a divide-and-conquer strategy: each of the neurons in a network solves one component of the larger problem, and the overall problem is solved by combining these component solutions.” [1]

Deep learning is considered an evolution of machine learning: it can perform much more complex operations on much larger data. It is a sub-branch of machine learning, but a more advanced one. One of the biggest differences from basic machine learning (Figure 1) is that in deep learning, feature extraction is not done by humans. Instead, the features are learned by the machine itself, thanks to the architecture of the layers.

Fig.1. Machine Learning & Deep Learning [created by the author in Canva]

Taking a photograph as an example, the layers of neurons produce an output by using structural features of an object such as edges, corners, and colors; they find these features by filtering the input through kernels. Another key difference from machine learning is that deep learning works with data in tensor form (Figure 2) rather than a dataframe. A further important difference is that in traditional machine learning, the model improves with more data only up to a certain point, then reaches a plateau and stays there; no matter how much more data you add, the learning remains roughly constant. In deep learning, learning continues to improve as the data grows. Deep learning can therefore work with big data (volume, velocity, variety) that is difficult for classical machine learning.

Fig.2. scalar value to tensor [created by the author in Canva]
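As a small illustration of Figure 2, here is a sketch using NumPy (any tensor library would do) that builds a scalar, a vector, a matrix, and a rank-3 tensor and prints their ranks and shapes:

```python
import numpy as np

scalar = np.array(5.0)              # rank-0 tensor: a single value
vector = np.array([1.0, 2.0, 3.0])  # rank-1 tensor, shape (3,)
matrix = np.ones((2, 3))            # rank-2 tensor, shape (2, 3)
image  = np.zeros((28, 28, 3))      # rank-3 tensor, e.g. a 28x28 RGB image

for t in (scalar, vector, matrix, image):
    print(t.ndim, t.shape)
```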

The classic example is real estate data. From inputs such as the number of rooms, the location, and the size of a house, the hidden layers can learn intermediate characteristics such as the size of the families likely to live there, how walkable the neighborhood is, and the distance to schools, hospitals, or government offices; from these, the network can output an estimate of the house price. A minimal sketch of such a model is given below.
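Here is that sketch, written with Keras on synthetic data; the three input columns and their weights are invented stand-ins for features like room count, size, and distance to school, not a real dataset:

```python
import numpy as np
from tensorflow import keras

# Synthetic data: three made-up feature columns and a made-up "price".
X = np.random.rand(1000, 3)
y = X @ np.array([50.0, 3.0, -10.0]) + 100.0

model = keras.Sequential([
    keras.Input(shape=(3,)),
    keras.layers.Dense(16, activation="relu"),  # hidden layers learn intermediate features
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1),                      # single linear output: the estimated price
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=32, verbose=0)
```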

Fig.3. Deep Learning Methods [created by the author in Canva]

The structure that enables the neurons between the layers to learn was inspired by the structure of neurons in the human brain. In the history of machine learning, which is often dated to Turing in 1950, Geoffrey Hinton developed neural networks imitating the human nerve cell in the 1980s; a very important advance was the backpropagation algorithm he worked on with his colleagues. Subsequently, in 1997, IBM's Deep Blue defeated world chess champion Garry Kasparov. In 1998, Yann LeCun and his colleagues used a CNN to classify handwritten digits. From the 2000s onward, many developments such as ImageNet, Siri, AlexNet, GoogLeNet, GANs, AlphaGo, Capsule Networks, and ChatGPT followed. Advances in computer hardware have accelerated this work and made it efficient.

Returning to the neural network concept, the human nerve cell transmits its signal along the axons shown in Figure 4. Impulses arriving at the nucleus via the dendrites are processed biologically and passed on to new nerve cells along the axons. In deep learning, the analogue of this is the mathematical processing of the information entering a neuron through weighting: the input values are multiplied by the weights on the paths corresponding to the dendrites, summed in the artificial neuron, and a bias is added. To obtain non-linear behavior, an activation function is applied before the value leaves the neuron. The output of the activation function can be the final result or the input of another neuron. Every neuron is computed with this method, and the neurons are connected to each other in series or in parallel.

Fig.4. biological and artificial nerve cell [created by the author in Canva]
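The weighted-sum-plus-bias computation described above can be sketched for a single artificial neuron in NumPy; the specific weights, bias, and the choice of ReLU as the activation are only illustrative:

```python
import numpy as np

def neuron(x, w, b):
    """One artificial neuron: weighted sum of inputs plus bias, then an activation."""
    z = np.dot(w, x) + b            # signals arriving over the "dendrites", weighted and summed
    return max(0.0, z)              # ReLU chosen here as the non-linear activation

x = np.array([0.5, -1.2, 3.0])      # inputs coming from previous neurons
w = np.array([0.8,  0.1, 0.4])      # learned weights
b = 0.2                             # bias
print(neuron(x, w, b))              # this output could feed the next neuron
```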

According to Forrest Wickman of the Washington Post: “The human brain contains roughly 100 billion neurons [Ed. note: closer to 86-billion, actually, but now we’re just being nitpicky]. Each of these neurons seems capable of making around 1,000 connections, representing about 1,000 potential synapses, which largely do the work of data storage. Multiply each of these 100 billion neurons by the approximately 1,000 connections it can make, and you get 100 trillion data points, or about 100 terabytes of information.” [2]

According to Paul Reber, a psychologist at Northwestern University: “… neurons combine so that each one helps with many memories at a time, exponentially increasing the brain’s memory storage capacity to something closer to around 2.5 petabytes [1 petabyte ≈ 1,000 terabytes]. For comparison, if your brain worked like a digital video recorder in a television, 2.5 petabytes would be enough to hold three million hours of TV shows. You would have to leave the TV running continuously for more than 300 years to use up all that storage.” [2]

In deep learning, each cycle in which the model is fed with all of the data is called an epoch. The batch size, on the other hand, determines the size of the sub-sample of the data fed to the model at each step: the model is fed piece by piece, one mini-batch at a time, as in the sketch below. Some points to consider while building the model are then described in turn.
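A bare-bones sketch of how the epoch and mini-batch loops relate (with the actual model update left as a comment) might look like this:

```python
import numpy as np

X = np.arange(10_000, dtype=float).reshape(1_000, 10)  # 1,000 samples, 10 features
batch_size = 32
epochs = 3

for epoch in range(epochs):                       # one epoch = one full pass over the data
    order = np.random.permutation(len(X))         # reshuffle the samples each epoch
    for start in range(0, len(X), batch_size):
        batch = X[order[start:start + batch_size]]
        # the model would be updated here, once per mini-batch
    print(f"epoch {epoch + 1} finished")
```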

Activation Function: Functions applied after summing the weighted inputs with the bias, in order to introduce non-linearity. They must be differentiable, because their derivatives are needed during backpropagation. In hidden layers, ReLU is usually used. In output layers for classification, Sigmoid is used for binary problems and Softmax for multiclass problems. Leaky ReLU, which can also be used in hidden layers, is an option when values below 0 matter for the model.

Fig.5. activation function types [created by the author in Canva]
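The activation functions named above can be written in a few lines of NumPy; this is a reference sketch of the formulas, not how a framework implements them internally:

```python
import numpy as np

def sigmoid(z):                    # binary-classification output, squashes to (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):                       # the usual default in hidden layers
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):     # keeps a small slope for negative inputs
    return np.where(z > 0, z, alpha * z)

def softmax(z):                    # multiclass output, components sum to 1
    e = np.exp(z - np.max(z))      # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), relu(z), leaky_relu(z), softmax(z))
```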

Loss Function (cost function, objective function): The function that measures the model's error, and hence its success. The metrics are the same as in supervised machine learning: for example, Mean Squared Error (MSE) for regression and Cross-Entropy (log loss) for classification.

Fig.6. classification and regression [created by the author in Canva]
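As a rough sketch, MSE and binary cross-entropy (log loss) can be written directly in NumPy; the epsilon clipping is an implementation detail added here only to avoid log(0):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error, the usual choice for regression."""
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Log loss for binary classification; p_pred are predicted probabilities."""
    p = np.clip(p_pred, eps, 1 - eps)          # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

print(mse(np.array([3.0, 5.0]), np.array([2.5, 5.5])))               # 0.25
print(binary_cross_entropy(np.array([1, 0]), np.array([0.9, 0.2])))  # ~0.164
```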

Backpropagation: After the model performs a feedforward pass, the output for the input data is compared with the target and the error is propagated backwards. Because non-linearities are introduced by the activation functions and computed across many neurons, the loss surface is not fully convex. One of the most influential hyperparameters is the learning rate: if it is set larger than necessary, the local or global minimum can be skipped over; if it is kept smaller than necessary, the model can get stuck in a local minimum. Whether a result is a global minimum or only a local one can usually only be judged by comparing good values with those found before.

Fig.7. backpropagation [created by the author in Canva]
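To see the effect of the learning rate, consider a toy one-parameter loss L(w) = (w - 3)^2, whose minimum is at w = 3; the three rates below are only illustrative of "too small", "reasonable", and "too large":

```python
def grad(w):
    """Gradient of the toy loss L(w) = (w - 3)**2, whose minimum is at w = 3."""
    return 2 * (w - 3)

for lr in (0.01, 0.5, 1.1):        # too small, reasonable, too large
    w = 0.0
    for _ in range(50):
        w -= lr * grad(w)          # the gradient-descent update applied after each backward pass
    print(f"learning rate {lr}: w ends at {w:.3f}")
```

With the small rate, w creeps toward 3 but does not reach it in 50 steps; with the large rate, the updates overshoot and diverge.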

Regularization: Used to help the model reach the optimum without overfitting. Early Stopping halts training at the point where the variance no longer improves much and the bias is near its optimum. Dropout adds noise to the model by preventing a randomly chosen subset of neurons (usually fewer than 50%) from updating. A weight penalty can also be added to the loss, as in Ridge and Lasso: the L1 penalty uses the absolute values of the weights, while the L2 penalty uses the squared weights.
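A minimal Keras sketch combining dropout, L1/L2 weight penalties, and early stopping might look like the following; the layer sizes, penalty strengths, and patience value are arbitrary illustrative choices, and X_train/y_train are assumed to exist:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-3)),  # L2 / Ridge-style penalty (squared weights)
    layers.Dropout(0.3),                                     # randomly silences 30% of units during training
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l1(1e-4)),  # L1 / Lasso-style penalty (absolute weights)
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Early stopping watches the validation loss and restores the best weights.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                           restore_best_weights=True)
# model.fit(X_train, y_train, validation_split=0.2, epochs=100, callbacks=[early_stop])
```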

Normalization: During updates, the weights and activations sometimes become very close to zero, and sometimes become very large and dominate. Normalization is used to balance this inconsistency. The most widely used technique is Batch Normalization; it is generally applied as the data enters the model or a layer, which keeps the values on a balanced scale.
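A short Keras sketch placing BatchNormalization between a Dense layer and its activation (one common pattern, not the only one) could be:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64),
    layers.BatchNormalization(),   # re-centers and re-scales the layer outputs per mini-batch
    layers.Activation("relu"),
    layers.Dense(1),
])
model.summary()
```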

In this article, we talked about deep learning in general terms. Finally, some neural network types, grouped under three main headings, are shown in Figure 8. As a starting point, you can look at their areas of use and research the architectures that fit your needs.

Fig.8. main types of deep learning [created by the author in Canva]

References

[1] Kelleher, J. D. (2019). Deep Learning. MIT Press.

[2] Gonzalez, R. (2013). “If your brain were a computer, how much storage space would it have?”

[3] Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow. O'Reilly Media, Inc., Sebastopol, CA.
