Deep Learning for Developers

Tomas Petras Rupšys
Published in Zedge Engineering
7 min read · Dec 17, 2020
Photo by Jason Leung on Unsplash

So you have been working as a Software Engineer for many years: you know different frameworks, languages and libraries, and you know the best practices and use them.

But then, in the background, you can hear some buzz around data science, artificial intelligence, machine learning and deep learning, and your inner evil starts tickling the impostor syndrome that makes you feel behind on this topic.

In this blog post, I will try to ensure you understand what Deep Learning is and what you should know about it from a developer’s point of view, i.e. we will try to avoid going deep into maths.

Let’s go!

Let’s start with a business requirement: we are going to create an API that can recognise whether there is a flower in an image (see the picture at the top of this blog post, where a robot is looking at a Lego flower).

What do we need to do, to implement it using Deep Learning?

How to represent an image as a matrix?

So imagine a square image, which is 64px x 64px. Every pixel has an RGB (red/green/blue) value, where (0, 0, 0) stands for black and (255, 255, 255) for white.

So if you want to represent an image as a matrix, it is simply a three-dimensional matrix with dimensions 64 x 64 x 3.
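As a minimal sketch of that idea, here is how such an image could be held in a NumPy array. The variable name `image` and the choice of `uint8` as the data type are assumptions made for illustration; they are not part of the original example.

```python
import numpy as np

# A 64 x 64 RGB image: height x width x colour channels.
# uint8 covers the 0..255 range used for each channel.
image = np.zeros((64, 64, 3), dtype=np.uint8)  # every pixel starts black

# Paint the top-left pixel white and its right-hand neighbour pure red.
image[0, 0] = (255, 255, 255)
image[0, 1] = (255, 0, 0)

print(image.shape)  # (64, 64, 3)
```

Reading `image[0, 1]` back gives the three channel values of that one pixel.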

This should be a cold shower for many devs, who, just like me, hate adding matrices and math stuff into code. But Deep Learning requires that.

Matrices used: images (data) and labels

We will have two kinds of sets of images, one set for training and one for testing (to see the accuracy of the trained model).

So if we have a data set of 100 images, we can put it into a single four-dimensional matrix: 100 x 64 x 64 x 3 (100 images, 64px x 64px, 3 for RGB).

Then, each image should have a label, which says 0 (false) or 1 (true) to indicate whether you, as a human, see a flower in that image. This labelling happens before training: you need to give the software some examples so it can learn what a flower is.

What is Logistic (a.k.a. Sigmoid) Function?

One of the cryptic terms you’ll hear when looking into Deep Learning is the Sigmoid function, also called the Logistic function. It is built on Euler’s number and maps any number to a value between 0 and 1.

In our case, the model returns a value between 0 and 1 that stands for the probability of “how likely is it that there is a flower”. Then we round that value (e.g. 0.7 becomes 1) to a binary one.

There is no need to go into internals of Sigmoid, since it’s very easy to define in code or use as an abstraction.

https://en.wikipedia.org/wiki/Sigmoid_function
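To make the rounding step concrete, here is a small sketch. The input scores are made up for illustration, and using 0.5 as the cut-off between “no flower” and “flower” is an assumption that mirrors the rounding described above.

```python
import numpy as np

def sigmoid(z):
    # 1 / (1 + e^-z): squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw scores for four images
scores = np.array([-4.0, -0.5, 0.85, 4.0])
probabilities = sigmoid(scores)

# Anything at or above 0.5 counts as "there is a flower"
is_flower = (probabilities >= 0.5).astype(int)

print(probabilities)
print(is_flower)  # [0 0 1 1]
```

The third score, 0.85, comes out at roughly 0.7, which is the “0.7 becomes 1” case.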

What is Jupyter?

Jupyter is like an IDE for Data Scientists. If JupyterHub is used, the same environment is served to a whole team from one multi-user server.

Jupyter is a web-based tool where you can create “notebooks” (*.ipynb extension). These consist of Python code mixed with comments, plots, images, tables, etc.

Basically, you read a Jupyter notebook like an article and run the code block by block:

From www.dataschool.io

What is NumPy and TensorFlow?

NumPy is a Python library which abstracts many scientific computations. For Deep Learning, we are mostly interested in operations with matrices (multiplication, transposing, reshaping). If we had to do this in plain Python, it would neither be efficient hardware-wise, nor would you enjoy writing that code.

TensorFlow is an ecosystem of Machine Learning libraries for different languages (Python, Java, JavaScript, etc.). Deep Learning is a subset of Machine Learning, so TensorFlow is a natural fit for it. A great thing about TensorFlow is that it comes with a lot of predefined data sets and trained models, so you may use those instead of having to train a model yourself.
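A quick sketch of the three matrix operations mentioned above, using two small made-up matrices (`a` and `b` are illustrative names only):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)
b = np.array([[1, 0],
              [0, 1],
              [2, 2]])      # shape (3, 2)

product = np.dot(a, b)      # matrix multiplication, shape (2, 2)
transposed = a.T            # rows become columns, shape (3, 2)
reshaped = a.reshape(3, 2)  # same six values in a new layout

print(product)
print(transposed)
print(reshaped)
```

Note that transposing and reshaping to the same shape are not the same thing: `a.T` starts with the row `[1, 4]`, while `a.reshape(3, 2)` starts with `[1, 2]`.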

Before looking into code examples:

  • If you want to try running Python notebooks, you may use Google’s Colab to have an environment set up quickly
  • Training data set: a set of images used to train your model, i.e. let’s say you have 1000 images and for each of them you assign a binary value saying whether there is a flower in it or not
  • Testing data set: a set of images different from the training data set, again with binary values assigned. This data set is used like a fitness function in software engineering, to tell how accurate your model is. The images have to be new to the model for the same reason as with the human mind: if you learn maths by doing maths tests, your exam result will likely be better if you get a test identical to one you already did rather than a completely new one, but only the new one shows what you have really learnt.
  • X and Y in data sets: x corresponds to a matrix of images, whereas y represents the binary (1 or 0) values that correspond to “yes” or “no”
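The split into training and testing data can be sketched as follows. Everything here is made up for illustration: the images are random noise, the labels are random too, and the 80/20 ratio is an assumption rather than something this post prescribes.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the split is reproducible

# 100 stand-in "images" with random pixels and random 0/1 labels
images = rng.integers(0, 256, size=(100, 64, 64, 3), dtype=np.uint8)
labels = rng.integers(0, 2, size=100)

# Shuffle the indices, then cut them 80/20
order = rng.permutation(len(images))
split_at = int(0.8 * len(images))
train_idx, test_idx = order[:split_at], order[split_at:]

x_train, y_train = images[train_idx], labels[train_idx]
x_test, y_test = images[test_idx], labels[test_idx]

print(x_train.shape, y_train.shape)  # (80, 64, 64, 3) (80,)
print(x_test.shape, y_test.shape)    # (20, 64, 64, 3) (20,)
```

Because the indices come from one shuffled list, no image can end up in both sets.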

Some code:

Finally, some code that you may try out in Jupyter. To keep the example close to the real world, I will avoid naming things x, y, z and similar, just so you understand what is what.

Let’s start with the simplest, we will use NumPy:

import numpy as np

Next, let’s introduce data. For the simplicity of this example, all pixels of all training and testing images will be zero. In the real world, you’d need to import images of the same size (e.g. 64px x 64px), where each pixel has 3 values (red/green/blue, i.e. RGB), and convert them to matrices:

# Constants
training_images = np.zeros((10, 64, 64, 3)) # 10 images, 64px x 64px, 3 RGB values
testing_images = np.zeros((2, 64, 64, 3)) # 2 images, 64px x 64px, 3 RGB values
training_images_labels = np.zeros((1, 10)) # labels for 10 training images
testing_images_labels = np.zeros((1, 2)) # labels for 2 testing images

The model we are building is a Logistic Regression, and the maths behind it works on two-dimensional matrices rather than four-dimensional ones. So we need to come back to a humanly understandable, two-dimensional model, i.e. flatten the data into a table where each column is one image and the rows hold all of its pixel values stacked (64 x 64 x 3 = 12288 rows):


def flatten_images(images_matrix):
    return images_matrix.reshape(images_matrix.shape[0], -1).T

flattened_training_images = flatten_images(training_images)
flattened_testing_images = flatten_images(testing_images)
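To see what flattening does to the shape, here is a sketch on a deliberately tiny batch: two made-up 2px x 2px “images” whose values simply count upwards, so it is easy to follow where each value lands.

```python
import numpy as np

def flatten_images(images_matrix):
    # One row per image first, then transpose so each image becomes a column
    return images_matrix.reshape(images_matrix.shape[0], -1).T

# 2 images, 2px x 2px, 3 values per pixel, filled with 0..23
batch = np.arange(2 * 2 * 2 * 3).reshape(2, 2, 2, 3)
flat = flatten_images(batch)

print(batch.shape)  # (2, 2, 2, 3)
print(flat.shape)   # (12, 2)
print(flat[:, 0])   # all 12 values of the first image
```

Applied to the 10 training images of 64px x 64px, the same function gives a matrix of shape (12288, 10).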

Let’s define the sigmoid function (yes, TensorFlow has an abstraction for it, but just for the sake of understanding it):

def get_sigmoid(z):
    return 1.0 / (1 + np.exp(-z))

The activation value is the model’s prediction for each image: the sigmoid of the weighted sum of its pixel values plus the bias. The function accepts the flattened data (images), the weights and the bias (more on those later):

def get_activation_value(flattened_data, weights, bias):
    return get_sigmoid(np.dot(weights.T, flattened_data) + bias)
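A small worked example may help here. The data below has only 2 values per “image” and 3 images, and the weights and bias are hand-picked, hypothetical numbers rather than trained ones.

```python
import numpy as np

def get_sigmoid(z):
    return 1.0 / (1 + np.exp(-z))

def get_activation_value(flattened_data, weights, bias):
    return get_sigmoid(np.dot(weights.T, flattened_data) + bias)

# Rows are pixel values, columns are images
flattened_data = np.array([[0.0, 1.0, 1.0],
                           [0.0, 0.0, 1.0]])
weights = np.array([[2.0], [2.0]])  # hypothetical: one weight per pixel value
bias = -1.0                         # hypothetical offset

activations = get_activation_value(flattened_data, weights, bias)
print(activations.shape)  # (1, 3)
print(activations)
```

The weighted sums plus bias are -1, 1 and 3, so the three activations come out at roughly 0.27, 0.73 and 0.95: one prediction per image.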

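Calculating the weights and the bias. There is one weight per pixel value, saying how strongly that value pushes the prediction towards “flower”, and the bias is a single offset added on top. Both start at zero and are adjusted step by step to shrink the gap between the predictions and the labels: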

def get_weights_and_bias(flattened_training_data, training_data_labels):
    values_per_data_entry = flattened_training_data.shape[0] # How many pixel values an image has
    amount_of_training_data = flattened_training_data.shape[1] # How many images we have
    # A column vector, so the weights keep the same shape after every update
    weights = np.zeros((values_per_data_entry, 1))
    bias = 0
    iterations = 1000 # You can set almost any value and optimise it
    learning_rate = 0.5
    for index in range(iterations):
        activation_values = get_activation_value(flattened_training_data, weights, bias)
        # How far each prediction is from its label
        errors = np.subtract(activation_values, training_data_labels)
        weights_derivative = np.divide(np.dot(flattened_training_data, errors.T), amount_of_training_data)
        weights = weights - learning_rate * weights_derivative
        bias_derivative = np.divide(np.sum(errors), amount_of_training_data)
        bias = bias - learning_rate * bias_derivative
    return weights, bias
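Since the all-zero images used in this post give the loop nothing to learn from, here is a self-contained sketch of the same update rule on toy data. The function name `fit_logistic_regression` and the four two-value “images” are hypothetical, chosen so that low values mean “no flower” and high values mean “flower”.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(features, labels, iterations=1000, learning_rate=0.5):
    # features: (values per entry, entries); labels: (1, entries)
    entries = features.shape[1]
    weights = np.zeros((features.shape[0], 1))
    bias = 0.0
    for _ in range(iterations):
        predictions = sigmoid(np.dot(weights.T, features) + bias)
        errors = predictions - labels
        # Move weights and bias a small step against the average error
        weights = weights - learning_rate * np.dot(features, errors.T) / entries
        bias = bias - learning_rate * np.sum(errors) / entries
    return weights, bias

# Four toy entries as columns: two "dark" ones (label 0), two "bright" ones (label 1)
features = np.array([[0.1, 0.2, 0.8, 0.9],
                     [0.2, 0.1, 0.9, 0.8]])
labels = np.array([[0, 0, 1, 1]])

weights, bias = fit_logistic_regression(features, labels)
predictions = np.around(sigmoid(np.dot(weights.T, features) + bias))
print(predictions)
```

After training, the rounded predictions line up with the labels, because the two groups are easy to tell apart; real images are far less forgiving.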

And here’s the final step, where you actually train the model and put everything together:

def train_model(flattened_training_data, training_data_labels, flattened_testing_data, testing_data_labels):
    # Getting weights and bias
    weights, bias = get_weights_and_bias(flattened_training_data, training_data_labels)
    # Calculating predictions for each entry in the data set
    training_data_predictions = get_activation_value(flattened_training_data, weights, bias)
    testing_data_predictions = get_activation_value(flattened_testing_data, weights, bias)
    # We only care about binary predictions, i.e. "it is a flower" or "it is not", so rounding
    training_data_predictions = np.around(training_data_predictions, 0)
    testing_data_predictions = np.around(testing_data_predictions, 0)
    # That's it! Just for the sake of testing you may now check accuracy of your model:
    accuracy_of_this_model = 100 - np.mean(np.abs(testing_data_predictions - testing_data_labels)) * 100
    print('{}%'.format(accuracy_of_this_model))

To run the model, you’d need to pass in the values defined earlier:

train_model(flattened_training_images, training_images_labels, flattened_testing_images, testing_images_labels)

Running it takes a moment and then prints 100.0%, because all values in our data matrices were zeros and not actual RGB colour values. I.e. this model was doomed to succeed.
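To see how the accuracy formula behaves when the model does get something wrong, here is a sketch with made-up predictions and labels for four images:

```python
import numpy as np

# Hypothetical rounded predictions and human-assigned labels
predictions = np.array([[1, 0, 1, 1]])
labels = np.array([[1, 0, 0, 1]])

# Each mismatch contributes 1 to the absolute difference,
# so the mean is the share of wrong answers
accuracy = 100 - np.mean(np.abs(predictions - labels)) * 100
print('{}%'.format(accuracy))  # 75.0%
```

One wrong answer out of four gives an error share of 0.25, hence 75.0%.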

What wasn’t covered in this blog post

Many things! A classical explanation would cover what Neural Networks are and why they look and act somewhat like the human brain, and would go deeper into calculus, so that you could write Deep Learning software without using TensorFlow. You could even do it without NumPy, but it wouldn’t be as efficient because of the heavy operations with matrices.

If you did enjoy this brief and simplified intro to Deep Learning and want to know more, I do recommend digging deeper into it at https://www.deeplearning.ai/. They have a series of courses that cover all you need to know to start applying it at your work.

Summary

It might be extra difficult to truly understand how deep learning works, but you don’t necessarily need to know all of that just to get started. As we could see, flower recognition in images might be relatively easy.

In reality, if such a task came from the business, we’d likely use the Google Cloud Vision API (which we use extensively at Zedge for wallpapers) or some other service to do the job. But don’t forget that Deep Learning can be applied to more things than just images.
