Machine Learning is for everyone, you just need to know where to start

Published in DNA Technology · 8 min read · Aug 5, 2020

Introduction

This article is for everyone wondering where to start their adventure with machine learning.

In this blog post you will not find much technical detail about machine learning. Instead, I will share with you what I studied to build a PoC Python application that takes advantage of ML.

If you are interested in what I did to learn how to build a PoC of a recommendation engine, please continue reading. I am pretty sure that even with no experience in machine learning you will be able to go through all the links and extract useful information.

Before we start, let me explain my motivation to investigate machine learning. I was tempted and encouraged by an approach to solving problems completely different from what I already knew. Until now, I was usually given input data and had to develop the logic that produced answers. With machine learning, I let the computer figure out the logic; my role is to provide example inputs and the corresponding answers. It sounds amazing, and it actually is.

The first steps in Machine Learning

The first step I can recommend is to watch the four videos of the Neural networks series on YouTube. They take 52 minutes in total and will give you a fundamental understanding of the concepts behind machine learning. As it takes time to grasp machine learning from a mathematical point of view, the recommended videos focus on visualising the discussed ideas, which makes them super accessible.

Equipped with this basic information, I started the course Introduction to TensorFlow for Artificial Intelligence, Machine Learning, and Deep Learning on Coursera. It is part of the four-course specialisation “TensorFlow in Practice”. During this course you will get to know TensorFlow, the most popular ML framework. I really like its hands-on approach: I had a chance to classify handwritten digits from the popular MNIST collection and to train a model to distinguish photos of humans from photos of horses.
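To give a flavour of what the exercises look like, here is a minimal sketch of an MNIST-style classifier in Keras. To keep it self-contained it trains on random data shaped like MNIST rather than the real dataset, and the layer sizes are my own choice, not the course's exact architecture:

```python
import numpy as np
from tensorflow import keras

# Random data standing in for MNIST (28x28 grayscale digits, labels 0-9),
# so the sketch runs without downloading the real dataset.
x_train = np.random.rand(256, 28, 28).astype("float32")
y_train = np.random.randint(0, 10, size=256)

# A small fully connected classifier, similar in spirit to the course exercises.
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),  # one probability per digit
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, verbose=0)
preds = model.predict(x_train[:5], verbose=0)  # one 10-way distribution per image
```

Swapping the random arrays for `keras.datasets.mnist.load_data()` gives you the real exercise.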

TensorFlow, and especially its Keras API, is an excellent framework to start your adventure with ML. It has great documentation, is easy to use, and is powerful at the same time.

Machine learning can be deployed in many areas, including voice and text processing, image recognition, and online data processing. I decided to look more closely at its application in the field of recommendation engines, mostly because such a product can be immediately used in a few startups we are working on.

How to approach a Recommendation System

In general we can distinguish two types of recommendation systems:

  • collaborative filtering — predicts user choices based on historical choices of all users
  • content-based filtering — predicts user choices based on features of items previously selected by a given user

To get decent theoretical knowledge about recommendation systems, I can point you to week 9 of the course Recommendation Engines — Machine Learning Course, which contains six videos fully dedicated to recommendation systems.

Eventually, I decided to follow the lecture guidelines from the Deep Learning course, part of the Master of Data Science at Paris Saclay University, published on GitHub. Lab 3, “Embeddings and Recommender Systems”, is a wonderful source of theoretical and practical knowledge. If you are interested in understanding all the equations from the slides but are missing the mathematical background, I encourage you to have a look at these free materials from The University of Texas at Austin: Advanced Linear Algebra: Foundations to Frontiers (Advanced LAFF). I watched only a few lectures, about linear transformations and matrix normalisation, but found them really useful.

In my case, I wanted to build a system that recommends offers to users. The first approach I used was collaborative filtering, with a model based on vector embeddings and a dot product. In order to train such a model I needed a user rating matrix. Of course, no such matrix had ever been built for this system. Fortunately, all the data needed to prepare it was present in the application logs: every time a user opens an offer, the frontend makes a request to the backend, and the corresponding log entry contains the identifiers of both the user and the offer. By exploring the logs, it was possible to build a rating matrix using simple rules:

  • if a user has opened an offer only once, the rating is 3
  • if a user has opened an offer more than once, the rating is 4
  • if a user has applied to an offer, the rating is 5
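These rules can be sketched in plain Python. The log parsing itself is omitted, and the event lists below are illustrative, not the actual log format; note that applying always overrides any view-based rating, so each (user, offer) pair gets exactly one rating:

```python
from collections import Counter

def build_ratings(view_events, applications):
    """Derive one rating per (user_id, offer_id) pair.

    view_events: list of (user_id, offer_id) tuples, one per opened offer
    applications: set of (user_id, offer_id) pairs the user applied to
    """
    views = Counter(view_events)
    ratings = {}
    for pair, count in views.items():
        ratings[pair] = 3 if count == 1 else 4  # opened once vs. more than once
    for pair in applications:
        ratings[pair] = 5                       # applying overrides any view rating
    return ratings

ratings = build_ratings(
    view_events=[("u1", "o1"), ("u1", "o1"), ("u2", "o1"), ("u2", "o2")],
    applications={("u2", "o2")},
)
# ("u1", "o1") -> 4, ("u2", "o1") -> 3, ("u2", "o2") -> 5
```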

To get familiar with the concept of embeddings, please check this video:
Vectoring Words (Word Embeddings) — Computerphile.

Of course, the video is about text processing, while in my model the embedding layers were used only to convert IDs to vectors. Nevertheless, the general principle is the same.

The model and its performance are visualized in the graphs shown in Fig. 1.1 and Fig. 1.2.

Fig. 1.1 Dot Product based model loss during training and evaluation
Fig. 1.2 Dot Product based model plot

The solid line illustrates the loss during training and the dashed line the loss during evaluation. As one can see, the loss is approximately 1.5, which is not a satisfying result.
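The dot-product model described above can be sketched in Keras roughly as follows; the vocabulary and embedding sizes are illustrative, not the actual values used in the project:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_users, n_offers, embedding_size = 1000, 500, 32  # illustrative sizes

user_id = keras.Input(shape=(1,), name="user_id")
offer_id = keras.Input(shape=(1,), name="offer_id")

# Each ID is mapped to a dense vector; the dot product of the two
# vectors is the predicted rating.
user_vec = layers.Flatten()(layers.Embedding(n_users, embedding_size)(user_id))
offer_vec = layers.Flatten()(layers.Embedding(n_offers, embedding_size)(offer_id))
rating = layers.Dot(axes=1)([user_vec, offer_vec])

model = keras.Model([user_id, offer_id], rating)
model.compile(optimizer="adam", loss="mse")
pred = model.predict([np.array([0]), np.array([0])], verbose=0)  # one rating
```

Training then amounts to fitting the model on (user ID, offer ID) pairs against the ratings from the matrix.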

The next step was to build a model based on a neural network consisting of fully connected layers. There were four dense layers, with the number of nodes going from 512 down to 1. The visualization of the model is shown in Fig. 2.1 and Fig. 2.2. In order to combine a user vector and an offer vector, simple concatenation was used.

Fig. 2.1 Neural Network based model loss during training and evaluation
Fig. 2.2 Neural Network based model plot

The final value of the loss is close to 0.5, which is a way better result than in the case of the model based on the dot product.
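A rough Keras sketch of this concatenation-based model follows. The article only states that the dense layers narrow from 512 nodes to 1, so the intermediate layer sizes here, like the embedding sizes, are my own guesses:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_users, n_offers, embedding_size = 1000, 500, 32  # illustrative sizes

user_id = keras.Input(shape=(1,), name="user_id")
offer_id = keras.Input(shape=(1,), name="offer_id")
user_vec = layers.Flatten()(layers.Embedding(n_users, embedding_size)(user_id))
offer_vec = layers.Flatten()(layers.Embedding(n_offers, embedding_size)(offer_id))

# Concatenate the two embeddings and pass them through four dense
# layers, narrowing from 512 nodes down to a single rating output.
x = layers.Concatenate()([user_vec, offer_vec])
x = layers.Dense(512, activation="relu")(x)
x = layers.Dense(128, activation="relu")(x)
x = layers.Dense(32, activation="relu")(x)
rating = layers.Dense(1)(x)

model = keras.Model([user_id, offer_id], rating)
model.compile(optimizer="adam", loss="mse")
pred = model.predict([np.array([0]), np.array([0])], verbose=0)  # one rating
```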

Both the dot-product model and the model based on fully connected layers share one problem common to all collaborative filtering approaches: if you have a fresh system with no historical data, you will not be able to recommend anything. This is the cold-start problem.

I tried to fight this problem by involving user and offer metadata in training the model. Such hybrid models combine collaborative and content-based filtering. There are many ways to extract features from data and employ them in a model; you can find a lot of useful information in the scikit-learn documentation (Feature extraction — Scikit) or by watching the Feature Engineering videos.

In the case of our data, users and offers were labelled with categories. To convert the categories into a form understood by machine learning, I used MultiLabelBinarizer from the already mentioned scikit-learn. In a nutshell, it turns categories into vectors in which the positions of all valid categories for a given user or offer are set to 1. Additionally, I turned offer descriptions into vectors using TfidfVectorizer.
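Both transformations come straight from scikit-learn; here is a minimal sketch with made-up categories and descriptions (the real labels and texts come from the application data):

```python
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative category labels for three offers.
offer_categories = [["it", "remote"], ["finance"], ["it", "finance"]]
mlb = MultiLabelBinarizer()
category_vectors = mlb.fit_transform(offer_categories)
# Each row has a 1 in the column of every category the offer belongs to;
# columns are the sorted set of all categories seen during fit.

# Illustrative offer descriptions turned into TF-IDF vectors.
descriptions = ["python developer wanted", "junior accountant",
                "python scripts for finance"]
tfidf = TfidfVectorizer()
description_vectors = tfidf.fit_transform(descriptions)  # sparse matrix, one row per offer
```

These vectors are then concatenated with the ID embeddings to form the input of the hybrid model.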

The visualization of the model and its performance are shown in Fig 3.1 and Fig. 3.2.

Fig. 3.1 Hybrid model loss during training and evaluation
Fig. 3.2 Hybrid model plot

Its performance is comparable to that of the model based on fully connected layers. However, in the case of the hybrid model the input vector contains user and offer metadata and can therefore be used for completely new items in the system.

Conclusions

At the end of this article I would like to draw some conclusions and sum up my experiences.

The first thing I need to emphasise is that the quality of input data is crucial.

While building the user rating matrix I made a mistake: the same pair of user ID and offer ID could have two different ratings assigned, for example when a user first opened an offer and then applied to it. This error ruined the model performance completely; the optimizer was not able to tune the layer weights properly.

Another significant factor is the architecture of the model.

As one could see, each model I tried showed different performance. Moreover, tuning model parameters such as the size of the embeddings, the type of activation function, or the number of nodes in each layer has an impact on the final performance. Of course, finding optimal parameter values is tricky and time-consuming. Fortunately, there are tools like Keras Tuner or Optuna that automate the optimization process.

TensorFlow version 2, and the Keras API in particular, simplifies building models amazingly. I had a chance to work a bit with TensorFlow version 1, and version 2 hugely reduces the prerequisite knowledge needed to use the framework.

Efficient input data preparation is key to building a successful machine learning application.

I had around 20 GB of logs to parse and then tens of thousands of objects to fetch from MongoDB. It took around 20 minutes to prepare all the input data for model training; therefore, any change in the format of the input data was time-consuming.

The machine learning topic is really hot, and there are plenty of resources all over the Internet to study, coming from reliable sources and appropriate for all levels of experience.

I have a feeling that this research was executed on a snapshot of the input data, while in reality it should be a living project, with continuous updates of the input data and recalculation of the model. That is why, in the future, I would like to investigate machine learning pipelines, learn how to organise data lakes, and process data in parallel using Apache Spark. I am also wondering what ready-to-use building blocks are already available from the biggest cloud providers.

// Wojtek

Links from the article

  1. Neural networks
  2. Introduction to TensorFlow for Artificial Intelligence, Machine Learning, and Deep Learning
  3. Recommendation Engines — Machine Learning Course 9th week
  4. Deep Learning course
  5. Advanced Linear Algebra: Foundations to Frontiers
  6. Advanced LAFF
  7. Vectoring Words (Word Embeddings) — Computerphile
  8. Feature extraction — Scikit
  9. Feature Engineering
