Marketing Automation: Recommendation Systems

In an era of information noise, with an excess of content across every possible communication channel, an accurate recommendation is one of the ways to increase the loyalty of our customers and reduce the risk of their churn.

Andrii Shchur
Geek Culture
11 min read · Nov 8, 2021


Photo by Bernard Hermant on Unsplash

Recommender systems are an essential feature in our digital world. Recommendation engines are everywhere these days. Some of the biggest brands we engage with every day are built around one, including Netflix, Amazon, Google, and Facebook. More than a third of purchases on Amazon or Alibaba come from product recommendations. So, what is a recommendation engine, and how does it work?

What is a Recommendation engine?

A recommendation engine is a set of data science algorithms that helps find the most relevant items to a particular user or customer. Having processed a large amount of data, these algorithms can find patterns that further help to make recommendations.

This is a good definition, but in my opinion a really good recommendation system should also take into account time (when should the recommendation be sent?) and the channel of communication (which channel works best for a particular customer?).

So, the best recommendation engine answers three questions:

  • What should be recommended to the user?
  • When should it be recommended?
  • How should it be delivered?

How does a Recommendation Engine Work?

In the image below you can see that the Recommendation Engine has a cyclical workflow. After the model is created, we collect feedback on its results and then retrain the model or replace it entirely.

Recommendation Engine workflow (Image by Author)

How to select a Recommendation Engine according to your business needs and technical capability?

I would like to start with one of the simplest, yet in my opinion the most long-standing and commonly used, approaches. It is not so much a data science algorithm as a data mining technique. Its name is Association rule learning.

Association rule mining

Association rule mining finds interesting associations and relationships among large sets of data items, or, in the case of retail and e-commerce, among transactions. These rules show how frequently a set of goods or services occurs in a transaction. A typical example is Market Basket Analysis.

Association rule mining is based on three main measures:

  • Support indicates how frequently the goods or services appear in the transactions (cashier’s checks).
  • Confidence indicates how often the rule is found to be true.
  • Lift shows how much more likely item Y is to be purchased when item X is purchased.
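The three measures can be computed directly. Here is a minimal sketch in plain Python on a made-up set of transactions, for the hypothetical rule {bread} → {butter}:

```python
# Toy transactions; items and the rule {bread} -> {butter} are made up.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk", "eggs"},
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / n

sup_x = support({"bread"})             # how often X appears: 3/4
sup_y = support({"butter"})            # how often Y appears: 2/4
sup_xy = support({"bread", "butter"})  # how often they appear together: 2/4

confidence = sup_xy / sup_x            # P(Y | X) = 2/3
lift = confidence / sup_y              # > 1 means X makes Y more likely

print(sup_xy, confidence, lift)
```

A lift above 1 (here 4/3) suggests that buying bread increases the chance of buying butter beyond its baseline popularity.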

Association rule analysis is usually based on transactional data; all we need is the following table:

Data sample (Image by Author)

We need to preprocess this data into the following format:

Preprocess sample (Image by Author)

The next step is to encode our pivot table; for this we can use, for example, the following code:
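A minimal sketch of that encoding step with pandas; the `transaction_id` and `product` column names are assumptions for the example:

```python
import pandas as pd

# Hypothetical raw transactional data: one row per (transaction, product) pair.
df = pd.DataFrame({
    "transaction_id": [1, 1, 2, 2, 2, 3],
    "product": ["bread", "butter", "bread", "milk", "eggs", "butter"],
})

# Pivot into a basket matrix: one row per transaction, one column per product.
basket = pd.crosstab(df["transaction_id"], df["product"])

# Encode counts as booleans (bought / not bought), the format most
# association-rule implementations expect as input.
basket = basket.astype(bool)
print(basket)
```

The resulting one-hot basket matrix is what the association-rule step consumes.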

This code can only be applied to a small sample set. There are also many end-to-end implementations of this algorithm, for example the apriori and association_rules functions in the mlxtend library.

These solutions are good, but in my opinion, for real cases, when we have a lot of data, we should use a Spark solution. The FP-growth algorithm has a good implementation in PySpark.

What should we get as a result? Here is a sample of the result table:

Association rule result (Image by Author)

We can sort this result by any of the measures or dimensions, depending on our business task. For example, the products that most frequently appear together in a receipt are often placed next to each other on store shelves, or offered as accompanying products in an online store’s recommendations. The most common combinations sit at the top of the table, but at the bottom or in the middle you can find quite interesting combinations that may not have been encountered or noticed before.

My next approach is content-based filtering.

Another common approach when designing recommender systems is content-based filtering.

Content-based filtering methods are based on the item’s characteristics and a profile of the user’s preferences. This method works well when we know data about an item (name, location, description, etc.), but not about the user. In other words, these algorithms try to recommend items that are similar to those the user bought, liked, or examined in the past, or is interacting with right now on the website.

The main problem with this algorithm is the “cold start”: we cannot recommend anything to a customer until he or she takes some action (view/buy/rank/comment) on goods or services. We will address this problem with the algorithms that follow.

This algorithm can be applied in many industries, but in my opinion it is most popular in fashion and music recommendations. In the first case, our data could be images of clothes or shoes; in the second, an audio track.

In any case, we should first preprocess our data and then build a similarity matrix based on some metric. Then we can sort it and find the most similar items.

Workflow (Image by Author)

For any of these tasks, there are now many options for implementation using the latest advances in deep learning.

For image preprocessing, we could use any of the pre-trained NN architectures, then apply transfer learning and fine-tuning to fit the network to our dataset. Depending on your system capabilities, you can choose among the available architectures. The network should solve the following tasks:

  1. Object detection or semantic segmentation, to extract only the clothes or shoes.
  2. Computing an embedding from the extracted image.

Then we save all the embeddings so we can later calculate distances between them, or between them and a new item. To solve this task we could use Faiss, a library for efficient similarity search.

For text preprocessing, we could use any implementation of the BERT approach.

For audio processing, we could use simple signal processing built with plain Python, without any deep learning, or use CNN/LSTM architectures to build an audio track embedding.

This algorithm benefits both sides. Buyers can spend less time searching through pages of different products in a digital marketplace. Sellers can better understand customer preferences, provide a more personalized buyer experience, increase sales, and build brand loyalty.

Collaborative filtering

The next approach is maybe one of the most famous and the most commonly used ones. This approach made a revolution in recommendation systems. Collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). The underlying assumption of the collaborative filtering approach is that if person A has the same opinion as person B on an issue, A is more likely to have B’s opinion on a different issue than that of a randomly chosen person. — Wiki

An example of collaborative filtering based on a rating system

There are several types of Collaborative filtering:

  • Memory-based

This approach uses users’ rating data to compute the similarity between users or items. Typical examples of this approach are neighborhood-based CF and item-based/user-based top-N recommendations.

Item-Based Collaborative Filtering — new items are recommended to users based on their similarity with the items that the user has rated highly in the past.

User-Based Collaborative Filtering — this method aims at finding similar users, and recommendations are made to a user based on what a similar user has liked. The similarity between users is calculated using Cosine Similarity, Pearson Correlation, or any other similarity metric, depending on your data.
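A minimal sketch of user-based CF with cosine similarity, using only the standard library; the users, items, and ratings are made up for the example:

```python
import math

# Toy user-item ratings; absence of a key means "not rated".
ratings = {
    "alice": {"matrix": 5, "titanic": 1, "inception": 4},
    "bob":   {"matrix": 4, "inception": 5},
    "carol": {"titanic": 5, "notebook": 4},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors (dicts)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

def recommend(user, k=1):
    """Recommend items the most similar user liked that `user` has not rated."""
    others = [(cosine(ratings[user], ratings[o]), o)
              for o in ratings if o != user]
    _, nearest = max(others)
    unseen = set(ratings[nearest]) - set(ratings[user])
    return sorted(unseen, key=lambda i: -ratings[nearest][i])[:k]

print(recommend("bob"))
```

Bob’s nearest neighbor is Alice (they agree on "matrix" and "inception"), so Bob gets "titanic" — the item Alice rated that Bob has not seen.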

The main advantages of this approach are the explainability of its results, which is an important aspect of any data science algorithm nowadays, and the fact that the model is easy to create and easy to retrain with new data.

It also has some disadvantages: performance degrades as the data grows large and sparse, and adding new items requires re-inserting all the elements into the structure.

  • Model-based

In this approach, models are developed using various data mining and machine learning algorithms to predict users’ ratings of unrated items. The most common family of model-based algorithms is dimensionality reduction. Methods like SVD and PCA compress the user-item matrix into a low-dimensional representation in terms of latent factors. The main advantage of this approach is that instead of a high-dimensional matrix containing an abundant number of missing values, we deal with a much smaller matrix in a lower-dimensional space. The dimensionality reduction can be applied to either the user-based or the item-based neighborhood algorithms described above.
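The latent-factor idea can be sketched with a plain NumPy truncated SVD. The rating matrix below is made up, and unrated cells are naively filled with 0; production systems instead factorize only the observed entries (e.g. with ALS):

```python
import numpy as np

# Made-up user-item ratings; 0 stands in for "not rated" (a naive fill).
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Truncated SVD: keep only k latent factors.
k = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # low-rank reconstruction

# R_hat holds a predicted score for every cell, including unrated ones,
# e.g. user 0's predicted interest in item 2:
print(round(R_hat[0, 2], 2))
```

The rank-2 reconstruction captures the two taste clusters in the matrix (users 0–1 vs. users 2–3), so predicted scores follow each user’s cluster rather than the raw zeros.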

Here is some hierarchy of model-based approaches:

Hierarchy of model-based approach (Image by Author)

A good implementation of collaborative filtering can be found in PySpark. Here is an example (based on the PySpark documentation):

  • Deep-Learning

And of course, it would not be complete without deep learning. In recent years, many neural and deep-learning techniques have been proposed. Some generalize traditional matrix factorization algorithms via a non-linear neural architecture, while others leverage new model types such as Variational Autoencoders.

Here is an example with FastAI:

We could also use a more complex deep learning approach: for example, mix all the available data we have and build a large composite neural network:

Example of NN architecture (Image by Author)

And here is a Keras implementation with user, item, and item-type embeddings:
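A minimal sketch of such a model in Keras; the cardinalities, embedding size, and layer widths are assumptions, not tuned values:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_users, n_items, n_types = 100, 50, 5   # made-up vocabulary sizes
emb_dim = 8

user_in = keras.Input(shape=(1,), name="user")
item_in = keras.Input(shape=(1,), name="item")
type_in = keras.Input(shape=(1,), name="item_type")

# One embedding table per categorical input.
u = layers.Flatten()(layers.Embedding(n_users, emb_dim)(user_in))
i = layers.Flatten()(layers.Embedding(n_items, emb_dim)(item_in))
t = layers.Flatten()(layers.Embedding(n_types, emb_dim)(type_in))

# Concatenate all embeddings and score the (user, item) pair.
x = layers.Concatenate()([u, i, t])
x = layers.Dense(32, activation="relu")(x)
out = layers.Dense(1, name="score")(x)   # predicted rating / relevance

model = keras.Model([user_in, item_in, type_in], out)
model.compile(optimizer="adam", loss="mse")
model.summary()
```

Any additional feature (channel, time of day, price band) can be bolted on the same way: one more input, one more embedding, one wider concatenation.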

This is only one example of using a deep neural network in a recommendation system. Depending on the task, we could use other architectures; for example, recurrent networks with long-term memory capabilities might be more appropriate for predicting the next action (next purchases):

Architecture schema example (Image by Author)

As we can see, the model-based approach can solve a variety of problems and, like the other approaches, has its own advantages and disadvantages:

Advantages of Model-based algorithms:

  • Model-based CF techniques address the shortcomings of memory-based CF algorithms, such as scalability and sparsity.
  • They can also improve prediction performance.

Disadvantages of Model-based algorithms:

  • Model-based CF techniques improve scalability at the cost of prediction performance.
  • Model building is expensive.

Let’s go on and analyze Hybrid recommender systems.

Hybrid recommender systems

Hybrid approaches can be implemented in several ways:

  • Create separate content-based and collaborative-based prediction models and then combine their results;
  • Add content-based capabilities to a collaborative-based model;

Hybrid methods can provide more accurate recommendations than either approach alone. They can also solve common problems such as cold start and data sparsity.

There are several hybridization techniques:

  • Weighted: The score of different recommendation components is combined numerically.
  • Switching: The recommendation system chooses among components and applies the selected one.
  • Mixed: Recommendations from different models are presented to users/customers together to choose the recommendation.
  • Feature Combination: Features derived from different knowledge sources are combined and given to a single recommendation algorithm.
  • Feature Augmentation: One model is used to compute a set of features, which is then passed to another model as input.
  • Cascade: Models are given strict priority, with the lower-priority models breaking ties in the scoring of the higher-priority ones.
  • Meta-level: One algorithm is applied and produces some sort of model, which is then the input used by the next algorithm.
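The first technique, weighted hybridization, is the simplest to sketch. Below, the per-item scores of the two components and the weights are made up; in practice the weights would be tuned offline or via A/B tests:

```python
# Made-up outputs of a content-based and a collaborative model.
content_scores = {"item_a": 0.9, "item_b": 0.4, "item_c": 0.7}
collab_scores  = {"item_a": 0.2, "item_b": 0.8, "item_c": 0.5}

W_CONTENT, W_COLLAB = 0.4, 0.6   # illustrative weights, not recommendations

def hybrid_rank(items):
    """Rank items by the weighted sum of both components' scores."""
    scored = {
        item: W_CONTENT * content_scores.get(item, 0.0)
              + W_COLLAB * collab_scores.get(item, 0.0)
        for item in items
    }
    return sorted(scored, key=scored.get, reverse=True)

print(hybrid_rank(["item_a", "item_b", "item_c"]))
```

Note how the hybrid ordering differs from either component alone: the content model prefers item_a, the collaborative model prefers item_b, and the weighted blend settles the disagreement.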

Most big companies use hybrid approaches in production; they are more complex, but flexible enough to solve a wide range of tasks.

The most famous example of such an approach in production is Netflix. You can read all the details in this article.

All of the above is only a small part of the basic techniques and approaches for recommendation systems. Nowadays, the following approaches are becoming more popular:

  • Graph Learning based Recommender Systems
  • Reinforcement learning-based recommender systems

But no matter what techniques and approaches we use, we should always remember the ethical component of building recommendation systems.

ETHICAL ASPECTS OF RECOMMENDER SYSTEMS

  • Data collection.
  • User profiling.
  • Data publishing.
  • Data filtering/Algorithmic opacity.
  • Behavior manipulation.
  • A/B testing.

What should we do to comply with ethical standards:

  • Identify and study the socio-psychological impact of personalized filtering;
  • Help people understand and regulate their level of privacy (follow GDPR and CCPA);
  • Develop a methodology to probe the subjective ‘validity’ of the information that is provided to users based on their interests;
  • Engage with corporate information service providers to reinforce ethical practices.

Conclusions

Recommender algorithms and systems have come a long way in their development and use.

History of Recommender System (Image by Author)

They are increasingly being used to drive user and customer engagement.

  • 35% of the purchases on Amazon are the result of their recommender system, according to McKinsey.
  • During the Chinese global shopping festival of November 11, 2016, Alibaba grew its conversion rate by up to 20% using personalized landing pages, according to Alizila.
  • Recommendations are responsible for 70% of the time people spend watching videos on YouTube.
  • 75% of what people are watching on Netflix comes from recommendations, according to McKinsey.
  • Employing a recommender system enables Netflix to save around $1 billion each year, according to this paper written by an executive:

Reduction of monthly churn both increases the lifetime value of an existing subscriber and reduces the number of new subscribers we need to acquire to replace canceled members. We think the combined effect of personalization and recommendations save us more than $1B per year.

  • Cross-selling and category-penetration techniques increase sales by 20% and profits by 30%, according to McKinsey.

It can be used in personalized marketing, online advertisements, finding the best offers for customers, providing discounts, recommending the next best offer, finding items frequently bought together for cross-selling, and many, many more.

Recommendation systems are efficient data science solutions that can help increase customer satisfaction and retention, and lead to a significant increase in your business revenues.
