Raman Damodar Shahdadpuri
Published in POPxo Engineering · 3 min read · Apr 2, 2018

Recommendation Engine at POPxo using Snowplow and Prediction.io with Universal Recommender

First things first, the most important and most time-consuming part of creating a recommendation engine is data collection. If you still haven’t figured that part out, head over to our post where we give a soft introduction to POPxo’s Data Pipeline using Snowplow.

Why Prediction.io?

Prediction.io is a machine learning server that can be set up with minimal time and resources. It uses Spark to train models, so depending on the requirement the setup can run on a single machine or on a cluster of multiple machines.

Prediction.io ships with many ready-to-use templates, so even someone with little or no data science experience can start using it. The templates are grouped into sections (Recommenders, Classification, Regression, NLP, Clustering, and Similarity), and each section contains several templates. We were particularly interested in the Universal Recommender under the Recommenders section.

Why Universal Recommender?

The Universal Recommender is an all-in-one recommendation engine template that powers these sections of our website:

  • People who read this also read…
  • Popular stories by an author
  • People who read this also watched these videos
  • Popular stories for a user
  • People who answered this also answered…
  • Personalised feed

Unlike other Prediction.io templates, the Universal Recommender can take multiple events into account when recommending a story. For instance, we can consider a page_view event as well as a page_ping (time spent) event to decide what will interest a user most, or which items are most similar to a given item.
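
As an illustration, here is roughly what sending those two event types to the event server looks like with the PredictionIO Python SDK. The access key, URL, and IDs below are placeholders, not our production values:

```python
import predictionio

# Connect to Prediction.io's event server (access key and URL are placeholders).
client = predictionio.EventClient(
    access_key="YOUR_ACCESS_KEY",
    url="http://localhost:7070",
)

# A user viewed a story: the primary indicator for the Universal Recommender.
client.create_event(
    event="page_view",
    entity_type="user",
    entity_id="user-123",
    target_entity_type="item",
    target_entity_id="story-456",
)

# The same user kept the story open (Snowplow page_ping): a secondary indicator
# for time spent, which the Universal Recommender can weigh alongside views.
client.create_event(
    event="page_ping",
    entity_type="user",
    entity_id="user-123",
    target_entity_type="item",
    target_entity_id="story-456",
)
```

Both event names would also need to be listed under eventNames in the engine’s configuration (engine.json) so the Universal Recommender knows to train on them.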

Another advantage of the Universal Recommender is its rich query interface. We can filter content based on different item properties, and those properties can be set, changed, or deleted by sending them to the Event Server.
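
For instance, attaching a property to an item and then filtering on it at query time might look like the sketch below. The property names, values, and server URLs are assumptions for illustration, not our actual schema:

```python
import predictionio

# Attach (or update) properties on an item via a $set event.
event_client = predictionio.EventClient(
    access_key="YOUR_ACCESS_KEY",
    url="http://localhost:7070",
)
event_client.create_event(
    event="$set",
    entity_type="item",
    entity_id="story-456",
    properties={"category": ["beauty"], "language": ["en"]},
)

# Later, query the deployed engine and filter on that property.
engine_client = predictionio.EngineClient(url="http://localhost:8000")
recommendations = engine_client.send_query({
    "user": "user-123",
    "num": 10,
    # In the Universal Recommender's query DSL, a negative bias is meant to
    # act as a hard filter: only items with this property value come back.
    "fields": [{"name": "category", "values": ["beauty"], "bias": -1}],
})
print(recommendations)
```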

Setup Components

  1. POPxo’s Backend: The item (stories, questions, polls, videos, etc.) data is sent from POPxo’s backend to Prediction.io’s event server.
  2. Event Data Stream: There are two steps involved within the data stream (a sketch of the consumer in Step 2 follows below this list).
  • Step 1: Kinesis Tee transforms the Avro-formatted enriched data stream into a JSON stream and sends it to another Kinesis stream.
  • Step 2: A Ruby consumer fetches the data from the enriched JSON stream and sends the event data to Prediction.io’s event server.
  3. Prediction.io with Universal Recommender: This is where the magic (math) happens!
  • Step 1: The data received by Prediction.io’s event server is saved in HBase.
  • Step 2: The Universal Recommender trains on this data and builds a new model every few hours; the model is saved in Elasticsearch.
  • Step 3: The query server internally queries Elasticsearch to fetch recommendations based on the query received from POPxo’s backend.
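
The production consumer in Step 2 of the data stream is written in Ruby; the sketch below shows the same flow in Python, reading the enriched JSON stream from Kinesis and forwarding each event to the event server. The stream name, region, access key, and field names are simplified assumptions:

```python
import json
import time

import boto3
import predictionio

# Placeholders: the real stream name, region, and access key differ.
STREAM_NAME = "enriched-json-stream"

kinesis = boto3.client("kinesis", region_name="ap-south-1")
pio = predictionio.EventClient(access_key="YOUR_ACCESS_KEY", url="http://localhost:7070")

# Single-shard, single-process sketch; the real consumer tracks checkpoints
# and handles every shard in the stream.
shard_id = kinesis.describe_stream(StreamName=STREAM_NAME)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=STREAM_NAME,
    ShardId=shard_id,
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

while iterator:
    batch = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in batch["Records"]:
        enriched = json.loads(record["Data"])  # JSON produced by Kinesis Tee
        # Field names below are assumptions about the enriched event shape.
        pio.create_event(
            event=enriched["event_name"],        # e.g. page_view / page_ping
            entity_type="user",
            entity_id=enriched["domain_userid"],
            target_entity_type="item",
            target_entity_id=enriched["page_urlpath"],
        )
    if not batch["Records"]:
        time.sleep(1)  # back off briefly when the stream is idle
    iterator = batch.get("NextShardIterator")
```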

This was an overview of how we have implemented personalisation at POPxo. If you want to give it a try, head to POPxo and let us know your experience.

Want to be a part of POPxo? Check out our job openings page and let us know if you are interested.
