Creating product recommendations using Neo4j
In this blog post, I will show you how to use Neo4j to create product recommendations. By tracking your users’ activity, you can find the products your users have in common and more easily present relevant products to them.
Intro
If you work on large e-commerce sites, you will inevitably run into the problem of showing product recommendations to your users. Usually, we use a third-party service, such as Raptor, to do all the hard work. But maybe we don’t need all those fancy features. Maybe we just want to show related products, based on what other users have shown interest in.
One way of creating product recommendations is to collect all our user data in a graph database, and then find relevant products by traversing that graph.
Neo4j is arguably the most popular graph database at the moment, so in this blog post I will show some of the features of Neo4j, and how we can use it to generate product recommendations based on user activity.
Collaborative Filtering
Collaborative filtering is the process of filtering information for one user, using information gathered from many users.
Imagine an e-commerce website with lots of users. Every user navigates through the categories and product pages, maybe even adding some products to their shopping basket. All this information could be useful for other people who are looking for similar products.
By tracking the activity of our users as they navigate through our website, we can use that data as a form of collaborative filtering.
If we store the data in a graph data structure, we can easily traverse the graph to find the products our visitors have in common.
We can store our users and products as nodes, connected by edges that represent a user’s interest in a product.
We can store a score value on each edge to create a weighted graph, which lets us sort by weight and show the most relevant products first.
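To make the weighted graph idea a bit more concrete, here is a minimal in-memory sketch in Python. It is only a stand-in for the data model and not how a graph database stores anything. The `likes` dictionary, the helper names, the user ids and the product ids are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the graph:
# user id -> {product id: score}, where the score is the edge weight.
likes = defaultdict(dict)


def add_edge(user_id, product_id, score):
    """Store a weighted edge from a user node to a product node."""
    likes[user_id][product_id] = score


def products_by_weight(user_id):
    """Return the product ids a user is connected to, highest score first."""
    edges = likes[user_id]
    return sorted(edges, key=lambda product_id: edges[product_id], reverse=True)


# Made-up sample data: two users who have one product (1900) in common.
add_edge("keanu", 1900, 10)
add_edge("keanu", 1901, 100)
add_edge("lars", 1900, 110)

print(products_by_weight("keanu"))
```

Sorting by the edge weight is what puts the product with the strongest interest first.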
Neo4j Overview
Considering our data model, a graph database is the ideal place to store our data.
If you google “Graph database”, Neo4j is pretty much the first result popping up. There are other graph databases, but Neo4j seems to be the industry leader.
Neo4j is an open-source graph database, written in Java and Scala. There is a free version available, with all the basic features that you’ll need. If you need more advanced features like high availability, you’ll need to shell out some cash.
Neo4j persists the whole graph, with relationships stored when they are created, so we don’t have to build a graph in memory every time we want to run a query. The graph is always there on disk, which should in theory save us a lot of expensive computation.
Neo4j is a NoSQL database, and both nodes and relationships can contain properties, which are stored as key-value pairs. Property values can be of primitive types, or arrays of a primitive type.
Nodes can have labels assigned to them, so you can, for example, distinguish between a Person node and a Product node.
The query language used with Neo4j is called Cypher. It has some interesting features, such as an ASCII-art-like matching syntax for defining the pattern of a query. A match query in Cypher is basically written as arrows pointing from nodes to other nodes, which in my opinion makes Cypher queries really intuitive and easy to read.
You can find Neo4j driver libraries for all the major programming environments.
Our Data Model
In this post, I’m going to create a product recommendation engine that can return the product IDs of relevant products for a user, based on the activity of other users.
Our data model will look something like this:
The blue nodes are Person nodes, and the yellow nodes are our Product nodes.
The Person nodes represent users on our website, and the Product nodes are our products.
The Person and Product nodes are connected by a relationship of type “LIKES”.
The Person node can contain all kinds of information about the user, but most importantly it contains the user ID. Similarly, the Product node contains a product ID.
The “LIKES” relationship between the nodes contains a “score” property, which is the weight of the edge. For example, adding a product to the basket would give a higher score than just viewing the product page.
Creating Nodes
Creating nodes using Cypher is pretty simple.
Use the CREATE keyword like this to create a Person node:
CREATE (p:Person { name: "Keanu Reeves" })
We have now created Keanu Reeves as a node with the label “Person”, and a name property with his name as the value.
To create a relationship between Keanu Reeves and one of our products, as shown in the graphic above, we again use the CREATE keyword:
MATCH (pers:Person),(prod:Product)
WHERE pers.name = "Keanu Reeves" AND ID(prod) = 1900
CREATE (pers)-[l:LIKES]->(prod)
RETURN l
We start by finding our two nodes, with a MATCH query.
Then we just use CREATE to create our relationship between the person and product nodes.
Registering User Event
We want to gather information on the interactions of our users. We can do that by assigning a score to the different events that our users trigger.
For example: 10 points for viewing a product page, 100 points for adding a product to the basket.
A Cypher query for registering such an event could look like this:
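// Parameters use the $name syntax; the older {name} form is no longer accepted by recent Neo4j versions
MATCH (prod:Product),(me:Person)
WHERE ID(prod) = $ProductId
AND ID(me) = $UserId
MERGE (me)-[l:LIKES]->(prod)
ON CREATE SET l.score = $AddScore
ON MATCH SET l.score = l.score + $AddScore
RETURN 0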
First, we use the MATCH keyword to find the two nodes we want to create a relationship between.
The MERGE keyword either matches an existing pattern, or creates a new one. So if no LIKES relationship exists between our user and product nodes, we create one and set the score to our score value. If a relationship already exists, we just add our value to its score.
By using ON CREATE and ON MATCH, we can differentiate between creating and updating a relationship, and use different logic for setting the score property in each case.
If a user actually ends up buying a product, it might be a good idea to register that event as a distinct relationship type in Neo4j. By doing that, we can easily query for all products bought by a specific user.
To do that, simply change the type of the relationship in the example above:
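If the create-or-update behaviour feels a bit abstract, it can be sketched in plain Python. This is only an analogy for what MERGE with ON CREATE and ON MATCH does to the score, not how Neo4j implements it. The `EVENT_SCORES` table, the event names and the `register_event` helper are hypothetical, and the scores are the example values from above.

```python
# Hypothetical event weights, using the example scores from the text.
EVENT_SCORES = {"view": 10, "add_to_basket": 100}


def register_event(likes, user_id, product_id, event):
    """Create or update the LIKES edge between a user and a product.

    `likes` maps (user_id, product_id) -> accumulated score.
    """
    add_score = EVENT_SCORES[event]
    key = (user_id, product_id)
    if key in likes:
        # Same idea as ON MATCH: the edge exists, so add to its score.
        likes[key] += add_score
    else:
        # Same idea as ON CREATE: no edge yet, so start at the event's score.
        likes[key] = add_score
    return likes[key]


likes = {}
register_event(likes, 1, 1900, "view")                   # edge created
total = register_event(likes, 1, 1900, "add_to_basket")  # edge updated
print(total)
```

A user who views a product and then adds it to the basket ends up with one edge whose score is the sum of both events.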
MERGE (me)-[b:BOUGHT]->(prod)
Querying Our Graph
Now that we have some data in our graph, let’s try to get some product recommendations.
Let’s say we’re looking at a specific product right now, and we want some recommendations for other products. We can query our graph database for products that other users, who viewed this same product, have also shown interest in.
A simple query for returning some product recommendations for Keanu:
MATCH (me:Person)-[meLike:LIKES]->(myProd:Product)<-[:LIKES]-(otherPerson:Person)-[otherLike:LIKES]->(otherProd:Product)
WHERE me.name = "Keanu Reeves"
AND otherLike.score > 10
AND ID(myProd) = 1900
RETURN otherProd, otherLike
ORDER BY otherLike.score DESC
The MATCH pattern describes our traversal: from Keanu to our initial product, then to every other user who has shown an interest in it, and finally to the other products those users like.
We can require a minimum score on the relationships we traverse, if we want to make sure the product recommendations are somewhat interesting to us.
Here’s the result of the query above:
As you can see, our query returned two new products that Keanu has not seen before.
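For readers who prefer to see the traversal spelled out step by step, here is a rough Python equivalent working on a plain dictionary. It is a sketch of the same idea, not what Neo4j does internally. The `recommend` function, the `likes` structure and all the sample users, product ids and scores are invented for this example.

```python
def recommend(likes, user_id, product_id, min_score=10):
    """Find products liked by other users who also like `product_id`.

    `likes` maps user id -> {product id: score}.
    Returns (product id, score) pairs, highest score first.
    """
    # Like the query pattern, require that the user likes the starting product.
    if product_id not in likes.get(user_id, {}):
        return []

    results = []
    for other_user, products in likes.items():
        # Only follow other users who share the starting product.
        if other_user == user_id or product_id not in products:
            continue
        for other_product, score in products.items():
            # Skip the starting product and weak relationships.
            if other_product != product_id and score > min_score:
                results.append((other_product, score))

    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


# Made-up sample data.
likes = {
    "keanu": {1900: 10},
    "lars": {1900: 110, 1901: 100, 1902: 10},
    "other_user": {1900: 20, 1903: 50},
    "unrelated_user": {1904: 500},
}

print(recommend(likes, "keanu", 1900))
```

With this sample data, product 1902 is dropped because its score does not exceed the minimum, and product 1904 is never reached because its only fan has nothing in common with Keanu.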
In Conclusion
We have seen that you can quickly whip up a simple product recommendation engine without complex machine learning algorithms or huge distributed computing clusters, and without having to use a potentially expensive service like Raptor.
For a lot of projects, a simple graph database can solve our product recommendation needs.