Beer Recommendations using Collaborative Filtering with Neo4j
In this post, I’ll outline how to use a Neo4j graph database to generate user recommendations for a data set consisting of users, products, and user ratings for those products.
For my data set I’m using a database of 30,000 different beers (pulled from brewDB’s open API) and 100 users (I asked Facebook friends to rate some beers).
For the recommendation engine, I’m going to explain how to implement a user-based collaborative filtering algorithm using Neo4j’s Cypher query language.
The approach is very straightforward:
We will first calculate the similarity (distance) between each pair of users, based on each user’s individual beer preferences. Then, for a given user, we will find the N most similar (closest) users and aggregate those users’ beer preferences to calculate beer recommendations for that user. If this seems cloudy right now, bear with me; we will take it step by step.
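To make the overall flow concrete before going through the steps, here is a minimal in-memory sketch in Python rather than Cypher. It assumes the similarities are already computed, and it aggregates the neighbors' ratings as a similarity-weighted average. That aggregation rule, the function name `recommend`, and the sample users and beers are all assumptions for illustration, not necessarily what the final Cypher query does.

```python
from collections import defaultdict


def recommend(target, similarities, ratings, n_neighbors=2, top_k=3):
    """Recommend beers the target has not rated, using its nearest users.

    similarities: {user: {other_user: similarity between 0 and 1}}
    ratings:      {user: {beer: stars}}
    """
    # Pick the N most similar users to the target.
    neighbors = sorted(
        similarities[target].items(), key=lambda pair: pair[1], reverse=True
    )[:n_neighbors]

    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for neighbor, sim in neighbors:
        for beer, stars in ratings[neighbor].items():
            if beer in ratings[target]:
                continue  # skip beers the target has already rated
            weighted_sum[beer] += sim * stars
            weight_total[beer] += sim

    # Assumed aggregation: similarity-weighted average of neighbor ratings.
    scores = {
        beer: weighted_sum[beer] / weight_total[beer]
        for beer in weighted_sum
        if weight_total[beer] > 0
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical sample data.
ratings = {
    "alice": {"Blue Moon": 4, "Guinness": 5},
    "bob": {"Blue Moon": 4, "Guinness": 5, "Pliny the Elder": 5, "Bud Light": 1},
    "carol": {"Blue Moon": 1, "Bud Light": 5},
}
similarities = {"alice": {"bob": 0.9, "carol": 0.2}}

print(recommend("alice", similarities, ratings))
```

Because bob is far more similar to alice than carol is, his 5-star rating dominates and his 1-star rating for Bud Light outweighs carol's 5 stars for the same beer.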
Step 1: Define database structure
To keep things simple, we’re going to use a graph database containing user nodes and beer nodes. User nodes will be connected to beer nodes through ‘rating’ relationships. Every time a user rates a beer, we draw a new ‘rating’ relationship between that user and the beer they rated. In my beer rating app, users can rate beers on a scale of 1–5 stars, so the ‘rating’ relationship will store a ‘rating’ property that holds the number of stars the user gave the beer.
[User 1] — 4 stars→[Blue Moon]
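If it helps to see the shape of the data outside of Neo4j, here is a small Python sketch of the same structure: user nodes, beer nodes, and a ‘rating’ relationship carrying a star count. The class names `Rating` and `BeerGraph` are hypothetical, and the sketch assumes that rating the same beer twice replaces the earlier rating, which is a simplification I chose for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    """A 'rating' relationship from a user node to a beer node."""
    user: str
    beer: str
    stars: int

    def __post_init__(self):
        # The app uses a 1-5 star scale.
        if not 1 <= self.stars <= 5:
            raise ValueError("stars must be between 1 and 5")


class BeerGraph:
    """In-memory stand-in for the graph: two node sets plus relationships."""

    def __init__(self):
        self.users = set()
        self.beers = set()
        self.ratings = {}  # (user, beer) -> Rating

    def rate(self, user, beer, stars):
        rating = Rating(user, beer, stars)  # validate before touching the graph
        self.users.add(user)
        self.beers.add(beer)
        # Assumption: a repeat rating overwrites the previous one.
        self.ratings[(user, beer)] = rating

    def ratings_by(self, user):
        """Return {beer: stars} for every beer this user has rated."""
        return {r.beer: r.stars for r in self.ratings.values() if r.user == user}


graph = BeerGraph()
graph.rate("User 1", "Blue Moon", 4)
print(graph.ratings_by("User 1"))
```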
Step 2: Calculate user similarities
The next thing we’re going to do is calculate a similarity index between each pair of users. In Neo4j we do this by creating a ‘similarity’ relationship between each pair of user nodes. As I mentioned before, relationships in Neo4j can have properties, so each similarity relationship will have a ‘similarity’ property that stores the similarity index (a number between 0 and 1 that represents how similar two users are to each other). That index can be calculated using any similarity/distance metric you choose. We are going to use the Euclidean distance.
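A raw Euclidean distance is not itself a number between 0 and 1, so it has to be mapped onto that range somehow. The Python sketch below shows one common way to do it: compute the distance over the beers both users have rated, then take 1 / (1 + distance). That mapping, the choice to return 0 when two users share no beers, and the function name `euclidean_similarity` are assumptions for illustration; other conversions would work too.

```python
import math


def euclidean_similarity(ratings_a, ratings_b):
    """Similarity between 0 and 1 from two {beer: stars} dictionaries."""
    # Only beers that both users have rated can be compared.
    shared = ratings_a.keys() & ratings_b.keys()
    if not shared:
        return 0.0  # assumption: no overlap means no evidence of similarity

    distance = math.sqrt(
        sum((ratings_a[beer] - ratings_b[beer]) ** 2 for beer in shared)
    )
    # Assumed mapping: distance 0 gives 1.0, larger distances approach 0.
    return 1.0 / (1.0 + distance)


# Hypothetical sample data.
alice = {"Blue Moon": 4, "Guinness": 5}
bob = {"Blue Moon": 4, "Guinness": 5, "Bud Light": 1}
carol = {"Blue Moon": 1, "Bud Light": 5}

print(euclidean_similarity(alice, bob))    # identical on shared beers
print(euclidean_similarity(alice, carol))  # 3 stars apart on Blue Moon
```

One thing to keep in mind with this mapping is that two users who share a single beer and agree on it score a perfect 1.0, so in practice you may want to require a minimum number of shared beers.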