Getting started with Neo4j — Building a follow system.

I often find myself trying to tell people how wonderful it is to use graph databases such as Neo4j for projects. However, it is hard to show a new concept without a good example. So here is my attempt to convince you to try it out.

Pedro Mendonça
Neo4j Developer Blog
7 min read · Dec 3, 2018

A follow system

For this article, I will run through how to make a follow system for a social network in Neo4j. If you use Medium, it will work much like the follow feature there. Any user can follow any other user. Each user will then get recommended posts based on who they follow.

Photo by NASA on Unsplash

Database design

Here is the first place where Neo4j really shines. If you have ever worked with traditional relational databases, you are probably thinking of normalisation and of primary and foreign keys. Neo4j takes all that away and focuses on what its developers like to call the Whiteboard Model. I like to call it no database design. But for the sake of completeness, here is what our database should look like by the end of this.

Whiteboard Model

Users and posts will be Nodes, and they will be related to each other with Created by, Liked or Follow Relationships.

Creating Users

For users, we are going to have a very simple structure. Let’s say every user has a name and an email. That should be enough, for now at least.

We also want to make sure that no two users have the same email. Almost like a primary key. Neo4j provides us with the concept of constraints so we can do the following.
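The original snippet is not reproduced here, so the following is a minimal sketch of what such a uniqueness constraint could look like. The constraint name `user_email_unique` is an assumption made for illustration, and the syntax depends on your Neo4j version.

```cypher
// Assumed constraint name: user_email_unique (hypothetical).
// Syntax for Neo4j 4.4 and later:
CREATE CONSTRAINT user_email_unique IF NOT EXISTS
FOR (u:USER) REQUIRE u.email IS UNIQUE;

// Older syntax, as used around the time this article was written (2018):
// CREATE CONSTRAINT ON (u:USER) ASSERT u.email IS UNIQUE;
```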

Cypher’s syntax for constraints is verbose, but if you are not familiar with it, it basically states that Neo4j will not allow two nodes with the label USER to have the same email property. In other words, all emails are unique.

Now, all we have left to do is to create our first user.
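A sketch of what creating that first user might look like. The variable `john` and the USER label come from the article; the email address is a placeholder chosen for illustration.

```cypher
// Create one USER node, bind it to the variable john, and return it.
// The email value is a hypothetical placeholder.
CREATE (john:USER {name: 'John', email: 'john@example.com'})
RETURN john;
```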

If you are using the Neo4j web browser to run this query, you should now have your first node, and it should look something like this.

Our first node

As we can see from our previous query, we created a node and assigned it to the variable john. That variable exists only within the scope of the query. It is useful here only because we want to return the node afterwards, so we can actually see what we have created.

If you are implementing this in an API, you can skip the return step. It may still be useful depending on your end goal, so I will include it for the rest of this article, but feel free to remove it.

Because we set our constraint previously, you can run the query one more time and see it fail: Neo4j will not allow us to have two users with the same email.

To finish this let’s just create one more user and move on.
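The second user could be created the same way. The name "Jane" and her email are hypothetical values chosen for illustration. The article does not name the second user.

```cypher
// A second USER node so that we have someone to follow John later.
// Name and email are hypothetical placeholders.
CREATE (jane:USER {name: 'Jane', email: 'jane@example.com'})
RETURN jane;
```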

Creating a post

A post in our little social network will only have two properties and they will look like this:

Although posts don’t have the same constraints that users do, we are going to create them in a slightly different manner. Every post is created by a user, so to avoid problems our best bet is to create each post in association with its user from the start.

If your social network allows for anonymous posting then you can create a post just like we created our users in the previous step.
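Here is a sketch of a query that creates a post together with its author relationship. The relationship type CREATED_BY and the variable `r` come from the article. The POST label, the `body` property name, the property values and John's email are assumptions made for illustration (only `title` is confirmed as a post property later in the article).

```cypher
// 1. MATCH: find the author; if no such user exists, nothing is created.
MATCH (john:USER {email: 'john@example.com'})
// 2. CREATE: create the post and link it to its author in one step.
//    POST, body and the values below are assumed for illustration.
CREATE (post:POST {title: 'My first post', body: 'Hello, graph!'})-[r:CREATED_BY]->(john)
RETURN john, r, post;
```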

Ok, this is a slightly more complex one so let me explain. This query has two parts:

  1. MATCH: We find the user who is creating this post, in our case “John”. We do this because we don’t want a post to be created if there isn’t a corresponding user in the database.
  2. CREATE: We create r, a relationship of type CREATED_BY, and also a post with the appropriate properties, very similarly to how we created our initial user.

If this worked, we should now have something like this:

Following a user

Now to create a follow we can do something similar to what we did to create a post but this time we only need to create the relationship, as both users already exist.
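A sketch of such a follow query. The relationship type FOLLOWS is an assumed name (the article only calls it a "Follow" relationship), and the emails are the hypothetical placeholders used for illustration.

```cypher
// Find both users first; if either is missing, no relationship is created.
MATCH (jane:USER {email: 'jane@example.com'}),
      (john:USER {email: 'john@example.com'})
// Jane follows John. FOLLOWS is an assumed relationship type name.
CREATE (jane)-[r:FOLLOWS]->(john)
RETURN jane, r, john;
```

Running this twice creates two FOLLOWS relationships between the same users. Swapping CREATE for MERGE on the relationship line is one way to keep a follow from being duplicated.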

Again we are doing a match and a create. This way we can make sure that both users exist before creating the follow relationship.

Liking a post

For a user to like a post, we can reuse the majority of the code from the follow query. However, there is one key difference. Because posts don’t have a unique identifier such as email we will be using Neo4j’s automatically created id.

If you have just started using Neo4j, you may have missed the fact that it assigns an ID to every node. One easy way to check this is with the following query, which will return all post titles and their respective IDs.
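A minimal sketch of that lookup, assuming posts carry the label POST and a `title` property. Note that in Neo4j 5 the `id()` function is deprecated in favour of `elementId()`, which returns a string rather than an integer.

```cypher
// List every post title next to the internal ID Neo4j assigned to the node.
MATCH (post:POST)
RETURN post.title AS title, id(post) AS id;
```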

This will return a table like this:

In Neo4j’s query language (Cypher), id(n) is a function call rather than a property lookup, which may not be very intuitive, but I will eventually get to how we can make this more straightforward when querying for posts.

Now knowing that our post has ID = 20 we can do the following.
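A sketch of the like query under the same assumptions as before: LIKED is an assumed relationship type name, Jane's email is a hypothetical placeholder, and 20 is the ID mentioned in the article (yours will almost certainly differ).

```cypher
// Find the user by email and the post by its internal ID.
MATCH (jane:USER {email: 'jane@example.com'}), (post:POST)
WHERE id(post) = 20
// LIKED is an assumed relationship type name.
CREATE (jane)-[r:LIKED]->(post)
RETURN jane, r, post;
```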

With that, we finish our initial database design without the need for relationship tables, normalisation or foreign keys. I think that for this reason alone you should consider Neo4j instead of a traditional relational database for any kind of highly related data.

To check out our progress so far we can perform the following query to return all our nodes and relationships:
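One way to write such a query is sketched below. The OPTIONAL MATCH keeps nodes that have no outgoing relationships in the result, and the LIMIT of 100 is an arbitrary safety value chosen for illustration.

```cypher
// Return every node, plus its outgoing relationships where it has any.
MATCH (n)
OPTIONAL MATCH (n)-[r]->(m)
RETURN n, r, m
LIMIT 100;
```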

This will return all the nodes and relationships in the database. Consider using the LIMIT keyword if you have a database with more than just a few nodes.

One last step

This whole process would have been a bit useless if it wasn’t for the fact that we can query this data and get relevant information. I won’t go into much detail about how these queries work, but I will try to explain what they do.

I will also be using a Cypher feature called map projection to shape the output into JSON-like maps, which makes it friendlier to APIs.

Get all users

Note that I am specifying which properties to return (name and email). This may not sound useful now, but it is very good practice in case your user node contains private properties such as password hashes.
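A sketch of this query using a map projection, assuming the USER label from earlier. Only the listed properties end up in each returned map.

```cypher
// Return each user as a map containing only name and email.
MATCH (user:USER)
RETURN user { .name, .email } AS user;
```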

Get a user with their posts

Here we are using the collect function in order to create an array of related content. This is often done in SQL with a JOIN, which results in fairly complex and lengthy queries as well as redundant results. Cypher's map projection allows us to define this in a much more readable syntax.
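A sketch of how collect and a map projection can be combined. The POST label, the `body` property and John's email are assumptions made for illustration. The OPTIONAL MATCH means a user with no posts still comes back, with an empty list.

```cypher
// Find the user, then gather their posts into a single list.
MATCH (user:USER {email: 'john@example.com'})
OPTIONAL MATCH (post:POST)-[:CREATED_BY]->(user)
WITH user, collect(post { .title, .body }) AS posts
// Nest the collected posts inside the user map.
RETURN user { .name, .email, posts: posts } AS user;
```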

Get recommended posts

This is probably one of my favourite queries in this post. It finds recommendations given a user’s email. On its own, the query returns all the posts (including Neo4j’s ID) related in any way to followed users. This means that you get not only posts written by the people you follow but also posts liked by the people you follow. And because we don’t have to pre-define our relationship types in Neo4j, if we eventually add more relationships, such as a Commented on or a Repost, the recommendations will take those into consideration as well.

Note that we are also adding the id to the return object, allowing us to use that ID to perform things like adding likes to the post later on.
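A sketch of such a recommendation query. FOLLOWS, POST, `body` and Jane's email are assumed names and values. The relationship pattern between the followed user and the post deliberately has no type or direction, so any kind of connection counts.

```cypher
// Step 1: everyone the given user follows.
MATCH (me:USER {email: 'jane@example.com'})-[:FOLLOWS]->(followed:USER)
// Step 2: every post connected to those users by any relationship.
MATCH (followed)-[]-(post:POST)
// DISTINCT avoids listing a post twice when several followed users touch it.
RETURN DISTINCT post { .title, .body, id: id(post) } AS post;
```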

Conclusion

I hope that this example gives you a taste of how Neo4j works and how straightforward it is to create very common data structure patterns. Neo4j’s graph nature works really well for a follow system and for most social-network-type structures. So if you have the opportunity, do give this a try and see what you can come up with.
