# Neo4j Graph Algorithms Release — Random Walk and Personalized PageRank

Earlier this week we released a new version of the Neo4j Graph Algorithms library, almost a year to the day since the library was launched on 24th July 2017.

Update: The O’Reilly book “Graph Algorithms on Apache Spark and Neo4j Book is now available as free ebook download, from neo4j.com

# tl;dr

This release sees the addition of the Random Walk algorithm, as well as support for a basic variant of Personalized PageRank.

You can install the latest release directly from the Neo4j Desktop in the ‘Plugins’ section of your project. Jennifer Reif also has a detailed post explaining how to install plugins if you haven’t used any yet.

If you’re installing the library on a server installation of Neo4j you can download the JAR from the releases page.

# Random Walk

A random walk means that we start at one node, choose a neighbor to navigate to at random or based on a provided probability distribution, and then do the same from that node, keeping the resulting path in a list. It’s similar to how a drunk person traverses a city.

Let’s see how this algorithm works on the movies dataset that comes with Neo4j. If you want to play along you’ll need to run the :play movies command and then click through to import the data.

The following query will find 1 random walk of 10 hops starting from Tom Hanks:

`MATCH (person:Person {name: “Tom Hanks”})CALL algo.randomWalk.stream(id(person), 10, 1, {path: true})YIELD pathRETURN path`

We can return a path as we have above, or we can return a list of nodeIds. In the following query we do this and then find the corresponding nodes and output either the name property if one exists (i.e it’s a Person node) or if not then the title property.

`MATCH (person:Person {name: “Tom Hanks”})CALL algo.randomWalk.stream(id(person), 10, 1, {path: true})YIELD nodeIdsUNWIND nodeIds AS nodeIdMATCH (node) WHERE id(node) = nodeIdRETURN coalesce(node.name, node.title) AS node`

Since this algorithm has an element of randomness we’ll get back a different set of nodes than in the first example:

By default the algorithm chooses the next neighbour to visit randomly, but you can choose to use a node2vec style probability distribution instead by passing in the parameter mode:node2vec. You can learn more about this algorithm in the documentation.

Thanks to Freya Behrens, Sebastian Bischoff, Pius Ladenburger, Julius Rückin, Laurenz Seidel, Fabian Stolp, Michael Vaichenker and Adrian Ziegler of the MetaExp-Project for their work implementing the algorithm.

# Personalized PageRank

The library has supported the PageRank algorithm since it was released, but we didn’t have support for a common variant called Personalized PageRank.

With Personalized PageRank, rather than getting a list of the most influential nodes in a graph as a whole, we get the most influential nodes relative to a set of source nodes that we provide.

Personalized PageRank can be used in recommender systems, and the paper WTF: The Who to Follow Service at Twitter explains how they use it to present users with recommendations of other accounts that they may wish to follow.

We still use the same PageRank algorithm as before, but we can now pass in a sourceNodes parameter to which we should provide the list of nodes that we want to run Personalized PageRank around.

In the following example we’ll run it for Keanu Reeves, but we could pass in more that one node if we wanted:

`MATCH (p:Person {name: “Keanu Reeves”})CALL algo.pageRank.stream(null, null, {direction: “BOTH”, sourceNodes: [p]})YIELD nodeId, scoreMATCH (n) WHERE id(n) = nodeId AND n <> pRETURN coalesce(n.name, n.title) AS node,        labels(n)[0] AS label,        scoreORDER BY score DESCLIMIT 5`

This query return the following results:

Not too surprising — those are the movies that Keanu Reeves acted in so we’d expect them to show up.

We could filter those results to show the top ranking people:

`MATCH (p:Person {name: “Keanu Reeves”})CALL algo.pageRank.stream(null, null, {direction: “BOTH”, sourceNodes: [p]})YIELD nodeId, scoreMATCH (n:Person)WHERE id(n) = nodeId AND n <> pRETURN coalesce(n.name, n.title) AS node,       labels(n)[0] AS label, score,       [node in nodes(shortestpath((p)-[*]-(n))) |        coalesce(node.name, node.title)] AS pathORDER BY score DESCLIMIT 5`

This query returns the following output:

All these people have worked directly with Keanu. What if we want to get some recommendations for people that he might want to work with?

We could write the following query:

`MATCH (p:Person {name: “Keanu Reeves”})CALL algo.pageRank.stream(null, null, {direction: “BOTH”, sourceNodes: [p]})YIELD nodeId, scoreMATCH (n:Person)WHERE id(n) = nodeIdAND n <> pAND length(shortestpath((p)-[*]-(n))) > 2RETURN coalesce(n.name, n.title) AS node,       labels(n)[0] AS label, score,       [node in nodes(shortestpath((p)-[*]-(n))) |        coalesce(node.name, node.title)] AS pathORDER BY score DESCLIMIT 5`

This query only returns people that are at least 2 hops away from Keanu Reeves.

We’ve added the most basic variant of this algorithm so please let us know if it works for you or if you need some other features supported by sending an email to devrel@neo4j.com

# Bug fixes

This release also contains bug fixes for the formula of the Wasserman Faust version of the Closeness Centrality algorithm, and an issue with the Delta Stepping algorithm when returning distances for unreachable nodes. We’ve also fixed an edge case when loading nodes which link to themselves.

The Yens k-shortest path algorithm has also been updated to store weights along with the stored shortest path relationships

We hope you enjoy using these new algorithms and if you have any questions or suggestions please send us an email to devrel@neo4j.com

--

--

## More from Neo4j Developer Blog

Developer Content around Graph Databases, Neo4j, Cypher, Data Science, Graph Analytics, GraphQL and more.