Graph Data Science with Neo4j Book

Why read it, and what’s different from my previous book

Feb 11, 2023

TL;DR

My latest book, Graph Data Science with Neo4j (GDSN), covers new features of Neo4j’s Graph Data Science (GDS) library. These include its handy Python client and the machine learning pipelines (node classification and link prediction) now available directly in the library. It also contains a new chapter about the Pregel API, which is the way to extend the GDS and implement your own algorithm. On the other hand, it focuses less on the internal workings of the algorithms than its older brother, Hands-On Graph Analytics with Neo4j (HOGAN).

As general advice, I suggest starting with the newer book (GDSN), which is more up to date. If you need a deeper understanding of the algorithms, you can then focus on chapters 6 (centrality measures) and 7 (community detection algorithms) of HOGAN.

If, on the other hand, you have already read my previous book, Hands-On Graph Analytics with Neo4j (HOGAN), the following chapters of GDSN will bring you new information: chapters 5, 6, 8 and 10.

More details are given below about the content of the new book and about who could still be interested in the first one.

New book: target audience and content

Graph Data Science with Neo4j is aimed at data scientists, even those with little practical experience. Topics such as classification, model training, training sets, or confusion matrices are used in this book without prior explanation, but the book does not dive much deeper into them. On the other hand, it assumes little or no knowledge of Neo4j and graph data science. Here is a summary of the topics covered in this book:

Graph datasets: what they are, and how to find or build one
How to understand a graph dataset: similar to classical metrics such as the mean or the standard deviation, which graph metrics help us understand what the graph looks like? This is done using Neo4j, with NeoDash for chart visualization. Centrality and community detection algorithms are also introduced.
Graph visualization: using the Neo4j Bloom application or Gephi
Machine Learning on Graphs: with the tools available in the GDS, such as pipelines and embedding algorithms, we learn how to make predictions on a graph dataset using both node classification and link prediction.
Extending the GDS with the Pregel API: an implementation of PageRank is shown, including unit tests.

In the next section, I’ll try to explain why you may want to read this book now.

Why read it today?

Depending on whether or not you have already read my first book, Hands-On Graph Analytics with Neo4j (HOGAN), here are the reasons why you might choose to read this newly released book.

Option A: you are new to the field of graph data science

In other words, you have not read Hands-On Graph Analytics with Neo4j. In that case, you are probably better off starting with the most up-to-date book, since all its code runs with the latest versions of the software it uses (mainly Neo4j and the GDS).

Compared to other books on the same topic, its approach focuses on the “how” rather than the “what”, in a beginner-friendly way.

Why is HOGAN still worth a read?

HOGAN was released in 2020, and it already relied on the Neo4j Graph Data Science library. A large part of it was dedicated to understanding the algorithms: the reader was guided through implementations of the PageRank and Louvain algorithms. In GDSN, this part has been reduced. Only a few lines discuss the PageRank algorithm, in the last chapter, when we implement it with the Pregel API. So, if you are interested in a deeper understanding of these algorithms, have a look at chapters 6 (centrality algorithms) and 7 (community detection algorithms) of HOGAN.

Note that chapter 4 of HOGAN, about spatial data, is partly outdated and has not been updated in the new book. There may be room for some future blog posts on this topic.

Option B: you have already read HOGAN

In this case, you will find some overlap between the two books, since they cover very similar topics. I would not recommend reading GDSN from end to end, because much of its content will already be familiar to you. However, the following chapters cover genuinely new topics that you may find interesting:

Chapter 5: Visualizing Graph Data: covers tools like Neo4j Bloom and Gephi
Chapter 6: Building a Machine Learning Model with Graph Features: use of the GDS Python client
Chapter 8: Building a GDS Pipeline for Node Classification Model Training: introduction to pipelines in the GDS
Chapter 10: Writing your Custom Graph Algorithm with the Pregel API in Java: how to extend the GDS by writing your own message passing algorithm.

If you have any remaining questions, feel free to reach out to me; I’ll do my best to guide you.