Optimising graph lookups in MongoDB

Graph databases such as Neo4J specialise in traversing graphs of relationships — such as those you might find in a social network. Many non-graph databases have been incorporating Graph Compute Engines to perform similar tasks. Since MongoDB 3.4 release, we now have the ability to perform simple graph traversal using the $graphLookup aggregation framework function.

$graphLookup is documented in MongoDB Jira SERVER-23725. The basic syntax is shown here:

I started playing with this capability originally using the POKEC dataset which represents data from a real social network in Slovakia. The relationship file soc-pokec-relationships.txt.gz contains the social network for about 1.2 million people. I loaded it into Mongo using this perl script. The following pipeline did the trick:

gzip -dc ~/Downloads/soc-pokec-relationships.gz |perl loadit.pl|mongoimport -d GraphTest -c socialGraph --drop

Now we have a collection with records like this:

We can expand the social network for a single person using an aggregation pipeline like this (I’m using dbKoda for convenience here, but you could do the same thing on the mongo command line):

GraphLookup (in dbKoda)

What we are doing here is starting with person 1476767, then following the elements of the friends array out to two levels — i.e.: to “friends of friends”.

Increasing the maxdepth exponentially increases the amount of data we have to cope with. This is the notorious “seven degrees of separation” effect — most people in a social network are linked by 6–7 hops, so once we get past that we are effectively traversing the entire set. Unfortunately, this meant that traversing more than 3 deep causes us to run out of memory:

The graph lookup can only consume at most 100MB of memory, and currently doesn’t spill to disk, even if the allowDiskUse clause is specified within the aggregation arguments. SERVER-23980 is open to correct this but it doesn’t appear to have been scheduled yet.

So I tried building a “flatter” network so that I wouldn’t run out of memory. This JavaScript builds the network and this Javascript runs some performance tests. I tried expanding the network both with and without a supporting index on the connectToField (person) in this case. Here’s the results (note the logarithmic scale):

For shallow networks, having an index on the connectToField makes an enormous difference. But as the depth increases, the index performance advantage decreases and eventually performance matches that of the unindexed case. In this example data that just happens to be at the “7 degrees of separation” but it will clearly depend on the nature of the data.

The $graphLookup operator is a very powerful addition to the MongoDB aggregation framework and continues the trend of providing richer query capabilities within the server. Mastering the aggregation framework is clearly a high priority for anyone wanting to exploit the full power of MongoDB.

Try out dbKoda — an open source, free IDE now available for MongoDB! You can experiment with indexing options, build queries and aggregates and analyse explain plans from within a modern graphical interface. The $graphLookup aggregation shown in this article was generated using dbKoda’s aggregation builder, which makes building complex aggregations a breeze.




Occasional posts relating to MongoDB Database performance tuning. From the authors of the Apress Book “MongoDB Performance Tuning” https://www.apress.com/us/book/9781484268780.

Recommended from Medium

Damage VFX using Animated Sprites in Unity

Getting Started with Python for Google Cloud Functions

How to Use New Line Character In Text Widget?

Digital Healthcare Platform Website Clone.

Getting to Know Go, Python, and Benchmarks

DFINITY 2019 — Community Review

Announcement of AntiBet(ANTI) launches on COSOSWAP NFT Launchpad (0604)

My new book “Game Development with Unity for .NET Developers” is ready for pre-order on Amazon!

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Guy Harrison

Guy Harrison

CEO at alwaysNFT, CTO at ProvenDB.com. Author of many books on database technology. Hopeless old geek. http://guyharrison.net

More from Medium

Containerizing Go + React & Automating w/ GitHub Actions | Part 1

How to Handle Objection.Js

MongoDB: Avoid using unbounded arrays in documents

Using Mongoose Models with TypeScript, the EASY Way