Tips and tricks in Neo4j Bloom

Looking at some Bloom patterns to help explore your data

Ljubica Lazarevic
Neo4j Developer Blog
7 min read · Jan 9, 2020


Introduction

Neo4j Bloom is a graph visualisation, exploration and communication tool. It lets users query the graph in near-natural language, without needing to know Cypher.

Neo4j Bloom only works with the Enterprise Edition of the database. Bloom is available by license, and also through the Startup Program.

Getting started in Bloom

Whilst this post is targeted more at users who have had some experience with Bloom, it may be helpful to those just starting out with Bloom.

If you’re completely new to Neo4j Bloom, I would recommend you explore these very useful resources:

Introducing the Bloom patterns

A reasonably common question I’ve encountered with Bloom is how best to ask a question of the data. Many Bloom search phrases tend to be very similar, depending on the type of question. The goal of the patterns presented here is to categorise these phrases into groups, so that you can quickly identify how to express a question as a Bloom phrase. The patterns we’ll cover are:

  • Specific path
  • Shortest path
  • Node paths
  • More than one of
  • Extended more than one of

We’re now going to go over these Bloom patterns to assist in exploring your data. We’ll look at how to get information of interest back using near-natural language, and how to explore the data using our domain knowledge of it. We’ll also flag any ‘be aware’ situations with each pattern, such as unexpected behaviours. I also covered these patterns during my GraphConnect ’18 talk if you prefer the video version.

The data

We will be using a cut-down data set from Kaggle which has data from the Summer and Winter Olympics between 1896 and 2016.

If you’d like to follow along with this post, you can import the data by running this set of Cypher queries in Neo4j Browser. You will need to have multi-line statements enabled. If you’re not sure how to do that, you can find more information here.

One thing to be aware of — the data set does include athlete initials, middle names, etc. so you will see verbose names in some of the examples.

Our data model

Pattern #1 — Getting a specific path

Why use this pattern?

This is a great example of being able to ask a question in near-natural language in Bloom, powered by a sensible data model. We use this pattern when we want to retrieve all of the information along a path from a specific start point. In other words, we are asking a question from an anchor point.

Question 1 — Which Olympic games did athlete Helen Glover compete in?

There are a few interesting observations to point out with our Bloom search phrase:

  • You don’t need to tell Bloom that Helen Glover is an Athlete. For databases with fewer than a thousand nodes, Bloom will scan the properties and try to find which nodes match the string you’ve provided. For larger databases, Bloom will look at which indexes are in use, and use those to auto-populate lists with suggested names and to match the provided string.
  • Bloom will take your relationship types and turn them into user-friendly alternatives. We did not need to type PART_OF: Bloom takes relationship types, converts them to lower case and splits out the words (e.g. replacing underscores with spaces). This not only makes searching for data friendlier, but also places important emphasis on a good data model.
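To make the second point concrete, here is a minimal Python sketch of that kind of conversion. It only mirrors the behaviour described above (lower-casing and replacing underscores with spaces); the function name `friendly_phrase` is my own, and this is not Bloom's actual implementation.

```python
def friendly_phrase(relationship_type: str) -> str:
    """Turn a relationship type into a search-friendly phrase.

    Illustrative only: lower-cases the type and swaps underscores for spaces.
    """
    return relationship_type.replace("_", " ").lower()


print(friendly_phrase("PART_OF"))
```

A well-named relationship type therefore reads naturally inside a search phrase, which is why the data model matters so much here.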

It’s worth noting that Bloom will also fill in the gaps for you. This means that you don’t always have to specify all the categories and relationships. In fact, you can omit either all the categories or all the relationships. For example, the following will both do the same thing:

Pattern #2 — Showing the shortest path

Why use this pattern?

This pattern is useful when you want to see how two nodes are connected by the shortest route through the graph.

Question 2 — What’s the shortest path connecting athletes Helen Glover and Serena Williams?

Answering this question takes two steps:

  1. Bring back the Helen Glover and Serena Williams nodes. We can do this by running these two phrases:

  2. Select both nodes and then right-click for the path->shortest path:

A word of warning: The shortest path returned may not necessarily be the one you were expecting. In this example, there could be a number of things that would result in the shortest path of equal length, such as common countries, etc. The shortest path function will only return one of the shortest paths, not all of them.
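The sketch below illustrates why this can surprise you. It is plain Python over a hypothetical toy graph (the node names are placeholders, not rows from the Olympics data set), and it uses an ordinary breadth-first search rather than anything Bloom-specific. Two athletes are linked through both a shared Games node and a shared Country node, so there are two shortest paths of equal length, and a function that returns a single path will show only one of them.

```python
from collections import deque

# Hypothetical toy graph: two athletes share a Games node and a Country node.
GRAPH = {
    "Athlete A": ["Games X", "Country Y"],
    "Athlete B": ["Games X", "Country Y"],
    "Games X": ["Athlete A", "Athlete B"],
    "Country Y": ["Athlete A", "Athlete B"],
}


def all_shortest_paths(graph, start, goal):
    """Breadth-first search that collects every path of minimal length."""
    queue = deque([[start]])
    found = []
    while queue:
        path = queue.popleft()
        if found and len(path) > len(found[0]):
            break  # every remaining path is longer than the shortest ones
        node = path[-1]
        if node == goal:
            found.append(path)
            continue
        for neighbour in graph[node]:
            if neighbour not in path:  # avoid walking in circles
                queue.append(path + [neighbour])
    return found


paths = all_shortest_paths(GRAPH, "Athlete A", "Athlete B")
one_path = paths[0]  # a single-result shortest path shows just one of these

print(len(paths), "equally short paths; one of them:", one_path)
```

Which of the equally short paths you get back is not something you control, which is the motivation for the next pattern.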

Pattern #3 — Paths between nodes

Why use this pattern?

This pattern is very useful for revealing the different paths of a given shape between set start and end points. For example, we saw in the previous pattern that the shortest path returned may be interesting or unexpected. Here, if we know which paths are of interest, we can specify the pattern to get responses more in line with our expectations.

Question 3 — What Olympic games link athletes Helen Glover and Serena Williams?

or, allowing Bloom to fill the gaps for us:

Pattern #4 — More than one type of

Why use this pattern?

This pattern is useful when you’re looking for more than one instance of an element. The way we approach this in Bloom is to repeat the item we want more than one of on either side of the ‘pivot’ point of our question.

Question 4 — Which cities have held the Olympic games more than once?

Or

In this example, we want to know where there’s more than one of Games, so we use City as a pivot point, and repeat Games either side of it.
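If it helps to see the pivot idea outside of Bloom, here is a small Python sketch of the same logic. The rows are a hand-written toy sample, and the way Games nodes are named is an assumption for illustration. For each city (the pivot), we look for two different Games attached to it.

```python
from itertools import combinations

# Toy (games, city) rows; node naming here is an assumption for illustration.
HOSTED = [
    ("London 1948", "London"),
    ("London 2012", "London"),
    ("Rio de Janeiro 2016", "Rio de Janeiro"),
]


def cities_with_repeat_games(rows):
    """Return {city: [(games, other_games), ...]} for cities with 2+ Games."""
    games_by_city = {}
    for games, city in rows:
        games_by_city.setdefault(city, []).append(games)

    matches = {}
    for city, games in games_by_city.items():
        # Games - City - Games: two *different* Games around the same pivot.
        pairs = list(combinations(sorted(games), 2))
        if pairs:
            matches[city] = pairs
    return matches


print(cities_with_repeat_games(HOSTED))
```

A city with only one Games attached never produces a pair, so it drops out of the result, which is exactly what repeating Games either side of City achieves in the search phrase.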

Please bear in mind this is a reduced data set, so not all cities which have hosted multiple Olympics (e.g. Athens) will be displayed.

Pattern #5 — Extended more than one type of

Why use this pattern?

Much like the previous pattern, we want to find more than one instance of an element. However, the element in question isn’t always conveniently connected to our chosen ‘pivot’ point. We still use the same principles from the previous pattern, and our query will look very much like a palindrome!

Question 5 — Which athletes have won more than one gold medal?

For those of you who have not encountered it before, we’re now referencing properties on the nodes. Here we’re using the type property on Medal, which allows us to query only Gold medals. As this property will not have an index on it (there are only three variants), it will not be automatically picked up by Bloom, so we need to query for it explicitly. Of course, you can query indexed node properties this way as well! You will notice that we mirror the phrase around our pivot point, which is Athlete.

Some words of warning:

  • Nodes with more than one path (relationship) will be revisited.
  • You may notice only 327 Athlete nodes returned, and this figure should be a lot higher. This is because, to improve performance, Bloom places (configurable) internal limits on the amount of data returned, which may cause result set truncation. Whilst Bloom is an excellent tool for examining samples of data, it cannot guarantee returning all matching results.
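The following Python sketch shows, on a hypothetical toy sample, how a mirrored pattern combined with a cap on returned rows can hide matching athletes. The athlete and medal names, the limit of 2, and the idea that the cap applies to matched rows are all assumptions made for illustration; Bloom's real limits and internals may differ.

```python
from itertools import islice, permutations

# Hypothetical (athlete, medal id, medal type) rows.
MEDALS = [
    ("Athlete A", "medal-1", "Gold"),
    ("Athlete A", "medal-2", "Gold"),
    ("Athlete B", "medal-3", "Gold"),
    ("Athlete B", "medal-4", "Silver"),
    ("Athlete C", "medal-5", "Gold"),
    ("Athlete C", "medal-6", "Gold"),
]


def mirrored_matches(rows, medal_type):
    """Yield (medal, athlete, medal) for two different medals of one type."""
    by_athlete = {}
    for athlete, medal, kind in rows:
        if kind == medal_type:
            by_athlete.setdefault(athlete, []).append(medal)
    for athlete, medals in by_athlete.items():
        # In this sketch each pair is matched once per direction.
        for first, second in permutations(medals, 2):
            yield (first, athlete, second)


all_rows = list(mirrored_matches(MEDALS, "Gold"))
limited_rows = list(islice(mirrored_matches(MEDALS, "Gold"), 2))  # assumed cap

athletes_all = {athlete for _, athlete, _ in all_rows}
athletes_limited = {athlete for _, athlete, _ in limited_rows}

print(sorted(athletes_all), "vs. with a cap:", sorted(athletes_limited))
```

Athlete B, with a single gold, never matches the mirrored pattern, while Athlete C matches but is cut off by the cap. That second effect is the one to keep in mind when a result looks smaller than expected.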

Summary

In this post we’ve explored some categorisations of Bloom phrases and examples to assist your data exploration.

We’ve also covered some of the mechanics of how Bloom works, and some scenarios to be aware of.
