Ljubica Lazarevic
Neo4j Developer Blog
9 min readAug 24, 2020

--

*Update! Now with hints and solution!*

Summer of Nodes: Final Week — Exploring the Area

Hello everybody!

Summer of Nodes 2020 is now over. If you’ve not had a chance to look at the challenges, you can always have a go at your leisure:

This week’s theme — exploring the area

Central Park in New York is one of the most popular tourist attractions in the city. Last year, Central Park saw 42 million visitors alone. This week we are going to be using this amazing location to do some exploration, with graphs!

We have imported some routes and tagged Points of Interest for Central Park, based on OpenStreetMap, and we will be asking you to take on the role of the virtual tourist. We will be using the sandbox exclusively for the challenge this week.

For those of you who are interested, the data was extracted with this plugin, and there was some Cypher data wrangling applied after import. As a result, we don’t have all the sights, but we do have a pretty good sample to work with.

Before we get stuck in…

We’d like to draw your attention to the excellent graph app, NeoMap map visualiser created by Estelle Scifo.

Points of Interest from this week’s challenge data set

This fantastic app allows you to plot spatial data onto an OpenStreetMap view. You can see an example here where we’ve plotted points from this week’s data set to view. To find out how to get going, you can read about it here.

You will probably find it helpful and fun to try plotting out the various Points of Interest we ask you to look for during the challenge. You can even plot routes!

To install NeoMap, you just need to paste the following link into the graph apps tab in Neo4j Desktop:

https://registry.npmjs.org/neomap

To run NeoMap, you will need to create a remote database in Neo4j Desktop using your sandbox credentials, and then open NeoMaps there.

If you’re not sure how to follow the above steps, check back here when we’ll add a link on how to do it soon.

Estelle also gives you an example of how to get up and running with your own locations if you want to check out your local area too! It’s also useful for seeing how to work with shortest path.

Overview of the data

We will be using the sandbox database this week. It’s probably worth mentioning a little bit about the data. Should you go and CALL db.schema.visualization() you will be greeted with this…

Don’t panic! Many of these labels we are not going to use. There are only two node label types we’re going to be using:

  • OSMNode — Think of this as a junction node — it hooks together routes from specific points
  • PointOfInterest:OSMNode — In addition to the above, this also has our points of interest, e.g. types of statues, restaurants, tennis courts, etc. and their names

And we’ll only be using the ROUTE relationship type. Here’s a simplified data model that you’ll need for the challenges:

Note that you might use either the latitude/longitude or location (point). Check out the spatial documentation on how to use either.

Beginner’s challenge: Walkabouts

This week we’ve got some questions for you to navigate the park with. Some of them may push you a bit out of your comfort zone, but you’ve got this! All of the questions are achievable with Cypher, and the reference card will be very helpful. Don’t forget, if you’re really stuck, ask on the forum. We’ve also have some extra hints to get you on the road (no pun intended, or maybe it was?!).

We’d like you to answer the following:

  • You start your adventure at a clock (type:’clock’), What’s its name?
  • From this position, what other Points of Interest is within 100m from a straight line?(Hint, you will need to use the spatial functions. Do not assume anything about the relationships just yet)
  • How far apart are they as a straight line? (Hint, check out the spatial functions)
  • How far is the actual distance? (Hint, check out shortestpath())
  • You’re thinking of going for a cycle ride, but first a coffee! Locate which cafe is closest to a bicycle rental place. What’s the name of the cafe? (Hint, you may need to list out all of the Point of Interest types to determine the correct syntax)
  • If you want to (not required), compare your outputs of shortestPath() against weighted shortest path with the Dijkstra APOC function

Don’t forget about NeoMap! Why not have a go at visualising the different Points of Interest you discover?

Some hints

This is a bit harder than normal, so we’ll give you some early hints to get you started:

  • Do read up on the documentation on how to use the spatial functions. When you are comparing Points of Interest to see how far apart they are as a straight line, you will not be using relationships, e.g. MATCH (p1:PointOfInterest {some properties}), (p2:PointOfInterest {some other properties})
  • When you use shortestPath() you will have a path that contains relationships and nodes. You will need to extract the relationships out to get distance. This will look something like this snippet:
WITH relationships(path) AS relCollection //extract out the rels as an array
UNWIND relCollection AS rel //unwind the array into individual rels
//now you can do stuff with rel
  • You will also be doing some aggregation, the sum() function will be helpful
  • You may notice sometimes that you might get a long distance when using shortestPath() between Points of Interest. Don’t panic! This is due this function looking for the minimum distance based on number of hops, not based on properties on the relationship (e.g. distance!). This function will find a valid shortest path (not all of them). If you want to do weighted shortest path (optional), have a look at apoc.algo.dijkstra()
  • If you only want a list of distinct items, e.g. unique sports pitch names, use the DISTINCT keyword, e.g. RETURN DISTINCT poi.type

The data

Hints please!

Of course! Catch up on the hints stream here.

Solutions

The beginners solution video stream can be viewed here.

  • You start your adventure at a clock (type:’clock’), What’s its name?

For this, we’ll bring back all Points of Interest that have the type of clock, and see the name:

MATCH (p:PointOfInterest {type:'clock'})
RETURN p.name

Which will return: Delacorte Musical Clock

  • From this position, what other Points of Interest is within 100m from a straight line?

We are now going to use the spatial functions to what other points of interests are within 100 m of the clock:

MATCH (p1:PointOfInterest {type:'clock'}), (p2:PointOfInterest)
WHERE p1<>p2 AND distance(p1.location,p2.location) < 100
RETURN p2.name

There’s only one place nearby: Zoo School

For this, we just make a slight change to the query above (and there’s more than one way of doing it!):

MATCH (p1:PointOfInterest {type:'clock'}), (p2:PointOfInterest {name:'Zoo School'})
RETURN distance(p1.location,p2.location)

Which returns 21.5 m (to 1 decimal place)

We’re going to use shortestpath() for this. Bear in mind that:

  • This is based on the shortest number of hops between nodes, not absolute distance, and
  • There could be more than one valid shortest path between two points, this function will only return one of them (at random).

As a result, there could be a number of different answers:

MATCH path=shortestpath((p1:PointOfInterest {type:'clock'})-[:ROUTE*]-(p2:PointOfInterest {name:'Zoo School'}))
WITH relationships(path) AS rels
UNWIND rels AS rel
RETURN sum(rel.distance)

In this scenario, there is only one route, so the answer is: 212.2 m (to 1 decimal place).

  • You’re thinking of going for a cycle ride, but first a coffee! Locate which cafe is closest to a bicycle rental place. What’s the name of the cafe?

Firstly, we’ll want to check the correct type property value to use, we could use something like this:

MATCH (p:PointOfInterest)
WHERE p.type contains 'cafe' OR p.type contains 'cycle'
RETURN DISTINCT p.type

This confirms we want to use the types of cafe and bicycle rental.

Now, let’s find the closest (bearing in mind the warning around shortestpath() from above):

MATCH path = shortestpath((p1:PointOfInterest {type:'cafe'})-[:ROUTE*]- (p2:PointOfInterest {type:'bicycle rental'}))
WITH p1, p2, relationships(path) AS rels
UNWIND rels AS rel
RETURN p1.name, p2.name, sum(rel.distance) AS dist ORDER BY dist

There may be different answers, one of them may be: Petrie Court Cafe.

If you were feeling adventurous, you could have tried this challenge with the Dijkstra APOC function:

MATCH path = (p1:PointOfInterest {type:'cafe'}),(p2:PointOfInterest {type:'bicycle rental'})
CALL apoc.algo.dijkstra(p1, p2, 'ROUTE', 'distance') YIELD weight
RETURN p1.name, p2.name, weight ORDER BY weight

You will get: Le Pain Quotidien.

Experienced challenge: Fountain flight

So, it turns out you’re on bit of a mission. You’re very keen to get through the park, but it’s a very hot day, and you want to make sure you pass every single fountain through the park, in the most efficient way! We’d like you to tell us, what is a likely minimum distance to visit all of the fountains (type:’fountain’) in the park?

What are we looking for?

  • Minimum distance to visit all of the fountains Points of Interest (type:’fountain’)
  • Whilst you can ‘brute force’ the answer, why not have a look at what’s available in the APOC and GDS libraries. Do any of the shortest path algorithms help? NeoMap might be helpful to help visualise the task at hand.

The data

Hints please!

Of course! Catch up on the hints stream here.

Solution

You can catch up on the experienced solution stream here.

Let’s have a look at a potential way to solve this particular challenge of the minimum distance to visit all the fountains:

  • Find all the shortest paths between all the fountains
  • As the points conveniently form a ‘line’ of sorts, we could use Minimum Spanning Tree from the Graph Data Science library to tell us the shortest path to connect all the points. We just need to determine a sensible start point

To link all the fountains to each other by shortest distance, let’s use apoc.algo.dijkstra(), to make it more performant, we’ll wrap it in apoc.periodic.commit(). We will create a new relationship between fountain pairs and note the distance between them:

CALL apoc.periodic.commit("
MATCH (p1:PointOfInterest {type:'fountain'}), (p2:PointOfInterest {type:'fountain'})
WHERE p1 <> p2 AND NOT (p1)-[:SHORTEST_PATH]-(p2)
WITH p1, p2 limit $limit
CALL apoc.algo.dijkstra(p1, p2, 'ROUTE', 'distance') YIELD weight
CREATE (p1)-[:SHORTEST_PATH {distance:weight}]->(p2)
RETURN count(*)
", {limit:1})

As the points line up, we could use Minimum Spanning Tree to link all of the points together, and it will give us ‘one’ path with no branching off. Firstly, let’s figure out a sensible start point. We’ll do that by querying all the points with SHORTEST_PATH and pick the one that appears most frequently. We're using internal ids as there are lots of fountains named 'fountain':

MATCH (a:PointOfInterest)-[r:SHORTEST_PATH]->(b:PointOfInterest) 
RETURN id(a), id(b), r.distance ORDER BY r.distance desc

Now, let’s use this as a start point for minimum spanning tree. This will write new relationships between the nodes. We’ll again write the distance on this new relationship type:

MATCH (p:PointOfInterest) WHERE id(p) = 3490
CALL gds.alpha.spanningTree.minimum.write({
nodeProjection: 'PointOfInterest',
relationshipProjection: {
SHORTEST_PATH: {
type: 'SHORTEST_PATH',
properties:'distance',
orientation:'UNDIRECTED'
}
},
startNodeId:id(p),
relationshipWeightProperty:'distance',
writeProperty: 'MST',
weightWriteProperty:'distance'
})
YIELD effectiveNodeCount AS n
RETURN count(n)

Last but not least, let’s get the distance:

MATCH p = ()-[r:MST]->()
RETURN sum(r.distance)

Which will give us 5023.7 m (to 1 decimal place). A rather nice, hearty walk through Central Park :).

--

--