Py2neo Spatial

This is an article from our archives. It was originally published in January 2015 by Simon Harrison.

Some of the more advanced databases implement a “geometry” datatype for spatial queries that might involve distances, intersections, containments or areas. To make these queries efficient, a dedicated spatial index is usually created. We use the graph database Neo4j, which has a spatial server extension that implements this index with an R-tree.

To access our graph, which models all the lovely homes that we manage throughout the world, we rely on the REST library Py2neo. For a little fun I’ve contributed an API to Py2neo for this server extension, so we can now ask questions such as “how many properties do we have in Mayfair?”, or a mobile app could find the nearest dozen properties… or maybe just all those within a mile of the user, with a swimming pool!

So let’s give it a quick go.

Note: from here on in I assume you have Neo4j running with a disposable data store to work on, and that you are comfortable in a bash(-like) shell, the neo4j-shell and the Python (2.7.x) shell, as well as with the Neo4j web browser interface and some Cypher. I also assume you are able to install Python dependencies, have Py2neo installed (>= py2neo-2.0.3) and have some love for graphs and maps. But let none of this stop you from continuing and simply reading along.

For this we’ll need some fun data, and what’s more fun than that of crime scenes from East London? I live in Hackney and (data on) wrongdoing here is plentiful. Data for the last year can be downloaded in CSV format, which we can then import from the neo4j-shell using the LOAD CSV command (which, beware, does not support whitespace in headers). Given the (amended) headers:

CrimeID,Date,Latitude,Longitude,Location,Category,Status

we can create a rudimentary data model — to the neo4j-shell!

USING PERIODIC COMMIT 100
LOAD CSV WITH HEADERS FROM "file:///Users/simon/data/hackney-crime.csv" AS line
WITH line WHERE line.CrimeID <> ""
MERGE (c:Crime {crime_id:line.CrimeID, latitude:line.Latitude, longitude:line.Longitude})
MERGE (cs:CrimeScene {location:line.Location})
CREATE (c)-[:COMMITTED_AT]->(cs)
MERGE (date:Date {date:line.Date})
CREATE (c)-[:COMMITTED_ON_DAY]->(date)
MERGE (o:Offence {label:line.Category})
CREATE (c)-[:COMMITTED_OFFENCE]->(o);

Note: do replace the file path with your own.
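Since LOAD CSV does not cope with whitespace in headers, the download needs a small clean-up first. A minimal sketch of one way to do that, assuming Python 3 and a header row whose names merely contain spaces; the function name and the two-line sample are made up for illustration and are not the real data set:

```python
import csv
import io


def strip_header_whitespace(source, target):
    """Copy CSV rows from source to target, removing whitespace from header names."""
    reader = csv.reader(source)
    writer = csv.writer(target, lineterminator="\n")
    headers = next(reader)
    # e.g. "Crime ID" becomes "CrimeID"; data rows are copied unchanged
    writer.writerow(["".join(name.split()) for name in headers])
    for row in reader:
        writer.writerow(row)


# Hypothetical sample, not the real download
sample = io.StringIO("Crime ID,Date,Latitude\nabc123,2014-11,51.54\n")
cleaned = io.StringIO()
strip_header_whitespace(sample, cleaned)
print(cleaned.getvalue())
```

With real files you would pass two open file objects instead of the in-memory buffers. Renaming columns (rather than just removing spaces) would still be a manual step.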

Let’s check we’ve got a sensible graph by getting some crime from Dalston.

To the browser!

MATCH (crime:Crime)-[:COMMITTED_AT]->(crimescene),
(date)<-[:COMMITTED_ON_DAY]-(crime)-[:COMMITTED_OFFENCE]->(offence)
WHERE crimescene.location =~ '(?i).*Dalston.*'
WITH offence, date, crime, count(crime) as crimes, crimescene
RETURN offence, date, crimes, crime, crimescene

As expected, we’re seeing a lot of crime here. Each of the crime nodes (the blue ones) has two geographically aware properties (latitude and longitude), which we now want to add to a spatial index so that we can carry out exciting spatial queries using Py2neo Spatial!

First check that you have the Neo4j Java Spatial Extension installed…

To the shell!

curl -v http://localhost:7474/db/data/

On success (that is, when the key `SpatialPlugin` appears among the `extensions` in the response) we can get a connection to the graph.
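If you would rather not eyeball the JSON, the same check can be sketched in Python. This assumes the response body has already been fetched and parsed; the helper name and the trimmed sample body are hypothetical, and only the `extensions` and `SpatialPlugin` keys come from the check described above:

```python
import json


def has_spatial_plugin(service_root):
    """Return True if the parsed service root lists SpatialPlugin under extensions."""
    extensions = service_root.get("extensions", {})
    return "SpatialPlugin" in extensions


# Hypothetical, heavily trimmed response body for illustration only
body = '{"extensions": {"SpatialPlugin": {}}}'
print(has_spatial_plugin(json.loads(body)))
```

A server without the extension would simply have no `SpatialPlugin` entry, in which case the function returns False.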

Now you need Py2neo >= 2.0.3 which you can pip install. You also need the dependency Shapely, which again, you can pip install.

pip install py2neo shapely

Then, to the Python console!

In [1]: from py2neo.core import Graph
In [2]: from py2neo.ext.spatial import Spatial
In [3]: DEFAULT_DB = "http://localhost:7474/db/data/"
In [4]: graph = Graph(DEFAULT_DB)
In [5]: spatial = Spatial(graph)

Now we can create a layer for our geometries which we’ll imaginatively name, “London”.

In [6]: spatial.create_layer("London")

This has now created the magical index for you — the layer is the index!

Let’s check what we expect to add to this layer… to the neo4j-shell!

neo4j-sh (?)$ MATCH (crime:Crime) RETURN count(crime);
+--------------+
| count(crime) |
+--------------+
| 2212         |
+--------------+

To the Python console!

In [7]: from py2neo.ext.spatial.util import parse_lat_long
In [8]: query = "MATCH (crime:Crime) RETURN crime;"
In [9]: records = graph.cypher.execute(query)
In [10]: for record in records:
   ....:     node = record[0]
   ....:     node_id = node._id
   ....:     properties = node.properties
   ....:     # coordinates as stored on the Crime node by the import
   ....:     lat = properties['latitude']
   ....:     lon = properties['longitude']
   ....:     crime = (lat, lon)
   ....:     shape = parse_lat_long(crime)
   ....:     crime_id = properties['crime_id']
   ....:     # create a geometry in the "London" layer for this crime node
   ....:     spatial.create_geometry(geometry_name=crime_id, wkt_string=shape.wkt, layer_name="London", node_id=node_id)
   ....:     print('created {}'.format(crime_id))

How long this operation takes depends on the amount of crime committed and, since this is London, be patient. Once processed we can check out what we’ve created…
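The `wkt_string` passed to `create_geometry` above is Well-Known Text, which writes a point as x then y, so longitude comes before latitude. A rough stand-alone sketch of that conversion, with a hypothetical helper name; it is only an illustration and not how `parse_lat_long` or Shapely are implemented:

```python
def lat_long_to_wkt(lat, lon):
    """Build a WKT POINT string from a latitude/longitude pair."""
    # LOAD CSV yields strings unless converted, so coerce to float first
    lat, lon = float(lat), float(lon)
    if not -90 <= lat <= 90 or not -180 <= lon <= 180:
        raise ValueError("coordinates out of range: {!r}, {!r}".format(lat, lon))
    # WKT is "x y", so longitude goes first
    return "POINT ({!r} {!r})".format(lon, lat)


print(lat_long_to_wkt("51.561268", "-0.082662"))
```

Swapping the two values is the classic mistake here, and it tends to go unnoticed until a query returns nothing.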

The “special” magical spatial index is in fact simply just another graph, so all we need is some Cypher to explore it. The relationship names require prior introspection of spatial indices, so I’m helping out with a little more Cypher. This will return our new “geometry” nodes related to the “application” nodes that we initially created.

To the browser!!

MATCH (spatial_root)-[:LAYER]->(london {layer:"London"})-[:RTREE_ROOT]-(layer_spatial_root)
OPTIONAL MATCH (layer_spatial_root)-[tree_child:RTREE_CHILD]->(child),
(geom_ref)<-[tree_ref:RTREE_REFERENCE]-(child),
(geom_ref)-[gis_related_crime:LOCATES]->(crime)
RETURN spatial_root, london, layer_spatial_root, crime, tree_child, tree_ref, gis_related_crime

Expect to see something pretty like this:

The yellow nodes that you now see are geographically aware of the blue nodes — those ghastly crimes.

Well done to all those that have followed with more than just eyes. We will finish with a couple of “criminal” queries.

My IP address is often approximately here: (51.561268, -0.082662).

Back to the Python console!

In [11]: me = (51.561268, -0.082662)
In [12]: local_crime = spatial.find_closest_geometries(coords=me)
In [13]: len(local_crime)
Out[13]: 282

Hell! Thankfully, Church Street is quite well-to-do.

In [14]: spatial.find_within_distance("London", me, distance=1)
Out[14]: []

But drag Dalston into range…

In [15]: spatial.find_within_distance("London", me, distance=3)
Out[15]: [Anti-social behaviour,
Public order,
Drugs,
Drugs,
Theft from the person,
Anti-social behaviour,
Anti-social behaviour,
Bicycle theft,
Criminal damage and arson,
Drugs,
...
]

Many, many more lines then follow!
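To get a feel for what a within-distance query has to decide, here is a brute-force sketch in plain Python. It assumes the distance is in kilometres and uses the haversine formula on a spherical Earth; the helper names and the Dalston coordinate are approximate and made up for illustration, and the spatial index exists precisely so that the server does not have to test every point like this:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, long) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def within_distance(points, centre, distance_km):
    """Brute-force filter: keep points no further than distance_km from centre."""
    return [p for p in points if haversine_km(p, centre) <= distance_km]


me = (51.561268, -0.082662)   # the location used above
dalston = (51.5481, -0.0757)  # hypothetical, approximate point in Dalston

print(len(within_distance([dalston], me, 1)))
print(len(within_distance([dalston], me, 3)))
```

The sample point falls outside the first radius and inside the second, which mirrors the pattern of the two queries above.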

The full API so far is:

create_layer
delete_layer
create_geometry
delete_geometry
update_geometry
find_within_distance
find_closest_geometries
find_within_bounding_box
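The bounding-box idea behind the last call in that list (and behind R-tree indexes in general, where each tree node stores a rectangle enclosing its children) can be sketched in plain Python. This only illustrates the concept; it is not the signature or implementation of `find_within_bounding_box`, and the helper names and sample coordinates are made up:

```python
def bounding_box(points):
    """Smallest (min_lat, min_lon, max_lat, max_lon) box enclosing the points."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return (min(lats), min(lons), max(lats), max(lons))


def contains(box, point):
    """True if the (lat, long) point lies inside or on the edge of the box."""
    min_lat, min_lon, max_lat, max_lon = box
    return min_lat <= point[0] <= max_lat and min_lon <= point[1] <= max_lon


# Hypothetical points, roughly in East London
scenes = [(51.545, -0.075), (51.550, -0.060), (51.548, -0.070)]
box = bounding_box(scenes)
print(box)
print(contains(box, (51.547, -0.065)))
print(contains(box, (51.561268, -0.082662)))
```

Checking a rectangle needs only four comparisons, which is why an index can discard most candidates cheaply before any exact geometry test. The sketch ignores boxes that cross the antimeridian.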

This API is brand new and eager for people to use it, so please do so for fun or for pleasure.

I hope you’ve enjoyed this little journey at least a fraction as much as I have enjoyed writing it. Please contact the maintainer of the Spatial extension with any comments, questions or requests. And thanks for reading to the very bottom!
