Diving into GraphQL and Neo4j with Python
TL;DR: If you don’t want to read and just want the code, check out the GitHub repo.
GraphQL has been a pretty hot topic lately. After attending PyCon Nove (which was amazing) and watching a few talks about GraphQL and Neo4j, we got really interested in trying them out together, which surprisingly didn’t seem to be a common combination 🤔.
In this post we explore what GraphQL has to offer when combined with a graph database. It is not meant as a tutorial, so we’ll be “leaping” over some basic steps by linking to external resources and documentation. We’ll start by exploring Neo4j and its own query language, then move on to its Python driver py2neo and its object-graph mapping (OGM), and finally to Graphene, Python’s GraphQL implementation, tying everything together with Flask by building a simple financial management API.
Our API is gonna be designed around the user, which we call a Customer. In addition to that, we’ll have Products, Receipts and Stores. All the relations between them are shown in the graph below.
This is exactly how we’ll model our data in Neo4j, which should make the concepts here much easier to grasp. So let’s break it down. The colored circles are what we call nodes, and they hold a set of properties, which is basically a way to store data in them. The lines connecting nodes are relations (which can also hold data!), and their direction describes how the two nodes interact. In Neo4j every relation is stored with a direction, but queries can traverse it either way, so in practice it can be treated as one-way or bidirectional. The boxes next to some nodes are simply their properties: the Customer will only have the email and name properties to start with. This is a very simple initial graph, but it can easily be expanded later on. So the things to remember here are: nodes, relations and properties.
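To make nodes, relations and properties concrete, here is a minimal plain-Python sketch of a property graph. It is purely illustrative and not how Neo4j stores data internally; the class names, the HAS relation type and the example property values are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                      # e.g. "Customer", "Receipt"
    properties: dict = field(default_factory=dict)  # data stored on the node


@dataclass
class Relation:
    start: Node
    rel_type: str                                   # hypothetical type name, e.g. "HAS"
    end: Node
    properties: dict = field(default_factory=dict)  # relations can hold data too


@dataclass
class PropertyGraph:
    nodes: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def add_node(self, label, **properties):
        node = Node(label, properties)
        self.nodes.append(node)
        return node

    def relate(self, start, rel_type, end, **properties):
        relation = Relation(start, rel_type, end, properties)
        self.relations.append(relation)
        return relation

    def outgoing(self, node, rel_type):
        """Nodes reached by following rel_type relations from node (start -> end)."""
        return [r.end for r in self.relations if r.start is node and r.rel_type == rel_type]


graph = PropertyGraph()
me = graph.add_node("Customer", email="me@example.com", name="Me")
receipt = graph.add_node("Receipt", value=10.5)
graph.relate(me, "HAS", receipt, source="upload")   # hypothetical relation property
print([n.label for n in graph.outgoing(me, "HAS")])  # ['Receipt']
```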
Before jumping into Python code and awesome graphs, we first need to set up a couple of things. Let’s get started with Neo4j.
Here at Elements we really like Docker for setting up our local development environment, so we were really happy to see that there’s an official Neo4j Docker image. It’s really easy to set up, and you can find all the instructions at that link.
With that out of the way, it’s down to the Python part. For requirements management we’re using the awesome Pipenv, and you can check the Pipfile for the full list of requirements (which isn’t that long, really).
Neo4j and Cypher
Cypher is Neo4j’s query language (as SQL is to Postgres). It is a very declarative way of querying your graph database: you describe patterns of nodes and the relations between them, and Neo4j traverses the graph to find every match.
There are several example datasets here so you can play around with Cypher queries.
There’s a really handy cheatsheet that you should definitely check out.
After playing around with Cypher for a while, we came up with our own trial dataset for our API, which you can find here. Viewed in Neo4j’s local dashboard (which is amazing, by the way), it looks like this:
To show how declarative the queries are, let’s say we’d like to know the prices of all the products in the cheese category that I (as a customer) have bought:
This returns the following tabular results:
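As a rough sketch, a Cypher pattern answering this question could look like the CHEESE_PRICES string below; its labels, the HAS relation type and the category and price properties are assumptions about the dataset. For contrast, the function underneath performs the same traversal imperatively over a tiny, hypothetical in-memory dataset, with each step annotated with the part of the pattern it corresponds to.

```python
# Hypothetical Cypher for the question above (names are assumptions).
CHEESE_PRICES = """
MATCH (c:Customer {email: $email})-[:HAS]->(:Receipt)-[:HAS]->(p:Product)
WHERE p.category = 'cheese'
RETURN p.name AS product, p.price AS price
"""

# Tiny hypothetical dataset: customer -> receipts -> products.
customers = {"me@example.com": ["r1", "r2"]}
receipts = {"r1": ["gouda", "bread"], "r2": ["brie"]}
products = {
    "gouda": {"category": "cheese", "price": 4.5},
    "bread": {"category": "bakery", "price": 2.0},
    "brie": {"category": "cheese", "price": 6.0},
}


def cheese_prices(email):
    """The same traversal the Cypher pattern declares, spelled out step by step."""
    rows = []
    for receipt_id in customers.get(email, []):           # (c)-[:HAS]->(:Receipt)
        for product_id in receipts[receipt_id]:           # (:Receipt)-[:HAS]->(p)
            product = products[product_id]
            if product["category"] == "cheese":           # WHERE p.category = 'cheese'
                rows.append((product_id, product["price"]))  # RETURN p.name, p.price
    return rows


print(cheese_prices("me@example.com"))  # [('gouda', 4.5), ('brie', 6.0)]
```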
Py2neo and Object Graph Mapping (OGM)
Now what we need is a Python interface to Neo4j. Enter py2neo, which is exactly that. It is pretty straightforward and easy to pick up, and it allows us to map the model above in a Pythonic way like so:
As you can see above, we started with a very simple model and were able to define properties and relations for each node. You can find the rest of the models in the repository.
Besides the high-level OGM API, py2neo also offers a low-level one that lets you write and run Cypher queries directly against Neo4j.
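A sketch of the low-level route, assuming graph is a py2neo Graph (for example Graph("bolt://localhost:7687", auth=("neo4j", "password")), with placeholder credentials). Graph.run accepts Cypher plus keyword parameters and returns a cursor whose data() method gives a list of dicts. The find_customer helper and the StubGraph class, which only lets the sketch run without a database, are hypothetical.

```python
CUSTOMER_BY_EMAIL = (
    "MATCH (c:Customer {email: $email}) "
    "RETURN c.name AS name, c.email AS email"
)


def find_customer(graph, email):
    """Return the customer's properties as a dict, or None if there is no match.

    graph is expected to be a py2neo Graph. Passing email as a query parameter,
    instead of formatting it into the string, avoids Cypher injection.
    """
    rows = graph.run(CUSTOMER_BY_EMAIL, email=email).data()  # list of dicts
    return rows[0] if rows else None


class StubGraph:
    """Hypothetical stand-in for py2neo's Graph so the sketch runs offline."""

    class _Cursor:
        def __init__(self, rows):
            self._rows = rows

        def data(self):
            return self._rows

    def __init__(self, customers):
        self.customers = customers  # email -> name

    def run(self, cypher, **parameters):
        email = parameters["email"]
        found = email in self.customers
        rows = [{"name": self.customers[email], "email": email}] if found else []
        return self._Cursor(rows)


graph = StubGraph({"alice@example.com": "Alice"})
print(find_customer(graph, "alice@example.com"))
print(find_customer(graph, "nobody@example.com"))
```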
Graphene and Flask-GraphQL
We needed something to tie everything together: something simple that could still be expanded upon. So we chose Flask and Flask-GraphQL, which is maintained by the same people behind Graphene. Flask-GraphQL simply provides a nice GraphQL-ready view that takes care of parsing the queries, etc.
Graphene is our GraphQL interface, so it’s what we’re gonna use to define our schemas and mutations. Think of schemas as more or less controllers/serializers: they are where we define which properties of each label (node) are exposed to the API, and where we fetch that information through resolvers. Mutations are the interface through which you perform “actions”; to us it felt more or less like exposing functions over the API.
Schemas and Resolvers
We first have to define the entry point schema, and it seems to be a convention to call it Query, so we went ahead and started simple with the following:
You can read the customer variable basically as: I’m going to reply with a Customer schema’s information and there’s an
And query it like so:
However, this won’t work, since we haven’t yet defined a way to resolve customer, so we define a method inside our
This way Graphene knows what to do when it receives the query we tried before. There also seems to be a convention for naming resolvers: a resolve prefix followed by an underscore and the property’s name (resolve_customer for the customer field).
You might also have noticed that we’re using .as_dict() methods in our Customer model. We implemented these ourselves to extend the OGM’s functionality, as Py2neo’s OGM is fairly simple and quite lacking feature-wise when compared to ORMs for SQL like SQLAlchemy. You can check the full model here.
Now that we have a way to query our API, we want to do a bit more than simple “GET” requests (analogously speaking), so we’ll start by writing a create_customer mutation. We have to define a mutation class and the
This mutation is very simple: we have defined two required arguments and we are exposing two properties on our response. The mutate method is also very simple: it instantiates our Customer with the provided info and saves it in the database (the .save() method you see here is also one of our own extensions to Py2neo).
We then need to add our mutations to our API, which is as simple as:
Now we can finally query our API with the mutation like so:
For us it really feels like we’re saying: “Hey, I’m going to call a function (mutation) named create_customer, passing these arguments here, and I expect in response a customer object containing their name and email, in addition to the success property!”, which reads quite literally, in our opinion.
And if we don’t want the customer’s name, or the entire customer object, we just remove it from our query! No over-fetching of unnecessary information.
We didn’t want to overwhelm you with several schemas, models and code snippets, so we omitted a good chunk of things, but you can check it all out on our GitHub repository.
GraphQL has become increasingly established since its first stable release in October 2016. It lets you quickly iterate on constantly changing requirements and on users’ varying information and data needs. A new “endpoint” in your API can be quickly prototyped and then mocked for the front-end developers.
Graphene is a nice Pythonic interface to GraphQL, with fair documentation and a few nice integrations such as Django, Flask and SQLAlchemy, to name a few. Our only remark concerns what happens when things go wrong: say someone queries your API with an email that doesn’t exist in your database. You’d like to provide a user-friendly error message, right? From our research, as of this blog post, there seems to be no standard way of doing this and no documentation either, and despite some discussion of the issue, nothing seems to be planned for improving it.
Neo4j feels like a good graph database; however, we felt that a few expected features were missing. For instance, although you can set uniqueness constraints on a label’s properties, you cannot set a uniqueness constraint on a label’s relationship with another label. Imagine the following scenario: you have a Customer, and this Customer has Receipts. Receipts don’t need a uniqueness constraint, as different Customers might submit identical receipts if your user base is big enough; however, I want all the receipts that a given Customer has to be unique. Right now this is not possible in Neo4j, at least not directly, but you can use a workaround to get this behaviour.
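One possible workaround is Cypher's MERGE on the whole pattern, starting from an already matched customer, so that a receipt is only created if that customer doesn't already have an identical one. Unlike a real constraint, MERGE is not guaranteed to prevent duplicates under concurrent writes. The labels, the HAS relation type and the receipt properties are assumptions; add_receipt assumes a py2neo Graph, and merge_receipt is a hypothetical plain-Python illustration of the same rule so the sketch runs without a database.

```python
# MERGE matches the full pattern for *this* customer; other customers are unaffected.
ADD_RECEIPT = """
MATCH (c:Customer {email: $email})
MERGE (c)-[:HAS]->(r:Receipt {value: $value, timestamp: $timestamp})
RETURN r
"""


def add_receipt(graph, email, value, timestamp):
    """Run the MERGE through py2neo's low-level API (graph is a py2neo Graph)."""
    return graph.run(ADD_RECEIPT, email=email, value=value, timestamp=timestamp)


def merge_receipt(receipts_by_customer, email, receipt):
    """Plain-Python illustration of the rule: receipts are unique per customer only."""
    owned = receipts_by_customer.setdefault(email, [])
    if receipt not in owned:
        owned.append(receipt)
    return receipts_by_customer


store = {}
same = {"value": 12.5, "timestamp": "2018-05-01T10:00"}
merge_receipt(store, "alice@example.com", same)
merge_receipt(store, "alice@example.com", dict(same))  # duplicate for Alice: ignored
merge_receipt(store, "bob@example.com", dict(same))    # identical receipt for Bob: kept
print({email: len(r) for email, r in store.items()})  # {'alice@example.com': 1, 'bob@example.com': 1}
```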
As a Neo4j driver, Py2neo felt very young and lacked features that we usually expect from object mappers (mainly because we’re spoiled by very well-established ones for SQL databases). The API is also a bit quirky and could definitely be improved (maybe we’ll be contributing to it in the future!).
All in all, you should definitely give these technologies a try and keep an eye on them, as they’ll likely evolve significantly in the near future and seem very promising!
If you have any questions regarding our code, tech or decisions, or just want to chat and discuss this subject further, hit us up in the comments section or by email!