Replicating The GitHub GraphQL API With Neo4j

Working With Your GitHub Exported Data In neo4j-graphql.js


GitHub recently added a download-all-of-your-GitHub-data feature that allows you to export all GitHub data related to your account (repositories, pull requests, commit comments, etc.) as a series of JSON files. GitHub also has a comprehensive GraphQL API, so I thought this would be a fun opportunity to try to replicate some of the functionality in the GitHub GraphQL API with my exported data and neo4j-graphql.js. Here we go! 🚀

The GitHub GraphQL API

GitHub released one of the first public GraphQL APIs in 2016. This API exposes repositories, users, organizations, issues, pull requests, and more, supporting both reads through queries and writes through mutations.

Querying the GitHub GraphQL API for releases of the grand-stack-starter project
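
As a rough sketch of what such a request might look like from JavaScript, the snippet below asks the GitHub GraphQL API for the most recent releases of a repository. It assumes Node 18+ (for the global fetch), a personal access token supplied by the caller, and that the project lives at grand-stack/grand-stack-starter. The helper names buildReleasesRequest and fetchReleases are made up for illustration.

```javascript
// Sketch only: query the GitHub GraphQL API for a repository's latest releases.
// Assumes Node 18+ (global fetch) and a personal access token passed in by the caller.
const GITHUB_GRAPHQL_URL = "https://api.github.com/graphql";

const RELEASES_QUERY = `
  query Releases($owner: String!, $name: String!, $count: Int!) {
    repository(owner: $owner, name: $name) {
      releases(last: $count) {
        nodes {
          name
          tagName
          publishedAt
        }
      }
    }
  }
`;

// Hypothetical helper: builds the fetch options separately so they can be
// inspected without making a network call.
function buildReleasesRequest(owner, name, count, token) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `bearer ${token}`,
    },
    body: JSON.stringify({
      query: RELEASES_QUERY,
      variables: { owner, name, count },
    }),
  };
}

// Hypothetical helper: sends the request and returns the list of release nodes.
async function fetchReleases(owner, name, count, token) {
  const response = await fetch(
    GITHUB_GRAPHQL_URL,
    buildReleasesRequest(owner, name, count, token)
  );
  const { data, errors } = await response.json();
  if (errors) {
    throw new Error(errors.map((e) => e.message).join("; "));
  }
  return data.repository.releases.nodes;
}
```

Calling fetchReleases("grand-stack", "grand-stack-starter", 5, process.env.GITHUB_TOKEN) should resolve to an array of release objects, assuming the token is valid and the repository is at that location.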

Neo4j Graph Database

Neo4j is a graph database that allows you to model, store, and query your data as a graph. Neo4j is particularly well suited to working with connected data — like users that are connected to repositories, that have issues and comments!

Neo4j uses a query language called Cypher, which you can think of as “SQL for graphs”. Cypher uses graph pattern matching to enable the developer to work declaratively with graph data. We can also easily create a GraphQL API backed by Neo4j using neo4j-graphql.js, a GraphQL integration for Neo4j that can auto-generate a GraphQL API and implement resolvers, creating database queries from arbitrary GraphQL requests. neo4j-graphql.js works with any of the JavaScript GraphQL implementations, but here we’ll use it with Apollo Server.

Exporting Your GitHub Data

There are many ways to access GitHub data, such as through the API or via GitHub Archive, but in this case I used the new export data feature to export all of my GitHub data in a few JSON files. This data includes the repositories my GitHub user owns, pull requests opened against those repositories, and commit comments from other users.

1. Visit your account settings page.

2. Click “Start export” in the “Export account data” section. You will receive an email when the export is ready.

3. Click the link in the email to download the archive.

Some of the GitHub data available through the data export feature.

Once the export is available we’ll have a series of JSON files that we can load via Neo4j Browser. We’ll just need to write a few Cypher queries to load the data into Neo4j.

Importing Into Neo4j

Now that we’ve downloaded our GitHub data, the next step is to import it into Neo4j. First, we’ll define the graph data model we’ll use to represent our data, then use Cypher to read the JSON data and create the data as a graph in the database. Looking over the data, we’ll follow the basic graph data modeling approach:

1. What are the entities? These become nodes.

2. How are these nodes connected? These become our relationships.

3. What attributes describe our nodes and relationships? These become properties.

The graph data model for our exported GitHub data.

Here is the Cypher script for loading repository data, which in this case includes collaborators and webhooks:

You can see the full Cypher import script here.
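
The import script itself is not reproduced here, so the following is only a minimal sketch of the general idea, and it swaps in a different technique: instead of reading the JSON files from Neo4j Browser, it parses them in JavaScript and passes the records to Cypher as a parameter. The record field names (url, name, description, collaborators[].user, webhooks[].url), the User and Webhook labels, and the COLLABORATES_ON and HAS_WEBHOOK relationship types are all assumptions for illustration, so check them against your own export and data model.

```javascript
// Sketch only. Labels, relationship types, and JSON field names below are
// assumptions for illustration, not the schema used by the original import script.
const LOAD_REPOSITORIES = `
  UNWIND $repos AS repo
  MERGE (r:Repository {url: repo.url})
    SET r.name = repo.name,
        r.description = repo.description
  FOREACH (c IN repo.collaborators |
    MERGE (u:User {url: c.user})
    MERGE (u)-[:COLLABORATES_ON]->(r))
  FOREACH (w IN repo.webhooks |
    MERGE (h:Webhook {url: w.url})
    MERGE (r)-[:HAS_WEBHOOK]->(h))
`;

// Hypothetical helper: normalizes parsed export records into query parameters.
// Records and nested entries without a url are dropped because MERGE cannot
// match on a null property value.
function toRepoParams(records) {
  return records
    .filter((record) => record && record.url)
    .map((record) => ({
      url: record.url,
      name: record.name ?? null,
      description: record.description ?? null,
      collaborators: (record.collaborators ?? []).filter((c) => c && c.user),
      webhooks: (record.webhooks ?? []).filter((w) => w && w.url),
    }));
}

// `session` is expected to be a neo4j-driver session; run() takes the query
// text and a map of parameters.
async function loadRepositories(session, records) {
  return session.run(LOAD_REPOSITORIES, { repos: toRepoParams(records) });
}
```

Using MERGE rather than CREATE keeps the load idempotent, so re-running it against the same export should not duplicate nodes or relationships.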

Neo4j GraphQL

We’ve already seen how to interact with Neo4j using Cypher, the database query language for graphs. Cypher is great for querying the database directly, but we often want to build an API application that our client (such as a React app or a native mobile application) can query, and which in turn fetches data from the database. This is where GraphQL comes in. neo4j-graphql.js and GRANDstack make it easy to build a GraphQL API on top of Neo4j by:

  1. Generating query and mutation types
  2. Auto-generating resolvers
  3. Translating any arbitrary GraphQL request into Cypher and handling the database call

With Neo4j and GraphQL it’s graphs all the way down — GraphQL makes the observation that your application data is a graph and allows the client to express arbitrary traversals through that graph with a GraphQL query. While you are free to use any backend system with GraphQL — one of its major strengths — using a graph database as the data store for a GraphQL API removes much of the mapping and translation that occurs in GraphQL server implementations.

Let’s take a look at how this works in closer detail.

Create A GraphQL API With neo4j-graphql.js

The first step to implementing any GraphQL API is to create GraphQL type definitions.

GraphQL Type Definitions

GraphQL APIs are driven by GraphQL type definitions. The type definitions define the data available in the GraphQL API and, with neo4j-graphql.js, also drive the Neo4j database data model. Unlike other implementations that require us to maintain the schema both in the database and the GraphQL server, with GRANDstack we use the GraphQL type definitions to drive the database:
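
The type definitions for this project are not reproduced here, so the following is a minimal sketch of what they might look like. The Repository type, its url, name, description and pull_requests fields, and the BASE relationship are taken from the query and generated Cypher shown later in this post. The title, base and pullRequestCount fields are assumptions added for illustration.

```javascript
// Sketch only: a trimmed-down set of type definitions for neo4j-graphql.js.
// pullRequestCount, title, and base are hypothetical fields for illustration.
const typeDefs = /* GraphQL */ `
  type Repository {
    url: ID!
    name: String
    description: String
    # Traverses incoming BASE relationships from PullRequest nodes.
    pull_requests: [PullRequest] @relation(name: "BASE", direction: "IN")
    # Computed field: "this" is bound to the current Repository node.
    pullRequestCount: Int
      @cypher(statement: "MATCH (this)<-[:BASE]-(pr:PullRequest) RETURN count(pr)")
  }

  type PullRequest {
    url: ID!
    title: String
    base: Repository @relation(name: "BASE", direction: "OUT")
  }
`;
```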

You’ll notice that we make use of two GraphQL schema directives in the type definitions above:

  • @relation: defines a relationship that connects two nodes.
  • @cypher: binds a field to a Cypher query, which allows us to define computed fields in our GraphQL schema using Cypher.

You can read more about these directives here.

Auto-generated GraphQL Schema

In a typical GraphQL implementation the next step would be to define Query and Mutation types that specify the entry points for our GraphQL service. However, we don’t need to define a Query or Mutation type in our type definitions; these will be created for us automagically by neo4j-graphql.js.

Here, we use the type definitions created above to create a GraphQL server, with help from neo4j-graphql.js:
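
The server code is not reproduced here, so this is only a sketch of one way to wire things up, taking the type definitions as an argument. It assumes the apollo-server, neo4j-driver and neo4j-graphql-js packages are installed. The environment variable names, the default connection values, and the neo4jConfig and createServer helpers are made up for illustration.

```javascript
// Sketch only. Environment variable names and defaults are assumptions.
function neo4jConfig(env) {
  return {
    uri: env.NEO4J_URI || "bolt://localhost:7687",
    user: env.NEO4J_USER || "neo4j",
    password: env.NEO4J_PASSWORD || "neo4j",
  };
}

// Hypothetical helper: builds an Apollo Server from GraphQL type definitions.
function createServer(typeDefs, env = process.env) {
  // Dependencies are loaded here rather than at the top of the file so the
  // module can be loaded (for example in tests) without them installed.
  const { ApolloServer } = require("apollo-server");
  const neo4j = require("neo4j-driver");
  const { makeAugmentedSchema } = require("neo4j-graphql-js");

  // makeAugmentedSchema adds the generated Query and Mutation types and
  // resolvers to the supplied type definitions.
  const schema = makeAugmentedSchema({ typeDefs });

  const { uri, user, password } = neo4jConfig(env);
  const driver = neo4j.driver(uri, neo4j.auth.basic(user, password));

  // The driver is injected into the context so the generated resolvers can
  // run their Cypher queries against the database.
  return new ApolloServer({ schema, context: { driver } });
}
```

A start script could then call createServer(typeDefs).listen(4000).then(({ url }) => console.log(`GraphQL API ready at ${url}`)), where 4000 is just Apollo Server's usual default port.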

All we need now is to run npm run start to start our GraphQL server.

Auto-generated Resolvers

Note that we haven’t created any resolvers, the functions that contain the logic for fetching data from the database. Implementing resolvers is typically the next step; however, we don’t need to do that, since neo4j-graphql.js implements our resolvers for us automatically by translating GraphQL requests to Cypher and handling the database call.

We can inspect the generated GraphQL API in GraphQL Playground. Notice that queries and mutations have been created for each of the CRUD operations:

The GraphQL API generated from our type definitions using neo4j-graphql.js includes auto-generated queries and mutations for all CRUD operations.

Querying With GraphQL

Now that we have our GraphQL API up and running, let’s try some GraphQL queries.

Query for repositories

Let’s start with a fairly simple query: find three repositories, ordered by name in descending order, select the name and description, then traverse the graph to find any pull requests connected to those repositories:

{
  Repository(first: 3, orderBy: name_desc) {
    description
    name
    pull_requests {
      url
    }
  }
}

Behind the scenes, this GraphQL query is translated into Cypher and sent to our Neo4j database instance, in a Cypher query that looks something like this:

MATCH (`repository`:`Repository` {})
RETURN `repository` {
  .description,
  .name,
  pull_requests:
    [(`repository`)<-[:`BASE`]-(`repository_pull_requests`:PullRequest)
    | repository_pull_requests { .url }]
} AS `repository`
ORDER BY repository.name DESC LIMIT 3

These GraphQL queries can be arbitrarily complex. Here’s another example, this time in GraphQL Playground:

You can find all the code for this example on GitHub (of course). And you can see it running live in CodeSandbox below:

Try it live!
