Slack Lunch Club, Part 3/7: Backend


Please read Part 2 if you have not.

This part is an overview of this pull request. It is the most important part of the series.

One of the books I started reading during my sabbatical was Domain-Driven Design: Tackling Complexity in the Heart of Software, by Eric Evans. I wanted to learn more about software architecture and higher-level design principles. The book is full of useful, actionable insights and I highly recommend it (though it is quite long). Its main goal is to help the reader build the right models and abstractions for their business domain, and to use succinct but precise language when describing those models and their business rules and interactions. Reading this book while learning GraphQL at the same time caused me to think about API design in a totally new way.

GraphQL + ArangoDB = ❤

Many people view GraphQL as a replacement for REST, due to API discoverability, data fetching efficiency, easy deprecation, etc. And while all of those benefits are real and valuable, I think the real value of GraphQL is in its utility as a design tool.

GraphQL is not actually a graph database query language, which is confusing to some at first. It is agnostic as to how you actually store your data behind the scenes, which is important. A single GraphQL query can pull data from many sources, and GraphQL can even be adopted incrementally by wrapping an old REST API. GraphQL allows a developer to model their domain as a combination of Types, Queries, and Mutations (along with Subscriptions, which I will not cover).
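To make the Types / Queries / Mutations split concrete, here is a minimal sketch. The schema, the `User` type, the `matchUsers` mutation, and the in-memory data are all hypothetical and are not taken from the Lunch Club codebase. The resolver map follows the `(parent, args)` shape used by tools such as graphql-tools, but the wiring into an actual GraphQL server is left out so the example runs with plain Node.

```javascript
// Hypothetical schema for a lunch-matching domain (illustration only).
const typeDefs = `
  type User {
    id: ID!
    name: String!
    matches: [User!]!
  }

  type Query {
    user(id: ID!): User
  }

  type Mutation {
    matchUsers(a: ID!, b: ID!): Boolean!
  }
`;

// In-memory stand-in for whatever storage sits behind the API.
const users = new Map([
  ['1', { id: '1', name: 'Ada' }],
  ['2', { id: '2', name: 'Grace' }],
]);
const matchEdges = []; // each entry is a pair of user ids

// One resolver function per field, keyed by type name.
const resolvers = {
  Query: {
    user: (_parent, { id }) => users.get(id) || null,
  },
  Mutation: {
    matchUsers: (_parent, { a, b }) => {
      if (!users.has(a) || !users.has(b)) return false;
      matchEdges.push([a, b]);
      return true;
    },
  },
  User: {
    // The relationship is resolved per field, so a client can keep
    // traversing from user to user as deep as the query asks.
    matches: (user) =>
      matchEdges
        .filter((pair) => pair.includes(user.id))
        .map(([a, b]) => users.get(a === user.id ? b : a)),
  },
};
```

Notice that nothing in the schema says where users or matches are stored. The resolvers are the only place that knows, which is what makes GraphQL storage-agnostic.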

The combination of GraphQL + ArangoDB is a developer’s dream. In today’s world, most of the value to be found is in how things are connected. It’s all about relationships. That is what makes GraphQL such a joy to use: it puts those relationships front and center, and turns your API into an actual graph that you can traverse. While this is obviously useful for social networking applications, I argue that thinking in graphs has many other useful applications. GraphQL makes the actual implementation more intuitive (we as humans naturally think about things as entities and relationships), and ArangoDB allows that graph model to be efficiently stored and queried on disk. This lets you not only model your domain as a graph, but truly execute complex graph queries against that model in real time! And yes, other graph databases enable this, but ArangoDB is uniquely relevant for JavaScript applications, due to the embedded V8 engine and its ability to store nested JSON documents. As far as I know, only OrientDB and gunjs also support arbitrary JSON documents as nodes. Neo4j, for example, although more popular, does not. It’s also important that the storage engine details do not affect the API design. I am also not the first to discover this powerful combination.

Using a custom database instead of DynamoDB is a serious technical decision that must be justified by the business goals. For a social networking or recommendation service, a graph database is a perfect fit. If your business has similar requirements, using a graph database can give you a huge competitive advantage by allowing you to offer features that are nearly impossible with a relational or standard NoSQL database.

I try to follow the SOLID principles as closely as possible when writing code. Therefore I wrapped the ArangoDB driver behind a more general GraphDatabase interface that the GraphQL API can use. This also allows the user to supply a JSON configuration to initialize the graph database topology, collections, indices, etc. Ideally the developer should not even know they are using ArangoDB, and in the future, if needed, the storage engine can be swapped out for some other graph database with no effect on the API code. This package will be open sourced as an npm module when I find time to write the tests and clean it up (or when someone steps up to help me).
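As a rough illustration of that idea, here is a sketch of a storage-agnostic graph interface with an in-memory implementation. The class name `InMemoryGraphDatabase`, the method names, and the config keys are all assumptions made for this example. They are not the API of the unpublished package, and no ArangoDB driver calls are shown. A driver-backed implementation would return promises; this one is synchronous to keep the sketch short.

```javascript
// Hypothetical in-memory implementation of a GraphDatabase-style interface.
// Any backend exposing the same three methods could be swapped in.
class InMemoryGraphDatabase {
  // `config` mimics the idea of a JSON topology description (keys are made up).
  constructor(config) {
    this.vertices = new Map();
    this.edges = new Map();
    for (const name of config.vertexCollections) this.vertices.set(name, new Map());
    for (const name of config.edgeCollections) this.edges.set(name, []);
  }

  addVertex(collection, doc) {
    this.collectionOrThrow(this.vertices, collection).set(doc.id, doc);
    return doc;
  }

  addEdge(collection, fromId, toId) {
    this.collectionOrThrow(this.edges, collection).push({ from: fromId, to: toId });
  }

  // Ids of all vertices reachable over one outgoing edge.
  neighbors(edgeCollection, id) {
    return this.collectionOrThrow(this.edges, edgeCollection)
      .filter((edge) => edge.from === id)
      .map((edge) => edge.to);
  }

  collectionOrThrow(store, name) {
    if (!store.has(name)) throw new Error(`Unknown collection: ${name}`);
    return store.get(name);
  }
}

// API code depends only on the method names, never on the storage engine.
function friendsOf(db, userId) {
  return db.neighbors('friendships', userId);
}
```

Because `friendsOf` only sees the interface, replacing the in-memory class with one backed by a real graph database would not require touching it.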

Powerful Tooling

Another great benefit of adopting GraphQL is the tooling ecosystem. Once you define your entire API as a statically typed graph, you can do some interesting things with the introspection query. I added two routes to help with developer onboarding, at /graphql/playground and /graphql/voyager. The first lets you interact with the API to understand what it is capable of (similar to Postman). The second is an interactive graph visualization of the entire API! See below for screenshots.

Image for post
^ This tool is the living, interactive documentation of the API.
Image for post
^ And this tool lets you visualize how all the types are connected (similar to UML diagrams).

Clean Architecture

It’s important when writing software to follow scalable patterns, to avoid technical debt and increase the readability and maintainability of the code. I am a big fan of the Onion Architecture described here.

Image for post

For example, our backend API follows this pattern in the following way:

Entities > GraphQL Types

Use Cases > GraphQL Queries & Mutations

Controllers > GraphDB Interface

External Interfaces > ArangoDB driver module

Scalability

The only component that might be an issue for scaling is the database, as it is running on a single EC2 instance. The other components (S3, CloudFront, Lambda) have scaling built in. The good news is that ArangoDB uses memory-mapped files in its implementation, meaning that as long as your database fits into system memory, you will get blazing performance similar to other stores like Redis. Even on a t2.micro instance with 1 CPU and 1 GB of memory, the API was able to easily handle 10k concurrent requests. I used this tool to do the benchmarking: https://github.com/Nordstrom/serverless-artillery

ArangoDB has great clustering support, which should be used if the app begins to gain traction. This is less for performance and more for resiliency, as the EC2 instance could eventually fail, at which point the secondary DB server could step right up.

Graph databases are inherently difficult to shard, as there needs to be some knowledge of the business domain to do the sharding efficiently. Otherwise your requests will slow to a crawl due to the network hops between servers as the graph is traversed. Luckily ArangoDB has a solution to this problem called “Smart Graphs”, though it is for enterprise customers only. The good news is that AWS has EC2 instances with up to 512 gb of system memory with 72 vCPUs, so there’s tons of room for vertical scaling if your app starts to gain serious traction.
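A toy calculation shows why domain knowledge matters here. The sketch below counts edges whose endpoints land on different shards, first with a naive hash and then with a domain-aware rule. The users, cities, and both placement functions are made up for illustration. This is not how ArangoDB’s Smart Graphs are implemented, only the intuition behind them.

```javascript
// Count how many edges connect vertices that live on different shards.
// Every such edge is a potential network hop during a traversal.
function crossShardEdges(edges, shardOf) {
  return edges.filter(([from, to]) => shardOf(from) !== shardOf(to)).length;
}

// Hypothetical data: users belong to a city, and most connections are local.
const cityOf = { ann: 'nyc', bob: 'nyc', cat: 'nyc', dan: 'sf', eve: 'sf', fay: 'sf' };
const friendships = [
  ['ann', 'bob'], ['bob', 'cat'], ['ann', 'cat'], // New York cluster
  ['dan', 'eve'], ['eve', 'fay'], ['dan', 'fay'], // San Francisco cluster
  ['cat', 'dan'],                                 // the one long-distance link
];

// Naive placement: hash the key and ignore the domain entirely.
function hashShard(name) {
  let sum = 0;
  for (const ch of name) sum += ch.charCodeAt(0);
  return sum % 2;
}

// Domain-aware placement: keep each city on its own shard.
function cityShard(name) {
  return cityOf[name] === 'nyc' ? 0 : 1;
}

const naiveHops = crossShardEdges(friendships, hashShard);
const domainHops = crossShardEdges(friendships, cityShard);
console.log({ naiveHops, domainHops });
```

With the city-based rule only the single long-distance friendship crosses a shard boundary, while the hash-based rule scatters both clusters across servers.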

I would like to eventually merge my serverless-artillery branch, as well as create a separate test suite for load testing. As developers, we should know the maximum load our deployed apps can handle, and we should have data to prove it. This is not needed for the initial release, however; I just wanted to get a rough idea of the current performance.

Serverless + Webpack

Our serverless.yml file declares all the functions that our backend API needs to accomplish its goals. Many people split up their API into many small lambda functions, which has its merits. I personally like the “Lambda monolith” approach, as I do not want the infrastructure to affect the way I write code. I just want to write my express app, which handles all HTTP requests and routes, and ship that in a lambda. For HUGE express apps this does not make sense, but again this stack is tailored for the indie hacker / small startups. You can always split up the express app by routes later into separate lambda functions, but I think that is a premature optimization.
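For readers who have not seen the “Lambda monolith” shape before, here is a dependency-free sketch of the idea: one entry point that receives every HTTP request and routes it internally. The route table and response bodies are hypothetical, and the real app hands this job to express rather than a hand-rolled router. The `httpMethod` and `path` fields and the `statusCode` / `headers` / `body` response assume the API Gateway Lambda proxy event format.

```javascript
// Hypothetical route table; in the real app express owns this mapping.
const routes = {
  'GET /health': () => ({ statusCode: 200, body: { ok: true } }),
  'POST /graphql': (event) => ({
    statusCode: 200,
    body: { received: JSON.parse(event.body || '{}') },
  }),
};

// Synchronous core, so the routing logic is easy to test in isolation.
function route(event) {
  const key = `${event.httpMethod} ${event.path}`;
  const handle = routes[key];
  const result = handle
    ? handle(event)
    : { statusCode: 404, body: { error: `No route for ${key}` } };
  return {
    statusCode: result.statusCode,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(result.body), // proxy responses carry a string body
  };
}

// The single Lambda entry point that would be exported and referenced
// from serverless.yml. Every request goes through it.
async function handler(event) {
  return route(event);
}
```

Splitting this into several functions later would mean moving entries out of the route table into their own handlers, which is why I consider doing it up front a premature optimization.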

I am also a fan of git mono-repos, even for large projects. So splitting up the backend into all these little lambda microservices doesn’t make sense to me. Perhaps for large distributed enterprise teams this is a good idea. Each lambda function should definitely be its own npm module, however, to enforce clean boundaries. Thanks to yarn workspaces, this is easy.

We use webpack to transpile our backend Node.js code with babel, as well as to bundle all the other necessary assets (email .html templates, .graphql files, .AQL queries, etc). We can also ship standalone binaries with our service, which we are doing for our database backup and restore service. You usually want to compile the binary yourself on an AWS EC2 instance running Amazon Linux, but knowing that Amazon Linux is based on CentOS 7, you might be able to get away with just shipping the CentOS 7 binaries. For the arangodump and arangorestore binaries, this seems to have worked.

Flow Type

Using a type system is crucial for any serious JavaScript application. It not only eliminates a whole class of runtime errors but, more importantly, serves as a form of documentation for future developers. You always want to optimize for code readability and maintainability first, as developer time is the most expensive resource, especially if there is just one developer! Flow also has great VSCode integration, which makes jumping around to type definitions seamless. Again, the fewer things you have to hold in your mind at once, the better. TypeScript seems more popular than Flow, but I chose Flow because it is incrementally adoptable and plays well with other Facebook tooling. You could even give Nuclide a try if you want to feel like a fb developer. :)

What’s more, we can actually generate Flow types from our GraphQL schema! I cannot stress enough how awesome this is. This not only makes implementing the API easier, but can create an even tighter coupling between the backend and frontend, as our React app can import the types. I am even thinking about just using Flow on the frontend and ditching the prop-types package altogether.
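To give a feel for what that looks like, here is a hand-written sketch of the kind of type a schema-to-Flow generator might produce for a GraphQL type such as `type User { id: ID!, name: String!, interests: [String!]! }`. The `User` shape and the `describeUser` helper are hypothetical, and the output of a real generator will differ in its details. The example uses Flow’s comment syntax, so it runs in plain Node without a build step.

```javascript
// @flow
/*::
// Hypothetical generated type: non-null GraphQL fields become required
// properties, and a non-null list of strings becomes Array<string>.
type User = {|
  id: string,
  name: string,
  interests: Array<string>,
|};
*/

// Both the backend resolver and a React component could annotate against
// the same User type, so a schema change surfaces as a Flow error.
function describeUser(user /*: User */) /*: string */ {
  return `${user.name} likes ${user.interests.join(', ')}`;
}
```

If the schema later renamed `interests`, regenerating the types would flag every caller of `describeUser` that still passes the old shape.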

Deployment

Our deployment is pretty simple, as we just replace the Lambdas all at once. In the future we can allow for canary deployments, which provide even more protection against mistakes and outages. For small projects with little traffic, this is another premature optimization. The good news is that the Serverless framework will most likely have a plugin to support this in the future, as AWS now natively supports it in API Gateway.

Follow the steps in the README to deploy both the staging and production backend APIs. We will need these APIs for the React frontend which we will now deploy in Part 4.

Further Reading

Here is another excellent blog post about GraphQL + React using the apollo client.

To see what a real production app looks like using these technologies, check out https://github.com/withspectrum/spectrum

Read this amazing blog post if you want to really understand graph theory.

Written by

Engineer. Investor. Founder peapods.com
