How I Made PostGraphile Faster Than Prisma GraphQL Server In 8 Hours

(Lower and to the right is better.) 95th precentile latency comparison of Prisma (blue), current PostGraphile (v4.0.0-beta.9, purple), and the next version of PostGraphile (postgraphile@next, pink). Prisma stays under 50ms latency up to 250 requests per second (rps); current PostGraphile reaches 900rps without breaching 50ms latency; but the next version of PostGraphile kicks the ball out of the park with an astounding 1450rps!!

TL;DR: PostGraphile was already highly performant for complex GraphQL queries, but the next release is faster still, and now it’s great at the very simple queries found in Prisma’s benchmarks too!

UPDATE: postgraphile@next is now released.

As PostGraphile v4 gets towards the end of it’s beta period, the time has been coming to start thinking about performance optimisations. And as luck would have it, both Hasura and Prisma have recently released benchmarks comparing their GraphQL offerings to PostGraphile! This sudden interest in comparative benchmarking was sufficient excuse for me to dedicate some time to finding the low-hanging fruit of PostGraphile performance optimisations.

One of the main reasons behind the PostGraphile v4 rewrite was to achieve significantly greater performance than v3 (which was known as “PostGraphQL”). This has been very successful for complex queries of the type that PostGraphile typically deals, where you’re requesting all the data required to render a web page in one single GraphQL query.

What I realised with the chart on Prisma’s shiny new marketing site is that their benchmark wasn’t against the large nested queries I had been focussed on optimising. Instead, they had focussed on the other end of the scale — extremely small simple queries — which makes sense given the way that they intend Prisma to be used. (PostGraphile is a front-line GraphQL server intended to serve requests direct from end clients such as web browsers, whereas Prisma is intended to be the backend to another GraphQL schema.)

Still, the performance difference for these small queries shouldn’t be as great as Prisma’s chart implied… Grateful for the wake-up call I realised it was time to look at PostGraphile’s per-request performance — how quickly could we serve trivial queries that just request a couple of fields from an individual database row like the following?

{
artistById(id: 3) {
id
name
}
}

Prisma gave me the heads up about the benchmarks the night before they launched their site (Tuesday last week), and by Friday I’d managed to allocate about 6 hours on the task and had achieved quite substantial performance gains. On Monday, whilst writing up this post, I got distracted with another performance thread and spent another couple hours on it. Here’s some of the strategies I applied to improve performance:

Simple list-based collections

First, I intuited that Prisma were likely benchmarking simple GraphQL lists, whereas PostGraphile only supported best-practices Relay-compatible connections (which are more expensive to execute). I’ve seen chatter here and there that people would like to see simpler collections in PostGraphile so this seemed a “two drones¹ with one stone” opportunity.

You may be happy to hear that, as a direct result of this, PostGraphile now has support for simple list-based collections as well as Relay connections! When the next release of PostGraphile goes out you’ll be able to use them with the CLI flag--simple-collection both. This will probably become the default before PostGraphile v4 is officially released.

¹ no birds were harmed in the writing of this post

Disable expensive query logging

Second, I realised that our logging is unnecessarily expensive — by default we print a reformatted single-line version of every single GraphQL query that comes in — great for debugging but not so good for performance benchmarks! When you use PostGraphile as an Express middleware you can turn this off via the disableQueryLog option, but there wasn’t a way to disable it on the PostGraphile CLI; until now! Just add --disable-query-log to skip logging entirely; at some point we may add more efficient and conventional logging as an option but I didn’t have time this week!

Lighter resolution

I took a quick scan over the most high-trafficked areas of the GraphQL schema: the resolvers. It turned out there was a couple of places where I was performing chained .filter(…) calls inside the GraphQL resolvers. A simple V8 optimisation here would have been to roll these up into a single filter call; but I realised… these don’t need to be in the resolvers at all! I moved the calculations to the parent contexts so the resolvers could use the pre-computed results and thus had less work to do.

Optimising the HTTP layer

Another realisation was that I’d spent much of my time optimising the actual GraphQL schema and SQL execution (through the invention of the graphile-build suite). However, the HTTP server/middleware portion of PostGraphile has barely been touched since I took over as maintainer from the ever-inspirational Caleb (who has now moved from the Flow team to the Oculus(!) team at Facebook). Well, the time had finally come! I broke out the Chrome DevTools, hooked them up to PostGraphile and got profiling.

(Aside: to everyone in the Chrome, V8 and Node.js projects who have made debugging and profiling the latest versions of Node such a delight, my heartfelt thanks go out to you. These tools are incredible and it has been a joy to experience the frequent advancements over the last 5 years! Thank you so much, and keep up the great work!)

Long story short, we now perform more of the computations during startup rather than at request processing time; we don’t recalculate things over and over if we can avoid it¹; we do standard V8 optimisations such as storing repeatedly referenced deep object properties to a variable to avoid re-incurring object traversal costs² and avoiding unnecessary function calls³; and we avoid fancy new things like Map in performance-critical paths for more performant alternatives such as plain objects and arrays.

¹ no more multiple calls to parseUrl(req) for the same request!
² and making the code more readable as a bonus
³ for example by turning array.forEach(…) calls into traditional for loops, and in certain cases moving function guards outside of the functions

No unnecessary transactions

When I first wrote this article yesterday, Prisma were still beating PostGraphile on the tiny query byArtistId, and I enumerated some reasons that might be. Then I realised that one of them was relatively easy to fix — I intuited that Prisma might not be using transactions for their queries because they don’t need to (saving them for mutations only). Turns out that PostGraphile doesn’t need transactions around queries either, unless it is configured to set some PostgreSQL settings e.g. via pgSettings or from a JWT or similar.

I quickly edited PostGraphile to only use transactions when it needs to, and that was sufficient for us to squeeze the last bit of performance out to beat Prisma even on the simplest queries. Seems that even the begin/commit network packets were weighing down PostGraphile’s performance, not including the additional work the database had to do to track the transaction! Fixing the tests to not expect these statements took longer than implementing the feature itself 😅

Benchmarking

On Thursday I asked Prisma for their test suite, and they happily provided it to me the next day along with some guidance on how to get it up and running. The suite was based on Hasura’s graphql-bench suite, but quite heavily modified and with the benchmarking tool wrk2 replaced by vegeta. It took quite a lot of time and effort to get it running reliably unattended, but with Prisma’s permission I’ve published my heavily modified version of it on GitHub.

To make the results reproducible I decided to run it on a Digital Ocean 8GB RAM compute-optimised droplet. I’ve provided detailed instructions in the README on how to set up a similar environment for yourself so that you can validate my findings.

I’ve provided detailed instructions in the README on how to set up a similar environment

The test suite as configured takes about 15 hours to run on the Digital Ocean Droplet. You can easily shorten that by simply editing bench.json to not record so many data points — but beware that according to @do4gr, the Prisma engineer responsible for the benchmarks, because Prisma is written in Scala and runs on the JVM it needs more time to get the JIT up to speed than PostGraphile (which is Node.js-based). @do4gr stated that “Prisma was only fully warmed up after more than 100k requests” and suggested that warming up slowly is important to an accurate benchmark on Prisma’s performance.

To accommodate Prisma’s preferences I give each environment a full 5 minute warmup period, then let it rest for 15 seconds before jumping into the testing. Each test performs requests at a constant rate lasting 60 seconds and then the service gets a 15 second respite before moving onto the next test. We use different rps (requests per second) values for the different GraphQL queries because some are significantly heavier than others, and thus need a smaller range and higher resolution (not even PostGraphile can resolve prisma_deeplyNested at 3000rps!).

When I set up the suite I followed the instructions @do4gr provided for benchmarking Prisma (using the Docker image prismagraphql/prisma:1.8-beta), and also benchmarked the current npm version of PostGraphile (v4.0.0-beta.9), and the next version of PostGraphile which will include the optimisations and features listed above (available via npm as postgraphile@next, or v4.0.0-beta.9.3).

Show me the results!

Okay, enough chit-chat — time to show some numbers!

Importantly I was able to reproduce the effect from Prisma’s website — this was on a simple query that requests just 2 fields from the first 20 rows in a database table. On the test environment outlined above, Prisma reaches an impressive 2000rps for this simple query! The current version of PostGraphile starts struggling after 1700rps. However, thanks to the improvements listed above, the next version of PostGraphile takes a strong lead hitting 3000rps with sub-50ms latency — more than 50% more requests per second than Prisma — and even gets past 3250rps before it starts dropping connections.

the next version of PostGraphile takes a strong lead hitting 3000rps with sub-50ms latency — more than 50% more requests per second than Prisma
# PostGraphile@next
query tracks_media_first_20 {
allTracksList(first: 20) {
trackId
name
}
}
# Prisma
query tracks_media_first_20 {
tracks(first:20)
{
id
Name
}
}
(Lower and to the right is better.) Prisma beats the current version of PostGraphile in the sub-50ms latency game by 2000rps to 1600; but the next version of PostGraphile takes an incredible lead with sub-50ms latency at an astounding 3000rps — that’s 50% more requests per second than Prisma achieves. (I’m not sure what caused the spike in latency for prisma at 1450rps; you should ignore it.)

Despite PostGraphile’s strong lead, it’s important to keep in mind that this synthetic benchmark is not the kind of query that most users would be issuing to a PostGraphile server — they’re likely to want to traverse the graph at least a little bit!

Once you get into queries that actually do use graph traversal, even the current version of PostGraphile takes a clear lead immediately. But we’ve widened the gap with postgraphile@next anyway 😁

This next query requests all the albums from a particular artist, the tracks within that album and the genre name of each track. (Note: this query was from Hasura’s and hence Prisma’s test suite, I didn’t invent any of the queries in this post.) PostGraphile can serve over 4x as many of these queries per second as Prisma while staying below the 50ms line in the 95th percentile latency chart!

# PostGraphile@next
query albums_tracks_genre_some {
allAlbumsList(condition: {artistId: 127}) {
artistId
title
tracksByAlbumIdList {
trackId
name
genreByGenreId {
name
}
}
}
}
# Prisma
query albums_tracks_genre_some {
albums(where: {Artist: {ArtistId: 127}}) {
AlbumId
Title
Tracks {
TrackId
Name
Genre {
Name
}
}
}
}
(Lower and to the right is better.) Prisma starts flagging after 300rps; postgraphile after 900rps; but postgraphile@next not until after 1450rps!

Next, we look at a more complex query prisma_deeplyNested that Prisma added to their test suite (note the formatting is verbatim from the test suite):

# Postgraphile@next
query prisma_deeplyNested {
allAlbumsList(condition: {artistId: 127}) {
albumId
title
tracksByAlbumIdList {
trackId
name
genreByGenreId {
name
}
}
artistByArtistId {
albumsByArtistIdList {
tracksByAlbumIdList {
mediaTypeByMediaTypeId {
name
}
genreByGenreId {
name
}
}
}
}
}
}
# Prisma
query prisma_deeplyNested {albums(where: {Artist: {ArtistId: 127}}) { AlbumId Title Tracks {TrackId Name Genre { Name } } Artist{ Albums{ Tracks{ MediaType{ Name } Genre{ Name } } } }}}
(Lower and to the right is better.) Prisma hits >50ms before 100rps, postgraphile survives beyond 400rps and postgraphile@next makes it past an impressive 550rps before crossing the 50ms barrier.

Another important way to look at the data is to look at the number of successful responses completed, rather than the latency.

(Higher is better.) Successful requests completed when serving X requests per second over a period of 60 seconds. For the GraphQL query `albums_tracks_genre_some` Prisma serves 300rps without dropping any connections; current PostGraphile reaches 1000rps; but PostGraphile@next hits 1450rps before dropping any connections!

Browse the data for yourself

The graphs above are screenshots of an open source webpage I built to browse the results, hosted by now.sh. Check it out, here.

The webpage allows you to browse all 4 queries that I profiled¹, and choose from the stats: average/max latency, 50th/95th/99th percentile latency, successful request count and failed request count.

I apologise for the lack of pretty styles; I don’t have the design resources that Prisma have! (The source can be found within the benchmark repository on GitHub here.)

¹ Technically I also profiled albums_tracks_genre_all but the results came back with Prisma reporting no successes at even 1 rps, so I discounted it from the results since I don’t have time to look into it.

Conclusion

As you can see from the graphs, the next version of PostGraphile outperforms Prisma for even simple queries, and the more complex a query becomes the wider PostGraphile’s lead. PostGraphile is designed to serve both simple and complex queries extremely rapidly without requiring complex setup.

These performance results don’t mean that we’re resting on our laurels — there’s still a huge number of things we can do to improve PostGraphile performance further— if you want to support this then please give generously to my Patreon! The results in this post came from just 8 hours work on the low hanging fruit; think what I could do with a month!

Couldn’t have done it without my Patreon supporters ❤️

I’d like to take this opportunity to send more heart-felt thanks to my Patreon supporters. Knowing that these 14 wonderful folks value my work enough to put their hand in their pocket and support it makes a huge difference to my motivation — they really keep me going. It’s easy to become weighed down in Open Source — especially because most of the time you have no idea who is even using your software until they have an issue — but this concrete support means the world to me! Thank you all 🙏

If you or your company would like to support my work on PostGraphile or hear what it can do for you, please get in touch on via @Benjie on Twitter or just head straight to my Patreon 😉

I’m also starting to build some commercial features around PostGraphile, so if this interests you please get in touch — I’m interested to hear what your company would like to see in a professional or enterprise PostGraphile offering!

Say “hi” at GraphQL Europe in June!

Coming up in just under a month, on June 15th, is the GraphQL Europe conference in Berlin (tickets are still available). I’ll be there among the crowd, do come and say “hi!” I’ll be hanging around for a few days afterwards, so if you happen to be in the area, get in touch via my Twitter.

Benchmark limitations / further plans

There’s a lot of issues with this benchmark process.

I’d like to think that I’ve improved a number of issues over Hasura’s and Prisma’s benchmarks, but there are still some serious shortcomings. Prisma are keen to set up a benchmarking framework that different providers in the GraphQL ecosystem can use to benchmark their own software, I would definitely like to see this go ahead.

There’s quite a lot that needs doing to make these tests more fair. Here are some of the inadequacies of the benchmarking I performed in this post:
  1. The databases differed. I ran Prisma against their stable MySQL implementation (as advised by them) rather than against PostgreSQL, so we were effectively benchmarking the databases along with the GraphQL software.
  2. The GraphQL queries differed. They don’t need to — both PostGraphile and Prisma support renaming GraphQL fields—it would be good to be sending exactly the same JSON blob to both servers so that differences in performance cannot be blamed on the difference in field names or whitespace.
  3. Loading the sample dataset into the databases differed. Though the tests should use the same dataset, because one was loaded via SQL file into PostgreSQL whereas the other was loaded with a proprietary tool into the Prisma cluster it’s hard to verify that the datasets are equivalent, including checking that they have the same indexes, etc. It would be best if we could have a single database that all the benchmarks run against.
  4. Everything ran on the same machine. The benchmarking tool, GraphQL software, and databases should all run on independent hardware such that they do not compete for resources and interfere with each other.
  5. Everything ran on a virtual machine. The benchmarks should run on dedicated physical hardware so that they’re unaffected by fluctuations in resource usage from other tenants on the same box. (This is why I chose the compute-optimise droplets, but dedicated hardware would be better still.)
  6. The recorded metrics should be widened — for example it would be great to monitor CPU usage, memory usage, or even energy usage of the different solutions when resolving the same queries.
  7. Consistent configurations should be used for each solution — how many connections can it make to the database? Must it use transactions? How many cores can it use? Should it produce logs? Is it okay to configure the environments differently because their databases have different characteristics?
  8. Testing should be performed against a larger data set. This test was ran against the chinook data set that Hasura used, however it might be better to run tests against much larger (e.g. many-gigabyte) databases, this will give a much better impression of how each software treats the database when the entire data set can’t fit in RAM.
  9. I’m not aware of any performance throttling, but if you run a benchmark long enough and hard enough the hypervisor may throttle it, or the hardware may do so automatically if there are any thermal issues. There’s no detection of this in the current suite.
  10. GraphQL-specific response validation. I noticed quite early on that Prisma will sometimes fail with a GraphQL error but still produce a 200 status code. This is acceptable GraphQL behaviour (“GraphQL successfully resolved to this error”), but the benchmarking tool vegeta is only set up in the suite to monitor for status codes. It’s possible (if not likely!) that things are going wrong under the hood without us knowing. Any response validation that did take place, however, would have to be very careful to avoid contaminating the benchmark results with different processing times…
  11. Error logging: when Prisma was overwhelmed it was writing stack traces to Docker’s logs. This may have had a noticeable negative impact on performance, causing a cascading failure. But on top of this, it also filled up the hard drive on the server with Docker logs, which broke the first benchmark after about 5 hours. All docker-compose.yml files are now set up with a log limit to prevent this flooding, but we should agree on what the systems being benchmarked should do in the case of errors to make comparisons fair.
  12. Garbage collection. Both Prisma and PostGraphile use garbage-collected languages; I believe this contributes to the “bumps” in the charts — e.g. where the system appears to suddenly go slower and then speed up again. Seeing how the system deals with GC is an important metric, but when you’re only benchmarking 60 second windows you’re sometimes going to hit a GC run, and sometimes not. One way to solve this is to force GC before each test. Another might be to extend the test period so that the GC results get absorbed in.
  13. Unnatural load. Most public-facing servers on the internet deal with varying load from second to second; these synthetic benchmarks do not model that well at all — once the system stumbles there’s no chance of it getting back up again because the load never lets up, even for a few milliseconds, which can allow GC pauses and JIT changes to allow a queue to build up that it cannot then clear.
  14. The tests only ran once. If a system process kicked in at one point, that may have interfered with one of the tests. It would be good to run the tests multiple times and average the results (perhaps even excluding outliers); however with the suite already taking 15 hours to execute that was unachievable for me.
  15. I’m not a Prisma engineer. Benchmarking third party code is challenging — only they know how best to set up that software, and the incentives are not aligned for you to go out of your way to research how to set the tests up right for them, or even how to write sensible queries. I sought a lot of guidance from Prisma on running these tests (and use the queries that they provided), but still there could be issues with what I have done or mistakes that I have made during setup.
  16. Doesn’t include Hasura in the tests. This is simply because I used Prisma’s test suite, and I’ve absolutely no familiarity with Hasura and didn’t have time to re-introduce benchmarking on it in a way that would be meaningful.

Thanks for reading! Check out PostGraphile, follow me on Twitter: @Benjie or GitHub: @Benjie, and consider giving to to my Patreon.