Monitor your GraphQL Apollo Server in Google Cloud

Philipp Schmiedel
4 min read · Jan 13, 2019

[tl;dr] Migrate to Apollo Server 2.0, use formatError and custom error classes

Google Cloud Stackdriver is a powerful tool for monitoring your applications and alerting you when something goes wrong. If you’re using App Engine Flex or Google Cloud Functions, every crash of your Node.js application (that is, anything written to stderr) is reported automatically. With Apollo GraphQL, however, you should not rely on this magic, because it does not happen. The good news: the fix is easy.

If you’re still using Apollo in version 1, don’t miss the last section of this blog post.

Our test setup

In our test scenario we simulate that our database is not reachable while resolving a GraphQL request. Apollo Server provides some predefined error classes, e.g. UserInputError, to differentiate between error cases. To follow best practices and to be able to define a specific handler for our custom error later, we introduce our own error class for this case:
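A minimal sketch of such an error class could look like the following. The name DatabaseError is an assumption made for illustration, and the class extends the built-in Error to stay free of dependencies; extending Apollo Server's ApolloError would be an alternative.

```javascript
// Hypothetical custom error class; the name DatabaseError is an assumption.
class DatabaseError extends Error {
  constructor(message) {
    super(message);
    // Set the name so logs and stack traces show the specific error type.
    this.name = 'DatabaseError';
  }
}
```

Because the class is a distinct type, an instanceof check can later tell it apart from other errors.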

In our example resolver books.js we’re simulating a random DB outage:
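The resolver might be sketched as follows. The sample data, the 50% failure rate, and the createResolvers helper are assumptions for illustration; the random source is injectable only so the outage can be triggered deterministically.

```javascript
// Hypothetical custom error, repeated so this sketch is self-contained.
class DatabaseError extends Error {
  constructor(message) {
    super(message);
    this.name = 'DatabaseError';
  }
}

// Sample data standing in for a real database table.
const books = [
  { title: 'Example Book One', author: 'Author A' },
  { title: 'Example Book Two', author: 'Author B' },
];

// Builds the resolver map; `random` defaults to Math.random.
function createResolvers(random = Math.random) {
  return {
    Query: {
      books: () => {
        // Simulate an unreachable database in roughly half of the calls.
        if (random() < 0.5) {
          throw new DatabaseError('Database is not reachable');
        }
        return books;
      },
    },
  };
}

const resolvers = createResolvers();
```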

When we run the server on our local machine and query our resolver multiple times, the exception appears in our console as expected. Deployed to App Engine Flex, however, Error Reporting and Logging stay empty, which means we would not notice if something in our API went badly wrong (hardly acceptable for a production system).

stderr does not appear, because no exception was logged in console :(

Meet formatError()

The Apollo GraphQL server provides a callback function that is triggered every time our resolver runs into an exception. Together with Google’s Stackdriver Error Reporting library, we can send our own error reports to the Google Cloud API every time something unexpected happens. It is important to note that all Apollo errors are wrapped in a GraphQLError object; however, our original error is still available in the originalError property of the error provided by the GraphQL server. As we introduced our own error class before, it is now possible to implement a different reporting behavior for our specific case:
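A sketch of such a formatter follows, under several assumptions: DatabaseError and createFormatError are illustrative names, the reporter only needs a report(error) method (as an ErrorReporting instance from @google-cloud/error-reporting offers), and every error other than DatabaseError is treated as unexpected. A real formatter would probably also let Apollo's own classes such as UserInputError pass through.

```javascript
// Hypothetical custom error, repeated so this sketch is self-contained.
class DatabaseError extends Error {
  constructor(message) {
    super(message);
    this.name = 'DatabaseError';
  }
}

// `reporter` is assumed to expose report(error); `log` defaults to stderr.
function createFormatError(reporter, log = console.error) {
  return function formatError(err) {
    // Apollo wraps resolver errors; the thrown error sits in originalError.
    const original = err.originalError || err;

    if (original instanceof DatabaseError) {
      // Known failure mode: write to stderr so it shows up in Cloud Logging.
      log(`DatabaseError: ${original.message}`);
      return err;
    }

    // Fallback: report the unexpected error (including its stack trace)
    // and hide the details from the client.
    reporter.report(original);
    return new Error('Internal server error');
  };
}

// Possible wiring, assuming both packages are installed:
//   const { ErrorReporting } = require('@google-cloud/error-reporting');
//   const { ApolloServer } = require('apollo-server');
//   const server = new ApolloServer({
//     typeDefs,
//     resolvers,
//     formatError: createFormatError(new ErrorReporting()),
//   });
```

Injecting the reporter and the logger keeps the formatter testable without network access or Google Cloud credentials.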

Don’t forget to enable the Stackdriver Error Reporting API for your project!

Let’s query our resolver again inside Google Cloud until we see our simulated exception. Checking our App Engine logs now shows a /stderr category and our error log message. We could now create a metric and an alerting policy to be notified about errors happening in our GraphQL implementation.

Error is logged in stderr in Google Cloud Logging :)

But wait… Let’s see what happens if we deploy a version of our code that contains an unexpected exception. For example, a developer tries to access .length on an undefined variable and we did not see this mistake in our code reviews or final testing of our API.

First, we defined in the fallback case of our error formatter that details about unexpected errors should not be shown to the client.

Error details are hidden from client

Second, we call the Error Reporting API in this case with the error stack trace. The error appears in our Google Cloud Dashboard and in the Error Reporting section. Additionally (if configured), we can send an automatic notification to our dev team via mail, Slack or whatever communication tool we prefer.

Cloud Console Home Dashboard
Cloud Console Error Details

Using the Error Reporting functionality in your GraphQL setup can make a huge difference in your daily business. Here is why:

  • Identical errors are grouped automatically, including a history of when and how often the error occurred. You are far more likely to overlook a second error when just scrolling through your App Engine logs.
  • You can link the bug to your internal ticketing system and flag it as Acknowledged or Solved, so you don’t lose track.
  • In our example implementation, reporting to Google Cloud is the fallback case: whenever something unexpected happens in your resolvers, you’ll know.

The complete code of this example can be found in my GitHub repository.

Sorry Apollo Server 1 folks

The formatError() callback function is also available in Apollo Server 1, but there are major downsides:

  • The more specific error classes UserInputError etc. are not available.
  • The error object inside the formatError() callback does contain the originalError property, but it is always cast to the general JavaScript Error object. Therefore you cannot distinguish between different types of errors.
  • … which makes it hard to not spam your Error Reports as every reject in your resolvers (expected or unexpected) will trigger the formatError() callback.

As Apollo Server 2.0 is fully backwards compatible, there should be no valid reason to stay with the old version. See the migration guide for further information.

If you want to learn about how to log the request context that caused an error, continue reading at my next GraphQL logging article.
