At Streak, we recently partnered with Google to release our Gmail Add-on as part of the Add-ons launch. The Add-on is our first major foray into integrating with your existing mobile apps: it lets you better organize your work email in the mobile Gmail app, and makes it easier to collaborate with your team to make sure nothing slips through the cracks. The reception has been great. Our Add-on is top rated in the Marketplace and tens of thousands of users interact with it regularly. But the backend of the Add-on is also new: it’s 100% powered by GraphQL, albeit in an unusual way, and I wanted to share the engineering story of how we got here.
A Performance Problem
Our GraphQL journey started with us staring at a loading spinner. The fundamental purpose of our Gmail Add-on is to give the user more context about their emails. Useful context comes in many forms. If you’ve told Streak that an email is part of a sales deal, then we should add information about that deal when you view the email. If the customer you’re talking to is also talking to somebody else on your support team about an issue, you probably want to know that. And if the customer’s trial is going to run out today and might need to be extended, that’s important, too.
We realized that to get acceptable performance, we needed to combine the requests. Instead of getting “/email/15fb6cb3b627304f” followed by getting “/contacts/15fb6cb3b627304f”, we needed one endpoint: “/emails+contacts/15fb6cb3b627304f”. We spent a while manually bundling handlers together, but it turns out nobody on our team wanted to pick up artisanally hand-crafting batch endpoints as a long-term hobby.
Griping about it at lunch one day, one of our frontend developers suggested GraphQL. GraphQL packages a lot of functionality that’s useful for building flexible APIs into a query language and a simple HTTP protocol. And, relevant to our interests, GraphQL lets you request multiple objects in one go.
It also provides functionality for introspecting response schemas, fetching only portions of a response (to save bandwidth), and eagerly requesting related objects (e.g. following foreign keys to the objects they reference). This is very powerful, but in the context of a user-facing API, also a little scary. Some of our teams have hundreds of thousands of deals. What if somebody requested all of them and their related information in one go?
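To make the "only the fields you ask for" idea concrete, here is a minimal Java sketch of field selection. It is an illustration, not production code and not how a GraphQL library does it internally: the class name, the `select` helper, and the sample deal fields are all hypothetical, and unlike a real GraphQL server (which would reject a query for an unknown field), this sketch silently skips fields it does not recognize.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: return only the fields a client asked for.
class FieldSelectionSketch {

    // Builds a copy of `full` that contains only the requested field names,
    // in the order they were requested. Unknown field names are skipped.
    static Map<String, Object> select(Map<String, Object> full, List<String> requested) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (String field : requested) {
            if (full.containsKey(field)) {
                result.put(field, full.get(field));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for a stored deal.
        Map<String, Object> deal = new LinkedHashMap<>();
        deal.put("id", "deal-1");
        deal.put("name", "Acme renewal");
        deal.put("stage", "Negotiation");
        deal.put("notes", "long text the mobile client never displays");

        // The client only needs two fields, so only two are sent back.
        System.out.println(select(deal, Arrays.asList("name", "stage")));
        // prints {name=Acme renewal, stage=Negotiation}
    }
}
```

The bandwidth saving comes from the caller naming its fields up front, which is also what makes an unbounded "give me everything" request possible if nothing limits the query.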
So we decided to take a measured approach to GraphQL: to start, we would use it only for communication between our Add-on running in the Apps Script sandbox and our backend.
GraphQL Implementation Strategies
There are two main models for retrofitting GraphQL onto a RESTful API backend. The first adds a proxy layer in front of the existing server, typically in Node, since the reference GraphQL server implementation runs on that runtime. The proxy translates the GraphQL query into traditional API requests, sends them in parallel, and assembles their results into the GraphQL response.
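The fan-out step of the proxy model can be sketched in a few lines. The example below is in Java for consistency with the rest of our stack, even though such proxies are usually written in Node. Everything in it is an assumption made for illustration: the `fetchAll` helper, the class name, and the fake REST call that stands in for a real HTTP request. It skips query parsing entirely and only shows the "send in parallel, then merge" part.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical illustration of a proxy fanning out to existing REST endpoints.
class ProxyFanOutSketch {

    // Starts one asynchronous call per path, then waits for all of them and
    // merges the results into a single response keyed by path.
    static Map<String, String> fetchAll(List<String> paths, Function<String, String> fetch) {
        Map<String, CompletableFuture<String>> pending = new LinkedHashMap<>();
        for (String path : paths) {
            // supplyAsync runs the call on a background thread, so the
            // requests overlap instead of running one after another.
            pending.put(path, CompletableFuture.supplyAsync(() -> fetch.apply(path)));
        }

        Map<String, String> response = new LinkedHashMap<>();
        for (Map.Entry<String, CompletableFuture<String>> entry : pending.entrySet()) {
            // join() blocks until that particular call has finished.
            response.put(entry.getKey(), entry.getValue().join());
        }
        return response;
    }

    public static void main(String[] args) {
        // Stand-in for an HTTP GET against the existing API (assumption).
        Function<String, String> fakeRestCall = path -> "{\"from\": \"" + path + "\"}";

        Map<String, String> combined = fetchAll(
                Arrays.asList("/email/15fb6cb3b627304f", "/contacts/15fb6cb3b627304f"),
                fakeRestCall);
        System.out.println(combined);
    }
}
```

With this shape, the total latency is roughly that of the slowest underlying request rather than the sum of all of them, but each underlying endpoint still does its own full lookup.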
The second model adds a GraphQL endpoint directly on our backend server, using the graphql-java library. This endpoint either programmatically requests information from the existing endpoints, or does the same work they do to fetch the data from our backend datastore.
The main argument in favor of the proxy approach was that it was completely separate from our existing stack. Our existing endpoints were well-tested, we were confident that we knew how to monitor them, and we knew that they correctly enforced permissions.
The main arguments for the integrated approach center around development and runtime efficiency:
- Thinking about development time, our existing API uses GSON to serialize data model objects into responses. By reusing the annotation and type information from GSON, we don’t have to duplicate our data model schema when creating our GraphQL endpoints.
- Then at runtime, many of our existing API endpoints have to fetch multiple objects in order to provide their results. For instance, when fetching a contact, we also have to fetch the team that contact belongs to in order to make sure the current user has permission to view the contact. If a GraphQL query wants both the contact and its team, we shouldn’t need to fetch the team once for the GraphQL query and then again to check permissions for the contact.
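The runtime argument boils down to sharing fetched objects within a single request. One way to picture it is a small request-scoped loader that remembers what it has already read, so the permission check and the query resolver hit the datastore only once between them. This is a hedged sketch: the class, its methods, and the fake datastore function are hypothetical, it is not thread-safe, and it is not a description of how graphql-java or our backend actually implements this.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration: a per-request cache in front of the datastore.
// Create one instance per incoming request and discard it afterwards.
class RequestScopedLoaderSketch {
    private final Function<String, String> datastoreFetch;
    private final Map<String, String> cache = new HashMap<>();
    private int reads = 0;

    RequestScopedLoaderSketch(Function<String, String> datastoreFetch) {
        this.datastoreFetch = datastoreFetch;
    }

    // Returns the object for `id`, reading from the datastore only the
    // first time that id is requested during this request.
    String load(String id) {
        if (!cache.containsKey(id)) {
            reads++;
            cache.put(id, datastoreFetch.apply(id));
        }
        return cache.get(id);
    }

    // Number of datastore reads actually performed so far.
    int readCount() {
        return reads;
    }

    public static void main(String[] args) {
        // Stand-in for a real datastore lookup (assumption).
        RequestScopedLoaderSketch teams =
                new RequestScopedLoaderSketch(id -> "Team(" + id + ")");

        teams.load("team-1"); // permission check for the contact
        teams.load("team-1"); // the query also asks for the contact's team

        System.out.println(teams.readCount()); // prints 1
    }
}
```

A proxy in front of separate REST endpoints cannot easily share this kind of cache, because each endpoint handles its request in isolation.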
Since our whole purpose of using GraphQL was to make our Add-ons more efficient, we decided to pursue the integrated option. After some fun with Maven and getting Java 8 deployed on App Engine, we got a proof of concept working.
The good news: requests were batched and the request took about half as long as the previous serial requests.
The bad news: half as long as the previous serial requests was still roughly six seconds longer than we were looking for.
Hopeful but with a ways to go, we dove into the exciting world of GraphQL layering and instrumentation to track down the lingering performance issues.
We’ll talk more about our GraphQL experience next week, and in future blog posts delve into the build infrastructure we developed to build the Add-on itself and how we used Google Cloud Spanner to enhance our backend performance.