Things We’ve Learned While Working with GraphQL

Meir Levi
Published in Fiverr Tech
Dec 29, 2021

GraphQL is a powerful query language for APIs, as well as a runtime for fulfilling those queries with your existing data. It’s an elegant approach that solves many problems typically found with REST APIs.

This article revolves around how our journey with GraphQL began, what challenges we faced (and are facing), and the set of guidelines that drove us each time to a better, faster, more decoupled and resilient solution.

Microservices

Fiverr’s architecture is based on microservices. Each team is responsible for its domain services, such as order service, gig service, etc. Those services publish and consume events that allow each service to build a unique, optimized dataview for presentation.

Read more about the microservice evolution of Fiverr.

In some cases, this approach of owning a dedicated dataview for a specific page can cause several problems:

  • Redundant consumption of Kafka events.
  • Duplication of the same data in several dataviews.
  • Replays of Kafka events to consume new data that you need, which overload the production database.

For example, in most cases, everyone defines the user with the same structured fields, so the dataviews defined for each service look essentially the same.

We then started considering a new architecture, a hybrid that will include two approaches:

Centralized microservice: serves other microservices by returning flattened data for a specific entity.

  1. No more duplication of data.
  2. No more consuming Kafka events in your domain.

Dedicated microservice: works as in our old design and keeps its own dataview, combining a set of mixed data in one optimized dataview for better performance.

Each group can pick the methodology that best fits its feature's use case, and can combine the two.

GraphQL

Once we began combining centralized and dedicated services, GraphQL entered the picture. Our business is continually developing, and to be more adaptable we want our backend and our clients to evolve independently. GraphQL is a great fit for this: with centralized services, we can expose a subgraph for each centralized service and build a supergraph that aggregates all the subgraphs under it. Furthermore, GraphQL gives us the Schema Definition Language (SDL), which describes the graph data model. It specifies what types of data are available and the relationships among them, creating a contract between the server and the client. In this way, we get a structured, well-defined response with specific types, and a response that won’t break our mobile clients.

MVP

We began the MVP (Minimum Viable Product) in the mobile team, building a single query for a specific screen. We chose the “manage orders” screen query, assuming that we would reuse a similar query and extend it to other screens in the app. That would demonstrate the independence GraphQL gives clients: they can extend a query to other screens without needing backend resources.

Define the schema

As mentioned previously, at Fiverr each team owns a domain. We had to define the GraphQL schema before we could start developing it. We set up meetings with all the stakeholders related to the order entity, defined the fields and their types, and created scalars to yield a well-defined entity that would serve us. This step took a lot of time; there were many follow-ups before we reached a final schema that was acceptable to all the stakeholders. We then created a dedicated repository on GitHub that holds every new schema PR we open. This lets the stakeholders see changes to an existing or new schema and give feedback on its structure before we move to the development step.

Our action items from the schema planning were the following:

  1. If you are going to start this journey of defining a schema between a lot of stakeholders, you need to be focused, and you need to cut and make quick decisions to move fast to the next steps. Otherwise, you will get stuck in an infinite loop of meetings.
  2. You should define the entity as required for your current use case. There is no need to define the entire entity schema at the beginning if you are not going to implement it. You can always expand your entity as needed in the future.
  3. Designing an entity in a way that it can be expanded in the future is a must. For example, if you want to expose the buyer ID, don’t expose a scalar field that holds a user ID. The right solution is to define a buyer field of type “User”, and the User entity should have the ID in it. In that way, you give yourself the ability to expand your schema without breaking it.
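
The third point can be sketched in TypeScript. The type names and fields below (`Order`, `User`, `buyer`, `username`) are illustrative assumptions rather than our actual schema; the idea is only that a client reading `buyer.id` keeps working when `User` grows.

```typescript
// Hypothetical SDL for illustration only (not the real schema).
// The fragile alternative would be a scalar field such as "buyerId: ID!",
// which cannot grow without adding more sibling fields to Order.
const extensibleSdl = `
  type User {
    id: ID!
  }

  type Order {
    id: ID!
    buyer: User!
  }
`;

// TypeScript shapes matching the SDL above.
interface User {
  id: string;
  // Added in a later schema version without touching Order:
  username?: string;
}

interface Order {
  id: string;
  buyer: User;
}

// A client written against the first schema version only reads buyer.id.
function buyerIdOf(order: Order): string {
  return order.buyer.id;
}

// Responses before and after User was expanded.
const orderBefore: Order = { id: "o1", buyer: { id: "u1" } };
const orderAfter: Order = { id: "o1", buyer: { id: "u1", username: "some-buyer" } };

console.log(extensibleSdl.trim());
console.log(buyerIdOf(orderBefore), buyerIdOf(orderAfter)); // both calls return "u1"
```

The old client code is untouched by the new `username` field, which is the property we wanted from the schema design.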

Code vs. Schema first

There are two methodologies for working with GraphQL: code-first and schema-first. Our mobile API gateway is written in TypeScript using NestJS, which supports both approaches and gives you the freedom to choose how you work. We chose to implement our gateway with the code-first approach.

In code-first, you use decorators and TypeScript classes to generate the corresponding GraphQL schema. This approach is useful if you prefer to work exclusively with TypeScript and avoid context switching between language syntaxes. For our gateway, code-first was the better fit: we found no schema-first feature that code-first lacks, and in our experience it takes less effort, because it depends on less additional tooling than the schema-first approach.

Once we finished writing our class model, we checked that the schema generated from it was equal to the schema defined in the dedicated repository that holds all the GraphQL schemas.
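
A check like this can be sketched as a plain string comparison. The snippet below is a deliberately naive illustration, not the tool we used: `normalizeSdl` and `schemasMatch` are hypothetical helpers, the sample schemas are made up, and a production check would more likely compare parsed schemas so that field and type ordering do not matter.

```typescript
// Naive SDL normalizer: strips "#" line comments and ignores whitespace.
// Limitations: it does not reorder types or fields, and it would mangle
// a "#" that appears inside a string literal or description.
function normalizeSdl(sdl: string): string {
  return sdl
    .split("\n")
    .map((line) => line.replace(/#.*$/, "")) // drop line comments
    .join(" ")
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .replace(/\s*([{}():!,\[\]])\s*/g, "$1") // ignore spacing around punctuation
    .trim();
}

// True when the generated schema and the agreed schema are textually
// equivalent after normalization.
function schemasMatch(generated: string, agreed: string): boolean {
  return normalizeSdl(generated) === normalizeSdl(agreed);
}

// Hypothetical output of the code-first schema generator.
const generatedSdl = `
type User {
  id: ID!
}

type Order {
  id: ID!
  buyer: User!
}
`;

// Hypothetical schema file from the shared schema repository.
const agreedSdl = `
# Agreed with the order stakeholders
type User { id: ID! }
type Order {
  id: ID!      # order identifier
  buyer: User!
}
`;

// A drifted version that exposes a scalar instead of the User type.
const driftedSdl = `
type User { id: ID! }
type Order { id: ID! buyerId: ID! }
`;

console.log(schemasMatch(generatedSdl, agreedSdl)); // true
console.log(schemasMatch(generatedSdl, driftedSdl)); // false
```

Running a comparison like this in CI is one way to catch drift between the class model and the agreed schema before it reaches clients.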

Since we also have microservices written in Kotlin and other languages, we don’t force the approach of code/schema first. Each team can and should choose the best approach for them, based on the tech stack limitations in their services.

Monitoring

Monitoring a production environment of any system requires inspecting many statistics. As a backend developer, I would like to be able to determine the health of my service by classifying and monitoring errors programmatically.

At Fiverr we are used to working with REST, which feeds our monitoring and alerting tools with plenty of statistics and lets us know the health of a service in detail: bad status code responses, response times, request counts, etc.

With GraphQL, the story is different. A GraphQL service exposes only one route that can be queried.

For example:

fiverr.com/graphql

This route can handle all the queries you defined in your schema. Since it has only one route, we need to find a better way to monitor our queries, their response time, and their errors in a way that will give us a clear view of our service health. It’s unacceptable to go live to production with GraphQL knowing that we have lost our monitoring ability.

So what did we do? We gave each query a unique name that represents the screen it serves. In GraphQL this is called the operationName. Once we have an operation name per query, we can use an Apollo plugin to monitor our queries.
Plugins are JavaScript objects that implement one or more functions that respond to events. This enabled us to do the following:

  1. Know when the query started and when it finished, and report the response time per operationName.
  2. Report the number of calls per operationName.
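
The two points above can be sketched roughly as follows. This is a minimal sketch, not our production plugin: the interfaces are declared locally and only mirror the shape of Apollo Server's `requestDidStart` and `willSendResponse` hooks so that the example runs without dependencies, and the `Metrics` sink and the metric names are assumptions made for illustration.

```typescript
// Local stand-ins that mirror the shape of Apollo Server's plugin hooks.
// In a real gateway these types would come from Apollo's own typings.
interface RequestContext {
  request: { operationName?: string | null };
}

interface RequestListener {
  willSendResponse(ctx: RequestContext): Promise<void>;
}

interface Plugin {
  requestDidStart(ctx: RequestContext): Promise<RequestListener>;
}

// Hypothetical metrics sink; stands in for the client of your monitoring system.
interface Metrics {
  increment(name: string): void;
  timing(name: string, ms: number): void;
}

// Builds a plugin that counts calls and measures response time per operationName.
// The clock is injectable so the behavior can be checked deterministically.
function monitoringPlugin(metrics: Metrics, now: () => number = Date.now): Plugin {
  return {
    async requestDidStart(ctx) {
      const operation = ctx.request.operationName ?? "anonymous";
      const startedAt = now();
      metrics.increment(`graphql.${operation}.calls`); // assumed metric name

      return {
        async willSendResponse() {
          // Called right before the response is sent: report the elapsed time.
          metrics.timing(`graphql.${operation}.response_time`, now() - startedAt);
        },
      };
    },
  };
}

// Simulates one request with a fake clock and an in-memory sink.
async function demoMonitoring(): Promise<string[]> {
  const recorded: string[] = [];
  const metrics: Metrics = {
    increment: (name) => {
      recorded.push(name);
    },
    timing: (name, ms) => {
      recorded.push(`${name}=${ms}`);
    },
  };

  let clock = 0;
  const plugin = monitoringPlugin(metrics, () => clock);
  const ctx: RequestContext = { request: { operationName: "ManageOrders" } };

  const listener = await plugin.requestDidStart(ctx);
  clock = 42; // pretend the query took 42 ms
  await listener.willSendResponse(ctx);
  return recorded;
}

demoMonitoring().then((recorded) => console.log(recorded));
```

Because every metric is keyed by the operation name, the single `/graphql` route can be broken down per screen in the dashboards again.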

You can read this article on how we actually monitor it with code examples.

Figure: query response time by operation name
Figure: number of calls per operation name

Error handling

Another point in GraphQL is that over HTTP you will typically get a 200 status code response, even on failures. As mentioned before, we can’t lose the ability to determine the health of our service. So we had to figure out what to do to maintain a full overview of our domain.

You can modify and implement error handling in GraphQL so the errors the client receives will have your error format. But it’s not enough: it gives the client a better structure of error formatting, but we still need to know that there was an error in order to report it to our monitoring system on the server side with the real status code error.

We use the Apollo plugin to report the relevant errors to Graphite.
We extract the relevant status code from the errors array in the GraphQL response and report it together with the operationName that produced the error.
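
The extraction step might look something like the sketch below. The `message` and `extensions` keys are part of the standard GraphQL error shape, but where the status code lives inside `extensions` (here `statusCode`), the fallback to 500, and the metric naming are all assumptions made for this example.

```typescript
// One entry of the "errors" array in a GraphQL response.
// The "statusCode" key inside extensions is an assumption for this sketch.
interface GraphQLErrorLike {
  message: string;
  extensions?: { statusCode?: number; [key: string]: unknown };
}

// Returns the status code to report for each error.
// Errors that carry no code are reported as 500 (an assumed default).
function statusCodesOf(errors: readonly GraphQLErrorLike[] | undefined): number[] {
  return (errors ?? []).map((error) => error.extensions?.statusCode ?? 500);
}

// Builds one metric name per error, keyed by operation name and status code.
// The "graphql.<operation>.status.<code>" format is hypothetical.
function errorMetricNames(
  operationName: string | null | undefined,
  errors: readonly GraphQLErrorLike[] | undefined,
): string[] {
  const operation = operationName ?? "anonymous";
  return statusCodesOf(errors).map((code) => `graphql.${operation}.status.${code}`);
}

// A response that came back as HTTP 200 but contains two failures.
const sampleErrors: GraphQLErrorLike[] = [
  { message: "Order not found", extensions: { statusCode: 404 } },
  { message: "Unexpected failure" }, // no code attached
];

console.log(errorMetricNames("ManageOrders", sampleErrors));
// [ 'graphql.ManageOrders.status.404', 'graphql.ManageOrders.status.500' ]
```

In a plugin, a function like `errorMetricNames` would be called from the hook that fires when a request encounters errors, and each name would be sent to the metrics backend as a counter.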

Figure: real status codes by operation name

Federation

We started our MVP as a simple GraphQL service that holds all the schemas in it with specific resolvers per entity.

In the next articles, we will share more about our journey with federated GraphQL, explaining how we expanded it into a supergraph using subgraph schemas from other microservices, as well as how we joined forces with other teams and moved them forward to GraphQL.

Thanks to Matan Gilad, Nir Gazit and Lior Iluz for helping push this task forward.

Fiverr is hiring in Tel Aviv and Kyiv. Learn more about us here.
