GraphQL — Case Study : KASKUS Groups

Karol Danutama

Published in gdplabs, Sep 30, 2018

Our Development Experience on using GraphQL

Introduction

In web development, building an Application Programming Interface (API) with Representational State Transfer (REST) is common practice. However, since REST is an architectural style rather than a specification, implementations can differ from platform to platform. Clients also need to learn every endpoint and its parameters, which complicates work for the development teams. The number of requests and the bandwidth consumed by REST-style APIs are also a concern.

Facebook developed GraphQL internally in 2012. Its motivation was to build a data-fetching API powerful enough to describe all of Facebook, yet simple enough to be learned and used by its developer teams. Unlike REST, GraphQL is a standardized language: a specification that creates a strong contract between client and server. Its strongly typed schema lets the client dictate exactly what data it needs. This addresses the problem of data over-fetching, which reduces the number of requests and the bandwidth used.

KASKUS Groups

KASKUS Groups, a product of KASKUS, is a fun and friendly community platform for passionate young souls who want to find new friends or authentic information. KASKUS Groups consists of Group objects, each of which belongs to a specific Category. A user who has joined a Group can send Post objects to it and can view the posts on the Group page. Posts are displayed in an infinite-scroll feed and are lazily loaded: the next posts are fetched when the user is about to reach the bottom of the scroll area. Each post may have one or more comments, which are also lazily loaded. Semantically, a comment uses the Post data structure as well. The only difference is that a comment is a Post with a parentPostId.
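To make the post/comment relationship concrete, here is a minimal plain-Java sketch. Only parentPostId comes from the description above; the class name, the other fields and the isComment() helper are assumptions for illustration, not the actual KASKUS Groups model.

```java
public class PostSketch {
    /** A post or a comment. Field names other than parentPostId are hypothetical. */
    public static class Post {
        public final String id;
        public final String content;
        public final String parentPostId; // null for a top-level post

        public Post(String id, String content, String parentPostId) {
            this.id = id;
            this.content = content;
            this.parentPostId = parentPostId;
        }

        /** A comment is simply a Post that points at a parent post. */
        public boolean isComment() {
            return parentPostId != null;
        }
    }

    public static void main(String[] args) {
        Post post = new Post("p1", "Hello group", null);
        Post comment = new Post("p2", "Welcome!", post.id);
        System.out.println(post.isComment());    // false
        System.out.println(comment.isComment()); // true
    }
}
```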

Why GraphQL

Take a look at the single post card above. The information we should show to users is:

post.creator shown as the original author.
post.content shown as the brief content of the post.
post.commentsCount
post.likesCount
post.comments

Fetching a single post is done in a single request on GraphQL:

GraphQL at a Glance

Unlike REST, which is an architectural style, GraphQL is a contract: a query language that has been formally defined to ensure consistency. GraphQL is self-documenting, since all queries and types are defined in a schema. The schema is then used as the contract during development.

We will only cover the concepts needed for this article. The GraphQL learning site provides deeper coverage.

Schema

Each query sent to the server must be defined in the schema. The mental model of GraphQL is to select a certain set of fields from an object. The query sample above means: select the field name from the me object. Each field must have a type.

Example of a GraphQL schema

Schema above suggests:

Field name is a string that is not nullable. Non-nullability is denoted by the bang (!) character.
Field posts is a list of Post objects, also not nullable.
Field activity takes an argument, which means the result depends on the value of the period argument. The period argument itself is optional, since it has no bang character. If it is not provided, the default value ONE_MONTH is applied.
Field referrer is a nullable field.

A complete query and response example follows. Note that a query must list all requested fields explicitly.

Querying for more fields

Response with requested fields

Query

As the name suggests, a query is a message to the server requesting certain data. The language itself loosely resembles JSON.

Example of a GraphQL query

Example of a GraphQL response

Mutation

Unlike a query, a mutation is used to change data.

Mutation definition on a schema

Mutation response

Development Experience

Here are some highlights from our experience developing a web application with GraphQL. They cover both the benefits and the drawbacks we have encountered so far.

Design by Contract

We use the Spring Boot GraphQL Starter library for our backend implementation. The library requires a strict schema definition for the code to compile and run successfully. The schema is expressed in a number of .graphqls files located in the project resources. The library then matches the schema against the resolver, query and mutation implementations. Because it is strict, even the slightest difference in naming or type definition causes a runtime error when the application boots. This seemed annoying at first, but it turned out to be a blessing in disguise.

Our schema snippet

Java implementation on top of the Spring Boot GraphQL Starter library. The absence of an isJoined(ChatRoom) method will cause a runtime error.

The strictness of the implementation forces developers to comply fully with the contract. Fulfilling the contract is part of implementing it, so there is no separate effort for implementation and for maintaining documentation. As a result, the final implementation is guaranteed to match the schema. This also benefits the frontend team: in every development cycle, we only need to follow the contract defined earlier.
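The fail-fast behaviour described above can be illustrated with a small reflection check. This is a sketch of the idea only, not how the Spring Boot GraphQL Starter library is actually implemented: the ContractCheck class, the verify and missingFields helpers, the memberCount field and the String parameter of isJoined are all assumptions made for the example.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ContractCheck {
    /** Hypothetical resolver: one public method per schema field. */
    public static class ChatRoomResolver {
        public boolean isJoined(String chatRoomId) {
            return true;
        }
    }

    /** Returns the schema fields that have no method of the same name on the resolver. */
    public static List<String> missingFields(Class<?> resolver, List<String> schemaFields) {
        List<String> missing = new ArrayList<>();
        for (String field : schemaFields) {
            boolean found = false;
            for (Method method : resolver.getDeclaredMethods()) {
                if (method.getName().equals(field)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                missing.add(field);
            }
        }
        return missing;
    }

    /** Fails fast, as a startup check would, when the contract is not fulfilled. */
    public static void verify(Class<?> resolver, List<String> schemaFields) {
        List<String> missing = missingFields(resolver, schemaFields);
        if (!missing.isEmpty()) {
            throw new IllegalStateException("No resolver method for schema fields: " + missing);
        }
    }

    public static void main(String[] args) {
        // Passes: the resolver covers every field listed in the (hypothetical) schema.
        verify(ChatRoomResolver.class, Arrays.asList("isJoined"));
        // Reports the field the resolver does not implement.
        System.out.println(missingFields(ChatRoomResolver.class,
                Arrays.asList("isJoined", "memberCount"))); // [memberCount]
    }
}
```

Checking the contract once at boot time means a mismatch between schema and code surfaces before any client request is served.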

GraphiQL also helps us read the documentation. We enable it by default in our development environment. What GraphiQL shows is always in sync with the schema defined in the .graphqls files. We can also run queries in its web interface to test things quickly.

Separation of Concerns

During development there were occasions where we needed to add fields to a response object for pragmatic frontend reasons. For example, we have the following Group database schema:

Group:
ID:  String
Name:  String
Description:  String
Type:  Enum[PRIVATE, PUBLIC]
…

Initially we implemented the Group type as a mirror of its database schema:

Group type schema

However, the web client needs to know the current user's membership in a group to decide which ACL (Access Control List) to apply when the user visits that group. We could have created another query, say getGroupMembership(groupID, userID), but we judged that it would complicate the client implementation and expose an unnecessary security hole. Instead, we decided to add one field, membership: GroupMembership!, to the returned Group object. The value of this field differs for each user requesting the group object.

Adding this field does not require any database schema change. We only need to add a method to the resolver that resolves the group membership field. This shows a clear separation of concerns:

The resolver is responsible for resolving the data required by the client,
The database entity is responsible for persisting the data, and
Changes in the resolver do not require changes to the entity implementation.

GroupResolver with a method to resolve the membership value. Spring Boot Starter GraphQL automatically scans resolver classes to find a method matching the field name.
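A minimal plain-Java sketch of this separation, written without the Spring Boot GraphQL Starter library so that it runs on its own. The GroupMembership values (MEMBER, NON_MEMBER), the in-memory member map and the way the current user ID is passed in are assumptions for illustration; in the real application the user would come from the request context.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class GroupMembershipSketch {
    public enum GroupType { PRIVATE, PUBLIC }

    /** Hypothetical values; the real GroupMembership type may differ. */
    public enum GroupMembership { MEMBER, NON_MEMBER }

    /** Entity mirroring the database schema. It has no membership field. */
    public static class Group {
        public final String id;
        public final String name;
        public final GroupType type;

        public Group(String id, String name, GroupType type) {
            this.id = id;
            this.name = name;
            this.type = type;
        }
    }

    /** Resolver: computes the per-user membership without touching the entity. */
    public static class GroupResolver {
        private final Map<String, Set<String>> memberIdsByGroupId;

        public GroupResolver(Map<String, Set<String>> memberIdsByGroupId) {
            this.memberIdsByGroupId = memberIdsByGroupId;
        }

        /** Method name matches the schema field "membership". */
        public GroupMembership membership(Group group, String currentUserId) {
            Set<String> members =
                    memberIdsByGroupId.getOrDefault(group.id, Collections.<String>emptySet());
            return members.contains(currentUserId)
                    ? GroupMembership.MEMBER
                    : GroupMembership.NON_MEMBER;
        }
    }

    public static void main(String[] args) {
        Map<String, Set<String>> members = new HashMap<>();
        members.put("g1", new HashSet<>(Collections.singletonList("alice")));
        GroupResolver resolver = new GroupResolver(members);
        Group group = new Group("g1", "Photography", GroupType.PUBLIC);

        System.out.println(resolver.membership(group, "alice")); // MEMBER
        System.out.println(resolver.membership(group, "bob"));   // NON_MEMBER
    }
}
```

The same Group entity yields a different membership value per requesting user, and the entity class never changes.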

Under-fetching/Over-fetching and Backward Compatibility

This is the most straightforward benefit, since solving this problem was one of the main ideas behind GraphQL. The client can request exactly the fields it needs, which improves performance. The practical difference from RESTful development is illustrated by the following scenario:

The client needs field X of type T for a new UI, but it is not yet available in the backend implementation.
The backend team adds field X to the response by adding a new resolver and a new field in the schema.
The client experiments with the new UI, but the result is not satisfactory. Field X is no longer needed, as the frontend team reverts to the previous version and keeps iterating on the new UI experiment.
The client drops field X from the request. The backend team decides to keep field X in case it is needed later.

In RESTful development, the backend team would have to remove field X in the final step, because it provides no value to the client and only causes over-fetching. With GraphQL, the field can stay in the schema at no cost to clients that do not request it. This fetching flexibility also supports backward compatibility: in the scenario above, once the new field is no longer needed, the client simply requests only the fields it requires and experiences no over-fetching.
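The scenario can be sketched in plain Java by modelling each field as a lazily evaluated resolver. This is only an illustration of the selection idea, not how a GraphQL server is implemented; the field names, the sample values and the select helper are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class FieldSelectionSketch {
    /** Names of the fields whose resolvers were actually invoked. */
    public static final List<String> invoked = new ArrayList<>();

    /** Hypothetical resolvers for a post, including the experimental field "x". */
    public static Map<String, Supplier<Object>> postResolvers() {
        Map<String, Supplier<Object>> resolvers = new LinkedHashMap<>();
        resolvers.put("content", () -> { invoked.add("content"); return "Hello group"; });
        resolvers.put("likesCount", () -> { invoked.add("likesCount"); return 3; });
        resolvers.put("x", () -> { invoked.add("x"); return "experimental value"; });
        return resolvers;
    }

    /** Resolves only the requested fields; unrequested resolvers never run. */
    public static Map<String, Object> select(Map<String, Supplier<Object>> resolvers,
                                             List<String> requested) {
        Map<String, Object> response = new LinkedHashMap<>();
        for (String field : requested) {
            Supplier<Object> resolver = resolvers.get(field);
            if (resolver == null) {
                throw new IllegalArgumentException("Field not in schema: " + field);
            }
            response.put(field, resolver.get());
        }
        return response;
    }

    public static void main(String[] args) {
        // The client has dropped field "x" from its request; the backend still defines it.
        Map<String, Object> response =
                select(postResolvers(), Arrays.asList("content", "likesCount"));
        System.out.println(response); // {content=Hello group, likesCount=3}
        System.out.println(invoked);  // [content, likesCount]
    }
}
```

Field x stays available for any client that asks for it, while clients that do not ask for it pay neither the bandwidth nor the resolution cost.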

Caching

This is an area where we need further exploration. Our GraphQL endpoint is implemented with the HTTP POST verb, and there is only one HTTP endpoint for all queries and mutations. This constraint prevents us from using HTTP-level caching such as Varnish. Our workaround is application-level caching with Spring Boot's cache support. We are still watching for improvements or new support for HTTP-level caching.

Database Query Optimization

As I was about to publish this article, we stumbled upon a challenge. Consider the following scenario. There are 2 groups [A, B] and 5 posts [1, 2, 3, 4, 5]. Posts 1 and 2 belong to group A. Posts 3, 4 and 5 belong to group B. Let's say the client queries as follows:

The result will be:

Note that the result contains repeated group objects. It turned out that the server queries the database for a group object once per post, that is, 5 times. We found that this can be optimized by caching the group query result, which saves database I/O operations. We used the same caching mechanism described above for this. The caching reduced response time by 70%.

Although caching proved effective, I personally think this approach does not scale well, since we have to set up caching manually for every cacheable field. We are looking for a tool that achieves the same with less boilerplate. Facebook's DataLoader might provide it, since it is often used in GraphQL implementations. We are currently investigating the possibility.
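The effect of the repeated lookups can be reproduced with a small plain-Java sketch that counts database calls for the scenario above. The in-memory repository and the HashMap cache are stand-ins chosen for illustration; our actual implementation relies on Spring Boot cache support, which is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupLookupCacheSketch {
    /** Hypothetical repository that counts how often the "database" is hit. */
    public static class FakeGroupRepository {
        private int queryCount = 0;

        public String findNameById(String groupId) {
            queryCount++; // stands in for one database round trip
            return "Group " + groupId;
        }

        public int getQueryCount() {
            return queryCount;
        }
    }

    /** One group lookup per post: 5 posts cause 5 database queries. */
    public static List<String> resolveWithoutCache(List<String> groupIdPerPost,
                                                   FakeGroupRepository repository) {
        List<String> groups = new ArrayList<>();
        for (String groupId : groupIdPerPost) {
            groups.add(repository.findNameById(groupId));
        }
        return groups;
    }

    /** Each distinct group is loaded once and then served from the cache. */
    public static List<String> resolveWithCache(List<String> groupIdPerPost,
                                                FakeGroupRepository repository) {
        Map<String, String> cache = new HashMap<>();
        List<String> groups = new ArrayList<>();
        for (String groupId : groupIdPerPost) {
            groups.add(cache.computeIfAbsent(groupId, repository::findNameById));
        }
        return groups;
    }

    public static void main(String[] args) {
        // Posts 1 and 2 belong to group A; posts 3, 4 and 5 belong to group B.
        List<String> groupIdPerPost = Arrays.asList("A", "A", "B", "B", "B");

        FakeGroupRepository uncached = new FakeGroupRepository();
        resolveWithoutCache(groupIdPerPost, uncached);
        System.out.println(uncached.getQueryCount()); // 5

        FakeGroupRepository cached = new FakeGroupRepository();
        resolveWithCache(groupIdPerPost, cached);
        System.out.println(cached.getQueryCount()); // 2
    }
}
```

Both variants return the same five group values; only the number of database round trips differs.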

Conclusion

Using GraphQL gave us a new experience in building web-based applications. We consider flexibility, backward compatibility and maintainability to be the significant benefits we gained. However, we still have to take extra steps to ensure performance and scalability. This drawback is understandable, since GraphQL is a young technology. It will be interesting to see what GraphQL is capable of in the next 5 years. If our journey appeals to you, we are hiring 1000 great software engineers in five major cities in Indonesia: Bali, Bandung, Jakarta, Surabaya and Yogyakarta.