What is GraphQL and why did Facebook feel the need to build it?

Published in The DevOps Corner, May 11, 2021 (14 min read)

This article is part 1 of the series on Exploring GraphQL. Check out the other articles from the series:
Part 2: Fundamentals of GraphQL
Part 3: Building a GraphQL Server using NodeJS and Express

In this part, we will look at the older ways of fetching resources from a server, SOAP and REST, and examine the problems with these approaches. We will see why Facebook felt the need to build GraphQL while developing the News Feed section of its iOS application, and how GraphQL solves some of the common problems of the REST architecture. We will then dive deep into the type system adopted by the GraphQL specification.

The World Wide Web was invented in the early 90s. It was just a pile of HTML pages back then. But the idea that a resource could be transferred over the wire from a server to a client was amazing! There was a lot of enthusiasm among people to try out this new feature. As people started experimenting with the web, they wanted to do a lot more than just browse static pages. Animations, scripting, and graphics were improved to provide a better experience. Over time, the web evolved into the largest platform in the history of computing, with around 5 billion devices connected to it.

Initially, web services commonly fetched resources using SOAP (Simple Object Access Protocol). It is a messaging protocol specification for exchanging structured data in the form of an XML Information Set. It uses application layer protocols such as HTTP or SMTP for transferring messages between web services. SOAP was doing great and was being used in some major projects. However, its verbose XML messages did not perform well on devices with little memory, little processing power, and limited bandwidth. There was a need to build something better!

Representational State Transfer (REST)

Roy Fielding developed the Representational State Transfer (REST) architectural style in the late 90s and described it in his doctoral dissertation in 2000. REST was designed to increase the efficiency of communication systems.

Representational state transfer (REST) is a programming architectural implementation intended to increase the efficiency of communication in computing systems. It embodies the idea that the best way to share large amounts of data between multiple parties is to make that data available on-demand by sharing references to that data rather than a complete copy of the data itself. Systems which implement REST are called ‘RESTful’ systems. (Source: Simple English Wikipedia)

REST aimed to provide a simple and flexible model for data transfer in the form of XML or JSON. It was adopted by the community worldwide. But as the applications became more complex, people started facing problems with the REST architectural style. Let’s consider one example to understand this more.

Let’s build a Movie Review System. The product manager has stated the requirements for building this application as below:

1. It will have two views: Movies Listing and Movie Details.
2. The Movies Listing view will display the name, poster and average rating of each movie.
3. The Movie Details view will show the list of reviews for a particular movie.

The resources required for building the Movies Listing view are Movies and Ratings. We will have to make n+1 calls to the server (where n is the number of movies): one call for the list of movies and one call per movie for its rating. The movies API /v1/movies should send the response as below:

{
    "data": [
        {
            id: 1,
            name: "Movie A",
            poster: "/movie-a",
            rating_id: 101,
            viewed: "1000",
            recommended: "200",
            review_id: 201,
            cast: {
                director: 'Blah Blah',
                // ...
            },
            summary_id: 301
        },
        {
            id: 2,
            name: "Movie B",
            poster: "/movie-b",
            rating_id: 102,
            viewed: "2000",
            recommended: "400",
            review_id: 202,
            cast: {
                director: 'Blah Blah',
                // ...
            },
            summary_id: 301
        }
    ]
}

The front-end developer uses id, name, poster and rating_id from the above API. He is strict about his coding standards, so he renames some of the keys on the front-end to follow the JavaScript naming convention: rating_id becomes ratingId. He then traverses the entire list and makes an API call to fetch the rating for each of these movies:

/v1/movies/ratings/{rating_id}

And gets the response as:

{
    data: [
        {
            rating_id: 101,
            rating: 2,
            //... some more garbage data as always
        }
    ]
}

He simply ignores the rest of the data and displays the value of rating.

Please note: We have to make separate calls for fetching movies and ratings as these are two different resources and should ideally be in different models in a Database Schema. The developer can hack around this problem by building a custom endpoint that would return the result in one API call. However, this approach is not scalable and is not the ideal way to solve a problem. If a requirement changes, we will have to modify the implementation of the custom endpoint to handle various use-cases.
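The n+1 pattern described above can be sketched roughly as follows. This is only an illustration, not a real client: fakeGet is a hypothetical stand-in for an HTTP client that records the URLs it is asked for, no network is involved, and the rating value for Movie B is made-up sample data.

```javascript
// Hypothetical in-memory stand-in for the REST API; no network is involved.
const requestedUrls = [];

const movies = [
    { id: 1, name: "Movie A", poster: "/movie-a", rating_id: 101 },
    { id: 2, name: "Movie B", poster: "/movie-b", rating_id: 102 },
];
// Sample ratings keyed by rating_id (the value for 102 is invented).
const ratings = {
    101: { rating_id: 101, rating: 2 },
    102: { rating_id: 102, rating: 4 },
};

// Records the URL and returns canned data, mimicking a GET request.
function fakeGet(url) {
    requestedUrls.push(url);
    if (url === "/v1/movies") {
        return { data: movies };
    }
    const ratingId = url.split("/").pop();
    return { data: [ratings[ratingId]] };
}

// Builds the Movies Listing view: one call for the list, then one call per movie.
function loadMovieListing() {
    const list = fakeGet("/v1/movies").data;
    return list.map((movie) => {
        const response = fakeGet(`/v1/movies/ratings/${movie.rating_id}`);
        return {
            id: movie.id,
            name: movie.name,
            poster: movie.poster,
            rating: response.data[0].rating,
        };
    });
}

const listing = loadMovieListing();
console.log(listing);
console.log(`${requestedUrls.length} requests for ${movies.length} movies`);
```

With two movies the sketch records three requests, and the count keeps growing by one for every extra movie in the list.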

Can we reduce the number of API calls?

Certainly! We can reduce the number of API calls from n+1 to 2. We can batch the ratings API to get the list of ratings for all the movies in one go!

Let’s see how this can be done:

// Endpoint
/v1/movies/ratings
// Request Parameters
{
    ratingIds: [101, 102, 103]
}

The response of the above API would be:

{
    data: [
        {
            rating_id: 101,
            rating: 2,
            //... some more garbage data as always
        },
        {
            rating_id: 102,
            rating: 2,
            //... some more garbage data as always
        }
    ]
}

The reviews page could be implemented in a similar way. It shows the details of the movie and its reviews. We make the following API calls to implement this functionality:

Fetching the details of a movie: /v1/movies/{movieId}
Getting its reviews: /v1/movie/reviews/{movieId}
Getting the number of likes on each review: /v1/movie/reviews/{reviewId}

Just as we were about to ship our product, the product manager came in with another requirement: displaying the number of users who have rated the movie. Although the back-end developer was already sending a lot more information than needed in the ratings API, the number of users was not part of it. So changes were required on the back-end as well as the front-end to incorporate this requirement.

Clearly, there are some problems while fetching data using REST. These problems are listed below:

Multiple Endpoints

REST was designed as a resource fetching mechanism, so different resources are fetched using different endpoints. In our example above, we have different endpoints for movies, ratings and reviews. Front-end developers often end up maintaining the list of these endpoints in a separate file and have to keep that file in sync with the back-end.

Under-Fetching

The design style of REST deals with resources. In general use-cases, a typical functionality is implemented using at least two to three resources. In our case, we used Movies and Ratings resources to implement a simple movie listing page. We have to make two round trips to the server for fetching data of movies and ratings. The number of API calls is dependent on the number of resources being used for developing the user interface. In more complex scenarios, the number of round trips can increase up to 5 or 6 for fetching data for the initial load.

Over-Fetching

As we saw earlier in the /v1/movies API, some of the fields in the response were not used in the application. A lot of unnecessary data is sent across the wire, and this impacts the performance of an application. There is no systematic way of telling back-end developers which fields are required on the front-end. The API simply returns all of the fields that are defined in the resource model.
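To make the cost of over-fetching concrete, here is a small sketch that compares what /v1/movies sends for one movie with what the Movies Listing view reads. The object mirrors the example response from earlier (with the cast shortened), and pick is a hypothetical helper written for this illustration.

```javascript
// One movie as returned by /v1/movies in the example above (cast shortened).
const movieFromApi = {
    id: 1,
    name: "Movie A",
    poster: "/movie-a",
    rating_id: 101,
    viewed: "1000",
    recommended: "200",
    review_id: 201,
    cast: { director: "Blah Blah" },
    summary_id: 301,
};

// Fields the Movies Listing view actually reads.
const fieldsUsedByView = ["id", "name", "poster", "rating_id"];

// Keeps only the listed fields, which is what the front-end ends up doing by hand.
function pick(source, fields) {
    return Object.fromEntries(
        fields.filter((field) => field in source).map((field) => [field, source[field]])
    );
}

const used = pick(movieFromApi, fieldsUsedByView);
const unused = Object.keys(movieFromApi).filter((key) => !fieldsUsedByView.includes(key));

console.log(used);
console.log(`Unused fields: ${unused.join(", ")}`);
```

In this example five of the nine top-level fields travel over the wire without ever being displayed.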

Versioning

The first phase of our Movies Reviews system uses v1 in the API calls. The versioning of APIs is done to avoid any breaking changes in the applications. We will have to use v2 for the second phase of our application to support more complex functionalities. If not for versioning, changes in the initial APIs could tear down the application. Models and Views are tightly coupled.

Let’s say the Android native application for the movie review system uses version v1 for its API calls. The developers make some breaking changes in the API and version them as v2. The app users who have not updated their application will still be using v1 to interact with the back-end. Versioning helps in supporting these older instances of an application.

If we want users to use our new features in the native application, we will have to force them to update the application from the Play Store. This is not a good user experience.

Syncing between back-ends and front-ends

The tight coupling between REST endpoints and views hinders rapid iterative development on the front-end. If a change is made to a view, it is likely that the back-end has to change as well. This kills productivity and makes front-end developers dependent on back-end developers.

Why did Facebook build GraphQL?

You might not have realized that these problems exist with the REST way of fetching resources. But companies like Facebook, Twitter, GitHub, and many more started facing issues with their REST implementations as the complexity of their applications increased manyfold.

The developers at Facebook were finding it difficult to implement the News Feed section for their iOS application. During development, they ran into a lot of bugs because there was no convention regarding the exchange of data between the front-ends and the back-ends. The shape of the data assumed by the front-end was different from the one being sent by the back-end APIs. There was a need to build something better!

Facebook solved this problem by building GraphQL. GraphQL is a statically and strongly typed query language that lets clients declaratively specify their data requirements. The clients specify the shape of the data that they need and the server responds with data in exactly that shape. The client is in control of the view! This removes a whole class of bugs caused by mismatched data shapes. Let’s write our first hello world query in GraphQL:

{
    me {
        name
    }
}

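This looks a lot like JSON without the values. The client asks the server for the name of the currently logged-in user, and the server sends the JSON response as below (real servers wrap this result in a top-level data key, as the GraphiQL responses later in this article show):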

{
    "me": {
        "name": "Tim Berners Lee"
    }
}

The query sent by the client specifies the shape of the data and the server responds back with the exact same shape.
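One way to picture "the response has the same shape as the query" is a function that walks a selection and copies only the requested fields. The sketch below is a toy model and not how a real GraphQL server is implemented: the selection is written as a plain object (a notation assumed for this example, with true marking a leaf field) instead of being parsed from query text, and the id and email fields are invented extras.

```javascript
// Data the server holds for the current user; `id` and `email` are never requested.
const serverData = {
    me: { id: "1", name: "Tim Berners Lee", email: "tim@example.com" },
};

// Selection equivalent to the query `{ me { name } }`.
const selection = { me: { name: true } };

// Copies only the selected fields, recursing into nested selections and lists.
function project(source, selected) {
    if (source == null) {
        return null;
    }
    if (Array.isArray(source)) {
        return source.map((item) => project(item, selected));
    }
    const result = {};
    for (const [field, sub] of Object.entries(selected)) {
        result[field] = sub === true ? source[field] : project(source[field], sub);
    }
    return result;
}

console.log(JSON.stringify(project(serverData, selection)));
// prints {"me":{"name":"Tim Berners Lee"}}
```

Whatever else the server knows about the user, only the fields named in the selection end up in the result.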

This solves our problem of under-fetching and over-fetching. It is not the headache of the client anymore to collect data from different end-points. The client just specifies the shape of the data it needs and then it is the job of the server to get the work done. That’s exactly how we solved one more problem of handling multiple endpoints with the REST style of doing things.

There is only one endpoint in GraphQL and the query is passed as a string to the server. Let’s check out one more query to understand better:

{
    me {
        name,
        posts {
            title,
            body
        }
    }
}

The above query asks the server to send in the name of the currently logged-in user, posts along with their title and body content. The server sends back the response as:

{
    "me": {
        "name": "The Free Radical",
        "posts": [
            {
                "title": "Progressive Web Applications",
                "body": "...."
            },
            {
                "title": "A Beginners guide to getting started with React",
                "body": "..."
            }
        ]
    }
}

The client does not make any assumptions here. It knows the shape of the response in advance. Notice that posts is an array of objects and that each of these objects contains only the title and body, as specified by the client. Let’s try to implement the same use-case using REST and understand the subtle differences between the two:

// Fetching User details
/v1/user
// Fetching posts
/v1/posts

We will have to query two different endpoints to get these resources. Also, the shape of the response is not guaranteed to match what the client expects. The server might send a few extra fields, or send some of the fields with different names or different data types. There is no contract between the client and the server.
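For comparison, sending the GraphQL query as a string to a single endpoint could look like the sketch below. It only builds the request and does not send it. The /graphql path is an assumption (a common convention, not something this article specifies), and the JSON body with a query key follows the widely used GraphQL-over-HTTP convention. buildGraphQLRequest is a hypothetical helper.

```javascript
// Assumed endpoint path; the article does not name one.
const GRAPHQL_ENDPOINT = "/graphql";

const query = `
{
    me {
        name
        posts {
            title
            body
        }
    }
}`;

// Builds the options for a single POST request carrying the query as a string.
function buildGraphQLRequest(queryText) {
    return {
        url: GRAPHQL_ENDPOINT,
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ query: queryText }),
    };
}

const request = buildGraphQLRequest(query);
console.log(request.method, request.url);
console.log(request.body);
```

The resulting object could then be handed to an HTTP client of your choice, for example fetch(request.url, request) in environments that provide fetch. One request replaces the two REST calls listed above.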

GraphQL Type System

GraphQL uses a type system to validate queries sent by the client. The entities are represented in the form of a Schema. The client can request only those fields that are defined in this schema. The type system describes the capabilities of the server. Let’s consider the hello world query again to understand more:

{
    me {
        name
    }
}

me and name are called fields. The content inside { } is called a selection set. This is the structure of the query on the client side. However, before the server executes this query, it validates it to check that every requested field is defined in the schema. The me field returns a user, so its selection set is validated against the User type. Let's see the definition of the User type:

type User {
    id: String!
    name: String!
}

The me field in the above client-side query is of type User. As we can see from its definition, User has two fields, id and name, both of type String. The exclamation mark at the end of a field definition signifies that it is a non-nullable field: the server will always return some value other than null for it.

If the client requests something like -

{
    me {
        name
        location
    }
}

The server would return an error before executing the query, because the location field is not defined on the User type.
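The idea of validating a selection against the schema before execution can be sketched with a toy validator. This is not the algorithm used by real GraphQL servers: the schema is a plain object, selections use a notation assumed for this example (true marks a leaf field), the Query root type with a me field is an assumption added so the example is complete, and the error text is made up.

```javascript
// Toy schema: type name -> field name -> field type.
// The Query root type is an assumption; User is taken from the article.
const schema = {
    Query: { me: "User" },
    User: { id: "String!", name: "String!" },
};

// Returns a list of error messages for fields that the given type does not define.
function validateSelection(typeName, selected) {
    const errors = [];
    const fields = schema[typeName] || {};
    for (const [field, sub] of Object.entries(selected)) {
        if (!(field in fields)) {
            errors.push(`Cannot query field "${field}" on type "${typeName}".`);
        } else if (sub !== true) {
            // Strip list and non-null markers to get the named type, then recurse.
            const namedType = fields[field].replace(/[\[\]!]/g, "");
            errors.push(...validateSelection(namedType, sub));
        }
    }
    return errors;
}

// Valid: `{ me { name } }`
console.log(validateSelection("Query", { me: { name: true } }));

// Invalid: `{ me { name location } }`
console.log(validateSelection("Query", { me: { name: true, location: true } }));
```

The first call returns an empty list, while the second reports the unknown location field without touching any data.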

Scalar Types in GraphQL

The name field in the above query is at the leaf level. It resolves to concrete data at run-time. The types of such fields, which yield concrete data and can no longer be divided into sub-fields, are called Scalar Types. GraphQL supports the following scalar types:

Int: A signed 32-bit integer
Float: A signed double-precision floating-point value
String: A UTF-8 character sequence
Boolean: true or false
ID: It is used as a unique identifier
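As a rough illustration of what these scalar types accept, here is a simplified set of checks. This is a sketch, not the coercion logic of any particular GraphQL library; scalarChecks and isValidScalar are hypothetical names.

```javascript
// Range of a signed 32-bit integer, which is what GraphQL's Int allows.
const INT_MIN = -(2 ** 31);
const INT_MAX = 2 ** 31 - 1;

// Simplified checks, one per built-in scalar type.
const scalarChecks = {
    Int: (value) => Number.isInteger(value) && value >= INT_MIN && value <= INT_MAX,
    Float: (value) => typeof value === "number" && Number.isFinite(value),
    String: (value) => typeof value === "string",
    Boolean: (value) => typeof value === "boolean",
    // IDs are serialized as strings; integer input is commonly accepted as well.
    ID: (value) => typeof value === "string" || Number.isInteger(value),
};

function isValidScalar(typeName, value) {
    return scalarChecks[typeName](value);
}

console.log(isValidScalar("Int", 172));        // true
console.log(isValidScalar("Int", 2 ** 31));    // false, outside the 32-bit range
console.log(isValidScalar("String", "19BBY")); // true
console.log(isValidScalar("Boolean", "true")); // false, a string is not a Boolean
```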

Enumeration Type

We can also define enums in the GraphQL schema as:

enum postCategory {
    TUTORIAL
    PRODUCT
    PROMOTIONAL
}

The enum postCategory restricts the value of postCategory field to be one among TUTORIAL, PRODUCT and PROMOTIONAL. The GraphQL engine would throw an error if the postCategory field holds any other value than the ones listed above.
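The effect of the enum can be mimicked in plain JavaScript. The sketch below is only an analogy for what the GraphQL engine does for us; assertPostCategory is a hypothetical helper and NEWS is an invented invalid value.

```javascript
// Allowed values, mirroring the enum from the schema above.
const POST_CATEGORY = Object.freeze(["TUTORIAL", "PRODUCT", "PROMOTIONAL"]);

// Returns the value unchanged if it is allowed, otherwise throws.
function assertPostCategory(value) {
    if (!POST_CATEGORY.includes(value)) {
        throw new Error(`"${value}" is not a valid postCategory`);
    }
    return value;
}

console.log(assertPostCategory("TUTORIAL"));

try {
    assertPostCategory("NEWS");
} catch (error) {
    console.log(error.message);
}
```

With GraphQL, this check comes from the schema itself, so neither the client nor the resolver has to write it by hand.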

Object Types

Object types, scalars, and enums are the kinds of types we define in the GraphQL schema in this article (the specification also has interfaces, unions, and input object types). We have already learned about scalars and enums. Object Types are the basic components of a GraphQL schema. They represent the shape of the data that you can fetch from the server. For example, let’s consider the User type:

type User {
    id: String!
    name: String!
}

The User type defines the fields that can be queried for a user. It is called an Object Type.

Lists in GraphQL Schema

We can define the type of a field as a List by using [] as:

type Post {
    id: String!
    title: String!
    body: String!
}

type User {
    id: String!
    name: String!
    posts: [Post]
}

Notice the type of posts field in the User type. It is an array of type Post. We can add type modifiers to the above User type declaration as:

type User {
    id: String!
    name: String!
    posts: [Post]!
}

The exclamation mark at the end of the posts field means that posts itself cannot be null: the server will always return a list, although that list may be empty and may still contain null items. To indicate instead that the list itself can be null but the items inside it cannot, we define the posts field as below:

type User {
    id: String!
    name: String!
    posts: [Post!]
}
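The placement of the exclamation mark is easy to mix up, so here is a small sketch that checks a posts value against each spelling, including [Post!]!, which combines both rules. matchesListType is a hypothetical helper that only understands these four spellings, and it treats any value other than null or undefined as a valid Post.

```javascript
// Checks a `posts` value against one of the four list type spellings.
function matchesListType(typeText, value) {
    const listMayBeNull = !typeText.endsWith("]!");
    const itemsMayBeNull = !typeText.includes("!]");
    if (value == null) {
        return listMayBeNull;
    }
    if (!Array.isArray(value)) {
        return false;
    }
    return itemsMayBeNull || value.every((item) => item != null);
}

const post = { id: "1", title: "Hello", body: "..." };
// Candidate values for posts: null, empty list, one post, a list with a null item.
const cases = [null, [], [post], [post, null]];

for (const typeText of ["[Post]", "[Post]!", "[Post!]", "[Post!]!"]) {
    const results = cases.map((value) => matchesListType(typeText, value));
    console.log(typeText.padEnd(9), results.join(" "));
}
```

Each printed row shows, in order, whether null, an empty list, a list with one post, and a list containing a null item are acceptable for that type. Note that an empty list is accepted by all four spellings.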

Exploring GraphiQL

The GraphQL team has built an amazing user-interface GraphiQL to play around with GraphQL queries. Let's see some of the Star Wars queries in action:

{
    allPeople {
        people {
            name,
            birthYear,
            gender,
            height
        }
    }
}

When you execute the above query in GraphiQL, it will return the response in the same format as expected. This is the response returned for the above query:

{
    "data": {
        "allPeople": {
            "people": [
                {
                    "name": "Luke Skywalker",
                    "birthYear": "19BBY",
                    "gender": "male",
                    "height": 172
                },
                {
                    "name": "C-3PO",
                    "birthYear": "112BBY",
                    "gender": "n/a",
                    "height": 167
                },
                {
                    "name": "R2-D2",
                    "birthYear": "33BBY",
                    "gender": "n/a",
                    "height": 96
                }
            ]
        }
    }
}

people is an array of objects having the same shape as required by the client. On the right-hand side of the GraphiQL, you will find a link for the documentation. GraphQL also helps with the documentation of Type Systems. We will learn more about it in the later articles of the series. Click on root to see all the fields defined on it. root is an object type. Expand the allPeople field. Notice its type as PeopleConnection. Let's check the fields defined on the type PeopleConnection. The people field is of type list of Person type. The Person type has a number of fields including id, name, birthYear, etc. We are only using some of these fields in our view and GraphQL sends back only those fields in the response. Cool Stuff!

Let’s do something interesting!

Here we want a list of films. Each film is connected to a number of species and planets. The use-case is to list each film’s title, releaseDate, producers, speciesConnection and planetConnection. The speciesConnection field contains a species list whose items have the fields name and classification. Similarly, the planetConnection field contains a planets list whose items have the fields name and diameter.

{
    allFilms {
        films {
            title,
            releaseDate,
            producers,
            speciesConnection {
                species {
                    name,
                    classification
                }
            },
            planetConnection {
                planets {
                    name,
                    diameter
                }
            }
        }
    }
}

Try out the above query in GraphiQL. Here’s the shape of the response for the first film:

{
    "data": {
        "allFilms": {
            "films": [
                {
                    "title": "A New Hope",
                    "releaseDate": "1977-05-25",
                    "producers": [
                        "Gary Kurtz",
                        "Rick McCallum"
                    ],
                    "speciesConnection": {
                        "species": [
                            {
                                "name": "Human",
                                "classification": "mammal"
                            },
                            {
                                "name": "Droid",
                                "classification": "artificial"
                            },
                            {
                                "name": "Wookiee",
                                "classification": "mammal"
                            },
                            {
                                "name": "Rodian",
                                "classification": "sentient"
                            },
                            {
                                "name": "Hutt",
                                "classification": "gastropod"
                            }
                        ]
                    },
                    "planetConnection": {
                        "planets": [
                            {
                                "name": "Tatooine",
                                "diameter": 10465
                            },
                            {
                                "name": "Alderaan",
                                "diameter": 12500
                            },
                            {
                                "name": "Yavin IV",
                                "diameter": 10200
                            }
                        ]
                    }
                }
            ]
        }
    }
}

While REST would have taken multiple round trips to fetch the above information, GraphQL does that in one single round trip.

Conclusion

In this tutorial, we built up the foundations of GraphQL. Let’s recap all that we have learned so far:

There are some issues with the REST architectural style, such as multiple endpoints, over-fetching, and under-fetching of resources. There is no efficient way of handling various versions of native applications.
Facebook experienced these problems with REST and hence built GraphQL. GraphQL is a declarative way of specifying data requirements on the client side. It operates on a single endpoint and, unlike plain REST, defines a typed contract between client and server.
We can fetch multiple resources through a single endpoint. This reduces the time wasted in multiple round trips from the browser. With GraphQL, the client specifies the data it needs in the form of a query. The server sends back the response in the exact same shape.
GraphQL has a schema definition language for defining the types that queries are validated against. This makes it easy to develop applications without making assumptions about the types of fields.
Object types, scalar types, and enums are the kinds of types covered in this article. GraphQL has a few more, such as interfaces, which we will meet later in the series.

In the next article, we will learn more interesting things as listed below:

How to pass arguments in a query
How to define Alias and Fragments
How to reuse the GraphQL types by using interfaces
What are Mutations and Subscriptions in GraphQL

Next in series: Fundamentals of GraphQL

This article was originally published on Buddy’s blog.