Practice of GraphQL in microservice architecture

28 min read · Sep 27, 2017

For nearly half a year, the author has been using GraphQL, a relatively new technology, to develop Web services. Unlike the earlier SOAP and REST, GraphQL is a fairly complete query language rather than a design convention like REST, so each language ecosystem has to provide framework support for it. Because it has only been open source for two or three years, we ran into many problems while using it, especially in a microservice architecture.

In this article, we first briefly introduce what GraphQL is and what problems it solves. We then focus on using GraphQL in a microservice architecture and the thorny issues encountered in practice. Finally, the author presents a design for a GraphQL microservice architecture that he considers reasonable, in the hope that it helps engineers who also use GraphQL with microservices. Whether these suggestions fit a specific business scenario is for the reader to judge.

GraphQL

Simple Object Access Protocol (SOAP) is, by today's standards, a very old Web service technology. Although many services still expose SOAP-compliant interfaces, REST-style resource-oriented APIs are now very popular and very mature. The protagonist of this article, however, is GraphQL, a more complex and more complete query language.

GraphQL is a query language that Facebook released in 2015. It provides a complete and easy-to-understand description of the data in an API and lets the client request exactly the data it needs. Companies including Facebook, Twitter and GitHub already use GraphQL to provide APIs in production. Whether or not we decide to use GraphQL in production, it is a technology worth learning.

Type system

GraphQL's expressive power comes mainly from its complete type system. Unlike REST, it treats all the resources of a Web service as one connected graph rather than as isolated resource islands, so while accessing any resource we can reach other resources through the connections between them.

For example, when we access a User resource, we can follow its connections in the graph to the user's Repo and Issue resources. We no longer need to fetch these resources through several separate REST endpoints; the following query returns all the results in one go:

{
    user {
        id
        email
        username
        repos(first: 10) {
            id
            url
            name
            issues(first: 20) {
                id
                author
                title
            }
        }
    }
}

GraphQL can aggregate what would be multiple RESTful requests into a single request, which reduces the latency caused by multiple round trips, lowers server pressure, and speeds up front-end rendering. Its type system is also very rich: in addition to scalar, enumeration, list, and object types, it supports advanced features such as interfaces and union types.

To distinguish nullable fields from non-null ones, GraphQL also introduces the Non-Null type modifier, written as a trailing exclamation mark; for example, String! represents a non-null string.

schema {
  query: Query
  mutation: Mutation
}

Most types in a schema are ordinary object types, but every schema has two special types: query and mutation. They are the entry points for all GraphQL requests: every query interface is a sub-field of query, and every request that changes server resources should belong to the mutation type.

Centralized vs decentralized

GraphQL presents the resources of the whole Web service as a graph. We can think of it as exposing the entire Web service to front ends and clients in a "SQL"-like way: the server's resources are aggregated into one complete graph, and clients query it according to their own needs. Adding a field to a page no longer requires repeated backend changes.

Unlike a RESTful API, a GraphQL service exposes a single endpoint in front of its internal interfaces, and all requests go to that one endpoint.

GraphQL effectively aggregates several HTTP requests into one. Resources that would take several RESTful requests become fields of a single request: starting from a root resource such as Post, the query reaches related resources such as Comment and Author. Requests that used to be decentralized become centralized. This works very well when a single service provides the GraphQL API directly: it establishes a clear separation between the data source and the presentation layer, and tools such as GraphiQL provide visual documentation out of the box. However, as business complexity grows exponentially, a microservice architecture becomes indispensable for certain problems, so how to use GraphQL within microservices to make front-end and back-end communication more efficient and to reduce development costs is a question worth considering.

Relay standard

If REST is a set of conventions that clients and servers follow when communicating over HTTP, then Relay is a set of conventions we can follow when using GraphQL.

The emergence of such standards allows different engineers to develop more similar communication interfaces. In some scenarios, such as the common requirements of identifying objects and paging, the introduction of well-designed standards can reduce communication costs between developers.

The Relay standard defines conventions for three of the most common API-related problems:

Provide a mechanism for refetching objects;
Provide a description of how to page through connections;
Standardize mutation requests to make them more predictable.

Standardizing these three problems can greatly increase the efficiency with which front-end and back-end teams define and integrate interfaces.

Object identifier

Node is an interface defined in the Relay standard; every type that implements the Node interface should contain an id field:

interface Node {
  id: ID!
}

type Faction implements Node {
  id: ID!
  name: String
  ships: ShipConnection
}

type Ship implements Node {
  id: ID!
  name: String
}

Both Faction and Ship have a unique identifier field, id, and we can use that identifier to refetch the corresponding object from the server. The Node interface and the node field assume by default that id is unique across all resources of the entire service; in many implementations the type name and the type-specific ID are combined to form such a global ID. To keep the id opaque, the returned id is usually a Base64-encoded string, and the GraphQL server decodes a received id to recover the relevant information.
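The encoding and decoding of such identifiers can be sketched as follows. This is a minimal illustration, not the API of any particular library: the helper names `toGlobalId` and `fromGlobalId` and the `Type:id` layout are assumptions made for the example.

```typescript
// Hypothetical helpers: the "Type:id" layout is an assumption for illustration.
interface ResolvedGlobalId {
  type: string;
  id: string;
}

// Combine a type name and a per-type ID, then Base64-encode the pair so that
// clients treat the result as an opaque string.
function toGlobalId(type: string, id: string | number): string {
  return btoa(`${type}:${id}`);
}

// Decode a global ID back into its type name and per-type ID.
function fromGlobalId(globalId: string): ResolvedGlobalId {
  const decoded = atob(globalId);
  const separator = decoded.indexOf(":");
  if (separator === -1) {
    throw new Error(`Malformed global ID: ${globalId}`);
  }
  return {
    type: decoded.slice(0, separator),
    id: decoded.slice(separator + 1),
  };
}

const shipId = toGlobalId("Ship", 8);
console.log(shipId, fromGlobalId(shipId));
```

Because the type name travels inside the identifier, a single `node(id: ...)` entry point can decide which data source to query without the client knowing anything about the ID's structure.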

Connection and paging

One-to-many relationships are very common in databases. A User can have many Posts and many Comments at the same time. In theory the number of these resources is unbounded, so they cannot all be returned in one request and must be paginated.

query {
  viewer {
    name
    email
    posts(first: 1) {
      edges {
        cursor
        node {
          title
        }
      }
    }
  }
}

Relay supports slicing and paging of one-to-many relationships through an abstraction called the connection model. From Relay's point of view, when we fetch the Posts that belong to a User, what we actually get is a PostConnection, that is, a connection:

{
  "viewer": {
    "name": "Mena",
    "email": "i@mail.com",
    "posts": {
      "edges": [
        {
          "cursor": "YXJyYXljb25uZWN0aW9uOjI=",
          "node": {
            "title": "Post title"
          }
        }
      ]
    }
  }
}

A PostConnection contains multiple PostEdge objects, each with a cursor field that we use for pagination. Every cursor is a Base64-encoded string, which reminds the caller that the cursor is an opaque pointer. Once we have the current cursor, we can pass it as the after argument of the next query:

query {
  viewer {
    name
    email
    posts(first: 1, after: "YXJyYXljb25uZWN0aW9uOjI=") {
      edges {
        cursor
        node {
          title
        }
      }
    }
  }
}

To find out whether the current page is the last one, we only need the PageInfo object that every connection carries, which holds pagination-related information. A connection generally has the following structure, made up of edges, a PageInfo, and the cursor and node inside each edge:

PostConnection
├── PostEdge
│   ├── cursor
│   └── Post
└── PageInfo
    ├── hasNextPage
    ├── hasPreviousPage
    ├── startCursor
    └── endCursor

Relay builds many abstractions around connections, which makes it easier to manage cursors on the client. The full connection specification is fairly involved; read the Relay Cursor Connections Specification to learn more about the design of connections and cursors.
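To make the connection structure concrete, here is a minimal sketch of forward pagination (first/after) over an in-memory array. The function name `connectionFromArray` and the cursor layout `arrayconnection:<offset>` are assumptions for the example; the layout was chosen because it is what the sample cursor in this article decodes to. A real service would page through a database rather than an array.

```typescript
interface Edge<T> {
  cursor: string;
  node: T;
}

interface PageInfo {
  hasNextPage: boolean;
  hasPreviousPage: boolean;
  startCursor: string | null;
  endCursor: string | null;
}

interface Connection<T> {
  edges: Edge<T>[];
  pageInfo: PageInfo;
}

// Assumed cursor layout: Base64 of "arrayconnection:<offset>".
const CURSOR_PREFIX = "arrayconnection:";

function offsetToCursor(offset: number): string {
  return btoa(CURSOR_PREFIX + offset);
}

function cursorToOffset(cursor: string): number {
  return parseInt(atob(cursor).slice(CURSOR_PREFIX.length), 10);
}

// Return at most `first` items that come after the item identified by `after`.
function connectionFromArray<T>(items: T[], first: number, after?: string): Connection<T> {
  const start = after === undefined ? 0 : cursorToOffset(after) + 1;
  const edges = items
    .slice(start, start + first)
    .map((node, i) => ({ cursor: offsetToCursor(start + i), node }));
  return {
    edges,
    pageInfo: {
      hasNextPage: start + first < items.length,
      hasPreviousPage: start > 0,
      startCursor: edges.length > 0 ? edges[0].cursor : null,
      endCursor: edges.length > 0 ? edges[edges.length - 1].cursor : null,
    },
  };
}

const posts = ["First post", "Second post", "Third post"];
const firstPage = connectionFromArray(posts, 2);
console.log(firstPage.pageInfo);
```

The client never interprets the cursor; it only passes `pageInfo.endCursor` back as `after` until `hasNextPage` becomes false.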

Mutation requests

Each Web service can be thought of as a large, complex state machine that exposes two kinds of interfaces to the outside world. One is the query interface, which reads the current state of the state machine; the other is the mutation interface, which changes the server state, like a POST or DELETE request.

By convention, every mutation is named with a leading verb, its input type name ends with Input, and its output type name ends with Payload:

input IntroduceShipInput {
  factionId: ID!
  shipName: String!
  clientMutationId: String!
}

type IntroduceShipPayload {
  faction: Faction
  ship: Ship
  clientMutationId: String!
}

In addition, a mutation can carry a clientMutationId in its input, which can be used to make requests idempotent.
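One way a server could use clientMutationId for idempotency is to remember the payload of each completed mutation and return it again when the same identifier arrives twice. The sketch below is an assumption about how this might be done, not something the Relay specification prescribes; the in-memory `completed` map and the simplified payload shape are hypothetical.

```typescript
interface IntroduceShipInput {
  factionId: string;
  shipName: string;
  clientMutationId: string;
}

interface IntroduceShipPayload {
  shipId: string;
  shipName: string;
  clientMutationId: string;
}

// Hypothetical in-memory state; a real service would use a database.
const ships: string[] = [];
const completed = new Map<string, IntroduceShipPayload>();

function introduceShip(input: IntroduceShipInput): IntroduceShipPayload {
  // A retried request with the same clientMutationId gets the stored payload
  // instead of creating a second ship.
  const previous = completed.get(input.clientMutationId);
  if (previous !== undefined) {
    return previous;
  }
  ships.push(input.shipName);
  const payload: IntroduceShipPayload = {
    shipId: String(ships.length),
    shipName: input.shipName,
    clientMutationId: input.clientMutationId,
  };
  completed.set(input.clientMutationId, payload);
  return payload;
}

const request = { factionId: "1", shipName: "B-Wing", clientMutationId: "abc-123" };
const firstAttempt = introduceShip(request);
const retry = introduceShip(request);
console.log(firstAttempt === retry, ships.length);
```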

Summary

Facebook's Relay standard is a set of conventions for common problems that arise on top of GraphQL. With such conventions we can reduce communication costs between engineers and the maintenance cost of the project, and keep the external interface consistent when many people collaborate on it.

N + 1 problem

N + 1 queries are a well-known problem in traditional back-end services. One-to-many relationships are very common in databases, and most services today use an ORM in place of a hand-written data layer, so the problem often stays hidden and is discovered only when a real performance problem or slow query appears.

-- N + 1: one query for the users, then one query per user
SELECT * FROM users LIMIT 3;
SELECT * FROM posts WHERE user_id = 1;
SELECT * FROM posts WHERE user_id = 2;
SELECT * FROM posts WHERE user_id = 3;

-- Batched: two queries in total
SELECT * FROM users LIMIT 3;
SELECT * FROM posts WHERE user_id IN (1, 2, 3);

Because GraphQL is a more flexible way of providing an API, it is more prone to this problem than traditional Web services, and the consequences can be more serious when it occurs, so we need to avoid the N + 1 problem.

At the database level, we can solve N + 1 queries by reducing the number of SQL queries, typically by turning several = queries into one IN query. The N + 1 problem in GraphQL is more complicated, especially when the required resources are fetched from other microservices through RPC requests; it cannot be solved just by changing a SQL query.

Before dealing with the N + 1 problem, we need to understand the core idea behind the solution: turn multiple queries into one query and multiple operations into one operation, which removes the overhead of making many requests, such as network latency and request parsing. GraphQL uses DataLoader to solve the N + 1 problem at the business level; its core logic is to collect the individual requests and resolve them with a single batch request.
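The batching idea can be sketched in a few lines. This is a simplified illustration of the technique, not the DataLoader library itself: the class name `BatchLoader` is hypothetical, caching is left out, and error handling is omitted for brevity. Keys requested during the same tick are collected and passed to one batch function, which could issue a single IN query or a single RPC call.

```typescript
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;

// Hypothetical, simplified loader: collects keys and resolves them in one batch.
class BatchLoader<K, V> {
  private batchFn: BatchFn<K, V>;
  private pending = new Map<K, Array<(value: V) => void>>();
  private scheduled = false;

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  // Queue a key; duplicate keys share one slot in the batch.
  load(key: K): Promise<V> {
    return new Promise<V>((resolve) => {
      const waiters = this.pending.get(key) ?? [];
      waiters.push(resolve);
      this.pending.set(key, waiters);
      if (!this.scheduled) {
        this.scheduled = true;
        // Flush once the current synchronous work has finished.
        Promise.resolve().then(() => this.flush());
      }
    });
  }

  // Send all queued keys to the batch function in a single call.
  flush(): void {
    this.scheduled = false;
    if (this.pending.size === 0) {
      return;
    }
    const batch = this.pending;
    this.pending = new Map();
    const keys = Array.from(batch.keys());
    this.batchFn(keys).then((values) => {
      keys.forEach((key, i) => batch.get(key)!.forEach((resolve) => resolve(values[i])));
    });
  }
}

// Usage: four loads, three distinct users, one batch call.
const batchCalls: number[][] = [];
const postsByUser = new BatchLoader<number, string[]>(async (userIds) => {
  batchCalls.push(userIds); // stands in for one "WHERE user_id IN (...)" query
  return userIds.map((id) => [`post of user ${id}`]);
});
[1, 2, 3, 1].forEach((id) => postsByUser.load(id));
postsByUser.flush(); // normally triggered automatically; called here to keep the example synchronous
console.log(batchCalls);
```

Each resolver keeps asking for one key at a time, so the resolver code stays simple while the number of downstream requests drops from N to one.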

Microservice architecture

A microservice architecture has become a common answer to complex business requirements, growing teams, and high-concurrency demands. When microservices meet GraphQL, many ideas collide, and many different usage patterns and solutions emerge.

In this section, we introduce the common problems encountered when using GraphQL in a microservice architecture and the solutions that balance them, and we also look at the design philosophy behind GraphQL.

When we bring GraphQL into a microservice architecture, we encounter three core problems. They are mainly technical difficulties introduced by migrating from a single service to microservices, and the trade-offs made among them are the key to practicing GraphQL in microservices.

Schema design

GraphQL's distinctive schema design introduces many variables into the architecture of the whole service. How the external interface is designed and exposed determines how we implement user authentication and authorization and how we design the routing layer.

In general, a microservice architecture can expose GraphQL interfaces in only two ways. The first is decentralized: each microservice exposes its own endpoint and serves external requests directly.

In this case, traffic is routed according to the service the user requests; that is, we have GraphQL API endpoints such as the following:

http://localhost/posts/api/graphql
http://localhost/comments/api/graphql
http://localhost/subscriptions/api/graphql

Here the blog is served by three separate services: posts, comments, and subscriptions. This setup does not take full advantage of GraphQL. When a client or front end needs resources from several services at once, it has to request each service separately and cannot satisfy all its needs with one HTTP request.

The other way is centralized: all the microservices are exposed together behind a single endpoint. Traffic is then routed not by the URL of the request but by the fields it contains.

This kind of routing cannot be done by a traditional nginx setup, because from nginx's point of view the whole request is just one URL plus some parameters. Only by parsing the query in the request parameters can we know which resources the client is accessing.

http://localhost/api/graphql

Parsing the request means parsing a query tree, and that parsing is really part of the business logic. What we need to know here is that routing follows the fields defined in the schema. GraphQL parses the query tree for us, so we only need to implement, for each field, the Resolver that handles it and returns the result.

However, when multiple microservices each provide a schema, we need some mechanism to integrate them. The most important part of schema integration is resolving duplicate resources and conflicting fields between services, because several services may need to provide the same base resource type; for example, User can be reached indirectly from several resources.

{
  post(id: 1) {
    user {
      id
      email
    }
    id
    title
    content
  }
}

From the point of view of microservice developers or providers, the different microservices are peers. We need a higher-level, more business-oriented service to integrate their schemas and ensure that fields and resource types do not conflict across services.

Prefix

At present there are two ways to resolve conflicting resources. The first is to add a namespace, in practice a prefix, to the resources provided by each service. When the schemas are merged, the prefixes rule out conflicts between duplicate fields in different services.

Sourcegraph uses this prefixing method in its GraphQL practice. It is cheap to implement and quickly resolves schema conflicts between microservices. The drawback is that it treats the resources of each service as islands: there is no way to connect related resources across services, which undermines the experience that GraphQL's centralized design is meant to provide.
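The prefixing idea can be illustrated with a small sketch. Schemas are reduced here to plain maps from root field names to type names, and the `service_field` naming scheme and function names are assumptions for the example rather than any tool's actual behavior.

```typescript
// A root schema reduced to: field name -> returned type name.
type FieldMap = Record<string, string>;

// Namespace every field of one service with that service's name.
function prefixFields(prefix: string, fields: FieldMap): FieldMap {
  const result: FieldMap = {};
  for (const [name, type] of Object.entries(fields)) {
    result[`${prefix}_${name}`] = type;
  }
  return result;
}

// Merge all services; prefixed names cannot collide across services.
function mergeWithPrefixes(services: Record<string, FieldMap>): FieldMap {
  const merged: FieldMap = {};
  for (const [serviceName, fields] of Object.entries(services)) {
    Object.assign(merged, prefixFields(serviceName, fields));
  }
  return merged;
}

const mergedFields = mergeWithPrefixes({
  posts: { user: "PostsUser", post: "Post" },
  comments: { user: "CommentsUser", comment: "Comment" },
});
console.log(Object.keys(mergedFields));
```

Both services keep their own `user` field, but nothing links `posts_user` to `comments_user`, which is exactly the island effect described above.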

Stitching

Besides prefixes, which have a very low engineering cost, there is a solution called Schema Stitching, provided by Apollo's graphql-tools library, which stitches the GraphQL schemas of different services together and exposes one unified interface. It can connect resources across multiple services and so takes full advantage of GraphQL.

To break down the barriers between resources in different services and build a coherent, complete GraphQL API, we need to do some extra work: handle the shared resources in the upper layer. When the schemas are merged, shared resources are resolved by specific Resolvers whose logic is specified during Schema Stitching.

const linkTypeDefs = `
  extend type User {
    chirps: [Chirp]
  }
`;

We define the resources shared between services in the layer above them and create new Resolvers for these shared resources. When GraphQL resolves a shared resource, it calls the Resolver that we passed in when merging the schemas.

const mergedSchema = mergeSchemas({
  schemas: [
    chirpSchema,
    authorSchema,
    linkTypeDefs,
  ],
  resolvers: {
    User: {
      chirps: {
        fragment: `... on User { id }`,
        resolve(user, args, context, info) {
          return info.mergeInfo.delegateToSchema({
            schema: chirpSchema,
            operation: 'query',
            fieldName: 'chirpsByAuthorId',
            args: {
              authorId: user.id,
            },
            context,
            info,
          });
        },
      },
    },
  },
});

In the whole Schema Stitching process, the most important function is mergeSchemas. It accepts a single options object with three fields: an array of schemas to stitch, the Resolvers, and a callback invoked when types conflict:

mergeSchemas({
  schemas: Array<string | GraphQLSchema | Array<GraphQLNamedType>>;
  resolvers?: Array<IResolvers> | IResolvers;
  onTypeConflict?: (
    left: GraphQLNamedType,
    right: GraphQLNamedType,
    info?: {
      left: {
        schema?: GraphQLSchema;
      };
      right: {
        schema?: GraphQLSchema;
      };
    },
  ) => GraphQLNamedType;
})

Schema Stitching is a good way to expose the schemas of multiple services together. It is the approach promoted by the graphql-tools project, which also provides the JavaScript tooling. However, wherever schemas are merged, we have to handle shared resources and conflicting types between the schemas manually and define Resolvers for the shared types. In addition, Schema Stitching currently has no official support in languages other than JavaScript. The component that performs it also carries responsibilities such as service discovery and traffic routing, so its stability is very important, and you should consider carefully whether to adopt Schema Stitching.

Combination

Besides the two approaches above for exposing a single GraphQL endpoint, we can also use plain, traditional RPC to combine the functions of multiple microservices behind one unified GraphQL interface.

When we use RPC to solve the GraphQL schema problem in a microservice architecture, the internal services are no different from services in any other microservice architecture: they all expose RPC interfaces. A separate GraphQL service then integrates the resources of those microservices.

Using RPC between microservices is a more general and more stable solution. As the centralized interface provider, it is reasonable for the GraphQL service to call the other services over RPC and merge the results. In this architecture, we can also expose RESTful interfaces alongside GraphQL, either directly from each microservice or through other business components, to offer more access methods.

Although RPC gives our services more flexibility and lets us split all GraphQL-related functionality into a separate service, it brings extra work: engineers have to combine the interfaces of the individual services by hand to provide the GraphQL API, and a change in business requirements may require modifying and updating several services.

Summary

From prefixes to stitching to combining the interfaces of the microservices over RPC, the exposed schema is aggregated step by step from isolated points into a whole, and the implementation complexity increases along the way.

Of the three approaches, the author does not recommend using prefixes to isolate the interfaces of multiple microservices. It does not take full advantage of GraphQL; if the interfaces of the services are to stay decoupled anyway, it is better to use RESTful APIs directly, and using GraphQL this way feels like a misuse.

Unlike prefixes, both stitching and combination provide a complete GraphQL interface, and both integrate the interfaces of the microservices in the GraphQL service that faces users directly. Schema Stitching places more requirements on the downstream services: the engineers who develop them need to master GraphQL schema design and development, and types from different microservices may conflict and have to be resolved in the upper layer. In return, it reduces the workload of the top-level GraphQL service.

Combination, finally, means that the GraphQL service handles all GraphQL-related logic itself by composing RPC calls, which is equivalent to pulling all GraphQL-related logic to the outermost layer.

After several architecture refactorings, the author prefers to combine the microservices over RPC to provide the GraphQL interface. Although this means more work, it offers better flexibility, and the developers of the other microservices do not need to understand GraphQL design conventions, which makes the responsibilities of each service clearer and more controllable.

Authentication and authorization

In any Web service, handling user authentication and authorization is a key issue, so we need to understand how to authenticate and authorize users in a service that uses GraphQL.

If we decide that the Web service as a whole exposes its interface through GraphQL, then the schema design largely determines how authentication and authorization should be organized. Since this article is about GraphQL in a microservice architecture, we look at how authentication and authorization should be done under the different schema designs.

The previous section described three schema designs: prefix, stitching, and combination. All of them lead to the same overall architecture.

Every architecture that uses GraphQL eventually has one centralized service that accepts the client's GraphQL requests, even if it is only a proxy. With this picture of the GraphQL service in mind, how to design authentication and authorization becomes very clear.

Authentication

First of all, it is unreasonable to implement user authentication in multiple services. If authentication logic has to be handled in several services, the responsibility of one service is spread over many, and those services have to share authentication-related tables such as users and sessions. It is therefore more appropriate for a single service to handle all authentication logic for the whole Web service.

This service can be either the GraphQL service itself, acting as a gateway proxy, or an independent user authentication service that is called through RPC or other means to authenticate the user on each request. Authorization differs somewhat from authentication.

Authorization

We can add authorization to the GraphQL service, or let each microservice decide whether the current user may operate on a given resource. This is a trade-off between centralized and distributed designs, and each has its advantages. Leaving authorization to each microservice gives the services autonomy: each one decides, according to its own business needs, whether the requester may access or modify a resource. Centralizing authorization in the GraphQL service decouples the whole authorization process from the internal microservices, which then unconditionally trust requests from the GraphQL service and serve them all.

The design above assumes that we only need to provide one GraphQL endpoint. When the business has to provide separate interfaces for business customers (B-side), consumers (C-side), or an admin console, the design may be completely different.

In that case, if authorization is assigned to the internal microservices, each service has to provide different interfaces to the different GraphQL services (or Web services) and authorize each of them separately. Handing authorization to the GraphQL services is the better choice: the internal microservices do not need to care whether the caller may access a resource, authorization is handled by the outermost business service, and the parts are better decoupled.

Of course, fully trusting calls from other services is rather dangerous. For important business operations or requests, an external risk control system can perform a second check to determine whether the current requester's call is legitimate.

How to implement a complete and effective risk control system is beyond the scope of this article. Readers can look up related material to learn about the principles and models of risk control systems.

Summary

The design of authentication and authorization is inherently flexible. Whether or not we use GraphQL as the external interface of a microservice architecture, handing this logic to the directly exposed service is a good choice, because that service has more context about the current request and can more easily authenticate the user and check permissions. Important or high-risk operations can be protected by an additional risk control service, or the routing layer can restrict RPC callers with a whitelist. This decouples the different functions and reduces duplicated work across services.
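The centralized variant can be sketched as follows: the exposed GraphQL service authenticates the request once, stores the user in the request context, and wraps resolvers with a permission check before any downstream call is made. All names here (`authenticate`, `requireRole`, the role strings, the session map) are hypothetical and only illustrate the shape of the design.

```typescript
interface User {
  id: string;
  roles: string[];
}

interface Context {
  user: User | null;
}

type Resolver<A, R> = (args: A, context: Context) => R;

// Authentication: done once per request in the exposed service.
function authenticate(token: string | undefined, sessions: Map<string, User>): Context {
  const user = token !== undefined ? sessions.get(token) : undefined;
  return { user: user ?? null };
}

// Authorization: wrap a resolver so it runs only for users with the given role.
function requireRole<A, R>(role: string, resolver: Resolver<A, R>): Resolver<A, R> {
  return (args, context) => {
    if (context.user === null) {
      throw new Error("Not authenticated");
    }
    if (!context.user.roles.includes(role)) {
      throw new Error("Not authorized");
    }
    // Past this point the downstream service trusts the call.
    return resolver(args, context);
  };
}

const sessions = new Map<string, User>([
  ["token-admin", { id: "1", roles: ["admin"] }],
  ["token-reader", { id: "2", roles: ["reader"] }],
]);

// Stand-in for an RPC call to an internal posts service.
const deletePost = requireRole<{ id: number }, string>("admin", (args) => `deleted post ${args.id}`);

console.log(deletePost({ id: 7 }, authenticate("token-admin", sessions)));
```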

Routing design

Routing is a very important part of microservices, and the design of the routing layer is also a key issue. As with authentication and authorization, the schema design ultimately determines how requests are routed.

Schema Stitching is a GraphQL solution for microservices that comes with its own routing: while the gateway server resolves a request, it forwards the corresponding request fragments to the other microservices over HTTP. The whole process requires no manual intervention; a callback runs only when types conflict.

The combination approach is equivalent to implementing the request forwarding of Schema Stitching by hand. In the externally exposed GraphQL service, we implement a resolver for each field that calls the HTTP or RPC interfaces of other services to fetch the data.

Routing design with GraphQL is similar to routing design in a traditional microservice architecture, except that the stitching tools can merge the schemas of different services and forward requests for us. We can use stitching, or we can fetch the requested resources over HTTP or RPC inside the Resolvers.

Evolution of architecture

I chose GraphQL as the API exposed by our services at the beginning of this year, and the service architecture has been evolving for half a year since. Along the way we ran into many problems and adjusted the architecture again and again. The whole evolution can be divided into three phases: from RPC-based combination, to Schema Stitching, and finally back to RPC.

Although both the initial and the final solution use RPC for communication, they differ in many implementation details, which we discovered as our business became more and more complex.

Centralized Schema and RPC

When the project started, we decided to develop it with a microservice architecture. However, the technology stack chosen at the time was Elixir + Phoenix, and much of its infrastructure was immature. For example, gRPC and Protobuf had no official Elixir implementations; some open source projects existed, but they were not stable. In the end we decided to implement a simple message-queue-based RPC framework on top of RabbitMQ and to provide the GraphQL interface by combination.

RabbitMQ acts as the message bus of the microservice architecture. All RPC requests are converted into messages in a queue: to make an RPC call, a service delivers a message to the corresponding RabbitMQ queue and then listens for the reply while waiting for the other service to respond.

The advantage of this approach is that RabbitMQ queues take over "service discovery": the queue decouples the requester from the service provider and routes the RPC request, so the downstream consumers (the providers) can scale horizontally. The same effect can be achieved with load balancing, although a load balancer does not know the load of each provider, so pushing requests to providers may be less efficient than letting providers actively pull them.
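The request/reply pattern described above can be sketched with an in-memory stand-in for the broker. This is not the author's Elixir framework and does not use a RabbitMQ client; `InMemoryBroker`, `RpcClient`, `serve`, and the queue names are hypothetical. The sketch shows the two ingredients of RPC over a queue: a correlation ID that matches replies to requests, and a reply queue named in each request. Round-robin delivery stands in for several providers consuming from one queue.

```typescript
interface Message {
  correlationId: string;
  replyTo: string;
  body: string;
}

type Consumer = (message: Message) => void;

// In-memory stand-in for a message broker; delivers round-robin per queue.
class InMemoryBroker {
  private consumers = new Map<string, Consumer[]>();
  private nextIndex = new Map<string, number>();

  subscribe(queue: string, consumer: Consumer): void {
    const list = this.consumers.get(queue) ?? [];
    list.push(consumer);
    this.consumers.set(queue, list);
  }

  publish(queue: string, message: Message): void {
    const list = this.consumers.get(queue);
    if (list === undefined || list.length === 0) {
      throw new Error(`No consumer on queue ${queue}`);
    }
    const index = (this.nextIndex.get(queue) ?? 0) % list.length;
    this.nextIndex.set(queue, index + 1);
    list[index](message);
  }
}

class RpcClient {
  private broker: InMemoryBroker;
  private replyQueue: string;
  private counter = 0;
  private pending = new Map<string, (body: string) => void>();

  constructor(broker: InMemoryBroker, replyQueue: string) {
    this.broker = broker;
    this.replyQueue = replyQueue;
    // Replies are matched to waiting callers by correlation ID.
    broker.subscribe(replyQueue, (reply) => {
      const resolve = this.pending.get(reply.correlationId);
      if (resolve !== undefined) {
        this.pending.delete(reply.correlationId);
        resolve(reply.body);
      }
    });
  }

  call(queue: string, body: string): Promise<string> {
    this.counter += 1;
    const correlationId = `${this.replyQueue}-${this.counter}`;
    return new Promise<string>((resolve) => {
      this.pending.set(correlationId, resolve);
      this.broker.publish(queue, { correlationId, replyTo: this.replyQueue, body });
    });
  }
}

// A provider consumes requests and publishes each result to the reply queue.
function serve(broker: InMemoryBroker, queue: string, handler: (body: string) => string): void {
  broker.subscribe(queue, (request) => {
    broker.publish(request.replyTo, {
      correlationId: request.correlationId,
      replyTo: "",
      body: handler(request.body),
    });
  });
}

const broker = new InMemoryBroker();
const handledBy: string[] = [];
serve(broker, "posts.rpc", (body) => { handledBy.push("worker-1"); return `posts for ${body}`; });
serve(broker, "posts.rpc", (body) => { handledBy.push("worker-2"); return `posts for ${body}`; });

const client = new RpcClient(broker, "gateway.replies");
client.call("posts.rpc", "user 1").then((reply) => console.log(reply));
client.call("posts.rpc", "user 2").then((reply) => console.log(reply));
```

The caller only knows the queue name, so adding a second worker requires no change on the requesting side.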

The most critical issue is that a hand-rolled RPC framework, as a piece of basic infrastructure, is not mature enough unless it has been thoroughly tested and proven in production. And since RPC is meant to be language-independent, we may need to implement the framework for several languages at once. This brings very high staffing, testing, and maintenance costs, and in hindsight it is not an advisable approach.

Leaving the language aside, communicating through a more mature RPC framework really does reduce development and maintenance costs, but there is another significant price. When the business is unstable and requirements change frequently, internal services often add fields to their exposed RPC interfaces, and each such addition requires extra work in the top-level GraphQL service.

Each such change requires updating three related services or repositories. This is normal and reasonable in a microservice architecture, but it creates a lot of extra work in the early stages of a project, and it was the main reason for our first architecture migration.

Decentralized Schema

Decentralization here does not mean that GraphQL exposes multiple endpoints externally; it means that the development of the different GraphQL fields is decentralized. To address the inefficiency that a centralized schema brought to RPC-based development, and following the Schema Stitching solution from the official GraphQL tooling, we decided to decentralize schema management: each microservice exposes a GraphQL endpoint directly, and the schemas of the services are merged.

Schema Stitching glues the GraphQL schemas of multiple services into one larger schema, and the most critical component of this architecture is the tool that does the stitching. As mentioned above, official tooling exists only for JavaScript; other languages have no official support, and none of the alternatives has been proven in large-scale production. Because we use Elixir, a relatively niche language, there was no tool we could use out of the box.

After evaluation, we decided to build a wrapper layer on top of Absinthe, the Elixir implementation of GraphQL. It parses the syntax and semantics of the client's request, packages the subtree for each field into a subquery, sends it to the corresponding downstream service, and finally combines the results in the front-facing GraphQL service.

The front-facing GraphQL service contains two core components, GraphQL Stitcher and GraphQL Dispatcher. The former sends an IntrospectionQuery to each service and stitches all the resulting schemas into one large tree. When a client makes a request, GraphQL Dispatcher parses it, converts the different fields and subfields into subtrees, and forwards each subtree to the corresponding service.
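The division of labor between the two components can be sketched roughly as below. This is a simplified illustration rather than our Absinthe-based implementation: the query is assumed to be already parsed into a nested dict, only root fields are considered, and the function names `stitch`, `render` and `dispatch` as well as the service names are hypothetical. Raising on a duplicate root field is a choice made for this sketch only.

```python
def stitch(service_root_fields):
    """Stitcher: map each root field to the service that exposes it."""
    owners = {}
    for service, fields in service_root_fields.items():
        for field in fields:
            if field in owners:
                raise ValueError(
                    f"field {field!r} is exposed by both "
                    f"{owners[field]} and {service}"
                )
            owners[field] = service
    return owners


def render(selection):
    """Render a nested selection dict as a GraphQL selection set."""
    parts = []
    for name, children in selection.items():
        parts.append(f"{name} {render(children)}" if children else name)
    return "{ " + " ".join(parts) + " }"


def dispatch(selection, owners):
    """Dispatcher: split a parsed query into one subquery per service."""
    per_service = {}
    for field, children in selection.items():
        per_service.setdefault(owners[field], {})[field] = children
    return {service: render(sub) for service, sub in per_service.items()}


if __name__ == "__main__":
    owners = stitch({"accounts": ["user"], "orders": ["orders"]})
    query = {"user": {"id": {}, "name": {}}, "orders": {"total": {}}}
    for service, subquery in dispatch(query, owners).items():
        print(service, subquery)
```

A real dispatcher would also have to forward arguments, variables and fragments and merge the responses back into a single result, which is where most of the complexity lies.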

When implementing GraphQL Stitcher, type conflicts between services need special attention. Our implementation does not properly handle type conflicts or cross-service resources; a conflicting type simply overrides the earlier definition. The bigger problem is that an internal GraphQL service does not know which type names are already in use in the overall schema, so conflicts between services arise often. We only add prefixes to resolve conflicts when we discover them.
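Detecting conflicts and resolving them with prefixes can be illustrated with a short sketch. The service and type names are made up, and deriving the prefix from the service name is an assumption of this example; in practice we added prefixes by hand.

```python
def find_conflicts(service_types):
    """Return the type names that are defined by more than one service."""
    seen = {}
    for service, types in service_types.items():
        for type_name in types:
            seen.setdefault(type_name, []).append(service)
    return {name: svcs for name, svcs in seen.items() if len(svcs) > 1}


def prefix_conflicts(service_types):
    """Rename only the conflicting types, prefixing them with the service name."""
    conflicts = find_conflicts(service_types)
    renamed = {}
    for service, types in service_types.items():
        prefix = service.capitalize()
        renamed[service] = [
            prefix + name if name in conflicts else name for name in types
        ]
    return renamed


if __name__ == "__main__":
    schemas = {"accounts": ["User", "Address"], "orders": ["Order", "Address"]}
    print(find_conflicts(schemas))
    print(prefix_conflicts(schemas))
```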

Adding a prefix is an easy way to resolve conflicts, but it is not particularly elegant. The main reason we settled for it is that we had found a design flaw in the permission system: it could not handle authentication elegantly once B-end (business) users were introduced. We therefore chose the simpler method to resolve type conflicts for the time being.

When developing the internal services, we used a scope to restrict a user's permission to read and write resources. Before performing an operation, each internal service first checks whether the requested resource may be read or written, and only then processes the actual business logic. In other words, user authentication happens in every internal service.
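A scope check of this kind might look like the following sketch. The shape of `Scope`, the `Forbidden` exception and the `get_order` handler are all hypothetical; the point is only that every internal handler runs the check before touching business logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    """Hypothetical scope passed along with every internal request."""
    user_id: str
    permissions: frozenset  # for example frozenset({"read", "write"})


class Forbidden(Exception):
    pass


def authorize(scope, record_owner_id, action):
    """Allow the action only on records owned by the source user."""
    if action not in scope.permissions:
        raise Forbidden(f"scope does not allow {action!r}")
    if scope.user_id != record_owner_id:
        raise Forbidden("record belongs to another user")


def get_order(scope, order_id, orders):
    """Internal handler: check the scope first, then run the business logic."""
    order = orders[order_id]
    authorize(scope, order["owner_id"], "read")
    return order


if __name__ == "__main__":
    orders = {"o1": {"owner_id": "alice", "total": 30}}
    alice = Scope(user_id="alice", permissions=frozenset({"read"}))
    print(get_order(alice, "o1", orders))
```

The check assumes that the requester owns the record, which is exactly the assumption that breaks down once business users need to query across accounts.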

When our externally exposed GraphQL service only served C-end (consumer) users, passing a scope and letting the internal services authenticate was enough to meet the needs of the consumer-facing interface. But once we needed to provide GraphQL or RESTful interfaces for B-end users, this method of authentication became very awkward.

In a microservice architecture, the databases of the services are isolated from one another, so for a given database record many internal services only know which user or users the record belongs to. This is why the scope carries the source user along with each read or write request: the service handling the request can then decide whether that user has permission to access the record.

This design rests on an assumption we made: every request received by a microservice reads or writes resources owned by the source user. That is why introducing B-end users caused considerable difficulty. Our temporary solution was to add extra information to the current user's scope and to add new interfaces inside the services for B-end queries. However, the resources that B-end queries need can be very diverse, and B-end users cannot simply be given access to the resources of all users, so when different query interfaces need different permission restrictions, a scope can hardly express such complex authentication requirements.

With this decentralized schema management architecture, we ran into two significant problems:

No official or major open source project provides a Schema Stitching component for Elixir. Our hand-rolled component came under great pressure once it had to carry a large service load, and its functionality was far from complete;
Internal services lack the context of the whole request, so once complex authentication requirements appear, delegating authentication to internal services increases the coupling between them: the request context must constantly be passed between microservices for authentication, which also raises development costs.

Service Mesh and RPC

Decentralized schema management reduced the development work to some extent, but the two problems we met under this architecture were unacceptable. To solve them, we plan to modify the technical architecture so that services can communicate more flexibly.

In the latest architecture design, we use linkerd to handle communication between services. Internal services no longer authenticate source requests themselves; they are only responsible for exposing RPC interfaces, and gRPC and Protobuf are used to manage the interfaces between services. All authentication takes place in the outermost Web services: the GraphQL service for C-end users and the Web service for B-end users each authenticate the source request. Once authentication passes, they issue RPC requests to the corresponding services, and linkerd routes the requests and forwards the traffic.
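The shift of authentication to the edge can be illustrated with a small sketch. Everything here is hypothetical: plain function calls stand in for gRPC calls routed through linkerd, and the session fields, role name and `list_orders` interface are invented for the example.

```python
ORDERS = [
    {"id": "o1", "owner_id": "alice", "total": 30},
    {"id": "o2", "owner_id": "bob", "total": 45},
]


class Unauthorized(Exception):
    pass


def list_orders(owner_id=None):
    """Internal RPC: no authentication, it only exposes data access."""
    return [o for o in ORDERS if owner_id is None or o["owner_id"] == owner_id]


def consumer_orders(session):
    """C-end edge service: a logged-in user sees only their own orders."""
    if not session.get("user_id"):
        raise Unauthorized("login required")
    return list_orders(owner_id=session["user_id"])


def business_orders(session):
    """B-end edge service: an operator may query across all users."""
    if session.get("role") != "operator":
        raise Unauthorized("operator role required")
    return list_orders()


if __name__ == "__main__":
    print(consumer_orders({"user_id": "alice"}))
    print(business_orders({"user_id": "carol", "role": "operator"}))
```

Each edge service applies its own policy and the internal interface stays free of user context, at the cost of trusting every caller inside the network.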

Linkerd is an implementation of the Service Mesh pattern. It is an open source network proxy that provides service discovery, management, monitoring and other functions without requiring changes to existing services. We will not go into service mesh technology in this article; interested readers can look up the relevant material.

Since B-end users may issue more query requests, and those requests can be very complex, we can synchronize data from other services through database replicas and implement the corresponding queries inside the B-end service. Alternatively, the data can be processed in a data platform or data warehouse and then provided to the external services for B-end users.

This way of organizing services is essentially a revision of the first version of the architecture. Introducing linkerd solves service discovery, routing and governance, handing common microservice infrastructure over to a relatively mature open source project. The authentication logic moves up into the few directly exposed Web services, and internal services are no longer responsible for it. A change to a service interface will still require modifications in several places, but for now we regard this as the price of keeping the services flexible.

To sum up

In the nearly half a year since we started using GraphQL, practicing it in a microservice architecture has revealed a conflict between the design of microservices and that of GraphQL: decentralization versus centralization.

As a centralized query language, GraphQL should, as a best practice, expose only one endpoint. This endpoint contains all the resources the Web service should provide and connects them sensibly. The microservice architecture takes exactly the opposite approach: its original intent is to split a large service into independently deployed services. So in our final architecture we separate the two concerns: the microservice architecture splits the services, while GraphQL composes the microservice interfaces and handles authentication, which satisfies both designs.

During the evolution of the architecture we made many unreasonable design decisions, and because we did not foresee the changes in requirements that business expansion would bring, the architecture could not accommodate new requirements elegantly. In the end we chose to refactor the existing architecture with a service mesh, because microservice concerns should be handled by a unified middle layer, and re-implementing service governance logic ourselves is very costly. The use of a service mesh has little to do with GraphQL itself: the GraphQL service is just an externally exposed endpoint that composes the interfaces of internal services. We could replace it with a RESTful or other interface without much impact on the overall architecture. In fact, at the start of a project the GraphQL interface should not be given a particularly prominent position; drawing the boundaries between services and decoupling them properly has a far more lasting effect.

In the end, we find that GraphQL is only one part of the whole chain in a microservice architecture. Some of the official tools do touch on problems that arise in microservices, but from the perspective of the overall architecture, whether to use GraphQL externally is not especially important; what matters is decoupling the responsibilities between services and providing a reasonable interface to the outside world. As long as the architecture is sound, we can introduce a GraphQL service at any time to compose the functions of other services. Its advantages are:

It combines multiple network requests into one, reducing the number of requests between the front end and the back end and speeding up the rendering of front-end pages;
It provides a very good debugging tool, GraphiQL, and can generate documentation from code, saving documentation maintenance and communication costs.

It has to be said that GraphQL has many advantages as an emerging technology. Many companies are trying to use GraphQL to provide external APIs. Although this technology is not particularly mature at present, it has great potential.

Written by Mina Ayoub