An API Gateway is the entry point for accessing the various capabilities of a complex platform through well-defined APIs. To illustrate the concept, let’s say you build a website for an electronics store and design an API, /electronics, for its homepage. This API, hosted on a single service, returns the list of electronics in the store plus recommended items. Now imagine the store grows over time and you decide to build dedicated services for consumer-durables and mobile-phones, as well as a dedicated recommendation service. To render the homepage, you now have to integrate with multiple microservices, which increases the total number of round trips made to the backend. If a new category of electronics is added, the whole reintegration process repeats. An API Gateway can absorb this complexity: the homepage integrates with the gateway once, and the gateway orchestrates the requests to as many microservices as needed. You can also evolve your microservices independently without changing the client integration.
The Risk team at PayPal faced a similar situation: we grew our ecosystem of risk services to create sophisticated checkpoints, but tight coupling with PayPal’s payment clients caused friction during migration. We built an API Gateway to decouple our services from their clients. The result is a set of neat, well-defined APIs hosted in a single gateway service, through which clients can use the capabilities of the Risk platform, as opposed to scattered, individual APIs. Because the gateway is the single point of contact for your platform, it is imperative to design it carefully. In this article, I will lay out the key principles of an API Gateway and illustrate how they are used to design a robust, scalable and performant Gateway application.
1. Scalability and Performance
If your platform serves millions or billions of requests a day, it makes sense to invest in technology that can sustain high traffic without degrading performance. Frameworks based on asynchronous, non-blocking I/O (NIO) are best suited to such cases. Netty is one of the preferred frameworks, and using it as the HTTP container has been shown to provide the desired performance boost. Typically, the API Gateway calls multiple backend services for each incoming request. If your backend microservices take long to respond, the Gateway’s threads can be exhausted while they wait, causing starvation and hurting the Gateway’s ATB (Availability to Business). Writing the API code declaratively, using a reactive coding pattern, is therefore the right choice for such applications. For example, one can use Java 8’s CompletableFuture as the reactive abstraction. To conclude, for a high-traffic application such as an API Gateway, it’s best to combine an NIO framework with a reactive programming model; together they make optimal use of resources, which yields high performance.
[Graph: latency (in ms) of the Gateway application using Tomcat versus Netty as the HTTP container]
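To make the reactive model described in this section more concrete, here is a minimal Java sketch of a non-blocking fan-out built on CompletableFuture. It is an illustration under assumptions, not the PayPal gateway code: the class FanOutSketch, the method fanOut, and the idea that each backend client is a function returning a CompletableFuture are all hypothetical names and choices made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical sketch: each backend client is assumed to be asynchronous,
// i.e. it returns a CompletableFuture instead of blocking the calling thread.
final class FanOutSketch {

    // Starts all backend calls, then returns a future that completes with the
    // responses in the same order as the backends list.
    static CompletableFuture<List<String>> fanOut(
            String request,
            List<Function<String, CompletableFuture<String>>> backends) {
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (Function<String, CompletableFuture<String>> backend : backends) {
            pending.add(backend.apply(request)); // returns immediately
        }
        return CompletableFuture
                .allOf(pending.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> {
                    List<String> responses = new ArrayList<>();
                    for (CompletableFuture<String> call : pending) {
                        responses.add(call.join()); // already complete here
                    }
                    return responses;
                });
    }
}
```

No gateway thread waits while the backends work; the combined future completes when the last response arrives, or completes exceptionally if any call fails.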
2. Request orchestration
One of the key functions of a gateway application is to fan out API requests to many backend microservices. To do this effectively, one needs to consider the following crucial aspects:
- Protocol Translation: The inbound API request may arrive in a format that differs from what your downstream services expect. For example, the API Gateway may receive HTTPS RESTful requests while some downstream services expect Protobuf. The gateway server needs to ensure that the appropriate connectors and translators are built in.
- Serial/Parallel invocation patterns: The API Gateway should be able to invoke downstream services serially, in parallel, or in a mix of both. It should also support conditional routing.
- Configurable routing: Creating a new route to a backend microservice should require no new code (or minimal code). In other words, service routing should be driven by configuration. This keeps the Gateway codebase lean as new APIs are added.
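The serial/parallel and configurable-routing points above can be sketched as follows. This is a minimal example under assumptions: the class ConfigRouterSketch, its Route and Mode types, and the maps standing in for loaded configuration are hypothetical, and protocol translation and conditional routing are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical sketch: routes are plain data (as if loaded from a config
// file), so adding a route means adding an entry, not writing new code.
final class ConfigRouterSketch {
    enum Mode { SERIAL, PARALLEL }

    static final class Route {
        final Mode mode;
        final List<String> services; // names of downstream services to call
        Route(Mode mode, List<String> services) {
            this.mode = mode;
            this.services = services;
        }
    }

    private final Map<String, Route> routes;
    private final Map<String, Function<String, CompletableFuture<String>>> clients;

    ConfigRouterSketch(Map<String, Route> routes,
            Map<String, Function<String, CompletableFuture<String>>> clients) {
        for (Route route : routes.values()) {
            for (String service : route.services) {
                if (!clients.containsKey(service)) {
                    throw new IllegalArgumentException("No client for " + service);
                }
            }
        }
        this.routes = routes;
        this.clients = clients;
    }

    CompletableFuture<List<String>> dispatch(String apiPath, String request) {
        Route route = routes.get(apiPath);
        if (route == null) {
            CompletableFuture<List<String>> failed = new CompletableFuture<>();
            failed.completeExceptionally(
                    new IllegalArgumentException("No route for " + apiPath));
            return failed;
        }
        return route.mode == Mode.PARALLEL
                ? parallel(route, request) : serial(route, request);
    }

    // Starts every call up front and gathers the responses in route order.
    private CompletableFuture<List<String>> parallel(Route route, String request) {
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (String service : route.services) {
            pending.add(clients.get(service).apply(request));
        }
        return CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> {
                    List<String> responses = new ArrayList<>();
                    for (CompletableFuture<String> call : pending) {
                        responses.add(call.join());
                    }
                    return responses;
                });
    }

    // Starts each call only after the previous one has completed.
    private CompletableFuture<List<String>> serial(Route route, String request) {
        List<String> empty = new ArrayList<>();
        CompletableFuture<List<String>> chain = CompletableFuture.completedFuture(empty);
        for (String service : route.services) {
            chain = chain.thenCompose(soFar ->
                    clients.get(service).apply(request).thenApply(response -> {
                        soFar.add(response);
                        return soFar;
                    }));
        }
        return chain;
    }
}
```

Keeping the route definition as data is what lets the routing table grow without touching the dispatch code; a real gateway would likely also carry per-route timeouts and translators in the same configuration.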
3. Response translation
The API Gateway may orchestrate requests to multiple downstream services, as described in the previous section. The client, however, may need a single response. In such scenarios, the API Gateway takes responsibility for merging the different responses and dispatching a single, meaningful one. One effective way to achieve this is a lookup table, which lists the possible combinations of responses and the final outcome for each. For example, if a gateway service calls n microservices, it can consolidate the n responses into one by finding the entry in the lookup table that matches the response combination and dispatching that entry’s outcome.
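A lookup table of this kind can be sketched in a few lines of Java. The example below is an assumption-laden illustration: the class ResponseLookupSketch, the response values "OK" and "RISKY", the outcomes "APPROVE", "DECLINE" and "REVIEW", and the use of a default outcome for unlisted combinations are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the table maps an ordered combination of downstream
// responses to the single outcome the gateway returns to the client.
final class ResponseLookupSketch {
    private final Map<List<String>, String> table = new HashMap<>();
    private final String defaultOutcome;

    ResponseLookupSketch(String defaultOutcome) {
        this.defaultOutcome = defaultOutcome;
    }

    // Registers one row: this combination of responses yields this outcome.
    ResponseLookupSketch rule(String outcome, String... combination) {
        table.put(new ArrayList<>(Arrays.asList(combination)), outcome);
        return this;
    }

    // Returns the outcome for the given responses, or the default outcome
    // (an assumption of this sketch) when the combination is not listed.
    String consolidate(List<String> responses) {
        return table.getOrDefault(responses, defaultOutcome);
    }
}
```

With rows such as `rule("APPROVE", "OK", "OK")` and `rule("DECLINE", "OK", "RISKY")`, the merge logic stays declarative: changing how responses combine means editing rows rather than branching code.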
4. Enabling Fallback
The fallback pattern routes a request to a lightweight version of your service when the primary service fails. This routing pattern prevents a complete failure of your business flow and acts as a last line of defense. If building a lightweight service is too expensive, the API Gateway can simply route the call to a default function, so that your service still provides a best-effort response rather than failing outright.
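The two levels of fallback described here, a lightweight service first and a default function last, can be chained with CompletableFuture. This is a minimal sketch under assumptions: FallbackSketch, callWithFallback and the constant default response are hypothetical, and a production gateway would probably add timeouts and distinguish between error types.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical sketch of the fallback chain: primary -> lightweight -> default.
final class FallbackSketch {

    static CompletableFuture<String> callWithFallback(
            String request,
            Function<String, CompletableFuture<String>> primary,
            Function<String, CompletableFuture<String>> lightweight,
            String defaultResponse) {
        return primary.apply(request)
                .<CompletableFuture<String>>handle((response, error) -> {
                    if (error == null) {
                        // Primary succeeded: pass its response through.
                        return CompletableFuture.completedFuture(response);
                    }
                    // Primary failed: try the lightweight service, and fall
                    // back to the default response if that fails as well.
                    return lightweight.apply(request)
                            .exceptionally(lightError -> defaultResponse);
                })
                .thenCompose(next -> next); // flatten the nested future
    }
}
```

The caller always receives a response from this chain; whether a default answer is acceptable for a given API is a business decision that the sketch does not settle.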
5. Rate limiting and Access Control
While the API Gateway enables your business to expose its operations via APIs, it is also responsible for protecting your microservices from traffic bursts and unauthorized access. Building a rate limiter into the Gateway ensures that sudden traffic spikes do not hurt your application, and so protects it from DDoS attacks as well as from unintentional overload. Certain APIs in your platform may be sensitive, and you will want to govern access to them. The API Gateway can restrict API access so that only authorized applications can call your services.
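One common way to implement both checks is an allow-list of client identifiers combined with a token-bucket rate limiter. The article does not say which algorithm the PayPal gateway uses, so the token bucket here is a swapped-in technique chosen for illustration, and GatewayGuardSketch, Decision and the injected clock are hypothetical names.

```java
import java.util.Set;
import java.util.function.LongSupplier;

// Hypothetical sketch: access control via an allow-list, rate limiting via a
// token bucket that earns one token every nanosPerToken nanoseconds.
final class GatewayGuardSketch {
    enum Decision { ALLOWED, UNAUTHORIZED, THROTTLED }

    private final Set<String> authorizedClients;
    private final long capacity;        // maximum burst size
    private final long nanosPerToken;   // refill interval for one token
    private final LongSupplier nanoClock;
    private long tokens;
    private long lastRefillNanos;

    GatewayGuardSketch(Set<String> authorizedClients, long capacity,
                       long nanosPerToken, LongSupplier nanoClock) {
        this.authorizedClients = authorizedClients;
        this.capacity = capacity;
        this.nanosPerToken = nanosPerToken;
        this.nanoClock = nanoClock;
        this.tokens = capacity;                 // start with a full bucket
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    synchronized Decision check(String clientId) {
        if (!authorizedClients.contains(clientId)) {
            return Decision.UNAUTHORIZED;
        }
        refill();
        if (tokens == 0) {
            return Decision.THROTTLED;
        }
        tokens--;
        return Decision.ALLOWED;
    }

    // Adds the tokens earned since the last refill, capped at capacity.
    private void refill() {
        long earned = (nanoClock.getAsLong() - lastRefillNanos) / nanosPerToken;
        if (earned > 0) {
            tokens = Math.min(capacity, tokens + earned);
            lastRefillNanos += earned * nanosPerToken;
        }
    }
}
```

In production the clock would typically be `System::nanoTime`; injecting it keeps the limiter easy to test. This sketch uses one shared bucket, whereas a real gateway would more likely keep a bucket per client or per API.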
6. Monitoring and Analytics
For any platform, it is crucial to have a monitoring dashboard that displays vital system and application metrics. Since the API Gateway is the platform’s facade, it makes sense to build monitoring and alerting features at this layer. A few examples of important metrics are ATB (Availability to Business), API latency, and API error counts, which provide insight into the health of the API Gateway itself. In addition, the API Gateway can emit useful information from API requests and responses to an offline store (preferably Hadoop), which can support detailed business analytics of the API.
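As a rough illustration of the health metrics listed above, the sketch below records per-call latency and success, then derives an average, a nearest-rank percentile, an error count, and a success ratio. The names are hypothetical, and the success ratio is only a simple stand-in for ATB, whose exact definition the article does not give. A real gateway would more likely use a metrics library with bounded memory than an in-memory list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: keeps every latency sample in memory, which is fine
// for an example but not for a long-running, high-traffic service.
final class ApiMetricsSketch {
    private final List<Long> latenciesMicros = new ArrayList<>();
    private long errorCount;

    synchronized void record(long latencyMicros, boolean success) {
        latenciesMicros.add(latencyMicros);
        if (!success) {
            errorCount++;
        }
    }

    synchronized long errorCount() {
        return errorCount;
    }

    synchronized double averageMicros() {
        if (latenciesMicros.isEmpty()) {
            return 0.0;
        }
        long sum = 0;
        for (long latency : latenciesMicros) {
            sum += latency;
        }
        return (double) sum / latenciesMicros.size();
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // `percentile` percent of all samples are less than or equal to it.
    synchronized long percentileMicros(double percentile) {
        if (latenciesMicros.isEmpty() || percentile <= 0 || percentile > 100) {
            throw new IllegalArgumentException("need samples and 0 < percentile <= 100");
        }
        List<Long> sorted = new ArrayList<>(latenciesMicros);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }

    // Fraction of recorded calls that succeeded (stand-in for availability).
    synchronized double successRatio() {
        int total = latenciesMicros.size();
        return total == 0 ? 1.0 : (total - errorCount) / (double) total;
    }
}
```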
API Gateway for Risk Platform at PayPal
In conclusion, in a world of many diverse microservices, an API Gateway is a natural choice for exposing a platform’s capabilities in a well-defined fashion. Several factors must be considered when designing one, including the right technology for scalability, design patterns that support flexible orchestration and consolidation, fallback services, rate limiters, access controls for APIs, and ample monitoring and analytics features. Our team at PayPal has built an API Gateway on these principles, and it serves as the single point of entry to all Risk APIs in PayPal. At present, it serves millions of API calls a day with a latency of just 1.4 ms on average and 3 ms at the 99th percentile. CPU and memory utilization stay low even at peak traffic, so it is easy for us to serve growing transaction volumes without scaling out horizontally.