API Infrastructure at Knewton: What’s in an Edge Service?

Published in Knewton Knerd · May 9, 2016

Written by Paul Sastrasinh, Robert Murcek

The Knewton API gives students and teachers access to personalized learning recommendations and analytics in real time. In this post, we will pull back the covers of our API to explain how we handle user requests. You will first learn how to build an edge service with Netflix Zuul, the framework we chose for its simplicity and flexibility. Then, we’ll dive into the Knewton edge service to show you how it improves API simplicity, flexibility, and performance.

What’s in an Edge Service

An edge service is a component which is exposed to the public internet. It acts as a gateway to all other services, which we will refer to as platform services. For example, consider an Nginx reverse proxy in front of some web resource servers. Here, Nginx acts as an edge service by routing public HTTP requests to the appropriate platform service.

An example edge service: Nginx as a reverse proxy for two resource servers. Requests for images are routed to one server, while requests for text are routed to another.

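The routing in the figure can be sketched as a minimal Nginx configuration. The upstream addresses and URL paths below are illustrative placeholders, not Knewton's actual setup:

```nginx
http {
    # One pool of servers for images, another for text.
    upstream image_servers { server 10.0.0.10:8080; }
    upstream text_servers  { server 10.0.0.20:8080; }

    server {
        listen 80;

        # Nginx acts as the edge service: public requests are routed
        # to the appropriate platform service by path.
        location /images/ { proxy_pass http://image_servers; }
        location /text/   { proxy_pass http://text_servers; }
    }
}
```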

Beyond Routing: Edge Service Architecture

The reverse proxy pattern makes sense if every one of your resource servers is hardened to receive raw internet traffic. But in the typical service-oriented architecture, implementing a simple reverse proxy would duplicate a substantial amount of security and reliability code across platform services. For example, we do not want to reimplement user authentication in each platform service. Instead, we factor these responsibilities directly into the edge service. It validates and authenticates incoming requests; it enforces rate limits to protect the platform from undue load; and it routes requests to the appropriate upstream services. Factoring these high-level features into the edge service is a huge win for platform simplicity.

Building this service was a substantial engineering task. We wanted to leverage our experience developing and deploying JVM-based services by creating our edge service following the same pattern. This made Nginx and Apache’s Lua and PHP-based scripting capabilities unattractive. We would have had to rewrite our standard libraries and debug code in a new paradigm. Instead, we built our edge service on top of Netflix Zuul.

The edge service is Knewton’s interface to the public internet. It is registered directly with an AWS Elastic Load Balancer (ELB), and is responsible for sanitizing and routing requests for the Knewton Platform. To maintain high availability, our edge service runs as a cluster. On startup, edge service nodes register themselves with the load balancer, which then distributes requests across the cluster.


Netflix Zuul

Zuul is a framework created by Netflix to handle their own API and website traffic. The framework is structured around filters, a concept borrowed from the Java Servlet Filter pattern. A Zuul edge service is made up of a series of such filters, each performing some action on an HTTP request and/or response before passing control along to the next filter. For example, a filter might add authentication information to request headers, write an error response to an invalid request, or forward a request to an upstream service for further processing. In this section, we will walk through an example Zuul filter to show you the simplicity of the pattern.

Zuul Filters

We will consider three categories of Zuul filters: pre-filters, route-filters, and post-filters. (Zuul also supports error-filters and static-filters.) Pre-filters run before the edge service routes a request; route-filters forward requests to upstream services; and post-filters run after the proxied service returns its response.

The edge service consists of a series of Zuul filters which work together to write a response for a given request. The route-filters make requests to platform services to retrieve data and update state.


Filters are defined by three pieces of logic:

  • Filter execution order
  • Conditional filter execution
  • Execution logic

Let’s dive into our example. We’ve written this filter in Java, but Zuul is compatible with all JVM languages.

Filter Execution Order

Zuul filters run in the same order for every request. This enables successive filters to make assumptions about the validations run and to access accumulated state. For example, we might store the request’s user ID in one filter and use it to apply rate limits in the next. Zuul filter ordering is defined with two methods: filterType, which specifies whether a filter is a pre-filter, route-filter or post-filter, and filterOrder, which orders filters of the same type.

```java
// Defining the execution order for a Zuul filter in Java
@Override
public String filterType() {
    // run this filter before routing
    return "pre";
}

@Override
public int filterOrder() {
    // run this filter first among pre-filters
    return 0;
}
```
Conditional Filter Execution

Filters may be run conditionally for each request. This gives the designer significant flexibility. For example, a rate limit filter may not need to run for an authenticated user. This conditional logic is factored into the shouldFilter method.

```java
// Configuring a Zuul filter to run on every request
@Override
public boolean shouldFilter() {
    // Always run this filter
    return true;
}

// Configuring a Zuul filter to run on requests from unauthenticated users only
@Override
public boolean shouldFilter() {
    RequestContext context = RequestContext.getCurrentContext();
    return !(boolean) context.getOrDefault("IS_AUTHENTICATED", false);
}
```
Execution Logic

The code to execute in a filter is defined in the runFilter method. Here, we make use of the static RequestContext object to store state for later filters, or to write the response.

```java
// Defining Zuul filter logic to record and check rate limits.
@Override
public ZuulFilterResult runFilter() {
    RequestContext context = RequestContext.getCurrentContext();
    boolean isOverRate = rateLimiter.recordRate(context.getRequest());
    if (isOverRate) {
        context.set("IS_RATE_LIMITED", true);
    }
    return null;
}
```
All together, this gives us an example filter:

```java
// A simple Zuul filter that runs for unauthenticated users and records
// whether a rate limit has been exceeded.
public class RateLimitFilter extends ZuulFilter {
    @Override
    public String filterType() {
        // run this filter before routing
        return "pre";
    }

    @Override
    public int filterOrder() {
        return 0;
    }

    @Override
    public ZuulFilterResult runFilter() {
        // records the request with the rate limiter and checks if the
        // current rate is above the configured rate limit
        RequestContext context = RequestContext.getCurrentContext();
        boolean isOverRate = rateLimiter.recordRate(context.getRequest());
        if (isOverRate) {
            context.set("IS_RATE_LIMITED", true);
        }
        return null;
    }

    @Override
    public boolean shouldFilter() {
        // should only run if the user is not authenticated
        RequestContext context = RequestContext.getCurrentContext();
        return !(boolean) context.getOrDefault("IS_AUTHENTICATED", false);
    }
}
```
In this way, we build up a modular set of request-processing functionality. This pattern makes it easy for multiple engineers to contribute new features. At Knewton, engineers outside of the API team have committed code for edge service features. For more information about the design and lifecycle of a Zuul service, see this Netflix blog post.

Zuul and the Knewton Edge Service

Zuul is an opinionated but barebones framework. While adding functionality is simple, you will have to implement the filters yourself. Now we will explain the most important filters we wrote for the Knewton edge service. You will learn how to reject bad requests, reverse-proxy good ones, and reduce interservice traffic.

Pre-filters

Pre-filters in our edge service validate, rate-limit, and authenticate incoming requests. This enables us to reject bad requests as early in the pipeline as possible.

Rate Limiting

Our edge service is responsible for protecting the Platform from bursts of requests. When the rate of requests from a given IP or user exceeds a specified threshold, the edge service responds with 429: Too Many Requests. This threshold helps to prevent harm to the platform from Denial of Service attacks and other excessive request load.

Rate limits are enforced in a pre-filter so that these excess requests will not place any load on platform services. This pre-filter tracks the number of requests made during a one-minute window. If this number exceeds the rate limit, the filter skips further request processing and immediately writes a 429 response.
Logic in the rate limiting Zuul pre-filter. For a simple rate limiting implementation with Redis, see the Redis documentation.

Rates are stored in memory on each edge service node to provide the lowest latency possible to each request. The rates recorded by each node are reconciled asynchronously using a shared Redis cache. This means that, no matter which node handles a given request, all nodes will eventually acknowledge it. In practice, this reconciliation happens quickly; convergence occurs within a constant factor of AWS's region-internal network latency.
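To make the per-node bookkeeping concrete, here is a minimal sketch of a fixed one-minute-window rate limiter kept in memory on a single node. The class and method names are illustrative, not Knewton's actual implementation, and the Redis reconciliation step is omitted:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-node rate limiter: counts requests per key (user ID
// or IP) within the current one-minute window.
public class RateLimiter {
    private final int limitPerMinute;
    private final Map<String, AtomicInteger> counts = new ConcurrentHashMap<>();
    private long windowStart;

    public RateLimiter(int limitPerMinute) {
        this.limitPerMinute = limitPerMinute;
        this.windowStart = System.currentTimeMillis();
    }

    // Records one request for the given key and returns true if that
    // key is now over its per-minute limit.
    public synchronized boolean recordRate(String key) {
        long now = System.currentTimeMillis();
        if (now - windowStart >= 60_000) {
            // Start a new one-minute window.
            counts.clear();
            windowStart = now;
        }
        int count = counts.computeIfAbsent(key, k -> new AtomicInteger())
                          .incrementAndGet();
        return count > limitPerMinute;
    }
}
```

In the real system, each node would additionally publish its counts to the shared Redis cache and periodically fold in the counts observed by its peers.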
A request to an endpoint is rejected when more than the configured limit of requests have been made within a minute. Edge service nodes coordinate request counts through a shared Redis cache.

Surge Queuing

Load on the Knewton API is not constant. Even within the broad patterns of school days and lunch breaks, request rates vary. Rather than passing these bursts on to platform services, our edge service smooths traffic so that the platform sees smaller peaks. To accomplish this, our rate limiter follows the same Leaky Bucket pattern used in Nginx. Requests exceeding the rate limit are added to a fixed-length queue. The queued requests are processed at the rate limit, ensuring that the platform does not see a surge. If the queue is full when the rate limit is exceeded, the excess requests are rejected with a 429 error. This approach has the added benefit of simplifying Knewton API clients, because requests can be made in reasonable bursts without being rejected. Read more about how we implement request queueing in our next blog post.
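The leaky-bucket behavior described above can be sketched as a bounded queue that accepts overflow requests and drains them at the configured rate. This is an illustrative sketch, not Knewton's implementation; the drain scheduling and 429 response writing are left out:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical surge queue: requests over the rate limit wait in a
// fixed-length queue; when the queue is full, the request is rejected
// (the caller would then write a 429 response).
public class SurgeQueue {
    private final Deque<String> queue = new ArrayDeque<>();
    private final int capacity;

    public SurgeQueue(int capacity) {
        this.capacity = capacity;
    }

    // Returns false when the queue is full, signaling rejection.
    public synchronized boolean offer(String requestId) {
        if (queue.size() >= capacity) {
            return false;
        }
        queue.addLast(requestId);
        return true;
    }

    // Called at the configured rate to drain one queued request for
    // processing; returns null when the queue is empty.
    public synchronized String drainOne() {
        return queue.pollFirst();
    }
}
```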
A request surge queue sits in front of the rate limiter to smooth out bursts of requests.

Authentication

At Knewton, we use OAuth to authenticate users. Each API request must contain a valid OAuth token. The edge service rejects requests which do not contain these tokens. This obviates the need for end-user authentication in upstream platform services and reduces interservice traffic by rejecting unauthenticated requests immediately at the edge.
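The first step of such a check is simply extracting the token from the Authorization header and rejecting requests where it is missing or malformed. The helper below is a hypothetical sketch of that step only; token validation against the OAuth provider is a separate call not shown here:

```java
// Hypothetical helper for the authentication pre-filter: pull a bearer
// token out of an Authorization header value.
public class AuthHeaderParser {
    // Returns the OAuth token, or null when the header is absent,
    // uses a different scheme, or carries an empty token.
    public static String extractBearerToken(String authorizationHeader) {
        if (authorizationHeader == null) {
            return null;
        }
        String prefix = "Bearer ";
        if (!authorizationHeader.startsWith(prefix)) {
            return null;
        }
        String token = authorizationHeader.substring(prefix.length()).trim();
        return token.isEmpty() ? null : token;
    }
}
```

A pre-filter using this helper would write a 401 response and skip routing whenever the result is null.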

Route-filters

The edge service is responsible for forwarding requests to the appropriate upstream microservice. Once a request has passed the gauntlet of pre-filters, it is passed on to the route-filters. Since Knewton services use Eureka for service discovery, we chose to use Netflix’s Ribbon HTTP client to make these upstream requests. Ribbon allows our edge service to automatically discover upstream services, load-balance traffic between them, and retry failed requests across instances of a service, all with minimal configuration.

```java
// Creates a Ribbon client with Eureka discovery and round robin
// load balancing
AbstractLoadBalancerAwareClient setupRibbonClient(String upstreamName) {
    ServerList<DiscoveryEnabledServer> serverList =
        new DiscoveryEnabledNIWSServerList(upstreamName);
    IRule rule = new AvailabilityFilteringRule();
    ServerListFilter<DiscoveryEnabledServer> filter =
        new ZoneAffinityServerListFilter();
    ZoneAwareLoadBalancer<DiscoveryEnabledServer> loadBalancer =
        LoadBalancerBuilder.<DiscoveryEnabledServer>newBuilder()
            .withDynamicServerList(serverList)
            .withRule(rule)
            .withServerListFilter(filter)
            .buildDynamicServerListLoadBalancer();
    AbstractLoadBalancerAwareClient client =
        (AbstractLoadBalancerAwareClient) ClientFactory.getNamedClient(upstreamName);
    client.setLoadBalancer(loadBalancer);
    return client;
}
```
The Knewton edge service is not limited to endpoint-based routing. Our next blog post will cover specialized routing we have implemented to support shadow services and server-side API mocking.

Post-filters

After receiving the upstream responses, the post-filters record metrics and logs, and write a response back to the user.

Response Writing

The post-filter stage is responsible for writing a response to the user. It is worth emphasizing that the edge service must always write back some response. If an unhandled exception or upstream timeout were to prevent this, the user would be left idling. This means catching exceptions and handling errors in all situations, so that the edge service always returns a timely answer, even if that answer is an error code (i.e., a 5xx response).
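The "always respond" rule can be illustrated by wrapping response writing so that any unhandled failure still yields a status code. This is a simplified sketch, not the edge service's actual error handling; the choice of 502 as the fallback status is an assumption for illustration:

```java
import java.util.concurrent.Callable;

// Hypothetical sketch: never let an exception escape the post-filter
// stage. The caller supplies the normal response-writing path; any
// failure falls back to a 5xx status instead of leaving the client
// waiting.
public class SafeResponder {
    public static int respond(Callable<Integer> writeResponse) {
        try {
            // Normal path: return the upstream's status code.
            return writeResponse.call();
        } catch (Exception e) {
            // Fallback path: report a gateway error rather than hanging.
            return 502;
        }
    }
}
```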

Metrics and Logging

Recording the right metrics and logs helps to ensure quality of service of the Knewton platform. The edge service encapsulates the entire lifecycle of every API request. It records the entire duration of each request and handles every failure. This uniquely positions the edge service to report on end-user experience. Our post-filters publish metrics and logs to Graphite relays and Splunk indexers. The resulting dashboards, alerts, and ad-hoc queries give us all the information we need to investigate problems, optimize performance, and guide future development work. Watch for our upcoming blog post on API monitoring for more details.

Conclusion

The Knewton edge service validates and rate limits API requests, authenticates users, routes requests to upstream microservices, and records end-to-end metrics. Building this logic into the edge service simplifies our platform and provides performance and reliability guarantees to our users. We built our edge service on top of Zuul because its filter-based design is simple and extensible, its Java construction was compatible with our architecture, and because it has been proven in production at Netflix.

In this blog post, you learned how to build a Zuul edge service and how this design can improve your API. A follow-up post will describe custom logic we built to support surge queuing, shadow services, and API mocking in the Zuul framework.

Thanks to all the contributors on the Knewton API Platform team: Paul Sastrasinh, Rob Murcek, Daniel Singer, Bennett Wineholt, Robert Schultheis, Aditya Subramaniam, Stephanie Killian
