Reference Microservice Architecture

Published in

SplashLearn Engineering blog

7 min read · Mar 3, 2021

Authors: Anirudh Bhardwaj, Kapil Gupta, Hardik Patel, Akshat Verma

Overview: This article describes how we design microservices at SplashLearn. Operational aspects of deployment (high availability, data security, DDoS protection) are essential to a deployment strategy but fall outside the scope of service design, so this article does not cover them.

Microservices Philosophy @SplashLearn

SplashLearn is evolving from a monolithic architecture to a modular, services-based architecture. Our primary goals behind this transition are:

(a) Separation of concerns: Our system performs many different functions, and we would like each of those concerns to evolve in a separate service, independent of the others. We define a concern as a broad user capability, as opposed to an individual step in a user interaction. Some of the common concerns we define are (i) pricing and catalog, (ii) payment, (iii) learning path creation, and (iv) subscription management.

(b) Scalability: We should be able to scale the service implementing each concern independently.

(c) Generic and extensible: Our services should be generic and extensible enough to support additional capabilities needed within the same concern. For example, our pricing and catalog service was initially designed for digital content but should easily extend (ideally without any code changes) to support pricing for other services we may want to offer in the future.

(d) Functional-only: An important part of our philosophy is to abstract out common non-functional requirements across services and move them out of the service, wherever possible.

Architectural Decisions and Overview

We made some key decisions while formulating our reference architecture. We list some of them below.

Structured audit logs in all services: We use a structured audit log for all calls made to every service at SplashLearn. The structured log uses the Avro format, and every service writes it asynchronously. We use Kafka as the channel to push logs from our services to a common Logging Service, which processes the logs of all services in a unified fashion.
Common Authentication: We use a common authentication service that attaches a user identifier to every authenticated request. Hence, services do not implement any authentication themselves.
Service-based authorization: We decided to have each service implement its own authorization. We made this call because the authorization requirements across services were very diverse, and abstracting them into a common set of requirements was getting too complicated.
Throttling at API Gateway: We decided to use an API gateway for throttling and for implementing any routing rules. This leaves the services free to focus only on their functional logic.
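
To make the asynchronous audit logging decision concrete, here is a minimal sketch of how a service might hand records to a background publisher so that the request path never waits on the log channel. Everything in it is an assumption for illustration: the class name AsyncAuditLogger, the publish hook (which stands in for a Kafka producer call), and the use of JSON instead of Avro to keep the example dependency-free.

```python
import json
import queue
import threading
from typing import Callable


class AsyncAuditLogger:
    """Buffers audit records in memory and publishes them on a background thread."""

    def __init__(self, publish: Callable[[bytes], None], max_pending: int = 10_000):
        # `publish` is a hypothetical hook; in a real service it would wrap
        # a Kafka producer's send call.
        self._publish = publish
        self._pending = queue.Queue(maxsize=max_pending)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, record: dict) -> bool:
        """Enqueue a record without blocking; return False if the buffer is full."""
        try:
            self._pending.put_nowait(record)
            return True
        except queue.Full:
            return False

    def close(self) -> None:
        """Publish everything still queued, then stop the worker."""
        self._pending.put(None)  # sentinel that tells the worker to exit
        self._worker.join()

    def _drain(self) -> None:
        while True:
            record = self._pending.get()
            if record is None:
                break
            # JSON is a stand-in here; the article's services encode with Avro.
            self._publish(json.dumps(record).encode("utf-8"))
```

Dropping a record when the buffer is full (rather than blocking) is one possible trade-off: it protects request latency at the cost of occasionally losing a log entry.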

Reference Architecture

Common Services

We have designed a few common services that are used by all other services for implementing common capabilities.

API Gateway

We evaluated several popular API gateway offerings, including AWS API Gateway, Kong, and Tyk, against the following parameters:

Ease of Scaling.
Ease of setup.
Functionality offered.
Ease of Management.
Cost.

Based on the above parameters, we shortlisted Kong and AWS API Gateway for testing and ran a simple experiment.

We took an existing SplashLearn API that is serverless in nature and put it behind both API gateways.

Kong was deployed on a t3.large EC2 instance using Docker. We then used a load testing tool to send requests through both API gateways to fetch results from the SplashLearn API.

Here are the results with 1000 RPS.

Based on the comparison, there is not much difference between the two offerings. Because of its ease of setup and maintenance, we settled on the AWS offering.

AWS provides a fully managed API gateway offering called Amazon API Gateway. Since AWS is our primary cloud vendor, using a managed offering from the same vendor gives us flexibility and saves the time we would otherwise spend managing the deployment ourselves. Like other popular API gateways, it offers a complete set of API management tools, including authentication, throttling, monitoring, and version management. Being a managed service, it automatically scales up and down as traffic increases or decreases.

To set up the API gateway, we need to configure the routing rules and integration endpoints.

Since we need to consult the authentication service to check the validity of the session, we also need to configure a Lambda function that contains our custom authentication logic.

All requests to the SplashLearn backend APIs hit the API gateway first. We have attached an authorizer Lambda function that calls our Auth service to check whether the incoming request is authenticated.

In case the authorizer returns a valid response, the API gateway will call the downstream service and return the appropriate response.

If the authorizer fails to extract the credentials or if the credentials are invalid, the API gateway will directly return an error response to the client.
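
The authorizer flow above can be sketched as a token-based Lambda authorizer. This is an illustration under stated assumptions, not our production code: verify_session is a hypothetical client for the Auth service (it returns a user ID for a valid credential and None otherwise), and the user_id context key is made up for the example. The response shape (principalId, policyDocument, context) and the "Unauthorized" error message follow the contract Amazon API Gateway defines for Lambda authorizers.

```python
from typing import Callable, Optional


def build_policy(principal_id: str, effect: str, resource: str) -> dict:
    """Build the authorizer response that API Gateway expects."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {"Action": "execute-api:Invoke", "Effect": effect, "Resource": resource}
            ],
        },
        # Values placed in `context` are made available to the downstream integration.
        "context": {"user_id": principal_id},
    }


def make_handler(verify_session: Callable[[str], Optional[str]]):
    """Create a Lambda handler around a (hypothetical) Auth service client."""

    def handler(event: dict, context: object) -> dict:
        token = event.get("authorizationToken")
        if not token:
            # API Gateway maps this exact message to a 401 response.
            raise Exception("Unauthorized")
        user_id = verify_session(token)
        if user_id is None:
            raise Exception("Unauthorized")
        return build_policy(user_id, "Allow", event["methodArn"])

    return handler
```

Injecting verify_session keeps the handler testable without a network call; in a deployment it would wrap the HTTP request to the authentication service.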

Authentication Service

The authentication service acts as a centralized service that handles the authentication needs of the SplashLearn web and mobile applications and of other services.

Each request that arrives at the gateway carries a session context in the form of encrypted cookies or tokens. The cookies and tokens are extracted from the request, and a call is made to the authentication service endpoint to verify them.

If the verification is successful, the API gateway serves the requested resources to the client; otherwise, the request is rejected as forbidden. The following diagram represents this flow.

In addition to verifying the encrypted session context from the cookie and token, we perform additional validations based on the session data stored in cache and db.
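
As a rough illustration of this two-stage check, the sketch below first verifies that the token itself is intact and then consults stored session data. The details are assumptions made for the example: an HMAC-signed token stands in for our encrypted session context, a plain dictionary stands in for the cache and database, and the names SECRET, issue_token, and verify are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-secret"  # hypothetical key; a real service would load it from a secret store


def issue_token(session_id: str, user_id: str) -> str:
    """Create a signed token carrying the session and user IDs."""
    claims = json.dumps({"sid": session_id, "uid": user_id}).encode()
    payload = base64.urlsafe_b64encode(claims).decode()
    signature = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{signature}"


def verify(token: str, sessions: dict, now: Optional[float] = None) -> Optional[str]:
    """Return the user ID if the token is intact and the stored session is still valid."""
    now = time.time() if now is None else now
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    # Stage 1: reject tokens that are malformed or were tampered with.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload))
    # Stage 2: validate against session data (a dict stands in for cache and db).
    session = sessions.get(claims["sid"])
    if session is None or session["revoked"] or session["expires_at"] <= now:
        return None
    return claims["uid"]
```

The second stage is what lets a session be revoked or expired on the server side even while the client still holds a token that verifies correctly.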

Logging Service

A centralized service that handles the logs generated by the different services. As we add more and more services to the SplashLearn platform, we need an effective mechanism to monitor and debug issues that come up in various parts of the application. Centralized logs help us debug issues quickly and pinpoint the exact cause of an error.

The logging service processes the logs received from the various services and stores them in Elasticsearch.

What data to log?

Logs are generally collected for monitoring and debugging purposes. However, at SplashLearn, we use our structured audit logs for extracting insights about consumer behavior and how well various design choices are performing. Hence, with every API call, we log

Analytics parameters:
  Request parameters that allow us to segment the user (e.g., user segment, campaign source, platform)
  Response characteristics (user plan type, cache hit/miss, payload size, response time)
Debugging parameters:
  Timestamp of the request
  Who is responsible for any error? Log details about the time the request spent fetching data from the database, downstream services, etc.
  Release build ID
Correlation parameters:
  Visitor and visit ID
  Execution Context ID (a correlation ID that is common across all services that together fulfil a user request and allows us to stitch a request across services)
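
One way to picture these three groups of parameters is as a single record that a service fills in while handling a request. The sketch below is illustrative only: the class AuditRecord, its field names, and the timed helper are assumptions, not our actual schema.

```python
import time
from contextlib import contextmanager
from dataclasses import asdict, dataclass, field


@dataclass
class AuditRecord:
    # Correlation parameters
    ec_id: str
    visitor_id: str
    visit_id: str
    # Analytics parameters
    user_segment: str
    platform: str
    cache_hit: bool = False
    payload_bytes: int = 0
    # Debugging parameters
    build_id: str = "unknown"
    started_at: float = field(default_factory=time.time)
    timings_ms: dict = field(default_factory=dict)

    @contextmanager
    def timed(self, stage: str):
        """Accumulate the time spent in a stage such as a db query or downstream call."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.timings_ms[stage] = self.timings_ms.get(stage, 0.0) + elapsed_ms


# Usage: wrap each slow dependency so the log shows where the time went.
record = AuditRecord(ec_id="ec-1", visitor_id="v-1", visit_id="s-1",
                     user_segment="teacher", platform="web")
with record.timed("db"):
    rows = []  # placeholder for a real database query
log_entry = asdict(record)
```

Recording per-stage timings alongside the correlation IDs is what makes it possible to answer "who is responsible for the error or the latency" from the log alone.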

Logging Structure

Since the logging service receives logs from every service at SplashLearn, we have defined a structure for each log entry pushed to it, so that we can query for specific data and automate monitoring.

We decided to use an Avro schema, as it allows for schema evolution.
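
To illustrate what schema evolution means in practice, the sketch below shows two versions of a hypothetical Avro record schema, written as Python dictionaries. The record name AuditLog and the added build_id field are assumptions for the example. Under Avro's schema resolution rules, a reader using the newer schema can decode records written with the older one only if every field it adds declares a default; the helper checks exactly that and nothing more (it is not a full compatibility checker).

```python
# Version 1 of a hypothetical audit log schema.
SCHEMA_V1 = {
    "type": "record",
    "name": "AuditLog",
    "fields": [
        {"name": "ec_id", "type": "string"},
        {"name": "environment", "type": "string"},
        {"name": "service_name", "type": "string"},
        {"name": "created_at", "type": "string"},
    ],
}

# Version 2 adds an optional field with a default, so old records stay readable.
SCHEMA_V2 = {
    "type": "record",
    "name": "AuditLog",
    "fields": SCHEMA_V1["fields"] + [
        {"name": "build_id", "type": ["null", "string"], "default": None},
    ],
}


def added_fields_without_default(old: dict, new: dict) -> list:
    """Names of fields added in `new` that declare no default.

    If this list is non-empty, a reader on `new` cannot decode records
    that were written with `old`.
    """
    old_names = {f["name"] for f in old["fields"]}
    return [
        f["name"]
        for f in new["fields"]
        if f["name"] not in old_names and "default" not in f
    ]
```

A check like this could run in continuous integration so that a service cannot publish a schema change that breaks the logging service's ability to read older records.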

Each log will have the following context, which will help us identify the issues faster.

EC ID: Execution context ID for the request. It groups the related log events that occurred as part of one request, which lets us trace the journey of the request and find the initial failure point.

Environment: The environment in which the error occurred, such as production or staging.

Service name: Identifier of the service where the log was generated.

User Agent: Details about the origin of the request, identifying the application, operating system, vendor, and/or version of the requesting client.

Error: Information related to the actual error, with code, message, and stack trace.

Here is a simplified representation of what a single log entry might look like in JSON:

{
  "ec_id": "715eec8f-fefc-45e2-a352-95aa389ddb8f",
  "environment": "prod",
  "service_name": "ReportingService",
  "created_at": "2021-02-01T12:09Z",
  "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.146 Safari/537.36",
  "error": {
    "error_code": 500,
    "error_message": "IllegalArgumentException….",
    "stack_trace": "Caused by: java.lang.IllegalArgumentException: Could not resolve placeholder…."
  }
}
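
Given log entries shaped like the one above, the role of the EC ID can be sketched as follows: group events by ec_id and pick the earliest one that carries an error. The function name first_failures is hypothetical, and the sketch assumes that created_at values are UTC timestamps in a single ISO 8601 format, so that they sort chronologically as strings.

```python
from collections import defaultdict


def first_failures(events: list) -> dict:
    """Map each ec_id to the earliest log event that carries an error."""
    by_request = defaultdict(list)
    for event in events:
        by_request[event["ec_id"]].append(event)

    result = {}
    for ec_id, group in by_request.items():
        errors = [e for e in group if e.get("error")]
        if errors:
            # Same-format ISO 8601 UTC timestamps compare correctly as strings.
            result[ec_id] = min(errors, key=lambda e: e["created_at"])
    return result
```

In production this kind of query would run against the log store rather than in application code; the sketch only shows why a shared correlation ID makes the initial failure point easy to find.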

Architecture