Mind-mapping Micro-services Design Patterns — Part 2

Ruchi Tayal
15 min read · Sep 21, 2020

In the previous article, I looked at the considerations for designing, deploying, and accessing microservice-based applications. But how good would an application be if it kept failing and couldn't stand the test of time?

Once the application is created, it needs to be maintained, scaled and made secure. Let’s look at the options available to make our application more robust and battle-ready.

1. What is happening within the services?

After the services are deployed and used, they need to be observed to see what is happening within the entire system and to ensure everything is going fine. Different pieces of the application's functionality are performed by individual services, and any request/operation may pass through more than one service at a time. We need to gather observability signals that can be queried and analysed later.

The most common way to get a trace of the flow within an application is to log the sequence of code steps performed. To get a record of the flow for an entire operation, this sequence needs to be recorded across services for a particular request, i.e. we need to aggregate the logs generated by different services. In a distributed system, all services need to be able to write their logs to a common location or a central logging system where the logs are aggregated.

While logs are detailed, general-purpose statements of flow, error situations throw exceptions in the system, and these become most relevant for diagnosing problems. Exception tracking becomes important: we capture the error message and the entire stack trace from the service instance that observed the issue. This is true for traditional applications as well; however, for microservices, the exception needs to be logged in a central service and de-duplicated to avoid noise.

While both the above mechanisms help in getting traces from all the services in the system, we also need to know what happened to a particular request across all the services it touched. To differentiate the logs of one request from another, we need to instrument the services' code to add a unique ID to each external request; this request ID should be passed to all downstream services so that they include it in the log messages for that request's operations. This enables true distributed tracing over and above the aggregation mechanisms mentioned above.
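
As an illustration, here is a minimal sketch of propagating a correlation ID in Python. The `X-Request-ID` header name, the logger setup, and the downstream URL are assumptions, not a prescribed implementation:

```python
import logging
import uuid

import requests  # third-party HTTP client

logging.basicConfig(format="%(asctime)s %(request_id)s %(message)s", level=logging.INFO)
logger = logging.getLogger("order-service")


def handle_request(incoming_headers: dict) -> None:
    # Reuse the ID assigned at the edge, or mint one if this is the first hop.
    request_id = incoming_headers.get("X-Request-ID", str(uuid.uuid4()))
    extra = {"request_id": request_id}

    logger.info("received order request", extra=extra)

    # Pass the same ID downstream so its logs can be correlated with ours.
    requests.post(
        "http://payment-service/charge",  # hypothetical downstream service
        json={"amount": 42},
        headers={"X-Request-ID": request_id},
    )
    logger.info("order request completed", extra=extra)
```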

User-performed operations also need to be tracked in addition to the service logs; recording user behaviour as an audit log is critical. Recall from the previous post that if the system is designed around domain entities, the actions on the entities can be emitted as domain events and collated using event sourcing. In that scenario, event sourcing can also serve as an audit log.

What is happening within the microservices?

2. Are the services behaving well?

Now that we have information about what is happening in the system, we can identify what we should care about most to keep a check on the health and performance of the system. Based on the information gathered using observability techniques, we can determine the expected levels for certain aspects of the system and check whether those thresholds get breached. This is the monitoring aspect of the system.

We need to be able to detect whether a running service instance can handle requests. To assess the condition or health of the service, we need the service to expose an API endpoint that reports its health, referred to as a Health Check API. This endpoint can report the health of the network connections to the system, the physical status of the host itself, or some specific aspect of the application running on the host. It can be called by the service registry or the load balancer before calling the service, or by a dedicated monitoring service.
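
A minimal health-check endpoint sketch, assuming Flask and a hypothetical database connectivity check:

```python
from flask import Flask, jsonify

app = Flask(__name__)


def database_reachable() -> bool:
    # Hypothetical check; in practice this might ping a connection pool.
    return True


@app.route("/health")
def health():
    checks = {"database": database_reachable()}
    healthy = all(checks.values())
    # 200 tells the load balancer / registry to keep routing traffic here;
    # 503 signals that this instance should be taken out of rotation.
    return jsonify(status="UP" if healthy else "DOWN", checks=checks), (200 if healthy else 503)
```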

Another aspect of monitoring relates to knowing whether the expected service levels are being met. This requires collecting data and application metrics about individual operations. The metrics can be further aggregated for reporting and alerting. A metrics service can collect data from the application services using either the pull model or the push model.
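
As a sketch of the pull model, assuming the prometheus_client library (the metric names and the simulated work are illustrative):

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("orders_requests_total", "Total order requests handled")
LATENCY = Histogram("orders_request_seconds", "Order request latency in seconds")


def handle_order() -> None:
    REQUESTS.inc()
    with LATENCY.time():
        time.sleep(random.uniform(0.01, 0.1))  # stand-in for real work


if __name__ == "__main__":
    # Expose /metrics so a collector (pull model) can scrape this instance.
    start_http_server(8000)
    while True:
        handle_order()
```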

Are the microservices behaving well?

3. How to make the services more resilient?

While the user may take some action when a failure occurs, it would be much better if the services could proactively be made more resilient, able to gracefully handle errors and/or recover from failures themselves.

The first thing that we would like to do is to limit the blast radius of any failure and not let a failure in one or more services take down our entire application.

Bulkhead

In general terms, a ship has bulkheads, or partitions, that prevent the entire ship from sinking by confining water to the damaged sections. Similarly, this pattern isolates services into pools, so if a few of them fail, it only impacts that pool and the application can continue to run via the other pools of services. Service instances can be partitioned based on consumer load, availability requirements, or business needs. It is a good idea to partition resources like connection pools along with the services. Also, VMs and containers offer a good deployment mechanism when partitioning services like this.
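
A minimal bulkhead sketch in Python, isolating calls to two hypothetical downstream services into separate, bounded thread pools so a slow dependency cannot exhaust all worker threads:

```python
from concurrent.futures import ThreadPoolExecutor

# Each downstream dependency gets its own small pool (its "bulkhead").
payment_pool = ThreadPoolExecutor(max_workers=5, thread_name_prefix="payment")
catalog_pool = ThreadPoolExecutor(max_workers=5, thread_name_prefix="catalog")


def call_payment_service(order_id: str) -> str:
    return f"charged {order_id}"   # stand-in for a remote call


def call_catalog_service(sku: str) -> str:
    return f"looked up {sku}"      # stand-in for a remote call


# If the payment service hangs, only the 5 payment workers block;
# catalog lookups keep flowing through their own pool.
payment_future = payment_pool.submit(call_payment_service, "order-123")
catalog_future = catalog_pool.submit(call_catalog_service, "sku-42")
print(payment_future.result(), catalog_future.result())
```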

This mechanism can be combined with other patterns to make the services handle faults in a more sophisticated manner.

A lot of the time, errors can be transient. Instead of isolating the failed service or its pool at the first failure, the stability of the application can be improved by adding a Retry pattern. When an application detects a failure from a service, it can use one of the following strategies to rule out a transient failure:

  • Retry — This is generally helpful if the failure detected is unusual or rare. Immediately retrying the same operation on the same service is likely to result in success.
  • Retry after delay — This is generally helpful if the failure detected is due to an issue that is likely to resolve itself in some time, such as connectivity issues or a busy service. The application should wait for a suitable (potentially configurable) amount of time and retry the operation.
  • Cancel — While this may not sound like a logical retry mechanism, it is helpful in cases where the failure doesn't appear to be transient or is unlikely to succeed on a retry, e.g. a login failure due to an incorrect password. In such cases, it is best to immediately cancel the operation and report an exception.

Depending on the service and the failure, the application could specify the retry strategy, the number of retry attempts, and/or the time gap between retries. When changing business data, the retry operations should use the Idempotency pattern to avoid the same action being performed twice.
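
A minimal retry-with-delay sketch follows; the delay, attempt count, and which exceptions count as transient are assumptions that would be tuned per service:

```python
import time


class TransientError(Exception):
    """Stand-in for failures worth retrying (timeouts, busy service, etc.)."""


def call_with_retry(operation, attempts=3, delay_seconds=0.5):
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # give up and surface the failure to the caller
            # Back off before retrying; exponential backoff is a common refinement.
            time.sleep(delay_seconds * attempt)


# Usage: wrap the (idempotent) remote call in a callable.
# call_with_retry(lambda: payment_client.charge(order_id))
```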

Beyond a certain number of attempts, the retry mechanism may impact the responsiveness of the application. If the request fails a certain number of times, it is best to take that service or pool of services out of operation and prevent requests from going to it. But how do we know when to bring that service back into action? The retry pattern alone cannot do that.

An application can combine the retry pattern with another pattern called the Circuit Breaker pattern. This pattern differs from the retry pattern in that it prevents the application from attempting an operation that is likely to fail, so as not to waste CPU cycles when the failure is likely to take some time to be fixed. The pattern draws its name from the electrical switch that protects a circuit from damage caused by a fault by interrupting the flow of current. Similarly, the circuit breaker pattern protects a service with a circuit breaker object that acts as a proxy for the service and monitors failures. Once the failures reach a certain threshold, the circuit breaker trips, and further calls to the operation result in an error. The proxy object then waits for a timeout period (the logic is customizable), beyond which, if the failure appears to be resolved, it allows the application to invoke the operation again.

Hence we can use the retry pattern to invoke operations through the circuit breaker and have the retry logic be sensitive to circuit breaker exceptions, dropping retry attempts if the circuit breaker indicates that the failure isn't transient. Combining these two patterns provides a comprehensive approach to handling faults.
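
A minimal circuit-breaker sketch; the thresholds and timeout are illustrative, and production implementations usually add an explicit half-open state and thread safety:

```python
import time


class CircuitOpenError(Exception):
    """Raised when calls are rejected without reaching the protected service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, operation):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open, failing fast")
            # Timeout elapsed: allow a trial call through ("half-open").
            self.opened_at = None
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()  # trip the breaker
            raise
        self.failures = 0  # success resets the failure count
        return result
```

Retry logic wrapping `CircuitBreaker.call` would treat `CircuitOpenError` as non-transient and stop retrying immediately, which is exactly the combination described above.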

Another scenario is where a service is running properly but a flood of requests takes it down during peak load. To make the service resilient to such failures, we need to control the burst of requests coming to the service and convert it into a manageable load. To do this, we can use the Queue-Based Load Levelling pattern, where a queue absorbs the burst of messages from the calling tasks/applications and the service pulls messages from the queue at a pace at which it can process them. This ensures maximum availability, as the application can continue to post messages to the queue even if the service isn't available or isn't processing messages. The number of queues and the number of services can be varied per demand.
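
A minimal in-process sketch of queue-based load levelling; a real deployment would use a durable message broker rather than `queue.Queue`, and the timings are arbitrary:

```python
import queue
import threading
import time

requests_queue = queue.Queue()  # the buffer that absorbs bursts


def producer():
    # Callers post work as fast as it arrives...
    for i in range(100):
        requests_queue.put(f"request-{i}")


def consumer():
    # ...while the service drains the queue at a pace it can sustain.
    while True:
        item = requests_queue.get()
        time.sleep(0.05)  # stand-in for processing time
        requests_queue.task_done()


threading.Thread(target=consumer, daemon=True).start()
producer()
requests_queue.join()  # wait until every buffered request has been processed
```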

Too many services in play can cause conflicts and contention for resources, especially when there are multiple instances of the same service. This configuration cannot be avoided (it is what autoscaling provides); however, some conflicts can be handled by selecting one of the instances as the leader so that it can assume responsibility for managing the functioning and shared resource access of the other instances. It can additionally play the role of an aggregator of complex processing by the other instances if needed. This is the Leader Election pattern. The process of electing the leader needs to be resilient to failures. Using a single dedicated process as the leader is straightforward; alternatively, common leader election algorithms like the Bully algorithm or the Ring algorithm can be used. The instances may need to keep monitoring the health/heartbeat of the elected leader. If the designated leader terminates unexpectedly (e.g. the system autoscales down) or becomes disconnected (e.g. network failures), it's important to detect this and elect a new leader.

4. How to make the services perform well & scale?

Keeping the services up and running isn't sufficient. They need to be responsive to requests, perform operations within the desired time, and be able to handle an increased load without impacting the former. Scalability needs to be handled at all levels — compute instances, data store, messaging infrastructure, load balancers, etc. — otherwise any component that doesn't scale can bring the entire system down.

Sharding

When storing and accessing large volumes of data, a single server can face limitations due to storage space, computing power, or network bandwidth. The data should be distributed across partitions called shards. Each shard follows the same schema but holds a distinct set of data, so it is like a complete store in its own right. The partitioning logic should be based on the application and/or query patterns to improve query performance, though queries can still access data across shards. A shard is defined based on one or more attributes of the data, known as the shard key. There are three common strategies for selecting the shard key and distributing data across shards:

  • Lookup — In this strategy, the sharding logic creates a map of routes for requests to reach the data. A common approach in a multi-tenant situation is to store all data for a given tenant in the same shard. Each shard could be physical storage or could map to a virtual partition that can be changed easily without impacting the application.
  • Range — This strategy groups related items in a shard, e.g. orders grouped by month, users located in the same timezone or region, etc. This allows a query to return the related data from a single shard. However, this strategy does not guarantee even balancing of data across shards.
  • Hash — This strategy allows us to evenly distribute the load across shards and thereby avoid imbalanced shards or hotspots. The sharding logic uses a hash function to identify the appropriate shard to store and/or retrieve the data, as sketched below.
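
A minimal sketch of the hash strategy; the shard count and the use of a tenant ID as the shard key are assumptions:

```python
import hashlib

SHARD_COUNT = 4  # assumed number of physical or virtual shards


def shard_for(shard_key: str) -> int:
    # A stable hash (not Python's randomized hash()) keeps the mapping
    # consistent across processes and restarts.
    digest = hashlib.sha256(shard_key.encode()).hexdigest()
    return int(digest, 16) % SHARD_COUNT


print(shard_for("tenant-42"))   # every record for tenant-42 lands on the same shard
print(shard_for("tenant-43"))   # other tenants spread evenly across the four shards
```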

Cache-Aside

While the data store can be scaled using shards, taking it beyond a single physical store, performance can be further improved if frequently accessed data is kept in a cache close to the application. But how do we keep the cached data consistent with the data in the store? This needs the caching system to implement read-through and write-through/write-behind operations: as soon as data is written to the cache, it is automatically written to the store as well. However, if the data is changed by an external process, consistency cannot be ensured. Common considerations while implementing this pattern are keeping the cache duration limited and aligned with what the application actually needs, evicting items from the cache, data consistency, and keeping the cache in-memory.
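
A minimal cache-aside read/write sketch; the cache here is an in-process dict for illustration, whereas in practice it would typically be a shared cache such as Redis, and the store functions are stand-ins:

```python
import time

CACHE_TTL_SECONDS = 60  # keep cached entries short-lived, per the considerations above
cache: dict[str, tuple[float, dict]] = {}


def load_from_store(product_id: str) -> dict:
    return {"id": product_id, "name": "example"}   # stand-in for a database query


def save_to_store(product_id: str, value: dict) -> None:
    pass   # stand-in for a database write


def get_product(product_id: str) -> dict:
    entry = cache.get(product_id)
    if entry and time.time() - entry[0] < CACHE_TTL_SECONDS:
        return entry[1]                       # cache hit
    value = load_from_store(product_id)       # cache miss: read from the store...
    cache[product_id] = (time.time(), value)  # ...and populate the cache for next time
    return value


def update_product(product_id: str, value: dict) -> None:
    save_to_store(product_id, value)   # write to the store first
    cache.pop(product_id, None)        # then invalidate the stale cache entry
```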

Index Table

In addition to caching frequently accessed data, it helps to create an index on the fields that are frequently queried. While relational databases support secondary indexes out of the box, such support often isn't available in NoSQL databases. It can be emulated by creating an index table that keeps the secondary key and the location of each item, sorted by the secondary key.

This can be combined with sharding to improve performance, especially when the Hash approach is used to shard the data, by keeping the location of the shard against the secondary key.

Materialized Views

If the source data is not in a format suited to the desired query results, or the query requires joining tables or combining data entities, query efficiency and application performance can be improved by generating pre-populated views, called materialized views, over the data in one or more stores. These views might not be fully consistent with the original data, though.

If an Event Sourcing pattern is used as a store of only the events that modified the data, all the events need to be examined to determine the current state. In such a case, using a materialized view to hold the current state of the entity is helpful; any new event causes the view to be updated.
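
A minimal sketch of maintaining a materialized view from a stream of domain events; the event shapes and the account example are assumptions:

```python
# Event-sourced store: only the changes are recorded...
events = [
    {"type": "AccountOpened", "account": "A1", "balance": 100},
    {"type": "MoneyDeposited", "account": "A1", "amount": 40},
    {"type": "MoneyWithdrawn", "account": "A1", "amount": 25},
]

# ...while the materialized view keeps the pre-computed current state.
current_balances: dict[str, int] = {}


def apply(event: dict) -> None:
    account = event["account"]
    if event["type"] == "AccountOpened":
        current_balances[account] = event["balance"]
    elif event["type"] == "MoneyDeposited":
        current_balances[account] += event["amount"]
    elif event["type"] == "MoneyWithdrawn":
        current_balances[account] -= event["amount"]


for event in events:        # on startup, rebuild the view from history;
    apply(event)            # afterwards, apply() runs once for each new event
print(current_balances)     # {'A1': 115}, with no event replay at query time
```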

Priority Queue Pattern

Message queue applications can support multiple consumers that work concurrently on messages received on the same channel. While the system can be scaled up and down by increasing or decreasing the number of consumers, that still handles all messages at the same priority level and processes them in FIFO order. Applications can improve their performance by running relatively critical tasks faster than others. That can be achieved by associating a priority with requests/messages and having the message queue process them in order of priority. An alternative is to have different queues for different priorities and a separate (or larger) pool of consumers associated with the higher-priority queues.
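
A minimal in-process sketch of priority-ordered processing using Python's heap-based `PriorityQueue`; a real system would more likely use separate broker queues per priority, as described above:

```python
import queue

messages = queue.PriorityQueue()

# Lower number = higher priority; the tuple's second element is the payload.
messages.put((2, "generate monthly report"))
messages.put((1, "process customer payment"))
messages.put((3, "recompute recommendations"))

while not messages.empty():
    priority, task = messages.get()
    print(f"handling (priority {priority}): {task}")
# Output order: payment first, report second, recommendations last.
```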

Throttling Pattern

The load on applications/services can vary over time, based on the number of active users and/or the types of activities being performed. A sudden spike in activity can place an extreme load on resources. While autoscaling is generally enabled in cloud services to handle increased demand, it may take some time for new resources to be provisioned to scale up the system. During the spike, the system may exceed the capacity of the resources provisioned by then and perform poorly. In mechanical terms, throttling is controlling the flow of fuel/power to an engine. A similar approach can be applied to microservices: we allow applications to use resources only up to a limit and throttle requests when that limit is reached. This contrasts with autoscaling, but ideally throttling kicks in while autoscaling catches up; the two approaches can be combined to help keep the application responsive.

The system should monitor the use of its resources and throttle requests from a user (or users) if they exceed the threshold. Queue-Based Load Levelling and Priority Queue are mechanisms that help in implementing throttling.
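
A minimal token-bucket throttling sketch; the rate and capacity are assumptions, and per-user buckets would live in a shared store in a real system:

```python
import time


class TokenBucket:
    def __init__(self, rate_per_second: float, capacity: int):
        self.rate = rate_per_second
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens according to the elapsed time, up to the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # over the limit: reject, queue, or degrade the request


bucket = TokenBucket(rate_per_second=5, capacity=10)
if not bucket.allow():
    print("429 Too Many Requests")  # the throttled response a client might see
```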

5. How to keep services secure?

Security is the most critical aspect of any application, irrespective of whether it is a monolith or microservice-based, and whether it is deployed on-premise or in the cloud. In microservices-based solutions, security matters even more because there are multiple points of access, and if the services are deployed in the cloud, securing them becomes even more important.

This is a vast domain in itself and there are several aspects to security, the primary ones being who can access the services, what they can access within the services, and how to keep the data within the services safe.

Who can access the services?

To keep the services safe, we need to ensure that only valid users access the service and do only what they are allowed to do.

The services need to verify the identity of the user/requestor and assess whether it is authorized to act. A user/requestor can prove its identity by providing credentials, the most common being a username and password. The end-user-facing service (as we saw in the previous post, the API gateway would most likely be the first point of entry) checks the validity of the user credentials. As the user then accesses other services, it becomes an overhead for each service to validate the credentials again. To simplify this, the API gateway authenticates the request and issues the user an access token (the most commonly used format is JWT). The user can then present this token to any service it wishes to access and, based on that, be authorised to operate.
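
A minimal sketch of that token handoff, assuming the PyJWT library and a shared signing secret; a real gateway would typically use asymmetric keys issued by an identity provider:

```python
import time

import jwt  # PyJWT

SECRET = "shared-signing-secret"   # assumption: symmetric key shared by gateway and services


# At the API gateway: credentials have been verified, so issue an access token.
def issue_token(user_id: str, roles: list[str]) -> str:
    claims = {"sub": user_id, "roles": roles, "exp": int(time.time()) + 900}
    return jwt.encode(claims, SECRET, algorithm="HS256")


# At a downstream service: trust the token instead of re-checking credentials.
def authorize(token: str, required_role: str) -> bool:
    try:
        claims = jwt.decode(token, SECRET, algorithms=["HS256"])  # verifies signature and expiry
    except jwt.InvalidTokenError:
        return False
    return required_role in claims.get("roles", [])


token = issue_token("alice", ["orders:read"])
print(authorize(token, "orders:read"))   # True
```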

Users may need to work with different applications, possibly across different organisations as well. For authentication purposes, the user would need a specific credential stored in each service, which could differ from service to service. This is cumbersome for the user, who has to remember credentials for each service, and complex for the services as well, since each would need complete user management functionality, including password management and reset reminders. What if a third service could store a single credential for each user to use across all services and also perform user and password management? Delegating user authentication to an external provider like this is called Identity Federation. The third party acts as the Identity Provider (IdP), which generates a token containing claims to identify the user. The token can be passed to a Security Token Service (STS), which augments it based on predefined rules, and this token can then be passed to the application to allow the user access. This pattern facilitates single sign-on within an application or across business-to-business applications.

Some applications handle a lot of sensitive data and need a higher level of protection from malicious attacks, or extra safety to be able to perform mission-critical operations. It is a good idea to add an extra layer of protection that performs more validation on the incoming request, in addition to just authenticating it. This is called a Gatekeeper: it validates and sanitizes the request and rejects invalid requests upfront. Adding this additional layer may cause some performance impact; however, decoupling the trusted host from the gatekeeper protects the sensitive data from being compromised by unrestricted access. This pattern is like a firewall in a common network topology.

In contrast to the Gatekeeper pattern, there are scenarios where it may be more desirable to give clients direct, limited access to a resource, e.g. data download requests going straight to the data store. This could be the case where the data is in a remote store or datacenter and it saves cost and improves performance to access the store directly rather than going via the application, or where concurrent uploads/downloads would otherwise force the application to handle the data transfer. However, what if the resource (e.g. the data store) itself is unable to handle authentication and authorisation of the request? The approach here is to provide the client with time-limited access for specific predefined operations via a key or token, called the Valet Key, which the resource can validate while restricting access through the resource's public connection. It is possible, and recommended, to configure the key with a limited access scope and invalidate it after the operation is complete.
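
A minimal valet-key sketch using an HMAC-signed, expiring token; cloud storage services offer equivalents out of the box (e.g. pre-signed or shared-access URLs), and the secret, blob name, and TTL here are assumptions:

```python
import hashlib
import hmac
import time

SECRET = b"resource-owner-secret"   # known only to the application and the resource


def issue_valet_key(blob_name: str, operation: str, ttl_seconds: int = 300) -> dict:
    expires = int(time.time()) + ttl_seconds
    message = f"{blob_name}:{operation}:{expires}".encode()
    signature = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    # The client presents these values directly to the storage resource.
    return {"blob": blob_name, "operation": operation, "expires": expires, "signature": signature}


def resource_validates(key: dict) -> bool:
    if time.time() > key["expires"]:
        return False   # key has expired
    message = f"{key['blob']}:{key['operation']}:{key['expires']}".encode()
    expected = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, key["signature"])


key = issue_valet_key("reports/2020-09.csv", "download")
print(resource_validates(key))   # True, until the key expires
```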

Putting it all together

Mind-mapping all the patterns needed to address the concerns discussed so far:
