How Kroo maintains sanity in distributed systems — Part 1
This article has been co-authored with Daniel Zurawski. Keep an eye on his page for the upcoming Part 2.
Here at Kroo we are on a mission to create the world’s greatest social bank. To tackle this challenge, we have opted to use microservices and we have a lot of them, adding roughly one new service every week. As per the classical definition of a microservice, each service manages its own business domain, data, and storage.
Many organisations understand the benefits that microservices can provide. They force you to think more precisely about the domain you are modelling, especially if you are strict with the principle of only allowing any given service to have one area of responsibility. Another important benefit is that they allow individual teams the freedom to manage their own continuous integration and deployment in isolation from other teams.
Despite this, many organisations struggle to effectively make the transition from a monolith to microservices. Common anti-patterns include:
- Data “leaking” out of the service which is authoritative for it and growing stale in the consumers that have redundantly stored it
- Services not being adequately prepared for new requirements, leading to an ever-increasing complexity
We will have a look at why this is and what we can do to fix it, but to give us some context, let’s first review our understanding of RESTful services.
Understanding the Richardson Maturity Model
The Richardson Maturity Model describes 4 levels of maturity that an API can go through as it becomes RESTful. Let’s look at an example of a blog that stores its articles in a microservice called article-service. At the penultimate level, the creator(s) of the API have embraced resources and HTTP verbs to act on those resources by exposing endpoints such as:
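As a minimal sketch of what resources combined with HTTP verbs might look like for article-service, consider the route table below. The paths and the `describe` helper are illustrative assumptions, not Kroo's actual API.

```python
# Hypothetical route table for article-service: resources plus HTTP verbs.
# Every path here is an assumption made for illustration.
ROUTES = {
    ("GET", "/articles"): "list articles",
    ("POST", "/articles"): "create an article",
    ("GET", "/articles/{id}"): "fetch one article",
    ("PUT", "/articles/{id}"): "replace an article",
    ("DELETE", "/articles/{id}"): "delete an article",
}


def describe(method: str, path_template: str) -> str:
    """Return what a verb does on a resource, or flag an unsupported pair."""
    return ROUTES.get((method, path_template), "not supported")


if __name__ == "__main__":
    for (method, path), action in ROUTES.items():
        print(f"{method:<6} {path:<15} {action}")
```

Note that nothing in this table tells a client how to get from an article to its comments; that gap is what the next maturity level addresses.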
This is all well and good but to really leverage the power of REST, you must take this one step further and provide Hypermedia Controls in your API, which is the definition of the highest level. In practice, this means that the API provides clear ways to traverse and act upon the resources it exposes. For example, given that I have found the articles resource /articles/123, how do I add a new comment to it?
Introducing Hypertext Application Language (HAL)
So what is Hypertext Application Language? Hypertext is the simultaneous presentation of information and the controls to branch out and interact with related information.
One documented way of implementing this abstraction for JSON APIs is captured in the JSON Hypertext Application Language Internet-Draft (draft-kelly-json-hal). I suggest that you skim through the draft but, in a nutshell, it describes how to define your resources. Specifically, it reserves a first-level property called _links on each resource, in which we can find associated resources and/or operations. In the blog example described above, the /articles/123 resource would look like this:
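A sketch of such a resource, built as a Python dictionary and printed as JSON, could look as follows. The host name, the `title` property, and the exact hrefs are assumptions for illustration; only the `_links` property and the three relations (self, comments, discovery) come from the discussion in this article.

```python
import json

# Hypothetical base URL for article-service.
BASE = "https://article-service.example.com"


def article_resource(article_id: str, title: str) -> dict:
    """Build a HAL-style article resource with a first-level _links property."""
    return {
        "title": title,  # assumed property, for illustration only
        "_links": {
            "self": {"href": f"{BASE}/articles/{article_id}"},
            "comments": {"href": f"{BASE}/articles/{article_id}/comments"},
            "discovery": {"href": f"{BASE}/"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(article_resource("123", "Hello HAL"), indent=2))
```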
Let’s take a closer look at these links described in this resource:
- The self-link is probably self-explanatory. It provides a way of coming back to the resource you are currently inspecting at a later time.
- The comments-link is more interesting: it will take you to the comments of the article. The reason this is so powerful is that we have now removed the guesswork of figuring out where to go to find the comments. Furthermore, we now have the freedom to change the underlying href for the comments if we, for example, decide to break out a separate comment-service that will hold the comments instead. When talking about distributed systems, this is referred to as location transparency.
- The final link is the discovery-link. Where does that take us?
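The location transparency described for the comments-link can be sketched from the client's point of view. In the example below, the client only knows the relation name "comments"; both hrefs (including the comment-service URL) are hypothetical.

```python
def link_href(resource: dict, rel: str) -> str:
    """Look up the href of a link relation in a HAL-style resource."""
    try:
        return resource["_links"][rel]["href"]
    except KeyError:
        raise LookupError(f"resource has no '{rel}' link") from None


# The same article before and after comments move to a separate service.
# Both URLs are made up for illustration.
before = {"_links": {"comments": {
    "href": "https://article-service.example.com/articles/123/comments"}}}
after = {"_links": {"comments": {
    "href": "https://comment-service.example.com/comments?article=123"}}}

if __name__ == "__main__":
    for resource in (before, after):
        # Client code is identical in both cases: follow the relation by name.
        print(link_href(resource, "comments"))
```

The client never builds the comments URL itself, so the move to a different service requires no client change.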
The discovery resource
At the core of the API is the discovery resource, which provides all the ways a client might want to interact with the resources in the service. This makes our entire microservice discoverable. The following example illustrates an output from the root URL of our article-service:
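As a rough sketch, a discovery resource might list one link per entry point. The relation names `articles` and `article`, the host, and the use of a templated link are assumptions; the `templated` flag itself is part of the HAL draft, and the `expand` helper below only does naive placeholder substitution rather than full URI Template expansion.

```python
# Hypothetical base URL for article-service.
BASE = "https://article-service.example.com"

# Assumed shape of the discovery resource served at the root URL.
DISCOVERY = {
    "_links": {
        "self": {"href": f"{BASE}/"},
        "articles": {"href": f"{BASE}/articles"},
        "article": {"href": f"{BASE}/articles/{{id}}", "templated": True},
    }
}


def expand(link: dict, **variables: str) -> str:
    """Fill {name} placeholders in a templated link (simple substitution only)."""
    href = link["href"]
    if link.get("templated"):
        for name, value in variables.items():
            href = href.replace("{" + name + "}", value)
    return href


if __name__ == "__main__":
    print(expand(DISCOVERY["_links"]["articles"]))
    print(expand(DISCOVERY["_links"]["article"], id="123"))
```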
Resource locations are independent of the HTTP verbs supported. You can assume that you will be able to perform standard REST operations, leveraging verbs such as GET, POST, DELETE, etc.
It is the responsibility of the backend to notify the consumer when a resource does not support a given verb. This should be implemented in accordance with the HTTP RFC, leveraging existing HTTP error codes such as 405 (Method Not Allowed).
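To illustrate, here is a minimal sketch of a backend rejecting an unsupported verb. The `ALLOWED` table and the `handle` function are hypothetical; the one firm requirement taken from the HTTP specification is that a 405 response carries an Allow header listing the supported methods.

```python
from http import HTTPStatus

# Hypothetical per-resource verb table.
ALLOWED = {
    "/articles": {"GET", "POST"},
    "/articles/123/comments": {"GET", "POST"},
}


def handle(method: str, path: str) -> tuple[int, dict]:
    """Return (status code, response headers) for a request."""
    allowed = ALLOWED.get(path)
    if allowed is None:
        return HTTPStatus.NOT_FOUND.value, {}
    if method not in allowed:
        # Tell the consumer which verbs the resource does support.
        headers = {"Allow": ", ".join(sorted(allowed))}
        return HTTPStatus.METHOD_NOT_ALLOWED.value, headers
    return HTTPStatus.OK.value, {}


if __name__ == "__main__":
    print(handle("DELETE", "/articles"))
```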
Since the discovery-resource is so essential when traversing the API, we put a link to it in all sub-resources so that the user of the API always has a way of returning to the root of the service and can maintain their conversation with the API without getting stuck on a leaf node.
Communicating with URLs
Let’s expand further on the example of actions for a blog article. From the /articles/123 resource described above, we can POST to the comments-link to create a new comment. One common question that people encounter at this stage is “How do I describe some associated resource when I create a comment?” Assume that we want to show the name of the comment’s author underneath the comment.
This means that the comment resource needs to have knowledge of the comment author, which is a separate resource. The typical way of doing this is to pass the ID of the author when creating the comment so that we can do a lookup of the author resource which contains the name:
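A sketch of this ID-based approach is shown below. The field names `text` and `authorId`, the base URL, and the `/users/{id}` path are assumptions; the point is that the consumer has to supply the last two from its own configuration.

```python
# Hypothetical configuration that every consumer would need to carry.
USER_SERVICE_BASE_URL = "https://user-service.example.com"

# Assumed body of the POST request that creates a comment.
comment_request = {
    "text": "Great article!",
    "authorId": "42",
}


def author_url(author_id: str) -> str:
    """Build the author URL from configuration plus knowledge of the path layout."""
    return f"{USER_SERVICE_BASE_URL}/users/{author_id}"


if __name__ == "__main__":
    print(author_url(comment_request["authorId"]))
```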
The problem arises when trying to answer the question “How do I find the author resource?” Let’s assume that the user that is referenced by the authorId resides in a user-service. If the author is only referenced by an ID, every consumer of the API must implicitly know that there is such a thing as a user-service and worse yet, where it is located. Typically this is handled by a sprawling web of configuration and implicit knowledge of the resource paths within those services. How can we do better?
Let’s communicate with full URLs instead! If you reference the author by its fully qualified href in the POST request, it is trivial to look up the resource when needed.
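A minimal sketch of the URL-based request and its lookup follows. The field name `author`, the URL, and the stubbed response are assumptions; `http_get_json` shows how a real GET could be done with the standard library, while the demo uses a stub so that it runs without a network.

```python
import json
from typing import Callable
from urllib.request import urlopen

# Assumed body of the POST request: the author is a fully qualified URL.
comment_request = {
    "text": "Great article!",
    "author": "https://user-service.example.com/users/42",
}


def http_get_json(url: str) -> dict:
    """Perform a GET and decode the JSON body (not called in the demo below)."""
    with urlopen(url) as response:
        return json.load(response)


def resolve_author(request: dict,
                   fetch: Callable[[str], dict] = http_get_json) -> dict:
    """Look up the author by following its URL; no service config required."""
    return fetch(request["author"])


def fake_fetch(url: str) -> dict:
    """Stand-in for the network, returning a made-up author resource."""
    return {"name": "Example User", "_links": {"self": {"href": url}}}


if __name__ == "__main__":
    print(resolve_author(comment_request, fetch=fake_fetch)["name"])
```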
All you have to do is perform a GET to the URL and there you go, you have all the information you need about the author and you’re guaranteed that the information is always up to date because you’re fetching it directly from the source of truth. This solution isn’t completely devoid of challenges either but it is still a significant improvement. In Part 2 we will explore some of the challenges that this model introduces, as well as potential solutions for them.
What if we have different types of authors?
Imagine now that we have two different types of authors for comments in our system. There are regular users defined in the user-service and there are admins that are defined in the admin-service, both of whom are free to comment on blog articles. In the naive model where the ID is passed around as a way of identifying the author, a common approach is for the API to add another field, a type. In this model the POST request might look like this:
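As a sketch, with `authorType` as an assumed field name and made-up service URLs, the request and the lookup every consumer would need might look like this:

```python
# Hypothetical mapping that every consumer must maintain and keep in sync.
SERVICE_BY_AUTHOR_TYPE = {
    "user": "https://user-service.example.com/users",
    "admin": "https://admin-service.example.com/admins",
}

# Assumed body of the POST request in the ID-plus-type model.
comment_request = {
    "text": "Great article!",
    "authorId": "7",
    "authorType": "admin",
}


def author_url(request: dict) -> str:
    """Resolve the author URL by branching on the author type."""
    base = SERVICE_BY_AUTHOR_TYPE.get(request["authorType"])
    if base is None:
        # A new author type breaks every consumer until its table is updated.
        raise ValueError(f"unknown author type: {request['authorType']}")
    return f"{base}/{request['authorId']}"


if __name__ == "__main__":
    print(author_url(comment_request))
```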
The problem here is that we have now massively increased the complexity. All systems that will need to do a lookup of this author now need to know all of the possible author types and where they are located. Surely there’s a better way?
Fortunately, there is! The model where we communicate with the URLs solves this problem for you immediately. Imagine that the POST request instead looks like this:
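A sketch of the URL-based requests is given below, one for each author type. The field name `author`, the URLs, and the stubbed author resources are assumptions; the lookup is passed in as `fetch` so the example runs without a network.

```python
from typing import Callable

# Made-up author resources standing in for user-service and admin-service.
FAKE_RESOURCES = {
    "https://user-service.example.com/users/42": {"name": "Example User"},
    "https://admin-service.example.com/admins/7": {"name": "Example Admin"},
}

# Assumed POST bodies: same shape regardless of the kind of author.
comment_requests = [
    {"text": "Nice post", "author": "https://user-service.example.com/users/42"},
    {"text": "Pinned", "author": "https://admin-service.example.com/admins/7"},
]


def author_name(request: dict, fetch: Callable[[str], dict]) -> str:
    """One code path for every author type: follow the URL, read the name."""
    return fetch(request["author"])["name"]


names = [author_name(r, FAKE_RESOURCES.__getitem__) for r in comment_requests]

if __name__ == "__main__":
    print(names)
```

There is no type field and no per-type branching; a third kind of author would only need to expose a resource with a name.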
Now the consumers no longer need to know that there are different types of authors and we’ve effectively achieved polymorphism across microservices. It is easy to validate that the author resource that is being passed fulfils some given interface, for example, that it has a “name” property. Simply follow the link and check, and if it does then it can be accepted with no additional handling required to support new types. For completeness, let’s see what this comment resource would look like and specifically how it would display its associated resources:
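The sketch below shows one way the validation and the resulting comment resource could fit together. The relation names `article` and `author`, the comment URL layout, and the hosts are assumptions; `fetch` is injected so the check can be demonstrated with stubbed data.

```python
from typing import Callable

# Hypothetical base URL for article-service.
BASE = "https://article-service.example.com"


def has_name(author_href: str, fetch: Callable[[str], dict]) -> bool:
    """Follow the author link and check that it exposes a non-empty name."""
    name = fetch(author_href).get("name")
    return isinstance(name, str) and name != ""


def comment_resource(comment_id: str, text: str,
                     article_href: str, author_href: str) -> dict:
    """Build a HAL-style comment whose associated resources are links."""
    return {
        "text": text,
        "_links": {
            "self": {"href": f"{article_href}/comments/{comment_id}"},
            "article": {"href": article_href},
            "author": {"href": author_href},
            "discovery": {"href": f"{BASE}/"},
        },
    }


# Stubbed author resources: one valid, one missing the name property.
STUB = {
    "https://admin-service.example.com/admins/7": {"name": "Example Admin"},
    "https://other.example.com/things/1": {"label": "no name here"},
}

if __name__ == "__main__":
    href = "https://admin-service.example.com/admins/7"
    if has_name(href, STUB.__getitem__):
        print(comment_resource("1", "Pinned", f"{BASE}/articles/123", href))
```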
Conclusion
Hopefully, this article has helped illustrate why Hypermedia Controls are so useful when designing distributed systems using RESTful microservices. In Part 2 we will dig into concrete examples of how we have applied these theories at Kroo and how they have helped maintain sanity as our platform grows.