Keycloak for securing production systems with multiple micro-services
In this article, I won’t dive deep into installing, configuring and integrating Keycloak with a specific programming language. Instead, I will focus on the whole idea of having an Identity and Access Management platform as a formidable security feature by your side.
Firstly, let me start by mentioning that you should not need Keycloak for a small, single-service application with minimal scaling. There you can go with language-specific token management libraries (e.g. a JWT implementation), backed by a simple user database. Keycloak, or any other full-fledged solution (like Auth0), makes sense when your solution comprises multiple services/applications working independently, as in a microservice environment.
(Skip if you already understand SSO) One of the important questions to answer before writing an authentication and authorization layer is: do you need a Single Sign On solution? To understand Single Sign On (SSO), let us take the example of google.com. When you sign in to your Gmail account (assuming you have one), you will first notice that the URL changes from gmail.com to accounts.google.com, where you are asked to enter your credentials. After you log in successfully, you are redirected to gmail.com as intended, where you maybe send some mails. Suddenly, a colleague asks you to send them a document which is fairly large (say 1 GiB). Being a smart individual, you decide not to attach it to an email, but rather to upload it to some cloud storage and share it from there. You decide to do this via Google Drive.
Now, when you enter drive.google.com in your browser’s address bar, you would expect drive.google.com to verify your identity by asking for your credentials, but to your surprise you find yourself logged in without having to authenticate again. For a moment you wonder whether Google, or maybe your browser, is keeping track of your passwords and auto-entering them on similar websites. Then you come to your senses and realize that Google wouldn’t do anything that stupid, duh! And you go on enjoying this magical little feature as a convenience.
What really happened was that when you first signed in to accounts.google.com, certain tokens were set in your browser cookies. Because these cookies are scoped to the common domain google.com, the browser sends them along to every subdomain of google.com. So when you visit any other site which is a subdomain of google.com, you don’t need to verify your identity again. This is achieved with the help of SSO! Single Sign On is a paradigm of the authentication world where your identity is verified only once by a central entity, and every other application trusts that verification.
Now that we have all heard a good story with familiar experiences, let’s take a graphical view of actual systems before getting into more details.
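To make the cookie part concrete, here is a minimal sketch of the domain-matching rule a browser applies before sending a cookie (simplified from RFC 6265). The function name is my own, and real SSO setups usually combine this with redirects, so treat it as an illustration rather than a description of what Google actually does.

```python
def cookie_domain_matches(request_host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 domain match.

    A cookie set with Domain=google.com is sent to google.com itself and to
    any of its subdomains, but not to unrelated hosts.
    """
    host = request_host.lower().rstrip(".")
    domain = cookie_domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


# Which hosts would receive a cookie scoped to google.com?
for host in ("accounts.google.com", "drive.google.com", "example.com"):
    print(host, cookie_domain_matches(host, "google.com"))
```

Note the leading dot in the suffix check: without it, a host such as notgoogle.com would wrongly match.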
We can see in the above diagram that the user, from their browser, is trying to access 3 different applications which belong to the same software backend. The user initially signs in to the software by getting redirected to a separate authentication application (which may even be on a different domain), which returns an authorization code to the client sitting on application 1/2/3 (where the user actually wants access). These clients exchange the code for access tokens from the auth server on behalf of the user, which in turn confirms the user’s identity (otherwise the auth server would not have issued a code in the first place). Every subsequent call made to the applications carries this access token, passed by the browser (via cookies, headers, or whichever medium is suitable) and verified by the client sitting on the backend application (connected to the auth server), hence validating the identity of the requesting party (the browser in this case).
Let’s dive deeper, taking the example of a story-sharing website backed by a microservice architecture. Let’s say our website has 3 individual service components (apart from the frontend), namely:
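The two halves of this exchange can be sketched as plain request builders. The host, realm, client id and redirect URI below are made-up values, and the endpoint paths follow Keycloak’s usual `/realms/<realm>/protocol/openid-connect/...` layout; check your own server’s configuration before relying on them.

```python
import secrets
from urllib.parse import urlencode

# Assumed values, for illustration only.
KEYCLOAK_BASE = "https://example.com/auth"
REALM = "stories"
CLIENT_ID = "story-service"
REDIRECT_URI = "https://example.com/story/callback"


def oidc_endpoint(name: str) -> str:
    """Build an OpenID Connect endpoint URL for the assumed realm."""
    return f"{KEYCLOAK_BASE}/realms/{REALM}/protocol/openid-connect/{name}"


def build_login_redirect(state: str) -> str:
    """Step 1: URL the browser is redirected to in order to sign in."""
    query = urlencode({
        "response_type": "code",  # ask for an authorization code
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": "openid",
        "state": state,  # echoed back, lets the client detect forged callbacks
    })
    return f"{oidc_endpoint('auth')}?{query}"


def build_code_exchange(code: str, client_secret: str) -> dict:
    """Step 2: form fields the client POSTs to the token endpoint."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
        "client_secret": client_secret,
    }


print(build_login_redirect(secrets.token_urlsafe(16)))
print(oidc_endpoint("token"))
```

Nothing here touches the network; the sketch only shows which parameters travel in each step.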
- Search Engine
- Create/Read/Delete Stories
- Image Server
Now, look at the following diagram to understand what it would be like to host this website with the backend behind an ingress (a path-based load balancer).
- NOTE: Read up on load balancers if they are new to you; an in-depth understanding is not required for what follows.
In the above diagram, we can see that the browser logs in at the /auth endpoint (which hosts Keycloak), gets the access token, and passes it with every subsequent request to the other endpoints: /story, /search and /image. These endpoints in turn verify the token via Keycloak client adapters and, if the token is valid, allow the request to proceed. This is one example of how SSO helps ease our lives.
Before we proceed, I would like to clarify a few points. The flow described here is the browser flow, where the client sitting on the server redirects the user to the identity provider, gets back an authorization code, exchanges it for an access token and returns that token to the browser, which uses it in all subsequent calls. The protocol at work here is OAuth 2.0, and the flow described above is its Authorization Code flow. There are other flows too, out of which I will be discussing one called the Client Credentials flow. I should also point out that authorization can be included in the verification process, to check user permissions alongside identification. I will discuss more on authorization later.
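As a rough sketch of what each protected endpoint does with an incoming request: pull the bearer token out of the headers, then check the claims. Signature verification is deliberately left out here (that is the job of the adapter or a JWT library), and the header values, issuer URL and function names are assumptions for illustration.

```python
import time
from typing import Mapping, Optional


def extract_bearer_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' header, if any."""
    for name, value in headers.items():
        if name.lower() == "authorization":  # header names are case-insensitive
            scheme, _, token = value.strip().partition(" ")
            if scheme.lower() == "bearer" and token.strip():
                return token.strip()
    return None


def claims_look_valid(claims: Mapping[str, object], expected_issuer: str,
                      now: Optional[float] = None) -> bool:
    """Check issuer and expiry on claims whose signature was ALREADY verified."""
    current = time.time() if now is None else now
    expires_at = claims.get("exp")
    if not isinstance(expires_at, (int, float)):
        return False
    return claims.get("iss") == expected_issuer and expires_at > current


# Hypothetical request headers and decoded claims, for illustration only.
headers = {"Authorization": "Bearer abc.def.ghi"}
claims = {"iss": "https://example.com/auth/realms/stories", "exp": time.time() + 60}

print(extract_bearer_token(headers))
print(claims_look_valid(claims, "https://example.com/auth/realms/stories"))
```

A request with a missing token or failed checks should be rejected before any business logic runs.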
- NOTE: Please read more on OAuth to understand the token flows in detail.
We are now at an interesting junction of our article, where we have set up trustworthy authentication/authorization on the user side. But what about our server side? Even our servers may depend on each other for processing! In our example, let’s introduce another microservice which depends on the existing ones. Let’s build a plagiarism detection feature into our application. For the plagiarism checker to work, we need the combined abilities of the story (for reads) and search microservices, leaving only the text processing part to the plagiarism detection microservice. Refer to the following diagram:
You can see in the above diagram that the Plagiarism Detection server passes tokens to both the Story and Search servers; without them, both servers would return an unauthenticated error. But where will this token come from? Think about it: this is a server running in the backend without any user interaction. So, going by the browser flow above, there is no user to sign in to the Keycloak server and initiate the first token exchange!
This calls for another type of OAuth flow: the Client Credentials flow. You can see the arrows above the Plagiarism Detection server exchanging a token via the client. This is made possible by Keycloak’s confidential clients, which can exchange their client credentials directly for an access token issued to a service account dedicated to that client. So now we have a client adapter sitting at a server, fetching access tokens from Keycloak based on its client credentials (which are linked to the service account of the client). The permissions on these access tokens are the same as those of the client’s service account. And these tokens are evaluated by the other servers (in this case the Story and Search servers) just like any other access token with certain permissions.
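In OAuth terms, the server-to-server token request described above is a client credentials grant. Here is a minimal sketch of that request, built but not sent; the token URL, client id and secret are placeholders, and sending the credentials as form fields is just one of the client authentication methods a server may accept.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_client_credentials_request(token_url: str, client_id: str,
                                     client_secret: str) -> Request:
    """Build the POST a backend service sends to obtain its own access token."""
    body = urlencode({
        "grant_type": "client_credentials",  # no user, no browser, no code
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return Request(
        token_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder values, for illustration only.
request = build_client_credentials_request(
    "https://example.com/auth/realms/stories/protocol/openid-connect/token",
    "plagiarism-service",
    "change-me",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call; the JSON response carries the `access_token` and its `expires_in` lifetime.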
A point to note here is that by setting up this entire thing, we have essentially established a No-Trust-Policy (more commonly known as zero trust), which is crucial for a production system: it prevents calls from unintended services, even if they are part of the same system. Read about token audiences to reject tokens that were not intended for a given client.
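An audience check is small enough to sketch here. In a JWT, the `aud` claim may be a single string or a list of strings, so both shapes are handled; the service names are hypothetical, and the claims are assumed to come from a token whose signature has already been verified.

```python
from typing import Mapping


def token_is_for(claims: Mapping[str, object], expected_audience: str) -> bool:
    """Return True only if the token names this service in its 'aud' claim."""
    aud = claims.get("aud")
    if aud is None:
        return False  # no audience at all: refuse rather than guess
    audiences = [aud] if isinstance(aud, str) else list(aud)
    return expected_audience in audiences


# Hypothetical claims from a token minted for the story and search services.
claims = {"aud": ["story-service", "search-service"]}
print(token_is_for(claims, "story-service"))
print(token_is_for(claims, "image-service"))
```

With this in place, a token issued for the Story server cannot be replayed against the Image server, even though both trust the same Keycloak.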
Keycloak has client adapters for various languages [https://www.keycloak.org/docs/latest/securing_apps/#openid-connect-3]. For languages without an official adapter, or for extra functionality on top of the existing adapters, explore the OpenID Connect APIs first and then check the Keycloak APIs, which follow the same norms. Build the additional functionality on these APIs, which are quite simple to use once you know what you are doing. Keeping track of active tokens and sessions (refresh tokens) is really important. The adapters manage the cache to an extent, but I would really recommend a custom solution that implements a token cache on the client side (note that client here is w.r.t. the Keycloak server, i.e. the client can be the story/search server, or the UI).
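If you do go the API route, a reasonable starting point is the OpenID Connect discovery document, which lists the endpoints a client needs. The sketch below assumes Keycloak’s usual `/realms/<realm>/.well-known/openid-configuration` path; the base URL, realm and sample document are made up, and the fetch helper is defined but not called.

```python
import json
from urllib.request import urlopen


def discovery_url(base_url: str, realm: str) -> str:
    """URL of the realm's OpenID Connect discovery document (assumed layout)."""
    return f"{base_url.rstrip('/')}/realms/{realm}/.well-known/openid-configuration"


def pick_endpoints(document: dict) -> dict:
    """Keep only the metadata fields a hand-written client typically needs."""
    wanted = ("issuer", "authorization_endpoint", "token_endpoint", "jwks_uri")
    return {key: document[key] for key in wanted if key in document}


def fetch_endpoints(base_url: str, realm: str) -> dict:
    """Download and trim the discovery document (performs a network call)."""
    with urlopen(discovery_url(base_url, realm)) as response:
        return pick_endpoints(json.load(response))


# Made-up excerpt of a discovery document, for illustration only.
sample = {
    "issuer": "https://example.com/auth/realms/stories",
    "token_endpoint": "https://example.com/auth/realms/stories/protocol/openid-connect/token",
    "grant_types_supported": ["authorization_code", "client_credentials"],
}
print(discovery_url("https://example.com/auth/", "stories"))
print(pick_endpoints(sample))
```

Reading endpoints from the discovery document instead of hard-coding them keeps a custom client working if the server’s URL layout changes.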
One caution to exercise is token management in the server-to-server (client credentials) flow. Because a server could fetch a brand-new token for every call it makes to another Keycloak-protected server, it is important to keep track of active tokens and refresh tokens, and not to fetch tokens unnecessarily.
Hope this article gives you a fair idea of how to use Keycloak with microservices. I will write some follow-up articles on setting up Keycloak, its configuration and an in-depth exploration of its capabilities :)
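A minimal sketch of such a cache is shown below, assuming the token fetch is wrapped in a function that returns the token together with its lifetime in seconds. The class name, the 30-second leeway and the fake fetch function are all my own choices, and the sketch is not thread-safe.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Reuse an access token until shortly before it expires."""

    def __init__(self, fetch: Callable[[], Tuple[str, float]],
                 clock: Callable[[], float] = time.monotonic,
                 leeway: float = 30.0) -> None:
        self._fetch = fetch      # returns (access_token, expires_in_seconds)
        self._clock = clock      # injectable so the cache can be tested
        self._leeway = leeway    # refresh this many seconds before expiry
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._leeway:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = now + expires_in
        return self._token


# Fake fetch function and clock, standing in for a real call to Keycloak.
calls = []
fake_now = [0.0]


def fake_fetch() -> Tuple[str, float]:
    calls.append(1)
    return f"token-{len(calls)}", 300.0


cache = TokenCache(fake_fetch, clock=lambda: fake_now[0])
first = cache.get()    # nothing cached yet, so this fetches
second = cache.get()   # served from the cache
fake_now[0] = 280.0    # inside the 30-second leeway before expiry
third = cache.get()    # fetches a fresh token
print(first, second, third, len(calls))
```

The leeway avoids handing out a token that expires while the downstream request is still in flight.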