Efficient single sign-off and distributed session control with Redis sets
When the Bionexo team made the decision to start the dreaded Full Platform Rewrite® about five years ago, it soon became clear, according to the spirit of the times, that the path to be taken was to employ microservices principles to build the new architecture.
Among many other powerful ideas, microservices suggest "verticals" to approach the separation of concerns, with each service taking full responsibility for a bounded context, resulting in loose coupling between services and high cohesion within each service. With that put into practice, by the time I got here (at the end of 2015), we already had a few full-blown services, each one working as an application with its own database, user interface etc.
Although, to the end user, our services look like one (as they have the same UI/UX design), they actually have individual domain names like login.bio, platform.bio, and dashboard.bio, which are of course registered independently. That means each application not only has its own session-handling mechanism (cookie-only, Redis-backed etc.), but is actually forced into having one by the fact that cookies cannot be shared between domains. That restriction is obviously good, as it forces decoupling, but it also comes with the challenge of providing a single authentication mechanism for all of these services.
Well… so far, nothing particularly challenging, as OpenID Connect exists precisely to solve that problem. Actually, we already had BioID, our authentication service, providing authentication on top of OAuth from the beginning. Single sign-on, then, was pretty easy to implement.
Now, with single sign-on, we had a situation where, each time a user clicked on a link leading to an application X for the first time, that application would go through the OAuth (actually, OpenID Connect) flow to make sure the user was already logged in to BioID and to create its own session. The user could have active sessions with some of our services, but not with others. Besides, each service remained responsible for its own sessions.
How does single sign-off work in that scenario? Suppose the user logs out of BioID. They can actually do that from within Service A, for example, so we could make Service A destroy its own session too. But how do we make Service D do the same, considering that BioID and Service A don't even know Service D exists, and that Service D's session is actually stored in a cookie? We could notify Service B, which would reach into its own session store, but Service D can't reach into the user's (possibly offline) device!
Sure, we could implement OIDC Session Management, as you may be thinking. However, BioID is a Rails application and we didn't want dozens of services hitting it at every request to validate sessions (not because BioID can't scale, but because it would be expensive to do so). We also didn't want to rewrite BioID in another programming language, as that wouldn't be practical.
Taking these constraints into account, our first idea was to use a centralized, shared Redis database, with a globally unique session ID for each user's session in the company. BioID could easily generate that session ID as soon as the user signed in. But then came other requirements, the main one being that no user can have more than N concurrent active sessions. If a user signs in from somewhere else while having N active sessions, one of the old ones must be dropped. Of course we could still do everything with Redis alone, but now we had a fair amount of business logic to be implemented somewhere (preferably not duplicated in all of our services!).
That's when we decided to build Heimdall: an internal, special-purpose service that is our "global" session control solution (not a session manager, please). Being a lightweight service implemented in Java, Heimdall can easily be hit all the time by all of our services, using tiny infrastructure. So, what does Heimdall do?
- Stores a session ID for each user session within Bionexo's platform as a whole.
- Can store arbitrary JSON data in the "global" sessions, if necessary. This is a good option to have, though we certainly don't intend to abuse it (so far, it was only used in a temporary experiment that collected and compared browser fingerprints from different services).
- Makes sure each user has at most N simultaneous sessions within the Bionexo platform as a whole.
- Allows dropping of a set of sessions when necessary (all sessions for a specific client company, all sessions in the platform etc.). We actually use that functionality in a number of cases.
- Can expire and extend sessions, according to usage, providing an API that can be queried for checking session validity.
- Stores temporary session tokens, used to retrieve sessions (more on that below).
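To make the expire-and-extend behavior concrete, here is a minimal in-memory Python sketch of the idea (Heimdall itself is a Java service backed by Redis, where this would be `EXPIRE`/`TTL`; the function names and the TTL value below are assumptions, not Heimdall's actual API):

```python
import time

SESSION_TTL = 30 * 60  # seconds; the real timeout is an assumption
expires_at: dict[str, float] = {}  # session ID -> expiry deadline


def touch(session_id: str) -> None:
    """Extend the session on use (in Redis: EXPIRE <key> <ttl>)."""
    expires_at[session_id] = time.monotonic() + SESSION_TTL


def is_valid(session_id: str) -> bool:
    """What a 'check session validity' query would answer."""
    deadline = expires_at.get(session_id)
    return deadline is not None and time.monotonic() < deadline
```

Each service hitting Heimdall on every request would effectively call something like `touch` plus `is_valid`, keeping the global session alive while the user is active anywhere on the platform.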
Since every session is represented by a UUID, the sessions for each user are stored in a Redis set, with the user ID as the key to access it. This makes sure we store only unique session IDs (which wouldn't be guaranteed with a list, and a hash would be overkill).
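The per-user set plus the N-session cap can be sketched as follows. This is an illustrative in-memory Python version of the logic (in Redis it would be `SADD`/`SCARD`/`SPOP` on a key derived from the user ID; `MAX_SESSIONS` and all function names are assumptions):

```python
import uuid

MAX_SESSIONS = 3  # "N" in the text; the real value is not stated
sessions_by_user: dict[str, set[str]] = {}  # stand-in for the Redis sets


def register_session(user_id: str) -> str:
    """Create a globally unique session ID, enforcing the N-session cap."""
    sessions = sessions_by_user.setdefault(user_id, set())
    if len(sessions) >= MAX_SESSIONS:
        # Sets have no insertion order, so like Redis SPOP this evicts
        # an arbitrary existing session to make room for the new one.
        sessions.pop()
    session_id = str(uuid.uuid4())
    sessions.add(session_id)  # SADD: duplicates are impossible by design
    return session_id


def is_active(user_id: str, session_id: str) -> bool:
    """Membership check (in Redis: SISMEMBER)."""
    return session_id in sessions_by_user.get(user_id, set())
```

A fourth sign-in for the same user drops one existing session, so the set's cardinality never exceeds the cap.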
Now, how do we "share" the session IDs between services in the first place? Since we use OAuth, we could just transfer the session ID in the redirect URI, along with the authorization code. But we didn’t want the session ID appearing in query strings, as that would make it easier to steal sessions.
To avoid transferring the session ID to the front end, we employed a short-lived token (stored in Heimdall for 60 seconds), used as a key in Redis to retrieve the actual session ID. That token can be transferred in query strings and be used by our services to grab the session ID from Heimdall.
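The token exchange described above can be sketched like this. Again, this is an in-memory Python illustration (in Redis, roughly `SETEX` to store the token and a get-plus-delete, e.g. `GETDEL`, to redeem it; the names and the single-use behavior shown here are assumptions beyond what the text states):

```python
import secrets
import time
from typing import Optional

TOKEN_TTL = 60  # seconds, per the text
tokens: dict[str, tuple[str, float]] = {}  # token -> (session ID, deadline)


def issue_token(session_id: str) -> str:
    """Mint a short-lived token safe to pass in a query string."""
    token = secrets.token_urlsafe(32)
    tokens[token] = (session_id, time.monotonic() + TOKEN_TTL)
    return token


def redeem_token(token: str) -> Optional[str]:
    """Exchange the token for the real session ID, at most once."""
    entry = tokens.pop(token, None)  # removed on first use
    if entry is None:
        return None
    session_id, deadline = entry
    return session_id if time.monotonic() < deadline else None
```

The session ID itself never reaches the browser: the service receiving the redirect redeems the token server-to-server against Heimdall, and a stolen token is worthless after 60 seconds (or after its first use, in this sketch).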
The authentication and session ID sharing flow is represented below. To simplify implementation by the many different services, we developed client libraries for several programming languages, with retries, timeouts, and several fallback strategies for when Heimdall becomes unavailable (which I don't believe has ever happened).
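The retry-and-fallback shape of those client libraries might look something like this minimal Python sketch (everything here is hypothetical; the actual libraries, their languages, and their fallback policies are not described in detail in the text):

```python
import time


def call_with_retries(fn, attempts=3, delay=0.1, fallback=None):
    """Try a Heimdall call a few times; on persistent failure, return a
    fallback (e.g. a policy of trusting the local session for a grace
    period) instead of failing the user's request."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt < attempts - 1:
                time.sleep(delay)  # simple fixed backoff between attempts
    return fallback
```

Failing open like this is a deliberate trade-off: a brief Heimdall outage degrades session-control guarantees slightly rather than taking the whole platform down with it.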
If you have any questions or want to share other solutions to the problems addressed here, feel free to express yourself in the comments below.