Scaling with Hodor, our new authentication system.

Oren Levitzky
Published in Fiverr Tech
Nov 7, 2021

For a few years I have had this article in mind: I wanted to share the amazing story of how we reinvented our user authentication system. But, just like when you binge your favorite TV series, time passed pretty quickly; Game of Thrones seasons came and went and the momentum was lost. Having recently finished our last milestone, a new dedicated authentication microservice, gave me another good reason to write this down.

User authentication is probably the first thing you will want to tackle when starting to build a new application. However, it is also the last thing you will want to invest your time in, which is why we all tend to use third-party libraries that do everything for us with minimal effort.

In this article, I will share the thoughts and progress behind developing Hodor, Fiverr’s authentication system: why it was necessary, what our biggest challenges were, and why we called it Hodor.

Prologue

Before I dive into our journey, I want to first go over the definition of user authentication.

User authentication is a security process that covers all of the human-to-computer interactions that require the user to register and log in. In simple words, the authentication process asks each user “who are you?” and verifies their response.

There are a few authentication factors for verifying a user’s identity:

  1. Something the user knows — Password, PIN, captcha.
  2. Something the user has — ID card, mobile phone, physical tokens.
  3. Something the user is or does — Fingerprints, face/voice recognition, DNA.

Authenticating a user usually relies on a single factor, such as a password in a login form, but many modern applications now require an additional factor to make it safer (a.k.a. 2FA). For example, receiving a code on your mobile phone.
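To make the “code on your mobile phone” factor concrete, here is a minimal TypeScript sketch of a one-time code flow. It is not Fiverr’s implementation. The in-memory `pendingCodes` map, the 6-digit format and the 5-minute lifetime are assumptions for illustration, and actually sending the SMS is out of scope.

```typescript
import { createHash, randomInt, timingSafeEqual } from "node:crypto";

// Hypothetical in-memory store: userId -> hashed pending code and its expiry.
interface PendingCode {
  codeHash: Buffer;
  expiresAt: number; // epoch milliseconds
}

const pendingCodes = new Map<string, PendingCode>();

const sha256 = (value: string): Buffer => createHash("sha256").update(value).digest();

// Issue a 6-digit code (e.g. to be sent by SMS); only its hash is kept server-side.
function issueCode(userId: string, ttlMs = 5 * 60_000, now = Date.now()): string {
  const code = randomInt(0, 1_000_000).toString().padStart(6, "0");
  pendingCodes.set(userId, { codeHash: sha256(code), expiresAt: now + ttlMs });
  return code;
}

// Verify the code the user typed in. Each issued code allows a single attempt.
function verifyCode(userId: string, submitted: string, now = Date.now()): boolean {
  const pending = pendingCodes.get(userId);
  if (!pending) return false;
  pendingCodes.delete(userId); // single use, even when the attempt fails
  if (now > pending.expiresAt) return false;
  // Both digests are 32 bytes, so timingSafeEqual can compare them in constant time.
  return timingSafeEqual(pending.codeHash, sha256(submitted));
}
```

Deleting the pending code on every attempt, successful or not, is one simple way to limit guessing. Real systems usually also rate-limit how often new codes can be requested.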

Obviously, there is much more to know about user authentication and how to implement it, but this should be enough for understanding our journey.

Chapter 1 — Hodor to the rescue

A long, long time ago (even before Game of Thrones began), when Fiverr was still a small startup with big dreams and a little less traffic than we have today, our focus was not on how to authenticate users but on how to create the biggest freelancer marketplace there is. This is the beginning of the timeline that eventually forced us to migrate to Hodor.

Until a year ago (2020), Fiverr’s main programming language was Ruby, with a very big Rails monolith. Web application frameworks such as Rails have plenty of pros and cons, but their main advantage, as I see it, is making our lives easier. Together with an open source community (RubyGems in our case), all we have to do is pick the right library and install it.

Now, we all need some sort of user authentication, right? This can be really seamless: all you need to do is pick your favorite authentication library and adjust its configuration.

The choice of authentication library had been made a while before I joined Fiverr: a well-known gem named Authlogic. For those who haven’t had the chance to work with this library, I will say this: it’s magical. However, magic is not always a good thing. Why is this library good, you ask? It handles password creation, hashing, resets and brute-force protection. It maintains user sessions and expires them when needed. And why is this magic bad? Well, basically because it is magic: we, the developers who use it, don’t really understand what it does and how it can affect us in the future.

When it comes to user authentication, the main aspect you don’t want to ignore is security. It is debatable whether, once you choose an open source library, you need to be familiar with every implementation detail, but with user authentication you simply can’t afford to ignore them.

Authlogic relied on simple token management. That is, of course, a familiar way of identifying and authenticating a user, but it does not guarantee that everything is safe. Every user received a unique, dedicated token that represented their identity in every client-server transaction.

Over the years we wanted to take extra steps to keep our users as safe as possible, with the following as our guidelines:

  • A token should be time based.
  • A token should be unique and hard to guess.
  • A token stored on the client side should be hashed.
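One common way to meet guidelines like these is to issue a random, high-entropy token with an explicit expiry and to persist only a SHA-256 digest of it. The sketch below is in TypeScript, not the Ruby/Authlogic code Fiverr actually patched. Here the hashing is applied to the server’s copy, which is an assumption. The `sessionsByHash` map and the function names are hypothetical stand-ins for real session storage.

```typescript
import { createHash, randomBytes } from "node:crypto";

interface StoredSession {
  userId: string;
  tokenHash: string; // hex SHA-256 of the token; the raw token is never stored
  expiresAt: number; // epoch ms, which makes the token time based
}

// Hypothetical in-memory table standing in for real session storage.
const sessionsByHash = new Map<string, StoredSession>();

const hashToken = (token: string): string =>
  createHash("sha256").update(token).digest("hex");

function createSession(userId: string, ttlMs: number, now = Date.now()): string {
  // 32 random bytes = 256 bits of entropy: unique and infeasible to guess.
  const token = randomBytes(32).toString("base64url");
  const tokenHash = hashToken(token);
  sessionsByHash.set(tokenHash, { userId, tokenHash, expiresAt: now + ttlMs });
  return token; // handed to the client, e.g. in a cookie
}

function findUser(token: string, now = Date.now()): string | null {
  const session = sessionsByHash.get(hashToken(token));
  if (!session || now >= session.expiresAt) return null;
  return session.userId;
}
```

Because the token is random rather than user-chosen, a fast hash like SHA-256 is sufficient here, unlike passwords, which need a deliberately slow hash. A leaked database row then cannot simply be replayed as a session cookie.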

These goals forced us to keep patching the library, as some of these features were only partially supported, but that still wasn’t a good enough reason for us to develop a new authentication system, which would take a lot of time.

After a few years of amazing growth, and a few months after I joined Fiverr, we faced our first major production issues: DB performance and scale. In our system, Authlogic stored several session-management attributes on the relevant SQL table. This resulted in a few additional SQL commands, plus indexes that had to be updated on every record update, locking each record for a longer period of time. Even though we were using Authlogic with only one particular SQL table, we saw a direct impact on other tables in the same database. As we continued to grow, with more and more users on our platform simultaneously, we started to face DB connection timeouts, making scalability a big problem.

Once again we altered the relevant code to keep our DB free of issues, but that was not easy: we were working too hard to fit a ‘magical’ solution to our scale. At that point it was clear that Fiverr needed its own authentication system. We had had enough of magic.

Chapter 2 — Hold the door.

I find naming a really fun, challenging and creative part of any development cycle, especially for a widely used, long-term system. Luckily, Game of Thrones’ 6th season was about to end with Hodor saving Bran by holding that big door. For a week all I heard was “Hold the door… Hold the door… Hodor”. And then it came to me. Holding a door is just like authenticating: you control when it’s valid to open it. So there we had it: Hodor would be our authentication system and would hold the door when validating users.

Before starting the development cycle, we had to figure out in which flows, and at what scale, Authlogic was being used behind the scenes. We spent the first few days just monitoring, to understand the complexity of our monolith. Just like Bran’s impersonations, this wasn’t trivial.

Our plan was roughly:

  1. Develop Hodor in our monolith application — zero to one in the fastest way possible. This was only for web traffic.
  2. Move to a shared library — In order to apply Hodor logic on all of our gateways — web and mobile.
  3. Enable Hodor on mobile app traffic — Same logic, different configuration.
  4. Move to a dedicated microservice — To encapsulate Hodor logic in one place and to decouple it from our monolith application.

At this point, 5 years ago, Fiverr was adopting its microservices architecture (V3), and theoretically we could have jumped straight to step #4. However, the combination of Hodor’s scale (authenticating on every request, remember?) and Fiverr’s tech stack (Ruby, HTTP) would have added significant latency, slowing down our site.

The first part of user authentication, and in my opinion the most important one, is deciding how to implement the user identification phase and which security measures will make it as hard as possible to break. We defined our validation process around the following key aspects:

  • Token uniqueness: generate tokens with enough entropy that they are infeasible to guess.
  • Token expiration: automatically invalidate inactive sessions.
  • Token integrity: prevent the token from being modified.
  • Encrypted user information: identify the user without exposing sensitive information client-side.
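Encrypted user information is the one aspect that a signature alone doesn’t provide. Below is a minimal sketch of sealing a user identifier so it can travel client-side without being readable. It assumes AES-256-GCM via Node’s built-in `crypto` module. The article doesn’t say which cipher Hodor uses, and the randomly generated `key` is a placeholder for a properly managed secret.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Placeholder 256-bit key; a real service would load it from a secret store.
const key = randomBytes(32);

// Encrypt a user identifier into an opaque, URL-safe string.
function sealUserId(userId: string): string {
  const iv = randomBytes(12); // 96-bit nonce, the recommended size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(userId, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16-byte tag that detects tampering
  return Buffer.concat([iv, tag, ciphertext]).toString("base64url");
}

// Decrypt; returns null if the value was modified or sealed with another key.
function openUserId(sealed: string): string | null {
  const raw = Buffer.from(sealed, "base64url");
  if (raw.length < 28) return null; // 12-byte IV + 16-byte tag minimum
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ciphertext = raw.subarray(28);
  try {
    const decipher = createDecipheriv("aes-256-gcm", key, iv);
    decipher.setAuthTag(tag);
    return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
  } catch {
    return null; // authentication failed in final()
  }
}
```

GCM’s authentication tag also rejects modified ciphertext, so this overlaps with the token-integrity goal. A fresh random IV per call means sealing the same id twice yields different strings.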

Like many other authentication solutions in the industry, we chose JWT (JSON Web Token) as our client token format. In simple words, a JWT gives us a way to store any data we want and sign it with one of several algorithms.

For each authentication session we create a signed JWT with a few identification details needed for validation. The signing process (which uses a secret key) is what makes everything secure. I must emphasize one thing regarding JWTs: the payload is readable by anyone, since it’s just base64url-encoded. The signature controls who can create or change a token, not who can read it, so make sure you don’t expose any sensitive information in JWTs. Now we have all we need for authenticating our users.
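To show what signing does and doesn’t protect, here is a minimal HS256 JWT sketch built directly on Node’s `crypto` module: an HMAC-SHA256 over `header.payload`, with base64url-encoded segments as in RFC 7519/7515. The `SECRET` constant and the function names are hypothetical. In production you would normally rely on a maintained library such as `jose` or `jsonwebtoken` rather than hand-rolled code.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical signing secret: how Hodor manages its keys isn't described.
const SECRET = "replace-with-a-long-random-secret";

type Claims = Record<string, unknown>;

// JWT segments are unpadded base64url-encoded JSON.
const encodeJson = (value: unknown): string =>
  Buffer.from(JSON.stringify(value), "utf8").toString("base64url");

// HMAC-SHA256 over "header.payload", as in the HS256 algorithm.
const sign = (signingInput: string): Buffer =>
  createHmac("sha256", SECRET).update(signingInput).digest();

function signJwt(payload: Claims, ttlSeconds: number, nowSeconds: number): string {
  const header = encodeJson({ alg: "HS256", typ: "JWT" });
  const body = encodeJson({ ...payload, iat: nowSeconds, exp: nowSeconds + ttlSeconds });
  return `${header}.${body}.${sign(`${header}.${body}`).toString("base64url")}`;
}

function verifyJwt(token: string, nowSeconds: number): Claims | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, body, signature] = parts as [string, string, string];
  // Integrity: recompute the signature and compare in constant time.
  // The header's "alg" is deliberately ignored; only HS256 is ever accepted.
  const expected = sign(`${header}.${body}`);
  const given = Buffer.from(signature, "base64url");
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  const claims = JSON.parse(Buffer.from(body, "base64url").toString("utf8")) as Claims;
  // Expiration: reject tokens whose exp claim has passed.
  const exp = claims.exp;
  if (typeof exp !== "number" || nowSeconds >= exp) return null;
  return claims;
}
```

Note that anyone holding the token can still base64url-decode the middle segment and read every claim without the secret. Only creating or altering a token requires it.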

Having the ability to “open the door” (create a new session), “hold the door” (authenticate) and “close the door” (invalidate) was the starting point for our deployment process. Our mindset was to stretch the rollout a bit in order to be sure our users were not affected and stayed logged in:

  1. Generate Hodor tokens only for new logins: In order to authenticate with Hodor, we first had to generate and store the tokens on the client side. This was just a preparation step, and nothing changed in the old architecture.
  2. Silent authentication and monitoring: Soon, once we had enough Hodor tokens to work with, we added our authentication logic but without acting on its results. We wanted to silently gather information and validate our logic.
  3. Promoting Hodor to be our main authentication logic: After we fixed the issues we found and the monitoring looked positive, we started to authenticate with Hodor first, while still falling back to the old Authlogic system. We had to, since most of our users still didn’t have Hodor tokens.
  4. Migrating Authlogic users to Hodor: Now that we were using Hodor for new sessions and had enough confidence, we could start migrating millions of our users to Hodor. We added temporary code to generate new Hodor tokens for our active Authlogic sessions until 100% of our traffic went through Hodor. Our users were now controlled by the great Hodor.
  5. Authlogic removal: Probably the most fun part of the process. Once Hodor handled all authentication logic at Fiverr, we could clean up and remove all Authlogic-related code.
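The rollout steps above amount to a switch in front of two authenticators. Here is a minimal sketch that assumes both the legacy (Authlogic) path and Hodor can be wrapped as functions that return a user id or `null`. The mode names and `makeAuthenticate` are hypothetical, and the real code lived in a Ruby monolith.

```typescript
// Hypothetical rollout modes mirroring the migration steps above.
export type RolloutMode = "legacy" | "shadow" | "hodor-first" | "hodor-only";

export interface AuthRequest {
  hodorToken?: string; // present only for sessions that already have a Hodor token
  legacyToken?: string; // the old Authlogic session token
}

// An authenticator returns the user id, or null when it can't authenticate.
export type AuthFn = (req: AuthRequest) => string | null;

export function makeAuthenticate(
  mode: RolloutMode,
  hodor: AuthFn,
  legacy: AuthFn,
  onMismatch: (legacyUser: string | null, hodorUser: string | null) => void,
): AuthFn {
  return (req) => {
    if (mode === "legacy") return legacy(req);
    if (mode === "hodor-only") return hodor(req);
    if (mode === "hodor-first") {
      // Step 3: Hodor decides whenever a Hodor token exists; otherwise fall back.
      return req.hodorToken !== undefined ? hodor(req) : legacy(req);
    }
    // Step 2 ("shadow"): run Hodor silently and record disagreements,
    // but always act on the legacy result.
    const legacyUser = legacy(req);
    if (req.hodorToken !== undefined) {
      const hodorUser = hodor(req);
      if (hodorUser !== legacyUser) onMismatch(legacyUser, hodorUser);
    }
    return legacyUser;
  };
}
```

In shadow mode the legacy result always wins, so a bug in the new logic can only produce a logged mismatch, never a logged-out user.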

What a journey! We completed the migration in about 2 months and, more importantly, we didn’t affect any existing flows on our platform. Moreover, we now had a better system: one we could change to suit our future needs, more secure, and blazing fast, with each authentication taking only 2ms!

Chapter 3 — The life of Hodor

Until recently (2021), we had separate gateways for web and mobile app traffic. Our next challenge was to align all platforms on Hodor. To accomplish that, we decided to develop Hodor as a gem, a shared library.

The library development was pretty straightforward, as we already had the Hodor logic written and working. The main modifications were to make some of the logic configurable and platform-agnostic:

  • How to store our tokens on the client side
  • Where to take user details from
  • What is the expiration period of a token
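The three configurable points above map naturally onto an options object. The sketch below uses TypeScript for consistency with the other examples, although the actual library was a Ruby gem. Every name in it is hypothetical: `HodorOptions`, `readToken`, the cookie and header names, and the TTL values.

```typescript
// Hypothetical configuration surface for a shared authentication library.
export interface HodorOptions {
  // How tokens are stored on (and sent from) the client.
  tokenTransport:
    | { kind: "cookie"; cookieName: string }
    | { kind: "header"; headerName: string };
  // Where user details come from.
  loadUser: (userId: string) => Promise<{ id: string; email: string } | null>;
  // Token expiration period, in seconds.
  ttlSeconds: number;
}

// Extract the raw token according to the configured transport.
// `headers` uses lowercase names, as Node.js does for incoming requests.
export function readToken(
  opts: HodorOptions,
  headers: Record<string, string | undefined>,
): string | undefined {
  const transport = opts.tokenTransport;
  if (transport.kind === "header") {
    return headers[transport.headerName.toLowerCase()]?.replace(/^Bearer /, "");
  }
  // Minimal cookie parsing for illustration; a real gateway would use a cookie parser.
  for (const pair of (headers["cookie"] ?? "").split(";")) {
    const [name, ...rest] = pair.trim().split("=");
    if (name === transport.cookieName) return rest.join("=");
  }
  return undefined;
}

// Illustrative configurations: same logic, different settings (values are made up).
const users = new Map([["42", { id: "42", email: "user42@example.com" }]]);
const loadUser = async (id: string) => users.get(id) ?? null;

export const webConfig: HodorOptions = {
  tokenTransport: { kind: "cookie", cookieName: "hodor_token" },
  loadUser,
  ttlSeconds: 14 * 24 * 60 * 60,
};

export const mobileConfig: HodorOptions = {
  tokenTransport: { kind: "header", headerName: "Authorization" },
  loadUser,
  ttlSeconds: 90 * 24 * 60 * 60,
};
```

With this structure, web and mobile differ only in configuration, matching the “same logic, different configuration” step in the original plan.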

We first migrated our web traffic to use Hodor as a library, and after a few configuration changes we replaced the authentication system for our mobile platform as well. We took the same approach as when we first migrated to Hodor: silent validations, then Hodor as the main authentication system, and finally legacy code removal.

Fiverr was now protected by the great Hodor on all platforms. Mission accomplished? Not so fast.

Over the years, new features and security enhancements were added to Hodor, but our main challenge was adding another authentication factor (two-factor authentication, in simple words). We had to add a new state to our sessions, with additional data. We had two choices: either add the data to our JWT payload (readable by anyone) or create a new hidden state. As this information is more sensitive and we didn’t want to expose it, we decided to create a new state (stored in MongoDB) and link it to every JWT by storing a new, simple hashed identifier in it. This way, all the needed authentication details are accessible only to our application.
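Here is a minimal sketch of that split. It assumes the JWT carries only an opaque session identifier (`sid`), while the second-factor state lives server-side. A `Map` stands in for the MongoDB collection, and the field and function names are hypothetical. The article doesn’t specify the schema or how the identifier is derived.

```typescript
import { randomBytes } from "node:crypto";

// Sensitive per-session data that must not live in the (readable) JWT payload.
interface SessionState {
  userId: string;
  secondFactorVerified: boolean;
  secondFactorMethod: "sms" | "authenticator-app" | null;
}

// Stand-in for the MongoDB collection described above (hypothetical).
const sessionStates = new Map<string, SessionState>();

// On login: create the hidden state and return the only claim that goes into the JWT.
function startSession(userId: string): { sid: string } {
  const sid = randomBytes(16).toString("hex"); // opaque: reveals nothing about the user
  sessionStates.set(sid, { userId, secondFactorVerified: false, secondFactorMethod: null });
  return { sid };
}

function markSecondFactorVerified(sid: string, method: "sms" | "authenticator-app"): void {
  const state = sessionStates.get(sid);
  if (state) {
    state.secondFactorVerified = true;
    state.secondFactorMethod = method;
  }
}

// Per request, after the JWT signature has been verified (not shown).
function requireSecondFactor(claims: { sid?: string }): SessionState | null {
  const state = claims.sid ? sessionStates.get(claims.sid) : undefined;
  return state?.secondFactorVerified ? state : null;
}

// "Closing the door": deleting the state revokes the session immediately.
function endSession(sid: string): void {
  sessionStates.delete(sid);
}
```

Server-side state also allows immediate revocation: deleting it ends the session at once, which a self-contained JWT cannot do on its own before it expires.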

By now the pros of Hodor as a library were clear: it was easy to modify the code, delivery to production was fast and, most importantly, there was no magic and there were no surprises.

We reached the final chapter of our journey and, just like Game of Thrones’ final episode, the anticipation was worth it. We, at the Platform core group, came up with a new, innovative architecture (V5) which consolidates all of Fiverr’s gateways into one. This allowed us to develop Hodor as a service: one microservice to authenticate all traffic.

As I mentioned earlier, this phase couldn’t be done in the past since our tech stack was too old, not scalable enough and not stable enough to handle so many requests per second. What also made it happen was the decision to move from Ruby to non-blocking stacks such as Kotlin and TypeScript.

We chose TypeScript and NestJS as the tech stack for our new microservice, giving us more tools to improve our flow with less error-prone code (unlike Ruby, TypeScript is statically typed). We took advantage of Node.js concurrency when authenticating, resulting in better performance than we had with our Ruby library. Bottom line: the new authentication process takes almost the same time, even though we are now making additional network calls.
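The concurrency gain comes from not waiting on independent lookups one after another. This minimal sketch assumes the service verifies the JWT locally and then needs two independent network lookups: session state and the user record. `AuthDeps`, `authenticate` and the fake 100 ms delays are hypothetical, not Hodor’s actual API.

```typescript
// Hypothetical dependencies of an authentication service.
export interface AuthDeps {
  verifyJwt: (token: string) => { sub: string; sid: string } | null; // local, CPU-bound
  fetchSessionState: (sid: string) => Promise<{ revoked: boolean } | null>; // e.g. a DB call
  fetchUser: (userId: string) => Promise<{ id: string; active: boolean } | null>; // e.g. an HTTP call
}

export async function authenticate(token: string, deps: AuthDeps): Promise<string | null> {
  const claims = deps.verifyJwt(token);
  if (!claims) return null;
  // The two lookups don't depend on each other, so start both and await them together.
  const [state, user] = await Promise.all([
    deps.fetchSessionState(claims.sid),
    deps.fetchUser(claims.sub),
  ]);
  if (!state || state.revoked || !user || !user.active) return null;
  return user.id;
}

// Fake dependencies for illustration: each lookup simulates ~100 ms of latency.
const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export const fakeDeps: AuthDeps = {
  verifyJwt: (token) => (token === "valid-token" ? { sub: "u1", sid: "s1" } : null),
  fetchSessionState: async () => {
    await delay(100);
    return { revoked: false };
  },
  fetchUser: async (id) => {
    await delay(100);
    return { id, active: true };
  },
};
```

With `Promise.all`, total latency is roughly that of the slowest lookup rather than the sum of all lookups. This is one way that extra network calls can add almost nothing to wall-clock time.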

Our story ends here, for now at least, and even though we didn’t have any ice-spitting dragons, I hope you had fun. My main takeaway is that it’s OK to start your application with a third-party authentication system and put less focus on it, BUT you should recognize when it’s the right time to build your own ‘Hodor’ system, or else it will be too late.

Fiverr is hiring in Tel Aviv and Kyiv.
