Distributed state… is hard

I had always planned for AccessPass to scale across many servers in an autoscaling environment. If you do not know what AccessPass is, check out my earlier post, but in short: it is a full authentication solution based on revocable tokens, without having to reach out to an external database.

In the earliest versions of AccessPass, all of its token state was managed in a GenServer. That is fine and all, except that solution would scale horribly, since every request to verify a token would be serialized through a single GenServer process. Good luck handling token verification on your SaaS that just blew up on Reddit. In the next iteration of changes to AccessPass I moved the token state into local ETS tables. This is a much better approach to scaling to many users, as requests are no longer serialized through a GenServer; ETS reads can happen concurrently from any process.
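
To make the difference concrete, here is a minimal sketch of the ETS approach (the module, table name, and record shape are my own illustration, not AccessPass internals): the table is public with read_concurrency enabled, so verifying a token is a direct lookup from the calling process instead of a message through one GenServer mailbox.

```elixir
# Hypothetical sketch of ETS-backed token state, not AccessPass's actual code.
defmodule TokenCache do
  @table :access_tokens

  def init do
    # :public + read_concurrency lets every request process read directly,
    # with no single GenServer mailbox to queue behind.
    :ets.new(@table, [:named_table, :public, :set, read_concurrency: true])
  end

  def put(token, user_id, expires_at) do
    :ets.insert(@table, {token, user_id, expires_at})
  end

  def verify(token) do
    case :ets.lookup(@table, token) do
      [{^token, user_id, expires_at}] ->
        if expires_at > System.system_time(:second) do
          {:ok, user_id}
        else
          {:error, :expired}
        end

      [] ->
        {:error, :invalid}
    end
  end

  # Instant revocation is just deleting the row.
  def revoke(token), do: :ets.delete(@table, token)
end
```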

…but I still had not accomplished the task I set out to do in the first place: a distributed authentication solution based on tokens that are instantly revocable. For those not familiar with token based authentication, I should probably mention why that is such a hard task.

Why are fast, instantly revocable tokens hard to do?

Normal token based solutions usually use some sort of token generation library like Guardian. With this you can ensure that a token given to your server is the same token it gave out in the first place, unchanged. You would also set an expiration on the token so that it eventually becomes invalid and the user needs to request a new one. This is VERY fast because there is no need to call a database for authentication, and it works across servers as long as they share the same secret_key for Guardian.
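
Roughly, that looks like the sketch below with Guardian 1.x (the module names and the user shape are my assumptions):

```elixir
# A Guardian 1.x sketch; MyApp and the user shape are assumptions.
# In config/config.exs, every server shares the same secret:
#
#   config :my_app, MyApp.Guardian,
#     issuer: "my_app",
#     secret_key: "generate with: mix guardian.gen.secret"

defmodule MyApp.Guardian do
  use Guardian, otp_app: :my_app

  # Put the user's id into the token's "sub" claim on the way out...
  def subject_for_token(user, _claims), do: {:ok, to_string(user.id)}

  # ...and rebuild a resource from the claims on the way back in.
  def resource_from_claims(%{"sub" => id}), do: {:ok, %{id: id}}
end

# Issue a token with a TTL, then later verify it with no database call:
# {:ok, token, _claims} = MyApp.Guardian.encode_and_sign(user, %{}, ttl: {1, :hour})
# {:ok, _claims} = MyApp.Guardian.decode_and_verify(token)
```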

The issue for most people with a solution like this is that you do not get to revoke tokens after they have already been handed out. From Guardian's GitHub:

When using tokens, depending on the type of token you use, nothing may happen by default when you revoke a token.

For example, JWT tokens by default are not tracked by the application. The fact that they are signed with the correct secret and are not expired is usually how validation of if a token is active or not. Depending on your use-case this may not be enough for your application needs. If you need to track and revoke individual tokens, you may need to use something like GuardianDb

When using GuardianDb you are back to taking a speed hit for hitting a database. GuardianDb is a perfectly good solution for most people, but I wanted to try using some Elixir-native tools to solve all of these issues at once.

Enter AccessPass.

  1. AccessPass is token based (kind of: the token is actually the key to its data).
  2. AccessPass is fast, as it does not need to hit an external database.
  3. AccessPass is instantly revocable, as it is based on ETS.
  4. AccessPass uses a refresh_token/access_token based setup.

…but as stated above, AccessPass could still not be used across servers, a must in any modern server setup.

Solution and issues along the way:

Obviously, because I am using the awesome language Elixir, I knew I could find a solution to cross-node state built in.

…Enter Mnesia.

Mnesia is essentially a distributed version of ETS. Sounds perfect, right? I should just be able to switch everything in AccessPass that is currently stored in ETS over to Mnesia. Well, that is not quite how things ended up working out. It turns out Mnesia is awesome (don't let anyone tell you otherwise), but it is also quite old in its ways.

Mnesia is not made to handle dynamic membership of nodes out of the box. That sucks, because there go my autoscaling plans. Well, it turns out that is not quite true: Mnesia has everything needed to handle dynamic node membership, it just doesn't do it out of the box.

I decided to build a library to handle dynamic node membership, so that when servers are added to a cluster of Elixir servers they will automatically join the Mnesia schema and copy any existing tables. The library is called syncm, and the following are a couple of snippets of how the internals work:

SyncM.start() is, in a way, equivalent to starting a Mnesia schema and starting Mnesia. :check_nodes_and_join receives a list of all currently connected nodes from Node.list. If none are there, it knows it is the first of its network mesh and starts up Mnesia/creates the schema. If there are nodes in its Node.list, it will instead use multi_call to ask the first node in the list to let it join that node's Mnesia mesh. Roughly, that flow looks like the sketch below.
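
Only SyncM.start/0 and the :check_nodes_and_join message are load-bearing names here; the GenServer wiring around them is a hedged reconstruction of the flow just described, not syncm's exact source.

```elixir
defmodule SyncM do
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, Keyword.put_new(opts, :name, __MODULE__))
  end

  # Public entry point: decide whether to create the mesh or join one.
  def start, do: GenServer.call(__MODULE__, :check_nodes_and_join)

  def init(:ok), do: {:ok, %{}}

  def handle_call(:check_nodes_and_join, _from, state) do
    case Node.list() do
      [] ->
        # No other nodes connected: we are the first in the mesh,
        # so create the schema and start Mnesia ourselves.
        :mnesia.create_schema([node()])
        :ok = :mnesia.start()

      [first_node | _rest] ->
        # Other nodes exist: start Mnesia empty, then ask the first
        # node in the list to pull us into its Mnesia mesh.
        :ok = :mnesia.start()
        GenServer.multi_call([first_node], __MODULE__, {:join, node()})
    end

    {:reply, :ok, state}
  end
end
```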

On the other side, SyncM handles a node's request to join. It takes the remote_node and adds it to the Mnesia schema using change_config. Then it copies all existing tables from the node that was asked and “gives them” to the requester.
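
A sketch of that handler, as an extra clause in the SyncM module sketched above (change_config and the table copying are the parts described; the exact shape is again my assumption):

```elixir
# An additional handle_call clause inside the SyncM GenServer sketched above.
def handle_call({:join, remote_node}, _from, state) do
  # Pull the requesting node into this node's running Mnesia schema.
  {:ok, _nodes} = :mnesia.change_config(:extra_db_nodes, [remote_node])

  # "Give" every existing table to the requester by adding a ram copy
  # of it on the remote node (the schema table is managed separately).
  for table <- :mnesia.system_info(:tables), table != :schema do
    :mnesia.add_table_copy(table, remote_node, :ram_copies)
  end

  {:reply, :ok, state}
end
```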

Using the couple of tidbits above, you can call SyncM.start() in any application and it will start syncing Mnesia across nodes that dynamically join the node mesh. Pretty cool, right?
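
For instance, one hypothetical way to wire it into an app's supervision tree (the layout here is assumed by me):

```elixir
defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    children = [
      SyncM
      # ...the rest of your supervision tree
    ]

    {:ok, pid} = Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)

    # With the GenServer running, create or join the Mnesia mesh.
    SyncM.start()

    {:ok, pid}
  end
end
```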

So part one of my issue is complete: I have a more modern way to use Mnesia, so that it works with dynamically added servers.

The next issue I ran into is that Mnesia, even though it has ETS under the hood, does not map one to one to most ETS calls. I did not want to force users to use Mnesia if they had distribution configured off; in that case I wanted to use normal ETS. I wanted the following to return the same value regardless of whether Mnesia or ETS was under the hood:

  • insert
  • new
  • delete
  • match_object
  • match_delete

You can take a look in this folder to see the differences in calls between ETS and Mnesia.
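
To give a flavor of the problem, here is a minimal sketch of a unified wrapper over a few of those calls (the module name, config key, and record shape are my illustrations, not AccessPass internals):

```elixir
defmodule TokenStore do
  # Assumed config key; the real AccessPass flag may be named differently.
  defp distributed?, do: Application.get_env(:access_pass, :distributed, false)

  def new(table) do
    if distributed?() do
      # Mnesia tables take named attributes...
      :mnesia.create_table(table, attributes: [:key, :value], type: :set)
    else
      # ...while ETS tables take an option list.
      :ets.new(table, [:named_table, :public, :set])
    end
  end

  def insert(table, {key, value}) do
    if distributed?() do
      # Mnesia records carry the table name as the first tuple element,
      # and dirty_write skips transaction overhead.
      :mnesia.dirty_write({table, key, value})
    else
      :ets.insert(table, {key, value})
    end
  end

  def match_object(table, {key_pattern, value_pattern}) do
    if distributed?() do
      # Strip the table name back off so both backends return {key, value}.
      {table, key_pattern, value_pattern}
      |> :mnesia.dirty_match_object()
      |> Enum.map(fn {_table, key, value} -> {key, value} end)
    else
      :ets.match_object(table, {key_pattern, value_pattern})
    end
  end

  def delete(table, key) do
    if distributed?() do
      :mnesia.dirty_delete(table, key)
    else
      :ets.delete(table, key)
    end
  end
end
```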

Testing it out:

I don’t want you to just take my word for it, so here is a quick way to show that everything is working in a distributed way.

First, set up AccessPass with a new Phoenix project like I went over in a previous post. Make sure you have both the distributed config and the sync config set to true.
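
Assuming the config keys mirror those names (my guess; check the AccessPass docs for the exact keys), that would look something like:

```elixir
# config/config.exs (key names assumed from the description above)
config :access_pass,
  distributed: true,
  sync: true
```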

Add a sys.config file at the base of your Phoenix application so the two nodes can connect to each other.
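
One plausible shape for that file, using the kernel application's node syncing so the two nodes below find each other at boot (this is an assumption on my part, not necessarily the exact file):

```erlang
%% sys.config (contents assumed): connect the two named nodes at startup
[{kernel, [
  {sync_nodes_optional, ['n1@127.0.0.1', 'n2@127.0.0.1']},
  {sync_nodes_timeout, 10000}
]}].
```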

run

PORT=4001 iex --name n1@127.0.0.1 --erl "-config sys.config" -S mix phoenix.server

in one terminal window and

PORT=4002 iex --name n2@127.0.0.1 --erl "-config sys.config" -S mix phoenix.server

in another terminal window.

You can now call http://localhost:4001/register on one node and then call http://localhost:4002/check with the token returned by register. You will notice that your token works on the second running instance of Phoenix.
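
For example, with curl (the request fields and the token header are hypothetical shapes; consult the AccessPass docs for the real ones):

```sh
# Register on node 1 (field names are assumptions):
curl -X POST http://localhost:4001/register \
  -H "Content-Type: application/json" \
  -d '{"email":"test@example.com","username":"test","password":"secret123"}'

# Check the returned token against node 2 (header name is an assumption):
curl http://localhost:4002/check -H "access_token: <access_token from the response>"
```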

Yay, how exciting! AccessPass is now a distributed token based solution that hits all the checkmarks for me.

Here is the repo to star/follow. Please do throw any issues/pull requests my way, as I plan to continue improving the library.

Also check out this blog post by Jonathan Merriweather for some additional information. That post was quite a help this past week.

Until next time. ✌️
