Event sourcing in practice with Event Store and F#

In this post I present my work and thoughts (very much a work in progress) on how I am handling that most mundane of things: password resets.

This article has two parts. The first briefly introduces Event Sourcing, discusses its pros and cons, and covers some specific problems and solutions, a few of them generally applicable. The second covers the actual implementation of the password reset system.

Part I

What is Event Sourcing?

Event sourcing is, in the words of Martin Fowler, to:

Capture all changes to an application state as a sequence of events.

Events are ordered, meaning that applying Event 3, then 2, then 1 will not necessarily result in the same state as applying Events 1, 2 and 3 in that order. Events are always named in the past tense, and thus capture things that have already happened (you can read why here, but it has to do with DDD and Ubiquitous Language). In Slavic languages that would be the perfective aspect.
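To make the ordering point concrete, here is a minimal sketch that rebuilds state by folding over a list of events. The two event types and the idea that the state is "the currently valid reset token" are assumptions made up for this illustration, not part of the system described later.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class TokenGenerated:  # hypothetical event, named in the past tense
    token: str


@dataclass(frozen=True)
class ResetCompleted:  # hypothetical event
    pass


def apply(state, event):
    """Return the new state (the currently valid token, or None) after one event."""
    if isinstance(event, TokenGenerated):
        return event.token
    if isinstance(event, ResetCompleted):
        return None
    return state


def replay(events):
    """Rebuild the state from scratch by applying the events in order."""
    return reduce(apply, events, None)


in_order = [TokenGenerated("t1"), ResetCompleted(), TokenGenerated("t2")]
print(replay(in_order))                  # t2
print(replay(list(reversed(in_order))))  # t1
```

The same three events give a different final state when replayed in reverse, which is why the position of an event in its stream matters.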

This means that commands cannot be modeled directly; instead we model requests. We do not know whether a request will succeed, but we can be sure that it was made, whatever its outcome.

Good event practice is to store deltas, that is, what has changed, so that an event can be undone as well. The example I keep seeing is an accounting system, where a typical entry in a double-entry accounting system could be: subtract 5 from account 1234 and add 5 to account 5432. Unfortunately our events tend to be somewhat more complex, and it is not clear that we always need to know the previous state. For example, we don’t necessarily need to know every password a user ever had, so we will relax this requirement.
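The accounting example can be sketched in a few lines. The `Transferred` event and the `apply`/`undo` helpers are hypothetical names chosen for this illustration; the point is only that a delta carries enough information to be reversed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transferred:  # hypothetical delta event
    source: str
    target: str
    amount: int


def apply(balances, event):
    """Return new balances with the transfer applied (the input is left untouched)."""
    updated = dict(balances)
    updated[event.source] -= event.amount
    updated[event.target] += event.amount
    return updated


def undo(balances, event):
    """Reverse a transfer by applying the opposite delta."""
    return apply(balances, Transferred(event.target, event.source, event.amount))


before = {"1234": 20, "5432": 0}
transfer = Transferred("1234", "5432", 5)
after = apply(before, transfer)
print(after)                  # {'1234': 15, '5432': 5}
print(undo(after, transfer))  # {'1234': 20, '5432': 0}
```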

Events are append-only, which means that we can never delete an event. This is when I thought, oh my god, this is just like Facebook, the internet never forgets! Be that as it may, it has the great benefit of providing us with an automatic log of everything happening in our system. This is also another reason why command events would be unfortunate: what if they fail?

The events are recorded in streams, each of which has a unique name, for example “password-reset” or “password-reset-34”. Creating streams is very cheap, so we can create as many as we like. The event store has a funny property: a single stream may mix different types of events, which can then be queried together. This means that the “password-reset-34” stream can be used for all password reset events for user #34.
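As a rough mental model (not the Event Store client API), named append-only streams holding mixed event types can be sketched with an in-memory class. `InMemoryEventStore` and its methods are assumptions for illustration only.

```python
from collections import defaultdict


class InMemoryEventStore:
    """Toy append-only store: events can be added and read, never changed or deleted."""

    def __init__(self):
        self._streams = defaultdict(list)  # stream name -> list of (event_type, data)

    def append(self, stream, event_type, data):
        """Append an event and return its event number within the stream."""
        events = self._streams[stream]  # streams are created on first use
        events.append((event_type, dict(data)))
        return len(events) - 1

    def read(self, stream, start=0):
        """Return a copy of the events in a stream from event number `start` onwards."""
        return list(self._streams.get(stream, []))[start:]


store = InMemoryEventStore()
store.append("password-reset-34", "PasswordResetRequested", {"user_id": 34})
store.append("password-reset-34", "PasswordResetCompleted", {"user_id": 34})
print(store.read("password-reset-34"))
```

Different event types share the one stream for user #34, and there is deliberately no method for deleting or editing an event.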

Why use the Event Sourcing pattern?

Event sourcing comes with several pros and cons, and I will try to list some here. First the bad stuff:

  1. Event streams are not straightforward to query.
  2. It makes communication a bit cumbersome. First you have to define the event, think about naming and then you have to wire the whole thing up.
  3. You have to deal with side effects in some explicit way if you want to be able to rewind (which you probably do, else event streams don’t make a lot of sense). There seem to be several ways to deal with this, and I’m going to discuss them a bit.
  4. You get to think a lot about where to store data and about the relationship between the event store and the regular database. When trying to build a franken-system on top of an existing legacy database, you get to play on hard mode. Yay!
    The specific problem you encounter a lot is that state is generally captured in the database, but ES lends itself to storing state in the event store, which means dealing with what happens when data changes. How do you deal with queries against the DB, which are supposed to give the same result when replayed? Really, the event store is inherently temporal, whereas most normal databases have no concept of time as such.

Now to the good stuff:

  1. Since queries are difficult, you will probably structure your data in a way that makes it easier (and faster).
  2. Communication is difficult because it has to happen through a well defined interface. This also means that you have to actually think about what you want to do. It sucks, I know.
  3. Side effects must be dealt with explicitly. I know you are thinking of Haskell and monad hell, but it does not have to be that bad. You just have to employ some kind of strategy, which is good, because presumably there will be fewer errors and data integrity will be better. You are forced to think: “Okay, what happens if this event is replayed? Hmm, that would not be good, so I should do xyz”.
  4. This expands on pt. 2, but since there is only a well defined interface for communications, you will have to properly decouple your services.
  5. You get an audit log for free. Want to know everything that happened in the system on July 25th 2016? If the system is designed correctly, you should be able to rewind the whole system to the state of 10:04 that day. Wow!

So how do we deal with these points? That’s what the rest of this post is going to be about.

Handling side effects

When a service starts and begins listening on event streams, you have to specify the event number from which you want to listen. In principle you could start listening at the first position in the stream and just replay all events, but if they have side effects, this is not going to work. We could of course just assume that all events in the stream have finished processing, but if there is some kind of delay in event processing, we will end up skipping events that still need to be processed.

In my implementation I use a generic request/response pattern instead. The listener has a handler which produces either a positive or a negative result, which then leads to emitting an event whose name ends in either “Processed” or “Error”. In this way we know exactly how far we got “last time” and can pick up at the right spot.
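One way to picture this is a catch-up routine that, on startup, only handles requests that have no matching outcome event yet. This is a sketch under assumptions: the `request_id` field used to pair a request with its outcome, the `Requested` suffix, and the function names are all invented for the example.

```python
def pending_requests(events):
    """Return the request events that have no ...Processed or ...Error event yet."""
    done = {e["request_id"] for e in events
            if e["type"].endswith(("Processed", "Error"))}
    return [e for e in events
            if e["type"].endswith("Requested") and e["request_id"] not in done]


def catch_up(events, handler):
    """Run the handler for each pending request and return the outcome events to emit."""
    emitted = []
    for request in pending_requests(events):
        base = request["type"][: -len("Requested")]
        try:
            handler(request)  # the side effect happens here, once per request
            outcome = "Processed"
        except Exception:
            outcome = "Error"
        emitted.append({"type": base + outcome, "request_id": request["request_id"]})
    return emitted


stream = [
    {"type": "PasswordResetRequested", "request_id": 1},
    {"type": "PasswordResetProcessed", "request_id": 1},
    {"type": "PasswordResetRequested", "request_id": 2},
]
handled = []
new_events = catch_up(stream, lambda request: handled.append(request["request_id"]))
print(handled)     # [2]
print(new_events)  # [{'type': 'PasswordResetProcessed', 'request_id': 2}]
```

Request 1 already has an outcome in the stream, so its side effect is not repeated; only request 2 is handled after the restart.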

The compromise

I mentioned that we are going to implement this next to a legacy system with an MSSQL backend. That means that we have split state, with some stored in the MSSQL database and some stored in the event store. Since we are willing to sacrifice


Part II

System overview

The system is a monolithic C#/.NET application with an MSSQL backend. We will be looking at “plugging” in these services:

  • Gateway: a web API gateway which generates events based on requests and possibly waits for other events to be generated (like errors or successes).
  • Account service: takes care of everything to do with user accounts, creating them, deleting them, and (surprise) password resets.
  • External message dispatch service: handles dispatching of external messages such as SMS, email and push notifications to phones. It has a simple templating engine for localized emails.

Our case for today is password reset. Users forget their passwords all the time, and we do not want to know or set their passwords for them, so they need to be able to reset their password themselves, using their email address.

The flow looks something like this:

User’s password reset flow

How to model this? In ES we always send an event after the fact, so an event is never a command in itself (though it may indirectly trigger actions). This is practical because it removes tight coupling between services by reversing the relationship between caller and callee. In a way it is the opposite of a remote procedure call (RPC) and closer to the classical publish-subscribe model.

The problem with this, of course, is that debugging can be difficult: there is no “step into” when in principle any action could have been triggered by any event. To know which actions an event triggers, you need to know who is listening on the relevant stream.

One interesting idea in ES is rewind and replay. If you only store data in the event store (or at least it captures all changes), then you can rewind the whole system to any point in time. I do not expect that we will make use of this property, since it requires a lot of engineering and I don’t expect it to be useful enough for us to justify the extra cost of having to keep everything in the Event Store. It’s possible that some services will have the property of rewindability (for lack of a better word) if we see that the business case is there.

Event model

The flow above is handled by the three services from the system overview: the Gateway, the account service and the external message dispatch service.

For the purposes of this case there are 5 events:

  • PasswordResetRequested: sent by the gateway when a user has requested a password reset through the web API.
  • PasswordResetTokenGenerated: sent by the account service when it has generated a token for password reset and a templated email can be sent. Contains the token, the user’s language and the email address.
  • PasswordResetEmailSent: sent by the external message dispatch service after seeing that the token is generated and sending the password reset email. Contains the email address.
  • PasswordResetPasswordSetRequest: sent by the gateway when the user submits a token and a new password.
  • PasswordResetCompleted: sent by the account service when the password change is completed.
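The five events could be modeled as plain immutable records. The sketch below is one possible shape: the event names come from the list above, while the exact field names, the `stream_name` helper and the JSON serialization in `to_record` are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PasswordResetRequested:
    email: str


@dataclass(frozen=True)
class PasswordResetTokenGenerated:
    email: str
    token: str
    language: str


@dataclass(frozen=True)
class PasswordResetEmailSent:
    email: str


@dataclass(frozen=True)
class PasswordResetPasswordSetRequest:
    email: str
    token: str
    new_password: str


@dataclass(frozen=True)
class PasswordResetCompleted:
    email: str


def stream_name(email):
    """All events for one user's reset go to a single per-user stream."""
    return f"password-reset-{email}"


def to_record(event):
    """Turn an event into an (event type, JSON payload) pair ready to be appended."""
    return type(event).__name__, json.dumps(asdict(event), sort_keys=True)


print(stream_name("ann@example.com"))
print(to_record(PasswordResetRequested("ann@example.com")))
```

Using the class name as the event type keeps the naming scheme in one place, at the cost of tying stored event names to code identifiers.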

I am still trying to figure out the perfect naming scheme. For now I am going with something like [general functionality][specific action], but it tends to get a bit Java Enterprise Bean Roasting™-like.

Event flow

  1. First the user visits a page, let’s call it /requestPasswordReset, and sees a SPA with an email field (and maybe a CAPTCHA) and a button. Pressing the button sends a web API request to the Gateway, which then emits a PasswordResetRequested event containing the email address to the stream password-reset-[emailaddress]. This will be the stream for all events relating to the password reset of this user.
    Now we have a few options for user feedback (this is where ES sucks a bit, I think), as events are of course not really request/response:
    * We can build request/response on top, and just wait for a -Processed or -Error event to be emitted and then respond to the client. Stream semantics make this easy.
    * We can do some kind of polling, where the client repeatedly asks the gateway for updates. “Are we there yet?”. “How about now?”. “I need to pee”. You get the picture.
    * We use a bi-directional technology, like SignalR, to let the client subscribe to updates. Putting SignalR on top of ES would allow clients to subscribe directly to events, which would make it easy to build the coolest dashboard ever!
    I hate the second option, and the third takes more code. As we will certainly need to do some kind of synchronous stuff at some point, we are going to implement the first option for now.
  2. The PasswordResetRequested event is seen by the account service, which looks up the user in the database, and generates a token (some large random number) in response. It then emits the PasswordResetTokenGenerated event containing the token, user language (you can maybe guess why) and email address.
  3. The PasswordResetTokenGenerated event is seen by the external message dispatch service, which prepares an email to the user, in the user’s language, containing the token. It then emits a PasswordResetEmailSent event.
  4. The user receives the email, and clicks on the link containing the token. This opens up another SPA page containing the two password fields we all love. The kind where you must enter your password twice to make sure you did not screw it up. When you have entered them and click submit, the Gateway generates the event PasswordResetPasswordSetRequest.
  5. The PasswordResetPasswordSetRequest event is picked up by the account service which looks up the latest PasswordResetTokenGenerated event in the stream password-reset-[emailaddress]. If the tokens match between the PasswordResetPasswordSetRequest and PasswordResetTokenGenerated, the password is updated. Once the write is complete, we emit PasswordResetCompleted.
    This is a delta event, because we are actually changing state and this means we must first query the database for the existing value, then update it and save both. So it’s also reversible, which is cool, but I am not sure we actually want to let users go back to earlier passwords. But we can do annoying things like prevent recycling of passwords for example.
  6. The Gateway picks this up, and a happy user can now log in.
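The six steps above can be sketched end to end with a tiny synchronous publish-subscribe bus standing in for the event store. Everything except the event names is an assumption: the `Bus` class, the `users` dictionary standing in for the legacy database, and the `outbox` list standing in for real email delivery.

```python
import secrets


class Bus:
    """Tiny synchronous publish-subscribe bus standing in for the event store."""

    def __init__(self):
        self.log = []          # append-only list of (event_type, data)
        self.subscribers = {}  # event_type -> list of handlers

    def subscribe(self, event_type, handler):
        self.subscribers.setdefault(event_type, []).append(handler)

    def emit(self, event_type, **data):
        self.log.append((event_type, data))
        for handler in self.subscribers.get(event_type, []):
            handler(data)


def wire_services(bus, users, outbox):
    """users: email -> {"language", "password"}; outbox: list collecting sent emails."""

    def on_reset_requested(e):  # account service, step 2
        user = users.get(e["email"])
        if user is None:
            return  # unknown user: nothing is emitted in this sketch
        bus.emit("PasswordResetTokenGenerated", email=e["email"],
                 token=secrets.token_urlsafe(32), language=user["language"])

    def on_token_generated(e):  # external message dispatch service, step 3
        outbox.append((e["email"], e["language"], e["token"]))
        bus.emit("PasswordResetEmailSent", email=e["email"])

    def on_password_set_request(e):  # account service, step 5
        tokens = [d["token"] for t, d in bus.log
                  if t == "PasswordResetTokenGenerated" and d["email"] == e["email"]]
        # Compare against the latest generated token, in constant time.
        if tokens and secrets.compare_digest(tokens[-1].encode(), e["token"].encode()):
            users[e["email"]]["password"] = e["new_password"]
            bus.emit("PasswordResetCompleted", email=e["email"])

    bus.subscribe("PasswordResetRequested", on_reset_requested)
    bus.subscribe("PasswordResetTokenGenerated", on_token_generated)
    bus.subscribe("PasswordResetPasswordSetRequest", on_password_set_request)


bus = Bus()
users = {"ann@example.com": {"language": "da", "password": "old"}}
outbox = []
wire_services(bus, users, outbox)

bus.emit("PasswordResetRequested", email="ann@example.com")  # gateway, step 1
token = outbox[-1][2]                                        # user reads the email
bus.emit("PasswordResetPasswordSetRequest", email="ann@example.com",
         token=token, new_password="new")                    # gateway, step 4
print([event_type for event_type, _ in bus.log])
```

No service calls another directly; each one only reacts to events it has subscribed to, which is the decoupling described earlier. A real setup would deliver events asynchronously and across processes, which this sketch ignores.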

Error handling

I indicated earlier that, if something goes wrong, we can emit an -Error type event instead of a -Processed type. This is all the error handling we need: either a step goes well and we proceed to the next stage, or it does not and we report back to the user that it failed.
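On the gateway side, waiting for the outcome might look like the sketch below. It assumes a hypothetical `read_stream(start)` callable that returns the events from a given event number onwards, and it re-reads the stream in a loop only to keep the example self-contained; a real client would more likely use a stream subscription. The event name `PasswordResetError` is also an assumption, since the post does not name its error events.

```python
import time


def wait_for_outcome(read_stream, success_type, error_type,
                     timeout=5.0, poll_interval=0.05):
    """Block until a success or error event shows up, or until the timeout expires.

    Returns (ok, data), where ok is True only for the success event.
    """
    deadline = time.monotonic() + timeout
    seen = 0  # number of events already inspected
    while True:
        events = read_stream(seen)
        for event_type, data in events:
            if event_type == success_type:
                return True, data
            if event_type == error_type:
                return False, data
        seen += len(events)
        if time.monotonic() >= deadline:
            return False, {"reason": "timeout"}
        time.sleep(poll_interval)


log = [
    ("PasswordResetPasswordSetRequest", {"email": "ann@example.com"}),
    ("PasswordResetError", {"reason": "token mismatch"}),
]
result = wait_for_outcome(lambda start: log[start:],
                          "PasswordResetCompleted", "PasswordResetError")
print(result)  # (False, {'reason': 'token mismatch'})
```

The gateway can then translate the returned pair into an HTTP response for the user, treating a timeout the same way as a reported error.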


References

https://geteventstore.com/ - The Event Store database used in this post.

https://martinfowler.com/eaaDev/EventSourcing.html - Martin Fowler’s post on Event Sourcing. Be careful though, Mr. Fowler does not seem to distinguish between events and commands. Events are things that have already happened and they are the only things we want in our event stream (it’s also a log, remember?).

http://thinkbeforecoding.github.io/FsUno.Prod/index.html - A nice ‘log’ about designing a Uno game in F# with Event Sourcing.

http://www.ben-morris.com/refactoring-large-monoliths-to-microservices-strategies-risks-and-pragmatic-reality/ - Concise article on how to refactor monoliths to microservices.