CQRS + Event Sourcing (Making SharePoint Decent Again, Maybe)
SharePoint is a powerful tool that offers a lot of extensibility, which one could argue is its biggest flaw. Regardless, I wanted to investigate how modern distributed architectures could be used around a tool like SharePoint to provide a better, more stable developer experience, while also providing scale and performance on top of a tool that lacks the latter.
A recent opinion I’ve developed when working in SharePoint is that you should keep as little of your application data as possible in SharePoint. I think you should keep the data that enhances SharePoint features in content types (e.g., managed metadata for search refinements), but all other domain data should exist elsewhere. One strategy I’ve used to achieve this starts with treating SharePoint as its own streaming microservice, such that other services can react to state changes accordingly. This article will approach the idea of building a microservice-like architecture backed by an event store, utilizing CQRS and event streaming.
Let’s take the example of a complex workflow that captures the hiring process, from a candidate submitting an application, through interviewing, down to the offer or rejection notice. We will use SharePoint to store documents such as the applicant’s resume, the results of the applicant’s interviews, the offer letters, etc. At each stage of the process it may be necessary for a manager or recruiter to review a document or push the candidate to the next stage of the process. To round things out, just for the sake of complexity, let’s say these documents can exist anywhere in the SharePoint farm and are not necessarily centralized.
The intention of this exercise is not to build a fully functioning talent management tool end to end, but rather point out how we can achieve some of these goals using CQRS and event sourcing on top of a SharePoint document library. Before we get into the design, I want to introduce the concept of CQRS and the Event Store. In the next article, I will take a more detailed look at implementing the tool and provide some code samples.
CQRS (Command Query Responsibility Segregation)
Let’s talk about Twitter. On an average day the number of reads vastly outnumbers the number of writes (I assume). Even if that is not the case, loading tweets (building timelines, notifications, etc.) and posting a new tweet would not necessarily benefit from the same strategies, i.e., optimizing reads will not follow the same approach as optimizing writes. So why should we use the same code, share the same database, or even share the same infrastructure? Another example could be Netflix. Netflix certainly has a read-heavy application, as the number of new movies that get uploaded is vastly outnumbered by the amount of content being streamed. This is also a good example of how uploading content and streaming content are very different beasts and certainly warrant different architectures, if not different applications and teams. Enter CQRS…
The main idea behind the CQRS pattern comes from the inherent differences in what it means to query a domain object versus issuing commands to change or create one. In other words, imagine a CRUD operation being split up in such a way that the Reads (Query) are designed and scaled independently of the Creates, Updates, and Deletes (Command). The basic flow of a CQRS application would be as follows:
- Some actor issues a command (i.e., a user submitting a form).
- The command engine will process this command, run the business validation, update its persistence stores, etc.
- Once the command has been processed, the command engine broadcasts this “event”.
- The “event” is consumed by any service that is interested. This could be a reporting engine that is displaying dashboards in real time or this could be a view model engine responsible for rendering client apps, or a caching service keeping its data up-to-date, to name a few.
- The read model can then easily handle queries, such as loading a page, with minimal work to transform data.
- Alternatively, the generated event could push the update straight to the client over a websocket connection for real-time notifications, if required by the UI.
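As a rough sketch of the flow above, here is a small in-memory version in Python. Every name in it (SubmitApplication, ApplicationSubmitted, CommandEngine, ApplicationsReadModel) is hypothetical and chosen only for illustration, and a real system would broadcast events over a message broker rather than through direct function calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubmitApplication:
    """A command: a request to change state (hypothetical example)."""
    candidate: str
    position: str


@dataclass(frozen=True)
class ApplicationSubmitted:
    """An event: a fact that has already happened."""
    candidate: str
    position: str


class CommandEngine:
    """Write side: validates commands, persists the result, broadcasts events."""

    def __init__(self):
        self._store = []        # stand-in for the write-side persistence store
        self._subscribers = []  # callables interested in events

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def handle(self, command):
        # Business validation happens before anything is persisted.
        if not command.candidate:
            raise ValueError("candidate is required")
        event = ApplicationSubmitted(command.candidate, command.position)
        self._store.append(event)
        # Broadcast the event to every interested service.
        for notify in self._subscribers:
            notify(event)


class ApplicationsReadModel:
    """Read side: keeps data already shaped the way queries need it."""

    def __init__(self):
        self._by_position = {}

    def on_event(self, event):
        self._by_position.setdefault(event.position, []).append(event.candidate)

    def candidates_for(self, position):
        return list(self._by_position.get(position, []))


engine = CommandEngine()
read_model = ApplicationsReadModel()
engine.subscribe(read_model.on_event)

engine.handle(SubmitApplication("Ada", "Engineer"))
engine.handle(SubmitApplication("Grace", "Engineer"))
print(read_model.candidates_for("Engineer"))
```

The query side never touches the command engine's store; it only answers from what it has built out of the events it received.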
At a high level this pattern is not much more complicated than that. The real idea behind CQRS is not to impose a lot of constraints, but rather just the opposite, in the sense that it breaks down a potentially large application into two smaller, independent applications. It also creates hooks, through events, that allow for decoupled extensibility.
For more details on CQRS, I would suggest reading Martin Fowler’s explanation here.
As I mentioned, CQRS is a very generalized, loosely defined way of breaking down an application. With that said, a common architectural pattern that has evolved alongside it is event sourcing. We mentioned that the Command side of the application can have its own database and should be optimized for writes. This is where the event store comes into play. In order to understand event sourcing and the event store, you have to give up your preconceptions of the traditional approach of storing state in relational databases. In the world of commands and events, data is persisted not as a “point in time” representation with columns, join keys, etc., but rather as an audit of events that, when replayed in sequence, will derive the state of the domain object at any point in time. This audit of events acts as an append-only log, such that new events are always added to the end. For the SQL DBAs out there, this means that in the simplest form, your database will only ever have one table! Alternative solutions include Kafka, Azure Event Hubs, or AWS Kinesis, to name a few.
Note: I call out these alternatives because SQL will most likely be overkill and may even require a fair bit of tuning to achieve the same scale as these event streaming technologies. These alternative options are designed specifically as append-only logs with pub/sub capabilities built right in. Also, I have very limited experience with Google’s cloud offerings, so I’m not sure what their specific offering would be, but Spotify has a great engineering blog discussing their migration to Google. The article can be found here.
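To make the “one table” idea concrete, here is a minimal sketch of an append-only event store using Python’s built-in sqlite3 module. The table name, column names, and helper functions are assumptions made for illustration, not a prescribed schema.

```python
import json
import sqlite3

# Hypothetical single-table event store; all names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        sequence   INTEGER PRIMARY KEY AUTOINCREMENT,  -- global append order
        stream_id  TEXT NOT NULL,                      -- e.g. one restaurant tab
        event_type TEXT NOT NULL,
        payload    TEXT NOT NULL                       -- event data as JSON
    )
    """
)


def append_event(stream_id, event_type, payload):
    """Append a new event. There is deliberately no update or delete."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT INTO events (stream_id, event_type, payload) VALUES (?, ?, ?)",
            (stream_id, event_type, json.dumps(payload)),
        )
    return cursor.lastrowid


def read_stream(stream_id):
    """Return one stream's events in the order they were appended."""
    rows = conn.execute(
        "SELECT event_type, payload FROM events "
        "WHERE stream_id = ? ORDER BY sequence",
        (stream_id,),
    )
    return [(event_type, json.loads(payload)) for event_type, payload in rows]


append_event("tab-1", "TabOpened", {"table": 12})
append_event("tab-1", "ItemAddedToTab", {"item": "Lemonade", "price": "3.50"})
append_event("tab-2", "TabOpened", {"table": 4})

history = read_stream("tab-1")
print(history)
```

The only write path is an INSERT, which is what makes the table behave like a log rather than a set of records updated in place.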
Comparing the above image to what we initially defined for a more general CQRS system, you will notice the separation of data stores for the read side from the write side. We are also now calling the write-side database an “event store”, while the read database can be any representation of the data it requires, or even many different projections of the same data across several data stores (relational database, reporting data cube, NoSQL database, cache tables, etc.). This one-to-many relationship of the event store to read stores is where the event sourcing approach becomes a powerful tool. You now have a single source of truth, represented in a very flat structure, that can be transformed and reshaped into many other data stores in an efficient and scalable way. I do want to mention that this shift will create a lot of redundant data. This is okay! Storage is cheap. And remember that no matter how much data is replicated by all these read stores, the event store holds the single source of state. That means if any one consumer becomes stale or corrupted, you can rebuild its read model from scratch by simply replaying the events!
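The one-to-many relationship between the event store and its read stores can be sketched as follows. The event names follow the restaurant example used later in this article; the payload fields and the two projection functions are hypothetical, and a plain Python list stands in for the event store.

```python
# A plain list stands in for the event store: (event_type, data) in append order.
EVENT_LOG = [
    ("TabOpened", {"tab": "tab-1", "table": 12}),
    ("ItemAddedToTab", {"tab": "tab-1", "item": "Lemonade"}),
    ("TabOpened", {"tab": "tab-2", "table": 4}),
    ("ItemAddedToTab", {"tab": "tab-2", "item": "Burger"}),
    ("ItemAddedToTab", {"tab": "tab-1", "item": "Burger"}),
    ("TabClosed", {"tab": "tab-2"}),
]


def project_open_tables(events):
    """Read model 1: which tables are occupied right now."""
    table_by_tab = {}
    for event_type, data in events:
        if event_type == "TabOpened":
            table_by_tab[data["tab"]] = data["table"]
        elif event_type == "TabClosed":
            table_by_tab.pop(data["tab"], None)
    return sorted(table_by_tab.values())


def project_item_sales(events):
    """Read model 2: how many of each item have been ordered."""
    counts = {}
    for event_type, data in events:
        if event_type == "ItemAddedToTab":
            counts[data["item"]] = counts.get(data["item"], 0) + 1
    return counts


# Two differently shaped read models derived from the same log.
open_tables = project_open_tables(EVENT_LOG)
item_sales = project_item_sales(EVENT_LOG)

# If a read model is lost or corrupted, replaying the log rebuilds it.
rebuilt_item_sales = project_item_sales(EVENT_LOG)
print(open_tables, item_sales, rebuilt_item_sales == item_sales)
```

Neither projection is the source of truth, so either can be thrown away and recomputed at any time.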
Let’s expand on all this theory with a simple restaurant example and how a system could be modeled to handle a point of sale application:
- Patron enters the restaurant and is seated. Triggers the Tab Opened event.
- Patron orders drinks. Triggers the Item Added to Tab event.
- Patron orders dinner. This also triggers the Item Added to Tab event.
- Patron pays the check. Triggers the Tab Closed event.
If at any point I needed to run a report on the tab or print out the check, I would just replay the events, starting with the first one and working down. The traditional CRUD approach would be to insert a “tab” record into one or many tables to store the relational data, and subsequently perform updates in place as it changes. There are two subtleties I’d like to point out that may have been lost in my oversimplification of this scenario:
Point number one…
Looking at the Item Added to Tab event, you may ask how you know what item was added or what the price of the item is. Keep in mind, every event can contain metadata in its payload. The idea is that you don’t clutter up the event with data that is not relevant or could be stored in another event. So in this case the Item Added to Tab event could look like the class below. Once serialized, this would exist inside the event as json and can be deserialized on the consuming end.
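A minimal sketch of what such an event class might look like, written here in Python. The field names (tab_id, menu_item, price) and the JSON helpers are assumptions for illustration only.

```python
import json
from dataclasses import asdict, dataclass
from decimal import Decimal


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ItemAddedToTab:
    tab_id: str
    menu_item: str
    price: Decimal  # price of this single item only, not the tab total

    def to_json(self):
        data = asdict(self)
        data["price"] = str(self.price)  # Decimal is not JSON serializable
        return json.dumps(data)

    @classmethod
    def from_json(cls, raw):
        data = json.loads(raw)
        return cls(data["tab_id"], data["menu_item"], Decimal(data["price"]))


event = ItemAddedToTab("tab-1", "Lemonade", Decimal("3.50"))
serialized = event.to_json()                     # what is stored in the event
restored = ItemAddedToTab.from_json(serialized)  # what a consumer rebuilds
print(serialized)
```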
Notice that I don’t keep track of the total dollar amount of the tab, only the price of the single item that was added. Hopefully you can now see that if I replay these Item Added To Tab events, I can calculate the total amount.
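Here is a small sketch of that replay in Python. The event classes and their fields are hypothetical stand-ins for the events in the restaurant example, and the prices are made up.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class TabOpened:
    tab_id: str


@dataclass(frozen=True)
class ItemAddedToTab:
    tab_id: str
    menu_item: str
    price: Decimal  # price of the single item, never a running total


@dataclass(frozen=True)
class TabClosed:
    tab_id: str


def tab_total(events):
    """Derive the total by replaying the events; it is never stored."""
    return sum(
        (event.price for event in events if isinstance(event, ItemAddedToTab)),
        Decimal("0"),
    )


events = [
    TabOpened("tab-1"),
    ItemAddedToTab("tab-1", "Lemonade", Decimal("3.50")),
    ItemAddedToTab("tab-1", "Steak", Decimal("20.00")),
    TabClosed("tab-1"),
]

total = tab_total(events)
print(total)  # 23.50
```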
Note that the above code is intentionally structured in such a way that the ItemAddedToTab class is immutable. In event sourcing, it’s an important concept to understand that by nature an event cannot change its “shape” after it was created. In other words, an event cannot be partially changed and any necessity to “change” the data would require a new event.
Point number two…
I briefly mentioned earlier how other services can listen to events. Thinking through a few scenarios in the restaurant example, imagine a ticket needs to be printed in the kitchen whenever the Item Added To Tab event takes place. This in turn would allow the cooks to start preparing the menu item. You could take this one step further and differentiate a food menu item from a drink menu item, such that the bar service could follow this same pattern but only for drink items. Let’s say the front of house staff service needs to keep track of open tables: it can listen (subscribe) to the Tab Opened event to mark that table as occupied. Then, when the tab is closed (Tab Closed), it can mark that table as available again.
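These subscriptions could be sketched with a tiny in-process publish/subscribe bus. The EventBus class, the category field, and the service functions below are all hypothetical; in practice the bus would be one of the streaming technologies mentioned earlier.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TabOpened:
    table: int


@dataclass(frozen=True)
class ItemAddedToTab:
    table: int
    menu_item: str
    category: str  # "food" or "drink" (an assumed field)


@dataclass(frozen=True)
class TabClosed:
    table: int


class EventBus:
    """Minimal pub/sub: handlers register per event type."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event):
        for handler in self._handlers[type(event)]:
            handler(event)


kitchen_tickets = []     # stands in for the kitchen printer
bar_tickets = []         # stands in for the bar printer
occupied_tables = set()  # front of house view


def kitchen_service(event):
    if event.category == "food":
        kitchen_tickets.append(event.menu_item)


def bar_service(event):
    if event.category == "drink":
        bar_tickets.append(event.menu_item)


bus = EventBus()
bus.subscribe(ItemAddedToTab, kitchen_service)
bus.subscribe(ItemAddedToTab, bar_service)
bus.subscribe(TabOpened, lambda event: occupied_tables.add(event.table))
bus.subscribe(TabClosed, lambda event: occupied_tables.discard(event.table))

bus.publish(TabOpened(12))
bus.publish(ItemAddedToTab(12, "Lemonade", "drink"))
bus.publish(ItemAddedToTab(12, "Burger", "food"))
tables_during_service = set(occupied_tables)
bus.publish(TabClosed(12))
print(kitchen_tickets, bar_tickets, tables_during_service, occupied_tables)
```

The point of sale code that publishes the events knows nothing about the kitchen, the bar, or the front of house; each service is added by subscribing.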
I lied, I have a third point…
You may (or may not) be asking…what happens when the wait staff accidentally keys in the wrong menu item? If you remember from my first point, I mentioned the idea that events are immutable and permanent and can only be changed by other events. So the short and simple answer to your question is to create a new event, call it Item Removed From Tab, and use this event to counteract and correct the Item Added To Tab. Not only does this provide a way to adjust the tab, you also have the added benefit of audit logging to track when the wait staff is removing items off a tab.
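A short sketch of that correction in Python. The event classes, the reason field, and the prices are assumptions for illustration; the important part is that the mistaken event stays in the log and a second event counteracts it.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ItemAddedToTab:
    menu_item: str
    price: Decimal


@dataclass(frozen=True)
class ItemRemovedFromTab:
    menu_item: str
    price: Decimal
    reason: str  # assumed field, useful for the audit trail


def tab_total(events):
    """Replay the tab: additions increase the total, removals decrease it."""
    total = Decimal("0")
    for event in events:
        if isinstance(event, ItemAddedToTab):
            total += event.price
        elif isinstance(event, ItemRemovedFromTab):
            total -= event.price
    return total


events = [
    ItemAddedToTab("Steak", Decimal("20.00")),
    ItemAddedToTab("Lobster", Decimal("35.00")),  # keyed in by mistake
    ItemRemovedFromTab("Lobster", Decimal("35.00"), "wrong item keyed in"),
    ItemAddedToTab("Salmon", Decimal("18.00")),
]

total = tab_total(events)
# The mistake and its correction both remain visible for auditing.
removals = [event for event in events if isinstance(event, ItemRemovedFromTab)]
print(total, len(removals))  # 38.00 1
```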
The product manager in me sees this as an opportunity to better the product through usage reports. If I see lots of these Item Removed From Tab events, I have to start asking why so many items are being incorrectly rung up. Is the UI/UX causing this, and can I optimize it to prevent these accidental errors? On the flip side, management may use this to keep an eye on the transactions from an accountability perspective. Either way, this inherent traceability opens up a ton of capabilities.
Let that sink in
We covered a lot, and I would recommend looking into both CQRS and Event Sourcing in more detail, as I have only scratched the surface. In the next article, I will go into a formal design exercise of how we could use these patterns to implement the aforementioned talent management tool. Before I conclude, I do want to mention that CQRS and Event Sourcing are part of a much bigger conversation on the topic of Domain Driven Design, and a lot of the time the benefits aren’t tangible. Rather, when we start to think in terms of event driven design and domain driven design, we start to gain a much deeper understanding of our own business process. It’s almost like a form of self-reflection that allows the design of our apps to focus more on the business problem we are solving rather than adhering to the limitations of a single technology. At the end of the day, we as technologists need to deliver value…not technology.