Building a realtime server backend using the Orleans Actor system, Dotnet Core and Server-side Redux
A radically different web server architecture that turned my twenty years of experience in building websites upside down
Microsoft Orleans markets itself as “A straightforward approach to building distributed, high-scale applications in .NET”. Orleans offers an improved actor model that makes it possible to architect applications in a radically new way that is still undiscovered by most developers.
This is the third and final part of a series about an exploration of modern web application architecture, RROD (for React, Redux, Orleans and Dotnet Core). For the introduction, see here.
Microsoft Orleans Distributed Virtual Actor System
I have worked with Microsoft Orleans for about two years now. It has taken me quite some time to understand how to use it. Now that I do, I never want to go back to traditional web server architecture. I built my first “database-driven website” in 1995, and helped build many more in the years after that, but a lot of the knowledge that I built over those years I consider now deprecated. Actor systems turn traditional web server architecture completely upside down: instead of getting all the data to the webserver and executing the logic there, distributed actor systems execute the logic where the data is. An Actor system makes sure that every data element / actor object lives in only one place (one machine in the cluster), so there is no duplication of data, which simplifies things dramatically. Actor objects run in a single thread, simplifying things even further. Finally, Actors can call or notify other actor objects of changes, effectively pushing data, through a graph of actor objects, to the user.
Once you understand how an actor system works, the whole idea of gobbling data together to render a page on the webserver by executing database queries, mapping to objects, caching data, dealing with conflicts caused by data duplication, locking, and doing that over and over again for every web request, seems very cumbersome. Web servers do an awful lot of complex data manipulation that can be avoided completely when using the Actor model.
Actor or Grain? The original Microsoft Orleans developers have opted to not overload the term Actor for their implementation, as it is deliberately different from the original Actor Model definition and they probably wanted to avoid an academic turf war. Orleans uses the term “Grain” to indicate a virtual actor. I use both terms interchangeably.
Microsoft Orleans Basics
This project is about modern application architecture, based on Dotnet Core. Microsoft Orleans is a large codebase, that is used for running some big projects inside and outside Microsoft. The current version runs on the .NET 4.5 Framework. It makes heavy use of binary serialization and it uses Windows API’s to manage memory and CPU load. This makes it non-trivial to create a cross-platform Dotnet Core version. But the people working on the Orleans project are very capable and work on the Dotnet Core-compatible version 2.0 has progressed steadily. The second pre-release came out recently. I am using this preview version in this project.
Setting up a solution to use Microsoft Orleans requires a project structure consisting of a minimum of four projects:
- OrleansHost — this project buids an executable that starts the Orleans runtime (the “Silo”) hosting the Grain objects. You can start multiple hosts, they will start working together and share the load.
- Webapp — the web server application project. This project connects as a client to the OrleansHost process(es), over TCP.
- GrainInterfaces — this project contains interface definitions of all Grains, and is referenced by both OrleansHost and Webapp.
- Grains — this project contains the implementation code of all Grains and is deployed with OrleansHost.
During development, the OrleansHost and Webapp processes can simply be started and debugged together on the developer machine. In production, they will usually run separately on a cluster consisting of multiple machines, offering reliability and distributed resource management. Of course, a small application could still run on a single machine, without the reliability but still benefiting from the real-time architecture and high performance.
In the introduction I said I didn’t want to be bothered with caching anymore. However, several AspNet Core libraries need a distributed cache. In a scale-out architecture, you’d normally use the Redis cache implementation that is provided as the default. As an illustration on how one can use Orleans, here is my complete distributed cache implementation for Dotnet Core, using Orleans instead of Redis:
The GrainInterface class declares four functions needed to implement the cache (Get, Set, Clear, Refresh). The Grain implementation just wraps the cachedata in an Immutable (a performance optimization to avoid copying) and keeps a reference to it in a member variable.
The OrleansCache class then implements the IDistributedCache interface that Dotnet Core defines in Microsoft.Extensions.Caching.Distributed. It uses
GrainFactory.GrainClient.GetGrain<ICacheGrain<byte>>(key) to get a reference to a CacheGrain identified by the cache key and then stores and retrieves data as a byte array. Grains can be generic types, so we can directly use the byte array with the CacheGrain. Registering this cache service at startup by calling
services.AddSingleton<IDistributedCache, OrleansCache>() gets you a fully scalable, production ready distributed cache for Dotnet, that other services (such as Session State) will automatically use.
This kind of code is easy to understand and trivial to debug. There is no need to install and manage Redis, think about redundancy, manage an extra connection string, learn another API, learn another query language if the cache needs to be smarter. It’s all built into the technology stack.
No Big Deal?
Ok, now you know how to do CRUD style actors on Orleans. But this is almost the same as using a database and building with Entity Framework or a Micro ORM. Actors offer performance and scalability advantages, but using them this way brings nothing revolutionary.
The fun part begins when you realize that this data layer can have logic in itself, you can program it in C# (or F#) and make it talk back! When you start thinking about the patterns this enables, I hope you will see why I made some big claims in the introduction. Using stateful objects and (observable) streams in the back-end enables an architecture that is very different from the data-driven request-response based architectures we are used to.
There is a Redux.NET library on nuget, mainly aimed toward building application user interfaces. The base implementation of Redux is only about 50 lines of code. I used that code to implement Redux-style event sourcing for some of my Grain objects in Orleans (there is also the official Orleans EventSourcing package but that is currently not very useful as it lacks storage support). For the ReduxGrain I wrote some custom storage logic, using Azure Table Storage, which is fast, reliable and very inexpensive, and it’s also the default storage mechanism that Orleans uses for normal Stateful grains.
The advantages that I hope this architecture will deliver are:
- Easily build real-time user interfaces, where the user automatically receives updated data, as it happens. Users can have multiple browsers or devices open, and see the same data on every device.
- Use hardware resources efficiently, and make hosting costs scale linearly (instead of exponentially) with the number of users.
- The general benefits that come with Event Sourcing architectures in domain modelling, testing, auditing, traceability and storage.
- Getting Redux-style “time-travel” in Dotnet on the server, using edit-and-continue and the Immediate window.
I think this project mostly achieves these benefits. Some more work is needed to make it work offline (add redux-offline and sw-precache). Also, edit-and-continue in Dotnet can only handle limited code changes, meaningful changes often require an application restart.
Server-side Redux Actor
- When a button is clicked client-side, an AsyncAction is kicked off that dispatches the Action client-side (updating the view) and also posts a request to update the state at the server using a Web API call.
- The web server receives the request, gets a reference to a
CounterGrainobject (identified by the web session ID) on the Orleans Server, and calls the equivalent method there.
CounterGrainis subclassed from
ReduxGrain, which is a common base class for grains in this project. It uses an Azure Storage Table to store every Redux Action it ever got. On activation of this Grain, the stream is replayed to get the current state, after which new Actions are processed, added and stored.
- The user can refresh the browser anytime. When this happens, the page is rendered server-side, using the current state from the grain. Orleans keeps Grains in memory, once activated, for two hours by default (or until the Orleans runtime decides it needs to free up more memory). This means that reloading data on the client does not result in database hits on the server; the grain is in memory already and it always has the latest state.
- Updates after the initial page render happen client-side, using the same React based component.
- The Server can initiate updates at any time. In a more complex scenario another user might cause the updates. To simulate this, I added “Start” and “Stop” buttons that operate on a server-side timer managed by Orleans. When started, the server will increment the counter every 3 seconds.
- When a timer tick happens, an IncrementCounter Action is generated on the server. It is processed by generating and dispatching the action to the Store, and saving it to storage. On grain activation we registered a subscription on the Store that will Publish the new State to an Orleans Stream “ActionsToClient”.
- Every client keeps an open WebSocket that subscribes to server updates for its session. The corresponding SocketHandler on the web server subscribes to the Orleans Stream “ActionsToCient” with the id of the session. Actions can be published to this stream from any Orleans Grain. They get passed to the WebSocket and dispatched to the client-side Redux State, resulting in an automatic update of the view.
Here is a schematic overview of the flow:
And here is what this looks like to the user:
The MVC Controller receives the
/incrementcounter request, gets a reference to the Counter Grain for the current session and calls the corresponding method, awaiting the result.
The Redux reducer that processes the action server-side in C# is almost the same as the one in typescript working client-side: it processes the action and returns a new resulting state.
ReduxGrain base-class takes this reducer function as a constructor parameter, together with an injected storage handler of the correct state type, to implement the
Dispatch() method takes an action and updates the State, using the Reducer. The
WriteStateAsync() function is functionally equivalent to how normal Stateful grains store data in Orleans. This makes it easy to change normal Orleans grains to Event Sourcing grains. ReduxGrain implements its own version of
WriteStateAsync() that appends its internal list of unsaved actions to an Azure Storage Table using a partition-key based on the grain id and a row-key based on the timestamp.
The methods inside
Dispatch() executes the reducer which should be a pure function without side effects. Grain methods can dispatch actions and execute side-effects, such as saving to storage, publishing events to a stream and starting timers. The architecture could be taken even further by moving those methods into another type SagaGrain that would manage long processes.
In the StartCounter case, we start an Orleans Timer, after which the registered method will be triggered every 3 seconds.
If you look into the source code of the RROD project you might notice that I experimented with different ways of sending actions between the browser, the web server and the Orleans server: using WebSockets or Web API, using Orleans Streams or direct RPC, passing Command objects to a generic Process method or using separate Web API calls with typed data. All these methods work; I eventually settled on using plain Web API calls when going from Client to Server, the controller method will just forward a request by calling the corresponding interface method on the Orleans Grain. The
Data flowing back from the server to the client uses Action messages that flow through a WebSocket connection that is opened when the client starts. The flow is made fully asynchronous, using Orleans Streams. Making secondary messages asynchronous in Orleans avoids deadlock scenarios that can otherwise happen easily in Actor systems.
WebSockets and Dotnet Core
I used SignalR in the past with great success and I wanted to use it here too. However, a cross-platform AspNetCore version of SignalR is not available yet. It is currently being built by the AspNetCore team. I first tried to use a prerelease version, but could’t get this to work together with currently released versions of Dotnet Core. Then I found this WebSocket manager middleware for AspNet Core (by Radu Matei) that worked, so I went with that. There is no fallback for browsers that don’t support websockets, and it’s somewhat low-level, but it works well enough for passing simple json objects. When SignalR becomes available, it can easily replace the current code.
The resulting Counter page works as designed: Updates can be initiated from the client and from the server. Page rendering works server-side and client-side, and the page is updated automatically when the state changes. The complete action history is saved in an Azure Storage Table.
If you start at the Home page, navigate to the Counter page, increment the counter you’ll see that the Counter page is rendered client-side. View-source will show an incorrect count. Do a refresh while on the Counter page and you’ll see that the page is re-rendered server-side, and you see the correct current count in the html source.
Deleting the session cookie and refreshing the page will generate a new session, resetting the counter to its initial value of zero. Don’t worry if you start a timer and then leave the grain behind: Orleans will let the grain run for some time and then deactivates it automatically when it doesn’t receive input anymore. If you want a grain to persistently reactivate, you can use an Orleans Reminder.
The only limitation of an Orleans-based architecture that I ran into is that it provides no built-in way of searching for data, or doing interactive analysis on data. Modelling actors in a smart way can often avoid the need for an index, but sometimes you just need to search through a big pile of static data and loading all of it in memory as actor objects is not feasible. This can only be efficiently implemented with an index. I used Elastic Search or Azure Search as a addition to my Orleans projects to fill that gap, for instance to provide an faceted search and navigation feature. If you need powerful reporting then PowerBI is another good addition to Orleans.
At Microsoft Research, where Orleans is developed, they apparently ran into this limitation too, and smart people there have been working on built-in actor indexing. They tell me their code will be released sometime soon, just like support for distributed transactional actors, a limitation that I didn’t encounter myself yet. So even these limitations may go away eventually.
There are several other solutions that make reactivity and real-time processing on the server possible. Meteor, Firebase and RethinkDB are just a few well-known technologies that can bring real-time data to your application, usually over websockets. They all offer some way of subscribing to updates as they happen in the back-end and let you connect these events to business logic.
I prefer the Actor based approach because I think it’s simpler and more easily scalable. With actors, everything uses the same programming language, all the notification/subscribe logic is in the actor and you can make it do anything you want. It’s just a few API’s, there’s no fancy infrastructure and no bottlenecks where all updates must pass through.
This architecture makes it possible to start a small project with a single server running both the web front-end and Orleans back-end, even running multiple “microservices” on a single server is possible. Deploy updates in under a minute using “dotnet publish” and Yams or Docker.
Yet this architecture also delivers the full power and scalability of heavyweight distributed architectures. It is possible to scale up to a cluster with hundreds of machines, supporting hundreds of thousands of concurrent users, serving all of them updates in real-time. Fast and easy deployments remain in place, even at scale. Just run the Yams or Docker deploy command at the end of your CI pipeline.
Some new parts of this stack (such as Orleans 2.0 and AspNetCore SignalR) are still work-in-progress, but their current .NET versions are already great, the .NET Core versions will be done soon enough and they will be awesome.
Best of all, this is all open source, free and cross platform. It can run anywhere. Just clone the code and run it. This stack, I think, has a bright future.