Conquest of Distributed Systems (part 4) — The Now and the Vision

Serge Semenov
Nov 18, 2018


Distributed systems are essential today and will prevail in the future, and the deeper you dive into the theory, the more you realize how complex they are. As Jim discovered in parts 1 and 2, even with a wide spectrum of available cloud services, it is still challenging to build a scalable and fault-tolerant system in a reasonable amount of time.

Unfortunately, knowledge gaps inevitably lead to occasional system failures, which turn into hectic repairs and, ultimately, end-user dissatisfaction. It is hard to imagine every single software developer following Jim's path of mastering distributed systems; a truly successful team consists of experts with diverse skills. Therefore, a reasonable solution is to make complex things effortless and available to everyone, and to leave the rest to the experts' discretion.

Dreams Come True

Just as Jim depicts async methods as the Actor Model (part 3) and gives us the necessary clues, his imaginary character vanishes, and we have to return to reality. The story is not over, though!

After a couple of years of prototyping, a real framework emerges that seamlessly integrates with the .NET runtime and controls the execution of the state machines compiled from async methods, saving their state and resuming them on demand. This framework shows that the same code can run with the regular in-memory runtime, can persist its execution state to the local disk, and can also run in the cloud.

Same code, no changes at all.

Code from part 3 that can indeed run as an Actor without any changes

The project is called simply "D-ASYNC", for "Distributed Async functions". The framework works as an extension and plugs in on top of an application without any pre- or post-compilation steps, although there are some rules and restrictions.

The idea is to automatically create proxy types at runtime for your services (classes) and for external services (interfaces), so that whenever you call an async method, it does not get executed immediately but instead gets scheduled as a message. Then, with some sophisticated execution control of the underlying state machines, the framework commits all accumulated actions when a state transition finishes.
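To make this concrete, here is a hedged sketch of what such a programming model can look like. The class, interface, and method names below are illustrative, not D-ASYNC's actual API; the point is that a dependency on another service is just an interface, and a runtime-generated proxy behind that interface can turn the awaited call into a message while the compiler-generated state machine is persisted at the `await` point.

```csharp
using System;
using System.Threading.Tasks;

// A dependency on another service is expressed as a plain interface.
// A proxy type generated for this interface at runtime can schedule the
// call as a message instead of invoking an implementation directly.
public interface IPaymentService
{
    Task<bool> ChargeAsync(string customerId, decimal amount);
}

// A plain in-memory implementation, used when running without any framework.
public class InMemoryPaymentService : IPaymentService
{
    public Task<bool> ChargeAsync(string customerId, decimal amount) =>
        Task.FromResult(amount > 0);
}

// A service is just a class with async methods. The C# compiler turns
// PlaceOrderAsync into a state machine, which a framework could save
// and resume on demand at the 'await' point.
public class CheckoutService
{
    private readonly IPaymentService _payments;

    public CheckoutService(IPaymentService payments) => _payments = payments;

    public async Task PlaceOrderAsync(string customerId, decimal total)
    {
        bool charged = await _payments.ChargeAsync(customerId, total);

        if (!charged)
            throw new InvalidOperationException("Payment declined.");
    }
}
```

In-memory, the proxy can simply delegate to a local instance; in the cloud, the same awaited call becomes a durable message. That is the "same code, no changes" property described above.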

The primary purpose of this framework is to make programming language abstractions first-class citizens of distributed applications, to translate async methods and other artifacts into a platform-agnostic set of intents, and to let the platform handle message passing, persistence, and other aspects. As an end user, you can build your own platform for distributed systems, adapt one you have already developed, or, ideally, use an existing one to save the effort and focus on the core of your business.

Beyond Serverless

Today’s FaaS (Function-as-a-Service) and other serverless implementations act as a simplified host for your application, with a variety of connectors that trigger it. But in most cases these triggers are still NFRs (non-functional requirements) that don’t convey the business intent.

What if we could write code that focuses on business logic without writing code for the ‘infrastructure’ layer? It is possible. The “D-ASYNC on Azure Functions” sample demonstrates how you can deploy services with a simple code push to a source control system. The services are defined merely as classes and interfaces, and they run in a distributed, auto-scalable manner.

Runs in-memory only? It also runs in the cloud in a distributed manner. No extra NFR code.
“D-ASYNC on Azure Functions” demonstrates the vision beyond serverless

What can be easier than referencing one NuGet package and declaring your types as distributed services?

And most importantly, how much time did you spend learning how to write reliable distributed services by looking at the code above?

Why distributed workflows?

It would be naive to think that distributed systems are synonymous with cross-service workflows. However, workflows are the most common problem that developers solve over and over again. They directly correlate with the business description of a product and help simulate the perceivable causality of the world we live in.

Simple “when-then” logic can be viewed from two angles. In addition to commands that tell other actors what to do, sometimes we need to apply the Dependency Inversion Principle at the service level and emit events instead, which can have zero or more subscribers. Events help decouple services and shift the decision making from one side to the other. Such events are still part of a workflow, and programming languages like C# have built-in support for them.

Reactive Event-Driven Design works well with the Actor Model
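C#’s built-in event support is enough to express the “zero or more subscribers” side of this when-then logic. A minimal, self-contained sketch (the type names are illustrative):

```csharp
using System;

// The publisher raises a domain event instead of commanding anyone directly,
// which is the Dependency Inversion Principle applied at the service level.
public class OrderService
{
    public event EventHandler<string> OrderPlaced;

    public void PlaceOrder(string orderId) =>
        OrderPlaced?.Invoke(this, orderId);
}

public static class Demo
{
    public static void Run()
    {
        var orders = new OrderService();

        // Subscribers decide on their own how to react; the publisher
        // neither knows nor cares how many of them there are.
        orders.OrderPlaced += (sender, id) => Console.WriteLine($"Shipping {id}");
        orders.OrderPlaced += (sender, id) => Console.WriteLine($"Emailing receipt for {id}");

        orders.PlaceOrder("A-100");
    }
}
```

The decision of what happens after an order is placed has moved from the order service to its subscribers, which is exactly the decoupling the paragraph above describes.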

It is worth mentioning the consistency problem once again. Many production issues could have been prevented if we had atomic transactions for committing entities to a database and sending events/messages at the same time. Lack of consistency is one of the most common traps, especially for beginning developers. It is achievable with the Actor Model, at a cost in performance. Other solutions, like Event Sourcing, perform better, but you lose productivity due to the complexity of the implementation and the inability to see the workflow.
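Outside the actor model, one common mitigation is the transactional outbox pattern: the entity and the outgoing message are committed in the same transaction, and a separate dispatcher publishes the message afterwards. Below is a hedged in-memory sketch, where the store stands in for a real database transaction and all names are illustrative:

```csharp
using System.Collections.Generic;

public record OutboxMessage(string Type, string Payload);

// Stand-in for a database: both writes happen in one atomic commit, so the
// entity can never exist without its pending event, or vice versa.
public class InMemoryStore
{
    private readonly object _lock = new object();
    public List<string> Orders { get; } = new List<string>();
    public List<OutboxMessage> Outbox { get; } = new List<OutboxMessage>();

    public void Commit(string order, OutboxMessage message)
    {
        lock (_lock)
        {
            Orders.Add(order);
            Outbox.Add(message);
        }
    }
}

public class OrderCommandHandler
{
    private readonly InMemoryStore _store;

    public OrderCommandHandler(InMemoryStore store) => _store = store;

    public void PlaceOrder(string orderId)
    {
        // The event is not sent to a broker here; it is stored next to the
        // data. A background dispatcher would later read the outbox and
        // publish each message, retrying until the broker acknowledges it.
        _store.Commit(orderId, new OutboxMessage("OrderPlaced", orderId));
    }
}
```

This trades latency (the event is published asynchronously) for the guarantee that the database and the message stream never disagree, which is the consistency gap described above.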

More Than a Tool

If we step back for a moment and forget about C#, async methods, and the Actor Model, what we have is a set of complex problems that require complex solutions. And those problems are not going to magically disappear any time soon, even as more and more companies adopt service-oriented design as they grow. Most of them will have to follow a similar path of trial and error, with a steep learning curve. In the long term, such hurdles inflict a high cost of development, in terms of both finances and time to market.

Programming languages evolve and should reflect modern development needs. Why not create a new one specifically for distributed computing, then? The problem is adoption and ecosystem. There are too many mature libraries, frameworks, tools, bodies of knowledge, and experts around the most popular programming languages. A new language would not stand a chance, at least within the first ten years or so. Thus, it is beneficial for everyone to extend what we already have today: something that has been proven over time, something that is well known.

For these exact reasons we use LINQ (Language-INtegrated Query), for example, even though every single underlying provider has its restrictions, despite the performance overhead, and regardless of the fact that it breaks at runtime when you try to use a custom function inside a query. It is incredibly productive, easy to learn, and easy to fix.

No Company Starts with Microservices

But most successful companies end up adopting them, usually accompanied by a painful breakup of a monolith. Besides, as demonstrated in part 2, you may have to rewrite a lot of existing code for one reason only: to make new services run reliably in a distributed manner. And, unfortunately, you cannot do a big design up front, simply due to business uncertainty. Even if you make your code perfect, it will erode as business requirements change.

With Domain-Driven Design and language-integrated distributed workflows, instead of rewriting the code you can group a few entities, put them into a separate project, and create a new service. In most cases that will preserve the same contract, because your code already represents the business logic and is free from extra code needed only for infrastructure’s sake. No significant rewrites means faster software evolution. Yes, there will be challenges, but this is still much easier than traditional approaches. With the proposed language-integrated ‘infrastructure’, you would skip implementing most of the Infrastructure Layer of the Onion Architecture:

Transparent inter-service communication without a need for dedicated infrastructure code

Such a design can help software architecture evolve naturally along with the increasing complexity of a product and the growth of an organization.

The proposed approach can help you build cloud-native distributed applications from day one. No more extra overhead up front. Just write the code almost as you would for a desktop application.

Afterword

If you are wondering what happened to Jim and his team: well, he decided to keep writing unreliable RPC-style code and to deal with occasional failures, in order to deliver features on time. But with the discovery of the new approach, he wanted to migrate the system slowly, without rewriting most of the existing code. As for the real-world ‘Jim’ and his team, their story has just begun. It is up to you to conclude the rest.



Serge Semenov

‘I believe in giving every developer a superpower of creating microservices without using any framework’ — https://dasync.io 🦸‍♂️