Build a Virtual Power Plant With Akka

Using Akka and the Actor Model to create a distributed power plant

Published in The Startup · 12 min read · Jul 19, 2020

Around the world, renewable energy use is on the rise, as alternative energy sources could hold the key to combating climate change. However, the unpredictable nature of renewable energy generation brings new technological challenges. Unlike old-style centralized fossil fuel generation, in which supply can be turned up and down according to demand, renewable energy generation can’t be forecast precisely. Because both generation and consumption are unpredictable, times of high supply do not always align with times of high demand. To address this problem, lithium-ion batteries are being used, such as the mega-battery built by Tesla in South Australia. When power consumption is low and generation is high, batteries store the surplus power for later use.

Tesla’s engineers have expanded this solution by introducing the concept of a Virtual Power Plant. In addition to giant centralized batteries, Tesla can aggregate the power produced by many individual households to form a distributed power grid. With rooftop solar systems coupled with Powerwall residential storage batteries installed at each house, a cloud-based network can keep track of the batteries’ status, charge or discharge them, and run real-time analysis on the data, making it possible to optimize power distribution and even to influence the local power market.

In this post, we will outline the implementation of a simple distributed Virtual Power Plant by using Scala, Akka, and the Actor Model, as we explain the basic ideas behind the Actor Model.

The Platform

We wish to implement a cloud-based service that allows us to connect power-generating units (such as houses) to the network, and to query, at any given moment, the power levels of all houses connected to a given geographical region.

Since a power resource can be a house, but it can also be a physical power plant such as a solar farm or a windmill farm, we use the generalized term Resource to describe any power-generating unit.

In this design, a controller sends an HTTP request to connect a specific resource to a specific region in the system. The platform then receives a continuous stream of telemetry data over a dedicated WebSocket, which reports the current battery level of that resource. The platform also supports queries about the battery levels of all connected Resources in a given Region.

The Actor Model

The system is expected to receive and process streams of real-time data produced simultaneously by many different power resources. Handling those computations in a single thread of execution would result in an unusably slow application, memory problems, crashes, and timeouts. Supporting the concurrent nature of such a platform requires multithreading.

Locking is hard

With multithreading, it is hard to guarantee true encapsulation of shared resources. As threads can be interleaved in an arbitrary and non-deterministic way, it is easy to introduce race conditions and to reach a corrupted state. The traditional way to eliminate those risks is to coordinate the threads, usually with locking.

While locking ensures the validity of the execution, it comes at a heavy price: locks dramatically limit parallelism, as threads often have to wait for one another. Furthermore, it is easy to introduce deadlocks and livelocks, as coordinating a pool of threads is inherently complex. Locks also complicate horizontal scalability: when coordination spans multiple machines, the locks themselves must be distributed. Unfortunately, distributed locks are several orders of magnitude less efficient than local locks and usually impose a hard limit on scaling out. Distributed lock protocols require several communication round-trips over the network across multiple machines, which dramatically increases latency.

Writing concurrent systems using traditional low-level mechanisms such as locks and threads is difficult. The resulting software is often hard to understand and maintain.

Enter the Actor Model

The Actor Model is well suited to thinking about highly concurrent and scalable systems, and it provides a higher-level alternative that is much easier to reason about. In the Actor Model, we consider our system to be composed of Actors: computational primitives that hold private state, send and receive messages, and perform computations based on those messages. Actors communicate only via messages and never share memory.

“The Actor Model represents the intent of OOP in a purer way. The way classical OOP is implemented in many languages simply doesn’t take into account parallelism and introducing multi-threading “breaks” that paradigm. If nothing else I believe actors are simply a better implementation of objects.” source

The main benefit of using the Actor Model over classical Object-Oriented Programming is that it is a better way to model concurrency and distribution. Actors have no shared state, which removes the race conditions that arise from shared mutable data. Thus, designing and implementing concurrent systems becomes simpler. Actors also provide location transparency, in the sense that the user of an Actor does not need to know whether that Actor lives in the same process or on a different machine. This makes it simple to write software that can scale with the needs of the application. It also gives the runtime the freedom to provide services like Adaptive Load Balancing, Cluster Rebalancing, Replication, and Partitioning.

Implementation with Akka

Why Akka?

Akka is an open-source middleware for building highly-concurrent, distributed, and fault-tolerant systems on the JVM. Akka embraces Erlang’s model for fault-tolerance, colloquially known as “let it crash.” This fault-tolerance model works by organizing the various Actors in a system in a supervisor hierarchy such that each component is monitored by another, and restarted upon failure.

The Actors hierarchy tree

In Akka, an actor always belongs to a parent. Typically, we create an actor by calling context.actorOf(), which injects the new actor as a child into an already existing tree, as the creator actor becomes the parent of the newly created child actor.

One of the main design challenges for Akka programmers is choosing the best granularity for actors. In practice, depending on the characteristics of the interactions between actors, there are usually several valid ways to organize a system. Nevertheless, we can follow some basic rules which can help us choose the most appropriate actor hierarchy. In general, we would prefer a larger granularity of Actors. A finer granularity should be considered in the following cases:

Higher concurrency is required.
There is complex communication between Actors, and those Actors have many possible states. This communication can be represented with a separate Actor.
The state of an Actor is large and it makes sense to divide it into smaller actors.
An Actor has multiple unrelated responsibilities. Using separate actors allows individual actors to fail and to be restored with little impact on others.

Back to our system, we can use the hierarchical nature of Akka Actors to represent the relationships between power resources and their regions: each Resource is represented by a Resource Actor, which is, in turn, a child of a corresponding Region Actor.

Modeling resources as individual actors isolates the failure of one Resource Actor from the rest of the resources in its region and increases the parallelism of collecting battery level readings. Similarly, representing each region as an individual actor isolates failures that occur within a region and increases parallelism in the system, as regions can run and be queried concurrently. To simplify the implementation further and to enable more parallelism, we use a dedicated Actor to handle each query request.

Representing a Resource

Let’s start by implementing the most basic unit of the system: a Resource. Each connected Resource is represented by its own Actor. A Resource Actor should be able to collect and report the battery level of its associated power resource. Therefore, the Resource Actor’s Write Protocol should consist of RecordBatteryLevel and ReadBatteryLevel messages, where RecordBatteryLevel holds the battery level value to be recorded, and ReadBatteryLevel allows external entities (other actors) to request the last recorded battery level. Accordingly, the Resource Actor's Read Protocol should consist of BatteryLevelRecorded and RespondBatteryLevel messages, where BatteryLevelRecorded confirms to senders that their request has reached its destination, and RespondBatteryLevel holds the returned battery reading. The protocol can be expressed as follows:
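A minimal sketch of this protocol as plain Scala case classes might look like the following. The message names come from the text; the field names and types (a Long request identifier, a Double battery level wrapped in an Option) are assumptions made for illustration.

```scala
// Sketch of the Resource Actor's message protocol.
// Message names follow the article; field names and types are assumptions.
object ResourceProtocol {
  // Incoming messages (what the article calls the Write Protocol)
  final case class RecordBatteryLevel(requestId: Long, value: Double)
  final case class ReadBatteryLevel(requestId: Long)

  // Outgoing messages (what the article calls the Read Protocol)
  final case class BatteryLevelRecorded(requestId: Long)
  // value is None until a first reading has been recorded
  final case class RespondBatteryLevel(requestId: Long, value: Option[Double])
}
```

Since the messages are immutable case classes, they can be handed from one actor to another without any risk of shared mutable state.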

The Resource Actor’s properties include a regionId and a resourceId, which identify the resource and its associated region. The Actor also receives a socket endpoint through which it receives a stream of telemetry data.

The requestId argument is worth mentioning. A Query Actor is expected to send queries to multiple Resource Actors that belong to its associated region. Since the Resource Actors' response times are arbitrary, requests and responses must be correlated so that the sender can match incoming values with their associated outgoing requests.

We implement the Resource Actor as follows:

The connected WebSocket is expected to send RecordBatteryLevel messages to the Resource Actor, and, in response, the Resource Actor replies with a BatteryLevelRecorded message. Upon a ReadBatteryLevel request, the Actor returns a RespondBatteryLevel message containing the last battery reading, which is stored in the lastBatteryLevelReading variable.
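The message handling described here can be modeled without Akka as a function from the current state and an incoming message to a new state and a reply. The sketch below is such a model, not the actor itself: the Command and Reply traits and the handle function are hypothetical names, and the field types are assumptions.

```scala
// Akka-free model of the Resource Actor's message handling.
// Command, Reply and handle are hypothetical names; field types are assumptions.
object ResourceBehavior {
  sealed trait Command
  final case class RecordBatteryLevel(requestId: Long, value: Double) extends Command
  final case class ReadBatteryLevel(requestId: Long) extends Command

  sealed trait Reply
  final case class BatteryLevelRecorded(requestId: Long) extends Reply
  final case class RespondBatteryLevel(requestId: Long, value: Option[Double]) extends Reply

  // Takes the last recorded reading and a message,
  // returns the new last reading together with the reply to send back.
  def handle(lastBatteryLevelReading: Option[Double], command: Command): (Option[Double], Reply) =
    command match {
      case RecordBatteryLevel(id, value) =>
        (Some(value), BatteryLevelRecorded(id))
      case ReadBatteryLevel(id) =>
        (lastBatteryLevelReading, RespondBatteryLevel(id, lastBatteryLevelReading))
    }
}
```

In an actor, the same pattern match would typically sit in the receive method, with lastBatteryLevelReading kept as private state and each reply sent back to the sender of the message.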

Representing a Region

As mentioned, a Region Actor represents a geographical region of resources. Upon request, this actor should register a given resource by creating a corresponding Resource Actor. It should also track which Resource Actors exist in the region and remove them from the region when they are stopped. Finally, it should handle incoming query requests.

The Region Actor’s protocol includes the RequestResourcesList and RequestAllBatteryLevels messages, which are used to query the Region Actor for its list of connected resources and their last reported battery levels, respectively. The battery level value is represented by BatteryLevelReading.

An important consideration here is that each resource is represented by an actor that can stop during the lifecycle of the query, due to a disconnection or a failure of some sort, and that new Resource Actors might join the group during the lifecycle of the query. The simplest way to handle this dynamic nature of resources is to ignore any actor that joined after the query began and, if a handled actor stops during the query, to treat the failure as BatteryLevelNotAvailable or ResourceNotAvailable.

Another consideration is that Resource Actors might take a very long time to answer or be trapped in an infinite loop. To cover such a scenario, we should define a timeout for each query and a ResourceTimedOut response.
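The query messages and the possible outcomes per resource could be sketched as follows. The names RequestResourcesList, RequestAllBatteryLevels, RespondAllBatteryLevels, BatteryLevelReading, BatteryLevelNotAvailable, ResourceNotAvailable and ResourceTimedOut come from the text; ReplyResourcesList, BatteryLevel, the describe helper and all field types are assumptions added for illustration.

```scala
// Sketch of the Region Actor's query protocol.
// ReplyResourcesList, BatteryLevel, describe and all field types are assumptions.
object RegionProtocol {
  final case class RequestResourcesList(requestId: Long)
  final case class ReplyResourcesList(requestId: Long, ids: Set[String])

  final case class RequestAllBatteryLevels(requestId: Long)
  final case class RespondAllBatteryLevels(requestId: Long, levels: Map[String, BatteryLevelReading])

  // One outcome per resource in the queried region
  sealed trait BatteryLevelReading
  final case class BatteryLevel(value: Double) extends BatteryLevelReading
  case object BatteryLevelNotAvailable extends BatteryLevelReading // no reading recorded yet
  case object ResourceNotAvailable extends BatteryLevelReading     // resource stopped during the query
  case object ResourceTimedOut extends BatteryLevelReading         // no answer before the deadline

  // Because the trait is sealed, the compiler can check that every outcome is handled.
  def describe(reading: BatteryLevelReading): String = reading match {
    case BatteryLevel(value)      => s"battery at $value"
    case BatteryLevelNotAvailable => "no reading yet"
    case ResourceNotAvailable     => "resource stopped"
    case ResourceTimedOut         => "timed out"
  }
}
```

Modeling the failure cases as values rather than exceptions lets a single response carry a mixed result, with some resources reporting a level and others marked as unavailable or timed out.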

Next, we implement the Region actor:

The Region Actor reacts to RequestTrackResource messages by either returning the associated Resource Actor or registering a new resource, depending on whether the resource is already registered in this region. For this purpose, we use a simple mapping, resourceIdToActor, between Resource identifiers and their matching Actors. The createRegionQuery method is used to query the Region Actor, and the createResource method is used to create a new Resource Actor when necessary.

Adding a backoff layer

We implement the createResource helper method as follows:

Instead of creating the Resource Actor as a direct child of the Region Actor, we add an additional backoff layer. This is a common supervision pattern: the supervisor delegates tasks to subordinates and therefore must respond to their failures. When a subordinate detects a failure (i.e. throws an exception), it suspends itself and all its subordinates and sends a message to its supervisor, signaling failure. Here, we create a backoff supervisor that restarts the given Resource Actor after it has crashed due to an exception, at increasing time intervals. We also add 20% “noise” to vary the intervals slightly.
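How such increasing, slightly randomized intervals can be computed is shown in the Akka-free sketch below. It illustrates exponential backoff with jitter in general and is not Akka's own implementation: the doubling rule, the function and parameter names, and the 3-second and 30-second bounds used in the note after the code are assumptions. Only the 20% random factor comes from the text.

```scala
import scala.concurrent.duration._

// Illustrative exponential backoff with jitter (not Akka's implementation).
object Backoff {
  // restartCount: number of restarts so far (0 for the first restart)
  // randomFactor: maximum extra delay as a fraction, e.g. 0.2 for 20% noise
  // rnd:          a random number in [0, 1), passed in to keep the function deterministic
  def delay(
      restartCount: Int,
      minBackoff: FiniteDuration,
      maxBackoff: FiniteDuration,
      randomFactor: Double,
      rnd: Double
  ): FiniteDuration = {
    // Double the delay on every restart ...
    val exponential = minBackoff.toMillis.toDouble * math.pow(2.0, restartCount.toDouble)
    // ... but never beyond the configured maximum
    val capped = math.min(exponential, maxBackoff.toMillis.toDouble)
    // Add up to randomFactor * 100 percent of noise on top
    val jittered = capped * (1.0 + rnd * randomFactor)
    math.round(jittered).millis
  }
}
```

With an assumed 3-second minimum, a 30-second maximum, and no noise, successive restarts would wait 3, 6, 12, 24, and then 30 seconds. The noise spreads out restarts so that many crashed resources do not all come back at the same instant.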

Querying the Region Actor

It is time to implement the createRegionQuery call, which allows us to retrieve all available battery readings which relate to that region.

As mentioned earlier, upon an incoming query, we would like to take a snapshot of the currently registered Resource Actors and handle only them, ignoring any additional actor that joins the group during the query lifecycle. A naive approach would be to handle the query inside the Region Actor. With this approach, since new queries may arrive before the previous query has ended, we would have to manage separate snapshots of actors for the different queries. This is exactly the kind of situation in which it is better to split the Actor into more dedicated actors, where each newly created actor is responsible for handling one incoming query. That way, we dramatically simplify the implementation, reduce dependencies, and increase parallelism.

Calling createRegionQuery creates a new RegionQuery Actor, which is responsible for executing the query. To support the snapshotting approach, we pass down actorToResourceId as a snapshot of the registered Resource Actors. As before, we also pass down the requestId in order to match requests with results. Finally, we pass down a reference to the associated Region Actor, to allow communication between the Query Actor and the Region Actor, and a predefined time limit to handle any timeout scenarios.

The Query Actor

Finally, we can implement the RegionQuery Actor, which is created by a Region Actor upon an incoming query request and is responsible for collecting the battery readings of all associated actors. In line with the previous section, we define the RegionQuery Actor properties:

Next, we define the Actor. Once a Region Query Actor has been created, we wish it to send a ReadBatteryLevel message to each Resource Actor in the snapshot. This can be done using the Actor's life-cycle hook preStart, which allows us to run certain logic before the Actor starts.

The actor responds to RespondBatteryLevel messages arriving from the various Resource Actors. To handle timeouts, we use the scheduler to schedule a message that will be sent after a given delay. This Timeout message is sent by the RegionQuery Actor to itself. Therefore, we define the waitingForReplies call as follows:

Upon an incoming RespondBatteryLevel message from a Resource Actor, we add the incoming result to the results collection, as follows:

For each incoming response, we check whether all Actors have replied. If some actors haven’t responded yet, we continue to wait for the next response; otherwise, we send the results back to the Region Actor in a RespondAllBatteryLevels message. This is done using context.become and context.stop, which allow us to treat the Actor as a state machine.
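The collection logic can be sketched as a small state machine in plain Scala, without Akka. In the model below, a Left result corresponds to continuing to wait (the role context.become plays in the actor) and a Right result corresponds to replying and stopping (the role of context.stop). The Event types, the Waiting state, the BatteryLevel wrapper and the function names are hypothetical, and resources are identified by String ids instead of actor references.

```scala
// Akka-free model of the RegionQuery state machine.
// Event, Waiting, BatteryLevel, start and step are hypothetical names.
object RegionQueryModel {
  sealed trait BatteryLevelReading
  final case class BatteryLevel(value: Double) extends BatteryLevelReading
  case object BatteryLevelNotAvailable extends BatteryLevelReading
  case object ResourceNotAvailable extends BatteryLevelReading
  case object ResourceTimedOut extends BatteryLevelReading

  sealed trait Event
  final case class Replied(resourceId: String, value: Option[Double]) extends Event
  final case class Stopped(resourceId: String) extends Event
  case object Timeout extends Event

  type Results = Map[String, BatteryLevelReading]

  // State of one in-flight query: who has not answered yet, and what we have so far
  final case class Waiting(stillWaiting: Set[String], repliesSoFar: Results)

  // The query only ever looks at the snapshot taken when it was created
  def start(snapshot: Set[String]): Waiting = Waiting(snapshot, Map.empty)

  // Left = keep waiting with the new state, Right = query finished with these results
  def step(state: Waiting, event: Event): Either[Waiting, Results] = event match {
    case Replied(id, value) if state.stillWaiting(id) =>
      val reading: BatteryLevelReading = value match {
        case Some(v) => BatteryLevel(v)
        case None    => BatteryLevelNotAvailable
      }
      record(state, id, reading)
    case Stopped(id) if state.stillWaiting(id) =>
      record(state, id, ResourceNotAvailable)
    case Timeout =>
      // Everyone who has not answered by now is marked as timed out
      Right(state.stillWaiting.foldLeft(state.repliesSoFar)((acc, id) => acc + (id -> ResourceTimedOut)))
    case _ =>
      Left(state) // ignore duplicates and resources outside the snapshot
  }

  private def record(state: Waiting, id: String, reading: BatteryLevelReading): Either[Waiting, Results] = {
    val remaining = state.stillWaiting - id
    val replies = state.repliesSoFar + (id -> reading)
    if (remaining.isEmpty) Right(replies) else Left(Waiting(remaining, replies))
  }
}
```

Keeping this state inside a short-lived, per-query actor means that concurrent queries never see each other's snapshots or partial results.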

One Actor to rule them all

We create the ResourceManager Actor, which receives requests to track a Resource within a specific Region:

The ResourceManager Actor can be implemented as follows:

Similarly to the Region Actor, the ResourceManager maintains a list of the various regions. Upon an incoming RequestTrackResource message, it forwards the message to the corresponding Region Actor. If the Region does not exist yet, it creates a new Region Actor.
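The get-or-create bookkeeping shared by the ResourceManager and the Region Actor can be modeled with plain immutable maps. The sketch below is Akka-free and hypothetical: in the actor version the map values would be actor references rather than sets of identifiers, and the names Regions and track are introduced here for illustration only.

```scala
// Akka-free model of the get-or-create tracking logic.
// Regions and track are hypothetical names; real entries would hold actor references.
object ResourceRegistry {
  // regionId -> ids of the resources tracked in that region
  type Regions = Map[String, Set[String]]

  // Returns the updated registry and whether a new resource entry was created
  def track(regions: Regions, regionId: String, resourceId: String): (Regions, Boolean) = {
    // An unknown region starts out empty, which models creating it on first use
    val resources = regions.getOrElse(regionId, Set.empty[String])
    if (resources.contains(resourceId)) (regions, false) // already registered: nothing to do
    else (regions.updated(regionId, resources + resourceId), true)
  }
}
```

Tracking the same resource twice leaves the registry unchanged, which mirrors the behavior of returning the existing Resource Actor instead of creating a second one.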

That’s it! We have implemented the actor tree needed to represent the relationship between Power Resources and their Regions. Let’s expose a simple REST API to allow other services to interact with the system.

Exposing a REST API

With Akka, we can easily create an HTTP server to expose a REST API. The first endpoint to implement is the POST connect-resource which allows external services to register a new resource into the system. The request should hold the identifiers of the Resource and Region, and also the WebSocket's endpoint, through which the resource's telemetry data should be received.

Upon a connect-resource request, we use the ask messaging pattern to send a RequestTrackResource message to the Resource Manager Actor:

The next endpoint to implement is the GET region-battery-levels which allows external services to query the latest battery readings of all resources connected to a given region.

Upon a region-battery-levels request, we use the ask messaging pattern to send a RequestAllBatteryLevels message to the corresponding Region Actor. As before, we use system.actorOf to retrieve the associated Region Actor:

Finally, we create the HTTP server with the above routes:

Take it further

We have described a simple way to use the Actor Model and Akka to process IoT data in real time, while exploring some of the fundamental ideas behind distributed processing. As a next step, we might want to increase parallelism by using Cluster Sharding, a core feature of Akka that distributes actors across several nodes in the cluster without our having to care about their physical location.