The Actor Model & Inter-Shard Communication on ETH 2.0
There is much buzz around Ethereum 2.0 — a large re-imagining of how Ethereum works, which is especially focused on scaling. One of the primary strategies is to shard the state. Sharding comes with its own set of trade-offs, and leaves several open questions around how smart contracts will communicate across shards.
While we have the hood up, this is perhaps an opportunity to revamp Ethereum’s programming model into something clearer, one that better reflects people’s intuitions about state ownership and code autonomy. Conceptually, the actor model is both a solution to cross-shard communication and a good fit for Ethereum’s system of smart contracts.
In general, the actor model appears to be a nice fit for smart contract programming, helps clarify security implications of collaborating smart contract calls, and may even help with concurrency on ETH 2.0 at multiple levels of the stack.
An interested group met online to discuss some of these ideas. Below is roughly what I presented there, with a few additions from the ensuing discussion.
A Note on Terminology
A lot of people dislike the term “smart contract”, not just because it’s long to type, but also because it conjures misleading assumptions about their trade-offs. To be fair, there is some history behind the term, but that’s a level of nuance that the casual blockchain user won’t generally dig into.
I propose calling them simply “actors,” in line with the actor model described here. This is something that I already tend to do in my head. As we will see, there are many parallels between the two.
A Mini-Primer on The Actor Model
The actor model is closer to Alan Kay’s original description of object-oriented programming than what we actually ended up with in C++, Java, and Ruby.
Actors are not totally unlike objects, but you do need to think about them differently. Like objects, actors may or may not hold internal state, and communicate with message passing. On the other hand, they execute totally independently and concurrently, and live at an explicit address. In order to communicate, actors need to know the exact address of the recipient, and an appropriate message to send them.
Most actor systems are pull-based: messages are cached in a queue (“mailbox” 📬) until the actor decides to handle them. This mailbox flips control from the caller to the recipient: it’s not a function invocation on an object; it’s truly a “message” (i.e. an event that the actor can react to, if it decides to).
What Actors Do
- Pull messages from their mailbox queues
- Decide what to do with each message (including ignoring it)
- Update their internal state (if any)
- Send messages to other actors at addresses they know about
- Await responses from others (typically with a timeout)
- Create new actors
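The capabilities above can be sketched in a few lines of plain Python. This is a minimal illustration, not tied to any particular actor library; the message shapes and the `receive` handler are hypothetical:

```python
import queue
import threading

class Actor:
    """A minimal actor: a mailbox, private state, and a handler loop."""

    def __init__(self):
        self.mailbox = queue.Queue()   # pull-based message queue ("mailbox")
        self.state = {}                # internal state only this actor touches

    def send(self, message):
        """Other actors deliver messages here: delivery, not invocation."""
        self.mailbox.put(message)

    def run(self):
        while True:
            msg = self.mailbox.get()   # pull the next message when ready
            if msg is None:            # poison pill to stop the loop
                break
            self.receive(msg)

    def receive(self, msg):
        """Decide what to do with a message (including ignoring it)."""
        kind, payload = msg
        if kind == "set":
            key, value = payload
            self.state[key] = value    # update internal state
        # unknown message kinds are simply ignored

# Spawn an actor on its own thread of control and message it.
actor = Actor()
t = threading.Thread(target=actor.run)
t.start()
actor.send(("set", ("count", 1)))
actor.send(None)                       # shut down
t.join()
print(actor.state)                     # {'count': 1}
```

Note that the caller only ever enqueues; whether and when the message is handled is entirely up to the recipient.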
An Example Flow
In the above diagram, we start with the actor “👨🔬”, who has some state (📈) that only 👨🔬 can interact with directly. 👨🔬 also has a mailbox (📨) filled with queued messages (✉️).
- 👨🔬 needs some information from 👩🚀 to continue his analysis
- 👨🔬 knows 👩🚀’s address, and sends her a message (🤔 ✉️) asking for data
- 👨🔬 continues with his work (by default doesn’t wait)
- 🤔 ✉️ lands in 👩🚀’s mailbox (📨)
- 👩🚀 is busy collecting spacewalk data
- …some time passes… 🕐 🕑 🕒
- 👩🚀 is now ready to handle a message, and goes to her 📨 and retrieves 🤔 ✉️
- 👩🚀 gathers the needed data from her state (📊)
- 👩🚀 sends some raw data (📚✉️) to a data cleaning service (👩💼)
- 👩💼 goes through the same 📨 process, and gets to 📚 ✉️
- 👩💼 has no state, and only cleans up data
- 👩💼 sends the cleaned and formatted data (✨ ✉️) to 👨🔬
- 👨🔬 eventually handles ✨ ✉️, and updates 📈
Note that the arrows in this case go in one direction only: they’re not function calls with a return value; they’re simply messages propagating through the system. An actor may need to wait for a message to be sent back, but that can be handled with a flag in state (possibly with some helper function) and a timeout in case the other actor takes too long (it’s explicit that this is a concurrent system that no actor actually controls). Ideally an actor doesn’t care where the response comes from, only that it gets a sensible response.
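The wait-with-a-timeout pattern can be sketched like this. The reply channel and the `responder` function are hypothetical stand-ins; the point is that the requester bounds its wait rather than blocking forever on an actor it doesn’t control:

```python
import queue

# The requester includes its own reply channel in the message, so any
# responder can answer; the requester then waits with a timeout.
reply_box = queue.Queue()

def responder(msg):
    # The responder may be slow, unfriendly, or never answer at all.
    sender, question = msg
    sender.put(("answer", question.upper()))

request = (reply_box, "spacewalk data")
responder(request)                      # in a real system, this is asynchronous

try:
    reply = reply_box.get(timeout=2.0)  # wait, but not forever
except queue.Empty:
    reply = ("timeout", None)           # fall back if the actor is too slow

print(reply)  # ('answer', 'SPACEWALK DATA')
```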
Similarities to Today’s Smart Contracts
- Need to know the address to send message to
- Can generate new smart contracts / actors
- Can decide what to do with a message
Differences from Today’s Smart Contracts
- Smart contracts are fully sequential per-transaction
- Actors are concurrent-by-default
- Smart contracts have no control over whether they’re invoked (by default)
- Actors imply independent agents with no direct control of each other
Example Actor-Based Systems
A Primer on Ethereum 2.0 Sharding
Ethereum 2.0 is still being specified — so many details are likely to change — and much of the information on current plans is fragmented across several resources with uneven levels of freshness.
There’s a bit more nuance than this, but a key characteristic of Ethereum 1.0 is that all events are totally ordered: events and EVM operations always occur one after another, in some well-defined sequence. This has the advantage of being an essentially very simple, single-threaded world computer. The downside is that single-threaded machines are inefficient for many tasks: most Ethereum applications do not depend on each other, and could conceptually run in parallel. Today we sequence them for security reasons: the deeper a transaction is buried in confirmations, the more likely it is to stay in the chain.
most Ethereum applications do not depend on each other, and could conceptually run in parallel
Another reason to shard state is so that every node doesn’t have to hold all of the ever-increasing state. This is not an uncommon thing to do with traditional databases. There are also a few other proposals to handle state size, chief among them being the state fee proposal. Some of these changes (like state fees) will likely change much of how we build smart contracts today; for instance, tokens and NFTs may get broken up into smaller components literally controlled by their holders.
Sharding breaks the state up into parts that can live on different machines across a network. The idea is that shards without dependencies in a transaction could execute in parallel, and each node would only have to maintain a smaller portion of our ever-growing state. In the initial stages of ETH 2.0, there is a fixed number of shards (1024), giving a stable topology and buying time for more scaling research (if the network is growing exponentially, 1024 shards only buys us so much time).
AFAICT there is still a globally-defined total order of events (Ethereum will remain a chain, not a DAG). The finalization process still sorts the events into a single order. You can still get determinism without a single ordering (for a restricted set of cases), but there’s really no harm in flattening the call graph after the fact.
It should be noted that we do have some limited forms of concurrency today, with transactions executing in true parallel and getting ordered after the fact. What we lose is guarantee of event occurrence: your completed transaction may or may not have actually occurred on the correct history. I’ve heard it likened to quantum mechanics, so calling it “quantum concurrency” seems apt 😛 In a sense, sharding maintains this style of concurrency, but with different trade-offs.
Fitness of the Actor Model for ETH 2.0
There are two layers at which the actor model may be viable: the shard layer and the application layer.
It would be nice to have a single approach for both on-shard and inter-shard communication.
Actors can be used to model both sequential and parallel processes, and could even potentially be used on ETH 1.0. The model implies very safe assumptions about how the system works: you’re not invoking a function on a friendly object for the answer, you’re sending a request to an independent actor who may or may not be friendly. Further, most applications today assume that they will exist within a single transaction lifetime, but this is often not the case. Removing as many of our implicit assumptions about latency and event ordering as possible is an application security win.
you’re not invoking a function on a friendly object for the answer, you’re sending a request to an independent actor who may or may not be friendly
If we take sharding to an extreme end, each smart contract would live on its own shard. The actor model scales directly with this type of growth: it’s a programming model designed to scale horizontally. Plus, the explicit concurrency / runtime event indeterminacy assumed everywhere is a nice safe assumption.
What could this look like?
Communicating between shards can be handled many different ways. Since one of the goals of sharding is more parallel execution over different machines, we are automatically back in traditional distributed systems land. Waiting on a synchronous call from one shard to another seems wasteful, especially since each shard will likely be able to handle its own messages. It seems likely that notifications of new transaction roots and call arguments would be passed asynchronously and placed into the appropriate order by the receiving shard, with any network-wide ordering coordinated separately or handled by some ordering algorithm.
It seems very natural for shards to adopt the actor model for inter-shard communication: treat each shard as an actor, with an EVM or eWasm engine as the handler function inside to do the internal state transitions (they’re all just functions, after all). This doesn’t address the many questions of network coordination, but it is a nice implementation detail that, if made explicit, leads to very reasonable assumptions about how event propagation can happen across the network.
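A toy sketch of the shard-as-actor idea, where `evm_execute` is a hypothetical stand-in for whatever EVM/eWasm engine actually runs inside the shard:

```python
from collections import deque

def evm_execute(state, call):
    """Stand-in for an EVM/eWasm engine: a pure state-transition function."""
    contract, delta = call
    new_state = dict(state)
    new_state[contract] = new_state.get(contract, 0) + delta
    return new_state

class ShardActor:
    """Treat a whole shard as one actor: a mailbox of cross-shard
    messages, with the VM as the handler that applies transitions."""

    def __init__(self, handler):
        self.mailbox = deque()
        self.state = {}
        self.handler = handler

    def send(self, call):
        self.mailbox.append(call)   # asynchronous delivery from another shard

    def drain(self):
        while self.mailbox:         # the shard orders its own messages
            call = self.mailbox.popleft()
            self.state = self.handler(self.state, call)

shard = ShardActor(evm_execute)
shard.send(("0xToken", +10))
shard.send(("0xToken", -3))
shard.drain()
print(shard.state)  # {'0xToken': 7}
```

Network-wide ordering and consensus are deliberately out of frame here; the sketch only shows the shard boundary as a mailbox plus a handler.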
As an aside, IMHO Ethereum smart contract programmers should never need to know which shard some code is on: these are details that should be completely abstracted away. Ethereum has this beautifully abstract, enormous address space. Because math is powerful, we can treat the entire network as one continuous system, and do all routing under the hood. This approach also gives the network flexibility to re-shard without breaking anyone’s code: whether by breaking shards down into smaller chunks, moving code and state between shards, or replicating data redundantly onto multiple shards (something akin to RAID 5).
Actors aren’t a silver bullet. Unsurprisingly, the biggest issue is the conflict between the blockchain’s deterministic event ordering and distributed systems’ event locality and general drive toward as much control independence as possible (but no more). While the shard, VM, and/or a coordinator node can guarantee priority and a global sequence of events (via some deterministic selection mechanism), in general the actor model makes no claims about message ordering. Making this explicit to the programmer at the application layer is great, but is less ideal if we need a system-wide total order.
“Train and Hotel” Problem 🎟
The hotel and train problem is a simple example of needing to coordinate state between several otherwise independent contracts. These may live on different shards, which makes things more challenging (especially in a blockchain context.)
To book our trip, we need to do three things:
- Request a train ticket
- Request a hotel room
- Check that both succeeded (logical AND)
We don’t need to make any claims about ordering, other than that both 1 and 2 happen before 3.
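This partial ordering can be sketched with ordinary concurrent futures. The booking functions here are hypothetical stand-ins for calls to contracts that may live on different shards:

```python
import concurrent.futures

def book_train():
    return True   # pretend the ticket request succeeded

def book_hotel():
    return True   # pretend the room request succeeded

# Steps 1 and 2 are unordered relative to each other; only step 3
# requires that both have completed (a logical AND).
with concurrent.futures.ThreadPoolExecutor() as pool:
    train = pool.submit(book_train)
    hotel = pool.submit(book_hotel)
    trip_booked = train.result() and hotel.result()

print(trip_booked)  # True
```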
If these smart contracts revert on the same shard, it’s very easy to roll back the state on the current shard. Message passing gives us a nice way to partially order events and unlock in-transaction concurrency.
What happens when 🚋 is on another shard, but that shard was on the wrong branch of history, and thus gets rolled back after confirming the train to 🤖?
All of the involved shards must be rolled back. In general, this is not a problem: yes there’s a need for synchronization, but that’s what happens when you have a state dependency. We may emit rollback or prune events along the transaction’s shard call chain.
Some applications rely on there being an upper bound on the “latency” of cross-shard messages (e.g. the train-and-hotel example). Lacking hard guarantees, such applications would have to have inefficiently large safety margins. [emphasis in original]
The problem that may arise is synchronization cost: there’s latency communicating between shards. A collaborator shard may have already moved on to the next request, and both transactions are now on the wrong branch of history. The likelihood of this scenario increases with the number of shards and the length of time between synchronizations.
The naïve solution is locks: all of the collaborating shards are forced into synchronization, to the exclusion of other requests. Other than losing some of the performance edge that sharding gives (since the shards are now essentially a sequential system with higher latency between smart contracts), this potentially opens the network up to attacks that require locked coordination between many shards.
Replicating data between shards has issues with data going stale, and thus being on the wrong branch of history. While replication can get us something akin to horizontal scaling and (potentially) edge computing, the coordination costs and potential for being on the wrong branch also increase. Luckily, if restricted to only the revert events, this is fine, since we’re pruning that event and everything after it.
The Universal Scalability Law shows that scaling has not just diminishing returns, but a point where performance actually drops off. This happens specifically when there is a need to coordinate; truly isolated systems don’t have this drop-off.
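Gunther’s Universal Scalability Law makes this concrete: relative throughput at N nodes is N / (1 + σ(N − 1) + κN(N − 1)), where σ models contention and κ models coordination (crosstalk). The coefficients below are illustrative, not measured from any real network:

```python
# Gunther's Universal Scalability Law:
#   X(N) = N / (1 + sigma*(N - 1) + kappa*N*(N - 1))
# sigma models contention (serialization), kappa models coordination
# (crosstalk). These coefficient values are illustrative only.
def usl(n, sigma=0.05, kappa=0.002):
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

throughputs = {n: usl(n) for n in (1, 8, 32, 64, 128)}
peak = max(throughputs, key=throughputs.get)
# With any kappa > 0 there is a finite peak: past it, adding nodes
# actually *reduces* throughput, because coordination costs dominate.
print(peak, round(throughputs[peak], 1))  # 32 7.1
```

With κ = 0 (no coordination at all), throughput never turns down, which is exactly the "truly isolated systems" case above.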
The actor model itself isn’t a magic scaling solution: if you have data dependencies, you have coordination costs. Given the roadmap, I believe that it makes sense to move smart contracts to the actor model. It provides a nice way of thinking about messages between smart contracts, and a unified model for on-shard and cross-shard computation. It also nicely models network calls between shards, including queuing, buffering, and back-off. We can and should begin adding these concepts to Ethereum today, to prepare the community for more powerful features later.
Brooke is the CEO & Chief Scientist of SPADEco, a software R&D consultancy dedicated to open source. Our public projects are all under the FISSION brand, including FISSION Codes which bring standardized messaging to Ethereum smart contracts today.
We’re also running a conference in Berlin on April 16th, called RUN EVM. Join us!