About Service Lifecycles in Exonum

Exonum
13 min read · Feb 13, 2020

This week, we introduced “service lifecycles” into Exonum. This is one of the largest updates to Exonum to date and reflects the growing conversation around business logic for blockchains.

We believe that software should constantly evolve to adapt to the needs of its users and environment. Lehman and Belady pioneered this idea in their laws of software evolution: if software does not adapt and improve, it becomes progressively less satisfactory. In short, new functionality is always needed to satisfy the growing needs of the customer, bugs are found and need to be fixed, and suboptimal design and coding decisions should be periodically reviewed and updated, among other changes.

In light of this ethos, we have taken the time to review specifically how business logic is being used in Exonum and how it can be evolved securely with a service lifecycle without sacrificing performance, transparency or auditability.

A note: Business logic on blockchain is packaged into components often called “smart contracts.” We prefer to use the term “services,” which you will see throughout this article.

About Smart Contracts / Services

Smart contracts in many blockchains (for example, Ethereum) are immutable: once a smart contract is deployed on the blockchain, it cannot be modified. There are good arguments for smart contract immutability in public, permissionless blockchains. Immutability makes it easier to interact in low-trust environments. For example, a malfeasant smart contract author cannot replace smart contract code to scam its users. In the immutable smart contract model, ad-hoc upgrades can still be implemented at the individual contract level, which is often the case for the more complex and modern Ethereum DApps.

However, the immutability requirement of these contracts means they must be flawless upon production. In contrast, in permissioned blockchains, flexibility, maintainability and keeping a competitive edge are all higher priorities. The trust brought by immutability can be achieved through other means such as auditability and transparency controls — while still allowing for the participants to evolve the system.

The following table illustrates differences between permissionless and permissioned chains in terms of features described above.

The unique requirements of permissioned blockchains form the foundation of the new service lifecycle in Exonum. We wanted to allow for the evolution of Exonum services using the same conceptual framework that would be applied to an ordinary web app (for example, data migrations). At the same time, Exonum keeps the service lifecycle and interaction with services safe and secure, and all lifecycle events fully transparent and auditable.

Services and Artifacts

To understand the service lifecycle in Exonum, one first needs to understand services. Exonum allows you to develop business logic in several programming languages, which is encapsulated in the concept of runtimes. How services are defined is fully determined by the runtime; the Exonum core logic uses runtimes as a proxy for all interactions with the services. Consider two primary runtimes supported by Exonum:

  • In the Rust runtime, services are placed in crates (usually a single service per crate). A service needs to be linked with the runtime at compile time in order to be deployable. In other words, Rust services are difficult to add on the fly; for a new service to be added after a blockchain is started, node binaries must be recompiled and swapped on each blockchain node.
  • In the Java runtime, services are packaged into JARs, which can be dynamically added during a blockchain’s lifetime. The trade-off is that Java services are slower.

Because of the runtime abstraction, it is possible to support new programming environments and to define new runtime-specific abilities without changing a single line in the core framework! For example, both Rust and Java services have full-fledged REST APIs (Rust services may use the same tools to implement other HTTP-based interfaces, such as WebSockets), despite the core framework not having a single line of code dedicated to them.

To allow for service representations that may vastly differ among runtimes, Exonum splits services and artifacts. Services represent instantiated business logic, and artifacts are sources which services may be instantiated from. Using an analogy from object-oriented programming, artifacts are like classes and services are instances of these classes. It is allowed (and desirable) to reuse artifacts among blockchains and on the same blockchain. Services instantiated from the same artifact have independent data and may have different configuration influencing service behavior, but their logic is shared by design.
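The class/instance analogy can be made concrete with a small sketch. The types below (`ArtifactId`, `ServiceInstance`) and their fields are hypothetical names chosen for illustration and are not the Exonum API; the sketch only shows two services sharing one artifact while keeping separate configuration and data.

```rust
use std::collections::BTreeMap;

/// Hypothetical artifact identifier: a name plus a semantic version.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ArtifactId {
    name: String,
    version: (u64, u64, u64),
}

/// Hypothetical service instance. The logic is shared through the artifact,
/// while configuration and data belong to the instance alone.
#[derive(Debug)]
struct ServiceInstance {
    instance_name: String,
    artifact: ArtifactId,
    config: BTreeMap<String, String>,
    data: BTreeMap<String, u64>,
}

impl ServiceInstance {
    /// Creates an instance of `artifact` with its own configuration
    /// and an empty, independent data set.
    fn new(instance_name: &str, artifact: &ArtifactId, config: &[(&str, &str)]) -> Self {
        ServiceInstance {
            instance_name: instance_name.to_string(),
            artifact: artifact.clone(),
            config: config
                .iter()
                .map(|(key, value)| (key.to_string(), value.to_string()))
                .collect(),
            data: BTreeMap::new(),
        }
    }
}
```

Here, two token services could be created from the same `exonum.Token` artifact with different tickers; writing to the data of one would leave the other untouched.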

Artifacts follow semantic versioning: a version, together with the artifact name, is built into the artifact ID. This allows your blockchain to reason about artifact compatibility much more productively. In fact, services may depend on other services and can use familiar caret or tilde requirements to check dependency compatibility, allowing for statements like “My service depends on exonum.Token@^1”. (Dependencies largely rely on ad-hoc mechanisms as of now; please see the discussion of future improvements at the end of the article.)
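To illustrate what a caret requirement checks, here is a minimal sketch that follows the usual caret rules (as used by Cargo and npm) for plain `major.minor.patch` versions. The function names are hypothetical, pre-release tags are ignored, and this is not how Exonum itself resolves dependencies.

```rust
/// A semantic version as (major, minor, patch).
type Version = (u64, u64, u64);

/// Parses "1.2.3" into a version triple; returns `None` on malformed input.
fn parse_version(s: &str) -> Option<Version> {
    let mut parts = s.split('.').map(|part| part.parse::<u64>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next()??;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some((major, minor, patch))
}

/// Checks whether `candidate` satisfies the caret requirement `^base`:
/// it must not be older than `base`, and the leftmost non-zero component
/// of `base` must stay unchanged.
fn satisfies_caret(base: Version, candidate: Version) -> bool {
    if candidate < base {
        return false;
    }
    match base {
        // ^0.0.z allows only that exact patch version.
        (0, 0, _) => candidate == base,
        // ^0.y.z allows patch updates within 0.y.
        (0, minor, _) => candidate.0 == 0 && candidate.1 == minor,
        // ^x.y.z allows minor and patch updates within major version x.
        (major, _, _) => candidate.0 == major,
    }
}
```

Under these rules, a dependency on `^1.2.0` would accept artifact version 1.4.7 but reject 2.0.0 and 1.1.9.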

Basic Artifact Workflow

Exonum assumes that deploying an artifact could take a long time, for reasons that include network I/O (for example, downloading the corresponding package from the web or IPFS) and non-trivial computations (for example, compiling an artifact from sources). In the worst case, deployment may even result in a local failure (such as a compilation error).

Controlling deployment under these circumstances is a non-trivial task, and it likely doesn’t have a one-size-fits-all solution. Fortunately, the Exonum core is not responsible for deployment. Instead, following the separation of mechanism and policy design pattern, deployment and other lifecycle events are controlled by a fully customizable supervisor service. While the core implements artifact deployment, it does not decide when to initiate deployment or when to consider it (un)successful; both these policies are defined in the supervisor. Since the supervisor is just an ordinary Exonum service, its commands to the core are fully auditable and are replicated onto all nodes in the blockchain network; the consensus algorithm itself guarantees that the lifecycle events are invoked in the same order on all nodes and have the agreed outcome. Additionally, since supervisor commands are usually given as a result of transaction processing, they are guaranteed to be properly authorized.

A typical supervisor implementation (such as the reference supervisor from the Exonum monorepo) will require the artifact deployment to be authorized by the blockchain administrators. The reference supervisor has two settings in this regard: in the simple setup, the deployment may be initiated by a single admin, while in the decentralized setup, a supermajority of more than two thirds of admins is required. Once the deployment is initiated, it is performed in the background by each node in the network. Nodes can then report the local deployment results via transactions to the supervisor service; based on these reports (and possibly the deployment deadline), the supervisor makes the final decision whether to consider the artifact deployed globally.
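The two authorization settings boil down to a simple threshold check. The sketch below uses hypothetical names (`SupervisorMode`, `deployment_authorized`) and models only the rule stated above; the reference supervisor's actual types and bookkeeping may differ.

```rust
/// Hypothetical authorization modes mirroring the two setups described above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SupervisorMode {
    /// A single admin may initiate deployment.
    Simple,
    /// More than two thirds of admins must approve.
    Decentralized,
}

/// Returns `true` if `approvals` out of `admins` are enough to start deployment.
fn deployment_authorized(mode: SupervisorMode, approvals: usize, admins: usize) -> bool {
    match mode {
        SupervisorMode::Simple => approvals >= 1,
        // Strictly more than 2/3, written without division
        // to avoid rounding issues.
        SupervisorMode::Decentralized => approvals * 3 > admins * 2,
    }
}
```

With three admins, the decentralized rule requires all three approvals (two out of three is exactly two thirds, which is not enough); with four admins, three approvals suffice.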

Once the decision to deploy an artifact is relayed to the core (that is, the deployment is committed), a node will block on the deployment if it has not finished locally; the node will not process any further blocks until it has the artifact deployed. The supervisor should be configured so that such “lags” occur only rarely, since they hurt network liveness. For example, the supervisor may wait for confirmations from all validators (and possibly even auditors) and/or set a waiting window after receiving the last necessary confirmation.

The deployment workflow guarantees that the artifact is deployed for all nodes in the network by a definite blockchain height; commitment acts like a “thread join” for the deployment task. Thus, operations regarding the artifact after commitment will be treated in the same way by all nodes. One such operation is service instantiation, which we cover below; without an artifact deployment being committed, it is impossible to instantiate services from it.
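The “thread join” analogy can be sketched with plain standard-library threads. Everything here is illustrative: `start_deployment` and `commit_deployment` are hypothetical names, and a short sleep stands in for downloading or compiling an artifact.

```rust
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// A deployment running in the background (hypothetical type).
struct PendingDeployment {
    artifact: String,
    handle: JoinHandle<Result<(), String>>,
}

/// Starts deploying the artifact in a background thread and returns immediately.
fn start_deployment(artifact: &str) -> PendingDeployment {
    let name = artifact.to_string();
    let handle = thread::spawn(move || {
        // Stand-in for network I/O or compilation.
        thread::sleep(Duration::from_millis(50));
        if name.is_empty() {
            Err("empty artifact name".to_string())
        } else {
            Ok(())
        }
    });
    PendingDeployment {
        artifact: artifact.to_string(),
        handle,
    }
}

/// Called when the deployment is committed: blocks until the background
/// task has finished, then reports its outcome.
fn commit_deployment(pending: PendingDeployment) -> Result<String, String> {
    match pending.handle.join() {
        Ok(Ok(())) => Ok(pending.artifact),
        Ok(Err(err)) => Err(err),
        Err(_) => Err("deployment task panicked".to_string()),
    }
}
```

If the background task has already finished, `join` returns at once; otherwise the caller waits, which corresponds to a node that stops processing blocks until the artifact is deployed locally.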

If the decision to deploy the artifact is not made, the nodes continue to operate as usual. Presently, a node will not unload the artifact (we plan to introduce this functionality in the near future) but will “forget” about it if restarted. In contrast, the node will re-deploy all committed artifacts on restart; this is not expected to take much time, since runtimes are encouraged to cache deployments.

The diagram below illustrates a successful deployment workflow in a network with three validators and one auditing node. The deployment is initiated by a validator and then voted for by the other validators; in this example, the supervisor is configured so that the approval of all three validators is necessary to start deployment. After the last vote, all nodes (including the auditor) start deployment in the background. Once all validators have reported a successful deployment result via a transaction to the supervisor, the supervisor commits to the deployment. Since the auditor has not finished deployment by this time, the auditor node brings deployment into the foreground and blocks until it is completed.

Basic Service Workflow

Once a service artifact is deployed, it can be used to instantiate services. Like deployment, the supervisor controls instantiation (in particular, its authorization) and Exonum core executes it. In contrast to deployment, instantiation is fully synchronous — all long-running operations have been performed during previous deployment. The service artifact receives an instantiation request, which may contain service-specific configuration (such as the ticker and the decimal precision for a token service, or injection of service dependencies). If the artifact processes it normally, the service is considered instantiated. The artifact may return an error (for example, if the service is misconfigured, or if the required dependencies are not instantiated on the blockchain). In this case, the service is not considered instantiated.
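As a rough illustration of synchronous instantiation, the sketch below validates a token-service configuration and its dependencies and either returns the new service or an error. All names (`TokenConfig`, `InstantiationError`, `instantiate`) and the limit of 18 decimals are assumptions made for this example, not part of Exonum.

```rust
/// Hypothetical configuration of a token service.
#[derive(Debug, Clone, PartialEq)]
struct TokenConfig {
    ticker: String,
    decimals: u8,
}

/// Hypothetical reasons for rejecting an instantiation request.
#[derive(Debug, PartialEq)]
enum InstantiationError {
    EmptyTicker,
    TooManyDecimals(u8),
    MissingDependency(String),
}

/// A successfully instantiated service (hypothetical type).
#[derive(Debug, PartialEq)]
struct TokenService {
    config: TokenConfig,
}

/// Processes an instantiation request synchronously. `required_deps` lists
/// services this one depends on; `instantiated` lists the services that
/// already exist on the blockchain.
fn instantiate(
    config: TokenConfig,
    required_deps: &[&str],
    instantiated: &[&str],
) -> Result<TokenService, InstantiationError> {
    if config.ticker.is_empty() {
        return Err(InstantiationError::EmptyTicker);
    }
    // The upper bound is an arbitrary choice for this sketch.
    if config.decimals > 18 {
        return Err(InstantiationError::TooManyDecimals(config.decimals));
    }
    for dep in required_deps {
        if !instantiated.contains(dep) {
            return Err(InstantiationError::MissingDependency(dep.to_string()));
        }
    }
    Ok(TokenService { config })
}
```

An `Err` result corresponds to the case where the service is not considered instantiated; no long-running work happens in this step.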

A similar division of responsibilities between the core and the supervisor is used to stop an existing service instance. Once stopped, the service does not process transactions and in general does not interact with the outside world. Stopping services is not that useful on its own (except for marginal cases, for example, if a vulnerability was discovered in the service logic), but it serves as a stepping stone for the next lifecycle event — service migration.

Generally, migration in the Exonum context means updating the version of the artifact associated with an existing service instance. In the simplest case, migration just updates service logic (but not its data). Some scenarios include:

  • patching a bug in transaction processing logic
  • adding a new transaction to the service
  • removing a transaction from the service (naturally, this is a breaking change; it could be necessary, e.g., if the transaction logic is irredeemably flawed)
  • changing runtime-specific service APIs, such as HTTP endpoints for Rust or Java services

In this case, the workflow is simple:

  1. Deploy a newer service artifact.
  2. Stop the service.
  3. Assign the service to the newer artifact.
  4. Resume the service.

Since steps 2–4 do not involve long-running or asynchronous tasks, they can be performed very quickly (in consecutive blocks, which is on the order of seconds in real time). A supervisor implementation can even batch them as a single operation, needing only a single authorization.
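Steps 2–4 can be sketched as a tiny state machine, together with a batched variant that applies them as a single all-or-nothing operation. The types and method names are hypothetical, and requiring the new version to be strictly greater than the current one is an assumption of this sketch.

```rust
type Version = (u64, u64, u64);

#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Active,
    Stopped,
}

/// Hypothetical record of a service: its artifact version and status.
#[derive(Debug, Clone, PartialEq)]
struct Service {
    artifact_version: Version,
    status: Status,
}

impl Service {
    /// Step 2: stop an active service.
    fn stop(&mut self) -> Result<(), String> {
        if self.status != Status::Active {
            return Err("service is not active".to_string());
        }
        self.status = Status::Stopped;
        Ok(())
    }

    /// Step 3: assign a stopped service to a newer, already deployed artifact.
    fn assign_artifact(&mut self, new_version: Version, deployed: &[Version]) -> Result<(), String> {
        if self.status != Status::Stopped {
            return Err("service must be stopped first".to_string());
        }
        if !deployed.contains(&new_version) {
            return Err("artifact is not deployed".to_string());
        }
        if new_version <= self.artifact_version {
            return Err("artifact is not newer than the current one".to_string());
        }
        self.artifact_version = new_version;
        Ok(())
    }

    /// Step 4: resume a stopped service.
    fn resume(&mut self) -> Result<(), String> {
        if self.status != Status::Stopped {
            return Err("service is not stopped".to_string());
        }
        self.status = Status::Active;
        Ok(())
    }
}

/// Batches steps 2-4. The steps run on a copy, so a failure at any step
/// leaves the original service untouched.
fn simple_migration(service: &mut Service, new_version: Version, deployed: &[Version]) -> Result<(), String> {
    let mut updated = service.clone();
    updated.stop()?;
    updated.assign_artifact(new_version, deployed)?;
    updated.resume()?;
    *service = updated;
    Ok(())
}
```

Working on a copy and swapping it in at the end loosely mirrors how a failed transaction leaves the blockchain state unchanged.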

Data Migrations

A more complex — and more interesting — case for the service lifecycle involves data migration. Data migration is ubiquitous in real-world applications, and we think that blockchain business logic should be no different. While the use of Protobuf as the recommended serialization format may allow Exonum services to avoid data migrations in some cases, sooner or later migration becomes unavoidable — and Exonum provides a well-defined, safe interface for it.

The diagram below shows the possible states of Exonum services and transitions between them:

Like artifact deployment, data migrations are performed in the background by special pieces of business logic called migration scripts. Since the Exonum storage engine is non-relational, migration scripts are written in the same programming languages as the main service logic. A script essentially creates data collections (lists, maps, sets) for the new service version and fills them based on the existing service data. The migration is non-destructive; the new data is placed in a separate migration namespace and cannot be accessed by active services until the migration is complete. Retaining data from the old service version does not require any actions in the script. If an old collection needs to be removed, it is marked with a tombstone (the term is taken from RocksDB and other log-structured storage engines).

Unlike the simple case, it is essential that the service instance is stopped during data migration. If a service remains active, its data may change during migration, which would likely result in logically inconsistent migrated data. In fact, stopping the service in the simple case is done for uniformity with the data migration case (as we’ve discussed, the service downtime in this case may be minimal).

Exonum provides tools to ensure that migration scripts are fault-tolerant (that is, can resume if a node is stopped while the script is executing) and at the same time make progress. A migration script can dump changes to the database; to store information about script progress, scripts may use a special temporary storage namespace (a scratchpad). Scratchpads are especially powerful when used within persistent iterators — iterators over a collection in the storage, which automatically remember their current position.
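The idea behind a persistent iterator can be sketched with in-memory collections standing in for the database. The `Storage` layout, the `"position"` scratchpad key, and the example transformation (re-denominating balances by a factor of 100) are all assumptions made for illustration; Exonum's storage API is different.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for service storage during a migration (hypothetical).
struct Storage {
    /// Data of the old service version; the script only reads it.
    old_balances: Vec<(String, u32)>,
    /// Migration namespace: data for the new service version.
    migrated: BTreeMap<String, u64>,
    /// Scratchpad: progress information that survives restarts.
    scratchpad: BTreeMap<String, usize>,
}

/// Migrates at most `chunk` items, starting from the position remembered in
/// the scratchpad. Returns `true` once the whole collection is migrated.
fn migrate_chunk(storage: &mut Storage, chunk: usize) -> bool {
    let start = storage.scratchpad.get("position").copied().unwrap_or(0);
    let end = (start + chunk).min(storage.old_balances.len());
    for (owner, balance) in &storage.old_balances[start..end] {
        // Example transformation: widen the type and change the unit.
        storage.migrated.insert(owner.clone(), u64::from(*balance) * 100);
    }
    // Remember where to continue if the node stops after this chunk.
    storage.scratchpad.insert("position".to_string(), end);
    end == storage.old_balances.len()
}
```

Because the position is stored alongside the migrated data, a script interrupted after any chunk would continue from the saved position instead of starting over.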

To ensure that migrated data is the same on all nodes, Exonum utilizes automatic state aggregation. The contents of all top-level hashable collections within the migration namespace are aggregated into a single hash value, which thus commits to the entire migrated data (or at least to its important part; as with ordinary data, Exonum does not force merklization of the entire blockchain state, recognizing that there may be reasons against it, such as performance or legal considerations). If the resulting migration hashes are the same on all nodes in the network, we can confidently infer that the migrated data is the same, too.
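A toy version of state aggregation is sketched below. It hashes each collection and then folds the per-collection hashes, keyed by collection name, into one value. Note the assumptions: the standard library's `DefaultHasher` is used purely for illustration and is not cryptographically secure, whereas a real blockchain would use a cryptographic hash over Merkelized collections; the function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hashes the contents of a single collection.
/// (Illustrative only: `DefaultHasher` is not a cryptographic hash.)
fn collection_hash(entries: &BTreeMap<String, u64>) -> u64 {
    let mut hasher = DefaultHasher::new();
    entries.hash(&mut hasher);
    hasher.finish()
}

/// Aggregates the hashes of all collections in a namespace into one value.
/// `BTreeMap` iterates in key order, so the result does not depend on the
/// order in which collections or entries were created.
fn aggregate_hash(collections: &BTreeMap<String, BTreeMap<String, u64>>) -> u64 {
    let mut hasher = DefaultHasher::new();
    for (name, entries) in collections {
        name.hash(&mut hasher);
        collection_hash(entries).hash(&mut hasher);
    }
    hasher.finish()
}
```

Two nodes that produced the same migrated collections would obtain the same aggregated value, so comparing one short hash per node is enough to compare the full migration outcome.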

In summary, the data migration workflow is organized as follows:

  1. Precondition: Ensure that the service is stopped.
  2. Fetch a migration script from the newer service artifact. The artifact may return a special no-op script (this is how simple service migrations are organized). The artifact may also signal an error (for example, the initial service version is too old, so the artifact doesn’t know how to migrate from it).
  3. Start executing the migration script in the background.
  4. Once the script is finished locally, remember the aggregated migration hash.
  5. Using the supervisor, report local migration hash and determine if there is consensus among nodes as to the migration outcome. (The exact policy depends on the supervisor.)
  6. If there is consensus among nodes, commit the migration outcome. At this point, nodes that did not finish the script locally will bring it to the foreground and block until it is completed. If the script is completed with a different outcome, the node stops. (The node administrator can then roll back the migration locally so that it is retried.)
  7. Replace the old service data with the new data. At this point, the service returns to the “stopped” status. The service may be resumed, or more migration scripts may be applied to it to further update the data layout.

If nodes do not achieve consensus at step 6, or the supervisor finds another reason to consider the migration failed, the migrated data is removed, and the service returns to the “stopped” status.
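Since the exact policy depends on the supervisor, the following is only one possible decision rule, written as a sketch with hypothetical names: commit when every validator has reported the same migration hash, roll back as soon as two reports disagree, and keep waiting otherwise.

```rust
/// Possible supervisor decisions about a migration (hypothetical type).
#[derive(Debug, PartialEq)]
enum MigrationDecision {
    /// Not all validators have reported yet, and no conflict has been seen.
    Pending,
    /// All validators reported the same hash.
    Commit(u64),
    /// At least two validators reported different hashes.
    Rollback,
}

/// Decides the migration outcome from per-validator reports.
/// `None` means that the validator has not reported its local hash yet.
fn decide(reports: &[Option<u64>]) -> MigrationDecision {
    let mut agreed: Option<u64> = None;
    let mut missing = false;
    for report in reports {
        match (*report, agreed) {
            (None, _) => missing = true,
            (Some(hash), None) => agreed = Some(hash),
            (Some(hash), Some(previous)) if hash != previous => {
                // A conflict is final; no need to wait for the remaining nodes.
                return MigrationDecision::Rollback;
            }
            (Some(_), Some(_)) => {}
        }
    }
    match (agreed, missing) {
        (Some(hash), false) => MigrationDecision::Commit(hash),
        _ => MigrationDecision::Pending,
    }
}
```

Under this rule, a rollback can be decided while some validators are still executing the script, because a single disagreement already rules out consensus.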

The diagram below illustrates the workflow for one successful and one unsuccessful data migration. The supervisor in this example is configured to start a migration when authorized by a single validator (that is, no voting is required). The first migration finishes successfully, and the service data is replaced with the newer version as a result. The second migration fails because two validators report different migration outcomes, and the supervisor is configured to command a migration rollback in this case. The second migration script is still being executed on validator #3 when this happens. This script is aborted since its outcome will not be used.

The described data migration workflow is modular and flexible:

  • It is possible to perform several data migrations in a row without resuming a service. This mirrors a common workflow with migrations in relational databases / ORMs, in which migration scripts are linearly ordered and each script is applied to any service instance exactly once.
  • It is possible to perform one or more data migrations and then decide to resume a service with an intermediate version (for example, because the latest data migration has failed).

Conclusion

The Exonum service lifecycle allows you to safely evolve the business logic deployed on your Exonum blockchain. Because the lifecycle policy is implemented outside the Exonum core, the policy itself can be flexibly adjusted during blockchain evolution. Since interactions with Exonum services are transparent and auditable, the lifecycle has these qualities by design as well. Finally, the Exonum storage engine allows you to fearlessly migrate service data, ensuring its agreement among all nodes in the network. (The design of the storage engine is a good subject for another article.)

We are continuing to streamline and improve the service lifecycle as well. Here are a few improvements on our Exonum roadmap:

  • Support a more complete artifact lifecycle, such as unloading artifacts
  • Support more service states, such as a frozen state (a state in which a service cannot change its data, but can still interact with the external world via read-only interfaces, such as an HTTP API)
  • Support the notion of service dependencies
  • Provide an interface description language for services

Our team is working hard to integrate these Exonum features into Exonum Enterprise, our Blockchain-as-a-Service platform that allows any business to launch a blockchain in just 5 minutes. You can request a trial of Exonum Enterprise today on our website: https://exonum.com/enterprise

Exonum

Bitfury’s open-source framework for building private and/or permissioned blockchains.