Upgrading the Internet Computer Protocol

DFINITY
The Internet Computer Review
10 min read · Dec 27, 2021

An overview of how protocol upgrades to the Internet Computer’s layers are triggered and executed.

By Stefan Kaestle, Senior Researcher (Performance) | DFINITY

The Internet Computer Protocol combines the compute capacity of special node machines running in data centers around the world to create a seamless and scalable environment for smart contract software and data. Because the platform can serve interactive web content to users, it can be used to build websites, enterprise systems, mass-market internet services, pan-industry platforms, DeFi, and much more using smart contracts.

At first glance, upgrades might seem like a simple problem. However, experience with traditional blockchains has shown that realizing a protocol upgrade can take anywhere from months to years. First, the community has to agree that a new version of the protocol should be adopted. Next, the point in time at which the upgrade should be applied has to be agreed upon. Finally, all client software needs to be upgraded to support the new protocol version.

Because this is such a difficult undertaking, protocol upgrades have traditionally often been realized as a fork of the chain. The Internet Computer blockchain aims to mitigate this by:

  • Providing an integrated governance system in the Network Nervous System (“NNS”), which can be used by the community to vote on whether a protocol upgrade should be executed or not.

Once the governance system triggers an upgrade:

  • The Internet Computer Protocol has built-in support for scheduling upgrades, meaning that each subnetwork’s consensus layer autonomously decides when an upgrade should be executed based on agreement between subnetwork participants.
  • The Internet Computer’s nodes have built-in support for downloading and applying upgrades without human intervention, if instructed to do so by the governance system. An upgrade package contains the entire software stack needed to run a node. After verifying the package content corresponds to the version the community voted to run, nodes then reboot into the new version of the protocol autonomously.

In fact, several major upgrades have already been applied to the Internet Computer since launch with very little user-perceived downtime per subnet. A complete list of all subnetwork upgrades so far can be displayed on ic.rocks using filter “NNS Function” set to “Update Subnet Replica Version.”

There are several challenges when designing and implementing a solution to protocol upgrades. Because the Internet Computer blockchain runs a distributed and decentralized protocol, it is difficult to determine the current state of the system. Further, there is no global time, as nodes might experience clock drift and might be arbitrarily far behind other nodes.

This article describes our solution to these problems and how we have been able to achieve the following goals:

  • Allow arbitrary changes to the Internet Computer Protocol.
  • Preserve all state.
  • Minimize downtime.
  • Roll out upgrades autonomously.

Before diving into the technical details, let’s discuss how upgrades are triggered.

Triggering upgrades

In the Internet Computer, there is a component called the registry, which is implemented as a canister smart contract in the Network Nervous System subnet (“NNS”). Essentially, it stores all configuration information for the Internet Computer. It is implemented as a versioned key-value store, where each mutation shows up as a new version in the registry.
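To make the idea of a versioned key-value store concrete, here is a minimal Python sketch. It is not the registry canister's actual interface: the class `VersionedRegistry`, its methods, and the key `subnet_a/replica_version` are hypothetical names chosen for illustration. It only shows that every mutation yields a new version and that older versions stay readable.

```python
from typing import Any, List, Optional, Tuple


class VersionedRegistry:
    """Toy versioned key-value store: every mutation creates a new version."""

    def __init__(self) -> None:
        # Append-only log of (version, key, value). Version 0 means "empty".
        self._log: List[Tuple[int, str, Any]] = []
        self.latest_version = 0

    def set(self, key: str, value: Any) -> int:
        """Record a mutation and return the registry version it created."""
        self.latest_version += 1
        self._log.append((self.latest_version, key, value))
        return self.latest_version

    def get(self, key: str, version: Optional[int] = None) -> Any:
        """Return the value of `key` as of `version` (default: latest)."""
        if version is None:
            version = self.latest_version
        result = None
        for entry_version, entry_key, value in self._log:
            if entry_version > version:
                break  # the log is ordered, so later entries are too new
            if entry_key == key:
                result = value
        return result


registry = VersionedRegistry()
key = "subnet_a/replica_version"  # hypothetical key name
first = registry.set(key, "v1")
second = registry.set(key, "v2")
```

Reading with `registry.get(key, version=first)` still returns the old value, which is the property that lets nodes agree on "the configuration as of registry version r".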

The network is upgraded on a per-subnet basis, and each subnet has a record in the registry that indicates which nodes constitute the subnet and what protocol version they should be running. In order to trigger an upgrade, the network simply changes the version information for the subnet that it wants to upgrade. Whether or not a subnet should be upgraded to a certain version is decided by votes in the governance system rather than by individuals.

Note that the registry contains the desired configuration, and the NNS subnetwork does not actually know which version is running in a subnet.
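The difference between the desired and the running version can be sketched as follows. This is an illustrative model only: `SubnetRecord`, `trigger_upgrade`, and the node and subnet names are assumptions made for this example, not the registry's real schema.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class SubnetRecord:
    members: Tuple[str, ...]  # nodes that constitute the subnet
    replica_version: str      # version the subnet *should* be running


def trigger_upgrade(
    records: Dict[str, SubnetRecord], subnet_id: str, new_version: str
) -> Dict[str, SubnetRecord]:
    """Return a new set of records with the desired version changed."""
    updated = dict(records)
    updated[subnet_id] = replace(records[subnet_id], replica_version=new_version)
    return updated


records = {
    "subnet_a": SubnetRecord(("node_1", "node_2", "node_3", "node_4"), "v1")
}
after_vote = trigger_upgrade(records, "subnet_a", "v2")

# What nodes actually run is tracked outside the registry in this sketch:
# changing the record does not by itself change any node.
running = {node: "v1" for node in records["subnet_a"].members}
```

After the vote, the record asks for v2 while every node still runs v1. Closing that gap is the job of the upgrade mechanism described in the next section.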

Executing upgrades

The Internet Computer Protocol is structured into four layers:

  • Peer-to-peer (networking) layer
  • Consensus layer
  • Message routing layer
  • Execution layer

Internet Computer protocol upgrades are responsible for upgrading these four layers. Canister upgrades use a different mechanism, which is described in our developer guides.

The Internet Computer executes state machine replication in each subnetwork. The idea of state machine replication is that the state is guaranteed to be identical on each honest node if each state transition is triggered by a sequence of ordered inputs and the transition function is deterministic.

In the Internet Computer, the inputs to the state machine are update calls sent by users as well as inter-canister messages from other subnets, and the state includes, among other things, the state of the canisters hosted on the Internet Computer. The order of inputs is guaranteed by consensus, and the state machine being executed consists of the message routing layer, the execution layer, and the canister code.

In order to build the state at height h, we have to take the state at the previous height (h-1) and apply the input messages from block h to that state. As previously seen, the input in these blocks was agreed upon by consensus. For state machine replication, the code we are executing needs to be deterministic when processing a block at height h.
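The transition from height h-1 to height h can be illustrated with a small Python sketch. The state layout (a counter per canister) and the message format are simplifying assumptions for this example and do not reflect how the Internet Computer represents state. The point is only that a deterministic transition function applied to the same ordered blocks yields the same state on every node.

```python
from typing import Dict, List, Tuple

State = Dict[str, int]
Block = List[Tuple[str, int]]  # ordered (canister_id, amount) messages


def apply_block(state: State, block: Block) -> State:
    """Deterministically compute state h from state h-1 and block h."""
    new_state = dict(state)  # never mutate the previous state in place
    for canister_id, amount in block:
        new_state[canister_id] = new_state.get(canister_id, 0) + amount
    return new_state


def replay(genesis: State, blocks: List[Block]) -> State:
    """Apply all blocks in the order agreed upon by consensus."""
    state = genesis
    for block in blocks:
        state = apply_block(state, block)
    return state


blocks = [[("a", 1)], [("b", 2), ("a", 3)]]
node_1_state = replay({}, blocks)
node_2_state = replay({}, blocks)
```

Both nodes end up with the same state because they apply the same inputs in the same order with the same function. If an upgrade changed `apply_block` on only some of the nodes at a given height, this guarantee would no longer hold.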

Upgrades can change the state machine, including the execution and message routing layers. Hence, the network needs to make sure that upgrades are running at the same time everywhere, relative to the block height.

Upgrades can also change details in the consensus layer, e.g. how notarization happens. Hence, upgrades must be executed at the same block height, as otherwise different and potentially conflicting notarization schemes might be used to notarize blocks of the same height.

Finally, upgrades can also entail protocol-breaking changes to networking details, which might make it impossible for nodes of different versions to communicate. In order to reach consensus, all honest nodes need to be able to communicate with each other, which again implies that upgrades need to be executed at the same block height on all node machines.

Note that nodes are not going to arrive at a block height h at the same physical time, because the Internet Computer is a distributed system that does not have a global time.

In theory, some nodes might even be really far behind, e.g. several weeks, which means that for some period of time, the nodes in a subnetwork are going to run different Internet Computer versions, and the network needs to be able to cope with that.

The overall process so far looks as follows: There is Subnet A, which is running Internet Computer version 1 (“v1”). Then an upgrade to v2 is triggered at registry version r. Nodes in Subnet A will eventually agree to adopt registry version r, and with it the new Internet Computer version, at a certain block height h.

Nodes running v1 will create blocks and compute state up to and including that height h, and nodes running v2 take over at height h+1.

Between states h and h+1, state needs to be handed over between the two versions, i.e., there needs to be a snapshot of the state at height h.

Note that since these two versions are clearly separated, we can imagine that the network even runs a completely different consensus algorithm starting from height h+1, because there is never direct communication between the v2 protocol and the v1 protocol.
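The handover rule described above fits in a few lines of Python. The function name `version_for_height` and its parameters are hypothetical and only illustrate the boundary condition: v1 is responsible up to and including height h, and v2 for everything after it.

```python
def version_for_height(
    height: int, upgrade_height: int, old: str = "v1", new: str = "v2"
) -> str:
    """Return the version responsible for blocks and state at `height`.

    The old version covers heights up to and including the upgrade height;
    the new version takes over at upgrade_height + 1.
    """
    return old if height <= upgrade_height else new


h = 30  # example upgrade height
responsible = {height: version_for_height(height, h) for height in (29, 30, 31)}
```

Because each height has exactly one responsible version, the two versions never need to exchange consensus messages directly.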

In the Internet Computer, there is already a concept that we can use to do exactly that: a catchup package (“CUP”). A CUP contains all relevant information required for consensus to resume from it. Moreover, the CUP for height h refers to a registry version, which in turn indicates which Internet Computer version is to be run at heights greater than h.

CUPs are signed by the subnet, so their integrity can be verified by means of the subnetwork’s threshold signature. One requirement for CUPs in the context of upgrades is that they have to be readable by both the old version (v1) and the new version (v2), so CUPs need to be backward and forward compatible.

One challenge is that the network needs to make sure that each node in the subnet runs the correct Internet Computer version. All honest nodes must participate in version v1 until the handover CUP is created, and then join as a v2 node and start producing blocks as v2. If some of the honest nodes run an incorrect version of the Internet Computer, the entire subnet could get stuck.

In order for a node to decide which version it should run, it first queries the registry to find out which subnet it should join. With that information, it also finds out who the peers in that subnetwork are. As we will see later, this information doesn’t have to be perfectly up to date. Then the protocol asks all peers what their latest CUP is. Finally, the highest of all the CUPs that have been received can be used to determine the Internet Computer version that is running in that subnetwork at the moment.
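The decision procedure in the paragraph above can be sketched in Python. The `Cup` type, the `version_to_run` function, and the mapping from registry version to Internet Computer version are assumptions made for this example. The sketch also assumes that every CUP passed in has already had its signature verified.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Cup:
    height: int
    registry_version: int


def version_to_run(
    cups: Iterable[Optional[Cup]], replica_version_at: Dict[int, str]
) -> Optional[str]:
    """Pick the highest CUP received and look up the version it implies.

    `cups` holds one entry per queried peer; None means the peer did not
    answer. `replica_version_at` maps a registry version to the Internet
    Computer version recorded for the subnet at that registry version.
    """
    received = [cup for cup in cups if cup is not None]
    if not received:
        return None  # nothing learned yet; the node has to retry
    highest = max(received, key=lambda cup: cup.height)
    return replica_version_at[highest.registry_version]


answers = [Cup(height=10, registry_version=5), None, Cup(height=30, registry_version=7)]
chosen = version_to_run(answers, {5: "v1", 7: "v2"})
```

Here the CUP at height 30 wins, so the node concludes that it should run v2 even though one peer still reports an older CUP and another did not answer.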

Since the IC is a decentralized system and building CUPs is a collective effort by multiple nodes in the subnet, it is not known which of the nodes participated in creating the most up-to-date CUP. In order to tolerate f Byzantine nodes, 2f+1 nodes must sign a CUP for it to be valid. Consequently, if fewer than f+1 of the nodes are queried for CUPs, there is no guarantee that one of them has the most recent version of the CUP.

In the worst case, the CUPs a node processes are outdated and an upgrade is not immediately detected. CUPs supported by fewer than 2f+1 nodes are always ignored, as in this case the CUP’s signature cannot be verified successfully.

As discussed earlier, nodes in a subnetwork might be running different Internet Computer versions. To avoid incompatibilities between versions, CUPs are fetched from peers over a separate communication channel using an endpoint dedicated to the exchange of CUPs. The peer-to-peer layer is not used for this purpose. That allows the logic for exchanging CUPs to be kept relatively simple, and makes it easier to keep it backward and forward compatible.
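The 2f+1 rule can be illustrated with a counting sketch in Python. Two things here are assumptions rather than statements from this article: that a subnet of n nodes tolerates f = (n-1)/3 (rounded down) Byzantine nodes, and that support can be modeled by counting distinct signers. In the real protocol, the check is a cryptographic verification of the subnetwork's threshold signature, not a head count.

```python
from typing import Iterable, Set


def max_faulty(n: int) -> int:
    """Largest f with n >= 3f + 1 (assumed fault model for this sketch)."""
    return (n - 1) // 3


def has_enough_support(signers: Iterable[str], members: Set[str]) -> bool:
    """Accept a CUP only if at least 2f+1 distinct subnet members support it."""
    f = max_faulty(len(members))
    distinct_member_signers = set(signers) & members
    return len(distinct_member_signers) >= 2 * f + 1


members = {"n1", "n2", "n3", "n4"}  # n = 4, so f = 1 and 2f+1 = 3
accepted = has_enough_support(["n1", "n2", "n3"], members)
rejected_duplicate = has_enough_support(["n1", "n2", "n2"], members)
rejected_outsider = has_enough_support(["n1", "n2", "x9"], members)
```

Duplicate signers and signers outside the subnet do not count toward the threshold, so only the first CUP in this example is accepted.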

Consensus Details

Now that we have seen how the CUP is used to decide which version of the Internet Computer should be running, let’s have a look at how that CUP is actually created.

Consider a series of blocks at heights 27 to 30, and a series of states computed so far (in this case, 27 and 28). In block 30, consensus chooses to use a new registry version r, which triggers an upgrade. Consensus now knows that it has to build a CUP at that height referring to the new registry version r, but it cannot do so yet because it has not yet computed the state at height 30.

Before the state at height 30 can be computed, block 30 needs to be finalized. In order to reach finalization on that block, the protocol produces blocks without ingress messages until there is a finalization for a block at a height greater than or equal to 30. Those blocks don’t contain ingress messages to avoid further changing the state beyond height 30.

Once a block of height >= 30 is finalized, state 30 can be computed and finally certified. With that, the system has all of the information necessary to build a CUP for that height.
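The two rules above, empty blocks after the upgrade height and a CUP only once the state at that height is certified, can be sketched in Python. The function names and the representation of a block payload as a list of strings are hypothetical and chosen for illustration only.

```python
from typing import List, Optional


def make_payload(
    height: int, upgrade_height: Optional[int], pending_ingress: List[str]
) -> List[str]:
    """Return the ingress payload for a new block at `height`.

    Once an upgrade has been scheduled, blocks above the upgrade height
    stay empty so that the state does not change any further.
    """
    if upgrade_height is not None and height > upgrade_height:
        return []
    return list(pending_ingress)


def can_build_cup(
    finalized_height: int, certified_state_height: int, upgrade_height: int
) -> bool:
    """A CUP for the upgrade height needs a finalized block at or above that
    height and a certified state for that height."""
    return (
        finalized_height >= upgrade_height
        and certified_state_height >= upgrade_height
    )


upgrade_at = 30
payload_30 = make_payload(30, upgrade_at, ["msg"])  # still allowed to carry ingress
payload_31 = make_payload(31, upgrade_at, ["msg"])  # must be empty
ready_early = can_build_cup(finalized_height=29, certified_state_height=28, upgrade_height=upgrade_at)
ready_later = can_build_cup(finalized_height=31, certified_state_height=30, upgrade_height=upgrade_at)
```

In this sketch, block 30 may still carry messages, block 31 may not, and the CUP only becomes buildable once finalization has reached height 30 or beyond and state 30 has been certified.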

Honest nodes retain their state during upgrades. The new IC version may choose to convert that state as a post-upgrade step. Otherwise, the state from the previous version can be used directly by nodes after the upgrade is complete.

Between the creation of block 30 and the execution of the upgrade, the user experience is affected. While the CUP for height 30 is being created, query calls continue to be available, but no further update calls can be accepted, since nodes running v1 are no longer allowed to create non-empty blocks after height 30.

After the upgrade CUP has been signed, the subnetwork also becomes unavailable for query calls for a short duration, as the upgrade needs to be installed and the VMs rebooted to apply it.

Overall, the downtime during upgrades of subnetworks is on the order of a few minutes (currently around two minutes).

This CUP can then be used as a handover point between the two versions.

We also need to make sure that artifacts from v1 will not spill over to v2, as otherwise we could end up with multiple non-empty blocks for the same block height, which would be incorrect and possibly lead to a fork in the chain. We address this by annotating each consensus artifact with the protocol version number it was produced with.

Note that there can be multiple blocks of the same height here, because v1 needs to produce further blocks beyond the upgrade height in order to reach finalization. However, those v1 blocks are empty, so they don’t further modify the state.
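Annotating artifacts with a version can be sketched as follows. The article only states that each consensus artifact carries the protocol version it was produced with. Dropping artifacts from other versions on receipt, as done here, is an assumption about how that annotation might be used, and `Artifact` and `accept` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Artifact:
    height: int
    protocol_version: str  # version of the node that produced the artifact
    payload: str


def accept(artifacts: Iterable[Artifact], running_version: str) -> List[Artifact]:
    """Keep only artifacts produced by the version this node is running."""
    return [a for a in artifacts if a.protocol_version == running_version]


incoming = [
    Artifact(height=31, protocol_version="v1", payload=""),         # empty v1 block
    Artifact(height=31, protocol_version="v2", payload="ingress"),  # first v2 block
]
kept_by_v2_node = accept(incoming, "v2")
```

A v2 node in this sketch never sees the empty v1 block at height 31, so the two blocks of the same height cannot conflict.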

Conclusion

We have presented a solution to Internet Computer protocol upgrades that reuses mechanisms already present in the Internet Computer. It allows the network to roll out patches in rapid succession, even including protocol changes, all with little user-perceived downtime.

____

Start building at smartcontracts.org and join our developer community at forum.dfinity.org.
