PoCo Series #3 — Protocol update
What is PoCo and why is it needed?
iExec is building a decentralized Cloud platform where application providers, dataset owners, owners of computing resources (workers) and users can monetize their assets using the RLC token.
While the platform can be compared to existing ones, the fully decentralized nature of iExec implies that no single agent needs to be trusted and that those agents require incentives to contribute correctly. In this context, PoCo (Proof of Contribution) is a protocol, developed by iExec, which describes the interactions between the different agents and leverages features such as staking to build the required incentives.
The first episode of this series gave an outline of the protocol. In the second episode, we took a deeper look at the consensus mechanism, discussing the use of stake and the (non-)effectiveness of different attack scenarios. In this third episode, we will describe an updated, more detailed version of the protocol along with all the transactions involved in achieving consensus.
What is PoCo’s place in the iExec platform?
The iExec platform requires two entities in order to work:
- A marketplace where agents propose their resources and where deals are made using the RLC token.
- A distributed computing infrastructure based on the middleware XtremWeb-HEP.
PoCo acts as a link between those two entities. When a deal is sealed, PoCo initiates a consensus which will validate the different contributions made by workers in the middleware. When consensus on the result of the computation is reached, PoCo triggers the relevant transaction which takes place in the marketplace.
As stated previously, different agents have different roles and different incentives. Before describing the protocol itself, let’s first list those agents:
- Workers: They are individuals or companies who own computing resources and are willing to make them available for the computation of tasks against payments in RLC. Similarly to blockchain miners, they want a simple solution that will make their computer part of a large infrastructure that will take care of the details for them.
- Worker pools: Worker pools organize workers’ contributions. They are led by a scheduler, who organises the work distribution. They can either be public and federate resources from anyone, or private and try to optimise the management of specific hardware. While not doing the actual computation, they receive a fee for the management of the infrastructure. They compete to attract workers, which they do through efficient management that guarantees the workers’ income.
- App providers: They deploy applications on the iExec platform. Those applications can be Dapps using the full potential of blockchain-based decentralized Cloud or legacy applications which could benefit from the iExec decentralized Cloud. They can make their applications available for free or ask for a fixed fee for each use of their application.
- Dataset providers: They own valuable datasets and are willing to make them available, in a secure paradigm that protects their ownership, against payments in RLC.
- Users: They are individuals or smart contracts paying for the execution of tasks, with or without specific datasets, using the computing resources of workers. They want to make sure that the results they receive are correct.
- The iExec Hub & Marketplace: This is a smart contract, deployed by iExec, without privileged access. It acts as an escrow for the different agents’ stakes and ensures the security and transparency of all transactions in the iExec ecosystem.
The decentralization and security of the iExec Hub & Marketplace, and the confidence placed in it, are ensured by blockchain technology. All other agents are considered potentially malicious. PoCo’s oversight of all transactions between the agents is designed to create strong economic incentives to behave correctly. This sets iExec apart from conventional Cloud providers by giving it the capability of organising a trusted computing platform on top of an infrastructure of untrusted agents. Not only is this trust-building process an interesting feature to have, it is essential to providing any result to blockchain users and smart contracts.
What would a “nominal” execution look like?
Building the environment
Prior to a user jumping in and requiring some computation to be performed, the environment must first be populated with the relevant smart contracts.
I. The first step is the deployment of the iExec Hub and Marketplace. This is a smart contract, on the blockchain, which is linked to the RLC smart contract. It will be the entry point for the following transactions.
Why do we need this: The iExec Hub and Marketplace are the heart of the network. They will manage the stakes and keep track of the actors’ history. This (auditable) smart contract, without any privileged user, ensures the stability and continuity of the iExec network.
II. At this point, all actors can deposit RLC on the iExec Hub. Funds deposited on the iExec Hub can be locked for staking. This is also where all rewards are deposited. Funds that are not actively staked (locked) can be withdrawn at any time.
Why do we need this: Staking is a key part of the PoCo. The iExec Hub acts as a public escrow, which ensures transparency and security over the stakes.
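To make the deposit, lock and withdraw rules of step II concrete, here is a minimal Python sketch of an escrow ledger. It is only an illustration under our own assumptions: the class HubEscrow and its method names are hypothetical and do not reflect the actual interface of the iExec Hub smart contract.

```python
class HubEscrow:
    """Toy ledger mimicking the escrow role of the Hub (hypothetical API)."""

    def __init__(self):
        self.available = {}  # account -> RLC that can be withdrawn
        self.locked = {}     # account -> RLC currently staked

    def deposit(self, account, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.available[account] = self.available.get(account, 0) + amount

    def lock(self, account, amount):
        """Stake funds: move them from 'available' to 'locked'."""
        if amount <= 0 or self.available.get(account, 0) < amount:
            raise ValueError("insufficient available funds")
        self.available[account] -= amount
        self.locked[account] = self.locked.get(account, 0) + amount

    def unlock(self, account, amount):
        """Release a stake: move funds back to 'available'."""
        if amount <= 0 or self.locked.get(account, 0) < amount:
            raise ValueError("insufficient locked funds")
        self.locked[account] -= amount
        self.available[account] += amount

    def withdraw(self, account, amount):
        """Only funds that are not staked can leave the escrow."""
        if amount <= 0 or self.available.get(account, 0) < amount:
            raise ValueError("only unlocked funds can be withdrawn")
        self.available[account] -= amount
        return amount


hub = HubEscrow()
hub.deposit("worker", 100)
hub.lock("worker", 30)       # 30 RLC are now staked
hub.withdraw("worker", 70)   # allowed: these 70 RLC were never locked
```

Keeping two balances per account is what lets anyone check, at any time, how much of an actor's funds are actually at stake.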
III. A scheduler creates a worker pool by calling the createWorkerPool method on the iExec Hub. This method will create an instance of the WorkerPool smart contract, owned by the scheduler, which is where the scheduler will be able to set the relevant parameters and manage his workers.
Why do we need this: The WorkerPool smart contract stores scheduler specific settings, which are required in order to handle the stake and payments. It also stores all information relative to tasks executed on this worker pool and is used as entry point for the workers’ transactions.
IV. Workers wanting to join a worker pool can call the subscribeToPool method on the worker pool’s smart contract. This will verify that the worker meets the requirements set by the scheduler (minimum stake and reputation).
Why do we need this: In order to avoid Sybil attacks, we have to make sure no attacker can flood a worker pool with malicious identities. This is prevented by requiring the workers to have a minimum stake and/or reputation. At this point some stake will be locked. This stake cannot be seized by anyone, and the worker can unlock it at any time (by unsubscribing). Even if the worker is evicted by the scheduler (presumably because of bad behaviour), its stake will be unlocked.
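The admission check of step IV can be sketched in a few lines of Python. This is a simplified model, not the WorkerPool contract: the class and field names are hypothetical, and we assume here that both the minimum stake and the minimum reputation must be met.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    address: str
    stake: int       # RLC available for staking (hypothetical unit)
    reputation: int  # score tracked by the Hub (hypothetical scale)


@dataclass
class WorkerPool:
    min_stake: int
    min_reputation: int
    members: dict = field(default_factory=dict)  # address -> locked stake

    def subscribe(self, worker):
        """Admit the worker only if it meets the scheduler's requirements."""
        if worker.stake < self.min_stake:
            return False
        if worker.reputation < self.min_reputation:
            return False
        worker.stake -= self.min_stake            # lock the subscription stake
        self.members[worker.address] = self.min_stake
        return True

    def release(self, worker):
        """Unsubscription or eviction: the locked stake is always returned."""
        worker.stake += self.members.pop(worker.address, 0)


pool = WorkerPool(min_stake=10, min_reputation=5)
alice = Worker("0xA11CE", stake=50, reputation=8)
pool.subscribe(alice)  # admitted, 10 RLC locked
pool.release(alice)    # stake returned in full
```

Because every identity must lock a stake (or have earned reputation) before joining, creating many fake identities becomes expensive rather than free.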
V. Application providers register their application by calling the createApp method on the iExec Hub. This method will create an instance of the App smart contract, owned by the application provider, which contains some parameters of the application. Among those parameters, the application cost cannot be modified. This cost is a fixed amount which will be awarded to the application owner for each execution of the application.
Why do we need this: The App smart contract stores information, such as usage cost and the owner’s address, which are required in order to handle the stake and payments.
VI. Similarly, dataset providers register their dataset by calling the createDataset method on the iExec Hub. This method will create an instance of the Dataset smart contract, owned by the dataset provider, which contains some parameters of the dataset. Among those parameters, the dataset cost cannot be modified. This cost is a fixed amount which will be awarded to the dataset owner for each use of the dataset.
Why do we need this: The Dataset smart contract stores information, such as usage cost and the owner’s address, which are required in order to handle the stake and payments.
Making a deal in the marketplace
1. When a worker pool has enough workers ready to run some computation, it will create market orders in the marketplace using the method createMarketOrder. This market order describes the category of resources being offered as well as the trust level used by PoCo to certify the results (which has an influence on replication and therefore on cost). The market order will be valid for a specified number of user requests (called work orders).
Why do we need this: We want the schedulers to ensure a minimum income so that miners and other hardware owners can become resource providers on the iExec platform knowing they will make a profit. We also want competition between the worker pools so that users can get the best price possible.
2. A user with a request (work order) will select, in the order book, the offer he finds most relevant. This is a trade-off between the resources he requires and the price he is willing to pay. By calling the method buyForWorkOrder, a user can answer a specific scheduler’s ask and provide the details of the required execution. This function both closes a market deal, which is an agreement between a user and a worker pool, and instantiates the work order with the app, dataset and parameters that the worker pool will now have to run.
Why do we need this: A deal is a commitment from both sides (scheduler and user). Stakes are locked by the iExec Hub and an instance recording the advancement of the work is created to ensure traceability.
At this point, PoCo locks the stakes and initializes a consensus instance, specific to this work order.
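The following Python sketch illustrates the two marketplace steps: a market order that is valid for a limited number of work orders, and a purchase that turns one slot of that order into a work order. All class, field and function names are illustrative assumptions, not the signatures of createMarketOrder or buyForWorkOrder, and stake locking is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MarketOrder:
    pool: str        # worker pool making the offer
    category: int    # category of resources offered
    trust: int       # trust level PoCo will certify
    price: int       # RLC asked per work order (assumed field)
    remaining: int   # work orders this offer is still valid for


@dataclass
class WorkOrder:
    user: str
    pool: str
    app: str
    dataset: Optional[str]
    params: str
    price: int


def best_offer(book, category, trust):
    """Cheapest open offer matching the user's needs, or None."""
    matching = [o for o in book
                if o.category == category and o.trust >= trust and o.remaining > 0]
    return min(matching, key=lambda o: o.price, default=None)


def buy_for_work_order(order, user, app, dataset, params):
    """Close a deal on one slot of the market order and create the work order."""
    if order.remaining <= 0:
        raise ValueError("market order is exhausted")
    order.remaining -= 1
    return WorkOrder(user, order.pool, app, dataset, params, order.price)


book = [MarketOrder("poolA", category=1, trust=90, price=12, remaining=2),
        MarketOrder("poolB", category=1, trust=95, price=10, remaining=1)]
offer = best_offer(book, category=1, trust=90)
work = buy_for_work_order(offer, "user1", "app1", None, "--size 42")
```

Picking the cheapest matching offer is only one possible user strategy; a user may just as well prefer a pool with a better track record.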
Computation results and consensus
3. Now that a work order has been emitted, the worker pool is in charge of providing a valid result to the user. The first step toward having a valid result is the designation of workers. The scheduler will have to call the allowWorkersToContribute method with the reference of the work order and a list of workers authorized to contribute to the corresponding consensus.
(Note: when using Intel SGX challenges for extra security, this is the moment where the cryptographic challenge is initialized).
If at any point in this section the scheduler believes more contributions are required to achieve consensus, he can call the allowWorkersToContribute method again in order to summon more workers.
Why do we need this: In order to avoid attacks, we have to prevent multiple workers from coordinating and purposely submitting erroneous contributions to a work order’s consensus. As discussed in the previous article of this series, the workers participating in a consensus must be selected randomly. The scheduler is responsible for this random selection. This decision is recorded to avoid any conflict between scheduler and worker.
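As an illustration of the random designation, here is a small Python sketch of how a scheduler could draw the list of workers it then passes to allowWorkersToContribute. The function pick_workers is hypothetical; how the real scheduler samples its pool is not specified in this article.

```python
import random


def pick_workers(pool_members, already_allowed, count, rng=None):
    """Draw `count` distinct workers at random among pool members
    that are not yet allowed to contribute to this work order."""
    rng = rng or random.SystemRandom()  # OS-provided randomness by default
    candidates = sorted(set(pool_members) - set(already_allowed))
    if count > len(candidates):
        raise ValueError("not enough workers left in the pool")
    return rng.sample(candidates, count)


members = ["0xA", "0xB", "0xC", "0xD", "0xE"]
first_round = pick_workers(members, already_allowed=[], count=2)
# If consensus is not reached, summon more workers without repeating any.
second_round = pick_workers(members, already_allowed=first_round, count=2)
```

Excluding workers that were already designated means a second call only ever adds new contributors to the same consensus.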
4. Workers who have been selected by the scheduler have an on-chain proof of their involvement. This grants them access to any of the relevant datasets. After having run the application with the parameters required by the work order, they have a result archive (RA) containing the data produced (stdout, stderr, output files …). This result archive will not be pushed on the blockchain at any point but will be represented by its hash R = sha3(RA).
In order to submit its contribution to the consensus, the worker will submit a ‘signature’ of the computed result using the contribute method. This ‘signature’ consists of two values:
- H = sha3(R),
- S = sha3(R ^ sha3(WorkerAddress))
H is a representation of the result R that is identical for all workers who have computed the same RA. S is used as a signature. It prevents workers from simply publishing values of H they see being submitted by other workers. Once R is revealed (later), S makes it possible to check that the worker knew the value of R it contributed to before R was revealed. S cannot be copied from another worker, as it is bound to that worker’s identity (address) and would not match the copier’s address.
(Note: when using Intel SGX challenges for extra security, the worker will also have to provide cryptographic proofs that the values of H and S submitted here were produced by an enclaved iExec worker).
Why do we need this: We have to ensure workers have actually done the work when submitting results. This mechanism ensures that in order to make a profit, workers have to know R at the time of their submission. Therefore, it prevents them from simply copying another worker’s contribution. Since the first episode of this series, we have realised that using the hash of the worker address is just as secure and as easy to check, while avoiding the need for random seed generation and the issues that could arise from collisions.
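The two values H and S can be computed as in the Python sketch below. One important assumption: Python's hashlib.sha3_256 is used as a stand-in for sha3. On Ethereum, sha3 refers to Keccak-256, which uses a different padding, so the digests produced here will not match on-chain values. The helper names are ours.

```python
import hashlib


def sha3(data: bytes) -> bytes:
    # Stand-in hash: NIST SHA3-256, NOT the Keccak-256 used on Ethereum.
    return hashlib.sha3_256(data).digest()


def xor(a: bytes, b: bytes) -> bytes:
    # Bytewise XOR of two equal-length digests.
    return bytes(x ^ y for x, y in zip(a, b))


def contribution(result_archive: bytes, worker_address: bytes):
    """Return (H, S) as defined above, starting from the raw result archive."""
    r = sha3(result_archive)               # R = sha3(RA)
    h = sha3(r)                            # H = sha3(R)
    s = sha3(xor(r, sha3(worker_address))) # S = sha3(R ^ sha3(WorkerAddress))
    return h, s


archive = b"stdout: 42\n"
h1, s1 = contribution(archive, b"worker-1-address")
h2, s2 = contribution(archive, b"worker-2-address")
# Same archive: both workers publish the same H but different S.
```

Since H only depends on R, honest workers agree on it, while S additionally depends on the address and therefore differs for each worker.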
5. Once enough contributions have been made and consensus among them has been reached according to Sarmenta’s formula, the scheduler locks further contributions and reveals the consensus value (H) using the method revealConsensus.
Why do we need this: The scheduler is in charge of orchestrating the consensus. As Sarmenta’s formula is too complex to be executed in a smart contract, the scheduler computes it and reports the results. Again, those results are public and auditable.
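To give an idea of the off-chain computation the scheduler performs, the Python sketch below follows one common reading of Sarmenta's credibility-based voting: contributions are grouped by their value of H, and the credibility of a group is the probability that it is the correct one given each worker's individual credibility. The exact variant used by PoCo is not given in this article, and the per-worker credibilities (values strictly between 0 and 1) are hypothetical inputs, so treat this as an approximation rather than the protocol's actual formula.

```python
from math import prod


def group_credibilities(groups):
    """`groups` maps each contributed H to the list of credibilities
    (probability of being honest) of the workers who contributed it."""
    p_good = {h: prod(cs) for h, cs in groups.items()}             # all correct
    p_bad = {h: prod(1 - c for c in cs) for h, cs in groups.items()}  # all wrong
    all_bad = prod(p_bad.values())

    def only_good(h):
        # Group h is correct and every other group is wrong.
        return p_good[h] * prod(p_bad[o] for o in groups if o != h)

    total = all_bad + sum(only_good(h) for h in groups)
    return {h: only_good(h) / total for h in groups}


def consensus(groups, trust):
    """Return the H whose credibility reaches `trust`, or None if none does."""
    creds = group_credibilities(groups)
    best = max(creds, key=creds.get)
    return best if creds[best] >= trust else None


# Two workers agree on 0xaa, a less credible one disagrees with 0xbb.
groups = {"0xaa": [0.9, 0.9], "0xbb": [0.6]}
winner = consensus(groups, trust=0.95)
```

When no group reaches the trust level, the scheduler would summon more workers, as described in step 3, until one does.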
6. Workers are now summoned to provide, using the reveal method, a value of R that matches both the value of H that achieved consensus and the value of S which they provided when contributing. Failure to do so will prevent them from gaining any reward for their work.
Why do we need this: This is the second part of the two-step contribute/reveal process. This step confirms the validity of the workers’ contributions made at step 4.
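The check performed at reveal time can be sketched as follows. As before, hashlib.sha3_256 is only a stand-in for the Keccak-256 hash that sha3 denotes on Ethereum, and the function names are hypothetical rather than the contract's own.

```python
import hashlib


def sha3(data: bytes) -> bytes:
    # Stand-in hash: NIST SHA3-256, NOT the Keccak-256 used on Ethereum.
    return hashlib.sha3_256(data).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def reveal_is_valid(r, worker_address, consensus_h, committed_s):
    """A revealed R is accepted only if it matches the consensus value H
    and the S this very worker committed to when contributing."""
    matches_consensus = sha3(r) == consensus_h
    matches_commitment = sha3(xor(r, sha3(worker_address))) == committed_s
    return matches_consensus and matches_commitment


# An honest worker computed R and committed (H, S) at step 4.
r = sha3(b"result archive bytes")
honest = b"honest-worker-address"
h = sha3(r)
s = sha3(xor(r, sha3(honest)))

# A copier replayed the honest worker's (H, S) without knowing R.
copier = b"copier-address"
honest_ok = reveal_is_valid(r, honest, h, s)
copier_ok = reveal_is_valid(r, copier, h, s)  # S does not fit the copier's address
```

Even once R is public, the copier cannot pass the check, because the S it replayed was derived from someone else's address.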
7. When all contributions made to the consensus have been validated, or after a certain amount of time to avoid deadlock from non-responding workers, the scheduler closes the consensus by calling the finalizeWork method. This call unlocks all stakes and rewards or penalizes all actors depending on their role and contributions. The user now has a fully auditable proof that consensus has been achieved on the result he receives. Optionally, a smart contract specified by the user can be called to notify it of the result’s availability.
About application determinism
For consensus to be achieved, the value R submitted by different workers following a correct execution of the same work order must be identical. This is straightforward when dealing with deterministic applications, for which RA is consistent between runs. On the other hand, non-deterministic applications, for which RA can legitimately differ between runs, would cause all sorts of issues. Such applications will not be supported by iExec’s version 2 and will be addressed later. There are many approaches to solving this and making PoCo compatible with those applications. This should be further discussed in another article of this series.
How does PoCo handle non-nominal cases?
While the nominal case is straightforward, we must also design solutions for each potential mishap or malicious contribution during PoCo. While orders in the marketplace can be cancelled without penalty, once a deal has been closed between the parties and stakes have been locked there is no turning back.
- Protection against malicious workers is ensured by the allowWorkersToContribute method (which prevents arbitrary workers from joining a consensus) and by the two-step contribute/reveal process discussed earlier.
- A timer enforces a time limit for the workers to call the reveal method. This ensures fairness (a scheduler cannot finalize the consensus too fast, penalizing workers) while preventing any worker from blocking the consensus.
- If at least one of the workers who made a contribution toward the reached consensus has published the corresponding result, this result is recorded and will be locked when finalizing the work order, regardless of other workers calling the reveal method. While one could say that the other workers not proving their initial contribution was valid negatively impacts the consensus, there are two arguments for our approach:
→ The use of allowWorkersToContribute prevents coordinated attacks. Even if an attack on the consensus manages to get past allowWorkersToContribute (in the case of a 51% attack), attackers would make more profit by all revealing the same result than by letting the scheduler take a percentage of all seized stakes. Therefore, non-revealed contributions are assumed not to be a form of attack. As there is a strong economic incentive against such contributions, we can assume they are caused by unstable workers. We strongly believe that, as long as the workers responsible for those contributions are not rewarded and see their stakes seized (as for bad contributions), these contributions can remain valid in the computation of Sarmenta’s formula.
→ The result has already been published, and reopening the consensus for new contributions is not possible, as the hypothesis behind the two-step contribute/reveal process no longer holds.
- If none of the workers have revealed a valid result during the reveal period, all workers who have contributed to this consensus (and were supposed to reveal) see their contributions invalidated and their stakes seized. The scheduler can then reopen the consensus. At this point, the scheduler can either call in new workers or initiate the reveal of another result depending on Sarmenta’s analysis of the remaining valid contributions.
- When a deal was closed in the marketplace, the scheduler committed to providing a validated result in a given timeframe. If the scheduler cannot manage to achieve consensus by the end of this timeframe, the user can redeem the failed consensus. He and all the workers who have already contributed get their stakes back and the scheduler is penalized.
- If a worker believes a scheduler is not managing its pool correctly, it should leave and go work for another one, which it can do without losing the reputation it earned there. As the reputation is stored in the iExec Hub and is associated with a wallet rather than a physical machine, a worker can move around the network while keeping the reputation gained earlier.
- If a scheduler believes a worker is malicious, it can ban it from its pool. Still, banned workers keep all their rights on the consensus they were already called to contribute to.
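The rules above for the reveal phase can be summarised as a small decision function. The Python sketch below is our own simplification: the names, the use of plain numbers for time, and the priority given to each rule are assumptions, and it ignores everything that happens before the consensus value is revealed.

```python
from enum import Enum


class Outcome(Enum):
    WAIT = "wait for reveals"
    FINALIZE = "finalize the work order"
    REOPEN = "seize stakes and reopen the consensus"
    CLAIM = "user claims the failed consensus"


def next_step(now, reveal_deadline, deal_deadline, expected_reveals, valid_reveals):
    """Decide what can happen once the consensus value has been revealed."""
    if expected_reveals > 0 and valid_reveals == expected_reveals:
        return Outcome.FINALIZE   # everyone revealed: no need to wait
    if now < reveal_deadline:
        return Outcome.WAIT       # the scheduler cannot finalize too fast
    if valid_reveals > 0:
        return Outcome.FINALIZE   # one valid reveal is enough to record the result
    if now >= deal_deadline:
        return Outcome.CLAIM      # scheduler missed the agreed timeframe
    return Outcome.REOPEN         # nobody revealed: contributions invalidated


step = next_step(now=12, reveal_deadline=10, deal_deadline=50,
                 expected_reveals=3, valid_reveals=1)
```

Waiting for the reveal deadline protects slow but honest workers, while the single-reveal rule keeps unresponsive workers from blocking the result.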
Why is blockchain essential to implement PoCo?
In the first article of this series we discussed the basics of the protocol, and how to build trust in the result of a computation using public channels. A key element for this validation scheme to work is to ensure that all contributions are public and auditable. We also want all transactions to be triggered, based on the validity of the contributions, by an independent and decentralized entity.
For all these reasons, we believe the iExec Hub, which stores all information, should run on-chain and should not be controlled by any single entity. As building reputation is a strong incentive, with financial consequences for the worker, we want the workers to be confident that no one will tamper with their reputation, and that they will be able to keep it regardless of what becomes of the scheduler they used to work for. Also, we think that using the iExec Hub as an escrow is the only way to guarantee the safety of all the staking required by PoCo.
Of course, this use of blockchain transactions and smart contracts has a cost, which we are currently benchmarking and optimising. In the long run (after v2) it would make sense to move to more scalable and cheaper options such as a sidechain or a private blockchain.
PoCo and governance in the iExec platform
PoCo defines a set of communication patterns that the actors in the iExec platform are to follow. They have been built in order to achieve consensus on off-chain computation and rely on the properties of smart-contract-enabled blockchains (we need the iExecHub to have the right properties) as well as on the economic incentives of the actors (bad behaviour comes at a high cost). Still, it is important to note that PoCo also assumes that actors will be active and responsible individuals.
Some actions taken by actors in the iExec platform cannot be disputed, and no on-chain court is implemented. However, anyone has the ability to watch the relevant transactions, openly recorded in the blockchain, analyse them, and see if actors behaved badly. The social reputation of actors such as the schedulers is key to our platform. If a scheduler were to make bad decisions, users would be able to notice it and, as a community, stop using the services it offers. Applications could also publicly recommend that their users avoid such a scheduler. As the workload decreases, workers would earn less and would end up working for another worker pool with a better reputation.
Blockchain technology offers mechanisms for decentralized trust and asset security. Still, the freedom to choose which scheduler to work for, or which scheduler to buy computation from, gives a lot of power to the community. Rather than removing any sense of responsibility from the users of our platform, we put them in a situation where their individual choices really matter and contribute to shaping the world of decentralized Cloud.