tl;dr: Building even a simple responder, for Bitcoin or Ethereum, is hard. Some projects are trying to build this piece of critical infrastructure and it will be surprising if any of them can get it right.
At PISA Research, we are building financially accountable responders for both Bitcoin and Ethereum.
What is a responder? A third party who is responsible for watching for an on-chain event and then relaying a transaction on the user’s behalf. Other names include watchtower.
If all goes well, responders will help alleviate the user online requirement. In other words, the user can safely go offline knowing a third party will watch and respond to an on-chain event on their behalf.
At first sight, it looks like an easy thing to build. So in this blog post, we’ll cover some of the interesting insights and technical difficulties we have faced ranging from fairly rewarding a responder, managing multiple wallets and even something as simple as storing jobs.
Let’s get started.
Fairly rewarding responders?
When the wider community discusses the concept of a relayer/responder network, full of independent responders, the first reward mechanism that typically pops up is the on-chain bounty approach.
This vision has three goals:
- Trust-minimisation: The user does not blindly trust a single responder,
- Availability & reliability: Hopefully, at least 1 of N responders will be online and will respond on the user’s behalf,
- Fair reward: Responders are rewarded for actually doing their job.
If a third-party responding network can be pulled off, then it would truly be exciting: easy to commoditise, decentralised, and everyday folk can earn money using their home machines.
As you can imagine, there are a few issues with the above approach if it were implemented. To help make it clear, let’s pose some simple questions:
What if the customer hires 22 responders and none of them respond? Tough luck. There is no evidence the customer hired any responders, and there is no recourse or refund. The customer will simply lose out and no responder is on the hook. From a user experience perspective, that sucks, a lot.
What if the customer hires 22 responders and they all try to respond at the same time? Only one responder is rewarded for claiming the on-chain bounty. The experience differs significantly in both Bitcoin and Ethereum.
In Bitcoin, the only immediate use-case for a responder is to watch a replace-by-revocation Lightning channel. The great thing about the UTXO design is that the user can send each responder a different pre-signed transaction. If there is a channel dispute, then only one pre-signed transaction will be accepted into the blockchain (i.e. because they all spend the same output, which can only be spent once). All other pre-signed transactions are invalidated and dropped from the network (i.e. considered double-spend attempts).
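To illustrate why only one of the pre-signed transactions can ever be confirmed, here is a toy sketch of a UTXO set. The `UtxoSet` class, the outpoint string and the transaction dictionaries are hypothetical simplifications for illustration, not real Bitcoin data structures.

```python
class UtxoSet:
    """Toy model of unspent outputs: an outpoint can be spent only once."""

    def __init__(self, outpoints):
        self.unspent = set(outpoints)

    def try_confirm(self, tx):
        # A transaction is valid only if every input is still unspent.
        if not all(inp in self.unspent for inp in tx["inputs"]):
            return False  # treated as a double-spend attempt and dropped
        self.unspent -= set(tx["inputs"])
        return True


# Hypothetical outpoint created by a revoked commitment transaction.
revoked_output = "revoked_commitment:0"
utxos = UtxoSet([revoked_output])

# The user gave each responder a different pre-signed transaction,
# but every one of them spends the same output.
responses = [
    {"responder": name, "inputs": [revoked_output]}
    for name in ("alice", "bob", "carol")
]

# Whichever transaction is processed first wins; the rest are rejected.
results = [utxos.try_confirm(tx) for tx in responses]
```

In this sketch the losing responders pay nothing on-chain, since their transactions never confirm.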
This has a nice benefit for responders as the user pays the network fees upfront and there are no network fees for failed attempts. Of course, there is still a real financial cost for the responder. They need to pay for the bandwidth and storage of the pre-signed transactions, which, as we will see, can be significant in Bitcoin’s Lightning Network.
In Ethereum, the story is a bit different. It is not safe for a responder to accept pre-signed transactions to relay due to the use of nonces for replay protection (e.g. it is easy for the customer to mistakenly invalidate one by sending another transaction with the same nonce). Instead, the responder is responsible for wrapping the customer’s desired calldata in a transaction and delivering it to the blockchain.
There are two problems that pop up:
- Early response: No out-of-the-box method to prevent a responder sending the response before it is required.
- Pay for failure: Responder has to pay and manage the fees for a transaction.
The early response problem is bad for the user as the responder can accept the job and immediately submit it. At PISA, we want to build an accountable service, so it is crucial that the user can easily prove this bad behaviour to the smart contract (or even better, that the smart contract can simply prevent the early submission). Check out our previous blog post about this problem.
There is also a more subtle problem due to the lack of block re-org safety in Ethereum.
If we do not wait for the on-chain event to receive sufficient confirmations, then a miner can toy with the event transaction and the response transaction. What can a miner do? Just re-order them so the response is first. That just isn’t possible in Bitcoin.
What about the pay for failure problem? We think it could hinder the growth of a larger responder network. Why?
If the customer hires N responders, then 1 responder will win the on-chain bounty and the remaining N-1 responders will have to pay a network fee for the failed transaction.
This sucks, a lot. There is a financial risk for a responder as they may have to pay a network fee without getting an on-chain reward. So it is unrealistic to assume responders will perform this role without payment upfront to cover the cost. If this is the case, then the customer will have to pay a fee for every responder (22x fees).
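The arithmetic behind the pay-for-failure problem can be sketched as follows. This is a minimal illustration that assumes every hired responder broadcasts a transaction, exactly one wins the bounty, and all transactions cost the same fee; the function names and the fee value are hypothetical.

```python
def failed_fee_cost(num_responders, fee_per_tx):
    """Total fees paid by the N-1 responders whose transaction lost the race."""
    if num_responders < 1:
        raise ValueError("need at least one responder")
    return (num_responders - 1) * fee_per_tx


def upfront_cost(num_responders, fee_per_tx):
    """What the customer pays if every responder demands its fee upfront."""
    if num_responders < 1:
        raise ValueError("need at least one responder")
    return num_responders * fee_per_tx


# Hypothetical fee of 1,000,000 gwei per response transaction.
fee = 1_000_000
wasted = failed_fee_cost(22, fee)   # fees burned by the 21 losing responders
prepaid = upfront_cost(22, fee)     # the "22x fees" case from the text
```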
One way to mitigate the above issues is to enforce segmentation of responses (and in fact, it is advocated in this paper).
What is the best way to segment responders, so there is a dedicated slot for each responder to respond?
Generally speaking, every time a new job is sent to the responders, they are shuffled into a list of slots. But there are two points to consider:
- Sufficient time: Responders should have enough time to get their response recorded during their slot.
- Enforcing slots: Is it necessary to have a smart contract verify that a responder completed the job during their slot?
Let’s first consider the sufficient time point. There are some great logistical issues. Should we require responders to only respond during a fixed slot? Should we require responders to respond before a certain deadline and potentially allow collisions? How many slots do we allocate for each job? Especially if the required response time varies (e.g. respond in 1 hour or in 1 day). There is no right answer to the above; it’s mostly to do with holding responders accountable and verifying they are doing their job.
The enforcing slots aspect is a bit more interesting. The slots can be built into the smart contract such that it can verify if the responder has sent their job during an allocated slot (or before time t). But this may increase the on-going financial cost of running a responding network (i.e. opcodes need to be executed!) and also it requires additional logic in the smart contract which can be tricky to get right.
In the long term, we want to build a responding pool, where the jobs are given to K of N responders, and only one of them has to respond. So far, we are opting for responders to perform a random coin-flip to determine their slot (i.e. potential collisions) without on-chain enforcement. The motivation is to simplify the contract (no need to deal with slots) and to reduce the on-going financial cost (no need to verify slot response).
But it remains an open research question on the best way to approach this problem, especially if all the responders have put down a stake (skin in the game) and they can be penalised for failing to provide a quality of service.
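A minimal sketch of the coin-flip approach is shown below. It assumes each responder independently picks a slot uniformly at random with no coordination and no on-chain check, so two responders may land in the same slot. The function names and the number of slots are hypothetical, not part of the PISA design.

```python
import random
from collections import Counter


def pick_slots(responders, num_slots, rng):
    """Each responder independently picks a slot at random (no coordination)."""
    return {responder: rng.randrange(num_slots) for responder in responders}


def collisions(assignment):
    """Return the slots chosen by more than one responder, with their counts."""
    counts = Counter(assignment.values())
    return {slot: n for slot, n in counts.items() if n > 1}


# K = 3 responders flip for one of 4 slots; the seed only makes the run repeatable.
rng = random.Random(7)
assignment = pick_slots(["r1", "r2", "r3"], num_slots=4, rng=rng)
clashes = collisions(assignment)
```

With few slots relative to responders, collisions are likely, which is the trade-off accepted in exchange for a simpler contract.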
What if a miner is part of the responder network?
As we mentioned earlier, the vision is to build a global responder network, where folks in their homes can participate and stack wei. But if it is an open and free-for-all network, then miners can single-handedly front-run an entire responding network.
In a way, they are the best positioned to perform the role as they can guarantee a quality of service to paying customers.
However this brings up ideological issues…
Should we provide miners with an additional function given they already wield an unreasonable amount of power to order transactions?
To be honest, I don’t know the right answer to that.
Typically in cryptography, it is best practice not to overload the usage of a key (e.g. signing key, encryption key, etc). I feel that principle should be applied here, certainly to uphold censorship-resistance of the network and discourage out-of-band fees.
Managing wallets for multiple responders
A wallet is just a list of keys and each key may (or may not) have a balance on the network. If a responder is responsible for sending transactions to the network (and paying the transaction fee), then there are three key points:
- Topping up: The signing key used to broadcast transactions should have a sufficient balance on the network to pay the transaction fee,
- Multiple jobs: A responder may have to send two or more jobs using a single signing key. It is crucial that an earlier transaction cannot block a later transaction getting into the blockchain,
- Bumping fees: It is common for pending transactions to linger due to bad fee estimation and sometimes they need a little bump.
In Bitcoin, 1) and 2) are not really issues. The responder only deals with pre-signed transactions (hopefully within an encrypted blob) and their only job is to broadcast the pre-signed transaction. Since it is pre-signed and the fee is fixed upfront, there is no need for any wallet management. As well, all transactions are independent of each other as each spends a unique output. So there is no issue with handling concurrent transactions (although, as far as I know, Bitcoin Core only allows up to 25 chained transactions, this is not an issue at all for us).
While this simplifies the responder in Bitcoin, it also implies that a quality of service cannot be guaranteed, so 3) is a problem. If the pre-signed transaction’s fee is not adequate, then it cannot get into the blockchain. The only way around this issue is to have the customer pre-sign several transactions with an increasing fee, or to have the customer allocate an output that lets the responder perform child-pays-for-parent (but this is awkward to deal with as we need to write logic that can verify we own one of the outputs and then subsequently spend it).
In Ethereum, the responder must pay the transaction fee. There is a pretty easy solution for 1) to handle topping up wallets (albeit with a slight gas increase). Our PISA contract can be topped up with coins to automatically refund the responder the transaction fee after each response. That way, we do not necessarily have to keep track of balances in every wallet. However, due to problems with the EVM, it is pretty tricky to compensate the responder/caller the exact amount of gas used. Why? It is not straightforward to verify the gas consumed by the transaction’s payload size (i.e. the calldata).
2) is a more problematic issue. If our implementation relies on a single Ethereum account (i.e. a single signing key), then all pending transactions must be chained according to the transaction nonce (e.g. nonce = 1, nonce = 2) and each transaction must be accepted into the blockchain in order.
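The first workaround, a ladder of pre-signed transactions with increasing fees, could be used by a responder roughly as follows. This is a sketch under assumptions: the `(feerate, raw_tx)` pair format, the sat/vbyte unit and the fallback rule are hypothetical choices for illustration.

```python
def pick_presigned(presigned, current_feerate):
    """Choose which pre-signed transaction to broadcast.

    presigned: list of (feerate, raw_tx) pairs signed by the customer.
    Returns the lowest-fee pair that meets current_feerate, or the
    highest-fee pair if none is adequate (the best the responder can do).
    """
    if not presigned:
        raise ValueError("no pre-signed transactions to choose from")
    ordered = sorted(presigned, key=lambda pair: pair[0])
    for feerate, raw_tx in ordered:
        if feerate >= current_feerate:
            return feerate, raw_tx
    return ordered[-1]


# Hypothetical fee ladder in sat/vbyte; the raw transactions are placeholders.
ladder = [(5, "tx_low"), (20, "tx_mid"), (80, "tx_high")]
choice = pick_presigned(ladder, current_feerate=12)
```

The cost of this approach is storage: every rung of the ladder is another pre-signed transaction the responder has to keep per job.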
Why is this a problem?
If an earlier pending transaction (e.g. nonce = 1) has a low fee, then it will prevent ALL subsequent and dependent transactions getting into the blockchain. This is exactly the issue DefiSaver faced and it resulted in some CDPs getting liquidated.
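A toy model makes the blocking effect concrete. It assumes a transaction can only be mined if its nonce is the account's next nonce and its gas price meets some minimum that miners currently accept; the dictionary layout and the prices are hypothetical.

```python
def mineable(pending, next_nonce, min_gas_price):
    """Return the nonces from one account that could be mined right now.

    Transactions must be mined in nonce order, so the first one that is
    missing or underpriced blocks everything behind it.
    """
    by_nonce = {tx["nonce"]: tx for tx in pending}
    mined = []
    nonce = next_nonce
    while nonce in by_nonce and by_nonce[nonce]["gas_price"] >= min_gas_price:
        mined.append(nonce)
        nonce += 1
    return mined


pending = [
    {"nonce": 1, "gas_price": 1},   # underpriced
    {"nonce": 2, "gas_price": 50},
    {"nonce": 3, "gas_price": 50},
]
stuck = mineable(pending, next_nonce=1, min_gas_price=10)

# Replacing the nonce-1 transaction with a higher fee unblocks the queue.
pending[0]["gas_price"] = 20
unblocked = mineable(pending, next_nonce=1, min_gas_price=10)
```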
So it is critical that a responder has multiple accounts that are used to relay transactions. Then it simply becomes a load-balancing problem to support concurrent responses and also to ensure a single account is not responsible for too many jobs. For example, depending on the node configuration, Parity limits a single account to 16 pending transactions (or 1% of the mempool) and geth to 64.
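One simple load-balancing policy is to give each new job to the account with the fewest pending transactions, refusing accounts that have reached a per-account cap. The sketch below assumes the responder tracks pending counts itself; the account names and the cap value are hypothetical.

```python
def pick_account(pending_counts, max_pending):
    """Pick the relay account with the fewest pending transactions.

    pending_counts: mapping of account address -> number of pending txs.
    Returns None if every account has reached max_pending.
    """
    available = {a: n for a, n in pending_counts.items() if n < max_pending}
    if not available:
        return None
    # Break ties by address so the choice is deterministic.
    return min(available, key=lambda a: (available[a], a))


# Hypothetical accounts; the cap of 16 mirrors the per-account limit above.
accounts = {"0xaaa": 3, "0xbbb": 1, "0xccc": 16}
chosen = pick_account(accounts, max_pending=16)
```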
This brings us to 3) as our responder must bump the fee. The minimum fee bump for a transaction is around 12.5%. So care must be taken not to excessively bump the fee (and annoy our customers), while trying to ensure the transaction gets in the blockchain before the deadline. One nice feature that does arise is that we can re-order the pending transactions (when we bump the fee) so the response/job with the earliest deadline can be first in the queue.
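Both ideas can be sketched in a few lines. The snippet assumes the 12.5% minimum bump mentioned above, integer gas prices, and a customer-agreed price cap; `bump_gas_price`, `reorder_by_deadline` and the job fields are hypothetical names. Note that every re-ordered job would still need to be re-signed and broadcast with a bumped fee to replace the transaction previously holding that nonce.

```python
MIN_BUMP_NUM, MIN_BUMP_DEN = 9, 8  # a 12.5% increase, as a fraction


def bump_gas_price(current, max_price):
    """Smallest integer price at least 12.5% above current.

    Returns None if that price would exceed max_price (the cap).
    """
    bumped = -(-current * MIN_BUMP_NUM // MIN_BUMP_DEN)  # ceiling division
    if bumped > max_price:
        return None
    return bumped


def reorder_by_deadline(jobs, first_nonce):
    """Assign consecutive nonces so the earliest deadline goes first."""
    ordered = sorted(jobs, key=lambda job: job["deadline"])
    return [dict(job, nonce=first_nonce + i) for i, job in enumerate(ordered)]


new_price = bump_gas_price(100, max_price=1000)
queue = reorder_by_deadline(
    [{"id": "a", "deadline": 300}, {"id": "b", "deadline": 100}],
    first_nonce=5,
)
```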
Handling block re-organisation and congestion
To offer a quality of service, the responder needs to verify the response got in the blockchain and it actually stayed in.
In Bitcoin, block re-organisations and forks are pretty rare. We estimate there is a 1-block fork every month or two. If that happens, then the transaction should get confirmed, unconfirmed, and then it should eventually get re-confirmed as the transaction ends up back in the memory pool. Although due to restrictions on the memory pool size, this cannot be taken for granted if the transaction is not paying a reasonable fee. We try to check if the transaction is in the memory pool, but really the only way to try and support a quality of service is to periodically re-broadcast the transaction.
In Ethereum, around 6% of all blocks are uncle blocks (i.e. forks where the block was mined, but overtaken by another block). So handling block re-organisation is a priority for any responder implementation. We need to consider how to track our responses in the memory pool and bump the fee if we think it is going to get dropped. As well, we want to ensure transactions get in the blockchain before the deadline so we have had to consider some aggressive fee-bumping strategies.
Of course, the general heuristic of waiting for N confirmations before considering the transaction accepted into the blockchain is a must. For Bitcoin and Ethereum, the heuristic is pretty reasonable, but for less popular coins care must be taken (like the 100+ block reorg in Ethereum Classic). That is one of the reasons we need to be cautious over what chains we will support going forward.
What information needs to be stored by a responder is mostly determined by the application.
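The confirmation heuristic itself is simple to express. In this sketch a transaction mined in the current tip block counts as 1 confirmation, and the required depth is left as a parameter because the right value depends on the chain; the function names are hypothetical. After a re-org the responder would need to look up the transaction's block height again, since it may have changed or the transaction may have returned to the memory pool.

```python
def confirmations(tip_height, tx_height):
    """Confirmations for a transaction mined at tx_height (None if unmined)."""
    if tx_height is None or tx_height > tip_height:
        return 0
    return tip_height - tx_height + 1


def is_final(tip_height, tx_height, required):
    """True once the transaction is buried under enough blocks."""
    return confirmations(tip_height, tx_height) >= required


# A response mined in block 100, checked as the chain tip advances.
depth_at_tip = confirmations(100, 100)
final_early = is_final(104, 100, required=6)
final_later = is_final(105, 100, required=6)
```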
In Bitcoin, the only viable application for a responder is the infamous WatchTower for the Lightning Network. The payment channel construction relies on replace-by-revocation for state updates and thus it has a pretty annoying storage problem. If you want to learn more, check out our cool slides here.
What is the issue? Well in Lightning, there is a single valid state and a set of revoked states. Only one state can be accepted into the blockchain. So if a revoked state gets in, then the responder needs to publish the corresponding justice transaction (‘evidence’) to prove it was indeed revoked. As a result, the watchtower must store O(N) jobs for every channel they are watching. But how bad does this get?
Let’s imagine a user tries to route a payment on the network and there are 13 failed routing attempts. The responder will be required to store ~25 pre-signed justice transactions. Why?
Every time the user tries to set up a new HTLC path and it fails, there are two updates in the channel (to create the HTLC and to remove the HTLC). As a result, we end up with 25 revoked states and 1 valid state (i.e. to remove the final failed HTLC). If every job is ~500 bytes, then it is ~12.5 kilobytes. This quickly blows up as we scale for multiple users and payments.
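The numbers above can be reproduced with a short calculation. It assumes two channel updates per failed attempt and ~500 bytes per job, as stated in the text; the function names are hypothetical.

```python
def revoked_states(failed_attempts, updates_per_attempt=2):
    """Revoked states after a run of failed routing attempts.

    Each failed attempt adds and then removes an HTLC (two updates).
    The most recent state is the valid one, so it is not counted.
    """
    total_states = failed_attempts * updates_per_attempt
    return max(total_states - 1, 0)


def storage_bytes(failed_attempts, job_size=500):
    """Bytes of justice transactions the watchtower must keep."""
    return revoked_states(failed_attempts) * job_size


jobs = revoked_states(13)   # 26 states in total, 25 of them revoked
size = storage_bytes(13)    # 25 jobs at ~500 bytes each
```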
In Ethereum, there is a big misconception that responders are only useful for off-chain protocols. We are partly to blame, since that is what brought us towards building a responder. But it is the right time to make the message clear, so please repeat it with me:
Responders are useful for 95% of smart contracts* on Ethereum.
Generally, it is O(1) storage for most applications, and a responder is useful when a smart contract involves a two-step protocol (e.g. commit and reveal).
- You need to stay online to watch for CDP liquidation? Hire a responder
- You need to stay online to reveal your bid? Hire a responder
- You need to stay online to enforce a stop-loss feature for a decentralised exchange? Hire a responder
This is a big deal.
There are several Ethereum projects that could benefit if a responder were available for their users. From our understanding, projects tend not to build a responder simply because they need to focus on their core business (and application). It isn’t fun to build solutions for annoying technical problems, even if it were to improve the user experience.
In fact, one notable project I met with recently told me they spent 8 weeks building a simple relayer, never mind a responder. By the way, we will have a relayer on offer shortly :)
So our motivation is to offer a responder as a service. We hope it will speed up the dapp development process via a simple API and it will improve the user experience as users can go offline with full confidence the protocol execution will finish. With our infrastructure, it will be exciting to see the type of applications that will eventually be built on top of it.
Final notes. For something that sounds so simple, it really is a non-trivial beast to build and deploy.
The issues listed in this blog post are just the tip of the iceberg. We haven’t discussed fee estimation, crash recovery and persistence, or protection from denial of service attacks. We have tried to cover the main distinctions we have encountered when building for both platforms and it is interesting to see the subtle differences rear their ugly heads and impact the final design of both responders.
Thanks for reading. If you made it this far, here is a motivational video for you to watch.
*we do not know the real percentage of smart contracts, but it is a lot and you are probably building one right now that can benefit!