Transparency Report

Regarding the SpankChain Hack on October 6th, 2018

Hunter Hillman
Connext
Oct 18, 2018


Table of contents

I. Motivation

II. Introduction and Background

III. First Response

IV. Recreating the Hack

V. Results of Our Internal Review

VI. Changes Made

VII. Conclusion

I. Motivation

This report was prepared by Connext, SpankChain, and Kyokan in response to an attack on the payment channel hub in operation on SpankChain’s adult camsite that occurred on October 6th, 2018.

We are committed to providing an open account of the hack, how it happened, why it happened, and what can be done to prevent it in the future. We launched a thorough investigation of our code and our processes, which we will share in this report.

Our quality assurance and review processes must reflect our commitment to transparency, reliability, and security. Now and in the future, we are dedicated to improving our processes and our code to ensure that our technology fulfills its promise to create a better and more equitable payment system.

Please refer questions on this report to Connext and SpankChain’s respective community chats.

II. Introduction and Background

The hack occurred around 6pm PST on October 6th, 2018. The anonymous attacker drained 166.97 ETH (~$38k USD at time of hack) from the contract and trapped ~$4k of SpankChain’s dollar-pegged stablecoin (BOOTY) in the contract. Of those funds, 34.99 ETH (~$8,000 USD) and 1,271.88 BOOTY belonged to SpankChain users (~$9,300 USD in total).

The breach went unnoticed until 7pm PST on October 7th, 2018 because the SpankChain platform was being actively investigated for other, unrelated bugs. At that point, SpankChain’s camsite was taken down until further notice. See Section III (First Response) for information on the initial steps taken in response to the hack.

On October 11th, 2018, SpankChain was given access to the stolen funds via a private key, in return for a $5,000 bug bounty and the 5.5 ETH that the attacker used as a seed for the attack. While this is certainly a happy ending, we are committed to providing transparency with regard to the hack, its causes, and the changes that we are implementing to ensure that this never happens again.

We have all learned a hard lesson, but we wish to reassure our communities that our fiduciary duty to our users is our utmost priority. This report details the steps that we have taken, and are taking, to secure our processes and our code.

III. First Response

Because the SpankChain, Kyokan, and Connext teams were working through the deployment of major system upgrades, the hack went undiscovered for 25 hours.

Upon discovering the hack, SpankChain took Spank.live, their camsite platform, offline while we investigated what had occurred. On Monday the 8th of October, after getting a general idea of how the attack had occurred and determining the balances owed to the members of the community, SpankChain and Connext posted in their respective Discord servers disclosing the hack and the steps that had been taken so far. SpankChain subsequently posted on Medium providing more details.

In the course of our investigation, we produced the following chart showing the path of transactions from our contract to addresses controlled by the attacker. As you can see, the attacking address (0xcf26) created smart contract 0xc591 which used a reentrant call on SpankChain’s LedgerChannel.sol smart contract (0xf915) to drain 170.22 ETH (which includes 6.46 ETH used to seed the attack). The attacker deployed a second smart contract, 0xaaaa, which went on to drain a further 3.23 ETH (including 1.62 ETH used to seed the attack).

See accounts-graph.sql for the query used to generate this data.

After verifying that no attacker-controlled addresses existed in our database of off-chain state channel transactions, we determined that 34.92 ETH and 1,258 BOOTY were owned by users at the time of the attack, and 121.77 ETH and 2,695 BOOTY were SpankChain funds collateralizing the contract (see: ledger-channel-balances.sql).

The 16 ETH discrepancy between the 173.45 ETH transferred out of our smart contract and the 157.42 ETH accounted for in our database comes from two places. First, 11 ETH (including both attacker and user ETH) was sent to our smart contract after our blockchain poller was taken offline. Second, a bug in an earlier version of our off-chain software resulted in a 5 ETH discrepancy (this bug was fixed, and the small amount of lost user funds was reimbursed at the time of the fix).

Following these investigations, we started an internal review of all contract/test code and our security processes with the goal of providing transparency to the community on the changes that needed to be made to our system. We also outlined a plan to recover the remaining BOOTY that was trapped in the contract.

Then, in an interesting turn of events, SpankChain was able to contact the hacker and convince them to trade their black hat for a white one. They agreed to return all the funds used in the attack, and in exchange SpankChain rewarded them with a $5,000 bounty and the 5.5 ETH that they used as seed funding for the attack. But that’s not all: the hacker also described a second attack vector which would allow them to drain the ~4,000 frozen BOOTY. SpankChain offered the hacker the full value of that BOOTY (~$4,000) if they could get it back within 24 hours. Working swiftly, the hacker recovered all of the BOOTY for SpankChain within just one hour, and received a second reward of $4,000 USD in ETH.

After recovering the funds, we continued our internal investigation to identify all potential vulnerabilities. The following sections summarize the results of that investigation.

IV. Recreating the Hack

After assessing the value of stolen funds and notifying the SpankChain and Connext communities, we began a comprehensive study of the attack vector and were able to recreate the hack. Here, we describe the attack vector and its exploitation in detail.

The attacker used a reentrant call to LCOpenTimeout to drain funds from the contract. LCOpenTimeout is a function that allows a user to exit a channel in the event that a Hub does not automatically join it. After waiting the timeout period (set via the confirmTime parameter when calling createChannel), the user can call LCOpenTimeout, which transfers the user’s ETH and tokens back to them.

The exploited contract code for LCOpenTimeout is shown below:

It is important to note that LCOpenTimeout calls the ERC20 transfer function of the token that the channel is opened with and that the channel state is only deleted after this transfer function is called. createChannel() takes in a channelID, partyI (hub address), initial ETH/ERC20 balances, a confirmTime and a token contract address as parameters. The function is shown below:

The attacker created a new channel with their own ERC20 contract and with a confirmTime of 0. This allowed the attacker to create the channel and call LCOpenTimeout atomically, ensuring that the Hub could not join the channel before the hack was executed. The attacker’s contract had a malicious token transfer function which itself called LCOpenTimeout with the same parameters, so each timeout call re-entered LCOpenTimeout before the previous call had finished. Because the channel state in the contract was not deleted until after the ETH and ERC20 transfers took place, each recursive call to LCOpenTimeout read the same channel state as the original call. As a result, LCOpenTimeout ran repeatedly, paying out the same ETH balance each time.

Our recreation of the malicious ERC20 contract:

Let us step through how this occurred:

  1. Attacker creates a Hack.sol ERC20 contract with a malicious transfer function, along with a drainFunds function which atomically calls both createChannel and LCOpenTimeout in our LedgerChannel.sol contract.
  2. Calling drainFunds creates a channel in our payment channel hub using a confirmTime of 0 and the malicious Hack.sol contract as the channel’s token. This call also deposits some ETH and “tokens” into the channel. For example: 5 ETH, 1 token.
  3. drainFunds then calls LCOpenTimeout, passing in the channel ID of the channel that was created in the previous step.
  4. Our LCOpenTimeout function executed the following checks:
      a. Is the caller, msg.sender, the same as the creator of the channel? (Yes)
      b. Is the channel open, i.e. has it already been joined by the Hub? (No)
      c. Has the timeout expired? (Yes)
  5. After passing those checks, LCOpenTimeout transferred the party’s ETH balance (5 ETH) back to them. It then attempted to transfer the party’s 1 token out of the contract.
  6. When our LedgerChannel.sol contract called transfer here, it invoked the malicious function of the same name on the attacker’s fake ERC20 contract. That function called LCOpenTimeout again with the same parameters.
  7. Because this all happened in the same call, and because the channel balance data was not deleted until after the ERC20 transfer, the second call of LCOpenTimeout ran with our LedgerChannel.sol contract still believing that the balance owed to the user was 5 ETH and 1 token.
  8. Upon reaching the ERC20 transfer in the second call, the process repeated.
  9. This loop continued until our LedgerChannel.sol contract was drained of ETH. The attacker checked the contract’s balance on each transfer call to make sure that there were still funds to withdraw; otherwise, a failed transfer would have caused the whole drainFunds call to fail and all balances to be reverted.
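The steps above can be sketched as a toy Python model. This is not the LedgerChannel.sol code (which is not reproduced in this report): the names Hub, MaliciousToken, Attacker and drain_funds are hypothetical, a Solidity revert is modeled as a Python exception, and the timeout check is simply treated as passing, as the report describes for a confirmTime of 0. What the model does capture is the ordering that matters: ETH is paid, then the external token call happens, and only then is the channel state cleared.

```python
class Attacker:
    """Hypothetical attacker account; tallies ETH paid out to it."""
    def __init__(self):
        self.received_eth = 0


class Hub:
    """Toy stand-in for the vulnerable hub contract (not the real Solidity)."""
    def __init__(self, eth_pool):
        self.eth = eth_pool          # ETH the contract holds for the hub and other users
        self.channels = {}

    def create_channel(self, lc_id, sender, token, eth, tokens):
        self.eth += eth              # the creator's ETH deposit joins the shared pool
        self.channels[lc_id] = {"partyA": sender, "is_open": False,
                                "token": token, "eth": eth, "tokens": tokens}

    def lc_open_timeout(self, lc_id, sender):
        ch = self.channels[lc_id]
        # Checks: caller is the creator and the hub has not joined.
        # (The timeout check is assumed to pass, per the confirmTime of 0.)
        if sender is not ch["partyA"] or ch["is_open"]:
            raise PermissionError("revert: checks failed")
        if self.eth < ch["eth"]:
            raise RuntimeError("revert: ETH transfer failed")
        self.eth -= ch["eth"]                       # 1) send ETH back
        sender.received_eth += ch["eth"]
        ch["token"].transfer(sender, ch["tokens"])  # 2) external call to the token
        self.channels.pop(lc_id, None)              # 3) state cleared only now


class MaliciousToken:
    """Hypothetical Hack.sol stand-in whose transfer() re-enters the hub."""
    def __init__(self):
        self.hub = self.attacker = self.lc_id = None

    def transfer(self, to, amount):
        ch = self.hub.channels.get(self.lc_id)
        # Re-enter only while the pool can cover another full ETH refund;
        # a failed ETH transfer would otherwise revert the whole attack.
        if ch is not None and self.hub.eth >= ch["eth"]:
            self.hub.lc_open_timeout(self.lc_id, self.attacker)
        return True


def drain_funds(hub, attacker, token, seed_eth):
    """Create the channel and time it out in one call (steps 2 and 3)."""
    token.hub, token.attacker, token.lc_id = hub, attacker, "lc-1"
    hub.create_channel("lc-1", attacker, token, eth=seed_eth, tokens=1)
    hub.lc_open_timeout("lc-1", attacker)


hub = Hub(eth_pool=100)              # 100 ETH belonging to others
attacker = Attacker()
drain_funds(hub, attacker, MaliciousToken(), seed_eth=5)
print(hub.eth, attacker.received_eth)  # 0 105
```

Each nested call refunds the same 5 ETH because the channel entry is only removed after the token transfer returns, so in this model the attacker's 5 ETH seed is paid back 21 times before the pool runs dry.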

V. Results of Our Internal Review

After recreating the hack, we conducted a line-by-line internal audit of our contract to find other vulnerabilities. A full recreation of all vulnerabilities that we found has been posted in this repository, and outlines of each bug can be found below. We looked for other re-enterable functions, vectors where malicious parties would be able to call functions out of order to leave channels in an unrecoverable state, and double spend attacks. We also looked for other behaviors which, although not necessarily beneficial to an attacker, were outside of the intended use of the contract. This section details all of our findings.

1. [Original Vulnerability] Reentrancy on LCOpenTimeout

See Section IV above.

A full recreation and remediation can be found here.
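The linked remediation is not reproduced here. As a hedged illustration of one standard fix, the checks-effects-interactions pattern, a toy Python model of LCOpenTimeout can read and delete the channel state before making any ETH or token transfer. All names are hypothetical. In Solidity a failed require inside the re-entrant call would revert; the model just records that the re-entry was refused. A mutex such as OpenZeppelin's ReentrancyGuard is a complementary option.

```python
class Party:
    """Hypothetical account; tallies ETH paid out to it."""
    def __init__(self):
        self.received_eth = 0


class SafeHub:
    """Toy hub applying checks-effects-interactions in lc_open_timeout."""
    def __init__(self, eth_pool):
        self.eth = eth_pool
        self.channels = {}

    def create_channel(self, lc_id, sender, token, eth, tokens):
        self.eth += eth
        self.channels[lc_id] = {"partyA": sender, "is_open": False,
                                "token": token, "eth": eth, "tokens": tokens}

    def lc_open_timeout(self, lc_id, sender):
        ch = self.channels.get(lc_id)
        # Checks (timeout check omitted from this model).
        if ch is None:
            raise LookupError("revert: no such channel")
        if sender is not ch["partyA"] or ch["is_open"]:
            raise PermissionError("revert: checks failed")
        # Effects: copy what is owed, then delete the channel state.
        eth, tokens, token = ch["eth"], ch["tokens"], ch["token"]
        del self.channels[lc_id]
        # Interactions last: any re-entry now finds no channel to re-claim.
        self.eth -= eth
        sender.received_eth += eth
        token.transfer(sender, tokens)


class ReentrantToken:
    """Hypothetical malicious token that tries to re-enter on every transfer."""
    def __init__(self, hub, lc_id, attacker):
        self.hub, self.lc_id, self.attacker = hub, lc_id, attacker
        self.reentry_refused = False

    def transfer(self, to, amount):
        try:
            self.hub.lc_open_timeout(self.lc_id, self.attacker)
        except LookupError:
            # In Solidity this would revert; the model records the refusal.
            self.reentry_refused = True
        return True


hub = SafeHub(eth_pool=100)
attacker = Party()
token = ReentrantToken(hub, "lc-1", attacker)
hub.create_channel("lc-1", attacker, token, eth=5, tokens=1)
hub.lc_open_timeout("lc-1", attacker)
print(hub.eth, attacker.received_eth, token.reentry_refused)  # 100 5 True
```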

2. Parties Able to Create Channels with Themselves

We found that because users passed the hub address (partyI) in as a parameter, they were able to create channels with themselves. The hub address was originally left open to users to allow the contract to be used for “normal” channels without going through the Hub if a user wished. However, by not checking the hub address and letting users create channels with themselves, we opened up a path for attackers to put their channels into an unexpected state.

Further, this allowed the attacker to capitalize on other bugs outlined in this section (see 4 and 5). This can be remedied by adding a require statement that rejects channels in which the hub address (partyI) is the same as the channel creator, or by allowing hubs to deploy contracts with designated signing and wallet addresses set in the constructor.
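A minimal sketch of the first remedy, assuming a simplified createChannel that only records the two parties. ChannelRegistry, its error messages, and the address strings are hypothetical placeholders; the single added check plays the role of the require statement.

```python
class ChannelRegistry:
    """Toy model of createChannel's party checks (hypothetical, not the contract)."""
    def __init__(self):
        self.channels = {}

    def create_channel(self, lc_id, sender, party_i):
        if lc_id in self.channels:
            raise ValueError("channel already exists")
        # The added require: the counterparty (hub) must not be the creator.
        if party_i == sender:
            raise ValueError("cannot open a channel with yourself")
        self.channels[lc_id] = {"partyA": sender, "partyI": party_i}


registry = ChannelRegistry()
registry.create_channel("lc-1", "0xA11CE", "0xB0B")        # accepted
try:
    registry.create_channel("lc-2", "0xA11CE", "0xA11CE")  # self-channel
except ValueError as err:
    print(err)  # cannot open a channel with yourself
```

The constructor-based alternative would go further: with the hub's addresses fixed at deployment, users would never supply partyI at all.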

A full recreation and remediation can be found here.

3. Hub Autojoining Channel with Malicious Token

The contract did not allow the hub to specify which tokens it would process, and the hub was designed to listen for DidLCOpen events and join the channel without verifying the token address. This allowed an attacker to force the hub into a channel with a malicious token. A malicious token could have taken advantage of other bugs within the system, notably #6, Reentrancy in byzantineCloseChannel.

A full recreation can be found here.

4. joinChannel Reentrancy

This is the attack vector that was used to drain the contract of ERC20 funds.

An attacker could create a channel with ETH and a small deposit of the ERC20 tokens they wanted to drain. The channel could then be joined by a partyI address controlled by the attacker, who would generate the appropriate consensusClose signatures and parameters. This partyI address could be the same ETH account as partyA (see 2), or it could be a separate account controlled by the attacker. Once the channel was closed, the attacker could rejoin the closed channel and submit new parameters to the consensusClose function, thereby draining the contract of its ERC20 funds.
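The rejoin loop can be illustrated with a toy Python model. It rests on stated assumptions rather than the contract's actual code: LedgerModel and its methods are hypothetical, consensus_close is assumed to require only that the proposed final balances sum to the recorded channel total, and both signatures are taken as valid because the attacker controls both parties. The key flaw modeled is that closing flips the channel back to "not open" without removing its record, so joining succeeds again.

```python
class LedgerModel:
    """Toy token ledger showing the joinChannel rejoin flaw (hypothetical names)."""
    def __init__(self, token_pool):
        self.tokens = token_pool      # tokens the contract holds for everyone
        self.channels = {}

    def create_channel(self, lc_id, party_a, party_i, deposit):
        self.tokens += deposit
        self.channels[lc_id] = {"a": party_a, "i": party_i,
                                "total": deposit, "is_open": False}

    def join_channel(self, lc_id, sender):
        ch = self.channels[lc_id]
        # Flawed check: "not open" is also true of a channel that was already closed.
        if sender != ch["i"] or ch["is_open"]:
            raise PermissionError("revert: cannot join")
        ch["is_open"] = True

    def consensus_close(self, lc_id, bal_a, bal_i):
        ch = self.channels[lc_id]
        # Assumed rule: final balances must sum to the recorded channel total.
        if not ch["is_open"] or bal_a + bal_i != ch["total"]:
            raise ValueError("revert: invalid close")
        if self.tokens < ch["total"]:
            raise RuntimeError("revert: token transfer failed")
        self.tokens -= ch["total"]
        ch["is_open"] = False         # flag flipped, but the record (and total) remain
        return bal_a + bal_i          # tokens sent to the attacker's addresses


ledger = LedgerModel(token_pool=40)   # 40 tokens belonging to others
ledger.create_channel("lc-1", "0xA", "0xI", deposit=10)
stolen = 0
while ledger.tokens >= 10:
    ledger.join_channel("lc-1", "0xI")               # rejoin the closed channel
    stolen += ledger.consensus_close("lc-1", 10, 0)
print(stolen, ledger.tokens)  # 50 0
```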

A full recreation and remediation can be found here.

5. deposit Doublespend

An attacker using a channel that they have opened with themselves using the same ETH address would have been able to doublespend their deposit. By calling the deposit function with ETH, the contract would have registered the msg.value for both sides of the channel, effectively recording double the amount of value that was originally sent to the contract. Then, the attacker could have closed the channel to get back twice what had originally been deposited and repeat the process until the contract was drained.

This worked because the deposit function used two if statements instead of an if/else. This had originally been written this way to allow for deposits to both sides of the channel at the same time, though we now realize that this is not a common enough use of the contract to warrant increasing the attack surface of this function.
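The difference between the two branch shapes can be shown with a small Python model. The function names, the recipient parameter, and the channel dictionary layout are hypothetical; the model only captures the branching described above, for a channel whose two parties share one address.

```python
def deposit_two_ifs(ch, recipient, value):
    """Vulnerable shape: two independent if statements."""
    if recipient == ch["partyA"]:
        ch["deposited"][0] += value
    if recipient == ch["partyI"]:          # also runs when partyA == partyI
        ch["deposited"][1] += value


def deposit_if_else(ch, recipient, value):
    """Fixed shape: at most one side is credited per deposit."""
    if recipient == ch["partyA"]:
        ch["deposited"][0] += value
    elif recipient == ch["partyI"]:
        ch["deposited"][1] += value
    else:
        raise ValueError("recipient is not a party to this channel")


def self_channel():
    # A channel whose creator passed their own address as partyI (see 2).
    return {"partyA": "0xA11CE", "partyI": "0xA11CE", "deposited": [0, 0]}


vulnerable, fixed = self_channel(), self_channel()
deposit_two_ifs(vulnerable, "0xA11CE", 5)
deposit_if_else(fixed, "0xA11CE", 5)
print(sum(vulnerable["deposited"]), sum(fixed["deposited"]))  # 10 5
```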

A full recreation and remediation can be found here.

6. Reentrancy in byzantineCloseChannel

A reentrancy vulnerability in the byzantineCloseChannel function allowed an attacker to drain contract funds using a malicious token. A token contract could have been created that recursively called our byzantineCloseChannel function from its ERC20 transfer call. Since the onchain balances would have been zeroed out by this point, the attacker would not have been able to continuously drain contract funds. However, at the start of byzantineCloseChannel, any channel deposits added would have been moved from the deposits array into the final balances, meaning attackers would have been able to withdraw twice as much as their deposit.

A full recreation and remediation can be found here.

7. SafeMath

This version of the contracts was not using SafeMath. As such, it was possible to push the contract into unrecoverable or unexpected states with integer overflow and underflow errors. In our contract, the confirmTime was exploited by overflowing it so that a comparison to the current block time would always succeed. (An example of overflow behavior in Solidity can be found in this ETHFiddle.)
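A small Python model of uint256 arithmetic shows the effect. It assumes, for illustration only, a deadline computed as now + confirmTime and compared against the current block time; the exact expression in the contract is not shown in this report. add_unchecked mimics pre-0.8 Solidity addition, which wraps modulo 2^256, and add_safemath mirrors SafeMath's add(), which rejects any sum that wrapped below its first operand.

```python
UINT256_MOD = 2**256                 # Solidity uint256 arithmetic wraps modulo 2**256
UINT256_MAX = UINT256_MOD - 1


def add_unchecked(a, b):
    """Pre-0.8 Solidity `a + b` on uint256: silently wraps on overflow."""
    return (a + b) % UINT256_MOD


def add_safemath(a, b):
    """SafeMath-style add: fail (revert) instead of returning a wrapped sum."""
    c = add_unchecked(a, b)
    if c < a:                        # a wrapped sum is always smaller than a
        raise OverflowError("revert: addition overflow")
    return c


now = 1_538_870_400                  # example block timestamp
confirm_time = UINT256_MAX           # attacker-chosen confirmTime
deadline = add_unchecked(now, confirm_time)
print(deadline == now - 1, deadline < now)  # True True: "expired" immediately

try:
    add_safemath(now, confirm_time)
except OverflowError as err:
    print(err)  # revert: addition overflow
```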

A full recreation and remediation can be found here.

8. Indisputable State on deposit

We found that if a user attempted to dispute a channel after depositing, they would have been pushed into a state that could not be successfully disputed. Because updateLCstate checked balances but did not account for deposits made into the channel, any offchain signed state updates would have been invalid according to the dispute functions. Parties would still have been able to exit using consensusClose, but would naturally be at the mercy of their counterparty.

A full recreation and remediation can be found here.

9. Indisputable State on joinChannel

If the counterparty to a channel were to go offline before signing any update, i.e. at the ‘0’ state of a joined channel, the channel would have been put into a similarly unrecoverable state. This is because all of the dispute methods require at least one double-signed update to be submitted.

A full recreation and remediation can be found here.

10. Unexpected State when Calling LCOpenTimeout after Closing a Channel

This vulnerability was a result of the reentrancy errors in the contract’s consensusClose function. The attack vector allowed users to call LCOpenTimeout after a channel was closed. LCOpenTimeout primarily relied on the isOpen flag to verify that the channel was not already in use, and this check did not prevent a user from calling the function on a channel that they had intentionally closed.

This attack would have allowed the user to doublespend the initial channel deposit made through the createChannel function. The doublespend could only be executed once, as the LCOpenTimeout function deletes the channel before completing.
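A toy Python model of this sequence, under simplifying assumptions: ChannelModel and its methods are hypothetical, the hub's own collateral on joining is omitted, and the timeout check is left out. It shows how a closed channel with isOpen set to false looks, to LCOpenTimeout, exactly like one the hub never joined, and why the extra refund works only once.

```python
class ChannelModel:
    """Toy model of consensusClose followed by LCOpenTimeout (hypothetical)."""
    def __init__(self, pool_eth):
        self.eth = pool_eth                  # ETH held for the hub and other users
        self.channels = {}

    def create_channel(self, lc_id, party_a, deposit):
        self.eth += deposit
        self.channels[lc_id] = {"a": party_a, "initial": deposit, "is_open": False}

    def join_channel(self, lc_id):
        self.channels[lc_id]["is_open"] = True

    def consensus_close(self, lc_id, payout_a):
        ch = self.channels[lc_id]
        if not ch["is_open"]:
            raise ValueError("revert: channel not open")
        self.eth -= payout_a
        ch["is_open"] = False    # closed, yet indistinguishable from "never joined"
        return payout_a

    def lc_open_timeout(self, lc_id, sender):
        ch = self.channels.get(lc_id)
        # Relies on is_open alone (timeout check omitted from the model).
        if ch is None or sender != ch["a"] or ch["is_open"]:
            raise PermissionError("revert: timeout not allowed")
        self.eth -= ch["initial"]            # refunds the initial deposit again
        del self.channels[lc_id]             # deleted, so this works only once
        return ch["initial"]


model = ChannelModel(pool_eth=100)
model.create_channel("lc-1", "0xA", deposit=5)
model.join_channel("lc-1")
received = model.consensus_close("lc-1", payout_a=5)   # legitimate exit
received += model.lc_open_timeout("lc-1", "0xA")       # second refund
print(received, model.eth)  # 10 95
```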

A full recreation and remediation can be found here.

VI. Changes Made

The hack was a failure of both our code and the process that led us to put flawed code in production. Accordingly, we conducted a thorough analysis of our communications, code review, and quality assurance processes. We identified three problem areas:

1. Lack of clear communication regarding sections of the codebase

The protocol was designed and implemented collaboratively between Finality Labs, Connext, SpankChain, and Kyokan. In the couple of weeks leading up to launch, Connext assumed responsibility for writing all of the contract tests. Because of the rush to deploy offchain infrastructure, we (Connext) ended up being the only stakeholder to run and review the tests. As such, we take full responsibility for the failure to ensure that proper peer reviews had been conducted by all parties.

2. Insufficient review of changes

In addition to a lack of clear communication, the QA process of the smart contract was insufficiently formalized. More structure, more reviews, and more rigor were needed.

3. Lack of a professional external audit

A mutual decision was reached between SpankChain, Kyokan, and Connext to forgo an external audit. The rationale for this decision was that, given the high cost of an audit (quoted at ~$50,000 USD) relative to the funds held by the contract (~$38,000 USD), it did not make economic sense to conduct one, especially because the code was only intended to stay in production for ~1 month while we iterated. In hindsight, protecting our reputation and retaining the trust of our users would have justified the cost of an audit.

To address these issues, we’re making the following changes to our processes:

  1. We will clearly delineate working groups for each discrete section of the codebase: contracts, hub, and client. All of our contract development will immediately be moved into an open source repository under Connext.
  2. When one group makes a change to another group’s code, they will submit a PR, which will undergo a standardized and documented peer review process.
  3. Each group will be responsible for reviewing their own changes in addition to any changes that have occurred in their dependencies. Because the contracts are at the bottom of the stack, every contract change will be reviewed by the contracts, hub, and client groups, at least three times in total, before it is finalized.
  4. We will implement a continuous integration pipeline for all changes to the contract. Any new features added must have corresponding test cases which include failed attack vectors, wherever possible.
  5. The contract will undergo a maximum of one update to the master branch per month, emergency changes notwithstanding.
  6. We will conduct at least one professional external audit whenever substantial contract changes are made (in addition to continuous automated checks), and internal peer reviews for additional changes.

Because of the extent of the vulnerabilities that we found, we will also be rewriting our contracts from scratch to simplify and secure the codebase. While the underlying framework is still secure, we have learned a lot about user behavior within the last two months which has informed us of the tradeoffs that we can make with a restrictive/safe vs permissive/vulnerable design. We will use this data, in addition to the valuable security lessons we’ve learned, when building the new contracts.

VII. Conclusion

We strongly value transparency, reliability and community. As such, all three entities (Connext, SpankChain, and Kyokan) are committed to understanding how this happened, why it happened, and what can be done to prevent it in the future.

We undertook a rigorous investigation of our code and our processes, and identified several problem areas we will be addressing. We have put new structures in place to ensure that responsibility for each section of code is clear, code receives multiple internal audits, contract iterations happen on a monthly cadence, and a professional external audit is conducted prior to any substantial contract changes.

We are confident the changes we are making to our contract and practices will provide outstanding security for our users; however, we are also committed to ongoing review and improvement of our processes. We recognize that, as a technology that supports value transfer, security of user funds is paramount; our quality assurance and review processes must reflect that.

Moving forward, we will adhere to industry-leading quality assurance standards. While the systems we are building are trust-minimized, users do place faith in the underlying code. On this occasion, we did not live up to our own standards. We are dedicated to improving both our systems and our code to ensure that we fulfill our duty to our users and our communities.


And lastly, please share this with your friends in the community. We’re really excited to help projects in the space scale to the mainstream market!

Thanks for your support,

Team Connext
