StarkNet Primer

Published in

Starknet Edu

7 min readJan 12, 2023

This primer is intended to cover introductory concepts upon which Cairo and StarkNet are built. The topics covered in this primer have been dissected in hundreds of ways by thousands of people, so wherever possible I will be linking to those resources. Let’s get to the good stuff.

What are we solving for?

The advent of blockchain technology has given the world computational systems with absolute transparency and inclusive accountability. In order to obtain these characteristics, blockchain systems have been forced to make large trade-offs which impact usability. Vitalik Buterin, summed up this issue in “The Blockchain Trilemma” stating:

blockchains are forced to make trade-offs that prevent them from being decentralized, scalable, and secure

StarkNet attempts to tackle the Blockchain Trilemma and provide a system that is inclusively accountable, decentralized, scalable, and secure through the use of zero-knowledge STARK proofs.

Evolution of Data Security

For a more concrete example of the trilemma, we can move outside of the blockchain context entirely. Say Alice has an important piece of data she needs access to. To start we will represent this data as ASCII characters in YAML format:

alice_account: 5.00

Let’s write it to a file on our computers disk and measure performance:

time echo "alice_account: 5.00" >> bank.yaml

Let’s read that information:

time cat bank.yaml

It’s obviously very fast to read and write this data from our local disk, and powerful database mechanisms can be applied to optimize access to the data, BUT if she drops her computer or gets too close to a large ACME magnet Alice loses her valuable bank account information.

💡💡 Let’s replicate Alice’s account on another computer 💡💡

If we replicate Alice’s bank account YAML file on multiple computers, when one fails we haven’t lost the data!

Sender Questions

How do I locate a receiving host to send to?
How do I know the receiving host successfully wrote Alice’s account data?
If I change Alice’s account value how will the receiving host know to update the same value?

Receiver Questions

Who will I receive data from?
If I change Alice’s account value how will the sending host know to update the same value?

Distributed Systems

These questions form the basis of distributed systems and distributed computing across a network and have been studied since the inception of the internet.

Let’s look briefly at how one of the more popular distributed databases CassandraDB handles these issues.

You can see when configuring the system you are required to whitelist the seed node IP Addresses that will form our trusted cluster that partakes in a limited peer-to-peer gossip. Although this is suitable for many traditional systems we strive to build inclusive and permission-less systems.

Once the distributed database is set up we gain “Fault Tolerance” for Alice’s valuable bank data. If someone accidentally brings their large ACME magnet into one data center, the data is easily accessible on redundant hosts. Similar to blockchains these distributed systems made trade-offs to the simple I/O example above. So what did we give up for this fault tolerance?

Banks Perspective

Network overhead impacts performance
Redundancy and replication impacts performance
Infrastructure maintenance ($$$$)

Alice’s delegates trust to the bank

Database is configured correctly
Operational security can handle attackers or intruders
Is not doing anything duplicitous
Costs typically get passed to Alice

💡💡 Let’s replicate Alice’s account on ANY computer 💡💡

Bitcoin

Bitcoin brings various computer science concepts together with game theory to create a truly peer-to-peer network and negates the need to delegate our trust to a central party.

The nodes trust the block producer based on its valid proof of work and the network collectively agrees on a set of canonical updates to the state of the Bitcoin ledger and the state of Alice’s account.

git clone https://github.com/starknet-edu/basecamp.git
cd basecamp/primer/bitcoin/proof_of_work/python
python main.py

The Bitcoin nodes themselves listen for and validate blocks of transactions that are broadcast to the network by the miner of that block. They form a data structure called a Merkle Tree to obtain a root hash corresponding to all the transactions (and their order) in that block. If one tx changes by even a single bit the Merkle root will be completely different.

# block verification example
cd basecamp/primer/bitcoin/block_verification/go && go mod tidy
go run main.go utils.go

Alice’s information gets formatted as a UTXO and is replicated on all of the nodes on the Bitcoin network. She can even validate that everything is accurate herself by rehashing the Merkle tree of every block of transactions from genesis to now.

🎉🎉 NO DELEGATION OF TRUST 🎉🎉

Let’s revisit the trilemma. What did we give up to get this trustless data security?

Miners expend energy as they attempt to get the nonce
Full trustless verification requires EACH node to replicate the canonical state by hashing the merkle tree of txs -> hashing the block headers

Full Node Size: ~405GB

For a naive demonstration of “The Evolution of Data Security” run the following:

cd basecamp/primer/bitcoin/block_verification/go && go mod tidy
go test ./... -bench=. -count 5

Smart Contracts

Smart contracts were first proposed by Nick Szabo as a transaction protocol that executes the terms of a contract, giving all parties transparency into the rule set and execution. Bitcoin facilitates a limited version of smart contracts, but the expressive smart contract model of Ethereum has been more widely adopted.

Ethereum

Ethereum provides a platform to implement these smart contracts with the use of the Ethereum Virtual Machine. In the Ethereum paradigm, Alice’s bank account information is stored in a 20-byte address called an account. Her account balance along with a few more fields (nonce, storageRoot, codeHash) becomes a “node” in a data structure called a Patricia Trie where PATRICIA stands for “Practical Algorithm to Retrieve Information Coded in Alphanumeric”.

This Trie is a specific type of tree that encodes a key as a path of common prefixes to its corresponding value. So Alice's Bank Account can be found at an address("key") that points to an account ("value") in Ethereum's World State (trie). The tree structure of the trie allows us to obtain a cryptographic hash of each node all the way up to a single hash corresponding to the root similar to the Merkle tree we saw in the Bitcoin block verification.

For an example of the MPT data structure you can use this diagram for reference:

and run the following:

cd basecamp/primer/ethereum/block_verification/go && go mod tidy
go run *.go

Ethereum then propagates its state by verifying transactions are well-formed and applying then to accounts. Alice has a public/private key pair to manage her “externally owned account” and can sign transactions that involve her balance or involve interacting with other contracts in the state.

In addition to EOAs Ethereum has “contract accounts” which are controlled by the contract code associated with them. Every time the contract account receives a message the bytecode that is stored as an RLP encoded value in the account storage trie begins to execute according to the rules of the EVM.

Trilemma visit: what did we give up to add expressivity?

Every transaction still needs to be processed by every node in the network.
With the addition of world state storage the blockchain can “bloat” leading to centralization risk
Alice may pay $100 to use the money in her account

Full Node Size: ~700 GB

Archive Node Size: ~10 TB

Rollups

As demand for block space increases the cost to execute on Layer 1 (full consensus protocols e.g. Bitcoin, Ethereum) will become increasingly expensive, and until certain state expiry mechanisms are implemented we can expect the state of the L1 to continue to bloat over time. This will require an increasingly robust machine to maintain the state and subsequently verify the blocks.

Rollups are one solution in which business logic is executed and stored in a protocol outside the Ethereum context and then proves its successful execution in the Ethereum context.

Typically this involves compressing a larger number of transactions at this Layer 2 and committing the state diffs to a smart contract deployed on L1. For full interoperability with the L1 rollups also typically implement a messaging component for deposits and withdrawals.

There are currently two types of rollups that are being widely adopted:

Optimistic Rollups
Zero-Knowledge Rollups

Vitalik provides a good comparison of the two here, and touches on the final pieces of our long trilemma journey:

No matter how large the computation, the proof can very quickly be verified on-chain.

This allows Alice to move her money freely between L1 and L2 (…soon to be L3) and operate on a low-cost blockchain layer. It allows devs to freely deploy their business logic with the expressive and PROVABLE Cairo programming language. All while inheriting the highest form of data security evolution from the L1 and not having to delegate trust to any centralized party!