Reverse Engineering Ethereum Smart Contract: Let’s Talk Assembly

Jonathan LEI
Feb 3, 2018

Back in 2009, Bitcoin was invented as a completely decentralized currency. We all appreciate such a huge advancement in terms of, well, money. However, it is not uncommon to argue that the underlying blockchain technology is even more important than the currency itself, and I agree. Indeed, in recent years, we’ve witnessed many blockchain projects popping up and trying to change our lives.

I personally think that the single most important revolution in blockchain technology is Ethereum’s invention of smart contracts. True, even before Ethereum smart contracts, you could already build simple applications using scripts in Bitcoin. However, scripting in Bitcoin is rather painful and limited.

In Ethereum, smart contracts are Turing-complete. Probably more importantly, Ethereum has a high-level language: Solidity. Well, we all know Solidity is still evolving and far from perfect. Nevertheless, as a high-level language, Solidity makes it so easy to program on the blockchain that most decentralized applications nowadays are built on Ethereum. Of course, just like code in any high-level language, Solidity code needs to be compiled into bytecode before it can be used on the blockchain.

People ask from time to time: “how do smart contracts work?” Well, the short answer: Ethereum smart contracts run in the EVM (Ethereum Virtual Machine). But that doesn’t really answer the question, and it leaves more questions unanswered: How do Solidity events work? How does the data structure “mapping” work? …

Well, these questions may not bother those who only want to read and understand contracts. However, if you want to master smart contract development, knowing how everything works behind the scenes becomes almost a prerequisite. Here, I’ve listed some benefits of knowing EVM assembly:

  1. Understand any contract. Not all contracts are open-source. Closed-source contracts can contain security flaws and unwanted behavior.
  2. Even deeper debugging. Assembly-level debugging always provides deeper insights. Remix IDE supports assembly debugging.
  3. Extreme programming. Currently, smart contract development focuses a lot more on security than on efficiency (in terms of gas usage). However, as adoption grows, efficiency will eventually become an important factor. Sure, the compiler can do a certain level of optimization, but we all know it’s far from perfect, at least for now.

The list goes on and on. So here comes the question: how do I master Ethereum assembly?

Well, understanding the assembly instructions (opcodes) is not too difficult, as all the technical specifications are clearly (well, sort of) stated in the Yellow Paper. I personally don’t think that is enough. To me, mastering Ethereum assembly means that:

When you see Solidity code, you will immediately know how it will be compiled into assembly code; when you see assembly code, you will have a fairly accurate guess of the original Solidity code after some analysis.

Thus, I believe the best way to learn Ethereum assembly is by learning reverse engineering. I’m starting this series to help people who want to get started with it.
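To give a rough flavor of what reading EVM assembly involves, here is a minimal sketch of a disassembler in Python. It is only an illustration under a few assumptions: the `OPCODES` table is a small hand-picked subset of the instruction set, `disassemble` is a hypothetical helper name, and the sample input `0x6080604052` is a short prologue often seen at the start of Solidity-compiled bytecode (older compilers emit slightly different values). The one real subtlety it shows is that `PUSH1` through `PUSH32` carry their operands inline, so you cannot simply decode one byte at a time.

```python
# Partial opcode table: an illustrative subset, not the full EVM instruction set.
OPCODES = {
    0x00: "STOP",
    0x01: "ADD",
    0x35: "CALLDATALOAD",
    0x51: "MLOAD",
    0x52: "MSTORE",
    0x54: "SLOAD",
    0x55: "SSTORE",
    0x56: "JUMP",
    0x57: "JUMPI",
    0x5B: "JUMPDEST",
    0xF3: "RETURN",
    0xFD: "REVERT",
}


def disassemble(bytecode_hex):
    """Return a list of 'offset mnemonic [operand]' strings (hypothetical helper)."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)

    lines = []
    pc = 0  # program counter: byte offset into the code
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:
            # PUSH1..PUSH32: the next 1..32 bytes are an inline operand, not opcodes.
            size = op - 0x5F
            operand = code[pc + 1 : pc + 1 + size]
            lines.append(f"{pc:04x} PUSH{size} 0x{operand.hex()}")
            pc += 1 + size
        else:
            # Anything missing from the partial table is reported as unknown.
            name = OPCODES.get(op, f"UNKNOWN_0x{op:02x}")
            lines.append(f"{pc:04x} {name}")
            pc += 1
    return lines


if __name__ == "__main__":
    for line in disassemble("0x6080604052"):
        print(line)
```

Running it on the sample input prints `0000 PUSH1 0x80`, `0002 PUSH1 0x40` and `0004 MSTORE`: two values are pushed onto the stack and then one is written to memory. Going from a listing like this back to a guess at the original Solidity is the part that takes practice.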

Please note that this series is not a starter guide for Ethereum contract development. It’s assumed that readers can already both write simple contracts and understand relatively complex ones. If you’re not equipped with these skills yet, please learn them first. I may also start another series on those topics in the future.

Thanks for reading. See you in the next post.

Jonathan LEI

Blockchain protocol engineer, blockchain smart contract reverse engineer