A Practical Guide to Smart Contract Security Tools. Part 1: Introduction
Intro
Automated tools are highly relevant to the smart contract security audits our MixBytes team performs. Are they the most efficient means of identifying possible flaws? How should we use them? What do they do? What are the specifics of working in this field?
These questions and related issues are the main focus of this article. I will describe our attempts to apply the most interesting tools to real contracts and share some tips on using this entertaining kind of software. At first, I wanted to put everything into one article, but as the amount of material grew, I decided to write a series, one article per analyzer. You can find the list of tools I will refer to here: https://consensys.github.io/smart-contract-best-practices/security_tools/#static-and-dynamic-analysis. If I come across other worthy tools, I will add them to the list, test them and describe them for you.
I must confess that audit tasks turned out to be very entertaining, because so far developers have paid little attention to the economic aspects of algorithms and to internal optimization. Smart contract audits have revealed some peculiar attack vectors that must be considered when searching for errors. Moreover, quite a number of automated testing tools have appeared: static analyzers, bytecode analyzers, fuzzers and other good software.
The aim of this article is to promote contract code security and help developers quickly get rid of the silly bugs that are often the most annoying. A protocol may seem quite reliable and solve serious problems, but if you miss a single minor bug during testing, it can seriously affect the whole project. So let’s at least learn how to use the tools that help us easily avoid well-known problems.
Getting ahead of myself, I must say that flawed high-level logic is the most common source of critical bugs in our audits, whereas typical vulnerabilities, such as access control issues, integer overflow, reentrancy, etc., are less common or have less impact than broken logic. A complex and thorough audit of high-level contract logic, life cycle, operational aspects and compliance with the task can be performed only by experienced developers who can test a contract both for common patterns and for uncommon vulnerabilities and logic bugs.
Automatic analyzers are useful for detecting common errors, warnings, typical vulnerabilities, minor mistakes and code style recommendations. Sometimes they point to serious bugs, and they usually handle standard tasks better than humans — and that’s what we are going to check.
Specifics of Smart Contract Code Audit
Smart contract code audit is a tricky area. Although small in size, each Ethereum smart contract is a full-fledged program that can contain complex branching, loops, decision trees, etc. Even a basic smart contract that automates a simple deal requires taking into account every possible branch at each step. From this viewpoint, blockchain development is a low-level task that consumes a lot of effort and resources; it has much in common with system software and firmware development in C/C++ and assembly. Therefore, when interviewing developers, we welcome those familiar with low-level algorithms, network stacks and high-load services, and anyone who has had to deal with low-level optimization and code audit.
From the developer’s viewpoint, Solidity is also quite specific, although almost any programmer can pick it up and the first steps seem very simple. Solidity code is fairly easy to read; it looks familiar to any developer who knows C/C++ syntax and the basics of object-oriented programming, for instance in JavaScript.
Code efficiency is crucial for the survival of a smart contract, which is why blockchain developers actively use low-level development tricks and design algorithms that use resources efficiently and save memory: Merkle trees, Bloom filters, “lazy” loading of resources, loop unrolling, garbage collection, etc.
The small size of source code and output byte-code
Smart contract bytecode size is capped: since EIP-170, deployed contract code may not exceed 24,576 bytes (24 KB), and the deployment transaction must also fit within the block gas limit. That is not much room to work with. Here is an article about gas prices and smart contract deployment costs: https://hackernoon.com/costs-of-a-real-world-ethereum-contract-2033511b3214. With only a few dozen short functions to work with, a smart contract cannot afford aggregation methods, additional structures (like indexes) or complex logic. For these purposes, a developer creates separate libraries, sets up a system of contracts and adds new steps to the deployment process. The only way to put a lot of code on the blockchain is to split it into separate contracts and libraries, with the contracts keeping their own dedicated storage. These units, in turn, can be conveniently placed into separate files. As a result, thanks to this neat structure, contract code is fairly pleasant to read. There is simply no other way to build a usable contract system. Have a look at a nice example, the ERC721 token contracts in openzeppelin-solidity: https://github.com/OpenZeppelin/openzeppelin-solidity/tree/master/contracts/token/ERC721.
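A quick way to see how close a contract is to the size cap is to measure its compiled bytecode. The sketch below is a minimal illustration, assuming a Truffle-style build artifact (a JSON object with `contractName` and `deployedBytecode` fields); the helper names `bytecodeSize` and `checkCodeSize` are hypothetical, and the 200 gas per byte figure is only the code deposit part of deployment, not the full cost.

```javascript
// Minimal sketch: measure compiled bytecode against the EIP-170 size cap.
// Assumes a Truffle-style artifact object with `contractName` and
// `deployedBytecode` ("0x"-prefixed hex) fields.

// EIP-170: maximum size of deployed contract code, in bytes.
const MAX_CODE_SIZE = 24576;
// Gas charged per byte of code stored at deployment (code deposit only;
// the 32,000 gas creation cost and calldata costs come on top).
const CODE_DEPOSIT_GAS_PER_BYTE = 200;

// Number of bytes encoded by a hex string, with or without "0x".
function bytecodeSize(hex) {
  const body = hex.startsWith("0x") ? hex.slice(2) : hex;
  if (body.length % 2 !== 0) throw new Error("odd-length hex string");
  return body.length / 2;
}

// Summarizes how close a compiled contract is to the size cap.
function checkCodeSize(artifact) {
  const size = bytecodeSize(artifact.deployedBytecode);
  return {
    contract: artifact.contractName,
    size,
    withinLimit: size <= MAX_CODE_SIZE,
    depositGas: size * CODE_DEPOSIT_GAS_PER_BYTE,
  };
}
```

In a Truffle project, something like `checkCodeSize(require("./build/contracts/Booking.json"))` would apply it to a compiled artifact; the path is an assumption based on Truffle's default build directory.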
Gas, gas, gas
Gas adds an extra layer of logic to contract code execution, and it certainly requires auditing. Moreover, the same code sequence may consume a different amount of gas depending on contract state: for example, writing to a storage slot for the first time costs more than overwriting it. The EVM opcode table helps grasp the idea of gas costs: https://github.com/trailofbits/evm-opcodes.
To demonstrate why assessing gas costs is so time-consuming, consider the following pseudocode snippet:
// function records the event code on the blockchain
function fixSomeAccountAction(uint _actionId) public onlyValidator {
    // …
    events[msg.sender].push(_actionId);
}

// the user calls the function that sums up the rewards
// and pays them out for every type of action
function receivePaymentForSavedActions() public {
    // …
    for (uint256 i = 0; i < events[msg.sender].length; i++) {
        // take actionId from the array
        uint actionId = events[msg.sender][i];
        // calculate the action reward
        uint payment = getPriceByEventId(actionId);
        if (payment > 0) {
            paymentAccumulators[msg.sender] += payment;
        }
        emit LogEventPaymentForAction(msg.sender, actionId, payment);
        // …
        // delete "events[msg.sender][i]" from array
    }
}
The problem is that the loop runs events[msg.sender].length times, and each iteration writes to the blockchain: the paymentAccumulators update stores new balances (as a transfer() would), and emit() saves a log event. If the array is small, the loop runs a few dozen times and everything goes fine. However, a large events[msg.sender] array requires many iterations, and the gas consumed hits the block gas limit (about 8M at the time of writing). The transaction then fails, and there is no way to get it through, since the contract offers no way to shrink the events[msg.sender] array. If, apart from computing a value, a loop writes to the blockchain on each iteration (e.g. to pay fees, commissions, etc.), the number of iterations it can afford is strictly limited. For example, storing a new 256-bit value costs 20K out of the 8M gas limit. Put plainly, you can only store a few hundred new 256-bit values (for example, addresses with some metadata) in a single transaction.
Note also that updating already existing data costs only 5K, so, for example, “transferring” tokens to an account that already holds tokens is four times cheaper (5K vs. 20K gas per write).
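The arithmetic behind these limits is simple enough to sketch. The hypothetical helper below divides the gas left after the fixed 21,000 gas transaction cost by a per-iteration cost. It uses the 8M limit and the 20K/5K write costs quoted above and ignores the SLOADs, event logs, calldata and other work each iteration also pays for, so a real loop fits fewer iterations than it reports.

```javascript
// Rough upper bound on loop iterations that fit into one transaction.
// The 8M block gas limit and the 20K/5K SSTORE costs are the figures quoted
// in the text; pass other values to model a different gas schedule.
const TX_BASE_GAS = 21000; // intrinsic cost of any transaction

function maxIterations(blockGasLimit, gasPerIteration, fixedOverhead = TX_BASE_GAS) {
  if (gasPerIteration <= 0) throw new RangeError("gasPerIteration must be positive");
  const available = blockGasLimit - fixedOverhead;
  return available > 0 ? Math.floor(available / gasPerIteration) : 0;
}

const BLOCK_GAS_LIMIT = 8000000;
const SSTORE_NEW = 20000;   // writing a previously zero storage slot
const SSTORE_UPDATE = 5000; // overwriting a non-zero slot

console.log(maxIterations(BLOCK_GAS_LIMIT, SSTORE_NEW));    // 398
console.log(maxIterations(BLOCK_GAS_LIMIT, SSTORE_UPDATE)); // 1595
```

Passing the overhead and per-iteration cost as parameters keeps the estimate adjustable: adding the cost of the event and the reads to `gasPerIteration` gives a more conservative bound.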
No surprise that the gas issue is closely connected to contract security: for the owner of the funds, there is little difference between funds stuck in a contract forever and funds stolen. Given that the ADD instruction costs 3 gas versus 20,000 gas for SSTORE (saving to storage), storage is the most expensive resource on the blockchain, and contract code optimization has much in common with low-level C and assembly development for embedded systems, which also work with limited storage.
Wonderful blockchain
From a smart contract auditor’s point of view, blockchain technology is good for security. The deterministic nature of smart contract code makes debugging, as well as reproducing bugs and vulnerabilities, straightforward. Technically, any contract function call can be fully reproduced on any platform at any time, which makes tests easy to write and maintain and allows reliable, indisputable investigation of incidents. We know who called the function, with which parameters, which code processed it, and what the result was. The algorithm is purely deterministic, i.e. it can be reproduced anywhere, even on a web page in JavaScript. In Ethereum, you can easily write any test case in JavaScript, include fuzzed parameters, and it will run anywhere on Node.js.
Sounds great, doesn’t it? However, we should still remember that the most common critical bugs involve contract logic, and determinism, while a very good feature, is orthogonal to that.
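To illustrate the point about reproducible JavaScript tests, here is a minimal sketch of seeded fuzzing on Node.js (any version with BigInt support). It assumes no particular test framework: `mulberry32` is a small, well-known seeded PRNG, and `checkedAdd` is a hypothetical JavaScript model of a SafeMath-style uint256 addition, not contract code. Because the generator is seeded, a failing case can be replayed exactly by rerunning with the same seed.

```javascript
// Minimal sketch of reproducible (seeded) fuzzing on Node.js.
// `checkedAdd` is a hypothetical JS model of a SafeMath-style uint256 add,
// used only to demonstrate the technique; it is not contract code.

// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const MAX_UINT256 = (1n << 256n) - 1n;

// Builds a random uint256 from eight 32-bit chunks.
function randomUint256(rng) {
  let x = 0n;
  for (let i = 0; i < 8; i++) {
    x = (x << 32n) | BigInt(Math.floor(rng() * 4294967296));
  }
  return x;
}

// Hypothetical checked addition: reverts (throws) on uint256 overflow.
function checkedAdd(a, b) {
  const sum = a + b;
  if (sum > MAX_UINT256) throw new RangeError("uint256 overflow");
  return sum;
}

// Property: checkedAdd returns the exact sum, or throws exactly when it overflows.
function addProperty(a, b) {
  try {
    return checkedAdd(a, b) === a + b;
  } catch (e) {
    return e instanceof RangeError && a + b > MAX_UINT256;
  }
}

// Runs `runs` random cases from `seed`; returns the first failure or null.
// Rerunning with the same seed replays exactly the same inputs.
function fuzz(property, seed, runs) {
  const rng = mulberry32(seed);
  for (let i = 0; i < runs; i++) {
    const a = randomUint256(rng);
    const b = randomUint256(rng);
    if (!property(a, b)) return { seed, run: i, a, b };
  }
  return null;
}

console.log(fuzz(addProperty, 42, 200)); // null: no counterexample found
```

A property that forgets about overflow, such as `(a, b) => a + b <= MAX_UINT256`, fails within a few runs, and the returned `seed` and `run` are enough to reproduce the exact failing inputs on any machine.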
Contract compilation environment
For this article, I chose an old trial contract for booking accommodation made for the Smartz platform: https://github.com/smartzplatform/constructor-eth-booking. The contract lets you create a record for an object (an apartment or a hotel room) and set the booking price and dates. Then, once payment is received, the contract registers the booking and holds the funds until the guest checks in and confirms their arrival. At that point, the owner of the room receives the payment. In effect, the contract is a state machine whose states and transitions can be seen in Booking.sol. We wrote it quite quickly and did not have time for thorough testing. Its compiler version is outdated, but the logic is reasonably sound. Let’s see how the analyzers handle it and whether they find any errors, and if they don’t, we will add some.
Use of different solc versions
Different analyzers use the Solidity compiler in different ways. One tool uses a Docker image, another works with already compiled bytecode, and an auditor has to deal with many different contract sets and compiler versions. So you need to be able to switch the solc version easily in the host system, in a Docker image or in a Truffle environment. Here are some dirty hacks:
1. Within Truffle
No sweat. Starting with Truffle 5.0.0, you can specify the compiler version right in truffle.js, as in this commit: https://github.com/smartzplatform/constructor-eth-booking/commit/62b0628b60de53e9267426ee92dae423878bd852
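A minimal truffle.js sketch of this setting follows; the linked commit may contain more, and any existing `networks` section of the project should be kept as it is. The `compilers.solc.version` key is where Truffle 5 looks for the solc release to use.

```javascript
// truffle.js (Truffle >= 5.0.0): pin the solc release used by `truffle compile`.
// Merge this into the project's existing config; keep any `networks` section.
module.exports = {
  compilers: {
    solc: {
      version: "0.4.20", // Truffle fetches this release if it is not available yet
    },
  },
};
```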
Truffle then downloads the required compiler and runs it. My compliments to the Truffle team: Solidity is a young language, and breaking changes happen quite often. Unlike developers, auditors cannot simply migrate a contract to a newer compiler version, because doing so can introduce new bugs and mask existing ones.
2. Replacing /usr/bin/solc in the analyzer Docker container
When an analyzer is distributed as a Dockerfile, you can swap the compiler at the point where the Docker image is built. Add a line to the Dockerfile that copies the required solc binary from the official ethereum/solc image over /usr/bin/solc:
COPY --from=ethereum/solc:0.4.19 /usr/bin/solc /usr/bin
3. Replacing /usr/bin/solc in current system
The dirtiest trick. When there is no alternative, you can sneakily replace the /usr/bin/solc binary with a script that runs the required version of solc directly from a Docker image (don’t forget to back up the original file!):
#!/bin/bash
# Run the Solidity compiler of the given version, passing all parameters through.
# Usage example: SOLC_DOCKER_VERSION=0.4.20 solc --version
SOLC_DOCKER_VERSION="${SOLC_DOCKER_VERSION:-0.4.24}"
# --rm removes the container after each run; the current directory is
# mounted as /project and used as the working directory.
docker run --rm \
    --entrypoint "" \
    --tmpfs /tmp \
    -v "$(pwd)":/project \
    -v "$(pwd)/node_modules":/project/node_modules \
    -w /project \
    "ethereum/solc:$SOLC_DOCKER_VERSION" \
    /usr/bin/solc \
    "$@"
The script pulls and caches a Docker image with the required solc version, mounts the current working directory into the container and runs /usr/bin/solc there with the parameters it received. It is not an elegant approach, but it lets you swap the compiler and its version easily without making serious changes to the host system.
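Before swapping compilers, it helps to check that the version you are about to use actually satisfies the contract’s `pragma solidity` line. The helpers below are a hypothetical sketch: they handle only exact (`0.4.20`) and caret (`^0.4.20`) constraints, throw on anything else, and do not replace the compiler’s own pragma check.

```javascript
// Minimal sketch: does a given solc version satisfy a contract's pragma?
// Handles only exact ("0.4.20") and caret ("^0.4.20") constraints;
// anything else (e.g. ">=0.4.22 <0.6.0") makes parseVersion throw.

// Returns the constraint from the first `pragma solidity ...;` statement.
function pragmaConstraint(source) {
  const m = source.match(/pragma\s+solidity\s+([^;]+);/);
  return m ? m[1].trim() : null;
}

// "0.4.20" -> [0, 4, 20]
function parseVersion(v) {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v.trim());
  if (!m) throw new Error(`unsupported version or constraint: ${v}`);
  return m.slice(1).map(Number);
}

// Negative, zero or positive, like a sort comparator.
function compareVersions(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

function satisfies(version, constraint) {
  const v = parseVersion(version);
  if (!constraint.startsWith("^")) {
    return compareVersions(v, parseVersion(constraint)) === 0;
  }
  const base = parseVersion(constraint.slice(1));
  // Caret: ^0.4.20 means >=0.4.20 <0.5.0; ^1.2.3 means >=1.2.3 <2.0.0.
  // (The npm special case for ^0.0.x is ignored here.)
  const upper = base[0] === 0 ? [0, base[1] + 1, 0] : [base[0] + 1, 0, 0];
  return compareVersions(v, base) >= 0 && compareVersions(v, upper) < 0;
}

console.log(satisfies("0.4.24", "^0.4.20")); // true
console.log(satisfies("0.5.0", "^0.4.20"));  // false
```

For example, `satisfies("0.4.24", pragmaConstraint(source))`, where `source` is contract text you have read yourself, tells you whether the script’s default SOLC_DOCKER_VERSION would be accepted by that contract’s pragma.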
Flattening code
Now let’s tackle the source code. In theory, automated tools (especially static source analyzers) are supposed to build a contract, import all dependencies, merge everything together and analyze it. However, as I mentioned before, changes between compiler versions can be major. Again and again, I had to add a new directory to a Docker container and configure paths inside it just to make the tool load the required imports correctly. Some analyzers handle this, but some cannot. For analyzers that accept a single file, a universal way to avoid creating extra directories is to merge all the sources into one file and analyze that.
Use truffle-flattener for that: https://github.com/nomiclabs/truffle-flattener.
This is a standard easy-to-use npm module:
truffle-flattener contracts/Booking.sol > contracts/flattened.sol
If you need customized flattening, you can write your own flattener. For instance, we sometimes use a Python-based variant: https://github.com/mixbytes/solidity-flattener
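To show what flattening amounts to, here is a deliberately simplified sketch that works on an in-memory map of files. It is not how truffle-flattener or the MixBytes flattener are implemented: it follows only plain `import "path";` statements (no aliases, no `import {A} from`, no node_modules lookup), emits each file once with dependencies before the files that import them, and keeps a single `pragma solidity` line, assuming all files agree on the compiler version. The names `flatten` and `resolvePath` are hypothetical.

```javascript
// Deliberately simplified flattener sketch over an in-memory file map
// ({ path: source }). Follows only plain `import "path";` statements and
// assumes all files share one compatible `pragma solidity` line.

const IMPORT_RE = /^\s*import\s+["']([^"']+)["']\s*;\s*$/gm;
const PRAGMA_RE = /^\s*pragma\s+solidity\s+[^;]+;\s*$/gm;

// Resolves "./X.sol" or "../X.sol" relative to the importing file.
function resolvePath(fromFile, importPath) {
  if (!importPath.startsWith(".")) return importPath; // used as-is
  const parts = fromFile.split("/").slice(0, -1);
  for (const seg of importPath.split("/")) {
    if (seg === "..") parts.pop();
    else if (seg !== ".") parts.push(seg);
  }
  return parts.join("/");
}

function flatten(entry, files) {
  const seen = new Set();
  const chunks = [];
  let pragma = null;

  // Depth-first post-order: every dependency is emitted before its importer,
  // and each file is emitted once even if it is imported several times.
  function visit(file) {
    if (seen.has(file)) return;
    seen.add(file);
    const src = files[file];
    if (src === undefined) throw new Error(`missing source: ${file}`);
    for (const m of src.matchAll(IMPORT_RE)) visit(resolvePath(file, m[1]));
    const found = src.match(PRAGMA_RE);
    if (found && pragma === null) pragma = found[0].trim();
    const body = src.replace(IMPORT_RE, "").replace(PRAGMA_RE, "").trim();
    chunks.push(`// File: ${file}\n${body}`);
  }

  visit(entry);
  // Keep a single pragma (from the first file emitted) at the top.
  return [pragma, ...chunks].filter(Boolean).join("\n\n") + "\n";
}
```

Called as `flatten("contracts/Booking.sol", files)`, it returns one source string that a single-file analyzer can consume; a real tool additionally has to read files from disk and resolve package imports.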
Compile and run tests
Let’s stick with the good old sample https://github.com/smartzplatform/constructor-eth-booking and continue with the analysis. The contract uses an old compiler version, “0.4.20”. I deliberately took an obsolete contract to illustrate the compiler issue. To make things worse, an automated analyzer (e.g. one that works with bytecode) may depend on a particular solc version, and version differences can seriously affect its results or even make it fail completely. So even if you always update everything to the latest versions, you still risk running into an analyzer designed for an older compiler version.
Test compilation and launch
For starters, let’s just clone the project from GitHub and try to compile it:
git clone https://github.com/smartzplatform/constructor-eth-booking.git
cd constructor-eth-booking
npm install
truffle compile
You will almost certainly run into compiler errors here. Automated analyzers run into them too, so do whatever it takes to get a 0.4.20-compatible compiler and build the project. I simply specified the required compiler version in truffle.js and, ta-da, success!
Also run:
truffle-flattener contracts/Booking.sol > contracts/flattened.sol
As I mentioned in the “Flattening code” section, contracts/flattened.sol is the file we will analyze.
Conclusion
Now that we have a flattened.sol file and can run an arbitrary version of solc, we can proceed to the analysis. I will skip the problems with running Truffle and tests; there is plenty of documentation on the subject that you can sort out yourself. Without a doubt, the tests must be run and must pass. Moreover, the auditor often has to add their own tests to check the logic for potential vulnerabilities: for example, checking contract behavior at array boundaries or testing all variables, even those intended strictly for data storage. There are many more such recommendations; in fact, this is the main service our company provides, since logic auditing is a purely human task.
We will examine the most interesting analyzers to see how they handle our contract with “fake” vulnerabilities we added on purpose. The next article deals with Slither, while our general plan is as follows:
Part 1. Introduction. Compilation, Flattening, Solidity Versions (this article)
Part 2. Slither
Part 3. Mythril
I compiled this list of analyzers because an auditor must be able to juggle various approaches to conduct different types of analysis, static and dynamic. Our task is to learn the basic tools to use for each type of analysis. As the research goes deeper, I may add new candidates for review or change the order of the articles, so stay tuned.
If you have a specific question regarding the use of automated smart contract security tools for your project, feel free to contact us on our website or via Telegram.