Ethereum: Why the code needs to be improved
Code is the basis of all blockchain technologies. With poor code, using DLT systems can even be dangerous. That is why we, the Institute for Crypto Code Review, analyzed the Ethereum codebase with a new method: we used specialized algorithms to determine the quality of the code and to detect errors. Despite its popularity, Ethereum has several code issues and needs to be improved. Our analysis also reveals that new releases of Ethereum have been a step backwards in terms of code quality.
Authors: Christian Flasshoff, Philipp Sandner
The Institute for Crypto Code Review (ICCR) analyzes the underlying codebase (e.g. C++, Java) of cryptocurrencies and other DLT solutions. We determine a score for the quality of the code that is easy to understand and comparable across projects. We also point out potential risks and possibilities to improve the codebase. In the case of Ethereum, we analyzed over 30,000 lines of code and identified issues or possible improvements in over 25% of the code. In most cases the issues were minor and not critical, but there are also sections of the code that require improvements by developers. The institute aims to provide transparency and insights from reviewing the code of cryptocurrencies, and to support investors, analysts and other blockchain enthusiasts in assessing crypto projects.
Before jumping to the review of the Ethereum code, let us take a look at the blockchain technology itself. If you are already familiar with the concept of Ethereum, feel free to skip the next section.
The Ethereum technology
Ethereum is a blockchain solution that goes far beyond the function of a cryptocurrency. The open-source platform allows developers to build and run custom decentralized applications. Similar to other blockchain solutions, the Ethereum network is decentralized and consists of many individual computers (nodes), which communicate with each other using the Ethereum protocol. The interactions on the Ethereum network are cryptographically secured, which means that private data remains private and that transactions and written data are protected against undesired modification. The Ethereum network is based on consensus, which means that the authenticity of transactions or data has to be verified by several nodes before it is permanently stored on the blockchain. This verification is done by miners, who provide computing power and receive tradable crypto tokens (Ethers) in return.
One major advantage over normal cryptocurrencies is the use of smart contracts. Smart contracts can be described as computer code that can automatically execute the exchange of money, content, property, shares, or anything of value[i]. With smart contracts, complex transactions, like an amortizing loan, can be settled with only a few lines of code. Both parties can have confidence in the fulfilment of the contract, since its execution is automated and predefined once the smart contract has been signed electronically. The smart contracts run on the blockchain and are therefore secured against fraud, downtime or any third-party interference. This feature of Ethereum is very powerful and clears the path for numerous new applications, which make economic transactions and operations more efficient and reliable.
Ethereum is a first layer blockchain, and all miners are rewarded with the native token (Ether) for validating transactions. Developers can also issue tokens, which are built on top of the existing Ethereum blockchain and extend the Ethereum protocol to specialized tasks.[ii]
Review of the Ethereum codebase
Ethereum is equipped with many functions, all of which are defined in the Ethereum code. The code is the backbone of the Ethereum software, which runs on every node of the network. Therefore, the quality and the security of this codebase are crucial for the long-term success of the Ethereum project. We used a new approach to analyze the quality of the code: with the help of a software tool, we automatically analyzed the C++ codebase of Ethereum and determined its quality. Since the Ethereum code is published under the open source GPL 3 license, the codebase is easily accessible on GitHub. [iii]
The specialized algorithms of the system automatically assign each cryptocurrency a score on a scale from -5 (worst score) to +5 (best score). This classification makes it easy to interpret the results and to compare them with other blockchain solutions. Technical remark: Our analysis is based on the Ethereum code version from 2018–03–13 and draws comparisons to an earlier release from 2018–02–15.
Analysis: Ethereum scores only 1.69 on a [-5,+5] scale
Most of you probably want to know where Ethereum ranks on this scale. The second largest cryptocurrency scores only 1.69, which leaves room for further improvement. In the remainder of this article we analyze this number in more detail. The score can be subdivided into four categories: design, metrics, duplications and code issues. For each category, an individual score is calculated, which helps to identify the origin of the issues. Figure 2 visualizes all of the subcategories for the Ethereum version from 2018–03–13. In the following, each category is explained in detail and the results are interpreted. This part is a little more technical but should still be accessible to non-coders.
Design issues. This category analyzes the design of the codebase. Good code design is characterized by an efficient structure that is easy to follow. Even though well-designed and less well-designed code might provide the same functionality, it is desirable to write code that every programmer understands easily. The quality of code design can be analyzed automatically with the help of algorithms, which includes the detection of anti-patterns. Anti-patterns are sections of code that appear to work but are not optimally constructed. They usually arise over time, when new functionality is added or when different developers contribute to the code. Anti-patterns may result in errors and make maintenance of the code very difficult. There are several types of anti-patterns. One example is the God class (or Monster class), which combines many, often unrelated, functionalities in a single object. The result is very complex code that tries to solve a large problem at once instead of breaking it into several smaller problems. Consequently, refactoring becomes harder, and even small changes in the God class require system-wide tests of their effects. There are several other anti-patterns, but explaining all of them would be beyond the scope of this article.
More important are the results of the design analysis. The Ethereum code contains a total of 157 anti-patterns. Most of the design issues are found in the “libethereum” component (59), followed by the “libp2p” (22) and “libdevcrypto” (17) components. The evaluation of the design issues concludes with a score of 2.29. Figure 3 shows an example of anti-patterns within the “libethereum” component. It is also important to mention that the score decreased compared to an earlier version of the Ethereum code: the 2018–02–15 snapshot of the codebase shows only 50 anti-patterns and achieves a score of 4.18 in the design category.
We can therefore conclude that the design quality of the Ethereum codebase declined between the two snapshots.
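To make the God class anti-pattern described above more concrete, here is a minimal sketch. It is written in Python for brevity, although the analyzed codebase is C++. All class and method names are hypothetical and do not come from the Ethereum code, and SHA-256 is used only as a stand-in hash function.

```python
import hashlib


class GodNode:
    """Anti-pattern: peer handling, hashing and storage all live in one class."""

    def __init__(self):
        self.peers = []
        self.blocks = {}

    def add_peer(self, address):
        self.peers.append(address)

    def digest(self, payload: bytes) -> str:
        return hashlib.sha256(payload).hexdigest()

    def store_block(self, payload: bytes) -> str:
        key = self.digest(payload)
        self.blocks[key] = payload
        return key


# Refactored: each class has one responsibility and can be tested in isolation.
class PeerList:
    def __init__(self):
        self.peers = []

    def add(self, address):
        self.peers.append(address)


class Hasher:
    @staticmethod
    def digest(payload: bytes) -> str:
        return hashlib.sha256(payload).hexdigest()


class BlockStore:
    def __init__(self, hasher: Hasher):
        self.hasher = hasher
        self.blocks = {}

    def store(self, payload: bytes) -> str:
        key = self.hasher.digest(payload)
        self.blocks[key] = payload
        return key
```

Both variants behave the same from the outside. The difference is that a change to, say, the hashing logic in the refactored version touches only one small class, whereas in the God class it may affect every feature that shares the object.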
Metric violations. The next category tracks the quality of the code with software metrics. One such metric is “number of methods” (NOM), which counts the total number of methods (functions) in one class. A higher number of methods generally makes the code more complex and increases the risk of errors. Other examples are “lack of cohesion in methods” (LCOM), which measures the cohesiveness of a class, and “access to foreign data” (ATFD), which measures how often a class accesses attributes of other classes. To determine the quality of the code, the system we used reports every case in which a metric exceeds an undesirable threshold and calculates a score for metric violations. The Ethereum code shows a total of 968 metric violations, which translates into a score of 0.47. Compared to the earlier release of the Ethereum code (2018–02–15), which had a metric score of 2.32, the quality of the code decreased between the two releases.
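The idea of a threshold-based metric check can be sketched in a few lines. The example below computes NOM for Python source using the standard `ast` module; it is an illustration only, not the tool used for this analysis, which works on C++. The threshold value and the sample classes are hypothetical.

```python
import ast

NOM_THRESHOLD = 3  # hypothetical threshold, chosen only for this example


def number_of_methods(source: str) -> dict:
    """Return {class name: number of methods defined directly in the class}."""
    counts = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            counts[node.name] = sum(
                isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
                for item in node.body
            )
    return counts


def metric_violations(source: str, threshold: int = NOM_THRESHOLD) -> list:
    """List the classes whose NOM exceeds the threshold."""
    return sorted(
        name for name, nom in number_of_methods(source).items() if nom > threshold
    )


SAMPLE = '''
class Small:
    def a(self): pass

class Large:
    def a(self): pass
    def b(self): pass
    def c(self): pass
    def d(self): pass
'''

print(number_of_methods(SAMPLE))  # {'Small': 1, 'Large': 4}
print(metric_violations(SAMPLE))  # ['Large']
```

A real analyzer would apply many such metrics at once and count every exceeded threshold as one violation.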
Duplications of code. As the name implies, this category searches for duplicated code. Duplicated code is usually undesirable, since it can increase the number of lines of code, lower performance or increase the vulnerability of the software. The Ethereum codebase achieves a desirable score of 4.12, and only 0.92% of the code is duplicated. This result is very similar to the 2018–02–15 version of Ethereum, which had a score of 4.33.
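One simple way to estimate a duplication share is to slide a fixed-size window over the code and look for windows that occur more than once. The sketch below follows that idea. It is an assumption about how such a check could work, not a description of the tool we used; the window size and the sample lines are hypothetical, and every copy of a repeated window (including the first) is counted as duplicated.

```python
from collections import defaultdict

WINDOW = 3  # hypothetical minimum clone length, in lines


def duplicated_share(lines, window=WINDOW):
    """Return the fraction of non-blank lines that belong to a repeated window."""
    code = [line.strip() for line in lines if line.strip()]
    seen = defaultdict(list)
    for start in range(len(code) - window + 1):
        seen[tuple(code[start:start + window])].append(start)

    duplicated = set()
    for starts in seen.values():
        if len(starts) > 1:  # this exact window appears at least twice
            for start in starts:
                duplicated.update(range(start, start + window))
    return len(duplicated) / len(code) if code else 0.0


SAMPLE = """
a = load()
b = parse(a)
save(b)
log("done")
a = load()
b = parse(a)
save(b)
""".splitlines()

print(f"{duplicated_share(SAMPLE):.1%}")
```

In the sample, six of the seven non-blank lines belong to the repeated three-line block, so the share is high. Production tools typically normalize identifiers and whitespace more aggressively before comparing.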
Code Issues. The last category focuses on code issues. In contrast to design issues, code issues apply only to a local part of the code. Depending on the characteristics of an issue, its impact on the performance of the software may vary. Therefore, it is important to classify the implications of the detected code issues. The algorithm classifies each code issue as low, medium, high or critical. Within the Ethereum codebase, there are 314 code issues. The majority of the issues (90%) are classified as low or medium, and many are located in the “libevm” component. An example of a low-severity code issue is an unused label within the code. Such an unused label does not interfere with the correct functioning of the software but could be removed to make the code more compact. The remaining 10% of code issues fall into the high or critical categories. Most of these issues are located in the “utils” and “libethash” components of the Ethereum code. Figure 4 shows examples of critical code issues in the “libethash” component. High and critical code issues do not necessarily lead to a malfunction of the software, but they increase the risk of undesired behavior. Software with fewer code issues tends to run more stably, which should be the goal of every programmer. Code issues are common in computer programming and are part of the development process. Nevertheless, removing them is necessary to improve the software. It is important to mention that the Ethereum code receives a score of only -0.05 in the category of code issues. Even though this score improved slightly compared to the earlier release (2018–02–15), further improvements are definitely necessary.
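With all four category scores reported, the numbers can be put side by side. The sketch below uses only the figures stated in this article. How the tool aggregates the category scores into the overall score is not described here, so the unweighted mean is purely an assumption for comparison. No earlier code-issues score is given in the article, so that category is left out of the release comparison.

```python
# Category scores as reported in this article.
SNAPSHOT_2018_03_13 = {
    "design": 2.29,
    "metrics": 0.47,
    "duplications": 4.12,
    "code issues": -0.05,
}
# Earlier snapshot; the article states no earlier score for code issues.
SNAPSHOT_2018_02_15 = {"design": 4.18, "metrics": 2.32, "duplications": 4.33}


def unweighted_mean(scores: dict) -> float:
    """Assumed aggregation: a plain average of the category scores."""
    return sum(scores.values()) / len(scores)


def changes(earlier: dict, current: dict) -> dict:
    """Score change per category, for categories present in the earlier snapshot."""
    return {name: round(current[name] - earlier[name], 2) for name in earlier}


print(f"{unweighted_mean(SNAPSHOT_2018_03_13):.4f}")
print(changes(SNAPSHOT_2018_02_15, SNAPSHOT_2018_03_13))
```

The plain average comes out at about 1.71, close to but not identical to the reported overall score of 1.69, which suggests the tool weights the categories somewhat differently. The per-category changes show that the design and metrics scores fell by almost two points each, while the duplication score moved only slightly.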
Summary: Ethereum code needs to be improved
We, the Institute for Crypto Code Review, automatically analyzed the Ethereum code, and the algorithms revealed several areas that should be improved. The system classified the results into hotspots according to the urgency of the issue. As Figure 5 shows, almost 5% of the total Ethereum code is flagged as “critical” and needs further attention from developers. The problem can be narrowed down to the “libethereum” component, where over 1,200 lines of code are affected. Other components also contain hotspots with the classification “high”. Even though the code seems to function correctly, these parts should be reviewed by developers as well. Since Ethereum is open-source, many different programmers can contribute to the project. This makes it difficult to develop code with a consistent design and to avoid anti-patterns. The analysis also reveals a decrease in code quality between two releases of the Ethereum code. Therefore, a new version of Ethereum does not necessarily mean an improvement in code quality. There is also good news: over 20,000 lines of code have no issues and demonstrate high quality.
In the future it will be interesting to monitor new developments of the Ethereum code and to see whether its quality improves. In addition, a comparison of these results with other blockchain solutions is necessary. The Institute for Crypto Code Review will follow the development of Ethereum and extend the analysis to other blockchain solutions in the future.
If you like this article, we would be happy if you forward it to your colleagues or share it on social networks.
The Institute for Crypto Code Review (ICCR) analyzes the underlying codebase (e.g. C++, Java) of cryptocurrencies and other DLT solutions. For example, we determine a score for the quality of the code that is easy to understand and comparable across projects. We also point out potential risks and possibilities to improve the codebase. The institute aims to provide transparency and insights from reviewing the code of cryptocurrencies, and to support investors, analysts and other blockchain enthusiasts in assessing crypto projects. The ICCR was founded in 2018 in Germany.
Prof. Dr. Philipp Sandner is head of the Frankfurt School Blockchain Center. You can contact him via mail (firstname.lastname@example.org), via LinkedIn (https://www.linkedin.com/in/philippsandner/) or follow him on Twitter (@philippsandner).
Christian Flasshoff is a research fellow at the Frankfurt School Blockchain Center and an alumnus of the Frankfurt School of Finance & Management. You can connect with him on LinkedIn (www.linkedin.com/in/christian-flasshoff) or contact him via mail (email@example.com).
Disclaimer: The results shown in this paper are based upon an automatic analysis of the code. Please note that this analysis neither represents financial advice nor is intended to be understood or interpreted as a solicitation to buy or sell any securities, coins or tokens.
[ii] Second layer blockchains: https://medium.com/blockchannel/investing-in-tokens-and-decentralized-business-models-e7629efa5d9b