Quantifying Complexity in Smart Contracts: The Adventures of Uncle Bob and the Clean Code Crew

Offbeat Blog
12 min read · Jun 10, 2024

--

Quantifying code complexity is…complex. When it comes to smart contracts, especially Solidity contracts, it’s even more challenging due to considerations like immutability, composability, and gas optimization. In this post, we look at three metrics historically used to assess the complexity and risk of enterprise applications. With the help of Slither printers, we will apply these complexity measurements to some well-known Solidity codebases: OpenZeppelin, Solmate, and UniswapV3.

Alright I’ll bite, what’s a Slither printer?

Before we talk about complexity, let’s talk briefly about Slither printers. Slither printers are data visualization tools that plug directly into the Slither API. They give you the power to slice and dice a codebase any way you want. I often use them to get an overview of a new codebase or when looking for specific information such as a contract’s entry points or storage writes.

The printers themselves are easy to create and offer a lot of flexibility.

Snippet from the modifiers printer.
Run selected printers by passing the `--print` command-line argument.
Results of the “modifiers” printer run on OpenZeppelin’s UUPSUpgradeable contract.

In this post we will use three printers, ck, martin, and halstead, to calculate the metrics on some well-known Solidity codebases. Now that you know what printers are, let’s dive into some complexity metrics.

McCabe’s Cyclomatic Complexity

This article will not focus on McCabe’s cyclomatic complexity, a metric that measures the number of linearly independent paths through a function.

Cyclomatic complexity is commonly used outside of web3; in fact, I would almost call it mainstream. But, in my experience, cyclomatic complexity doesn’t translate well to Solidity smart contracts. Part of the reason is that factors such as inheritance and coupling are more impactful drivers of complexity in Solidity. Additionally, due to design constraints like the Spurious Dragon contract-size limit, smart contract functions tend to be simpler with less branching, so cyclomatic complexities are consistently lower.

We won’t be discussing it here, but if you’re interested in learning more about cyclomatic complexity, there’s plenty of information available. There’s even a Slither printer for it.

Next up, we begin our deep dive into the three metrics highlighted in this article: CK, Martin, and Halstead.

Note: The following discussions are technical and may be a bit dry for some readers. If you’d prefer to skip ahead to the more interesting conclusions, scroll to the Putting it all together section below.

Now, let’s dive into the first set of complexity metrics.

Nothing comes between me and my CK.

The CK metrics were developed in the 90s by Shyam Chidamber and Chris Kemerer. They focus primarily on the structural attributes of classes and their relationships within a software system. The calculated metric values estimate the risk associated with specific parts of a codebase in an attempt to identify potential design flaws, maintainability issues, or areas that require refactoring.

One nice thing about the Slither CK printer is that it provides some additional tables with helpful information about the contracts like number of storage variables and state mutating functions.

The actual CK metrics are found in the last table:

Results of the “ck” printer run on Solmate contracts

Response For a Class (RFC): Number of external functions + number of external contracts called. This metric gives a sense of complexity in terms of interactions and potential responses to external stimuli. For example, the ERC4626 contract has a much higher RFC than the ERC20 contract because it has a lot more external functions.

Number of Children (NOC): Number of immediate child contracts. This metric counts the number of direct descendant contracts.

Depth of Inheritance Tree (DIT): Number of levels of inheritance to the most distant ancestor. The DIT values in the example above tend to be low which is expected since Solmate contracts are generally flatter and don’t offer as many variations. Whereas the DIT values on OpenZeppelin contracts can get as high as 4 or 5. Fun fact: “đít” means “butt” in Vietnamese.

Coupling Between Object Classes (CBO): Number of dependents + number of dependencies. A “dependent” is a contract that calls the target contract, and a “dependency” is a contract that the target contract calls. These are the same ways we will define Martin’s concepts of Afferent Coupling (Ca) and Efferent Coupling (Ce) discussed in the next section.
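To make the arithmetic behind these four values concrete, here is a minimal Python sketch. The counts are hypothetical, not taken from the Solmate table above, and the function name is my own invention rather than anything from the Slither API:

```python
def ck_metrics(external_functions, external_calls, children,
               inheritance_depth, dependents, dependencies):
    """Compute the four CK values from simple per-contract counts."""
    return {
        "RFC": external_functions + external_calls,  # Response For a Class
        "NOC": children,                             # Number of Children
        "DIT": inheritance_depth,                    # Depth of Inheritance Tree
        "CBO": dependents + dependencies,            # Coupling Between Objects
    }

# Example: a token-like contract with 12 external functions that makes
# 3 external calls, has no child contracts, inherits 2 levels deep, and
# has 1 dependent plus 3 dependencies.
print(ck_metrics(12, 3, children=0, inheritance_depth=2,
                 dependents=1, dependencies=3))
# {'RFC': 15, 'NOC': 0, 'DIT': 2, 'CBO': 4}
```

The point is that each CK value is just a sum of things you could count by hand; the printer's job is automating those counts across a whole codebase.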

Inheritance, coupling, and cohesion, the specific attributes measured by the CK metrics, are all directly relevant to smart contract systems. Deeply nested inheritance structures in Solidity can be a primary obstacle to finding security issues. Coupling, via excessive calls to other contracts within the system, can also hinder readability and hide certain categories of bugs. Cohesion, as described by “Uncle Bob” in the Single-responsibility principle, is a key factor used to assess the complexity management category of the Trail of Bits Code Maturity Table.

Speaking of Uncle Bob, let’s talk about the man himself, and the metrics he created.

The Martin Metrics

Uncle Bob on Twitter

Robert Martin, affectionately known as ‘Uncle Bob,’ is a self-taught software engineer who entered the field as a teenager in the 1970s, a time when most software engineers were academics and long before teenage devs in crypto became the norm. Passionate about software craftsmanship, he established many core software design principles you are undoubtedly familiar with today. He was one of the first to advocate for test-driven development (TDD), he created the Single-responsibility principle, and he was a co-author of the Agile Manifesto which, love it or hate it, has been highly influential on the software industry over the past quarter century.

The Slither martin printer uses some of Bob’s main metrics for quantifying complexity. Two of the key components of these metrics are the contracts’ dependents and the dependencies, which he refers to as Afferent Coupling (Ca) and Efferent Coupling (Ce) respectively.

In addition, the Martin metrics include a concept of Abstractness which can be directly applied to abstract contracts in Solidity.

Results of the “martin” printer run on UniswapV3 core contracts

Afferent Coupling (Ca): Number of dependents. A “dependent” is a contract that calls the target contract. Looking at the UniV3 results above, the FullMath library is used by more of the other contracts than any other.

Efferent Coupling (Ce): Number of dependencies. A “dependency” is a contract that the target contract calls. In the example above, the pool contract has no dependents but has 16 dependencies, meaning it calls the other contracts more than it gets called. The results here would change if we included the periphery contracts in the computation.

Instability (I): Ce / (Ce + Ca). The ratio of efferent coupling to total coupling. The more dependencies a contract has, the greater the chance it will have to change in the future because one of its dependencies changed. This metric helps identify which contracts are likely to change most often.

Abstractness (A): Number of abstract contracts / total number of contracts. Junior devs preach DRY (Don’t Repeat Yourself). Senior devs know WET code is based.

Distance from the Main Sequence (D): |A + I - 1|. This one has a bit of an odd name. The goal is to identify contracts that are too far away from the “expected” ratio of abstractness to instability.
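The three derived Martin values fall straight out of the coupling counts. Here is a short Python sketch; the counts are hypothetical (loosely inspired by the pool example above), and the function name is mine, not Slither's:

```python
def martin_metrics(ca, ce, abstract_contracts, total_contracts):
    """Compute Instability, Abstractness, and Distance from the Main Sequence."""
    total_coupling = ce + ca
    i = ce / total_coupling if total_coupling else 0.0  # Instability
    a = abstract_contracts / total_contracts            # Abstractness
    d = abs(a + i - 1)                                  # Distance from the Main Sequence
    return {"I": i, "A": a, "D": d}

# Example: a pool-like contract with 0 dependents and 16 dependencies,
# in a hypothetical repo where 2 of 20 contracts are abstract.
print(martin_metrics(ca=0, ce=16, abstract_contracts=2, total_contracts=20))
# I = 1.0 (maximally unstable), A = 0.1, D ≈ 0.1
```

Note how a contract with no dependents is maximally "unstable" by this definition: nothing depends on it, so it is free (and likely) to change.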

Using coupling as a metric to identify complexity in smart contracts makes sense. Anyone who’s reviewed a contract in VSCode and had to hit f12 (“jump to definition” hotkey) five or six times only to forget where they started in the first place can surely relate to this. Excessive coupling adds complexity and layers of obfuscation.

Pro tip: In VSCode, `Ctrl`+`-` takes you back up a level from where you hit F12.

And finally, the Halstead complexity measures…

If you’ve made it this far, congratulations! You really are a nerd ❤️. We saved the best for last. Well, if not the “best” at least the most interesting.

Introduced by Maurice Howard Halstead in 1977, the goal of the Halstead complexity measures is to identify measurable properties of software and the relationships between them. This is similar to, for example, the identification of physical properties of a gas (volume, mass, and pressure) and the relationships described in the Ideal Gas Law.

It starts by counting the number of operators. In Solidity, operators would be the language’s reserved keywords, symbols, and functions used in the contract (e.g. “if”, “require”, “return”, “+”, “=”). The operands are the objects these operators use. So in a line of code that says return 1; the operator is “return”, and the operand is “1”.

All Halstead metrics are derived from these 4 core statistics of a given program (or smart contract):

  • total operators
  • unique operators
  • total operands
  • unique operands
Results of the “halstead” printer run on Solmate contracts

Core metrics:
𝜂1: the number of distinct operators
𝜂2: the number of distinct operands
N1: the total number of operators
N2: the total number of operands
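Given hand-tokenized operator and operand lists, the four core statistics are simple counts. A minimal sketch (the tokenization itself is done by hand here; a real implementation would walk the AST the way the Slither printer does):

```python
def halstead_counts(operators, operands):
    """Return the four core Halstead statistics from token lists."""
    return {
        "n1": len(set(operators)),  # distinct operators
        "n2": len(set(operands)),   # distinct operands
        "N1": len(operators),       # total operators
        "N2": len(operands),        # total operands
    }

# Hand-tokenized toy example for the two lines `return a + b;` and `return 1;`:
ops = ["return", "+", "return"]
opnds = ["a", "b", "1"]
print(halstead_counts(ops, opnds))
# {'n1': 2, 'n2': 3, 'N1': 3, 'N2': 3}
```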

Basic metrics:
Program vocabulary (𝜂): 𝜂1 + 𝜂2. This is the number of unique operators and unique operands used by a contract: the minimum set of distinct terms a reader must know in order to understand the code.

Program length (N): N1 + N2. This is related to the total length of the code and directly proportional to the SLOC. In the example above, the ERC4626 contract is longer than the ERC20 it inherits from so it makes sense the Program length would be higher.

Estimated program length: 𝜂1 * log2(𝜂1) + 𝜂2 * log2(𝜂2). This is an abstract concept that uses logarithms to reflect the idea that the cognitive effort required to understand a program increases with the number of unique elements but not linearly. It suggests that understanding each additional unique operator or operand adds less to the overall understanding effort as their number increases.

Volume (V): N * log2(𝜂). The Volume metric attempts to measure the informational content of a program and uses logarithm to reflect the human perception of complexity, taking into account both the size and variety of the program’s elements.
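The four basic metrics above can be sketched in a few lines of Python. The input counts here are made up for illustration, not taken from the Solmate table:

```python
import math

def halstead_basic(n1, n2, N1, N2):
    """Compute the basic Halstead metrics from the four core statistics."""
    vocabulary = n1 + n2                                    # eta
    length = N1 + N2                                        # N
    est_length = n1 * math.log2(n1) + n2 * math.log2(n2)    # estimated N
    volume = length * math.log2(vocabulary)                 # V
    return {"vocabulary": vocabulary, "length": length,
            "estimated_length": est_length, "volume": volume}

# Hypothetical contract: 20 distinct operators, 30 distinct operands,
# 120 total operators, 100 total operands.
print(halstead_basic(n1=20, n2=30, N1=120, N2=100))
```

The logarithms are what make Volume grow sub-linearly in vocabulary: doubling the number of unique terms does not double the measured "information content."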

Those are some of the basic Halstead metrics, but there’s more. Hold on to your hats, because this next set of Halstead metrics gets pretty wild.

Results of the “halstead” printer run on Solmate contracts

Extended metrics

Difficulty (D): (𝜂1 / 2) * (N2 / 𝜂2). The Difficulty metric tries to quantify the cognitive complexity involved in understanding the contract. It takes into account the ratio of unique operators to operands and their total occurrences.

Effort (E): D * V. The Effort metric combines the difficulty of the contract with its volume, offering a theoretical measure of the total work required to understand the code.

Time required to program (T): E / 18 (seconds). You still quoting LOEs at 250 sloc / day? That’s a boomer metric, bruh. With Halstead I can tell you exactly how long it will take to review a contract right down to the millisecond! 😉

Example: In the figure above, we see Halstead estimating the time for a review of the ERC1155TokenReceiver contract at 2.0 seconds. Looking at the contract below, it checks out!

And now onto the final Halstead metric, the metric to end all metrics.

Question: When you are reviewing code do you ever get the feeling there’s still more bugs you haven’t found? Well, fear no more, now you can know for sure!

Number of bugs (B): (E^(2/3)) / 3000

This metric uses the Effort (E) to predict potential errors in the contract, employing a formula that scales with the two-thirds power of the effort. This estimates the likely bug count based on the other characteristics. Where do they get the 2/3 and the 3000? Don’t worry about it.
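Chaining the extended formulas together in Python makes the derivation explicit: everything downstream of Difficulty and Volume is just multiplication and division. Again, the input counts are hypothetical:

```python
import math

def halstead_extended(n1, n2, N1, N2):
    """Compute the extended Halstead metrics (D, E, T, B) from the core counts."""
    vocabulary = n1 + n2
    volume = (N1 + N2) * math.log2(vocabulary)   # V
    difficulty = (n1 / 2) * (N2 / n2)            # D
    effort = difficulty * volume                 # E = D * V
    time_seconds = effort / 18                   # T, using Halstead's constant of 18
    bugs = effort ** (2 / 3) / 3000              # B
    return {"D": difficulty, "E": effort, "T": time_seconds, "B": bugs}

# Same hypothetical contract as before: 20/30 distinct, 120/100 total.
print(halstead_extended(n1=20, n2=30, N1=120, N2=100))
```

For these made-up counts, T comes out to roughly 38 minutes and B to a fraction of a bug, which gives a feel for the scale of the numbers in the printer output above.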

I feel the skepticism oozing out of your pores now. Fair. Personally, I think it’s quite interesting, but then again I was the kind of kid who took the time to learn his friend’s made-up secret language back in grade school.

Whether these metrics can actually provide a realistic estimate of the number of bugs in a contract, I will leave to the data scientists. Where I think they can be useful is in comparing one codebase against another, so let’s do that!

Putting it all together

Let’s look at these metrics when applied to the UniswapV3 Pool, Solmate ERC-20, and OpenZeppelin ERC-20 contracts.

These numbers seem to check out. The UniV3 pool contract makes 69 external calls, compared with the two ERC20 contracts, which make none, so we’d expect its RFC to be much higher. Since most of these external calls will take extra time to review, we should increase our assessment of complexity and up the LOE for a security review by a little.

UniV3 is a fully derived contract; no other contracts inherit from it, hence a NOC of 0. The ERC20 contract in the OpenZeppelin repo, by contrast, is the basis for 77 other contracts. This is useful information. We can quickly see that any change to the OZ ERC20 contract will have a major impact on many other contracts and should be carefully considered. Interestingly, it’s almost a self-enforced form of immutability.

The DIT on these three contracts is not bad. In my experience, contracts with a DIT of 5 or 6 can be much more time consuming to review than those in the 0–2 range.

The CK metrics seem valuable. Let’s skip the Martin metrics for these contracts; they are not that illuminating due to the low coupling found in all three. On to the Halstead metrics.

This is also interesting. Both ERC20 contracts have similar metrics, which makes sense since they are implementations of the same contract. The Solmate contract is slightly higher in Effort, Time, and Bugs, which kind of checks out intuitively (more inline assembly used for gas optimization), although personally I find Solmate easier to read.

The Uniswap contract is higher on all metrics, which is not surprising since it’s over 5 times the length. Also, since the UniV3Pool contains more external calls, we would expect a lot more operands and therefore a higher Difficulty.

However the gap between the metrics for the ERC-20 and the pool contracts seems quite wide. While we can all agree that reviewing the UniV3Pool contract would take more time than the simple ERC20’s, is it really 40–50x more? I think not. In this case, it would seem the formulas are either overestimating the difficulty of larger, more complex contracts, or underestimating the simpler ones.

As for the actual time values, they also seem a little off. I could see getting through the ERC20 contracts in half an hour if you were just reading for understanding. But when performing a security review it could easily take ten times this for the first pass. The Uniswap contract on the other hand, would probably take less than 26 hours for a read through, and 2–3 times as long for a security review.

The thing I’m most impressed with is that these values are not completely disconnected from reality; they seem roughly within an order of magnitude. It’s hard to draw conclusions from this one simplistic example, but they do seem able to express some aspects of the complexity found in the contracts.

Final thoughts

When we analyze the complexity of smart contracts, we are inherently addressing security concerns. Complex code obscures logic errors and vulnerabilities. While it might be funny to “predict the number of bugs” in a code base, I think there are legitimate uses for these metrics such as identifying which contracts deserve extra scrutiny during a security review.

I originally developed these printers as a way to assist with determining LOEs on security reviews. It remains to be seen whether or not this will provide any valuable insight long term but it’s something I will be tracking. I’ll be sure to report back in the future if I discover any alpha. 🫡

--

Offbeat Blog

The Offbeat Blog is written by @devtooligan with the goal of presenting novel ideas, experimental methods, and a different side of blockchain security.