Introducing SOLTIX

9 min readMar 20, 2019

We are pleased to announce SOLTIX — a scalable automated framework to test the Solidity smart contract compiler — on github. The research and development of SOLTIX started at the ICE Center, ETH Zurich as the Master’s thesis project of Nils Weller under the supervision of Dr. Petar Tsankov and Prof. Martin Vechev. The project is now supported by the Ethereum Foundation and welcomes contributions from the Ethereum community.

The framework has already found bugs in the official Solidity compiler (solc) and Ganache CLI.

Solidity compiler bugs:

Exponentiation bug - fixed in version 0.4.25.
Internal compiler error bug - fixed in version 0.5.1

Ganache CLI bugs:

Shift and exponentiation crashes - not fixed yet

The framework’s installation and usage, as well as an introduction to technical details are described in the README.

What is it?

SOLTIX is a highly automated testing framework for Ethereum’s Solidity smart contract compiler (solc). It generates and executes Solidity smart contracts (using truffle/Ganache CLI). As part of this process, SOLTIX checks for unexpected smart contract behavior that may indicate mis-compilations and/or execution errors.

How does it work?

Given a few input parameters, the framework can randomly generate smart contract code — producing complex combinations of variables and computations on them. It is possible to produce entirely new smart contracts, and to transform existing smart contracts to new variants. Existing real-world smart contracts can thus be used for testing as well. Three techniques are applied to ensure that generated code has known semantics against which a compiled smart contract can be validated.

Testing via contract synthesis

The first technique is a limited form of differential testing: the framework can produce a type of smart contract that is interpreted during its generation and annotated with built-in correctness checks to verify whether the interpretation produced the same values as the compiled program. An expression evaluator simulates Solidity computations to achieve this. This results in comparatively strong correctness indications, as the framework precisely validates computations against expectations (a major caveat lies in the possibility that the expression evaluator may also simulate incorrect behavior). An example code fragment from this type of contract is shown below.

function __returnFunction15(bool arg0) public returns (bytes3) {return (arg0? bytes3(0x5FF97F): bytes3(0x6A62BC)); }
function __returnFunction16(bool arg0, bytes16 arg1) public returns (int8) {return ((arg1 <= ((!arg0)? bytes16(0x69E32EEB82CCF88A5AD5BF7409E79B78): bytes16(0x314188FCBE5D5E0C3B044C4640E65FF2)))? int8(0): int8(-44)); }
function __returnFunction17(bool arg0) public returns (int176) {return (arg0? int176(0): int176(-219)); }function f1(int136 a0, int160 a1, int48 a2, bytes4 a3, bytes31 a4) public payable
{
    v25 = ((((uint208((v0.m0++)) >= v6) != (((!(v5 >= uint24(uint256(bytes32(v38))))) == ((((-__returnFunction14((~a4), (!(v29.m0 == (v20? true: true))))) % int224(2)) == int224(1))? true: false))? false: false)) != ((((uint120((~(-v18))) / v35) % uint120(2)) == uint120(1))? true: false))? bytes21(0x5B72F86FDAC46540E577B46D6554BB157EAF720E77): bytes21(0x2EC14A55401E5765BC8384119CF5546DAD4B4C5A72));
    if (v25 != bytes21(0x5B72F86FDAC46540E577B46D6554BB157EAF720E77)) 
       emit EXPR_ERROR(20);    v35 = uint120((((-v6) - uint208(uint256(bytes32(__returnFunction15(((!((~v38) <= ((v11 > bytes19(bytes32(uint256(v15))))? bytes7(0x2A5570E11F8947): bytes7(0x25F0F81BBF1E76)))) != (((int56(int256(bytes32((~(~(bytes4(v36.m0.m1.m0) ^ a3)))))) % int56(2)) == int56(1))? true: false))))))) & ((!(int120((((v34.m1.m0++) + ((v28 == (v21? bytes16(0x0C1CACFDC2C2D179884A168C9D6354E5): bytes16(0xA1B516071781F4E4D67A49AB7E8E2F99)))? int224(0): int224(-60))) * int224(int256(bytes32(v14))))) >= (int120((int184((v37--)) ^ (v31.m0 % (int184(-136) + (v20? int184(-8): int184(0)))))) - v30)))? uint208(0): uint208(13))));
    if (v35 != uint120(1))
        emit EXPR_ERROR(21);    v2 = (((!(v9.m2 != int184((((~(int120((~__returnFunction16((v1.m0 < ((!v32)? uint192(0): uint192(107))), (~(bytes16((~a3)) & v10))))) - (-(-(int120((v34.m0 ^ int72(((bytes19((~(~v33))) <= v11)? s19(int72(59), s20(int224(12)), uint208(5)): s19(int72(28), s20(int224(2591091128145778057405768957531670142752866176585550924395168103945)), uint208(148))).m0))) ^ (int120(v23.m1) % v30)))))) == int120(-165))? s6(bytes14(0x592CE53057A8B751ED06A1E2572E), bytes22(0x734098CCCB20B64E07EFDB0985A1162DF0BF049B96FC), int184(-11526586694460336622566325220618163525109336824341634196)): s6(bytes14(0x4368929C474A074147F6930A79AE), bytes22(0xE2570C5EE4D7AD78E40A9830477261055BB9809FDD61), int184(175))).m2))) == (((__returnFunction17((!(!(v29.m0 == (((!v21)? s17(false): s17(true)).m0? false: false))))) % int176(2)) == int176(1))? true: false))? bytes8(0x83ED39C45945DEF8): bytes8(0x751AA8E06E0FD186));
    if (v2 != bytes8(0x83ED39C45945DEF8)) 
        emit EXPR_ERROR(22);

A second type of contract mixes in control structures to increase code complexity and coverage of language features — shown below. This type of contract has no built-in correctness checks, but can be used as input for the techniques described below to obtain meaningful results.

function f0(int216 a0, uint144 a1) public payable
{
    if ((((uint200(v7) != (uint200(((v35--) % ((((!(v32 && (((int32(int256(bytes32(v1.m1.m0))) % int32(2)) == int32(1))? true: false)))? uint120(0): uint120(1097699404698248093558544665916153779)) != uint120(0))? uint120(2): uint120(2)))) / (((-(--v22)) != uint200(0))? uint200(2): uint200(4)))) || ((!(!((~v26) != ((((v0.m0 == uint248(((v24 == uint40(179))? s0(uint248(136), int240(250), true): s0(uint248(127), int240(500734067894338333886547741216458024858756682797117786320369187227033845), true)).m1)) == (((v34.m1.m0 % int224(2)) == int224(1))? true: false)) == (((((-v6) - uint208(uint256(bytes32(v19)))) % uint208(2)) == uint208(1))? true: false))? bytes5(0xC4E385E55D): bytes5(0xF894CCC506)))))? true: true))? true: true))
    {
         _internalLoopCounter0 = 0;
         for (; (((!(!((int224((uint208(v8.m2) + (v34.m2--))) != v7) != (((a0 > int216(int256(bytes32(v23.m3.m2)))) == (((__returnFunction0((++v3), (!(v24 == (v32? uint40(467394013285): uint40(0))))) % uint216(2)) == uint216(1))? true: false))? false: true)))) && ((((v15 + ((!v17)? uint224(15195432762090883449105328963814971127608106126169511717262317255535): uint224(0))) % uint224(2)) == uint224(1))? true: false))? false: true); ((((((((!v32) != (((int128(int256(bytes32(v38))) % int128(2)) == int128(1))? true: false)) || ((!v17)? false: true)) != ((!(!((v10 & bytes16(bytes32(int256(v23.m2)))) < bytes16(bytes32(int256(v30))))))? true: true)) == (((-(v9.m2++)) >= (v21? int184(0): int184(241)))? true: true)) != ((((-a1) % uint144(2)) == uint144(1))? true: false)) && (((v8.m2 % uint40(2)) == uint40(1))? true: false))? false: true))
         {
             if (++_internalLoopCounter0 == 3) break;
             if (((__returnFunction1((-(~(int184(v35) + v18)))) >= ((!(!((v9.m0 >= bytes14(bytes32(int256((((uint200((++v27)) / (((-v12) != uint200(0))? uint200(2): uint200(2))) == uint200(1532681028396710215629055563667245411361441957000316750247090))? s6(bytes14(0x9789BEE4BF6C59C27E1391C08E8A), bytes22(0xAA08653DE52F806F1E0EAE74BBD70960EA2E31F92ECE), int184(-224)): s6(bytes14(0x6C91440E65CE395F594B738A1677), bytes22(0xD0B4EB872DE32F893558B8027522238313CAE4875E53), int184(197))).m2)))) != ((((-((v30 & ((v17 == (((v13.m2 % uint128(2)) == uint128(1))? true: false))? int120(-236): int120(0))) ^ ((!(v4 != (v21? true: true)))? int120(-183): int120(0)))) % int120(2)) == int120(1))? true: false))))? bytes21(0x6B0097C01E5418F7BDA53794A388091DBA84DBFB78): bytes21(0xD798A421C61FA2757742458106927313852572D097)))? false: true))
             {
                 _internalLoopCounter1 = 0;
                 while (((((v31.m3 / ((((!((uint32((~v36.m1)) * (-v27)) <= ((!((v16.m3 - int208(int256(bytes32(v2)))) >= int208(int256(bytes32((~(~v38)))))))? uint32(0): uint32(162))))? uint64(118): uint64(0)) != uint64(0))? uint64(5): uint64(2))) % uint64(2)) == uint64(1))? true: false))
                 {
                     if (++_internalLoopCounter1 == 3) break;

Testing via synthesis of equivalent contracts

The second approach to achieve known smart contract semantics is an implementation of the Live Equivalence Modulo Inputs (semantically-equivalent transformations) compiler testing technique[1], which has been applied to C compilers with much success. It is based on the idea of treating the behavior of some supplied smart contract program with some input as reference semantics, and producing program variants that should be semantically equivalent as per the language definition, but may exhibit divergent behavior due to compilation or execution flaws. The correctness indication is weaker than for differential testing, because some initial behavior of unknown correctness is fixed as reference semantics.

Example code for one transformation is shown below. At the bottom, it contains an assignment to “v31.m2”, which is the only line of code that was part of the original (generated) program. As part of the transformation process, this assignment statement was randomly chosen to be prepended with a generated mutation code snippet that writes to structure variable “v34” (and integer variable “v5”) in a manner which preserves its original value, but is non-transparent to the compiler and could thus provoke faulty optimizations. Variable state knowledge obtained from profiling the program is used to generate a tautological if-condition around the “_internalBackupStructs19 = v34” assignment, thereby ensuring that v34 remains unchanged after the “v34 = _internalBackupStructs19” assignment.

{
_internalBackupSideEffectVariable0 = uint256(v5);
_internalBackupStructs19 = (((!(v29.m0 == ((((--v5) == uint24(7765742))? s17(false): s17(true)).m0? true: false))) || (((int24(int256(bytes32((v23.m3.m2 | bytes26(bytes32(uint256(((uint32((-(~(v0.m1 | int240(int256(bytes32(v10))))))) / (~(~((uint32(v22) - v31.m2) & uint32(uint256(bytes32(a4))))))) / (((-a0) >= (v39? int136(0): int136(-1845104727260726632244472778478804523425)))? uint32(1360013758): uint32(0)))))))))) % int24(2)) == int24(1))? true: false))? s19(int72(51), s20(int224(-12435976235219794609945472935092809869506966147453295968819059208879)), uint208(183612221083257879836453593859670458977276121533255759026476097)): s19(int72(-213), s20(int224(-11290602255487394632126063322082619399222355407867304321897660225023)), uint208(64)));
if (v5 != uint24(_internalBackupSideEffectVariable0)) v5 = uint24(_internalBackupSideEffectVariable0);
     
if ((((v3 >= int136(165)) && (v37 >= uint192(5891595882202249158202636970388682289957726167987096376254))) && (!(v3 >= int136(166)))))
{
    _internalBackupStructs19 = v34;
}
v34 = _internalBackupStructs19;
v31.m2 = uint32((v34.m1.m0 - (((!(v29.m0 != (((((~v26) ^ (((~v28) != bytes16(bytes32(uint256(v35))))? bytes5(0x72C0370CB4): bytes5(0x7A724ED747))) == bytes5(0x8F39FE724C))? s17(false): s17(true)).m0? false: true))) || ((((v30 - (v4? int120(-182): int120(0))) % int120(2)) == int120(1))? true: false))? int224(0): int224(5941477337187807215949378889172187968192486307390348647302388207876))));
}

[1] Finding Compiler Bugs via Live Code Mutation (Sun et al., 2014)

Testing via Different Optimization Levels

As a third approach to validate program correctness, Different Optimization Levels can be used to compare the behavior of a smart contract compiled with optimization against the behavior of the same contract compiled without optimization, thereby indicating whether or not optimizations correctly preserve behavior. This type of test is applicable to all generated and externally supplied contracts.

What kind of code can be generated?

The framework supports elementary computational operations in Solidity, such as expressions and control structures. Integer, bool, bytes and structure types and all associated operators, as well as basic function calls are included. These work on storage variables, function arguments and constants. As evidenced by the sample code shown above, the code generation approach exercises the compiler’s handling of expressions in various important ways that diverge from typical real-world smart contracts:

The whole range of supported types is used, including “odd” ones like bytes5 or int240
Computations involve complex combinations of types and operations
Random data generation produces large and negative numbers
Unusually long functions with complex statements can be generated
Generated code is designed to handle generated input well

Thus, unusually diverse and complex computations can stress the compiler’s resource allocation, its realization of arithmetic operators — particularly for edge cases -, and data flow and control flow analysis.

Notable unsupported language features are blockchain-specific operations such as value transfer, string, array and mapping types, and other Solidity constructs such as memory variables, inheritance, modifiers, tuples, or interactions between multiple contracts.

What are our findings?

We exercised the compiler in a wide variety of test constellations — involving generated smart contracts of various types, as well as a real-world smart contract test suite supplied by ChainSecurity AG. Tests were conducted with and without semantically-equivalent transformations, and with and without optimizations, using various different code generation parameters. We focused on various solc versions from the 0.4 and 0.5 lines, using the original compiler binaries and the solcjs variant.

As listed in the introduction, our efforts have detected 2 solc compiler bugs and 2 Ganache CI execution bugs so far. Of the solc bugs, only the exponentiation bug appears to be significant, and has been deemed by the compiler developers as unlikely to have occurred in real world smart contracts. While the limitations of SOLTIX code generation must be kept in mind, the results inspire confidence in the correctness of solc’s compilation of foundational computational operations, basic function calling conventions and storage organization.

As another encouraging signal on the part of SOLTIX, we quickly detected the documented breaking change on signed shift right operations rounding to negative infinity as a bug candidate as soon as we started testing with solc 0.5. On a side note, the language documentation exhibits some of the many immaturities in the young ecosystem: we were often compelled to reverse-engineer compiler behavior due to holes in the documentation, which exposes SOLTIX to the risk of copying incorrect language semantics.

The absence of any optimization-related solc bugs is another key insight from our test process. We performed Different Optimization Levels testing on various types of generated contracts, applied the semantically-equivalent transformations technique with and without optimization, and executed generated contracts with built-in checks with optimization. In no instance did we detect an unexpected result that could be attributed to compiler optimizations, although one crash in the Ganache CLI EVM implementation appears to be more likely to occur with optimized code. Similarly, our implementation of the semantically-equivalent transformations technique did not find any bugs, which is consistent with a lack of overly aggressive optimization in solc.

Our tests touched on the experimental ABIEncoder2 and its novel support for structure function arguments as well, but did not detect any correctness problems in this area either. A significant performance overhead was observed with this encoder, however, and performance problems limiting code size and throughput on a test system has been a common theme affecting all components during our tests: the compiler, the truffle execution environment, and admittedly our own code generator.

Work in progress / what’s next?

Within the scope of supported language constructs, there are various minor missing constructs that should be supported soon. Support for blockchain backends other than Ganache CLI — such as geth and aleth, and potentially involving eWASM — is highly desirable. Differential testing elements to compare solc-compiled program behavior against interpretation results could be expanded to more language constructs. The CONTRIBUTING.md file lists various additional ideas.

How can I contribute?

To help the SOLTIX project, you can:

Report an issue: https://github.com/eth-sri/soltix/issues
Contribute to the SOLTIX code base, available here: https://github.com/eth-sri/soltix
Join our Discord channel and discuss ideas: https://discord.gg/XKSVavS

About the ICE center

ICE is an interdisciplinary and interdepartmental research center at ETH Zurich conducting fundamental research with a strong practical component. The mission of ICE is to advance the construction of secure and reliable systems (such as blockchains, networks, data centers).