Inside Lightning Cat: A Generative AI Framework for Smart Contract Audits
One of the most comprehensive studies of applying generative AI to smart contract security.
The rapid rise of generative AI has sparked the imagination of web3 developers about its many applications to the blockchain space. Smart contract audits via AI agents are among the most prominent use cases mentioned at the intersection of AI and web3. Recently, a group of AI researchers published a paper in Nature's Scientific Reports detailing Lightning Cat, a framework that uses generative AI models for smart contract audits.
Traditional threat detection methods for smart contracts include manual reviews, static analysis, fuzz testing, and formal verification. Tools such as Oyente, Mythril, Securify, Slither, and Smartcheck are widely used for this purpose. They scan contract code for common security flaws like reentrancy issues, authorization errors using tx.origin, dependencies on timestamps, and unhandled exceptions. Yet, these tools are not foolproof, often generating false positives or missing vulnerabilities due to their reliance on preset rules and a limited understanding of complex code.
Lightning Cat relies on generative AI to enhance smart contract vulnerability detection. It incorporates three advanced deep learning models: an optimized version of CodeBERT, an Optimized-LSTM, and an Optimized-CNN. These models are specifically trained to identify vulnerabilities within smart contracts. The process involves analyzing code snippets containing vulnerabilities to pinpoint critical features.
The CodeBERT model excels at understanding the nuances of programming languages. It bridges the gap between natural language and programming syntax, showing significant promise in detecting software vulnerabilities. Compared with embedding models such as Word2Vec, FastText, and GloVe, CodeBERT has shown higher accuracy in this field. For smart contracts written in Solidity, this research employs an optimized version of CodeBERT. Alongside it, CNN and LSTM models, known respectively for extracting local patterns from grid-like data and for handling long text sequences, are used for comparison; previous studies have demonstrated their effectiveness in identifying code vulnerabilities.
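To make this concrete, here is a minimal sketch, not the authors' code, of loading the public microsoft/codebert-base checkpoint for classification with the Hugging Face transformers library. The five-label setup (four vulnerability classes plus a "safe" class) is an assumption for illustration.

```python
# A minimal sketch (not the authors' code): loading the public CodeBERT
# checkpoint for multi-class classification with Hugging Face transformers.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumption: five labels (four vulnerability classes plus a "safe" class).
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=5
)

snippet = 'function unsafeWithdraw() public { msg.sender.call{value: 1 ether}(""); }'
inputs = tokenizer(snippet, return_tensors="pt", truncation=True, max_length=512)
logits = model(**inputs).logits  # one unnormalized score per class
```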
Lightning Cat’s development process comprises three key stages. The initial phase involves compiling and preparing a dataset of vulnerable Solidity code. The next stage is dedicated to training the three models and comparing their effectiveness. The final stage tests the chosen model against the SolidiFI-benchmark dataset to evaluate its ability to accurately detect vulnerabilities in smart contracts.
The Data Collection Process
The data used in this study is derived from three primary sources, combining to form a comprehensive training set. This dataset includes 10,000 contracts from the Slither Audited Smart Contracts Dataset, 20,000 from smartbugs-wild, and an additional 1,000 contracts known for their vulnerabilities as identified by expert audits. In total, the dataset encompasses 31,000 smart contracts.
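As a purely hypothetical illustration of how such a combined corpus might be assembled, the sketch below merges the three sources into one labeled list; the file names and record layout are invented for this example, not the paper's pipeline.

```python
# Hypothetical sketch of assembling the combined 31,000-contract corpus;
# file names and record layout are invented for illustration.
import json

SOURCES = {
    "slither_audited.json": 10_000,  # Slither Audited Smart Contracts Dataset
    "smartbugs_wild.json": 20_000,   # smartbugs-wild
    "expert_audited.json": 1_000,    # expert-audited vulnerable contracts
}

contracts = []
for path, expected in SOURCES.items():
    with open(path) as f:
        records = json.load(f)  # assumed layout: [{"source": str, "label": str}, ...]
    assert len(records) == expected, f"{path}: expected {expected} records"
    contracts.extend(records)

print(len(contracts))  # 31,000
```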
In processing this data, the research evaluates against the SolidiFI-benchmark test set, which is used to compare three static detection tools: Slither, Mythril, and Smartcheck. The set covers four prevalent types of smart contract vulnerabilities: Re-entrancy, Timestamp-Dependency, Unhandled-Exception, and tx.origin.
One challenge in handling this data is the variable length of smart contracts, which is often influenced by their complexity and functionality. Some of the more complex contracts can span several thousand tokens.
To manage this variability, the dataset is segmented into smaller parts. The token stream is divided into blocks of 510 tokens each, leaving room for CodeBERT’s special [CLS] and [SEP] tokens within its 512-token input limit, and every segment inherits the label of the contract it came from. For instance, a section of code demonstrating a Re-entrancy vulnerability that is 2,000 tokens long would be split into four blocks of up to 510 tokens, each labeled Re-entrancy.
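A minimal sketch of this segmentation step, assuming the CodeBERT tokenizer; the segment() helper is illustrative, not the paper's code.

```python
# A minimal sketch of the 510-token segmentation, assuming the CodeBERT
# tokenizer; the segment() helper is illustrative, not the paper's code.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
BLOCK = 510  # 512-token model limit minus the [CLS] and [SEP] special tokens

def segment(source_code: str, label: int):
    """Split one contract into <=510-token blocks that all share its label."""
    ids = tokenizer.encode(source_code, add_special_tokens=False)
    return [(ids[i:i + BLOCK], label) for i in range(0, len(ids), BLOCK)]

# A 2,000-token Re-entrancy sample yields four blocks: three of 510 tokens
# and one of 470, padded to full length at training time.
```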
The Models
This phase of the research applies three distinct deep learning models: Optimized-CodeBERT, Optimized-LSTM, and Optimized-CNN. The CodeBERT model has been specifically adjusted for detecting vulnerabilities in smart contracts and takes preprocessed input IDs and attention masks as its input. The Optimized-LSTM and Optimized-CNN models, by contrast, do not use CodeBERT for data preprocessing.
The first model, Optimized-CodeBERT, leverages the Transformer model to learn representations in code-related tasks. This study focuses on adapting CodeBERT for smart contract vulnerability detection. Based on the Transformer architecture, which includes multiple encoder layers, the model processes input data through an embedding stage before it reaches these encoders. After encoding, fully connected layers are added for classification.
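The following sketch shows one way this encoder-plus-classifier structure could look in PyTorch; the hidden sizes, dropout rate, and pooling via the first token's representation are our assumptions, not details from the paper.

```python
# A sketch of the encoder-plus-classifier structure in PyTorch; hidden sizes,
# dropout, and pooling via the first token are our assumptions.
import torch.nn as nn
from transformers import AutoModel

class CodeBERTClassifier(nn.Module):
    def __init__(self, num_labels: int = 5, hidden: int = 768):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("microsoft/codebert-base")
        # Fully connected layers on top of the encoder output, as described above.
        self.classifier = nn.Sequential(
            nn.Linear(hidden, 256), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(256, num_labels),
        )

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # representation of the first token
        return self.classifier(cls)
```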
The second model, Optimized-LSTM, is adept at handling sequential data, recognizing temporal dependencies and syntactic-semantic information. For detecting vulnerabilities in smart contracts, this model serializes Solidity source code, taking into account the order of statements and function calls. It understands the code’s syntax, semantics, and dependencies, providing insight into its logical structure and flow. The Optimized-LSTM model, with its gated cell mechanism, effectively addresses the challenges of vanishing or exploding gradients in long sequences, a common issue in traditional RNNs.
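An illustrative LSTM classifier in the same spirit; the bidirectional setup and layer sizes are assumptions, not details taken from the paper.

```python
# Illustrative LSTM classifier; the bidirectional setup and layer sizes
# are assumptions, not details taken from the paper.
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size: int, num_labels: int = 5,
                 embed_dim: int = 128, hidden_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Gated cells mitigate vanishing/exploding gradients on long sequences.
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, token_ids):                 # (batch, seq_len)
        x = self.embed(token_ids)                 # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(x)                # final hidden state per direction
        h = torch.cat([h_n[-2], h_n[-1]], dim=1)  # join forward/backward states
        return self.fc(h)
```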
Finally, the third model, Optimized-CNN, is a convolutional neural network well-suited for processing two-dimensional data. In this case, the code token sequence is transformed into a matrix format. The CNN efficiently extracts local features and captures the spatial structure of the code, including syntax, relationships between code blocks, and key patterns.
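A sketch of such a CNN classifier: the embedded token sequence is treated as a matrix and scanned with several convolution widths, following the common text-CNN pattern. All sizes here are assumptions.

```python
# A sketch of the CNN variant: embedded tokens form a matrix that several
# convolution widths scan for local code patterns; all sizes are assumptions.
import torch
import torch.nn as nn

class CNNClassifier(nn.Module):
    def __init__(self, vocab_size: int, num_labels: int = 5, embed_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Kernels of width 3, 4, and 5 capture n-gram-like token patterns.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, 100, kernel_size=k) for k in (3, 4, 5)
        )
        self.fc = nn.Linear(300, num_labels)

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))   # (batch, 300) -> logits
```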
The Results
The paper offers a comparative analysis of the recall results from the different classification models, measuring each model’s ability to correctly identify true positive samples. The comparison includes six methods: Mythril, Smartcheck, Slither, Optimized-CodeBERT, Optimized-LSTM, and Optimized-CNN. Among these, the Optimized-CodeBERT model stands out with the highest recall rate of 93.55%, which is 11.85% higher than that of Slither. This superior recall rate underlines the Optimized-CodeBERT model’s effectiveness and reliability in accurately detecting true positive samples.
In contrast, the Optimized-LSTM and Optimized-CNN models show lower recall rates, at 64.06% and 71.36% respectively. This indicates that they might face challenges or have limitations in consistently recognizing true positive samples.
Significantly, the Optimized-CodeBERT model also excels over traditional static detection tools. It achieves an impressive F1-score of 93.53%, demonstrating its strong ability to understand both the syntax and semantics of the code. This performance solidifies its position as an effective tool for auditing blockchain code.
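For reference, both metrics are straightforward to compute with scikit-learn; the labels below are placeholders, not the paper's data.

```python
# For reference: how recall and F1 are computed; the labels below are
# placeholders, not the paper's data.
from sklearn.metrics import f1_score, recall_score

y_true = [1, 0, 1, 1, 0, 1]  # ground-truth labels (1 = vulnerable)
y_pred = [1, 0, 1, 0, 0, 1]  # model predictions

print(recall_score(y_true, y_pred))  # true positives / all actual positives
print(f1_score(y_true, y_pred))      # harmonic mean of precision and recall
```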