The Anatomy of Dart Code Analysis: Understanding Key Entities

Sergey Aliyev
10 min read · Dec 10, 2023

At the dawn of the computer era, when each computer was a marvel of its time and occupied an entire room, software development was a complex and expensive endeavor. Code analysis was not high on the list of programmers' problems then. The main difficulties lay in limited resources, high equipment costs, and the complexity of the programming process itself. Tooling was modest, and code analysis was done entirely by hand. Programmers waited their turn for strictly time-limited sessions, entered code on punch cards, and ran compilers for languages such as Fortran, where a successful compilation was often a matter of luck. Each failed compilation forced them to start the entire process over again, vividly demonstrating the need for more advanced means of code analysis.

With the development and spread of computer technologies, when software and hardware became accessible not only to the academic community but also to business, a natural need arose to maintain high standards of code quality and reduce development costs. At this point, the first tools for static code analysis, or linters, appeared. These were CLI tools capable of generating detailed reports on stylistic, syntactic, and semantic problems in the code. One of the first such tools, Lint for the C language, became the forerunner of modern linters.

The integration of linters into IDEs opened a new chapter in the history of code error detection and correction. This process was further enhanced with the advent of CI pipelines, which made automatic code analysis an integral part of modern software development. These pipelines not only simplify the detection of problems in the code in real-time but also integrate code checking as one of the key stages in the process of developing quality software. In the context of modern programming languages, such as Dart, this underscores the critical role of linters in maintaining high standards of quality and efficiency of code, providing developers with tools for a more accurate and organized software creation process.

The Role of AST in Dart’s Functionality

The Abstract Syntax Tree (AST) holds a key position in the process of static code analysis in Dart, offering a clear representation of the relationships between the main nodes of the language. This concept is not exclusive to Dart; it is universally found in almost all modern programming languages: JS, Python, Java, Kotlin, etc. AST plays an important role not only in static analysis but also in the optimization of compiled code. In the context of Flutter, it is also critically important for implementing Hot reload. For a deeper dive into this topic, I recommend reading an article by Vyacheslav Egorov, as discussing these topics is beyond the scope of this article. To understand how Dart transforms code into AST, let’s follow this process step by step.

Lexical Analysis

Before your program becomes executable, it undergoes a significant transformation from raw source code to a representation understandable by the virtual machine. The first step in this process is lexical analysis, which takes a linear stream of characters as input and outputs a linear list of tokens. These tokens contain the minimal syntactic information necessary for subsequent steps: the value of the token and its type (keyword, identifier, literal, etc.). A direct analogy here would be the transformation of a stream of letters into a stream of words, ignoring comments and indentations in the code which are significant for human understanding during development but meaningless to the machine executing the code.

Let’s consider a specific example. Suppose we have the following line of code:

final rectSquare = a * b;

The improvised linear stream of characters fed into the scanner can be represented as follows:

[Figure: linear stream of characters fed into the lexical analyzer]

As a result of the lexical analyzer’s operation, a similar linear stream of tokens is obtained:

[Figure: linear stream of tokens]

At this stage, we obtain a semantically meaningless linear list of tokens or lexemes, which will subsequently be passed to the syntactic parser.
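To make the scanning step concrete, here is a minimal sketch of a hand-written tokenizer in plain Dart. It is not how the Dart SDK's scanner is implemented: the `TokenType` enum, the `Token` class, and the `tokenize` function are hypothetical names introduced for this illustration, and the small keyword set and regular expression cover only enough of the language for the example line above.

```dart
/// Token categories for this illustrative scanner (hypothetical names,
/// not the token types used by the Dart SDK).
enum TokenType { keyword, identifier, number, operatorSymbol, punctuation }

class Token {
  final TokenType type;
  final String lexeme;
  const Token(this.type, this.lexeme);

  @override
  String toString() => '${type.name}("$lexeme")';
}

// Assumption: a tiny keyword set that is sufficient for the example.
const _keywords = {'final', 'var', 'const'};

// One capture group per category. Whitespace (and, in this sketch, any
// unknown character) falls between matches and is silently skipped.
final _pattern =
    RegExp(r'([A-Za-z_]\w*)|(\d+(?:\.\d+)?)|([=+\-*/])|([;(){}])');

List<Token> tokenize(String source) {
  final tokens = <Token>[];
  for (final match in _pattern.allMatches(source)) {
    final word = match.group(1);
    if (word != null) {
      // A word is either a reserved keyword or a user-defined identifier.
      tokens.add(Token(
        _keywords.contains(word) ? TokenType.keyword : TokenType.identifier,
        word,
      ));
    } else if (match.group(2) != null) {
      tokens.add(Token(TokenType.number, match.group(2)!));
    } else if (match.group(3) != null) {
      tokens.add(Token(TokenType.operatorSymbol, match.group(3)!));
    } else {
      tokens.add(Token(TokenType.punctuation, match.group(4)!));
    }
  }
  return tokens;
}

void main() {
  // Prints one token per line, starting with keyword("final").
  tokenize('final rectSquare = a * b;').forEach(print);
}
```

For the sample line this yields seven tokens: one keyword, three identifiers, two operator symbols, and the closing semicolon. A production scanner would additionally record offsets and report unknown characters instead of skipping them.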

Syntactic Parsing

The next stage of transformation is the parsing of the linear list of tokens and understanding them in the context of the language’s grammar. The lexemes are parsed, forming a hierarchical tree that describes the grammatical structure of the token stream. This structure is justified by the nested nature of syntactic units: classes contain fields, methods, and constructors; methods consist of declarations and bodies in the form of a block or expression, which in turn are also broken down into syntactic units. Such trees are usually called abstract syntax trees, or AST for short.

It is worth mentioning that in the context of the Dart language, semantic analysis also takes place at this stage; it checks that the program is consistent with the language's definitions. If parsing finds that the original program is syntactically incorrect or semantically defective (for example, explicit type mismatches or missing definitions), the user should receive informative messages so that they can correct the detected errors. Implicit type conversions are also determined during semantic analysis. For example, a binary arithmetic operator can be applied to an integer and a floating-point number; as a result of this operation, the integer is converted to a floating-point number.

To briefly touch on the topic of syntactic parsing in general, stepping back from the specifics of any particular language, let’s analyze an example with a simple algebraic expression:

25 + 4 / b

At first glance, nothing seems amiss, but immediately the following questions arise:

  1. How to ensure the order of operations? Each mathematical operator has its own precedence, so evaluating the expression strictly from left to right would give the wrong result.
  2. At the parsing stage, we need to make sure that the expression itself is mathematically and syntactically valid, and if not, to issue a compilation error.
  3. How to turn the string form of a mathematical expression into something that can be calculated? For example, before executing the expression, the string representation of the number 25 must be converted to an integer, i.e., by calculating the expression 2 * 10 + 5.
  4. Is the variable or constant b declared at all, and where? Is it compatible with the binary arithmetic division operator?
  5. How feasible is it to work with different types of data? Obviously, the result of division can be any real number, which will subsequently need to be added to an integer.

As seen from this example, even on such a simple mathematical expression, a whole range of problems can be identified that need to be addressed when writing your own programming language or even a simple expression calculator.
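The first three questions can be illustrated with a small recursive-descent parser, sketched here in plain Dart. This is only an illustration under simplifying assumptions, not the algorithm used by the Dart front end: the `Expr`, `Num`, `Var`, `Binary`, and `Parser` classes are hypothetical, only the four arithmetic operators and parentheses are supported, and variables are looked up in a plain map.

```dart
/// Minimal expression tree (hypothetical classes for illustration only).
abstract class Expr {
  num eval(Map<String, num> env);
}

class Num extends Expr {
  final String text;
  Num(this.text);
  // Converting the textual literal into a number is part of the job.
  @override
  num eval(Map<String, num> env) => num.parse(text);
  @override
  String toString() => text;
}

class Var extends Expr {
  final String name;
  Var(this.name);
  @override
  num eval(Map<String, num> env) {
    final value = env[name];
    if (value == null) throw StateError('Undefined variable: $name');
    return value;
  }
  @override
  String toString() => name;
}

class Binary extends Expr {
  final String op;
  final Expr left, right;
  Binary(this.op, this.left, this.right);
  @override
  num eval(Map<String, num> env) {
    final l = left.eval(env), r = right.eval(env);
    switch (op) {
      case '+': return l + r;
      case '-': return l - r;
      case '*': return l * r;
      case '/': return l / r; // always yields a double in Dart
      default: throw StateError('Unknown operator: $op');
    }
  }
  @override
  String toString() => '($left $op $right)';
}

class Parser {
  final List<String> tokens;
  int _pos = 0;

  Parser(String source)
      : tokens = RegExp(r'\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/()]')
            .allMatches(source)
            .map((m) => m.group(0)!)
            .toList();

  String? get _peek => _pos < tokens.length ? tokens[_pos] : null;

  Expr parse() {
    final expr = _additive();
    if (_peek != null) throw FormatException('Unexpected token: $_peek');
    return expr;
  }

  // additive := multiplicative (('+' | '-') multiplicative)*
  Expr _additive() {
    var left = _multiplicative();
    while (_peek == '+' || _peek == '-') {
      final op = tokens[_pos++];
      left = Binary(op, left, _multiplicative());
    }
    return left;
  }

  // multiplicative := primary (('*' | '/') primary)*
  // Being called from _additive gives '*' and '/' the higher precedence.
  Expr _multiplicative() {
    var left = _primary();
    while (_peek == '*' || _peek == '/') {
      final op = tokens[_pos++];
      left = Binary(op, left, _primary());
    }
    return left;
  }

  Expr _primary() {
    final token = _peek;
    if (token == null) throw FormatException('Unexpected end of input');
    _pos++;
    if (token == '(') {
      final inner = _additive();
      if (_peek != ')') throw FormatException('Expected ")"');
      _pos++;
      return inner;
    }
    if (RegExp(r'^\d').hasMatch(token)) return Num(token);
    if (RegExp(r'^[A-Za-z_]').hasMatch(token)) return Var(token);
    throw FormatException('Unexpected token: $token');
  }
}

void main() {
  final tree = Parser('25 + 4 / b').parse();
  print(tree); // (25 + (4 / b))
  print(tree.eval({'b': 2})); // 27.0 on the Dart VM
}
```

The nesting of the grammar functions encodes operator precedence, invalid input surfaces as a `FormatException`, and an undeclared variable is only detected at evaluation time, which is exactly the gap that semantic analysis closes in a real compiler.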

In general, all the stages described above constitute the front end of almost any modern compiler. In the case of Dart, this part of the toolchain is known as the Common Front-End, or CFE for short. You can review the source code of CFE at this GitHub link. The CFE takes Dart source code as input and returns Kernel binaries, which contain serialized ASTs in the .dill format. Once generated, the Kernel binaries are sent to the virtual machine, where they are loaded and converted into the VM's internal representation of the program; they also take part in optimizing the compiled code.

Now that we have briefly familiarized ourselves with how the syntactic tree is formed, we can proceed to directly study the structure of the AST and the information that the nodes store.

AST Structure

The most fundamental entity within the syntax tree is the `AstNode`. This class contains everything necessary for efficient navigation through the tree nodes: it knows about both its parent and its children. Moreover, it includes the `accept` method, which is part of the contract for applying the Visitor pattern, providing flexibility and power in AST processing.

Each AST node also implements the `SyntacticEntity` contract, which describes the position of the entity in the source file: `offset` (the offset from the beginning of the file), `length` (the number of characters), and `end` (the offset of the character immediately following the last character of the syntactic entity). This information is useful for locating any entity within the source file, for example, to pinpoint a problem when a file is checked against linter rules.
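As a small illustration of these position properties, the sketch below parses a string with `parseString` from the analyzer package and uses `offset` and `end` to cut each top-level declaration back out of the source text. The helper name `declarationSnippets` and the sample source are assumptions made for this example.

```dart
import 'package:analyzer/dart/analysis/utilities.dart';

/// Returns the source text of every top-level declaration
/// (hypothetical helper, introduced only for this example).
List<String> declarationSnippets(String source) {
  final result = parseString(content: source);
  return [
    for (final declaration in result.unit.declarations)
      // offset and end are inherited from SyntacticEntity.
      source.substring(declaration.offset, declaration.end),
  ];
}

void main() {
  const source = 'class A {}\nclass B {}\n';
  final result = parseString(content: source);

  for (final declaration in result.unit.declarations) {
    // LineInfo converts a raw offset into a line and column pair.
    final location = result.lineInfo.getLocation(declaration.offset);
    print('offset=${declaration.offset}, length=${declaration.length}, '
        'end=${declaration.end}, line=${location.lineNumber}, '
        'column=${location.columnNumber}');
  }

  declarationSnippets(source).forEach(print);
}
```

A linter rule typically reports exactly this kind of range, so that an IDE can underline the offending code.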

It is worth mentioning separately that a pure syntax tree node does not store any semantic information in itself. This information is contained and aggregated in the tree nodes as a separate entity — `Element`. In simple terms, an element is any definition in the source code: explicit or implicit. This entity contains all the necessary information about:

  1. The specific place where the entity definition is described;
  2. Information about annotations;
  3. Whether the definition is private or public;
  4. Information about the synthetic nature of the element. For example, if the definition is an implicitly defined constructor;
  5. The library in which the definition is made;
  6. Information about the type of definition.

All specific AST nodes implement the aforementioned interfaces in some way, while describing more specific entities in detail. For example, if you examine this file, you can find such definitions for any syntactic unit defined in Dart. One example is `ClassDeclaration`, which encapsulates all necessary semantic and syntactic information about a class.

For traversing the AST, the analyzer package’s API provides a set of `AstVisitor` implementations based on the Visitor pattern. As a basic toolkit, the following implementations are worth highlighting:

  1. RecursiveAstVisitor — recursively traverses all nodes of the tree in depth.
  2. BreadthFirstVisitor — recursively traverses all nodes of the tree and differs from the previous one in the order of traversal, preferring breadth-first traversal.
  3. GeneralizingAstVisitor — recursively traverses all the nodes of the tree in the same manner as RecursiveAstVisitor, but each visit method also calls the visit methods for the supertypes of the node's class (up to visitNode), so a whole family of node types can be handled in one place.
  4. DelegatingAstVisitor — allows delegating work to other visitors, sequentially calling the visit method for each aggregated visitor.
  5. UnifyingAstVisitor — recursively traverses all the nodes of the tree in the same manner as `RecursiveAstVisitor`, but also calls the visitNode method for each node.
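As an illustration of the first item in the list, the sketch below uses `RecursiveAstVisitor` to collect class and method names from a parsed string. The `DeclarationCollector` class and the `collectDeclarations` function are hypothetical names, and the code assumes the analyzer API generation used elsewhere in this article; details such as the type of `node.name` differ between analyzer versions, which is why the name is only interpolated into a string here.

```dart
import 'package:analyzer/dart/analysis/utilities.dart';
import 'package:analyzer/dart/ast/ast.dart';
import 'package:analyzer/dart/ast/visitor.dart';

/// Collects the names of classes and methods (hypothetical helper).
class DeclarationCollector extends RecursiveAstVisitor<void> {
  final List<String> found = [];

  @override
  void visitClassDeclaration(ClassDeclaration node) {
    found.add('class ${node.name}');
    // Without this call the visitor would not descend into class members.
    super.visitClassDeclaration(node);
  }

  @override
  void visitMethodDeclaration(MethodDeclaration node) {
    // Getters and setters are represented as method declarations too.
    found.add('method ${node.name}');
    super.visitMethodDeclaration(node);
  }
}

List<String> collectDeclarations(String source) {
  final collector = DeclarationCollector();
  parseString(content: source).unit.accept(collector);
  return collector.found;
}

void main() {
  collectDeclarations('''
class Person {
  String get fullName => 'n/a';
  void greet() {}
}
''').forEach(print);
}
```

Overriding only the visit methods of interest and delegating to `super` is the usual way to keep a visitor small; the other visitors in the list differ mainly in traversal order and in which visit methods get called.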

The last entity to be acquainted with is the `CompilationUnit`. This entity represents the main unit of analysis and is the highest-level entity. The `CompilationUnit` provides the structure for parsing source code at the file level, allowing both syntactic and semantic analysis, containing all other nodes of the AST tree.
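To show that a `CompilationUnit` really is the root for everything in a file, here is a brief sketch that counts the directives and top-level declarations of a parsed string. The `summarize` function and the sample source are made up for this illustration; `directives` and `declarations` are the properties exposed by `CompilationUnit`.

```dart
import 'package:analyzer/dart/analysis/utilities.dart';

/// Counts directives and top-level declarations of a source string
/// (hypothetical helper, introduced only for this example).
Map<String, int> summarize(String source) {
  final unit = parseString(content: source).unit;
  return {
    // import, export, part and library directives
    'directives': unit.directives.length,
    // classes, top-level functions, top-level variables and so on
    'declarations': unit.declarations.length,
  };
}

void main() {
  print(summarize('''
import 'dart:math';

const answer = 42;
class Person {}
void greet() {}
'''));
}
```

For the sample source this reports one directive and three declarations; every other node of the file is reachable from one of these children.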

Let’s consolidate this knowledge in practice by parsing the source of the following simple class and finding out which nodes it contains:

class Person {
  final String firstName;
  final String lastName;

  Person(this.firstName, this.lastName);

  String get fullName => '$firstName $lastName';
}

To conduct the syntax analysis, we will need the following code:

import 'package:analyzer/dart/analysis/utilities.dart';
import 'package:analyzer/dart/ast/ast.dart';
import 'package:analyzer/dart/ast/visitor.dart';
import 'package:analyzer/dart/element/element.dart';

void main() {
  const sourceCode = '''
class Person {
  final String firstName;
  final String lastName;

  Person(this.firstName, this.lastName);

  String get fullName => '\$firstName \$lastName';
}
''';

  // Parse the source code and get the AST without semantic information.
  final parseResult = parseString(content: sourceCode);

  // The visitor that will be applied to each AST node.
  final visitor = PlainAstVisitor();

  // Start traversing the tree with the visitor.
  parseResult.unit.visitChildren(visitor);
}

class PlainAstVisitor extends GeneralizingAstVisitor<void> {
  // Called for every node in the tree.
  @override
  void visitNode(AstNode node) {
    Element? element;

    // With some exceptions, it is during the semantic analysis stage
    // that elements are assigned to declarations, so after purely
    // syntactic parsing this stays null.
    if (node is Declaration) {
      element = node.declaredElement;
    }

    print(
      'type: ${node.runtimeType} | '
      'source: ${node.toSource()} | '
      'hasElement: ${element != null}',
    );

    // Do not forget to call the parent implementation: without it
    // the traversal does not descend into the children of this node.
    super.visitNode(node);
  }
}

If you run the given code, you will notice that none of the nodes is associated with an element, which makes it impossible to identify the relationships between definitions. For instance, we cannot determine the exact location where a particular entity is defined. This happens because we have performed purely syntactic analysis, omitting the semantic analysis stage. If we also need to link entities together, slightly more complex code is required:

import 'dart:io';

import 'package:analyzer/dart/analysis/analysis_context_collection.dart';
import 'package:analyzer/dart/analysis/results.dart';
import 'package:analyzer/dart/ast/ast.dart';
import 'package:analyzer/dart/ast/visitor.dart';

void main(List<String> arguments) async {
  // Check that a file name argument is provided.
  if (arguments.isEmpty) {
    // If no argument is provided, report the error and exit.
    stderr.writeln('Error: No file name provided.');
    exit(1);
  }

  // To work with semantic analysis, we need absolute paths to files.
  final currentDirectory = Directory.current.path;

  // The file name provided as a command line argument,
  // resolved against the current directory.
  final filePath = '$currentDirectory${Platform.pathSeparator}${arguments[0]}';

  // Create a collection of analysis contexts that can analyze
  // the files under the included paths.
  final collection = AnalysisContextCollection(includedPaths: [currentDirectory]);

  // A representation of a body of code and the context in which
  // the code is to be analyzed.
  final context = collection.contextFor(filePath);

  // A consistent view of the results of analyzing one or more files.
  final session = context.currentSession;

  // Start syntactic and semantic analysis.
  final result = await session.getResolvedLibrary(filePath);

  // Iterate through the resolved compilation units.
  if (result is ResolvedLibraryResult) {
    for (final resolvedUnit in result.units) {
      resolvedUnit.unit.visitChildren(PlainAstVisitor());
    }
  }
}

class PlainAstVisitor extends GeneralizingAstVisitor<void> {
  @override
  void visitNode(AstNode node) {
    if (node is Declaration) {
      // Prints the element that encloses the declaration,
      // i.e. the place where the entity is defined.
      print(node.declaredElement?.enclosingElement);
    }

    super.visitNode(node);
  }
}

If you run this code and pass any Dart source file as an argument, you will find that some nodes now have elements, which makes semantic analysis of any complexity possible. Through simple manipulations, we have obtained a powerful tool for analysis and familiarized ourselves with the main primitives. By combining the acquired knowledge, you can write linter rules of any complexity. Logging is intentionally omitted here, the CLI handling is simplified, and the focus is on working with individual files or even raw strings.

For more advanced practice, I recommend familiarizing yourself with the analyzer_plugin API and the ways of interacting with the Analysis Server. The developers of our favorite programming language have taken care of us and written an excellent tutorial, which covers all the necessary nuances of working with these tools. However, I advise against using more than one custom plugin in production and recommend preferring CLI tools integrated into CI pipelines, especially if you work under the following conditions:

  • You use a development machine with less than 16 GB of memory.
  • You use a mono-repo with more than 10 pubspec.yaml and analysis_options.yaml files.

Enabling an analyzer plugin increases how much memory the analyzer uses.
