Cool Stuff With Go’s AST Package Pt 1

The Package for Dissecting the Go Language

Cooper Thompson
The Startup
7 min read · Sep 12, 2020


Photo by engin akyurt on Unsplash

Let’s explore the awesome built-in packages that ship with Go, and the cool stuff you can do with them. I will pick some of the more “obscure”, “complex”, and/or special-purpose Go packages, especially ones that set the language’s standard library apart from those of other languages.

The first stop is the go/ast package. This package is used to explore the syntax tree representation of a Go package, and can be used to perform static analysis, code linting, metaprogramming, and anything that requires a structured interpretation of Go source code.

This walkthrough is broken up into three parts. We will explore traversing the AST, type assertions, extracting literal values from the code, comment extraction, and some advanced struct reflection. By the end of it, we will have a utility capable of extracting documentation from a NATS publishing microservice that details the topics and message types that it produces. The output will be similar to a Swagger specification, but for event messaging systems.

Part 1 of this article dives into the basics of the go/ast package. We will cover what an AST is, tree traversal, visitor functions, type assertions, and how the different types work.

Let’s start off simple. What is an AST, and what makes it a valuable tool?

Abstract Syntax Trees

Abstract syntax trees, or ASTs for short, are tree representations of the syntax of a program’s source code. ASTs are used as a step in compilation and are produced by the compiler’s syntax analysis phase. ASTs are similar to parse trees, but they sit at a higher level in that they don’t include every detail of the syntax. For example, an AST captures parenthetical groupings, if/else statements, and looping constructs through the shape of the tree itself. ASTs also drop tokens that exist only to delimit the text, such as parentheses, braces, and semicolons. Each node instead represents a specific construct or operation within the language, with any operator stored as a property of that node. In this sense an AST is a representation with meaning and contextual information, rather than just a structured listing of the tokens/text present in the source code.

Compilers typically perform the following first steps when compiling source code. We will stop after the syntax/semantic analysis phase, as this is the level at which Go’s ast package operates.

  1. Lexing: Given a table of all the tokens (symbols/words) that make up the language, a lexer will group the characters of the source code into tokens and label them accordingly.
  2. Syntax Analysis: Each language has specific syntax rules it must follow. Given the output from the lexer, the syntax analyzer will build a parse tree that represents the operations and syntax of the language. For example, the tokens “2, +, 2” will be grouped into a parse tree as follows:
[Figure: Simple Parse Tree for Addition]
  3. Syntax Analysis/Semantic Analysis: The compiler will then verify the parse tree to make sure it has meaning in the context of the language, and will typically add semantics and context to the parse tree, as well as remove unnecessary constructs. This is where the AST is created. An AST of the above expression can be seen below:
[Figure: Simple AST for Addition]

Notice how the AST identifies the structure as a binary expression and labels the left-hand operand, the operator, and the right-hand operand accordingly?
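To make this concrete, here is a minimal sketch that parses the expression 2 + 2 with the standard library and reports the parts of the resulting binary expression node. The helper name describe is made up for this illustration; parser.ParseExpr, ast.BinaryExpr, and ast.BasicLit come from the standard library.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
)

// describe (a hypothetical helper) parses a single Go expression and, if it
// is a binary expression over two literals, reports its three parts.
func describe(expr string) (string, error) {
	node, err := parser.ParseExpr(expr)
	if err != nil {
		return "", err
	}
	bin, ok := node.(*ast.BinaryExpr)
	if !ok {
		return "", fmt.Errorf("%q is not a binary expression", expr)
	}
	x, xok := bin.X.(*ast.BasicLit)
	y, yok := bin.Y.(*ast.BasicLit)
	if !xok || !yok {
		return "", fmt.Errorf("operands of %q are not literals", expr)
	}
	// X is the left-hand operand, Op the operator token, Y the right-hand operand.
	return fmt.Sprintf("BinaryExpr{X: %s, Op: %s, Y: %s}", x.Value, bin.Op, y.Value), nil
}

func main() {
	out, err := describe("2 + 2")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // BinaryExpr{X: 2, Op: +, Y: 2}
}
```

The operator is not a separate child node here. It is stored in the Op field of the binary expression, while the two operands are the node’s children.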

Go’s AST package makes it very simple to traverse the language’s abstract syntax tree. These low-level operations on the language constructs can be used to create powerful developer productivity tooling and to enhance testing and quality assurance efforts.

Structured Documentation Extraction

Use Case

Being able to automatically extract documentation from source code is a powerful tool that can lead to increased developer productivity and end-user satisfaction. By using a tool that can auto-generate documentation from source code, developers can focus more on writing code and less on writing docs, while still providing knowledge for the end-user to consume. The below blog post dives into this concept on readability of source code if you are interested in the topic of self-documenting code.

Go has a built-in tool called godoc that is able to automatically extract comments out of Go source code and build a web page that documents functions and types (example here). This tool is extremely powerful despite its simplicity, but what if we wanted something that provides more context and descriptions around functionality? Let’s leverage the go/ast package to document a NATS microservice.

NATS

NATS is an event messaging system written in Go that is lightweight, fault tolerant, and an absolute joy to use. One of the things that makes NATS so great is how simple it is to use. As such, NATS is often used behind the scenes with microservices to perform message passing and request/response interactions between services. Applications can sometimes have hundreds of microservices, so it is easy to lose track of which microservices do what. Let’s build a utility that can parse a microservice’s source code, extract any NATS message publishers, and provide detail on the topics and messages they produce. We will essentially be building a “Swagger-like” specification for event producers. If you are interested in how NATS can be used, check out the below post:

The Code

We will use the following sample code as our microservice we want to document. This code will publish a message on a NATS topic every 5 seconds that says “Hello world!”. Probably the least exciting microservice ever created. But hey, it could be the basis for a heartbeat topic to guarantee service uptime right? :)

Starting off Simple

The first step is determining the topic the microservice publishes to. We will start by traversing all function calls in this file, since a call to the “Publish” method is a function call and the topic is provided as the first argument of that call. With the contextual and semantic awareness of Go’s AST package, this is quite easy.

Let’s break this code apart line-by-line. Starting off with the imports, we see that there are a few more packages imported than initially advertised; however, they are all part of Go’s compiler toolchain.

  • go/ast provides types and methods for exploring Go’s abstract syntax tree
  • go/parser provides methods for parsing source files and generating abstract syntax trees
  • go/token defines the lexical tokens of Go (the output of tokenization) and the types used to track source positions

We start off by creating a FileSet. A FileSet represents a set of source files and records the position information (file, line, and column offsets) for everything parsed from them. (https://golang.org/pkg/go/token/#FileSet).

We then use the go/parser package’s ParseFile function to parse the source code into an AST. In this case, the AST is available in the “file” variable, which is essentially a root node for the AST representing the entire source file.
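As a rough sketch of these two steps, the program below parses a source file held in a string and lists its top-level declarations with their positions. The embedded source is a hypothetical stand-in for the microservice (the topic name “hello” and the file name publisher.go are assumptions), and summarize is a helper invented for this illustration. Because the source is only parsed and never compiled, the NATS import does not need to resolve.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

// src is a hypothetical stand-in for the microservice being documented.
const src = `package main

import (
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, _ := nats.Connect(nats.DefaultURL)
	ec, _ := nats.NewEncodedConn(nc, nats.JSON_ENCODER)
	defer ec.Close()

	for {
		ec.Publish("hello", "Hello world!")
		time.Sleep(5 * time.Second)
	}
}
`

// summarize (a hypothetical helper) parses source and returns the package
// name plus one "position type" entry per top-level declaration.
func summarize(filename, source string) (pkg string, decls []string, err error) {
	// The FileSet records where each node came from.
	fset := token.NewFileSet()

	// Passing the source as a string means filename is used only for
	// position information; a nil source would read the file from disk.
	file, err := parser.ParseFile(fset, filename, source, 0)
	if err != nil {
		return "", nil, err
	}
	for _, decl := range file.Decls {
		decls = append(decls, fmt.Sprintf("%s %T", fset.Position(decl.Pos()), decl))
	}
	return file.Name.Name, decls, nil
}

func main() {
	pkg, decls, err := summarize("publisher.go", src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("package:", pkg)
	for _, d := range decls {
		fmt.Println(d)
	}
}
```

The returned *ast.File is the root of the tree. Its Decls field holds the import block and the main function, and every node’s Pos can be turned back into a file, line, and column through the FileSet.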

The next part of the code is where things start to get interesting. The ast package exposes an “Inspect” function that takes in a root node and a visitor function. A visitor function accepts a node as its parameter and returns a boolean that tells Inspect whether to keep descending into that node’s children. Inspect walks the entire AST of the source file in depth-first order, so we don’t have to write the tree search ourselves. Because let’s face it, the thing we hate the most about recursion is what we hate the most about recursion.

In the above visitor function, we use a type assertion to check whether the node is a “CallExpr”, which is a call expression. A call expression represents a function or method call, together with its arguments. If the type matches, we print the function call.
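A visitor along these lines might look like the sketch below. It is an illustration under assumptions rather than the article’s exact listing: the embedded source is a hypothetical stand-in for the microservice, calledFunctions is an invented helper, and go/printer is used here simply as one way to turn the called expression back into text.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/parser"
	"go/printer"
	"go/token"
)

// src is a hypothetical stand-in for the microservice being documented.
// It is only parsed, never compiled, so the NATS import need not resolve.
const src = `package main

import (
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, _ := nats.Connect(nats.DefaultURL)
	ec, _ := nats.NewEncodedConn(nc, nats.JSON_ENCODER)
	defer ec.Close()

	for {
		ec.Publish("hello", "Hello world!")
		time.Sleep(5 * time.Second)
	}
}
`

// calledFunctions (a hypothetical helper) returns the function expression
// of every call in source, in the order ast.Inspect visits them.
func calledFunctions(source string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "publisher.go", source, 0)
	if err != nil {
		return nil, err
	}
	var calls []string
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true // not a call expression; keep descending
		}
		var buf bytes.Buffer
		if err := printer.Fprint(&buf, fset, call.Fun); err == nil {
			calls = append(calls, buf.String())
		}
		return true // calls can be nested, so keep descending
	})
	return calls, nil
}

func main() {
	calls, err := calledFunctions(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, c := range calls {
		fmt.Println(c)
	}
}
```

Returning true from the visitor keeps the traversal going. Returning false would skip the children of the current node, which is useful when a whole subtree is known to be irrelevant.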

Enhancing

Now let’s enhance our visitor function to further inspect the function being called, as well as the parameters.

A lot going on here now! But if we break apart the pieces, it becomes very simple. We can see that we are still checking if the node is a call expression, but then we start going a bit further.

We first check whether the function being called is a selector expression: an expression that selects an identifier from a base expression, in the form <expression>.<identifier>. Since our calls to the NATS encoded connection are method calls on an object, we want to make sure the function we are inspecting is a selector expression.

We then make sure the expression we are selecting on is an identifier (so the type assertion is safe before we read its name), and check that we are working with a publishing function. This is done by checking that the name of our selector is the Publish method, and that the identifier we are selecting on is the “ec” variable, our encoded connection. There are some assumptions made here: that the Publish method is always called directly (not wrapped), and that the variable holding our encoded connection is named “ec”.
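The checks described above can be sketched as a small predicate. The names isPublishCall and countPublishCalls are invented for this illustration, the embedded source is a hypothetical stand-in for the microservice, and the sketch bakes in the same assumptions as the text: the connection variable is named “ec” and Publish is called directly.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is a hypothetical stand-in for the microservice being documented.
const src = `package main

import (
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, _ := nats.Connect(nats.DefaultURL)
	ec, _ := nats.NewEncodedConn(nc, nats.JSON_ENCODER)
	defer ec.Close()

	for {
		ec.Publish("hello", "Hello world!")
		time.Sleep(5 * time.Second)
	}
}
`

// isPublishCall reports whether call has the form ec.Publish(...).
func isPublishCall(call *ast.CallExpr) bool {
	// The called function must look like <expression>.<identifier>.
	sel, ok := call.Fun.(*ast.SelectorExpr)
	if !ok {
		return false
	}
	// The expression being selected on must be a plain identifier.
	recv, ok := sel.X.(*ast.Ident)
	if !ok {
		return false
	}
	// Assumption: the encoded connection is always held in a variable named "ec".
	return recv.Name == "ec" && sel.Sel.Name == "Publish"
}

// countPublishCalls parses source and counts the ec.Publish calls in it.
func countPublishCalls(source string) (int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "publisher.go", source, 0)
	if err != nil {
		return 0, err
	}
	count := 0
	ast.Inspect(file, func(n ast.Node) bool {
		if call, ok := n.(*ast.CallExpr); ok && isPublishCall(call) {
			count++
		}
		return true
	})
	return count, nil
}

func main() {
	n, err := countPublishCalls(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("ec.Publish calls found:", n) // ec.Publish calls found: 1
}
```

Matching on names alone is a purely syntactic check. A variable of a different type that happens to be called “ec” would also match, which is the trade-off of the naming assumption.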

Once we have confirmed that the function is the “ec.Publish” method, we can print out the topic. Using a bit more node traversal and type assertion, we take the first argument, assert that it is a basic literal, and read its value. However, we could increase the robustness here by using a type switch like the following:
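One way such a type switch could look is sketched here. It is an illustration, not the article’s original listing: describeTopic and publishTopics are invented names, and the embedded source is a hypothetical example that passes the topic in three different ways.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

// src is a hypothetical example that supplies the topic as a literal,
// as an identifier, and as a computed expression.
const src = `package main

func main() {
	ec.Publish("hello", "Hello world!")
	ec.Publish(topic, "Hello again!")
	ec.Publish(prefix+".hello", "Hello once more!")
}
`

// describeTopic classifies the first argument of a Publish call.
func describeTopic(arg ast.Expr) string {
	switch v := arg.(type) {
	case *ast.BasicLit:
		// A string literal's Value still carries its quotes; strip them.
		if v.Kind == token.STRING {
			if s, err := strconv.Unquote(v.Value); err == nil {
				return "literal topic: " + s
			}
		}
		return "non-string literal: " + v.Value
	case *ast.Ident:
		return "identifier: " + v.Name
	default:
		return fmt.Sprintf("unsupported expression: %T", v)
	}
}

// publishTopics describes the topic argument of every ec.Publish call.
func publishTopics(source string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "publisher.go", source, 0)
	if err != nil {
		return nil, err
	}
	var topics []string
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok || len(call.Args) == 0 {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok || sel.Sel.Name != "Publish" {
			return true
		}
		if recv, ok := sel.X.(*ast.Ident); ok && recv.Name == "ec" {
			topics = append(topics, describeTopic(call.Args[0]))
		}
		return true
	})
	return topics, nil
}

func main() {
	topics, err := publishTopics(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, t := range topics {
		fmt.Println(t)
	}
}
```

The default case keeps the tool from panicking on arguments it does not understand, such as string concatenations or function calls, and makes it obvious which topics still need manual attention.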

Although in this example we aren’t further exploring the case where the topic is specified by an identifier (variable, constant, object, etc.), it is possible to explore further down the tree and get the actual value if it is derived from a literal. If you want to get really fancy, you can even use an additional visitor function along with the Inspect function to further explore a node and dig deeper into the tree until you hit the node that has the literal topic value.
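As a deliberately simplified sketch of that idea, the program below makes one extra Inspect pass to collect const and var declarations whose value is a string literal, then uses that table to resolve identifier topics. The helper names and the embedded source are invented for this illustration. The sketch ignores scoping, shadowing, and short variable declarations (:=), so treat it as a starting point rather than a complete resolver.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

// src is a hypothetical example where one topic comes from a constant.
const src = `package main

const helloTopic = "hello"

func main() {
	ec.Publish(helloTopic, "Hello world!")
	ec.Publish("direct", "Hello again!")
	ec.Publish(unknownTopic, "Hello?")
}
`

// literalValues maps names declared with const or var to their string
// literal values. Assumption: names are unique across the file.
func literalValues(file *ast.File) map[string]string {
	values := make(map[string]string)
	ast.Inspect(file, func(n ast.Node) bool {
		spec, ok := n.(*ast.ValueSpec)
		if !ok {
			return true
		}
		for i, name := range spec.Names {
			if i >= len(spec.Values) {
				break // declared without an initial value
			}
			lit, ok := spec.Values[i].(*ast.BasicLit)
			if !ok || lit.Kind != token.STRING {
				continue
			}
			if s, err := strconv.Unquote(lit.Value); err == nil {
				values[name.Name] = s
			}
		}
		return true
	})
	return values
}

// resolveTopics returns the topic of every ec.Publish call, following
// identifiers back to a string literal where the table allows it.
func resolveTopics(source string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "publisher.go", source, 0)
	if err != nil {
		return nil, err
	}
	values := literalValues(file)
	var topics []string
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok || len(call.Args) == 0 {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok || sel.Sel.Name != "Publish" {
			return true
		}
		if recv, ok := sel.X.(*ast.Ident); !ok || recv.Name != "ec" {
			return true
		}
		switch arg := call.Args[0].(type) {
		case *ast.BasicLit:
			if s, err := strconv.Unquote(arg.Value); err == nil {
				topics = append(topics, s)
			}
		case *ast.Ident:
			if s, found := values[arg.Name]; found {
				topics = append(topics, s)
			} else {
				topics = append(topics, "<unresolved: "+arg.Name+">")
			}
		}
		return true
	})
	return topics, nil
}

func main() {
	topics, err := resolveTopics(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, t := range topics {
		fmt.Println(t)
	}
}
```

For anything beyond simple cases, the go/types package can type-check the file and evaluate constant expressions, which avoids the scoping shortcuts taken here.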

Next Steps

Now that we have the topic of the publisher extracted, we will move on to providing additional context around the microservice. We can accomplish this with structured code comments and the parser’s capability of associating groups of comments with Go’s various types.
