Cool Stuff With Go’s AST Package Pt 2

Extracting Value from Comments

Cooper Thompson
The Startup
Sep 20, 2020

This is the second part of the walkthrough “Cool Stuff with Go’s AST Package.” If you have not yet had the opportunity to review the first article, I recommend starting there to become acquainted with abstract syntax trees, the Go package itself, and the facilities provided by the package for traversing Go’s AST.

What We Will Be Covering

In the first article we introduced NATS and how it can be used to create a microservice that publishes regular heartbeats on a message topic. One side-effect of implementing a microservices architecture based on NATS is that all of the various topics, subscribers, publishers, and message types can become blurred, and it is easy to lose track of them.

We discussed the idea of creating a self-documenting microservice using the source code itself. The program will operate similarly to tools that produce OpenAPI specification documents from APIs (Swagger) and will be driven by the source code, so nothing has to be documented outside the system.

In the first article we covered extracting the topic name from the publish method, as well as some basic traversal techniques and type switching based on the different nodes found in the abstract syntax tree. Now that we have a topic, we can move on to providing some additional context around the topic, as well as signaling the type of message the topic is producing. By the end of this article we will have additional functionality that will allow us to extract value from the following structured comment:

In the comment preceding the function call, we specify a description as well as the message type that will be published on the “heartbeat” topic. Let’s dive into how this will work.
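As a rough sketch of what such a comment could look like, assume a `Description:` line followed by a `Publishes:` line directly above the call. Only the `Publishes:` key and the “heartbeat” topic come from this series; the `Description:` key, the `HeartBeat` fields, and the `recorder` type (a stand-in for the encoded NATS connection so the example runs without the NATS client) are assumptions made for illustration.

```go
package main

import "fmt"

// HeartBeat is a hypothetical message type used only for this illustration.
type HeartBeat struct {
	Service string
	Status  string
}

// publisher stands in for the encoded NATS connection from part 1.
type publisher interface {
	Publish(subject string, v interface{}) error
}

// recorder remembers the subjects it was asked to publish on instead of
// sending anything over the network.
type recorder struct {
	subjects []string
}

func (r *recorder) Publish(subject string, v interface{}) error {
	r.subjects = append(r.subjects, subject)
	return nil
}

func sendHeartbeat(ec publisher) {
	// Description: Periodic signal that this service is alive.
	// Publishes: HeartBeat
	ec.Publish("heartbeat", &HeartBeat{Service: "example", Status: "ok"})
}

func main() {
	r := &recorder{}
	sendHeartbeat(r)
	fmt.Println(r.subjects)
}
```

The two comment lines sit directly above the call with no blank line in between, which is what the rest of this article relies on.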

Godoc

The Go project provides a tool called “godoc” that makes easy work of extracting comments from Go source code and automatically documenting types, structs, functions, and methods. Godoc is a perfect solution for most use cases, and is extremely useful for documenting packages and tools. An example of a web page made with godoc can be seen here.

Under the hood, godoc makes use of the go/ast package, as well as the way the Go parser handles comments. The parser treats a comment that immediately precedes a declaration (with no blank line in between) as belonging to that declaration, and groups the comment lines accordingly. The resulting CommentGroup is accessible through the Doc field of the corresponding declaration type in the go/ast package. Notice the key word: declarations. In our particular use case we are concerned with a function call, which is an expression. So, in order to associate the function call with a comment group, we will use a CommentMap. The go/ast comment map lets us associate comment groups with any node, not just declarations.
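To make the declaration case concrete, here is a minimal sketch that parses a small source string and reads the doc comment the parser attached to a function declaration. The sample source, the `demo.go` file name, and the `funcDocs` helper are made up for this example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// src is a made-up file with one doc comment and one comment inside a body.
const src = `package demo

// Greet returns a greeting for name.
func Greet(name string) string {
	// This comment precedes a statement, not a declaration.
	return "hello " + name
}
`

// funcDocs parses source and returns the doc comment text of every function
// declaration, keyed by function name.
func funcDocs(source string) (map[string]string, error) {
	fset := token.NewFileSet()
	// parser.ParseComments is required, otherwise comments are discarded.
	file, err := parser.ParseFile(fset, "demo.go", source, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	docs := make(map[string]string)
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok {
			// Doc may be nil; Text handles a nil group and returns "".
			docs[fn.Name.Name] = strings.TrimSpace(fn.Doc.Text())
		}
	}
	return docs, nil
}

func main() {
	docs, err := funcDocs(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(docs["Greet"])
}
```

The comment inside the function body never shows up in a Doc field because the return statement is not a declaration. It is still available in the file's Comments list, which is the gap a CommentMap fills.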

Setting up the Comment Map

A CommentMap uses the fileset (created with the go/token package and handed to the parser in the previous article) and the comments extracted during parsing to associate comments with their respective nodes in the AST. This feature makes Go very powerful for self-documenting code, since it does all of the heavy lifting of attaching comments to the code they describe.

The below code creates the CommentMap, and then attempts to pull the CommentGroup for the node when we hit the right node to work with (a call to the publish function on our encoded NATS connection):
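A minimal sketch of this step, under a few assumptions: the source is supplied as a string, the publish call appears as a statement of its own, and any call of the form `x.Publish(...)` counts as a match. The sample source and the `isPublishCall` and `publishComments` helpers are hypothetical names, not code from the previous article.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// src is a made-up publisher; it only needs to parse, not type-check.
const src = `package main

func publishHeartbeat(ec conn) {
	// Description: Periodic signal that this service is alive.
	// Publishes: HeartBeat
	ec.Publish("heartbeat", &HeartBeat{Status: "ok"})
}
`

// isPublishCall reports whether expr is a call of the form x.Publish(...).
func isPublishCall(expr ast.Expr) bool {
	call, ok := expr.(*ast.CallExpr)
	if !ok {
		return false
	}
	sel, ok := call.Fun.(*ast.SelectorExpr)
	return ok && sel.Sel.Name == "Publish"
}

// publishComments returns the text of every comment group that the
// CommentMap associates with an expression statement wrapping a Publish call.
func publishComments(source string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "publisher.go", source, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	cmap := ast.NewCommentMap(fset, file, file.Comments)

	var found []string
	ast.Inspect(file, func(n ast.Node) bool {
		// Check the outer statement first: the comment is attached to it,
		// not to the call expression nested inside it.
		stmt, ok := n.(*ast.ExprStmt)
		if !ok || !isPublishCall(stmt.X) {
			return true
		}
		for _, group := range cmap[stmt] {
			found = append(found, strings.TrimSpace(group.Text()))
		}
		return true
	})
	return found, nil
}

func main() {
	comments, err := publishComments(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, c := range comments {
		fmt.Println(c)
	}
}
```

Looking up `cmap[call]` with the inner call expression as the key would come back empty here, because the map stores the comment group under the enclosing statement.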

You may notice one difference from the code in the previous article: we added an additional check on the node before checking for the call expression.

Why did we add this? As mentioned, the Go AST package and Go parser are able to associate comments with their respective nodes. Even though we are looking for a specific node at a lower branch in the syntax tree, the comment might be attached to an earlier node.

Visualization of the AST tree

For the above reason, we need an additional check that starts at the outermost level (root) of the statement that contains the call we want to look at. Broadly, the nodes of a Go AST fall into declarations, statements, and expressions, which in turn are built from function calls, type declarations, variable assignments, and so on.

Accounting for All Cases

So you may be thinking, “This is a very specific set of type assertions and checks that will only work for certain expressions,” and that is correct. In the above code, we are looking for a very specific syntax and way of calling the publish function. We can change this, though, and simplify it by using the position system of our fileset and Go’s token package.

When Go source code is tokenized (lexed) and parsed, the position of each token, and therefore of each node, within the source code is recorded. These positions are what the compiler uses when reporting errors, which it prefixes with the file name, line, and column of the offending code.

We can use this position-tracking feature of the Go parser to pull in the comment that ends on the line right above our function call. This means we won’t have to rely on the node-to-comment relationship, and will instead just look at the line above the publishing function call.

Position

All Go AST node types implement the ast.Node interface, which has a Pos() and an End() method. Pos returns the position of the first character of the node’s source code, and End returns the position of the first character immediately after the node. See more here.

The values returned by the Pos and End methods are of type token.Pos, an encoded representation of a position. However, the fileset (what we create with the go/token package, and what the parser uses to track positions while scanning) provides a method, Position, for turning this encoded position into more useful values such as the file name, line, and column. An example of this can be seen below:
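A minimal sketch of that conversion, assuming the source is supplied as a string and that any `x.Publish(...)` call is the one we care about. The `lineOf` and `firstPublishLine` helpers and the sample source are illustrative names only.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is a made-up publisher; it only needs to parse, not type-check.
const src = `package main

func publishHeartbeat(ec conn) {
	// Description: Periodic signal that this service is alive.
	// Publishes: HeartBeat
	ec.Publish("heartbeat", &HeartBeat{Status: "ok"})
}
`

// lineOf decodes the node's starting position and returns its line number.
func lineOf(fset *token.FileSet, node ast.Node) int {
	return fset.Position(node.Pos()).Line
}

// firstPublishLine returns the line of the first x.Publish(...) call in
// source, or 0 if there is none.
func firstPublishLine(source string) (int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "publisher.go", source, 0)
	if err != nil {
		return 0, err
	}
	line := 0
	ast.Inspect(file, func(n ast.Node) bool {
		if line != 0 {
			return false // already found; stop descending
		}
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		if sel, ok := call.Fun.(*ast.SelectorExpr); ok && sel.Sel.Name == "Publish" {
			line = lineOf(fset, call)
		}
		return true
	})
	return line, nil
}

func main() {
	line, err := firstPublishLine(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("publish call on line", line)
}
```

The token.Position value returned by Position also carries Filename, Column, and Offset fields, so the same call can be used to build a file:line:column reference.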

This function will return the line number of the call to the NATS publish method. Let’s put all of this together to get the correct comment group for the publish call.

Using the Position and File Comments Together

Given that we have a list of all the comments parsed out of the Go file, as well as a method for getting the position of a node and its associated line, let’s put the two together. We will remove the check for the expression that we added earlier in this article so that we aren’t bound to strict checking of the node type, and instead just look for the function call. We will also remove the comment map, since we no longer need to associate nodes with comments now that we are using line numbers.
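One way this could look, as a sketch rather than the exact utility from this series: walk the tree for any `x.Publish(...)` call, compute its line, and pick the comment group from the file's Comments list that ends on the line directly above. The `commentEndingOn` and `publishDocs` helpers and the sample source are hypothetical. The call sits inside a return statement here to show that the enclosing node type no longer matters.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// src is a made-up publisher; it only needs to parse, not type-check.
const src = `package main

func publishHeartbeat(ec conn) error {
	// Description: Periodic signal that this service is alive.
	// Publishes: HeartBeat
	return ec.Publish("heartbeat", &HeartBeat{Status: "ok"})
}
`

// commentEndingOn returns the comment group whose last line is line, or nil.
func commentEndingOn(fset *token.FileSet, file *ast.File, line int) *ast.CommentGroup {
	for _, group := range file.Comments {
		if fset.Position(group.End()).Line == line {
			return group
		}
	}
	return nil
}

// publishDocs returns the text of the comment group that ends on the line
// directly above each x.Publish(...) call found in source.
func publishDocs(source string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "publisher.go", source, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	var docs []string
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok || sel.Sel.Name != "Publish" {
			return true
		}
		callLine := fset.Position(call.Pos()).Line
		if group := commentEndingOn(fset, file, callLine-1); group != nil {
			docs = append(docs, strings.TrimSpace(group.Text()))
		}
		return true
	})
	return docs, nil
}

func main() {
	docs, err := publishDocs(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, d := range docs {
		fmt.Println(d)
	}
}
```

This trades precision for simplicity. A trailing comment on the previous statement would also end on the line above the call, so a stricter version might additionally require the comment to contain the expected keys.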

Let’s rewrite the NATS publisher to look like the following, adding the comment format we showed previously so that we can get a full description and the type of message the publisher will produce:

If we run the AST utility in its current state against this new publisher, we should get the following output:

Conclusion

In this article we continued our exploration of the Go AST package by adding functionality to our NATS publisher auto-documentation tool. We learned how the Go parser automatically associates AST nodes and comments, and how the godoc tool makes use of this functionality. We then saw how this can require some fancy type assertions and intense checking to find the outermost node that has the actual comment attached. Finally, we used the go/token package’s position tracking to instead use line numbers to pull in the associated comments.

In the final article (part 3), we will parse the above comment and use the provided type in the “Publishes:” value to retrieve the struct declaration and pull in descriptions and names for the fields. Feel free to give me a follow and stay updated when the next part drops. Let me know if you have any questions about the Go AST package or have comments on a better way to do this! I am always open to improvement, criticism, and learning more.
